Strong consistency in parallel systems makes them easier to program, but it requires expensive coordination and scales poorly. This challenge recurs at multiple layers of abstraction across the hardware and software stack, including multicore processors, parallel transaction processing, and distributed systems.
In this talk, I will introduce a simple primitive called logical leases that achieves strong consistency while maintaining good scalability and performance. Logical leases allow a system to avoid conflicts by reordering operations in both physical and logical time. I have applied logical leases across the hardware/software stack, building three new systems on top of them: a cache coherence protocol (Tardis), a multicore concurrency control algorithm (TicToc), and a distributed transaction management protocol (Sundial). Compared to state-of-the-art implementations, logical leases improve these systems in simplicity, scalability, and performance. I will also discuss my vision for future high-performance, strongly consistent parallel systems built through hardware and software codesign.
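The core idea can be sketched in a few lines. The toy code below is only an illustration of the lease mechanism, not the actual Tardis, TicToc, or Sundial code; the field names and the fixed lease length are invented for the example. Each data item carries a logical-time interval [wts, rts] during which its current version is valid, and a writer commits at a logical timestamp past any outstanding read lease, so readers and writers serialize in logical time without blocking each other.

```python
# Toy sketch of logical leases (illustrative; details invented for the example).
# Each item's lease [wts, rts] is the logical-time interval over which its
# current version is valid. Transactions pick commit timestamps consistent
# with the leases they observed, so conflicts are resolved by reordering in
# logical time rather than by global coordination.

class Item:
    def __init__(self, value):
        self.value = value
        self.wts = 0   # logical time the current version was written
        self.rts = 0   # logical time up to which the version is leased for reads

def read(item, lease=10):
    """Read the item and extend its read lease by a fixed amount."""
    item.rts = max(item.rts, item.wts + lease)
    return item.value, (item.wts, item.rts)

def write(item, value, now):
    """Install a new version at a logical time past the current read lease."""
    commit_ts = max(now, item.rts + 1)   # cannot overwrite a leased version
    item.wts = item.rts = commit_ts
    item.value = value
    return commit_ts
```

Note that a write arriving at physical time `now=1` still commits at logical time 11 if a reader holds a lease to 10: the reader is logically ordered before the writer even though both proceed without waiting.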
Xiangyao Yu is a fifth-year PhD student at MIT CSAIL, where he is advised by Srinivas Devadas. His research interests include computer architecture and databases; he is interested in improving the performance and efficiency of parallel systems through both hardware and software innovations. Before starting his PhD, he received his BS degree from the Department of Microelectronics at Tsinghua University in 2012.
Personal photos are growing explosively with the popularity of photo-taking devices and social media. This vast body of online photos reveals much about users’ interests, emotions, and opinions. Mining user interests from personal photos can support a number of applications, such as advertising, interest-based community detection, and photo recommendation. In this talk, I will introduce our work on mining user interests from personal photos.
We propose a User Image Latent Space Model to jointly model user interests and image contents. User interests are modeled as latent factors, and each user is assumed to have a distribution over them. By inferring the latent factors and users’ distributions, we can discover what each user is interested in. We model image contents with a four-level hierarchical structure whose layers correspond to themes, semantic regions, visual words, and pixels, respectively. Users’ latent interests are embedded in the theme layer.
Given image contents, users’ interests can be discovered by doing posterior inference. We use variational inference to approximate the posteriors of latent variables and learn model parameters. Experiments on 180K Flickr photos demonstrate the effectiveness of our model.
MongoDB is a next-generation database that helps businesses transform their industries by harnessing the power of data. The world’s most sophisticated organizations, from cutting-edge startups to the largest companies, use MongoDB to build applications that were never before possible, at a fraction of the cost of traditional databases.
This talk will review the feature set of MongoDB and the history and rationale behind those features, including BSON, the document model, replication, and database sharding. MongoDB's feature set represents a particular set of trade-offs, and the advantages and disadvantages of those trade-offs will be considered in the context of various use cases.
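As a rough illustration of the document model discussed in the talk (plain Python rather than actual MongoDB driver code; the field names are invented for the example), a record and its one-to-many children live together in a single nested document instead of being normalized across joined tables:

```python
# Illustrative sketch of the document model (not MongoDB driver code).
# An order and its line items form one nested document, where a relational
# schema would split them across joined tables.
order_document = {
    "_id": 1001,
    "customer": {"name": "Ada", "city": "Providence"},
    "items": [  # embedded one-to-many relationship, no join needed
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 24.50},
    ],
}

# One read retrieves the whole aggregate; the trade-off is that
# cross-document operations and ad-hoc joins become harder.
total = sum(i["qty"] * i["price"] for i in order_document["items"])
```

This locality of access is one of the trade-offs the talk examines: reads of a whole aggregate are cheap, while workloads that cut across documents pay the price.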
Andrew Morrow majored in CS at Brown University. After graduation he co-founded a telco start-up in Providence. He later moved to NYC to work for Google, focusing on concurrency and network IO. After a quick stint in finance, he joined MongoDB two years ago. His work at MongoDB has focused on C++ portability and language standards conformance, the document update path in the database server, build system and library design, and most recently the C and C++ database drivers.
Faculty Host: Andy Pavlo
Current trends have seen data grow larger, more intertwined, and more diverse, as more and more users contribute to and use it. This trend has given rise to the need to support richer data analysis tasks. Such tasks involve determining the causes of observations, finding and correcting the sources of error in query results, as well as modifying the data in order to make it conform to complex desirable properties.
In this talk, I will discuss two main challenges: (a) providing explanations through support for causal queries ("Why"), and (b) modifying datasets based on high-level declarative constraints ("How"). First, I will show how to apply causal reasoning to tuple provenance in order to determine the causes of query results and to identify the sources of possible errors. I will present an analysis of the data complexity for the case of conjunctive queries, focusing on a complete dichotomy between NP-hard and PTIME cases for the problem of computing responsibility.
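To make the notion of responsibility concrete, here is a brute-force sketch (the query and data are invented for illustration; real query instances are what the dichotomy result classifies). A tuple t is a cause of a boolean query answer with contingency set Γ if removing Γ leaves the answer unchanged while additionally removing t flips it; t's responsibility is 1/(1+|Γ|) for the smallest such Γ.

```python
from itertools import combinations

# Toy brute-force responsibility computation (illustrative only).
# The "database" is a list of values; the boolean query asks whether
# any value exceeds 5.

def Q(db):
    return any(v > 5 for v in db)

def responsibility(db, t):
    """Responsibility of tuple t for Q's answer: 1/(1+k) for the
    smallest contingency set of size k, or 0 if t is not a cause."""
    others = [x for x in db if x != t]
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = [x for x in others if x not in gamma]
            if Q(rest + [t]) != Q(rest):  # t is counterfactual given gamma
                return 1.0 / (1 + k)
    return 0.0
```

For example, with database [7, 9, 3], tuple 7 only becomes counterfactual after removing 9 (a contingency of size 1), so its responsibility is 1/2, while 3 is not a cause at all. The brute force is exponential; the dichotomy result characterizes when this can be done in PTIME.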
Finally, I will present the Tiresias system, the first how-to query engine, which seamlessly integrates database systems with constrained problem solving capabilities. The contributions of the system are threefold: (a) a declarative interface for defining how-to queries over a database, (b) translation rules from the declarative statements to the constrained problem specification, and (c) a suite of data-specific optimizations that allow scaling to large data sizes. Initial results from our prototype implementation show order-of-magnitude speedups over state-of-the-art solver runtimes, indicating that there are significant gains in pushing this functionality into the database engine. I will conclude with a discussion of the next steps for the Tiresias system, and the bigger vision of reverse data management.
User-provided ratings of products and services are a key feature of websites such as Amazon, TripAdvisor, and Yelp. Although aggregate ratings look static, the underlying distributions can change over time, so a temporal analysis of rating distributions provides deeper insight into the evolution of a product's quality. Given a time series of rating distributions, in this talk we answer the following questions: (1) How can we detect the base behavior of users' evaluations of a product over time? (2) How can we detect points in time where the rating distribution differs from this base behavior, e.g., due to attacks or spontaneous changes in the product's quality?
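One simple way to operationalize these two questions (a sketch under invented assumptions, not the speaker's actual method) is to estimate the base behavior as the mean rating distribution over time, and to flag time steps whose distribution deviates from it by more than a threshold in total variation distance:

```python
# Hedged sketch: base behavior = mean rating distribution; anomalies =
# time steps far from it in total variation distance. The threshold and
# the use of a plain mean are illustrative choices.

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def flag_anomalies(series, threshold=0.2):
    """series: one rating distribution per time step (each sums to 1).
    Returns the indices of time steps that deviate from the base behavior."""
    n = len(series)
    base = [sum(dist[i] for dist in series) / n
            for i in range(len(series[0]))]
    return [t for t, dist in enumerate(series)
            if total_variation(dist, base) > threshold]
```

A real detector would need a robust base estimate (the mean itself is skewed by the anomaly) and a principled threshold, which is precisely what the talk's methods address.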
Long gone is the time when malware spread through infected floppy disks. Today, online social networks offer a fresh medium of propagation that is writing a new chapter in the evolution of computer malware. Using data collected from 3.5 million Facebook accounts, I show how online social network malware exploits socio-monetary incentives to convince potential victims to visit webpages containing clickjacking attacks. Once infected, the victim is impersonated in the social network, unknowingly exposing his or her friends to the same campaign through bogus posts and creating a word-of-mouth infection that cascades throughout the network. I then present evidence that socio-monetary incentives have a profound impact on the spread of malware on Facebook, and show how these observations challenge our current understanding of word-of-mouth diffusion on networks. Among other findings, we will see that campaigns with combined socio-monetary incentives infect more accounts and last longer than campaigns with purely monetary or purely social incentives. To finish, I unveil a surprising new connection between computer security, human psychology, and biological pathogens.
This is joint work with Ting-Kai Huang (Google), Harsha V. Madhyastha (UCR), and Michalis Faloutsos (UNM).
Low-rank decomposition (or approximation) is a key tool for the analysis of tensor data. An important reason for this is that the latent factors are essentially unique in the case of low-rank tensor decomposition, unlike matrix decomposition. We will begin with a retrospective on uniqueness issues, from the early results to more recent ones, which have pushed the boundary of when uniqueness holds almost surely. We will also touch upon the main algorithmic approaches for low-rank tensor approximation, from Alternating Least Squares to very recent work dealing with scalable computation on Hadoop/MapReduce. When the tensor is too big to fit in main memory, one possibility is to spawn parallel processing threads that analyze judiciously sampled parts of the tensor. An alternative is to compress the big tensor down to a far smaller one that fits in main memory, in a way that preserves the latent low-rank structure. Towards this end, a multi-linear extension of compressed sensing to multi-way tensor compression will be presented, which allows exact recovery of the latent factors of the big tensor from the compressed data.
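As a minimal sketch of the Alternating Least Squares approach mentioned above (illustrative NumPy code; real implementations add normalization, convergence checks, and the sparse or compressed computation discussed in the talk), each factor matrix of a rank-R CP decomposition is updated in turn by solving a least-squares problem against a matricized view of the tensor:

```python
import numpy as np

# Minimal ALS sketch for rank-R CP decomposition of a 3-way tensor.
# X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r].

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I*J) x R."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(X, R, iters=100, seed=0):
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    X1 = X.reshape(I, -1)                     # mode-1 unfolding: X1 = A (B ⊙ C)^T
    X2 = X.transpose(1, 0, 2).reshape(J, -1)  # mode-2 unfolding: X2 = B (A ⊙ C)^T
    X3 = X.transpose(2, 0, 1).reshape(K, -1)  # mode-3 unfolding: X3 = C (A ⊙ B)^T
    for _ in range(iters):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)  # least-squares update of A
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)  # ... of B
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)  # ... of C
    return A, B, C
```

Each update is an exact least-squares solve with the other two factors fixed, which is why ALS is simple and widely used; the scalability work in the talk addresses what happens when the unfoldings no longer fit in memory.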
Nicholas Sidiropoulos (Fellow, IEEE) received the Diploma in Electrical Engineering from the Aristotelian University of Thessaloniki, Greece, and M.S. and Ph.D. degrees in Electrical Engineering from the University of Maryland—College Park, in 1988, 1990 and 1992, respectively. He has served as Assistant Professor in the Department of Electrical Engineering at the University of Virginia (1997-1999); Associate Professor in the Department of Electrical and Computer Engineering at the University of Minnesota—Minneapolis (2000-2002); Professor in the Department of Electronic and Computer Engineering at the Technical University of Crete, Chania—Crete, Greece (2002-2011); and Professor in the Department of Electrical and Computer Engineering at the University of Minnesota—Minneapolis (2011-).
His research interests are in signal processing for communications, convex optimization, cross-layer resource allocation for wireless networks, and multiway analysis – i.e., linear algebra for data arrays indexed by three or more variables. His current research focuses primarily on signal and tensor analytics, with applications in cognitive radio, big data, and preference measurement. He received the NSF/CAREER award in 1998, and the IEEE Signal Processing Society (SPS) Best Paper Award in 2001, 2007, and 2011. He served as IEEE SPS Distinguished Lecturer (2008-2009), and as Chair of the IEEE Signal Processing for Communications and Networking Technical Committee. He received the 2010 IEEE Signal Processing Society Meritorious Service Award, and the Distinguished Alumni Award from the ECE Department of the University of Maryland – College Park (2013).
Social media services such as Flickr, Twitter, and Facebook allow users to share multimedia content, generating large volumes of images. These images come with context data, including textual information, geographic coordinates, and social features such as the graph of users’ friendship relationships. For this reason, social media services constitute a valuable source from which to mine images by combining content and context. However, combining content and context is not trivial and presents challenges to be solved: for example, it may aggravate the curse of dimensionality, or fail to detect correlations between content and context. My PhD research project proposes to address the challenges of combining content and context using the theory of frequent itemset mining. The proposed research is motivated by recent studies that have employed itemset mining to find patterns in visual features. This project aims to extend this approach to find correlations not only among visual features, but also between content and context. For this purpose, we propose a new itemset image representation based on content and context data. The generated itemsets will be used to find patterns and association rules, which will in turn be applied to mine images from social media services.
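A toy version of the proposed representation might look as follows (the item names and images are invented for illustration): each image becomes an itemset mixing content items (visual words) and context items (tags, location), and standard frequent itemset mining then surfaces cross-modal correlations:

```python
from itertools import combinations
from collections import Counter

# Illustrative sketch: images as itemsets over content and context items.
# All item names and images below are made up for the example.
images = [
    {"visual:sand", "visual:water", "tag:beach", "geo:coastal"},
    {"visual:sand", "visual:water", "tag:vacation", "geo:coastal"},
    {"visual:sand", "tag:beach", "geo:coastal"},
    {"visual:snow", "tag:ski", "geo:mountain"},
]

def frequent_itemsets(transactions, min_support=3, size=2):
    """Count all item pairs and keep those meeting the support threshold."""
    counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(t), size)
    )
    return {s for s, c in counts.items() if c >= min_support}
```

Here the only frequent pair links a context item (`geo:coastal`) with a content item (`visual:sand`), which is exactly the kind of content–context correlation the proposed representation is meant to expose; a real system would mine larger itemsets and derive association rules from them.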