This talk seeks to capture a few key ideas from my experiences in developing computer systems and working in both research and product environments. One theme is simplicity: great ideas are often very simple (in hindsight) but very difficult to formulate. Another theme is the human aspect of computer system design: your code is a kind of "message in a bottle" to your future self and future co-workers. I will look at these themes in the context of a couple of distributed system examples.

Brent Welch got his PhD at UC Berkeley, where he built the distributed file system for the Sprite network operating system. (Prof. Gibson was an early Sprite user, and my roommate.) At Xerox PARC he built software for gadgets dreamed up by co-workers (e.g., early color scanners and the Liveboard). He built many tools in Tcl/Tk, including the exmh email user interface and the tclhttpd web server, and wrote Practical Programming in Tcl and Tk. At Panasas he was architect and CTO for a company that built a high-performance scalable file system. He is currently at Google working on their public cloud platform.

Faculty Host: Garth Gibson

SQL Anywhere is an embedded SQL database engine designed from its first release in 1992 to give good performance out of the box in a range of environments from small devices (Raspberry Pi and handhelds) up to server class machines supporting databases of hundreds of gigabytes and thousands of users.

From the beginning, SQL Anywhere was designed to offer self-management features supporting deployment as an embedded database system, where the database administrator cannot immediately connect to diagnose and solve problems. In some (extreme) cases, connecting to the database involves a trip on a float plane to a remote Arctic location or a helicopter ride to an oil rig. Packaging and deployment options are important to support embedded applications, and it is also critical that the database server deliver robustly good performance under conditions that change long after the application has shipped. While performance needs to be robust under changing conditions, embedded applications still have many of the same needs as applications where a DBA is able to continually manage and tune system parameters. SQL Anywhere addresses these needs by choosing good defaults and managing system parameters automatically. For example, SQL Anywhere achieves the best price/performance on the TPC-C benchmark with minimal configuration. We have also found that embedded applications need the same processing capabilities as any other database application, and make good use of materialized views, procedures, triggers, text indexes, geospatial support, and database encryption on everything from handheld devices up to server-class machines.

This talk describes the embedded database applications that are supported by SQL Anywhere, the mechanisms and policies implemented to support them, and our experience with problems that are not yet solved in this environment.

Ivan T. Bowman works for SAP Labs Canada (1993-) as a Development Expert in HANA Platform Data Management, with primary responsibility for the software architecture of the SQL Anywhere DBMS and the time series support of SAP HANA. His professional interests include query execution, end-to-end performance for database applications, methods to understand and control the architecture of large software systems, formal languages, and integrating non-relational data into RDBMSs (XML, text, JSON, and time series). Ivan completed his academic work concurrently with industry work: BMath 1995; MMath 1999 (advisers Ric Holt and Mike Godfrey; software architecture recovery of object-oriented systems); and a PhD in computer science in 2005 (adviser Ken Salem; optimizing client access patterns to avoid the overheads of fine-grained database requests).

Faculty Host: Andy Pavlo

Partially funded by Yahoo Labs.

The Lightning Memory-Mapped Database (LMDB) was introduced at LDAPCon 2011 and has been enjoying tremendous success in the intervening time. LMDB was written for the OpenLDAP Project and has proved to be the world's smallest, fastest, and most reliable transactional embedded data store. It has cemented OpenLDAP's position as world's fastest directory server, and its adoption outside the OpenLDAP Project continues to grow, with a wide range of applications including big data services, crypto-currencies, machine learning, and many others.

The talk will cover highlights of the LMDB design as well as the impact of LMDB on other projects.

Howard Chu has been writing Free/Open Source software since the 1980s. His work has spanned a wide range of computing topics, including most of the GNU utilities (gcc, gdb, gmake, etc.), networking protocols and tools, and kernel and filesystem drivers, with a focus on maximizing the useful work obtained from a system. Howard has led the OpenLDAP Project since 2007, and his experience has made OpenLDAP the world's fastest and most efficient directory software since 2005.

Faculty Host: Andy Pavlo

1:00 pm — Tirthankar Lahiri, Oracle
Oracle Database In-Memory: A Dual Format In-Memory Database

2:00 pm — Sorin Faibish, EMC
Redefine Storage: A Two-Tier Storage Architecture with Auto-tiering Between a Fast Tier on Flash and a Capacity Tier on Cloud

2:30 pm — Break

2:45 pm — Roger MacNicol, Oracle
Query Franchising: A High Performance Solution for Heterogeneous Data Environments

3:30 pm — Tanj Bennett, Microsoft Research
Anatomy of Bing Cloud Apps

4:15 pm — Hideaki Kimura, HP Labs
The Machine: What HP and HP Labs Are Up To

Computer systems are adopting more and more I/O devices, including accelerators and sensors. Accelerators provide high performance and energy efficiency via specialized hardware. Sensors on a mobile system enable it to interact with the physical world in novel ways. To unlock the full potential of I/O devices, we envision that the system must provide two important properties: (i) a unified interface for accessing any device, either natively or across machine boundaries, and (ii) security guarantees both for the system accessing the device and for the system hosting it. These two properties can empower important use cases such as I/O virtualization in data centers and I/O sharing between a user's mobile systems.

In this talk, we present our first steps towards achieving these properties. We briefly discuss Paradice and Rio, two systems that respectively virtualize and share I/O devices at the Unix device file interface. We then focus on library drivers, a novel driver architecture that reduces the size and attack surface of the driver Trusted Computing Base (TCB) and hence significantly improves system security.
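The Unix device file interface the abstract refers to is the familiar open/read/write/ioctl surface under /dev, which is the same regardless of the device behind it. A minimal sketch of that interface (using /dev/urandom as a stand-in device on a Linux machine; this is not Paradice or Rio code, just an illustration of the boundary they operate at):

```python
import os

# Every Unix device is reached through the same file operations; a system
# that virtualizes or forwards this interface can cover many device types
# at once. Here we read 16 bytes from /dev/urandom as a stand-in device.
fd = os.open("/dev/urandom", os.O_RDONLY)
try:
    data = os.read(fd, 16)  # the same read() syscall used for disks, sensors, etc.
finally:
    os.close(fd)

print(len(data))
```

Because the interface is uniform, a single forwarding layer at this boundary can, in principle, expose any guest- or remote-hosted device to an application unchanged.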


Ardalan Amiri Sani is a Ph.D. candidate at Rice University. He works on low-level system software with a focus on new hardware devices appearing in modern computers from mobile systems to data centers. His work on I/O sharing between mobile systems received the MobiSys'14 Best Paper Award. His work on I/O virtualization is open source and available at

Ardalan received his B.Sc. from Sharif University. He was an intern with Microsoft Research at Redmond. He chaired and served at ACM S3 and ACM MobiSys PhD Forum workshops.

Faculty Host: M. Satyanarayanan

In this work we propose a non-intrusive approach for monitoring virtual machines (VMs) in the cloud. At the core of this approach is a mechanism for selective real-time monitoring of guest file updates within VM instances. This mechanism is agentless, requiring no guest VM support. It has low virtual I/O overhead, low latency for emitting file updates, and a scalable design. Its central design principle is distributed streaming of file updates inferred from introspected disk sector writes. The mechanism, called Distributed Streaming Virtual Machine Introspection (DS-VMI), enables many system administration tasks that involve monitoring files to be performed outside VMs.
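The core inference step described above — turning introspected disk sector writes into file-update events without any agent in the guest — can be sketched as a lookup against a sector-to-file map built from the guest's filesystem metadata. The map contents, function name, and event shape below are illustrative assumptions, not the actual DS-VMI data structures:

```python
# Hypothetical sector-to-file map, built once (outside the VM) by parsing
# the guest filesystem's on-disk metadata.
sector_to_file = {
    100: "/var/log/syslog",
    101: "/var/log/syslog",
    205: "/etc/passwd",
}

def infer_updates(written_sectors):
    """Translate a stream of introspected sector writes into file-update events."""
    updated = set()
    for s in written_sectors:
        f = sector_to_file.get(s)
        if f is not None:           # sectors not in the map (free space,
            updated.add(f)          # metadata, etc.) produce no event
    return sorted(updated)

print(infer_updates([100, 101, 999, 205]))  # ['/etc/passwd', '/var/log/syslog']
```

Streaming these inferred events to consumers outside the VM is what lets monitoring tasks run with no guest-side support.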

Wolfgang Richter is a 5th-year PhD student in CS at CMU. He is interested in cloud computing and distributed systems. His main thread of research is developing general and scalable techniques for exploring the runtime and historic file system state of virtual machines (VMs). The focus is on performant solutions that incur low overhead and scale well to tens of thousands of running instances or stored virtual disk snapshots. He has developed two techniques exposing this state at scale: introspection and retrospection.

SmashFS is a scale-out filesystem built on the notion of immutable objects in a distributed store. The underpinnings of the filesystem rely on hashing the object data to produce a globally unique identifier that can be calculated on any node of the cluster. Via consistent hashing, this identifier can be used to establish the ownership of an object by a node in the cluster, avoiding any centralized directory service to track the location of an object. Using this approach, Exablox has built a storage infrastructure with built-in deduplication, data verification, decentralized lookups, snapshots, and location independence. There are several opportunities to evaluate this system with regard to garbage collection, filesystem consensus, fault tolerance, and performance.
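The two ideas above compose neatly: a content hash gives every object an identifier any node can compute independently (which also yields deduplication for free), and consistent hashing maps that identifier to an owning node with no central directory. A minimal sketch, with illustrative node names and a single ring point per node (real systems use many virtual points per node; none of this is Exablox's actual code):

```python
import hashlib
from bisect import bisect

NODES = ["node-a", "node-b", "node-c"]   # hypothetical cluster members

def object_id(data: bytes) -> str:
    """Content-derived, globally unique ID: same data -> same ID on any node."""
    return hashlib.sha256(data).hexdigest()

def _point(key: str) -> int:
    """Position of a key on the hash ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

# One point per node on the ring, sorted so we can binary-search it.
RING = sorted((_point(n), n) for n in NODES)

def owner(oid: str) -> str:
    """First node clockwise from the object's ring position owns it."""
    p = _point(oid)
    idx = bisect([pt for pt, _ in RING], p) % len(RING)
    return RING[idx][1]

oid = object_id(b"hello world")
print(oid[:12], "->", owner(oid))
```

Because identical data always hashes to the same identifier, a second write of the same object is detected as a duplicate before it is stored, and any node can answer "who owns this object?" locally.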


Charles Hardin is a Software Architect at Exablox with 13 years of engineering experience. Prior to joining Exablox, Charles was a Technical Director at 2Wire (acquired by Pace) during the development of U-Verse and various telecom-related activities. He attended Carnegie Mellon University as a member of the Parallel Data Lab before venturing to Silicon Valley.

Ramesh Iyer Balan has been in the storage industry for over 15 years. He is currently VP of Engineering at Exablox, a clustered storage start-up based in Sunnyvale, CA. Previously he worked at Data Domain on the backup deduplication product and at Veritas, where he worked on a transaction-log-based file system. He received his M.S. in Computer Science from Columbia University.

Faculty Host: Greg Ganger

Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control. We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.
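The shared-state, optimistic-concurrency idea above can be sketched as: each scheduler reads a snapshot of the cell state, makes a placement decision against that snapshot, then attempts a versioned commit; if another scheduler committed in the meantime, the commit fails and the scheduler retries. The state shape, names, and trivial placement policy below are illustrative assumptions (and the commit here is made atomic with a lock rather than the paper's lock-free machinery), not Omega's implementation:

```python
import threading

class SharedState:
    """Toy cell state: free CPUs per machine, with versioned optimistic commits."""

    def __init__(self, free_cpus):
        self._free = dict(free_cpus)
        self._version = 0
        self._lock = threading.Lock()   # only makes each commit atomic

    def snapshot(self):
        with self._lock:
            return self._version, dict(self._free)

    def try_commit(self, version, machine, cpus):
        """Claim `cpus` on `machine` iff nothing committed since `version` was read."""
        with self._lock:
            if version != self._version or self._free.get(machine, 0) < cpus:
                return False            # conflict or insufficient room: retry
            self._free[machine] -= cpus
            self._version += 1
            return True

def schedule(state, cpus):
    """Optimistic scheduling loop: read, decide, try to commit, retry on conflict."""
    while True:
        version, free = state.snapshot()
        machine = max(free, key=free.get)       # trivial policy: most free CPUs
        if free[machine] < cpus:
            return None                          # the cell is full
        if state.try_commit(version, machine, cpus):
            return machine

state = SharedState({"m1": 4, "m2": 2})
print(schedule(state, 3))  # places on m1, the machine with the most free CPUs
```

The point of the design is that two schedulers rarely pick the same machine at the same instant, so conflicts (and hence retries) are uncommon in practice; the talk's measurements quantify exactly how uncommon.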

Omega: flexible, scalable schedulers for large compute clusters. Malte Schwarzkopf (University of Cambridge Computer Laboratory), Andy Konwinski (University of California, Berkeley), Michael Abd-el-Malek and John Wilkes (Google Inc.). Proc. EuroSys '13, Prague, April 2013.


John Wilkes has been at Google since 2008, where he is working on cluster management and infrastructure services. He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves. In his spare time he continues, stubbornly, trying to learn how to blow glass. 

