Matthew Wachs - Research Page

Matthew Wachs, Ph.D.

E-mail: Look at the URL of this page. Take everything between the tilde (~) and the following slash (/), and append ".com" to it. Then prepend "misc@" to it.

Research

I worked on performance insulation for shared storage servers. Shared storage servers are an appealing alternative to per-application, dedicated storage systems. However, applications sharing a server must still receive good performance, fairness, and efficiency, and interference between workloads can degrade all three. By combining three techniques (timeslicing, amortization, and cache partitioning), we've been able to approach the goal of giving each of n clients 1/n of its standalone throughput while keeping average response times reasonable. These techniques are implemented in the Argon storage server. We've also shown how to extend them to a workload whose data is stored across multiple servers.
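As a rough illustration of why timeslicing needs long enough slices to approach the 1/n goal, the sketch below models round-robin timeslicing in which the start of every timeslice is lost to a fixed switching cost (repositioning the disk head, re-warming state). This is not Argon's code or its timeslice-sizing policy; the `Client` type, the function names, and the 10 ms switch cost are hypothetical choices for the example.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    standalone_mbps: float  # throughput when the client has the server to itself


def timesliced_throughput(clients, timeslice_ms, switch_cost_ms):
    """Estimate each client's throughput under round-robin timeslicing.

    Simplifying assumption: every client gets one timeslice per round, and the
    first switch_cost_ms of each timeslice is lost to repositioning the disk
    head and re-warming state left behind by the previous client.
    """
    if switch_cost_ms >= timeslice_ms:
        raise ValueError("timeslice must be longer than the switch cost")
    n = len(clients)
    useful_fraction = (timeslice_ms - switch_cost_ms) / (n * timeslice_ms)
    return {c.name: c.standalone_mbps * useful_fraction for c in clients}


def fraction_of_ideal(clients, timeslice_ms, switch_cost_ms):
    """How close each client gets to the ideal of 1/n of its standalone throughput."""
    n = len(clients)
    shares = timesliced_throughput(clients, timeslice_ms, switch_cost_ms)
    return {c.name: shares[c.name] / (c.standalone_mbps / n) for c in clients}


if __name__ == "__main__":
    clients = [Client("streaming", 60.0), Client("random", 4.0)]
    # Hypothetical 10 ms switch cost; longer timeslices amortize it better.
    for slice_ms in (20, 50, 100, 200):
        ratios = fraction_of_ideal(clients, slice_ms, switch_cost_ms=10)
        print(slice_ms, {k: round(v, 2) for k, v in ratios.items()})
```

In this model the fraction of the ideal share works out to (T - s)/T for timeslice T and switch cost s, so it climbs from 0.5 at 20 ms to 0.95 at 200 ms. The price is that each client waits longer between its timeslices, which is why response time has to be kept in view when choosing T.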

Our latest work in this area, Cesium, shows how to provide specific bandwidth guarantees to workloads while building on the high efficiency of Argon. A new timeslicing-based scheduler grows or shrinks each workload's timeslices according to its access pattern so that the workload receives its specified bandwidth. When a guarantee cannot be met, we distinguish between fundamental violations (those where the workload's access pattern is temporarily too demanding for its guarantee to be met) and avoidable violations. Our scheduler prevents nearly all of the avoidable violations, whereas other approaches that do not explicitly manage efficiency suffer many avoidable violations when the workloads are complex.
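One way to picture "growing or shrinking timeslices depending on access patterns" is to size each workload's slice from its guarantee and its current standalone throughput: a workload that turns more random gets less done per millisecond of disk time, so it needs a longer slice. The sketch below is a hypothetical model under that assumption, not Cesium's scheduler; `Workload`, `plan_round`, and the round and switch-cost values are invented for illustration, and it only flags the single-workload case (guarantee above standalone throughput) as fundamental.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    guarantee_mbps: float   # bandwidth promised to the workload
    standalone_mbps: float  # recently measured throughput with the disk to itself


def plan_round(workloads, round_ms, switch_cost_ms):
    """Size each workload's timeslice for one scheduling round.

    Returns (timeslices_ms, fundamental, feasible):
      timeslices_ms: timeslice needed by each workload whose guarantee is attainable
      fundamental:   workloads whose current access pattern cannot reach their
                     guarantee even with the whole server
      feasible:      whether all the sized timeslices fit in one round
    """
    timeslices_ms = {}
    fundamental = set()
    for w in workloads:
        if w.standalone_mbps < w.guarantee_mbps:
            fundamental.add(w.name)
            continue
        # Disk time needed per round at the workload's current efficiency, plus
        # the cost of switching to it. A less efficient (e.g. more random)
        # access pattern lowers standalone_mbps, so its timeslice grows.
        useful_ms = round_ms * w.guarantee_mbps / w.standalone_mbps
        timeslices_ms[w.name] = useful_ms + switch_cost_ms
    feasible = sum(timeslices_ms.values()) <= round_ms
    return timeslices_ms, fundamental, feasible


if __name__ == "__main__":
    workloads = [
        Workload("a", guarantee_mbps=20.0, standalone_mbps=80.0),
        Workload("b", guarantee_mbps=30.0, standalone_mbps=60.0),
        Workload("c", guarantee_mbps=50.0, standalone_mbps=40.0),
    ]
    print(plan_round(workloads, round_ms=1000, switch_cost_ms=10))
```

Here "a" needs 250 ms of useful disk time per 1000 ms round (20 MB at 80 MB/s) plus 10 ms of switching, "b" needs 510 ms, and "c" is flagged because even the whole server would deliver only 40 of its 50 MB/s.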

My thesis covers these topics.

I've also worked on a number of other topics. We explored making it possible to use a file system implementation from one operating system within another. Not all file systems are available on all operating systems, and porting a file system can be a significant burden for its implementers. Even merely maintaining compatibility with newer versions of a kernel is a kind of porting: minor kernel revisions often change file system interfaces enough to require significant developer effort. File systems can be exported from one operating system to another using network file sharing protocols such as NFS, but the semantics of these protocols often differ dramatically from those of the file system of interest; with NFS, the semantics become the "lowest common denominator." Our solution, which preserves semantics, is File System Virtual Appliances (FSVAs). An FSVA is a virtual machine that hosts a file system, running that file system's operating system of choice. Other virtual machines on the same physical machine can then access the file system as if it were local to them. This is accomplished by installing a relatively simple kernel module in the operating systems of both virtual machines; the module performs VFS forwarding, redirecting kernel file system API calls from one machine to the other.
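The real FSVA modules forward VFS calls between virtual machines inside the kernel. The user-space Python below is only a toy model of the forwarding idea: one side packs a file system call into a request message, the other side executes it against the file system it hosts and sends back a reply. The class names (`ForwardingClient`, `ApplianceServer`), the JSON message format, and the in-memory "file system" are hypothetical, and a plain function call stands in for the inter-VM channel.

```python
import json


class ApplianceServer:
    """Stands in for the file system VM: runs forwarded calls against its own
    file system, modeled here as an in-memory dict of path -> contents."""

    def __init__(self):
        self.files: dict[str, bytearray] = {}

    def handle(self, raw: bytes) -> bytes:
        req = json.loads(raw)
        op, args = req["op"], req["args"]
        try:
            if op == "create":
                self.files[args["path"]] = bytearray()
                result = {"ok": True}
            elif op == "write":
                data = bytes.fromhex(args["data"])
                buf = self.files[args["path"]]
                start, end = args["offset"], args["offset"] + len(data)
                if len(buf) < end:
                    buf.extend(b"\0" * (end - len(buf)))  # zero-fill any gap
                buf[start:end] = data
                result = {"ok": True, "written": len(data)}
            elif op == "read":
                buf = self.files[args["path"]]
                chunk = bytes(buf[args["offset"]:args["offset"] + args["length"]])
                result = {"ok": True, "data": chunk.hex()}
            else:
                result = {"ok": False, "error": f"unsupported operation: {op}"}
        except KeyError:
            result = {"ok": False, "error": "no such file"}
        return json.dumps(result).encode()


class ForwardingClient:
    """Stands in for the module in the user's VM: turns VFS-style calls into
    request messages and unpacks the replies."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning reply bytes
        # (an inter-VM channel in a real system; a direct call here).
        self.transport = transport

    def _call(self, op, **args):
        request = json.dumps({"op": op, "args": args}).encode()
        reply = json.loads(self.transport(request))
        if not reply["ok"]:
            raise OSError(reply["error"])
        return reply

    def create(self, path):
        self._call("create", path=path)

    def write(self, path, offset, data: bytes) -> int:
        return self._call("write", path=path, offset=offset, data=data.hex())["written"]

    def read(self, path, offset, length) -> bytes:
        reply = self._call("read", path=path, offset=offset, length=length)
        return bytes.fromhex(reply["data"])


if __name__ == "__main__":
    server = ApplianceServer()
    fs = ForwardingClient(server.handle)
    fs.create("/notes.txt")
    fs.write("/notes.txt", 0, b"hello")
    print(fs.read("/notes.txt", 1, 3))  # b'ell'
```

Because the caller only sees `create`, `write`, and `read`, the file system's semantics are decided entirely on the appliance side, which is the property that distinguishes forwarding from translating calls into a network file system protocol.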

I've also worked on parallel application I/O tracing for benchmarking. The best benchmark for a real application is the real application itself, or a replay of traces taken from that application. Unfortunately, running the real application against a new or different storage system can be difficult, or even impossible if the application or data set is classified, confidential, or sensitive. Trace replay can be significantly more straightforward and can be done with 'dummy' data. For parallel applications, however, accurate trace replay requires respecting the dependencies between nodes, so these dependencies must be discovered during trace extraction. We've proposed and implemented a black-box technique that does this by running a parallel application, slowing down individual nodes, and observing how the other nodes react.
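The slow-a-node-and-watch idea can be illustrated with a toy simulation. This is not the //TRACE implementation: `simulate` and `infer_dependencies` are hypothetical helpers, every I/O is assumed to cost one time unit, and a node's next I/O is assumed to wait for the previous I/O of each node it depends on. Slowing one node and checking whose I/O completions shift reveals who depends on it.

```python
def simulate(deps, steps, slow_node=None, delay=0.0):
    """Return each node's list of I/O completion times in a toy model.

    deps maps each node to the set of nodes whose previous I/O must finish
    before that node can issue its next I/O. Every I/O takes 1 time unit,
    plus `delay` if the node is the one being deliberately slowed.
    """
    finish = {n: [] for n in deps}
    for t in range(steps):
        for n in deps:
            ready = finish[n][t - 1] if t > 0 else 0.0
            if t > 0:
                for d in deps[n]:
                    ready = max(ready, finish[d][t - 1])
            cost = 1.0 + (delay if n == slow_node else 0.0)
            finish[n].append(ready + cost)
    return finish


def infer_dependencies(nodes, run, delay, threshold=0.5):
    """Black-box inference: slow each node in turn and report which other
    nodes' final I/O completes noticeably later than in an unperturbed run."""
    baseline = run(None, 0.0)
    inferred = {}
    for slow in nodes:
        perturbed = run(slow, delay)
        inferred[slow] = {
            n for n in nodes
            if n != slow and perturbed[n][-1] - baseline[n][-1] >= threshold * delay
        }
    return inferred


if __name__ == "__main__":
    # Hypothetical application: node 1 waits on node 0, node 2 waits on node 1,
    # and node 3 is independent.
    deps = {0: set(), 1: {0}, 2: {1}, 3: set()}

    def run(slow, d):
        return simulate(deps, steps=5, slow_node=slow, delay=d)

    print(infer_dependencies(list(deps), run, delay=2.0))
```

For this graph the result is `{0: {1, 2}, 1: {2}, 2: set(), 3: set()}`. Note that slowing node 0 also delays node 2 through node 1: an observer that only watches timing sees transitive dependencies as well as direct ones.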

Publications

Incremental Algorithm for Updating Betweenness Centrality in Dynamically Growing Networks. Miray Kas, Matthew Wachs, Kathleen M. Carley, L. Richard Carley. Proceedings of the 2013 IEEE / ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013). August 25-28, 2013, Niagara Falls, ON.

File System Virtual Appliances: Portable File System Implementations. Michael Abd-El-Malek, Matthew Wachs, James Cipar, Karan Sanghi, Gregory R. Ganger, Garth A. Gibson, Michael K. Reiter. ACM Transactions on Storage 8, 3, Article 9 (September 2012), 26 pages. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-10-105. May 2010.

Incremental Centrality Computations for Dynamic Social Networks. Miray Kas, Matthew Wachs, L. Richard Carley, Kathleen M. Carley. Conference Presentation at XXXII International Sunbelt Social Network Conference (Sunbelt 2012). March 12-18, 2012, Redondo Beach, CA.

Exertion-based Billing for Cloud Storage Access. Matthew Wachs, Lianghong Xu, Arkady Kanevsky, Gregory R. Ganger. Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '11). June 14-15, 2011, Portland, OR. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-11-105. March 2011.

Improving Storage Bandwidth Guarantees with Performance Insulation. Matthew Wachs, Gregory R. Ganger. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-10-113. October 2010.

Co-scheduling of Disk Head Time in Cluster-based Storage. Matthew Wachs, Gregory R. Ganger. Proceedings of the 28th International Symposium On Reliable Distributed Systems (SRDS'09). September 27–30, 2009, Niagara Falls, NY. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-113, October 2008.

Relative Fitness Modeling. Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger. Communications of the ACM (Vol 52, No 4, pg 91-96). April, 2009.

Modeling the relative fitness of storage. Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger. Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'07). June 12th–16th 2007, San Diego, CA.
Awarded Best Paper

Argon: Performance Insulation for Shared Storage Servers. Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, Gregory R. Ganger. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13–16, 2007, San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-106, May 2006.

//TRACE: Parallel Trace Replay with Approximate Causal Events. Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Julio Lopez, James Hendricks, Gregory R. Ganger, David O'Hallaron. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13–16, 2007, San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-108, September 2006.

Early Experiences on the Journey Towards Self-* Storage. Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier, Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, Jay J. Wylie. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2006.

Stardust: Tracking Activity in a Distributed Storage System. Eno Thereska, Brandon Salmon, John Strunk, Matthew Wachs, Michael Abd-El-Malek, Julio Lopez, Gregory R. Ganger. Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'06). June 26th-30th 2006, Saint-Malo, France.

Relative fitness models for storage. Michael Mesnier, Matthew Wachs, Brandon Salmon, Gregory R. Ganger. SIGMETRICS Performance Evaluation Review (Vol 33, No 4, pg 23-38). March, 2006.

Ursa Minor: Versatile Cluster-based Storage. Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier, Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, Jay J. Wylie. Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05). December 13–16, 2005, San Francisco, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-104, April 2005.
Awarded Best Paper

Support

While I was a graduate student, I was supported by a National Defense Science and Engineering Graduate (NDSEG) Fellowship, thanks to the Air Force Office of Scientific Research (AFOSR).

Education

I received my Ph.D. from Carnegie Mellon University. I was a member of the Computer Science Department in the School of Computer Science.

I double-majored in Computer Science and Math in the College of Arts and Sciences at Cornell University.

While I was a student, I enjoyed being a part of a number of interesting courses:

I was a teaching assistant for 15-212 (Fall 2009), Carnegie Mellon's course on functional programming (ML). It was taught by Professor Stephen Brookes.

I was a teaching assistant for 15-213 (Fall 2007), Carnegie Mellon's course on computer architecture from a programmer's perspective (such as representation of ints and floats, understanding assembly language, and buffer overflows). It was taught by Professor Todd Mowry and Professor Greg Ganger.

I was a teaching assistant for CS 482 in Spring 2004 with Professor Jon Kleinberg. CS 482 is Cornell's required CS theory course covering algorithms topics such as greedy algorithms, dynamic programming, network flow, and NP-completeness.

I was a teaching assistant for CS 381 in Fall 2003 with Professor John Hopcroft. CS 381 is Cornell's required CS theory course covering finite automata, context-free languages, and Turing machines.

Last Modified: September 2014