Elie Krevat
Graduate Student
Department of Computer Science
Carnegie Mellon University
GHC 6221
ekrevat at cs dot cmu dot edu
Hi there! I'm a Ph.D. student in computer science at Carnegie Mellon, researching many flavors of distributed systems, storage, networks, virtual machines, and cloud computing. I'm also a member of the Parallel Data Lab where I am advised by Greg Ganger.
News
April 15, 2011
The camera-ready version of our upcoming HotOS paper is now complete! The paper is about a growing trend of heterogeneity across otherwise equivalent disk drives, caused by new manufacturing techniques, with large implications for the design of efficient parallel systems that statically partition their work across nodes or depend on uniform performance. I also just got back from NSDI a few weeks ago, but there's no time to rest with HotOS and Sigmetrics coming up in the next 2 months!
August 25, 2010
I spent the last summer at HP Labs in the Storage and Information Management Platforms Lab. My research on improving the efficiency of data-intensive computing systems is going well -- we're developing a fast but less feature-full version of a parallel map-reduce framework to explore the fundamental factors that affect efficiency.
June 4, 2009
I'm spending this summer in Pittsburgh, working on a number of Cloud Computing projects! Stay tuned...
August 28, 2008
Just got back to Pittsburgh for the start of the Fall semester, where I'm excited to get back to research and to TA 15-213: Introduction to Computer Systems.
I really enjoyed my summer in Cambridge, MA working for VMware, and I can also finally talk a little bit about my work because they've just recently gone public with their intention to enter the cloud computing space. I helped build a prototype cloud services model and management layer that experiments with using some forward-looking Semantic Web and other web service technologies to monitor and manage virtual appliances in the cloud.
Research
My current research interests involve coordination and synchronization problems in large distributed systems with a focus on cloud computing and distributed storage. Some of my research projects are listed below.
Seeking Efficient Data-Intensive Computing
New programming frameworks for scale-out parallel analysis, such as MapReduce and Hadoop, have become a cornerstone for exploiting large datasets. However, there has been little analysis of how these systems perform relative to the capabilities of the hardware on which they run. We have developed a simple model of I/O resource consumption and applied it to a map-reduce workload to produce an ideal lower bound on its runtime, exposing the inefficiency of popular scale-out systems. Using a simplified dataflow processing tool called Parallel DataSeries (PDS), we have demonstrated that the model's ideal can be approached within 20%. Current research explores the reasons for the gap between ideal and actual performance faced by any data-intensive scalable computing (DISC) system built atop standard OS and networking services. We have found that disk stragglers and network slowdown effects are the prime culprits for lost efficiency in PDS. We are also building up PDS with more features (e.g., fault tolerance) to understand more areas where efficiency is lost at scale.
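A minimal sketch of what such an I/O lower-bound calculation might look like. This is an illustration only, not the model from the technical report: the two-phase structure, the function names, and all parameters here are assumptions made for the example.

```python
def ideal_runtime_s(input_bytes, n_nodes, disk_bytes_per_s, net_bytes_per_s,
                    map_ratio=1.0, reduce_ratio=1.0):
    """Hypothetical per-node lower bound on map-reduce runtime, in seconds.

    Assumes two phases, each limited only by raw I/O bandwidth:
      phase 1: read input from disk while shuffling map output over the network
      phase 2: write the reduce output to disk
    input_bytes is the amount of input data stored on one node.
    """
    map_out = input_bytes * map_ratio
    # With uniform partitioning, (n-1)/n of the map output leaves the node.
    shuffled = map_out * (n_nodes - 1) / n_nodes
    # Disk read and network send overlap, so the slower one bounds phase 1.
    phase1 = max(input_bytes / disk_bytes_per_s, shuffled / net_bytes_per_s)
    phase2 = (map_out * reduce_ratio) / disk_bytes_per_s
    return phase1 + phase2


def slowdown(actual_s, ideal_s):
    """Fraction by which a measured runtime exceeds the ideal (0.2 == 20%)."""
    return actual_s / ideal_s - 1.0


# Example with made-up numbers: 1 GB per node, 4 nodes,
# 100 MB/s disks, 125 MB/s (about 1 Gbps) network links.
ideal = ideal_runtime_s(1000e6, 4, 100e6, 125e6)
print(ideal)                  # 20.0
print(slowdown(24.0, ideal))  # roughly 0.2
```

With these made-up numbers the disk is the bottleneck in both phases, so a measured runtime of 24 seconds would sit about 20% above the ideal.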
Incast: TCP Throughput Collapse in Cluster-based Storage Systems
Building cluster-based storage systems using commodity TCP/IP and Ethernet networks is attractive because of their low cost, ease of use, and the desire to combine routing infrastructures for LAN, SAN, and high performance computing. However, an important barrier to their use is the problem of TCP throughput collapse, where bursty traffic from synchronized reads in cluster-based storage systems produces a collapse in TCP throughput of one to two orders of magnitude. We have studied the network conditions that cause this TCP throughput collapse in both simulation and real-world deployments, examined the effectiveness of TCP- and Ethernet-level solutions, and with our latest publication we have found reasonable solutions to the problem with high resolution timers that implement a microsecond-granularity TCP retransmission timeout. This solution is both feasible and practical for fast storage networks while also safe for wide area networks, revisiting an older assumption on spurious TCP retransmissions that no longer appears to hold true.
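A back-of-the-envelope sketch of why the retransmission timeout matters so much here. The formula and the numbers are illustrative assumptions, not measurements from the papers: it simply charges each synchronized block read for the time spent idle waiting on timeouts.

```python
def goodput_bytes_per_s(block_bytes, link_bytes_per_s, rto_s, timeouts=1):
    """Hypothetical goodput of one synchronized block read.

    Assumes the link is otherwise fully used, and that each timeout stalls
    the whole read for one retransmission timeout (rto_s) of idle time.
    """
    transfer_s = block_bytes / link_bytes_per_s
    return block_bytes / (transfer_s + timeouts * rto_s)


BLOCK = 1_000_000    # 1 MB block, an assumed size
LINK = 125_000_000   # about 1 Gbps, in bytes per second

coarse = goodput_bytes_per_s(BLOCK, LINK, rto_s=0.2)     # 200 ms timeout
fine = goodput_bytes_per_s(BLOCK, LINK, rto_s=0.0002)    # 200 microsecond timeout

print(round(coarse / 1e6, 1), "MB/s with a 200 ms timeout")
print(round(fine / 1e6, 1), "MB/s with a 200 us timeout")
print(round(fine / coarse, 1), "x difference")
```

Under these assumptions a single 200 ms stall dwarfs the 8 ms it takes to move the block, which is the order-of-magnitude gap the paragraph above describes.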
Publications
-
- Disks Are Like Snowflakes: No Two Are Alike.
- Elie Krevat, Joseph Tucek, and Gregory R. Ganger.
- To appear in 13th Workshop on Hot Topics in Operating Systems (HotOS 2011).
- May 2011.
- [pdf]
-
- Diagnosing Performance Changes by Comparing Request Flows.
- Raja Sambasivan, Alice Zheng, Michael De Rosa, Elie Krevat, Spencer Whitman, Michael Stroucken, William Wang, Lianghong Xu, and Gregory Ganger.
- In Proceedings of 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2011).
- March 2011.
- [pdf]
-
- Applying performance models to understand data-intensive computing efficiency.
- Elie Krevat, Tomer Shiran, Eric Anderson, Joseph Tucek, Jay J. Wylie, and Gregory R. Ganger.
- Carnegie Mellon University Technical Report CMU-PDL-10-108.
- May 2010.
- [pdf]
-
- Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication.
- Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David Andersen, Gregory Ganger, Garth Gibson, and Brian Mueller.
- ACM SIGCOMM.
- August 2009.
- [pdf]
-
- Tashi: Location-aware Cluster Management.
- Michael Kozuch, Michael Ryan, Richard Gass, Steven Schlosser, David O’Hallaron, James Cipar, Elie Krevat, Julio López, Michael Stroucken, and Gregory Ganger.
- In Proceedings of First Workshop on Automated Control for Datacenters and Clouds.
- June 2009.
- [pdf]
-
- Measurement and Analysis of TCP Throughput Collapse in Cluster-Based Storage Systems.
- Amar Phanishayee, Elie Krevat, Vijay Vasudevan, David Andersen, Gregory Ganger, Garth Gibson, and Srinivasan Seshan.
- In Proceedings of File and Storage Technologies (FAST 2008).
- February 2008.
- [pdf]
-
- On Application-level Approaches to Avoiding TCP Throughput Collapse in Cluster-based Storage Systems.
- Elie Krevat, Vijay Vasudevan, Amar Phanishayee, David Andersen, Gregory Ganger, Garth Gibson, and Srinivasan Seshan.
- In Proceedings of Petascale Data Storage Workshop (PDSW at Supercomputing 2007).
- November 2007.
- [pdf] [ppt]
-
- Scheduling Algorithms to Improve Utilization in Toroidal-Interconnected Systems.
- Elie Krevat.
- MIT Master of Engineering Thesis.
- May 2003.
- [pdf]
-
- An Overview of the BlueGene/L Supercomputer.
- NR Adiga et al. (large author list).
- In ACM/IEEE conference on Supercomputing.
- November 2002.
- [pdf]
-
- Job Scheduling for the BlueGene/L System.
- Elie Krevat, Jose G. Castanos, and Jose E. Moreira.
- In Job Scheduling Strategies for Parallel Processing, 8th International Workshop (JSSPP 2002).
- July 2002.
- [pdf]
Other Projects and Presentations
-
- Energy-Efficient Dynamic Source Routing in Ad-Hoc Wireless Networks.
- Elie Krevat and Arian Shahdadi.
- Computer Networks.
- December 2001.
- [pdf]
Teaching
I TAed 15-213: Introduction to Computer Systems in Fall '08.
I also TAed 6.033: Computer System Engineering at MIT in Spring '03 when I was earning my M.Eng. degree.
Background
Before CMU, I completed a B.S. and M.Eng. in computer science at MIT, with a minor in economics. My master's thesis included work from a few summers and a semester of research at IBM T.J. Watson Research Center on system software for the Blue Gene supercomputer. I also spent 3 years working for Microsoft as a software design engineer, where I played around with pre-alpha Vista technologies and developed the first two versions of Office Accounting Professional, a stand-alone product and third-party development platform for small business accounting.
Fun
I first got excited about sailing just before I left MIT, and after taking sailing lessons in Seattle. Sailing options in Pittsburgh are a bit more limited.
I returned to playing another sport that I hadn't attempted since my undergraduate days -- ice hockey! I'm not ready to crack the Penguins roster just yet, but there's no better way to improve your athletic agility than avoiding large speeding human missiles on ice.
When I can find the time, I enjoy traveling to exotic world destinations. Some of my longer trips have included Spain, Italy, Greece, Thailand, Israel, Brazil, and Argentina (where I went on a month-long volunteering and solidarity trip in 2003 during their economic crisis).
I'm always on a search for good sushi restaurants - in Pittsburgh, Sushi Kim and Chaya are my favorites.
I have two wonderful and very talented sisters, Ariela and Rina, but when I tell them that, Rina says "Ariela is both of them." See, they're funny too! Ariela gets many kudos for designing this web page on the fly (she's a freelance graphic design artist in New York City, and promises to make it even splashier later). Rina is just in her first year at UPenn's Digital Media Design program, but her artistic abilities are already amazing. Check out Ariela's web site here and Rina's here.