James Cipar
Ph.D. Student, Carnegie Mellon University
Research Scientist, Fiksu Inc.

About Me

I am particularly interested in applying big data to difficult real-world problems. My current research focuses on the design of large-scale storage systems, including distributed file systems and databases, to support big data applications. Previously, I explored low-level operating system support for contributory applications: applications such as Folding@Home and the Great Internet Mersenne Prime Search that allow users to contribute computing resources to projects from which they do not directly benefit.

Academic Background

I'm a sixth-year Ph.D. student in the Computer Science department at Carnegie Mellon University, advised by Greg Ganger.

I hold a Master's degree from the University of Massachusetts at Amherst, where I was co-advised by Mark Corner and Emery Berger. My Bachelor of Science degree, a dual degree in Computer Science and Mathematics, is also from UMass.

Awards and Fellowships

  • APC Fellowship and Award in Data Center Efficiency Research, 2009
  • Best Paper Award, TFS: A Transparent File System for Contributory Storage, FAST, 2007
  • Best Student Project, VMware, Cambridge, MA, Presented at VMworld, 2007
  • Best Undergraduate Research in Computer Science, University of Massachusetts, 2005

Research

My current research focuses on exploiting staleness tolerance in system design. To that end, I am working closely with a number of machine learning researchers to demonstrate how the error tolerance of ML algorithms, staleness tolerance in particular, can be used in large-scale parallel machine learning.

Additionally, I have worked with HP Labs on LazyBase, a distributed database designed for high-throughput updates and inserts while still allowing low-latency analytical queries. It does this by exploiting a tradeoff between query result freshness and query latency.

Along with Alexey Tumanov, I am working on AlSched, a system that allows cluster frameworks to specify resource requests as composable algebraic utility functions. The scheduler then optimizes over these functions to create a globally optimal resource assignment.

Papers

Solving the straggler problem with bounded staleness
HotOS 2013
James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Gregory R. Ganger, Garth Gibson, Kimberly Keeton, Eric Xing
Presented at conference
@inproceedings{lazytables-hotos2013,
author = {James Cipar and Qirong Ho and Jin Kyu Kim and Seunghak Lee
and Gregory R. Ganger and Garth Gibson and Kimberly Keeton and Eric Xing},
title = {Solving the straggler problem with bounded staleness},
booktitle = {Proc. of the 14th Usenix Workshop on Hot Topics in
Operating Systems},
series = {HotOS '13},
year = {2013},
location = {Santa Ana Pueblo, NM},
publisher = {USENIX},
}
    
AlSched: Algebraic scheduling of mixed workloads in heterogeneous clouds
SoCC 2012
Alexey Tumanov, James Cipar, Michael A. Kozuch, Gregory R. Ganger
@inproceedings{alsched-socc12,
author = {Alexey Tumanov and James Cipar and Michael A. Kozuch and Gregory R. Ganger},
title = {{a}lsched: algebraic scheduling of mixed workloads in heterogeneous clouds},
booktitle = {Proc. of the 3rd ACM Symposium on Cloud Computing},
series = {SoCC '12},
year = {2012},
location = {San Jose, CA},
publisher = {ACM},
}
LazyBase: Trading Freshness for Performance in a Scalable Database
EuroSys 2012
James Cipar, Gregory R. Ganger, Kimberly Keeton, Charles B. Morrey III, Craig A.N. Soules, Alistair Veitch
Presented at conference
@inproceedings{Cipar:2012:LTF:2168836.2168854,
 author = {Cipar, James and Ganger, Gregory R. and Keeton, Kimberly and
 Morrey, III, Charles B. and Soules, Craig A.N. and Veitch, Alistair},
 title = {{L}azy{B}ase: trading freshness for performance in a scalable database},
 booktitle = {Proceedings of the 7th ACM european conference on Computer Systems},
 series = {EuroSys '12},
 year = {2012},
 isbn = {978-1-4503-1223-3},
 location = {Bern, Switzerland},
 pages = {169--182},
 numpages = {14},
 url = {http://doi.acm.org/10.1145/2168836.2168854},
 doi = {10.1145/2168836.2168854},
 acmid = {2168854},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {consistency, freshness, pipeline},
} 
LazyBase is a distributed database designed for high-throughput updates and inserts while still allowing low-latency queries. It does this by exploiting a tradeoff between query result freshness and query latency.
Robust and Flexible Power-Proportional Storage
SoCC 2010
Hrishikesh Amur, James Cipar, Varun Gupta, Gregory R. Ganger, Michael A. Kozuch, Karsten Schwan
@inproceedings{Amur:2010:RFP:1807128.1807164,
 author = {Amur, Hrishikesh and Cipar, James and Gupta, Varun and
 Ganger, Gregory R. and Kozuch, Michael A. and Schwan, Karsten},
 title = {Robust and flexible power-proportional storage},
 booktitle = {Proceedings of the 1st ACM symposium on Cloud computing},
 series = {SoCC '10},
 year = {2010},
 isbn = {978-1-4503-0036-0},
 location = {Indianapolis, Indiana, USA},
 pages = {217--228},
 numpages = {12},
 url = {http://doi.acm.org/10.1145/1807128.1807164},
 doi = {10.1145/1807128.1807164},
 acmid = {1807164},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {cluster computing, data-layout, power-proportionality},
} 
    
Introduces a distributed file system (Rabbit) that can elastically scale performance and power consumption to meet a wide range of performance targets with little to no loss of efficiency.
Tashi: Location-Aware Cluster Management
ACDC 2009
Michael A. Kozuch, Michael P. Ryan, Richard Gass, Steven W. Schlosser, David O'Hallaron, James Cipar, Elie Krevat, Julio López, Michael Stroucken, Gregory R. Ganger
@inproceedings{Kozuch:2009:TLC:1555271.1555282,
 author = {Kozuch, Michael A. and Ryan, Michael P. and Gass, Richard 
 and Schlosser, Steven W. and O'Hallaron, David and Cipar, James 
 and Krevat, Elie and L\'{o}pez, Julio and Stroucken, Michael and Ganger, Gregory R.},
 title = {{T}ashi: location-aware cluster management},
 booktitle = {Proceedings of the 1st workshop on Automated control for datacenters and clouds},
 series = {ACDC '09},
 year = {2009},
 isbn = {978-1-60558-585-7},
 location = {Barcelona, Spain},
 pages = {43--48},
 numpages = {6},
 url = {http://doi.acm.org/10.1145/1555271.1555282},
 doi = {10.1145/1555271.1555282},
 acmid = {1555282},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {cloud computing, cluster management, virtualization},
} 
    
Presents Tashi, a cluster management system designed for enabling cloud computing applications to operate on repositories of Big Data. A key technique is location-awareness: frameworks and applications should be aware of the relative distances (network-wise) between resources.
TFS: A Transparent File System for Contributory Storage
FAST 2007
James Cipar, Mark D. Corner, Emery D. Berger
Presented at conference
Best paper award
@inproceedings{Cipar:2007:TTF:1267903.1267931,
 author = {Cipar, James and Corner, Mark D. and Berger, Emery D.},
 title = {{TFS}: a transparent file system for contributory storage},
 booktitle = {Proceedings of the 5th USENIX conference on File and Storage Technologies},
 series = {FAST '07},
 year = {2007},
 location = {San Jose, CA},
 pages = {28--28},
 numpages = {1},
 url = {http://dl.acm.org/citation.cfm?id=1267903.1267931},
 acmid = {1267931},
 publisher = {USENIX Association},
 address = {Berkeley, CA, USA},
} 
    
A file system that allows background applications to use all unused disk space, without impacting ordinary applications in terms of available space or file system fragmentation.
Transparent Contribution of Memory
USENIX ATC 2006
James Cipar, Mark D. Corner, Emery D. Berger
Presented at conference
@inproceedings{Cipar:2006:TCM:1267359.1267370,
 author = {Cipar, James and Corner, Mark D. and Berger, Emery D.},
 title = {Transparent contribution of memory},
 booktitle = {Proceedings of the annual conference on USENIX '06 Annual Technical Conference},
 series = {ATEC '06},
 year = {2006},
 location = {Boston, MA},
 pages = {11--11},
 numpages = {1},
 url = {http://dl.acm.org/citation.cfm?id=1267359.1267370},
 acmid = {1267370},
 publisher = {USENIX Association},
 address = {Berkeley, CA, USA},
} 
    
Prioritizes physical memory between foreground and background tasks. Accurately and efficiently determines the memory needs of foreground applications so they are not impacted by background activity.

Talks

Solving the Straggler Problem with Bounded Staleness
HotOS 2013
The title of Slide 24 ("Total convergence time") is incorrect. The results in the figure are the time to execute a fixed number of iterations (50).
LazyBase: Trading Freshness for Performance in a Scalable Database
EuroSys 2012
TFS: A Transparent File System for Contributory Storage
FAST 2007
Transparent Contribution of Memory
USENIX ATC 2006