doc: doc's our cluster

DOC is the compute cluster and data server for the Medical Robotics Technology Center at the Robotics Institute at Carnegie Mellon University.

  1. Why DOC?
  2. What makes up DOC?
Don't miss the photos of doc.

Condor Queueing System

Condor is software for harnessing the power of numerous machines while minimizing the need for the user to worry about where a given job runs. While you can run a binary directly on Condor using the "vanilla universe", it is far more equitable, efficient, and reliable to run your jobs under the "standard universe". The standard universe allows jobs to checkpoint, that is, to save the in-memory state of the running program so it can be resumed later. This is extremely useful in case of a power outage or maintenance, or for pausing longer-running jobs that would otherwise prevent shorter multi-processor jobs from running. In addition, because the cluster nodes of doc are hidden behind a firewall, vanilla jobs are limited to the cluster nodes, whereas standard jobs can also use the idle processing time of workstations that are part of our virtual cluster, or Condor pool. The Condor pool includes the cluster nodes, and more! Running a job under the standard universe only requires relinking your executable using "condor_compile".
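As a rough illustration of the relink-and-submit steps described above, the following sketch writes a minimal submit description file for a standard universe job. It assumes a Condor installation that provides the standard universe and "condor_compile". The names myprog, myprog.c, and myprog.submit are hypothetical placeholders, and the Condor commands themselves are left as comments to be run by hand on a machine where Condor is installed.

```shell
#!/bin/sh
# Sketch only: myprog, myprog.c and myprog.submit are hypothetical names.

# 1. Relink the program with condor_compile so that it can checkpoint.
#    Run by hand where Condor and your compiler are available:
#      condor_compile gcc -o myprog myprog.c

# 2. Write a submit description file that asks for the standard universe.
cat > myprog.submit <<'EOF'
universe   = standard
executable = myprog
output     = myprog.out
error      = myprog.err
log        = myprog.log
queue
EOF

# 3. Hand the job to Condor and check on it:
#      condor_submit myprog.submit
#      condor_q
```

The only difference from a vanilla submission in this sketch is the relinked executable and the "universe = standard" line; the rest of the file is the same.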

Golden rule: If possible, please submit your job using the standard universe!

MPI-based multi-processor jobs and Java jobs also have their own "universes". Matlab jobs can be run by submitting a shell script as the executable (do not make "matlab" itself the executable).
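One possible way to set up a Matlab job along these lines is sketched below. It creates a small wrapper script and a submit description file that names the wrapper, not "matlab", as the executable. The file names run_matlab.sh, myscript, and matlab_job.submit are hypothetical, and the choice of the vanilla universe is an assumption made here because the Matlab binary cannot be relinked with "condor_compile".

```shell
#!/bin/sh
# Sketch only: run_matlab.sh, myscript and matlab_job.submit are
# hypothetical names chosen for illustration.

# 1. Wrapper script: starts Matlab without a display, runs myscript.m
#    from the current directory, then exits so the job can finish.
cat > run_matlab.sh <<'EOF'
#!/bin/sh
exec matlab -nodisplay -nosplash -r "myscript; exit"
EOF
chmod +x run_matlab.sh

# 2. Submit description file: the wrapper is the executable.
#    The vanilla universe is an assumption, not taken from the page above.
cat > matlab_job.submit <<'EOF'
universe   = vanilla
executable = run_matlab.sh
output     = matlab_job.out
error      = matlab_job.err
log        = matlab_job.log
queue
EOF

# 3. Submit by hand where Condor is installed:
#      condor_submit matlab_job.submit
```

Ending the "-r" command with "exit" matters in this sketch: without it Matlab would wait at its prompt and the job would never complete.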

For an overview of Condor, take a look at this tutorial.

Funding

This material is based upon work supported by the National Science Foundation under Grant No. 0305719. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.