Instructor: John Galeotti: galeotti+miia AT cs.cmu.edu
TA: Tejas Sudharshan Mathai: tmathai+miia AT andrew.cmu.edu
Due Date: Tentative project proposals must be emailed to your instructor by the night of March 18th. Presentation Slides must be checked into svn by 10 AM on Tuesday, April 22. Final code must be checked into svn by midnight (~11:59 PM EST) on Thursday night, April 24. Large and unexpected problems can show up at any time, so plan to finish early!
E-mail your TA or instructor with questions or problems.
The requirements and expectations for your final project were covered in class and are available for download as a pdf handout of the lecture slides.
Tentative Project Proposal: You must email your professor a tentative proposal for your project's topic. You should write about half a page of text, plus include pictures, etc. Your proposal must indicate:
The best topic for your project is one on which you are already working. If this class can help you with your thesis work or lab project, then it is usually best to use some component of your larger research agenda as your final project. Benefits typically include your own increased motivation, having an already established research team you can go to for help and guidance, and already having good data with which to work.
If, however, you desire or need a new/different topic for your class project, then one good way to start is by browsing publicly available data sets, looking for things that you could reasonably segment/register/analyze/etc. There are numerous online repositories of biomedical images, including such diverse images as simulated brain MRI scans, multimodal patient studies, pathology slides, and confocal microscopy image series. When choosing a data set on which to base your project, keep in mind that if you plan on doing substantial validation or shape analysis, then you will probably need access to expert/"ground-truth" data in addition to the original images. (Ground-truth data may consist of segmentations/masks/labelmaps, object coordinates, registration transforms, or whatever is appropriate for your project.) Because most publicly available datasets do not include ground truth, requiring it leaves you with much less data to choose from. Simulated/synthetic data is a notable exception, since simulated data is almost always generated for the express purpose of having exact ground truth available.
Not all projects will have a validation component, but for those that do, please read this section carefully.
By "validation," I mean checking one of the following:
By "substantial validation" I mean carefully comparing validation data with the results of one or more algorithms, using one or more parameter sets. As an example, if you have a favorite segmentation algorithm that has 3 parameters, each of which you want to test with 4 different values, and you have 4 test images, then you would test your segmentation algorithm 256 different ways (3 parameters with 4 possible values each give 4^3 = 64 parameter combinations, and 64 combinations * 4 images = 256 possibilities). For each of the 256 tests, you would then compare the segmentation result against the "correct" segmentation for the specific image that you used. The comparison needs to produce a numeric score, and the comparison would almost certainly have to be automated. A naive and simple comparison would be to automatically count the number of pixels that overlap between your segmentation and the "correct" one, and then divide by the total number of pixels in the "correct" segmentation. That score does not penalize a segmentation that is too large, so I recommend using something more intelligent instead, such as the Dice comparison/similarity metric.
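The counting and scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names, parameter values, and image names are hypothetical placeholders, and masks are represented as flat lists of 0/1 rather than as ITK images. The Dice score used here is the standard definition, 2 * |A and B| / (|A| + |B|).

```python
from itertools import product


def naive_overlap(seg, ref):
    """Overlapping pixels divided by the pixel count of the reference mask."""
    ref_count = sum(1 for r in ref if r)
    if ref_count == 0:
        raise ValueError("reference mask is empty")
    overlap = sum(1 for s, r in zip(seg, ref) if s and r)
    return overlap / ref_count


def dice(seg, ref):
    """Dice similarity: 2 * |seg AND ref| / (|seg| + |ref|)."""
    if len(seg) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for s, r in zip(seg, ref) if s and r)
    total = sum(1 for s in seg if s) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical parameters: 3 parameters with 4 candidate values each.
param_values = {
    "threshold": [0.2, 0.4, 0.6, 0.8],
    "smoothing": [1, 2, 3, 4],
    "iterations": [5, 10, 20, 40],
}
# Hypothetical names for the 4 test images.
images = ["image0", "image1", "image2", "image3"]

# One test per (image, parameter combination) pair.
tests = [
    (image, dict(zip(param_values, combo)))
    for image in images
    for combo in product(*param_values.values())
]
print(len(tests))  # 64 parameter combinations * 4 images

# A tiny example of why the naive score is misleading:
# the segmentation below marks every pixel as foreground.
ref = [0, 1, 1, 1, 0, 0]
over_segmented = [1, 1, 1, 1, 1, 1]
print(naive_overlap(over_segmented, ref))
print(dice(over_segmented, ref))
```

In this toy case the naive score rates the over-segmentation as perfect because every reference pixel is covered, while the Dice score drops because the segmentation is twice as large as the reference. In a real project each entry of `tests` would drive one run of your segmentation algorithm, and the resulting mask would be scored against that image's ground truth.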
Insight Journal's Midas Repository is one of the largest and most diverse sets of online biomedical images. (Insight Journal is an open-access journal that is closely coupled with ITK, and often serves as the means of submitting new code and/or data to the ITK community.) Presently, the two largest "Communities" available there are for the National Alliance for Medical Image Computing (NAMIC) and National Library of Medicine (NLM)'s Imaging Methods Assessment and Reporting. Data sets which include some form of validation/testing data include:
The Osirix Project maintains another really diverse set of easily browsable medical images in their DICOM Sample Image Sets (DICOM is a standard file/directory format for medical images). Note that most of these don't have validation data, but they are very diverse and can often lead to interesting projects without requiring advanced medical knowledge.
BrainWeb is a great source of simulated brain MRI data, which contains exact reference segmentations for a variety of brain structures. Unfortunately, you will have to deal with files that are either "raw" or in the MINC format. ("Normal" ITK does not support MINC due to MINC's non-ITK-compatible LGPL license, but you may be able to read them more easily than raw files with either a VTK reader, or by recompiling ITK after downloading the MINC2 library and enabling the advanced CMake option ITK->ITK_USE_MINC2.)
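The reason "raw" files are awkward is that they carry no header: the reader must be told the volume dimensions and pixel type. The sketch below shows the idea with the Python standard library only. It is not BrainWeb-specific; the dimensions, the 8-bit unsigned pixel type, and the x-fastest voxel ordering are all assumptions that you would need to replace with the values documented for the data you download.

```python
import array
import os
import tempfile


def read_raw_volume(path, nx, ny, nz, typecode="B"):
    """Read a headerless volume into a flat array (typecode "B" = unsigned 8-bit)."""
    voxels = array.array(typecode)
    with open(path, "rb") as f:
        # Raises EOFError if the file holds fewer voxels than stated.
        voxels.fromfile(f, nx * ny * nz)
        if f.read(1):
            raise ValueError("file is larger than the stated dimensions")
    return voxels


def voxel_at(voxels, nx, ny, x, y, z):
    """Look up one voxel, assuming x varies fastest, then y, then z."""
    return voxels[x + nx * (y + ny * z)]


# Demonstration with a tiny synthetic volume written to a temporary file.
nx, ny, nz = 4, 3, 2  # hypothetical dimensions
demo = array.array("B", range(nx * ny * nz))
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    tmp.write(demo.tobytes())
    raw_path = tmp.name

volume = read_raw_volume(raw_path, nx, ny, nz)
sample = voxel_at(volume, nx, ny, 1, 2, 1)
os.remove(raw_path)
print(len(volume), sample)
```

For pixel types wider than one byte you would also have to know the file's byte order, since `array` reads values in the machine's native order.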
Other Brain Databases: The Kennedy Krieger Institute's F. M. Kirby Research Center for Functional Brain Imaging has DTI brain images, but no validation data that I am aware of. They are also involved in some way with the following two databases. The Center for Imaging Science at Johns Hopkins University allows you to register for access to several brain MRI images, each of which is expertly segmented according to either anatomy or function (functional areas are unfortunately not easily segmented without registering to a pre-segmented atlas, so I suggest using anatomical datasets like the Hippocampus instead). NITRC hosts a couple of giant multi-modal brain databases (briefly described here): BIRN and Kirby 21, neither of which contains any validation data that I could find. Harvard's The Whole Brain Atlas contains a variety of healthy and diseased brain MRI scans, but they are relatively low resolution and you may have to manually download each slice of your desired volume individually from their website. Finally, the Biomedical Informatics Research Network hosts the Mouse Diffusion Tensor Imaging (DTI) Atlas of developing mouse brains.
ELCAP's Public Lung Image Database contains 50 low-dose CT scans of lungs during a breath hold, and it includes the locations of nodules as found by radiologists. It does, however, require you to register for access to the data.
LTRC's Diffuse Lung Disease CT Database contains many examples of Chronic Obstructive Pulmonary Disease (COPD) and idiopathic pulmonary fibrosis (IPF). The Lung Tissue Research Consortium (LTRC) is an NHLBI-sponsored project with a large public repository of histological, radiological, and clinical data. Its goal is to have comprehensive data for the vast majority of enrollees, including a volumetric high-resolution CT of the chest, extensive clinical history and questionnaire results, pulmonary function testing, genetic and laboratory testing, and stored serum, blood, and lung tissue. The pathological specimens and CT scans will each have a corresponding structured semi-quantitative report and coded diagnostic assessment. The CT scan reports will include subjective assessment of the regional distribution of specific named radiologic signs and visual characteristics. Requesting particular sets of data or tissue specimens requires a brief application and a review of the proposed experimental protocol by the LTRC Protocol Review Committee; the application material can be found at www.ltrcpublic.com/forms.
Andreopoulos & Tsotsos' Cardiac MRI Dataset contains cardiac MR images with expert segmentations of the left ventricle's muscle-containing wall (endocardial=inside, touching blood, and epicardial=outside, against the pericardium sac). Unfortunately, you will have to deal with files that are in MATLAB .mat format (which you could use MATLAB to convert to, e.g., DICOM image files that are readable by ITK). They request that you cite their paper if you publish anything using their data.
Liver Tumor Segmentation 08: Siemens Corporate Technology's Center for Medical Imaging Validation hosted a 2008 competition to evaluate different 3D liver-tumor segmentation methods. Their training data contains both abdominal CT scans and reference segmentations for each tumor. Note that use of this data comes with several major stipulations, some of which may be negotiable if you contact the organizers, since the competition is now presumably over.
PathoPic is a very extensive pathology image database, which can be searched and used for unpublished educational purposes. See their Guided Tour.