Final Project Information
Methods In Medical Image Analysis (BioE 2630 : 16-725) - Spring 2024

Creative Commons License Final Project Information by John Galeotti, ©2012-2024 Carnegie Mellon University, is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available by sending e-mail to itk ATgaleotti.net. This assignment was made possible in part by NIH NLM contract# HHSN276201000580P.

40% of final grade (25% code & 15% presentation)

Due Date: Tentative project proposals should be emailed to your instructor by Wednesday March 20 (sooner is suggested). Final presentation slides (and video if presenting remotely) must be submitted electronically by 1pm on April 23rd, and no further updates to presentation slides will be allowed. Final code must be uploaded as a zip file in Canvas by 11:59 PM Eastern Time Friday April 26. Large and unexpected problems can show up at any time, so plan to finish early!

E-mail your TA or instructor with questions or problems.

Requirements and Expectations

The requirements and expectations for your final project were covered in class and are available for download as a pdf handout of the lecture slides.

Tentative Project Proposal: You must email your professor a tentative proposal for your project's topic. You should write about half a page of text (less if using an instructor-suggested topic), plus include pictures, etc. Your proposal must indicate:

Presentation: Your presentation slides should be in PowerPoint format (pdf slides are also acceptable, but may be more difficult for you to present). Your presentation should be submitted with a filename based on your own name, as follows (with the appropriate extension, either .ppt, .pptx, or .pdf): presentation_{Your_last_name}_{Your_first_name}.{ppt} For example, presentation_Galeotti_John.pptx

Project Topics and Data Sets

If you have no idea what to do, consider one of the common topics in this short list of suggestions.   

The best topic for your project is one on which you are already working. If this class can help you with your thesis work or lab project, then it is usually best to use some component of your larger research agenda as your final project. Benefits typically include your own increased motivation, having an already established research team you can go to for help and guidance, and already having good data with which to work.

If, however, you desire or need a new/different topic for your class project, then one good way to start is by browsing publicly available data sets, looking for things that you could reasonably segment/register/analyze/etc. There are numerous online repositories of biomedical images, including such diverse images as simulated brain MRI scans, multimodal patient studies, pathology slides, and confocal microscopy image series. When choosing a data set on which to base your project, keep in mind that if you plan on doing substantial validation or shape analysis, then you will probably need access to expert/"ground-truth" data in addition to the original images. (Ground truth data may consist of segmentations/masks/labelmaps, object coordinates, registration transforms, or whatever is appropriate for your project.) Because most publicly available datasets do not include ground truth, you will have much less data to choose from. Simulated/synthetic data is a notable good exception, since simulated data is almost always generated for the express purpose of having exact ground-truth available.

Current students can take a look at a few representative presentations from past years, but must read the associated ReadMe.txt file for important details and restrictions.

Grayscale Data Conversion

Warning: Some public data sets unfortunately store grayscale images using RGB-format images. This will slow down image processing, and worse, it can cause problems because some image filters require scalar pixel types (these filters will complain about vector-valued inputs if you have RGB-format images). If your grayscale images are stored in png/jpeg/bmp/etc. image formats, make sure they are single-channel (i.e. scalar) grayscale images, not RGB-format images that just look gray. If you unfortunately do have RGB-format images, I suggest downloading and installing ImageMagick (most Linux distributions will already have it installed), and then running this conversion command for each RGB-format image:

convert {name_of_RGB_input_image.png} -set colorspace Gray -separate -average {name_of_grayscale_output_image.png}
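
For intuition, the channel averaging requested by the command above can be sketched in plain Python. This is only an illustrative sketch, not a replacement for ImageMagick: the function names looks_gray and rgb_to_scalar are hypothetical, and the image is assumed to be already loaded as rows of (R, G, B) integer tuples.

```python
def looks_gray(rgb_rows):
    """Return True if every pixel has identical R, G, and B values."""
    return all(r == g == b for row in rgb_rows for (r, g, b) in row)


def rgb_to_scalar(rgb_rows):
    """Average the three channels of each pixel into one scalar value."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_rows]


# A tiny 2x2 "RGB" image that only looks gray (hypothetical data).
image = [[(0, 0, 0), (128, 128, 128)],
         [(200, 200, 200), (255, 255, 255)]]

if looks_gray(image):
    scalar_image = rgb_to_scalar(image)
    print(scalar_image)
```

A check like looks_gray is worth running before converting, since averaging the channels of a genuinely colored image would silently discard information.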

SimpleITK for Microscopy

For those working with microscopy images, here is a SimpleITK Notebook analyzing Scanning Electron Microscopy images of bacteria.

There is also an example using SimpleITK for fluorescent microscopy, unfortunately using "R" instead of Python, but the methods and filters are the same. Just scroll down to "Cell segmentation and splitting" starting on page 23.

About Validation

Not all projects will have a validation component, but for those that do, please read this section carefully.

By "validation," I mean checking one of the following:

By "substantial validation" I mean carefully comparing validation data with the results of one or more algorithms, using one or more parameter sets. As an example, if you have a favorite segmentation algorithm that has 3 parameters, each of which you want to test with 4 different values, and you have 4 test images, then you would test your segmentation algorithm 256 different ways (3 different parameters, with 4 possible values for each, and 4 images = 4^3*4 = 256 possibilities). For each of the 256 tests, you would then compare the segmentation result against the "correct" segmentation for the specific image that you used. The comparison needs to produce a numeric score, and the comparison would almost certainly have to be automated. A naive and simple comparison would be to automatically count the number of pixels that overlap between your segmentation and the "correct" one, and then divide by the total number of pixels in the "correct" segmentation. Instead, I recommend using something more intelligent, such as the Dice comparison/similarity metric.
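
The counting argument and the Dice score can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed implementation: the parameter values and image names are placeholders, masks are assumed to be flat lists of 0/1 values, and the function name dice is hypothetical. The score computed is the usual Dice coefficient, 2|A∩B| / (|A| + |B|).

```python
from itertools import product


def dice(segmentation, reference):
    """Dice similarity between two binary masks given as flat 0/1 lists."""
    if len(segmentation) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for s, r in zip(segmentation, reference) if s and r)
    total = sum(1 for s in segmentation if s) + sum(1 for r in reference if r)
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical parameter values and image names, used here only for counting.
param_a = [1, 2, 3, 4]
param_b = [0.1, 0.2, 0.3, 0.4]
param_c = [5, 10, 15, 20]
images = ["img1", "img2", "img3", "img4"]

# Every combination of 3 parameters (4 values each) and 4 images.
tests = list(product(param_a, param_b, param_c, images))
print(len(tests))  # 4 * 4 * 4 * 4 = 256

# Toy masks: 3 foreground pixels each, 2 of them shared.
result = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
print(dice(result, truth))
```

Unlike the naive overlap ratio, Dice also penalizes over-segmentation: a result that marks every pixel as foreground scores 1.0 on the naive measure but well below 1.0 on Dice.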

Public Data Sets (tell your instructor if any of these links no longer works)

Radiopaedia has a large collection of radiology cases with images, but even if an article has a full CT volume, you have to download it one 2D slice at a time and then reassemble it into a volume on your computer (ITK's image series reader is great for cases like this).
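
Reassembling downloaded slices might look like the following sketch, which uses SimpleITK's ImageSeriesReader. The file names are hypothetical, and the helper names natural_key and read_volume are made up for illustration. The sorting helper matters because a plain alphabetical sort would put "slice10" before "slice2" and scramble the volume.

```python
import re


def natural_key(filename):
    """Split a name into text and integer parts so 'slice10' sorts after 'slice2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


def read_volume(slice_filenames):
    """Stack 2D slice files into one 3D image (requires SimpleITK)."""
    import SimpleITK as sitk  # imported lazily so the sorting helper works without it

    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(sorted(slice_filenames, key=natural_key))
    return reader.Execute()


# Hypothetical file names, as they might come back from a directory listing.
names = ["slice10.png", "slice2.png", "slice1.png"]
print(sorted(names, key=natural_key))
```

Ordinary image formats such as png carry no slice-spacing information, so the physical spacing of the resulting volume will probably need to be set by hand (for example with the image's SetSpacing method) if the source article reports it.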

The Osirix Project maintains another really diverse set of easily browsable medical images in their DICOM Sample Image Sets (DICOM is a standard file/directory format for medical images). Note that most of these don't have validation data, but they are very diverse and can often lead to interesting projects without requiring advanced medical knowledge.

Magnetic Resonance - Technology Information Portal is a great source of MRI images from various anatomic regions.

BrainWeb is a great source of simulated brain MRI data, which contains exact reference segmentations for a variety of brain structures. Unfortunately, you will have to deal with files that are either "raw" or in the MINC format. ("Normal" ITK does not support MINC due to MINC's non-ITK-compatible LGPL license, but you may be able to read them more easily than raw files with either a VTK reader, or by recompiling ITK after downloading the MINC2 library and enabling the advanced CMake option ITK->ITK_USE_MINC2.)

Other Brain Databases: The Kennedy Krieger Institute's F. M. Kirby Research Center for Functional Brain Imaging has DTI brain images but no validation data that I am aware of; they are also involved in some way with the following two databases. The Center for Imaging Science at Johns Hopkins University allows you to register for access to several brain MRI images, each of which is expertly segmented according to either anatomy or function (functional areas are unfortunately not easily segmented without registering to a pre-segmented atlas, so I suggest using anatomical datasets like the Hippocampus instead). NITRC hosts a couple of giant multi-modal brain databases (briefly described here): BIRN and Kirby 21, neither of which contains any validation data that I could find. Harvard's The Whole Brain Atlas contains a variety of healthy and diseased brain MRI scans, but they are relatively low resolution and you may have to individually, manually download each slice of your desired volume from their website. Finally, the Biomedical Informatics Research Network hosts the Mouse Diffusion Tensor Imaging (DTI) Atlas of developing mouse brains.

ELCAP's Public Lung Image Database contains 50 low-dose CT scans of lungs during a breath hold, and it includes the locations of nodules as found by radiologists. It does, however, require you to register for access to the data.

Lung Nodule Analysis 2016 Grand Challenge: Details, Dataset

ISBI 2018 Lung Nodule Malignancy Prediction Challenge, hosted at the National Lung Screening Trial (NLST), requires application and approval.

LTRC's Diffuse Lung Disease CT Database (temporarily offline while changing to a new server) contains many examples of Chronic Obstructive Pulmonary Disease (COPD) and idiopathic pulmonary fibrosis (IPF). The Lung Tissue Research Consortium (LTRC) is an NHLBI sponsored project with a large public repository of histological, radiological, and clinical data. Its goal is to have comprehensive data for the vast majority of enrollees, including a volumetric high-resolution CT of the chest, extensive clinical history and questionnaire results, pulmonary function testing, genetic and laboratory testing, stored serum, blood and lung tissue. The pathological specimens and CT scans will each have a corresponding structured semi-quantitative report and coded diagnostic assessment. The CT scan reports will include subjective assessment of the regional distribution of specific named radiologic signs and visual characteristics. Requesting particular sets of data or tissue specimens requires a brief application and a review of the proposed experimental protocol by the LTRC Protocol Review Committee, and the application material can be found at www.ltrcpublic.com/forms.

Andreopoulos & Tsotsos' Cardiac MRI Dataset contains cardiac MR images with expert segmentations of the left ventricle's muscle-containing wall (endocardial=inside, touching blood, and epicardial=outside, against the pericardium sac). Unfortunately, you will have to deal with files that are in MATLAB .mat format (which you could use MATLAB to convert to, e.g., DICOM image files that are readable by ITK). They request that you cite their paper if you publish anything using their data.

Liver Tumor Segmentation 08: Siemens Corporate Technology's Center for Medical Imaging Validation hosted a 2008 competition to evaluate different 3D liver-tumor segmentation methods. Their training data contains both abdominal CT scans and reference segmentations for each tumor. Note that use of this data comes with several major stipulations, some of which may be negotiable if you contact the organizers since the competition is now presumably over.

PathoPic is a very extensive pathology image database, which can be searched and used for unpublished educational purposes. See their Guided Tour.

Ultrasound video of anything other than babies can be hard to come by. You could try Google searching for particular types of ultrasound, such as "vascular ultrasound video", which led to this YouTube video containing many video sequences, as well as other relevant videos in the side bar. You could also try Google searching for "abdominal ultrasound video." You can also try searching for ultrasound videos on Radiopaedia. See also the items below.

Prostate MRI & Ultrasound Volumes are available from SPL & NCIGT. You can download the images here. Annotations are also available. See details on the website.

Thyroid ultrasound database.