Final Project

Final Project Information
Methods In Medical Image Analysis (BioE 2630 : 16-725) - Spring 2026

Final Project Information by John Galeotti, ©2012-2026 Carnegie Mellon University, is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available by sending e-mail to itk ATgaleotti.net. This assignment was made possible in part by NIH NLM contract# HHSN276201000580P.

40% of final grade (25% code & 15% presentation)

Due Dates: As posted in CANVAS. Tentative project proposals should be emailed to your instructor by Thursday March 19 (sooner is suggested). Final presentation slides (and video if presenting remotely) must be submitted electronically according to the due dates in CANVAS.

Large and unexpected problems can show up at any time, so plan to finish early!

E-mail your TA or instructor with questions or problems.

Requirements and Expectations

The requirements and expectations for your final project were gone over in class and are available for download as a pdf handout of the lecture slides.

Tentative Project Proposal: You must email your professor a tentative proposal for your project's topic. You should write about half a page of text (less if using an instructor-suggested topic), plus include pictures, etc. Your proposal must indicate:

What images will you be working with? Include a representative 2D image. (For 3D volumes, consider opening your volume in ITK-SNAP and then taking a screenshot.)
What general problem are you trying to solve? Include an example solution picture, if possible. If you're implementing a published algorithm, then include (or link to) the relevant paper.
How will your project be creative and/or experimental (see slide 2 of the above pdf handout)?
How do you plan on trying to solve this problem?

Presentation: Your presentation slides should be in PowerPoint format (pdf slides are also acceptable, but may be more difficult for you to present). You should assume that the presenting laptop will NOT have internet access during the presentations. If you create your presentation in Google Slides, I suggest starting out in PPTX format, and you may need to save any videos as separate files. Your presentation should be submitted with a filename based on your own name, as follows (with the appropriate extension, either .ppt, .pptx, or .pdf): presentation_{Your_last_name}_{Your_first_name}.{ppt} For example, presentation_Galeotti_John.pptx

Any separate videos should be saved in H264 format inside an MP4 container, with sequential naming: presentation_{Your_last_name}_{Your_first_name}_video_{number}.mp4

Project Topics and Data Sets

If you have no idea what to do, consider one of the common topics in this short list of suggestions.

The best topic for your project is one on which you are already working. If this class can help you with your thesis work or lab project, then it is usually best to use some component of your larger research agenda as your final project. Benefits typically include your own increased motivation, having an already established research team you can go to for help and guidance, and already having good data with which to work.

If, however, you desire or need a new/different topic for your class project, then one good way to start is by browsing publicly available data sets, looking for things that you could reasonably segment/register/analyze/etc. There are numerous online repositories of biomedical images, including such diverse images as simulated brain MRI scans, multimodal patient studies, pathology slides, and confocal microscopy image series. When choosing a data set on which to base your project, keep in mind that if you plan on doing substantial validation or shape analysis, then you will probably need access to expert/"ground-truth" data in addition to the original images. (Ground truth data may consist of segmentations/masks/labelmaps, object coordinates, registration transforms, or whatever is appropriate for your project.) Because most publicly available datasets do not include ground truth, you will have much less data to choose from. Simulated/synthetic data is a notable good exception, since simulated data is almost always generated for the express purpose of having exact ground-truth available.

Current students can take a look at a few representative presentations from past years, but must read the associated ReadMe.txt file for important details and restrictions.

Grayscale Data Conversion

Warning: Some public data sets unfortunately store grayscale images using RGB-format images. This will slow down image processing, and worse, it can cause problems because some image filters require scalar pixel types (these filters will complain about vector-valued inputs if you have RGB-format images). If your grayscale images are stored in png/jpeg/bmp/etc. image formats, make sure they are single-channel (i.e. scalar) grayscale images, not RGB-format images that just look gray. If you unfortunately do have RGB-format images, I suggest downloading and installing ImageMagick (most Linux distributions will already have it installed), and then running this conversion command for each RGB-format image:

convert {name_of_RGB_input_image.png} -set colorspace Gray -separate -average {name_of_grayscale_output_image.png}
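
If you have many RGB-format images, the command above can be scripted. The following is a minimal Python sketch, assuming ImageMagick's convert executable is on your PATH (newer ImageMagick releases may expect the executable name magick instead) and that all of your images sit in one directory. The function names and the "_gray" output suffix are illustrative choices, not part of ImageMagick.

```python
import subprocess
from pathlib import Path


def build_convert_command(rgb_path, gray_path):
    """Return the ImageMagick command shown above as an argument list."""
    return [
        "convert", str(rgb_path),
        "-set", "colorspace", "Gray",
        "-separate", "-average",
        str(gray_path),
    ]


def convert_directory(input_dir, output_dir, pattern="*.png"):
    """Run the conversion on every file in input_dir that matches pattern."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for rgb_path in sorted(input_dir.glob(pattern)):
        # Hypothetical naming scheme: scan01.png -> scan01_gray.png
        gray_path = output_dir / f"{rgb_path.stem}_gray{rgb_path.suffix}"
        # check=True raises CalledProcessError if convert reports a failure.
        subprocess.run(build_convert_command(rgb_path, gray_path), check=True)
```

For example, convert_directory("rgb_slices", "gray_slices") would write one grayscale copy of each PNG found in a (hypothetical) rgb_slices directory, leaving the originals untouched.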

SimpleITK for Microscopy

For those working with microscopy images, here is a SimpleITK Notebook analyzing Scanning Electron Microscopy images of bacteria.

There is also an example using SimpleITK for fluorescent microscopy, unfortunately using "R" instead of Python, but the methods and filters are the same. Just scroll down to "Cell segmentation and splitting" starting on page 23.

About Validation

Not all projects will have a validation component, but for those that do, please read this section carefully.

By "validation," I mean checking one of the following:

Did a segmentation algorithm (with a certain parameter set) *correctly* segment the (anatomic) object(s) of interest out of a test image? This requires that we know the correct segmentation, which we could get one of several ways. We can try to manually segment the image ourselves, assuming that we know what we're doing. We could also ask one or more expert radiologists to segment the image for us. Finally, if our primary purpose is validation rather than immediate patient care, then instead of trying to get the correct answer for a real patient's scan, we can instead test our segmentation algorithm on simulated data, either derived from a simulated scan (e.g., simulated MRI) of a simulated patient, or from a real scan of a physically simulated patient, called a "phantom." In the case of simulated data, we know the exact arrangement of the simulated patient's internal anatomy, because someone "created" the simulated patient, and so we know what the correct result should be.
Did a registration algorithm (with a certain parameter set) *correctly* register one image to another? (Unless you're doing deformable registration, this means checking if the final transform is correct.) We can check for correct registration in several ways, such as seeing if it looks good to us, or if it looks good to a radiologist, or (preferably) by checking our registration algorithm on data specially acquired for registration testing. Such validation data is either derived from a complex simulation (similar to simulation for validating segmentation, above), or is derived by otherwise knowing the mechanical transform applied to either a patient or to a phantom (as described above).
Did some shape analysis algorithm *correctly* measure an object? We can check for correct measurements in ways similar to how we can validate a segmentation (above).
Did some shape analysis algorithm (or possibly some other image descriptors) combined with some classifier algorithm *correctly* identify an object or *correctly* produce a diagnosis? This requires only that we know where the object(s) are, which is often something we can visually check ourselves, but it's still best to have proper validation data, derived from either expert radiologists/pathologists, or from some form of simulation (as discussed for simulation, above).
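
For the registration case, when the transform applied to a phantom is known, one possible numeric check is to compare where the known transform and the recovered transform send a set of test points. The sketch below is only an illustration under stated assumptions: it uses 2D rigid transforms, and the transform values, test points, and function names are all hypothetical. The mean point distance is one reasonable score, not the only one.

```python
import math


def rigid_2d(angle_degrees, tx, ty):
    """Return a function that rotates a 2D point about the origin, then translates it."""
    a = math.radians(angle_degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def transform(point):
        x, y = point
        return (cos_a * x - sin_a * y + tx, sin_a * x + cos_a * y + ty)

    return transform


def mean_point_error(known, recovered, points):
    """Average distance between where the two transforms send each test point."""
    return sum(math.dist(known(p), recovered(p)) for p in points) / len(points)


# Hypothetical values: the transform mechanically applied to the phantom,
# and the transform reported by the registration algorithm.
known = rigid_2d(10.0, 5.0, -3.0)
recovered = rigid_2d(10.0, 5.5, -3.0)

# Test points spread over the region of interest (image corners here).
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]

# Same rotation and a 0.5 offset in x, so the error is about 0.5 at every point.
print(mean_point_error(known, recovered, points))
```

A rotation error, unlike the pure translation error shown here, grows with distance from the center of rotation, which is why the test points should cover the anatomy you actually care about.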

By "substantial validation" I mean carefully comparing validation data with the results of one or more algorithms, using one or more parameter sets. As an example, if you have a favorite segmentation algorithm that has 3 parameters, each of which you want to test with 4 different values, and you have 4 test images, then you would test your segmentation algorithm 256 different ways (3 different parameters, with 4 possible values for each, and 4 images = 4^3*4 = 256 possibilities). For each of the 256 tests, you would then compare the segmentation result against the "correct" segmentation for the specific image that you used. The comparison needs to produce a numeric score, and the comparison would almost certainly have to be automated. A naive and simple comparison would be to automatically count the number of pixels that overlap between your segmentation and the "correct" one, and then divide by the total number of pixels in the "correct" segmentation. Instead, I recommend using something more intelligent, such as the Dice comparison/similarity metric.
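
The counting and the two comparisons described above can be sketched in a few lines of plain Python. This is a toy illustration, not a prescribed implementation: the parameter names and values, the image names, and the tiny flattened masks are all made up, and a real project would compute the same scores on full label images.

```python
from itertools import product


def naive_overlap(result, truth):
    """Overlapping object pixels divided by the object pixels in truth."""
    overlap = sum(1 for r, t in zip(result, truth) if r and t)
    return overlap / sum(1 for t in truth if t)


def dice(result, truth):
    """Dice coefficient: 2*|A and B| / (|A| + |B|) for two binary masks."""
    overlap = sum(1 for r, t in zip(result, truth) if r and t)
    total = sum(1 for r in result if r) + sum(1 for t in truth if t)
    return 2.0 * overlap / total if total else 1.0


# Hypothetical parameters: 3 parameters with 4 candidate values each.
param_values = {
    "sigma": [0.5, 1.0, 2.0, 4.0],
    "threshold": [50, 100, 150, 200],
    "iterations": [10, 20, 40, 80],
}
images = ["image_1", "image_2", "image_3", "image_4"]

# Every combination of parameter values, paired with every test image.
tests = list(product(*param_values.values(), images))
print(len(tests))  # 4**3 * 4 = 256

# Flattened toy masks (1 = object, 0 = background).
truth = [0, 1, 1, 1, 1, 0, 0, 0]
over_segmented = [1, 1, 1, 1, 1, 1, 1, 1]  # labels every pixel as object

print(naive_overlap(over_segmented, truth))  # 4 / 4 = 1.0
print(dice(over_segmented, truth))           # 2*4 / (8 + 4), about 0.667
```

The toy masks show why the naive score is a poor choice: a "segmentation" that labels every pixel as object gets a perfect naive score, while Dice penalizes the extra pixels.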

Public Data Sets (tell your instructor if any of these links no longer works)

Radiopaedia has a large collection of radiology cases with images, but even if an article has a full CT volume, you have to download it one 2D slice at a time and then reassemble it into a volume on your computer (ITK's image series reader is great for cases like this).
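
Reassembling downloaded slices can be sketched with SimpleITK's ImageSeriesReader, as below. This is a minimal example under several assumptions: the slices are image files whose names contain the slice number, SimpleITK is installed, and you know (or can estimate) the voxel spacing, since png/jpeg slices do not store it. The function names and the example file names are hypothetical.

```python
import re


def natural_sort_key(filename):
    """Sort key that orders 'slice_2.png' before 'slice_10.png'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", str(filename))]


def slices_to_volume(slice_filenames, spacing, output_filename):
    """Stack 2D slice files into one 3D volume and write it to disk."""
    # Imported here so that the sorting helper above has no dependencies.
    import SimpleITK as sitk

    ordered = [str(f) for f in sorted(slice_filenames, key=natural_sort_key)]
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(ordered)
    volume = reader.Execute()
    # The slice files carry no physical voxel size, so it must be supplied
    # as (x, y, z) spacing, typically in millimeters.
    volume.SetSpacing(spacing)
    sitk.WriteImage(volume, output_filename)
    return volume
```

A call might look like slices_to_volume(glob.glob("case_slices/*.png"), (0.7, 0.7, 5.0), "volume.nii.gz"), where the directory name and spacing values are placeholders. The natural sort matters because a plain alphabetical sort would place slice_10 before slice_2 and scramble the volume.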

The Osirix Project maintains another really diverse set of easily browsable medical images in their DICOM Sample Image Sets (DICOM is a standard file/directory format for medical images). Note that most of these don't have validation data, but they are very diverse and can often lead to interesting projects without requiring advanced medical knowledge.

Magnetic Resonance - Technology Information Portal is a great source of MRI images from various anatomic regions.

BrainWeb is a great source of simulated brain MRI data, which contains exact reference segmentations for a variety of brain structures. Unfortunately, you will have to deal with files that are either "raw" or in the MINC format. ("Normal" ITK does not support MINC due to MINC's non-ITK-compatible LGPL license, but you may be able to read them more easily than raw files with either a VTK reader, or by recompiling ITK after downloading the MINC2 library and enabling the advanced CMake option ITK->ITK_USE_MINC2.)

Other Brain Databases: The Kennedy Krieger Institute's F. M. Kirby Research Center for Functional Brain Imaging has DTI Brain images, with no validation data that I am aware of, but they are also involved in some way with the following two other databases. The Center for Imaging Science at Johns Hopkins University allows you to register for access to several brain MRI images, each of which is expertly segmented according to either anatomy or function (functional areas are unfortunately not easily segmented without registering to a pre-segmented atlas, so I suggest using anatomical datasets like the Hippocampus instead). NITRC hosts a couple of giant multi-modal brain databases (briefly described here): BIRN and Kirby 21, neither of which contains any validation data that I could find. Harvard's The Whole Brain Atlas contains a variety of healthy and diseased brain MRI scans, but they are relatively low resolution and you may have to individually, manually download each slice of your desired volume from their website. Finally, the Biomedical Informatics Research Network hosts the Mouse Diffusion Tensor Imaging (DTI) Atlas of developing mouse brains.

ELCAP's Public Lung Image Database contains 50 low-dose CT scans of lungs during a breath hold, and it includes the locations of nodules as found by radiologists. It does, however, require you to register for access to the data.

Lung Nodule Analysis 2016 Grand Challenge: Details, Dataset

ISBI 2018 Lung Nodule Malignancy Prediction Challenge, hosted at the National Lung Screening Trial (NLST), requires application and approval.

LTRC's Diffuse Lung Disease CT Database (temporarily offline while changing to a new server) contains many examples of Chronic Obstructive Pulmonary Disease (COPD) and idiopathic pulmonary fibrosis (IPF). The Lung Tissue Research Consortium (LTRC) is an NHLBI sponsored project with a large public repository of histological, radiological, and clinical data. Its goal is to have comprehensive data for the vast majority of enrollees, including a volumetric high-resolution CT of the chest, extensive clinical history and questionnaire results, pulmonary function testing, genetic and laboratory testing, stored serum, blood and lung tissue. The pathological specimens and CT scans will each have a corresponding structured semi-quantitative report and coded diagnostic assessment. The CT scan reports will include subjective assessment of the regional distribution of specific named radiologic signs and visual characteristics. Requesting particular sets of data or tissue specimens requires a brief application and review of the proposed experimental protocol by the LTRC Protocol Review Committee. Here is the online application form.

Andreopoulos & Tsotsos' Cardiac MRI Dataset contains cardiac MR images with expert segmentations of the left ventricle's muscle-containing wall (endocardial=inside, touching blood, and epicardial=outside, against the pericardium sac). Unfortunately, you will have to deal with files that are in MATLAB .mat format (which you could use MATLAB to convert to, e.g., DICOM image files that are readable by ITK). They request that you cite their paper if you publish anything using their data.

PathoPic is a very extensive pathology image database, which can be searched and used for unpublished educational purposes. See their Guided Tour.

Ultrasound video of anything other than babies can be hard to come by. You could try Google searching for particular types of ultrasound, such as "vascular ultrasound video", which led to this YouTube video containing many video sequences, as well as other relevant videos in the side bar. You could also try Google searching for "abdominal ultrasound video." You can also try searching for ultrasound videos on Radiopaedia. See also the items below.

Prostate MRI & Ultrasound Volumes are available from SPL & NCIGT. You can download the images here. Annotations are also available. See details on the website.

Thyroid ultrasound database.
