15-110 FALL 2009 [CORTINA]

LAB 5

A DNA sequence can be represented as a string containing the letters A, C, T, and G in some order. Computational biologists develop algorithms to analyze these DNA sequences for various patterns to learn more about the underlying DNA and its properties. Here is a random 60-letter DNA sequence:

CTAATGGCATTATAGGGTGCCGGCGTGTCGGATATTCGGGGGATAGTGCTTAGTGAGCAT
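
As a small illustration of this representation, the sketch below stores the sequence above in a Java String and inspects it with length() and charAt(). The class name DNAExample and the helper isValidDNA are hypothetical names chosen for this example; they are not part of the lab project.

```java
public class DNAExample {
    // Returns true if every character of s is one of A, C, G, or T.
    // (Hypothetical helper, used here only to show a loop over a string.)
    public static boolean isValidDNA(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != 'A' && c != 'C' && c != 'G' && c != 'T') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String dna = "CTAATGGCATTATAGGGTGCCGGCGTGTCGGATATTCGGGGGATAGTGCTTAGTGAGCAT";
        System.out.println(dna.length());      // prints 60
        System.out.println(dna.charAt(0));     // prints C
        System.out.println(isValidDNA(dna));   // prints true
    }
}
```

The same pattern, a for loop that visits each index from 0 to length() - 1 and reads one letter with charAt(), is the basic tool for processing a sequence one letter at a time.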

In this lab, you will write two static methods that perform some basic analysis of DNA sequences.

EXERCISES

SET-UP: Download the project DNAComputer.zip and unzip it if it does not unzip automatically. Move the resulting folder into your Eclipse workspace and open the project in Eclipse. To do this on the Linux machines, start a new Java project, select "Create a new project in workspace", and enter the project name DNAComputer. You should see a warning at the bottom of the dialog box that says "The wizard will automatically configure the JRE and the project layout based on the existing source." This is OK: it means that Eclipse found a folder that already contains source code, so it will start the project with that code.

  1. This program starts by generating a random DNA sequence for you using a static method makeRandomDNA. Compile and run the program and look at the first lines of output to see the random sequence that is generated. Run it a few times to see how this sequence changes each time. Make sure you understand how makeRandomDNA works before you move on. (Discuss this briefly with your neighbor.)

  2. The complement of a DNA sequence is a new sequence formed from the original by replacing each 'A' with a 'T', each 'T' with an 'A', each 'C' with a 'G', and each 'G' with a 'C'. For example, the complement of CTAATG is GATTAC. Complete the static method complement, which takes a parameter representing the original DNA sequence and returns a new string containing that sequence's complement. Use a for loop in your answer. Your answer should work for a sequence of ANY length. Before you start working on this problem, be sure to remove the return statement that is already in the method; it is there only so that the program compiles until you write the correct code. Test your program again to see if the complement output is correct.

  3. The next method, count, should count the number of occurrences of a subsequence in the original DNA sequence. The method takes two parameters: one for the original sequence and one for the subsequence that we're looking for. It should return an integer giving the number of occurrences of the subsequence in the sequence. Complete this method using a for loop. Your answer should work for a sequence and subsequence of ANY length. Before you start working on this problem, be sure to remove the return statement that is already in the method. Test your program again to see if the output gives you correct answers when this method is called from the main method.
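
For comparison once you have attempted the exercises, here is a minimal sketch of what the three methods described above might look like. It is not the code in DNAComputer.zip. The class name DNASketch, the parameter names, and the signature makeRandomDNA(int length) are assumptions, and the project's own method headers may differ. The sketch also assumes that overlapping occurrences are counted separately, that an empty subsequence has zero occurrences, and that any letter other than A, C, G, or T is copied unchanged by complement.

```java
import java.util.Random;

public class DNASketch {
    private static final String BASES = "ACGT";

    // Builds a random sequence by picking one of the four letters
    // for each position. (Assumed signature; the lab's version may differ.)
    public static String makeRandomDNA(int length) {
        Random rand = new Random();
        String dna = "";
        for (int i = 0; i < length; i++) {
            dna += BASES.charAt(rand.nextInt(BASES.length()));
        }
        return dna;
    }

    // Returns the complement: A <-> T and C <-> G, letter by letter.
    public static String complement(String dna) {
        String result = "";
        for (int i = 0; i < dna.length(); i++) {
            char c = dna.charAt(i);
            if (c == 'A') {
                result += 'T';
            } else if (c == 'T') {
                result += 'A';
            } else if (c == 'C') {
                result += 'G';
            } else if (c == 'G') {
                result += 'C';
            } else {
                result += c; // assumption: unknown letters are left as-is
            }
        }
        return result;
    }

    // Counts the positions at which sub appears in dna.
    // Overlapping occurrences are counted separately (an assumption).
    public static int count(String dna, String sub) {
        if (sub.length() == 0) {
            return 0; // assumption: an empty subsequence never occurs
        }
        int total = 0;
        // Stop once fewer than sub.length() letters remain.
        for (int i = 0; i + sub.length() <= dna.length(); i++) {
            if (dna.substring(i, i + sub.length()).equals(sub)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String dna = makeRandomDNA(60);
        System.out.println("Sequence:   " + dna);
        System.out.println("Complement: " + complement(dna));
        System.out.println("Count of AT: " + count(dna, "AT"));
    }
}
```

Under these assumptions, complement("CTAATG") returns "GATTAC", count("ATATAT", "AT") returns 3, and count("AAAA", "AA") returns 3 because the overlapping matches at positions 0, 1, and 2 are each counted. If only non-overlapping occurrences should be counted, the loop would need to skip ahead by the length of the subsequence after each match.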

HANDIN

At the end of lab, create a zip file of your program and submit it to the handin server http://handin.intro.cs.cmu.edu/v1.