15-110 FALL 2009 [CORTINA]
LAB 5
A DNA sequence can be represented as a string containing the letters A, C, T,
and G in some order. Computational biologists develop algorithms to analyze
these DNA sequences for various patterns to learn more about the underlying
DNA and its properties. Here is a random 60-letter DNA sequence:
CTAATGGCATTATAGGGTGCCGGCGTGTCGGATATTCGGGGGATAGTGCTTAGTGAGCAT
In this lab, you will write two static methods that perform some basic
analysis of DNA sequences.
EXERCISES
SET-UP:
Download the project DNAComputer.zip and unzip it if it does not unzip
automatically. Move the unzipped folder into your Eclipse workspace and open
the project in Eclipse. To do this on the Linux machines, start a new Java
project, select "Create a new project in workspace", and enter the project
name DNAComputer. You should see a warning at the bottom of the dialog box
that says "The wizard will automatically configure the JRE and the project
layout based on the existing source." This is OK. It means that Eclipse found
a folder that already contains source code, so it will start the project with
that code.
- This program starts by generating a random DNA sequence for you using a
  static method makeRandomDNA. Compile and run the program and look at the
  first lines of output to see the random sequence that is generated. Run it
  a few times to see how this sequence changes each time. Make sure you
  understand how makeRandomDNA works before you move on. (Discuss this
  briefly with your neighbor.)
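For comparison while you read the project's code, here is a minimal sketch of
one way a random DNA sequence could be built. It is NOT the makeRandomDNA from
DNAComputer.zip: the class name RandomDnaSketch, the parameters (a length and
a java.util.Random), and the loop body are all assumptions made for
illustration, and the method in your project may be written differently.

```java
import java.util.Random;

// Hypothetical stand-in, not the project's own code.
class RandomDnaSketch {
    // Builds a string of the given length by picking one of the four
    // letters A, C, G, T at random for each position.
    static String makeRandomDNA(int length, Random rng) {
        String bases = "ACGT";
        StringBuilder dna = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int pick = rng.nextInt(bases.length()); // 0, 1, 2, or 3
            dna.append(bases.charAt(pick));
        }
        return dna.toString();
    }

    public static void main(String[] args) {
        // A new Random() gives a different sequence on each run.
        System.out.println(makeRandomDNA(60, new Random()));
    }
}
```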
- The complement of a DNA sequence is a new sequence formed from the original
  sequence with each 'A' becoming a 'T', each 'T' becoming an 'A', each 'C'
  becoming a 'G', and each 'G' becoming a 'C'. Complete the static method
  complement, which takes one parameter representing the original DNA
  sequence and returns a new string that contains the sequence's complement.
  Use a for loop in your answer. Your answer should work for a sequence of
  ANY length. Be sure to remove the placeholder return statement before you
  start working on this problem. It is there only so that the program will
  compile until you write the correct code. Run your program again to check
  that the complement output is correct.
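The letter-by-letter rule above can be sketched as follows. This is one
possible approach, not the official solution: the class name ComplementSketch
is made up for this example, and leaving any character other than A, C, G, or
T unchanged is an assumption, since the lab does not say what to do with
unexpected input.

```java
// Hypothetical class name; in the lab, the method goes in the provided project.
class ComplementSketch {
    // Returns a new string in which A and T are swapped and C and G are swapped.
    static String complement(String dna) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < dna.length(); i++) {
            char base = dna.charAt(i);
            if (base == 'A') {
                result.append('T');
            } else if (base == 'T') {
                result.append('A');
            } else if (base == 'C') {
                result.append('G');
            } else if (base == 'G') {
                result.append('C');
            } else {
                result.append(base); // assumption: keep unexpected characters as-is
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("CTAATG")); // prints GATTAC
    }
}
```

Because the loop runs from 0 up to dna.length(), the same code handles a
sequence of any length, including the empty string.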
- The next method, count, should count the number of occurrences of a
  subsequence in the original DNA sequence. The method takes two parameters,
  one for the original sequence and one for the subsequence that we're
  looking for. The method should return an integer that represents the number
  of occurrences of the subsequence in the sequence. Complete this method
  using a for loop. Your answer should work for a sequence and subsequence of
  ANY length. Be sure to remove the placeholder return statement before you
  start working on this problem. Run your program again to check that the
  output is correct when this method is called from the main method.
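One way to approach the counting problem is to slide a window the size of the
subsequence across the sequence and compare at each position. The sketch below
is an illustration, not the official solution. It makes two assumptions the
lab does not state: overlapping occurrences are each counted (so "AA" occurs 3
times in "AAAA"), and an empty subsequence is counted 0 times. The class name
CountSketch is also made up for this example.

```java
// Hypothetical class name; in the lab, the method goes in the provided project.
class CountSketch {
    // Counts how many positions in dna start a copy of sub.
    static int count(String dna, String sub) {
        if (sub.length() == 0) {
            return 0; // assumption: an empty subsequence never "occurs"
        }
        int total = 0;
        // Stop once the window would run past the end of dna.
        for (int i = 0; i + sub.length() <= dna.length(); i++) {
            String window = dna.substring(i, i + sub.length());
            if (window.equals(sub)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count("ATATAT", "ATA")); // prints 2 (positions 0 and 2)
        System.out.println(count("AAAA", "AA"));    // prints 3 (overlaps counted)
    }
}
```

If the subsequence is longer than the sequence, the loop condition is false
from the start and the method returns 0.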
HANDIN
At the end of lab, create a zip file of your program and submit it to the
handin server at http://handin.intro.cs.cmu.edu/v1.