SQL Exercises

Some Links:

my SQL tutorial
SQL Tutorial
Official MySQL Manual
How-to set up bobsedge's tools for this document.
- Run mysqlcc for the graphical client.
- Run mysql -p for the console client.

SELECT-ng

SELECT [STRAIGHT_JOIN] [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT] [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS] [HIGH_PRIORITY] [DISTINCT | DISTINCTROW | ALL] select_expression,... [INTO {OUTFILE | DUMPFILE} 'file_name' export_options] [FROM table_references [WHERE where_definition] [GROUP BY {unsigned_integer | col_name | formula} [ASC | DESC], ... [WITH ROLLUP]] [HAVING where_definition] [ORDER BY {unsigned_integer | col_name | formula} [ASC | DESC] ,...] [LIMIT [offset,] row_count | row_count OFFSET offset] [PROCEDURE procedure_name(argument_list)] [FOR UPDATE | LOCK IN SHARE MODE]]

1. GenBank.names contains taxIDs corresponding to scientific names. Find the taxID for "Homo sapiens".

2. SwissProt entries list taxa in which the protein is found. How many proteins are found in humans? Hint: SELECT COUNT(*) will return the number of entries in a resultant set.

3. If you didn't, do #2 using a single SELECT statement -- no reusing the info you got in #1 !

4. Examine the schema for CDART. Which clusterIDs have hits on pig 193?

5. You can use SELECT DISTINCT(fieldname) to return only one entry for each unique value of a fieldname. How many different clusters are there?

6. You can acheive similar results but agregating the associated data. Try SELECTing for clusterID and AVG(bitscore), and append "GROUP BY [bitscore column]" to the end of your query. "GROUP BY" creates more temporary sets according to one field, and runs agregation functions on the subsets. Other functions like SUM or CONCAT or even STDDEV let you perform more complex operations on a set.

7. Let's do a long join. How many different pigs have hits from pssms in the same cluster as "histone"? Do this in one query. What's the best (use MAX() ) eVal of those hits for each pig?

INSERT-ing

There are many ways to insert data into the database. Often it's done programattically from a loop parsing data out of another source. But you can also build on the set-algebra offered by SELECT, which is especially handy in creating sets of data in our current schema.

INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name [(col_name,...)] VALUES ((expression | DEFAULT),...),(...),... [ ON DUPLICATE KEY UPDATE col_name=expression, ... ] or INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name [(col_name,...)] SELECT ... or INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name SET col_name=(expression | DEFAULT), ... [ ON DUPLICATE KEY UPDATE col_name=expression, ... ]

Look at the second format. That's what we'll be using to create data sets -- let's consider a SwissProt set to begin with. To create a set of SwissProt entries in our schema, you need to:

Create an entry in SwissProt.entry_sets describing your set.
Create enties for each member of the set in SwissProt.entry_set_membership

1. Consider a set of proteins you're interested in examining. What is a good short name for the set, and what are the constraints you'll use to build the set? Execute the following command in the MySQL console, substituting you're name and description.

INSERT INTO SwissProt.entry_sets (name, description) VALUES ('name', 'description');

2. Note that you didn't have to specify a seqID -- the table is constructed to generate one automatically and not duplicate previous sets. But you'll need to retrieve this if you want to put entries in the set! Execute the following:

SELECT LAST_INSERT_ID();

Yes, you can execute a SELECT command using constants or functions, that doesn't even require specifying a FROM table! Write down the ID of what you just inserted.

3. Now create the SELECT statement you need to retrieve members of your set. For SwissProt, we uses the protein's name to specify the set, so SELECT that. To make sure you're generating the correct set, double-check a few entries -- but you can't take in a list of 1000s of entries at once! Here's two ways:

Limit the results of your query to 10 entries:

SELECT * FROM SwissProt.entries WHERE ... LIMIT 10;

Specify a "pager" program to feed the output of MySQL to:

PAGER more ; SELECT * FROM SwissProt.entries WHERE ... ;

4. Populate your set by filling in the blanks in the following:

INSERT INTO SwissProt.entry_set_membership (setID, name) SELECT [setID], [entries.name] [your select criteria]

back home
bobsedge@andrew.cmu.edu