Quasar Mapping
Quasars are among the most luminous objects known in the universe, which makes them visible across enormous distances. They therefore act as an effective tracer of the mass in the universe, making a large-scale quasar catalog useful for answering a wide range of astronomical and astrophysical questions. The largest previously existing quasar catalog consisted of about 30k quasars. Using a highly accurate nonparametric Bayes classifier and our new algorithm allowing it to scale to training sets of up to 500k points, we obtained a very clean (high efficiency, high completeness) catalog of 100k quasars, with 500k easily in reach. Our goal is to identify all of the expected 1 million or so z<3 quasars in the universe. (Richards, Nichol, Gray, et al. Efficient Photometric Selection of Quasars from the Sloan Digital Sky Survey: 100,000 z<3 Quasars from Data Release One, Astrophysical Journal Supplement 2004.)
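To make the idea of a nonparametric Bayes classifier concrete, here is a minimal sketch in Python. It estimates each class density with a Gaussian kernel density estimate and combines it with the class prior through Bayes' rule. The function names, the single made-up colour feature, the toy values, and the bandwidth are all assumptions for illustration. The sketch uses the naive computation, which costs time proportional to the training set size per query. It is not the scalable algorithm mentioned above.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function estimating the 1-D density of `points` with a Gaussian kernel."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Naive sum over every training point (hypothetical helper, O(N) per query).
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def kde_bayes_classifier(training, bandwidth):
    """training: dict mapping a class label to a list of 1-D feature values."""
    total = sum(len(values) for values in training.values())
    priors = {label: len(values) / total for label, values in training.items()}
    densities = {label: gaussian_kde(values, bandwidth) for label, values in training.items()}

    def posterior(x):
        # Bayes' rule: P(class | x) is proportional to P(class) * p(x | class).
        joint = {label: priors[label] * densities[label](x) for label in training}
        evidence = sum(joint.values())
        if evidence == 0.0:
            return dict(priors)  # far from all training data: fall back to the priors
        return {label: value / evidence for label, value in joint.items()}

    return posterior


# Toy data (assumed, not from any survey): one colour index per object.
training = {
    "quasar": [0.1, 0.2, 0.25, 0.3],
    "star": [0.9, 1.0, 1.1, 1.2, 1.3, 1.4],
}
posterior = kde_bayes_classifier(training, bandwidth=0.1)
```

Calling `posterior(0.2)` returns a dictionary of class probabilities that sum to one. An object would be put in the catalog when its quasar probability exceeds whatever threshold gives the desired trade-off between efficiency and completeness.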

The Origin of Galaxies
A basic understanding of how galaxies form can be pursued by studying the relation between the environment of galaxies and their observed physical properties, using recent large datasets such as the SDSS. Our n-point algorithm enabled the calculations behind the spatial statistics of active galactic nuclei (AGN) in (Wake et al. The Clustering of AGN in the Sloan Digital Sky Survey, Astrophysical Journal Letters 2004).
Our kernel density estimation algorithm enabled the calculation behind an influential analysis explaining spiral versus elliptical galaxy formation (Balogh et al. Galaxy Ecology: Groups and Low-Density Environments in the SDSS and 2dFGRS, Monthly Notices Royal Astron. Soc. 2004).

Galaxy Morphologies and Clusters
Using a variety of clustering methods as well as supervised methods, we obtained statistical characterizations of galaxy morphology classes and compared them to the subjectively-defined Hubble sequence. (de Carvalho et al. Clustering Analysis Algorithms and Their Applications to Digital POSS-II Catalogs, AAS 1995.)
The locations of so-called galaxy clusters have traditionally been subjectively defined. We used Bayesian mixture models to obtain a large-scale cluster catalog statistically. (de Carvalho et al. Towards an Objectively Defined Catalog of Galaxy Clusters from the Digitized POSS-II, Wide-Field Spectroscopy 1997.)
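As a rough illustration of how a mixture model can assign objects to groups statistically, the sketch below fits a two-component Gaussian mixture to one-dimensional positions with the EM algorithm. Note the swap: this is a maximum-likelihood EM fit, not a fully Bayesian treatment, and it is not the method used for the catalog above. The function names, the starting means, the toy positions, and the fixed number of components are all assumptions.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, mean_a, mean_b, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    weights = [0.5, 0.5]
    means = [mean_a, mean_b]
    stds = [1.0, 1.0]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return weights, means, stds


# Toy positions (assumed): two well-separated groups along one axis.
xs = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, stds = fit_two_gaussians(xs, mean_a=0.0, mean_b=6.0)
```

On this toy input the fitted means settle on the centres of the two groups. A real cluster search would work in two or three dimensions and would also have to choose the number of components, which this sketch fixes at two.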
A survey of these and other analyses we performed appears in (Djorgovski et al. Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results, SPIE 1997.)

Computational Astrophysics, Cosmology, and Astronomy
Modern biology is certainly exciting, but it is not the only dramatic thing going on in science. Modern astrophysics is today's hotspot of physics, and one of the hotspots of science in general. New instruments, and the datasets they produce, keep making this area more exciting. I have been working on astronomical data analysis since 1993, in the days of the POSS-II survey. I'm an official external member of the SDSS Collaboration. Some of my fine current astrophysicist collaborators: Bob Nichol, Andy Connolly, Chris Miller, Gordon Richards, Robert Brunner, Michael Balogh, David Wake, Gauri Kulkarni.

Evidence for Dark Energy
Science Magazine cover, Dec. 2003
Dark energy is a theorized phenomenon which opposes the self-attraction of matter and causes the expansion of the universe to accelerate. It is not simply of astrophysical interest -- if it exists it implies changes to fundamental physics. To learn more about dark energy and the current state of affairs in physics, here's a layman's overview and a more detailed overview by Peebles (the guy who wrote two of the bibles of cosmology).
Our n-point algorithm was used by our collaborators for the large-scale exploratory calculations required to spatially cross-correlate the WMAP data with the Sloan Digital Sky Survey. This work resulted in major new evidence for dark energy and was recognized as part of Science magazine's top scientific breakthrough of 2003. (Scranton, Connolly, Nichol, et al. Physical Evidence for Dark Energy, submitted to Physical Review Letters 2003.)

Large-Scale Structure
The 3-point correlation function is a key discriminant of cosmological models which cannot be distinguished (i.e. validated or invalidated) using the usual 2-point correlation function. Our n-point algorithm is currently being used to compute the largest-scale third-order spatial statistics of the universe to date. (Nichol et al. The Three-point Correlation Function of the Sloan Digital Sky Survey Galaxy Sample, submitted to Astrophysical Journal Letters 2004.)
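The raw ingredients of these statistics are counts of point configurations. The sketch below counts pairs (for the 2-point function) and triangles (for the 3-point function) whose side lengths fall in a given separation bin. It uses the naive approach, which visits every pair or triple and so scales as N^2 or N^3. It is not our n-point algorithm, and the function names, the bin edges, and the toy points are assumptions for illustration.

```python
import math
from itertools import combinations


def count_pairs(points, r_lo, r_hi):
    """Count unordered pairs whose separation lies in [r_lo, r_hi)."""
    return sum(1 for a, b in combinations(points, 2) if r_lo <= math.dist(a, b) < r_hi)


def count_triples(points, r_lo, r_hi):
    """Count unordered triples whose three side lengths all lie in [r_lo, r_hi)."""

    def in_bin(a, b):
        return r_lo <= math.dist(a, b) < r_hi

    return sum(
        1
        for a, b, c in combinations(points, 3)
        if in_bin(a, b) and in_bin(a, c) and in_bin(b, c)
    )


# Toy points (assumed): an equilateral triangle of side 1 plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2), (5.0, 5.0)]
pairs = count_pairs(points, 0.5, 1.5)
triples = count_triples(points, 0.5, 1.5)
```

To turn such counts into a correlation function, they are normally compared with the same counts on a random, unclustered catalog covering the same region. That step is omitted here.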