Jeremy Kubica

My primary research interests are data mining, machine learning, knowledge discovery, artificial intelligence, robotics, and efficient large-scale scientific and statistical computation. My current research focuses on the development of new techniques and efficient algorithms and the application of these technologies to real-world problems. In particular, I am strongly motivated by the ability of computer science to address and help solve real-world problems in "big-data" science.

My current research focuses primarily on large-scale data mining problems and the search for fine structure in large, noisy data sets. This work includes the search for spatial associations and structure, asteroid (and other large-scale) tracking, group detection, and link analysis. All of these problems are related through the high density of both true and noise data, but relative sparsity of data points from the structure(s) of interest. For more information about these research areas see the descriptions below and/or my publications page.

Below is a list (in reverse chronological order) of research projects in which I have been involved. The information lags about 6 months to a year behind my current research. A more up to date picture can probably be gotten from my publications page.

Spatial Structure Search - Large-Scale (Asteroid) Tracking - Modular Robotics - Underwater Robotics - Emergent Behaviors/Genetic Algorithms - Link Analysis/Group Detection

Spatial Structure Search - CMU (2003-present):
A significant portion of my current research deals with the problem of finding spatial associations and structure within large, dense, noisy data sets. This problem arises in a wide range of domains, including: data mining, pattern matching, computer vision, computational statistics, and target tracking. The goal is simply to find all sets of points that conform to an underlying model or class of models. Unfortunately, this problem can be computationally expensive, suffering a combinatorial explosion in the number of potential point sets that must be tested. While traditional spatial data structures (e.g. kd-trees, R-trees or metric trees) can help to reduce the computational cost of this search, there exists a need for even more efficient approaches to handle new large-scale and noisy data sets. My work focuses on the development of a new class of search algorithms on these spatial data structures that can often give substantially greater savings.

Large-Scale (Asteroid) Tracking - CMU (2003-present):
Detecting and tracking asteroids from observational data is an important, but extremely computationally expensive, task. My work deals with the computational issues inherent to large-scale target tracking and the development of techniques and algorithms that mitigate or eliminate these issues. What sets this domain apart from previous large-scale multiple target tracking problems is the sheer scale of the problem and the sparse nature of the observation schedule. New sky surveys allow us to observe increasing faint objects, providing millions of true and noise observations in the process. Further, detections of a given object may be sparse, sporadic, and widely spaced.

This work is being done in collaboration with the Pan-STARRS and LSST projects.

Link Analysis/Group Detection - CMU (2001-present):
Link analysis and group detection attempt to find underlying structure from link data. A link is a set of entities that have been linked together by some relation. For example, a market basket could define a single link containing the products purchased. Group detection attempts to extract underlying groupings (or clusters) from this co-occurrence type information. Other underlying structure can include networks, graphs, or probabilistic generative models.

Modular Robotics - Xerox PARC 2000, FXPAL 2001:
Modular robots, or Modular Self-reconfigurable Robots (MSRs), are robots composed of many identical and interchangeable modules. The idea behind such a system is that a robot can rearrange these modules in order to change its own body shape or configuration. This ability to dynamically reconfigure would give the robot an increased robustness and generality. (more)

Underwater Robotics - Cornell University 1999-2001
Autonomous Underwater Vehicles (AUVs) are an exciting topic for two reasons. First, there are many interesting real-world applications for such systems. Effective autonomous underwater vehicles would allow or facilitate exploration, salvage, search and rescue, and scientific studies in deep ocean areas. Second, the highly dynamic and noisy nature of the underwater environment makes the problem a difficult one. (more)

Genetic Algorithms and Forced Emergent Behaviors - Cornell University 2000 (Advisor: Prof. Selman)
Emergent behaviors are behaviors that are not explicitly preprogrammed, but rather arise from interactions within a complex system. This leads to questions of how can these emergent phenomena be "harnessed" to perform in desired ways. I examined a variety of techniques for "forcing" particular emergent behaviors via modification of the low-level rules of a system. A significant portion of my work consisted at examining the use of genetic algorithms