Knowledge about the link between brain and behavior rests, in large part, on electrophysiological investigation of neural activity recorded from one or more electrodes that have been inserted into the brain of an animal. Technological advances have provided vastly improved data collection and storage capabilities, which present both opportunities and challenges. It is now common to record from dozens to hundreds of electrodes simultaneously, and it is also possible for these electrodes to maintain their position well enough to record the same neurons across hours or even days. Because many disorders, such as ADHD, autism, and schizophrenia, as well as stroke and various neurodegenerative diseases, are thought to involve dysfunction of network connectivity, a great hope has been that multi-electrode recording could reveal the way network activity evolves in healthy and diseased states, and thereby supply an important mechanistic description of pathophysiology. However, while the number of recording electrodes used in a single brain has been increasing exponentially fast, statistical methods for handling the complexity of multi-electrode data have lagged behind. In addition to the general problem of handling large-scale electrode recordings, a second major challenge comes from the striking observation that neural interactions occur at multiple timescales, including those involving oscillations and synchrony (the tendency of two or more neurons to fire at nearly the same time), which could provide an essential mechanism of neural network information flow and be a marker that distinguishes normal from diseased states.

Neurons communicate through rapid electrical discharges known as “spikes,” and sequences of spikes are known as “spike trains.” Because each spike occurs over the course of roughly 1 millisecond while behavior occurs over hundreds of milliseconds, it is reasonable to consider a spike train to be a stochastic sequence of isolated points in time, i.e., a point process. I will review the use of point processes to represent interactions of multiple neurons across different timescales. I will also go over a new method that is applicable to many network analyses: false discovery rate regression.
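
As a rough illustration of treating a spike train as a point process, the sketch below simulates spike times from a homogeneous Poisson process, which is the simplest point process model and is assumed here only for illustration; it is not the method presented in the talk. It also counts near-coincident spikes between two trains as a crude stand-in for synchrony. The function names, the firing rate, and the 5 ms coincidence window are all hypothetical choices.

```python
import bisect
import random


def simulate_poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) on [0, duration_s).

    Assumption: a homogeneous Poisson process, so inter-spike
    intervals are independent exponentials with mean 1 / rate_hz.
    """
    rng = random.Random(seed)
    spike_times = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spike_times.append(t)
        t += rng.expovariate(rate_hz)
    return spike_times


def count_near_coincidences(train_a, train_b, window_s=0.005):
    """Count spikes in train_a with at least one spike of train_b
    within +/- window_s. Both trains must be sorted in time."""
    count = 0
    for t in train_a:
        # Index of the first spike in train_b at or after t - window_s.
        i = bisect.bisect_left(train_b, t - window_s)
        if i < len(train_b) and train_b[i] <= t + window_s:
            count += 1
    return count
```

Two independent trains simulated this way will still show some coincidences by chance, which is one reason a statistical model of the spike trains is needed before a coincidence count can be read as evidence of synchrony.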

Recent genome sequencing experiments allow us to observe regions of DNA that are spatially close to each other in the nucleus of cells. Analyses of these data have shown that the 3D structure of DNA may be closely linked to genome functions such as long-range regulation of gene expression and DNA replication. The algorithms and analysis techniques presented in this dissertation substantially advance our understanding of the relationship between the 3D structure of DNA and genome function on the scale of the whole genome. Specifically, we designed algorithms based on graph rigidity theory to show that these experiments provide enough information to create embeddings in 3D, and we designed additional algorithms to identify subsets of constraints with metrically consistent distances. We also established that locally clustered regions of chromosomes (topological domains) are hierarchically organized and provided the first quantification of this organization using an efficient multiscale domain identification method that we designed. Finally, we performed two major genome-wide analyses relating three-dimensional genome structure to gene regulation.

From these analyses, we show that mutations that affect the expression of genes far away on the genome are surprisingly close in 3D and that they occur preferentially at the boundaries of topological domains. We also analyzed a novel structural feature of DNA that we call "dense regions". They occupy spatially small volumes of the nucleus but can include genomically distant regions of the genome. We find that the majority of transcription, or active gene expression, occurs within these dense regions even though they cover a significantly smaller portion of the genome. We also show that genes within these regions can change expression in concert in response to a cell signaling event. The algorithms and analysis techniques we developed have enabled us to perform some of the first rigorous quantifications of the relationship between genome structure and gene regulation, and these techniques can be easily applied and extended for use with future experimental data.

Thesis Committee:
Carl Kingsford (Advisor)
Ziv Bar-Joseph
Takis Benos
Russell Schwartz
Michelle Girvan (University of Maryland)


Recently, single cell RNA sequencing has revealed large variations in the molecular states of individual cells of seemingly the same type. We have been investigating single cell biology for the past five years and have collected over 1000 datasets from various organisms, including human, mouse, rat, and zebrafish. Here, I will discuss some of the technical aspects of single cell RNA sequencing, our analysis of five different mouse cell types, and the noise and technical resolution problems of single cell transcriptome profiling. I will conclude with a discussion of the origins of single cell variation, suggesting that individual cells are more like individuals of an ecological community than uniform modular units.


There is an ever-expanding body of biological data, growing in size and complexity, outstripping the capabilities of standard database tools or traditional analysis techniques. Examples include molecular dynamics simulations, drug-target interactions, gene regulatory networks, and high-throughput imaging. Large-scale acquisition and curation of biological data has already yielded results in the form of lower costs for genome sequencing and greater coverage in databases such as GenBank, and is viewed as the future of biocuration. The "big data" philosophy and its associated paradigms and frameworks have the potential to uncover solutions to problems otherwise intractable with more traditional investigative techniques.

Here, we focus on two biological systems whose data form large, undirected graphs. First, we develop a quantitative model of ciliary motion phenotypes, using spectral graph methods for unsupervised latent pattern discovery. Second, we apply similar techniques to identify a mapping between physicochemical structure and odor percept in human olfaction. In both cases, we experienced computational bottlenecks in our statistical machinery, necessitating the creation of a new analysis framework. At the core of this framework is a distributed hierarchical eigensolver, which we compare directly to other popular solvers. We demonstrate its essential role in enabling the discovery of novel ciliary motion phenotypes and in identifying physicochemical-perceptual associations.

Thesis Committee:
Chakra Chennubhotla (Advisor)
Takis Benos
Russell Schwartz
D. Lansing Taylor
Cecilia Lo (University of Pittsburgh, Developmental Biology)
Arvind Ramanathan (Oak Ridge National Lab)

Although all cells in our body contain the same genetic material in their DNA, they can perform vastly different functions, by selectively expressing subsets of their genes. Cell-type-specific gene regulation is achieved through an interplay between regulatory proteins, such as transcription factors, and epigenetic mechanisms, which affect the higher level organization of the genome. The interactions between these regulatory components and their dependence on DNA sequence information are only partially characterized. A better understanding of condition-specific regulatory mechanisms is important for understanding the causes of genetic diseases and for identifying potential targets for intervention.

Using computational techniques to analyze the variability in epigenetic state across genomic contexts and individuals, we were able to highlight, probably for the first time, the extensive plasticity of the epigenetic landscape. This work demonstrated the link between genetic and epigenetic variability, but also showed that the effects of this variation on gene regulation are highly combinatorial. Motivated by these results, we developed a novel machine learning framework for discovering cell-type-specific rules of regulation based on both the expression patterns of regulators and DNA sequence information. Unlike previous work in this field, our method incorporates the effect of the cell-type-specific activity of distal regulatory elements, such as enhancers, and takes advantage of prior knowledge regarding protein interactions. Using large-scale datasets from the Roadmap Epigenomics and ENCODE Projects, we constructed a regulatory map of a large number of human tissues. Our model achieves high predictive power and discovers both known and novel cell-type-specific regulators and context-specific interactions between them.

Solvation is a key component for understanding the structure, dynamics, and function of biomolecules. Both continuum-level (implicit) and molecular-level (explicit) descriptions of solvent have been used in computational models. While each level of description has its own strengths and weaknesses, implicit solvent models have become popular in many biophysical studies for their simplicity and computational efficiency, along with their reasonable accuracy. However, implicit solvent models, in which the polar contributions are typically decoupled from nonpolar contributions, are found to be inconsistent with recent studies on the solvation of atomistic and nanoscale solutes.

Unlike most implicit solvent approaches, differential geometry-based models introduce coupling between the polar and nonpolar free energy functionals through a characteristic function that describes a smooth dielectric interface profile at the solvent-solute boundary in a thermodynamically self-consistent fashion. However, such models have not been systematically parameterized and tested for their predictive power, which has limited their use. By independently varying two important parameters of the model (hydrodynamic pressure and microscopic surface tension), we studied a set of 17 small organic molecules to investigate how changes in model parameters affect the predicted solvation energies. Additionally, we investigated the effect of different force fields (AM1-BCCv1/ZAP-9, AM1-BCCv1/Bondi, OPLS-AA, and PARSE) on model performance. Our study provides useful insights into differential geometry-based implicit solvent models and improves the performance and robustness of these models.



The data collected within The Cancer Genome Atlas (TCGA) project are exceptionally heterogeneous. Molecular profiling data generated by different measurement modalities, as well as clinical information collected on each patient, give rise to continuous, discrete, and categorical data with different distributional properties. Additional data are generated by a variety of analyses carried out on the individual data sets. Examples include functional or structural annotations of mutations, assignment of an expression subtype to a tumor, and enrichment or activity of molecular pathways, for each patient sample. The data also include missing values as well as interdependencies among the features that undoubtedly extend beyond pairwise correlations. I will describe our efforts towards identifying strong multivariate associations in the TCGA data using a framework based on random forest regression, as well as the development of web-based tools to interactively explore such associations. I will also describe our efforts to integrate these association data with other information from public biomedical resources using big graph analytics, with applications in drug repurposing.


Neurons are structurally and functionally polarized cells. A hallmark of their polarized structure is the thin and long axon, which can extend at micrometer diameters for up to a meter in humans. Active transport of materials such as proteins and organelles within the axon, a process referred to as axonal transport, is essential to the differentiation, survival, and function of neurons. Axonal transport defects have been strongly implicated in many human neurodegenerative diseases such as Alzheimer's disease. In this presentation I will introduce recent work from my lab on integrating engineering, computational, biophysical, and cell biological methods to understand how axonal transport is regulated to ensure that the right cargo is delivered to the right destination at the right time. I will start with a brief overview of the image-based computational analysis methods we developed for characterizing the spatiotemporal dynamics of axonal transport. I will then present results of applying these methods to analyze the regulatory mechanisms of axonal transport. Lastly, I will briefly introduce some ongoing work on developing techniques for high-throughput analysis and active control of axonal transport.


Transcriptional gene regulation is a dynamic process and its proper functioning is essential for all living organisms. By combining the abundant static regulatory data with time series expression data using an Input-Output Hidden Markov Model (IOHMM), we were able to reconstruct dynamic representations of these networks in multiple species. The models lead to testable temporal hypotheses identifying both new regulators and their time of activation. We have recently extended these methods to allow the modeling of various aspects of post-transcriptional regulation, including temporal regulation by microRNAs and linking signaling and dynamic regulatory networks. The reconstructed networks link receptors and proteins that directly interact with the environment to the observed expression outcome. I will discuss the application and experimental validation of predictions made by our methods, focusing on stress response in yeast, lung development in mice, and human flu response. I will also mention a number of other extensions which we have used to study disease progression and the regulation of immune response.

