## Publications

Ajit Singh, Andrew Guillory, Jeff Bilmes. On Bisubmodular
Maximization. To appear in Artificial Intelligence and Statistics
(AISTATS), April 2012.

**Abstract**: Bisubmodularity extends the concept of submodularity to set
functions with two arguments. We show how bisubmodular maximization
leads to richer value-of-information problems, using examples in
sensor placement and feature selection. We present the first
constant-factor approximation algorithm for a wide class of
bisubmodular maximization problems.

Zafer Aydin, Ajit Singh, Jeff Bilmes, and William S. Noble. Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure. BMC Bioinformatics 12:154. 2011.

[Publisher, open access journal]

**Abstract**:
__Background__

Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight.
__Results__

In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well-established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than with a fully dense model on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements.
__Conclusions__

We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp

Ajit P. Singh and Geoffrey J. Gordon. A Bayesian Matrix Factorization Model for Relational Data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI). July 2010.

[Paper]

- An earlier version of the paper, "A Bayesian Matrix Model for Relational Data", appears at the NIPS Workshop on Transfer Learning for Structured Data (TLSD), Dec 2009.

**Abstract**: Relational learning can be used to augment one data
source with other correlated sources of information, to improve
predictive accuracy. We frame a large class of relational learning
problems as matrix factorization problems, and propose a hierarchical
Bayesian model. Training our Bayesian model using random-walk
Metropolis-Hastings is impractically slow, and so we develop a block
Metropolis-Hastings sampler which uses the gradient and Hessian of the
likelihood to dynamically tune the proposal. We demonstrate that a
predictive model of brain response to stimuli can be improved by
augmenting it with side information about the stimuli.

Ajit P. Singh and Geoffrey J. Gordon. A Unified View of Matrix Factorization Models. In Machine Learning and Knowledge Discovery in Databases, European Conference (ECML/PKDD). September 2008.

[Paper]
[Publisher]

**Abstract**: We present a unified view of matrix factorization
that frames the differences among popular methods, such as NMF,
Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering,
and many others, in terms of a small number of modeling choices. Many
of these approaches can be viewed as minimizing a generalized Bregman
divergence, and we show that (i) a straightforward alternating
projection algorithm can be applied to almost any model in our unified
view; (ii) the Hessian for each projection has special structure that
makes a Newton projection feasible, even when there are equality
constraints on the factors, which allows for matrix co-clustering; and
(iii) alternating projections can be generalized to simultaneously
factor a set of matrices that share dimensions. These observations
immediately yield new optimization algorithms for the above
factorization methods, and suggest novel generalizations of these
methods such as incorporating row and column biases, and adding or
relaxing clustering constraints.
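As an illustrative aside (not taken from the paper): for the squared-loss special case of this framework, each alternating projection is a closed-form least-squares update. A minimal rank-one sketch, with the function name and toy matrix hypothetical:

```python
def rank1_als(X, iters=50):
    """Alternate the two least-squares projections u <- Xv/||v||^2 and
    v <- X^T u/||u||^2 for a rank-one factorization X ~ u v^T."""
    m, n = len(X), len(X[0])
    v = [1.0] * n
    u = [0.0] * m
    for _ in range(iters):
        # Projection for u with v held fixed (a convex subproblem).
        vv = sum(x * x for x in v)
        u = [sum(X[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        # Projection for v with u held fixed.
        uu = sum(x * x for x in u)
        v = [sum(X[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v

# An exactly rank-one matrix, which the alternating projections recover.
X = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
u, v = rank1_als(X)
err = max(abs(X[i][j] - u[i] * v[j]) for i in range(2) for j in range(3))
```

The Newton projections in the paper generalize this step to other Bregman divergences and to constrained factors; squared loss is simply the case where each projection has a closed form.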

Ajit P. Singh and Geoffrey J. Gordon. Relational Learning via Collective Matrix Factorization. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). August 2008.

[Paper]
[Publisher]

**Abstract**: Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations between entities. An example of relational learning is movie rating prediction, where the entities are users, movies, genres, and actors. Relations encode each user’s ratings of movies, the genre of each movie, and the roles that actors played in movies. A common prediction technique given one pairwise relation, for example a #users × #movies matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor a set of matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; to handle this diversity, we generalize our model to allow nonlinear relations between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to collective factorization, and derive efficient Newton-Raphson updates. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model can handle any pairwise relational schema and a wide variety of error models. We demonstrate that additional relations improve prediction in the movie rating domain. Since our model generalizes several existing matrix factorization problems, this work also yields new large scale optimization algorithms for these problems.
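A toy sketch of the parameter-sharing idea (illustrative only, not the paper's algorithm): two rank-one matrices, X over users x movies and Y over movies x genres, share the movie factor v, and each alternating update solves its least-squares subproblem in closed form. All names and data are hypothetical:

```python
def collective_rank1(X, Y, iters=20):
    """Fit X ~ u v^T and Y ~ v w^T with the movie factor v shared."""
    m, n = len(X), len(X[0])   # users x movies
    p = len(Y[0])              # genres
    v = [1.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        # u and w each see only one matrix.
        u = [sum(X[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        w = [sum(Y[j][k] * v[j] for j in range(n)) / vv for k in range(p)]
        # The shared factor v pools evidence from both matrices.
        denom = sum(x * x for x in u) + sum(x * x for x in w)
        v = [(sum(X[i][j] * u[i] for i in range(m))
              + sum(Y[j][k] * w[k] for k in range(p))) / denom
             for j in range(n)]
    return u, v, w

X = [[1.0, 1.0], [2.0, 2.0]]   # ratings, exactly rank one
Y = [[2.0, 3.0], [2.0, 3.0]]   # genre scores, sharing the movie factor
u, v, w = collective_rank1(X, Y)
err_X = max(abs(X[i][j] - u[i] * v[j]) for i in range(2) for j in range(2))
err_Y = max(abs(Y[j][k] - v[j] * w[k]) for j in range(2) for k in range(2))
```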

Andreas Krause, Ajit Singh and Carlos Guestrin. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research 9(Feb), pp 236–284, 2008.

[Paper]
[Publisher, open access journal]

**Abstract**: When monitoring spatial phenomena, which can often be
modeled as Gaussian processes (GPs), choosing sensor locations is a
fundamental task. There are several common strategies to address this
task, for example, geometry or disk models, placing sensors at the
points of highest entropy (variance) in the GP model, and A-, D-, or
E-optimal design. In this paper, we tackle the combinatorial
optimization problem of maximizing the mutual information between the
chosen locations and the locations which are not selected. We prove
that the problem of finding the configuration that maximizes mutual
information is NP-complete. To address this issue, we describe a
polynomial-time approximation that is within (1-1/e) of the optimum by
exploiting the submodularity of mutual information. We also show how
submodularity can be used to obtain online bounds, and design branch
and bound search procedures. We then extend our algorithm to exploit
lazy evaluations and local structure in the GP, yielding significant
speedups. We also extend our approach to find placements which are
robust against node failures and uncertainties in the model. These
extensions are again associated with rigorous theoretical
approximation guarantees, exploiting the submodularity of the
objective function. We demonstrate the advantages of our approach
towards optimizing mutual information in a very extensive empirical
study on two real-world data sets.
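The (1-1/e) guarantee rests on the classic greedy step: repeatedly add the element with the largest marginal gain of a monotone submodular objective. A minimal sketch, in which the mutual-information objective is replaced by a toy modular gain and all names are hypothetical:

```python
def greedy_maximize(ground_set, gain, k):
    """Pick k elements greedily by marginal gain of a submodular objective.

    gain(S, x) should return the marginal value of adding x to the set S;
    for sensor placement this would be the mutual-information gain.
    """
    selected = []
    for _ in range(k):
        best = max((x for x in ground_set if x not in selected),
                   key=lambda x: gain(selected, x))
        selected.append(best)
    return selected

# Toy modular example: the gain of x is a fixed weight, regardless of S.
weights = {"a": 3.0, "b": 1.0, "c": 2.0}
chosen = greedy_maximize(list(weights), lambda S, x: weights[x], 2)
```

The lazy-evaluation speedup mentioned above comes from observing that, by submodularity, marginal gains only shrink as the selected set grows, so stale gains are upper bounds that often avoid re-evaluation.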

Tim van Allen, Ajit Singh, Russell Greiner, and Peter Hooper. Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference. Artificial Intelligence 172(4-5), 2008.

[Paper]
[Publisher]

**Abstract**: A Bayesian belief network models a joint distribution
over variables using a DAG to represent variable dependencies and
network parameters to represent the conditional probability of each
variable given an assignment to its immediate parents. Existing
algorithms assume each network parameter is fixed. From a Bayesian
perspective, however, these network parameters can be random variables
that reflect uncertainty in parameter estimates, arising because the
parameters are learned from data, or because they are elicited from
uncertain experts.

Belief networks are commonly used to
compute responses to queries—i.e., return a number for
P(H=h|E=e). Parameter uncertainty induces uncertainty in query
responses, which are thus themselves random variables. This paper
investigates this query response distribution, and shows how to
accurately model this distribution for any query and any network
structure. In particular, we prove that the query response is
asymptotically Gaussian and provide its mean value and asymptotic
variance. Moreover, we present an algorithm for computing these
quantities that has the same worst-case complexity as inference in
general, and also describe straight-line code when the query includes
all n variables. We provide empirical evidence that (1) our
approximation of the variance is very accurate, and (2) a Beta
distribution with these moments provides a very accurate model of the
observed query response distribution. We also show how to use this to
produce accurate error bars around these responses—i.e., to determine
that the response to P(H=h|E=e) is x±y with confidence 1−δ.
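The Beta fit above is a moment-matching step: given the mean and asymptotic variance of the query response, choose the Beta with the same first two moments. A small sketch of that step (the quantile computation that turns the fit into an error bar is omitted; the function name is an assumption):

```python
def beta_from_moments(m, v):
    """Return (a, b) of the Beta distribution with mean m and variance v.

    For Beta(a, b): mean = a/(a+b) and var = ab/((a+b)^2 (a+b+1)).
    Solving those two equations gives the formulas below.
    """
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

a, b = beta_from_moments(0.3, 0.01)
# Sanity check: the fitted Beta reproduces the target moments.
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))
```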

Jure Leskovec, Ajit Singh, and Jon M. Kleinberg. Patterns of Influence in a Recommendation Network. In Advances in Knowledge Discovery and Data Mining, Tenth Pacific-Asia Conference (PAKDD). April 2006.

[Paper]
[Publisher]

**Abstract**: Information cascades are phenomena in which
individuals adopt a new action or idea due to influence by others. As
such a process spreads through an underlying social network, it can
result in widespread adoption overall. We consider information
cascades in the context of recommendations, and in particular study
the patterns of cascading recommendations that arise in large social
networks. We investigate a large person-to-person recommendation
network, consisting of four million people who made sixteen million
recommendations on half a million products. Such a dataset allows us
to pose a number of fundamental questions: What kinds of cascades
arise frequently in real life? What features distinguish them? We
enumerate and count cascade subgraphs on large directed graphs; as one
component of this, we develop a novel efficient heuristic based on
graph isomorphism testing that scales to large datasets. We discover
novel patterns: the distribution of cascade sizes is approximately
heavy-tailed; cascades tend to be shallow, but occasional large bursts
of propagation can occur. The relative abundance of different cascade
subgraphs suggests subtle properties of the underlying social network
and recommendation process.

Carlos Guestrin, Andreas Krause and Ajit P. Singh. Near-optimal sensor
placements in Gaussian processes. In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML). 2005. Winner of the Best Paper Runner-up Award.

[Paper]
[Publisher]

**Abstract**: When monitoring spatial phenomena, which are often
modeled as Gaussian Processes (GPs), choosing sensor locations is a
fundamental task. A common strategy is to place sensors at the points
of highest entropy (variance) in the GP model. We propose a mutual
information criterion, and show that it produces better
placements. Furthermore, we prove that finding the configuration that
maximizes mutual information is NP-complete. To address this issue, we
describe a polynomial-time approximation that is within (1 − 1/e) of
the optimum by exploiting the submodularity of our criterion. This
algorithm is extended to handle local structure in the GP, yielding
significant speedups. We demonstrate the advantages of our approach on
two real-world data sets.

Renée Elio, Afsaneh Haddadi and Ajit Singh. Abstract Task Specifications for Conversation Policies. Proceedings of the Fourth International Conference on Autonomous Agents (AGENTS), 2000.

[Paper]
[Publisher]

**Abstract**: Under some views, a crucial function for conversation
policies is to "constrain the messages that appear on the wire," for
there can be a many-to-many mapping between an agent’s intention and
the message primitive used to express that intention. In this paper,
we argue that the way to constrain messages is to constrain
intentions. We propose a pragmatic approach to doing this through an
abstract task specification or model. Abstract task specifications are
based on a simple state-space representation for problem formulation,
and this representation can be used to delimit the agents’ task
intentions and discourse intentions. This analysis supports a flexible
and pragmatic way of handling "unexpected" messages by reference to
the abstract task specification. We see an abstract task specification
as one component of a publicly posted conversation policy. A simple
search-assistant agent was implemented within a BDI architecture to
illustrate application of these ideas.

Renée Elio, Afsaneh Haddadi and Ajit Singh. Task Models, Intentions, and Agent Conversation Policies. Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2000.

[Paper]

**Abstract**: It is possible to define conversation policies, such
as communication or dialogue protocols, that are based strictly on
what messages and, respectively, what performatives may follow each
other. While such an approach has many practical applications, such
protocols support only "local coherence" in a conversation. Lengthy
message exchanges require some infrastructure to lend them "global
coherence." Recognition of agent intentions about the joint task is
essential for this global coherence, but there are further mechanisms
needed to ensure that both local and global coherence are jointly
maintained. This paper presents a general yet practical approach to
designing, managing, and engineering agents that can do simple
run-time intention recognition without creating complex multi-state
protocols. In this approach we promote developing abstract task models
and designing conversation policies in terms of such models. An
implemented agent assistant based on these ideas is briefly described.

## Theses

Ajit P. Singh. Efficient Matrix Models for Relational
Learning. Ph.D. Thesis CMU-ML-09-111. Machine Learning Department,
Carnegie Mellon University. October 2009.

[Tech Report]

**Abstract**: Information integration deals with the setting where one has multiple sources of data, each describing different properties of the same set of entities. We are concerned primarily with settings where the properties are pairwise relations between entities, and attributes of entities. We want to predict the value of relations and attributes, but relations between entities violate the basic statistical assumption of exchangeable data points, or entities. Furthermore, we desire models that scale gracefully as the number of entities and relations increase.

Matrices are the simplest form of relational data; and we begin by distilling the literature on low-rank matrix factorization into a small number of modelling choices. We then frame information integration as simultaneously factoring sets of related matrices: i.e., Collective Matrix Factorization. Each entity is described by a small number of parameters, and if an entity is described by more than one matrix, those parameters participate in multiple matrix factorizations. Maximum likelihood estimation of the resulting model involves a large non-convex optimization, which we reduce to cyclically solving convex optimizations over small subsets of the parameters. Each convex subproblem can be solved by Newton-Raphson, which we extend to the setting of stochastic Newton-Raphson.

To address the limitations of maximum likelihood estimation in matrix factorization models, we extend our approach to the hierarchical Bayesian setting. Here, Bayesian estimation involves computing a high-dimensional integral with no analytic form. If we resorted to standard Metropolis-Hastings techniques, slow mixing would limit the scalability of our approach to large sets of entities. We show how to accelerate Metropolis-Hastings by using our efficient solution for maximum likelihood estimation to guide the sampling process.

This thesis rests on two claims: (i) that Collective Matrix Factorization can effectively integrate different sources of data to improve prediction; and (ii) that training scales well as the number of entities and observations increases. Two real-world data sets are considered in experimental support of these claims: augmented collaborative filtering and augmented brain imaging. In augmented collaborative filtering, we show that genre information about movies can be used to increase the predictive accuracy of users' ratings. In augmented brain imaging, we show that word co-occurrence information can be used to increase the predictive accuracy of a model of changes in brain activity in response to word stimuli, even in regions of the brain that were never included in the training data.

Ajit P. Singh. What to do when you don't have much data: Issues in small sample parameter learning in Bayesian Networks. M.Sc. Thesis. University of Alberta, Department of Computing Science. July 2003.

[Paper]

**Abstract**: A discrete Bayesian network is a factorization of a
joint distribution over random variables. The most common use of these
networks is for the computation of conditional probabilities (query
responses). The parameters of these networks are often learned from
data. Thus network parameters are themselves uncertain, which induces
a distribution for any query response. When data sets are small, the
effects of parameter uncertainty can be severe. In this thesis we
argue that when data sets are small, the distribution of a query
response is accurately modelled by a Beta distribution. Procedures for
the modeling of the query response are also reviewed. Furthermore, we
examine proposed techniques for parameter learning when data sets are
small and only one query is of interest.

## Other Contributions

Ajit P. Singh, John Halloran, Jeff A. Bilmes, Katrin Kirchhoff, William
S. Noble. To appear in RECOMB Satellite Conference on Computational
Proteomics, April 2012.

**Abstract**: Foundational to shotgun proteomics is the *spectrum
identification problem*: i.e., map each observed MS2
fragmentation spectrum to its corresponding peptide. In the
context of database search, we propose a new class of scoring
functions for peptide-spectrum matches, based on modeling peptide
fragmentation using a dynamic Bayesian network (DBN). Our approach
has no free parameters, involves no training, and outperforms
SEQUEST on fully-tryptic searches. Our approach is also
competitive with Percolator, a trained discriminative model. We
further contend that the use of DBNs leads to scoring functions
that are easy to extend with additional knowledge about peptide
fragmentation.

Ajit P. Singh, Asela Gunawardena, Chris Meek, and Arun
C. Surendran. Recommendations using Absorbing Random Walks. North
East Student Colloquium on Artificial Intelligence (NESCAI). Cornell
University, April 2007.

[Paper]

**Abstract**: Collaborative filtering attempts to find items of
interest for a user by utilizing the preferences of other users. In
this paper we describe an approach to filtering that explicitly uses
social relationships, such as friendship, to find items of interest to
a user. Modeling user-item relations as a bipartite graph, we augment
it with user-user (social) links and propose an absorbing random walk
that induces a set of stationary distributions, one per user, over all
items. These distributions can be interpreted as personalized rankings
for each user. We exploit sparsity of both user-item and user-user
relationships to improve the efficiency of our algorithm.
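A rough sketch of the absorbing-walk idea (not the paper's exact construction; the function, data, and the absorption rule are all assumptions): items are treated as absorbing states, the walk starts at a user and steps to friends with some probability, and each user's item ranking is the probability of being absorbed at each item.

```python
from collections import defaultdict

def item_absorption_probs(user, user_items, user_friends, alpha=0.5, iters=50):
    """Absorption probabilities over items for a walk started at `user`.

    With probability alpha the walk steps to a uniformly random friend;
    otherwise it is absorbed uniformly at one of the current user's items.
    (Users with neither friends nor items simply lose mass in this sketch.)
    """
    probs = defaultdict(float)   # absorption mass per item
    mass = {user: 1.0}           # transient mass per user
    for _ in range(iters):
        nxt = defaultdict(float)
        for u, p in mass.items():
            items = user_items.get(u, [])
            friends = user_friends.get(u, [])
            stay = alpha if friends else 0.0
            for i in items:
                probs[i] += p * (1 - stay) / len(items)
            for f in friends:
                nxt[f] += p * stay / len(friends)
        mass = nxt
    return dict(probs)

ranks = item_absorption_probs(
    "ann",
    user_items={"ann": ["book"], "bob": ["film", "game"]},
    user_friends={"ann": ["bob"], "bob": []},
)
```

Half of "ann"'s mass is absorbed at her own item, and the rest flows through the social link to "bob" and splits over his items, which is exactly the sense in which friendship links personalize the ranking.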

Ajit P. Singh and Andrew W. Moore. Finding Optimal Bayesian Networks by Dynamic Programming. Technical Report CMU-CALD-05-106, Carnegie Mellon University. 2005.

[Paper]
[Publisher]

**Abstract**: Finding the Bayesian network that maximizes a score
function is known as structure learning or structure discovery. Most
approaches use local search in the space of acyclic digraphs, which is
prone to local maxima. Exhaustive enumeration requires
super-exponential time. In this paper we describe a “merely”
exponential space/time algorithm for finding a Bayesian network that
corresponds to a global maximum of a decomposable scoring function,
such as BDeu or BIC.
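A compact sketch of the subset dynamic program that makes "merely" exponential search possible: the best network over a variable subset S ends with some sink v whose parents come from S minus v, and decomposability lets these choices be made independently. Here `best_parent_score(v, S)` is a stand-in for the local score (BDeu, BIC, ...) maximized over parent sets drawn from S; in the full algorithm that maximization is itself a subset DP.

```python
from itertools import combinations

def optimal_network_score(variables, best_parent_score):
    """Best total decomposable score over all DAGs on `variables`."""
    best = {frozenset(): 0.0}
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            S = frozenset(subset)
            # Try every variable v as the sink of the subnetwork over S.
            best[S] = max(best[S - {v}] + best_parent_score(v, S - {v})
                          for v in S)
    return best[frozenset(variables)]

# Toy score: a parent set is worth its size, so the best network is a
# full ordering and the optimum is 0 + 1 + 2 = 3.
score_val = optimal_network_score(("x", "y", "z"),
                                  lambda v, parents: float(len(parents)))
```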

Ajit Singh, Russell Greiner, Victor Dorian and Brent
Lefebvre. Automated diagnosis of IEMs using high-throughput NMR
spectroscopy. Poster, Intelligent Systems for Molecular Biology
(ISMB). 2002.

[PDF]

**Abstract**: We examine the utility of high throughput Nuclear Magnetic Resonance (NMR) data for the diagnosis of inborn errors of metabolism. We propose a two-layer Bayesian network with the novel addition of Bayesian error bars, which provide confidence intervals for our diagnoses.

Ajit Singh. Implementation of a Real-Time Kalman Filtering Navigation System. Technical Report FR-132, National Research Council Canada. 2000.

[Publisher]