A major step in the regulation of gene expression is the binding of regulatory proteins called transcription factors (TFs) to specific short DNA sites in the promoters and enhancers of regulated genes. Mutations in TF binding sites can lead to dysregulated gene expression and contribute to disease. Importantly, even small changes in gene expression can lead to disease over time, and even a small increase or decrease in TF-DNA binding affinity can have significant phenotypic consequences. Thus, sensitive quantitative approaches are needed to measure and model binding of TFs to the genome, and to understand how changes in TF binding lead to changes in gene expression levels.

As a first step toward a quantitative understanding of transcriptional regulation, we recently developed highly accurate regression models of TF-DNA binding, trained on in vitro data generated in my laboratory. The models use features derived from the DNA sequence and structure of potential binding sites, and they have numerous advantages over current motif models. What allows us to train very accurate and quantitative models is the combination of state-of-the-art machine learning algorithms (e.g. ε-SVR with feature selection based on a modified version of LASSO) and experimental assays carefully designed to alleviate bias and minimize noise (e.g. genomic-context protein binding microarrays).
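
To make the general modeling setup concrete, here is a minimal, purely illustrative sketch of an ε-SVR with Lasso-based feature selection over simple k-mer sequence features. The features, toy sequences, intensities, and hyperparameters below are assumptions for illustration only, not the laboratory's actual models or data.

```python
# Illustrative sketch only: a generic epsilon-SVR on k-mer features with
# Lasso-based feature selection, standing in for the kind of models
# described above. All data and hyperparameters are toy assumptions.
from itertools import product

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_counts(seq):
    """Count occurrences of every k-mer in a putative binding site."""
    return [sum(seq[i:i + K] == km for i in range(len(seq) - K + 1))
            for km in KMERS]

# Toy data: candidate binding sites and measured binding intensities.
sites = ["GCGCGAAA", "TTTCGCGC", "ACGTACGT", "GGGCGCGA"]
intensity = np.array([0.9, 0.85, 0.1, 0.7])   # hypothetical values

X = np.array([kmer_counts(s) for s in sites])

model = make_pipeline(
    SelectFromModel(Lasso(alpha=0.01)),   # sparse feature selection
    SVR(kernel="linear", epsilon=0.05),   # epsilon-insensitive regression
)
model.fit(X, intensity)
print(model.predict(X))
```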

In two recent studies, our protein-DNA binding data and models led to new insights into the genomic recruitment of human TFs. We found that oncogenic transcription factors from the E2F family, currently believed to bind indirectly to >80% of their genomic target sites, can bind a wide variety of DNA sites in vitro. We used support vector regression models to capture the complex intrinsic binding specificity of E2Fs, and we found that it completely explains their in vivo genomic occupancy. In another study, focused on paralogous TFs with indistinguishable DNA motifs but different in vivo targets and regulatory roles, we found that such TFs interact differently with their genomic targets even in vitro. Using weighted regression models that incorporate information on the variance observed in replicate experiments, we show that differences in intrinsic specificity between paralogous TFs are a major determinant of their differential in vivo binding.

I will also discuss current and future efforts to quantify the influences of various cellular factors (such as protein competitors and cofactors, epigenetic modifications, etc.) on the genomic recruitment and the regulatory activity of human TFs.

Raluca Gordan is an assistant professor in the Center for Genomic & Computational Biology and the Department of Biostatistics and Bioinformatics at Duke University. She graduated from the University of Iasi (Romania) in 2005 with a B.S. in Computer Science. She received her Ph.D. in Computer Science in 2009 from Duke University, followed by two years of postdoctoral training in computational and experimental regulatory genomics at Harvard Medical School. Her research combines computational modeling and high-throughput experiments to study, at a quantitative level, how transcription factors identify and bind to their specific target sites across the genome, given the complexity of the genomic search space and the numerous competitive and cooperative interactions that occur in the cell's nucleus. Her research is currently supported by NIH and NSF. She is the recipient of starter grant awards from the PhRMA and March of Dimes foundations, and of a Sloan Fellowship in Computational and Evolutionary Molecular Biology.

Faculty Host: Andreas Pfenning

Improving security requires both empirically-grounded insights into existing systems and threats and theoretically-grounded solutions that anticipate how future users and attackers will adapt. I will present examples of both. I'll begin by introducing empirical methods that I created to bring quantitative rigor to the question of how users choose authentication secrets (PINs, passwords, and security questions), a topic that has long been misunderstood due to a lack of data. I will then present two theoretically-grounded approaches that apply cryptography to provide transparency that trusted authorities are behaving correctly. The first addresses servers for distributing public keys for secure communication, ensuring that the authority cannot lie without being detected. The second ensures that banks that store bitcoins are solvent: that they actually hold as many bitcoins as they have promised to their clients.
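
To give a concrete picture of the transparency idea in general (a generic sketch, not the specific systems presented in the talk), the example below builds a Merkle tree over (user, key) bindings: the key server publishes a single root, clients verify short inclusion proofs against it, and the server cannot show different bindings to different clients without publishing conflicting roots that can be detected. The names and hashing scheme are illustrative assumptions.

```python
# Illustrative sketch of a transparency log for a key server: the server
# commits to all (user, key) bindings via a Merkle root, and clients check
# inclusion proofs against that same root. This is a generic construction,
# not the specific system from the talk.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return list of levels, leaves first, root level last."""
    level = [h(b"leaf:" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node if odd
            level = level + [level[-1]]
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(b"leaf:" + leaf)
    for sibling, is_right_child in proof:
        if is_right_child:
            node = h(b"node:" + sibling + node)
        else:
            node = h(b"node:" + node + sibling)
    return node == root

bindings = [b"alice:pk_A", b"bob:pk_B", b"carol:pk_C"]
levels = build_tree(bindings)
root = levels[-1][0]                       # published and widely gossiped
proof = inclusion_proof(levels, 1)
print(verify(b"bob:pk_B", proof, root))    # True: bob's key is in the log
```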

Joseph Bonneau is a Postdoctoral Researcher at Stanford University and a Technology Fellow at the Electronic Frontier Foundation. His research focuses on cryptography and security protocols, particularly how they interact with human and organizational behavior and economic incentives. Recently he has focused on Bitcoin and related cryptocurrencies and on secure messaging tools. He is also known for his work on passwords and web authentication. He received a PhD from the University of Cambridge under the supervision of Ross Anderson and a BS/MS from Stanford under the supervision of Dan Boneh. Last year he was a Postdoctoral Fellow at CITP, Princeton, and he has previously worked at Google, Yahoo, and Cryptography Research Inc.

Faculty Host: Norman Sadeh

As robotic and autonomous systems become more ubiquitous and their applications more expansive, the problems we look to solve are often best characterized by desirable statistics or distributions. Automating search and exploration for mobile robots, for example, involves being able to make decisions based on distributed, probabilistic, and potentially sporadic information. I will discuss my work in developing optimal control techniques that allow such problems to be formulated directly in terms of spatial statistics using principles from ergodic theory (a trajectory’s distance from ergodicity, or its statistical distance from a distribution, can be used to define a metric suitable for optimal control). I will present experimental results using ergodic optimal control to automate active sensing tasks using a bio-inspired underwater robot, as well as future avenues of work with applications to precision agriculture, assisted surgery and rehabilitation.
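
As a rough numerical illustration of that kind of metric (a sketch under assumed basis functions, trajectory, and target distribution; not the talk's implementation), the snippet below compares a trajectory's time-averaged basis coefficients with those of a target spatial density.

```python
# Minimal numerical sketch of an ergodic metric of the kind used in this
# line of work, on a 1-D domain [0, 1]. The trajectory, target density,
# and number of modes are toy assumptions.
import numpy as np

L = 1.0                  # domain length
K = 10                   # number of basis modes
T = 200                  # trajectory samples

def basis(k, x):
    """Cosine basis on [0, L] (unnormalized, for simplicity)."""
    return np.cos(k * np.pi * x / L)

# Example trajectory: a robot sweeping back and forth across the domain.
t = np.linspace(0.0, 1.0, T)
traj = 0.5 + 0.5 * np.sin(6 * np.pi * t)

# Target: spend more time near x = 0.25 (a Gaussian-shaped bump).
xs = np.linspace(0.0, L, 500)
dx = xs[1] - xs[0]
phi = np.exp(-((xs - 0.25) ** 2) / 0.01)
phi /= phi.sum() * dx                        # normalize to a density

distance = 0.0
for k in range(K + 1):
    c_k = basis(k, traj).mean()              # time average along trajectory
    phi_k = (phi * basis(k, xs)).sum() * dx  # spatial average under target
    weight = (1.0 + k ** 2) ** -1.0          # Sobolev-type weight on mode k
    distance += weight * (c_k - phi_k) ** 2

print(f"distance from ergodicity: {distance:.4f}")
```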

Lauren Miller is a postdoctoral researcher at UC Berkeley working in the Automation Science Laboratory. She received an AB/BE in mechanical engineering from Dartmouth College in 2009, and MS and PhD degrees from Northwestern University in 2013 and 2015, respectively, as a member of the Neuroscience and Robotics Laboratory. Her research interests include robotics, optimal control, and active sensing. Lauren is also an active member of the Robotics and Automation Society, where she recently served as chair of the student activities committee and as a member of the Administrative Committee.

Faculty Host: David Wettergreen

Social networking sites make it easy for users to connect with, follow, or "like" each other. Such a mechanism promotes positive connections and helps a social networking site to grow without direct belligerent or negative encounters. This type of one-way connection makes no distinction between indifference and dislike; in other words, by default, users have only positive connections with one another. However, it is apparent that as one's network grows, some users might not be benevolent toward each other, and negative links can form even though they are never explicitly stated. In this talk, I assess the need for discovering such hidden negative links, explore ways of finding negative links, and show the significance of negative links in social media applications.

Jiliang Tang is a research scientist at Yahoo Labs. He received his Ph.D. in Computer Science from Arizona State University in 2015, and his B.S. and M.S. from Beijing Institute of Technology in 2008 and 2010, respectively. His research interests include trust/distrust computing, signed network analysis, social computing, and data mining for social good. He was the runner-up for the 2015 SIGKDD Dissertation Award, and received the Dean's Dissertation Award 2015 and the ASU President's Award for Innovation 2014. He is the poster chair of SIGKDD 2016, serves as a regular reviewer for journals, and sits on numerous conference program committees. He co-presented three tutorials at KDD 2014, WWW 2014, and RecSys 2014, and has published innovative work in highly ranked journals and top conference proceedings that has received extensive coverage in the media.

Faculty Host: Kathleen M. Carley

As software grows in size and complexity, it also becomes more interdependent. Multiple internal components often share state and data. Whether these dependencies are intentional or not, I have found that their mismanagement often poses several challenges to testing. My research seeks to make it easier to create reliable software by making testing more efficient and more effective through explicit knowledge of these hidden dependencies.

The first problem I address, reducing testing time, directly impacts the day-to-day work of every software developer, who is likely running tests on their code daily if not more frequently. Typical techniques for accelerating tests (like running only a subset of them, or running them in parallel) often can't be applied soundly, since there may be hidden dependencies between tests. I have built several systems, VMVM and ElectricTest, that detect different sorts of dependencies between tests and use that information to soundly reduce testing time by several orders of magnitude. To enable broader use of general dependency information for testing and other analyses, I created Phosphor, the first and only portable and performant dynamic taint tracking system for the JVM. Towards making testing more effective, I created Pebbles, which makes it easy for developers to specify data-related test oracles by thinking in terms of high-level objects.
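
To illustrate the kind of hidden dependency these tools target (a toy sketch, not VMVM or ElectricTest themselves), the example below shows two tests coupled through shared module-level state: they pass in the default order but break when reordered, skipped, or parallelized naively.

```python
# Illustrative sketch of a hidden test dependency through shared mutable
# state. TestB passes only if TestA ran first, so naive test selection or
# parallelization silently changes its outcome.
import unittest

CACHE = {}   # module-level state shared by both tests

class TestA(unittest.TestCase):
    def test_populate_cache(self):
        CACHE["config"] = "loaded"
        self.assertEqual(CACHE["config"], "loaded")

class TestB(unittest.TestCase):
    def test_uses_cache(self):
        # Hidden dependency: silently relies on TestA's side effect.
        self.assertEqual(CACHE.get("config"), "loaded")

if __name__ == "__main__":
    unittest.main()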

Jonathan Bell is a final-year PhD candidate at Columbia University studying Software Engineering and Systems with Prof. Gail Kaiser. His research interests mostly fall under the umbrella of Software Testing and Program Analysis. Jon's recent research in accelerating software testing has been recognized with an ACM SIGSOFT Distinguished Paper Award (ICSE '14), and has been the basis for an industrial collaboration with the Bay Area software build acceleration company Electric Cloud. Jon actively participates in the artifact evaluation program committees of ISSTA and OOPSLA, and has served several years as the Student Volunteer chair for OOPSLA.

Faculty Host: Jonathan Aldrich

Anonymity plays an important role in today's connected world. Anonymous communication enables users to perform commercial transactions without disclosing their identities, to participate in political and human rights advocacy, and to engage in whistleblowing. However, achieving anonymity is hard for various reasons: for example, anonymity technology is misunderstood by users, abused by miscreants, and blocked by Internet services.

I use machine learning, data analysis and measurement to understand the challenges of anonymous communication and to design systems that improve anonymity. In this talk, I will discuss two issues: 1) new avenues for anonymity and 2) discrimination against existing anonymous users. Current anonymity systems focus strongly on location-based privacy but do not address many other avenues of leakage, especially identification through the content of data. I will talk about a deanonymization attack based on writing style and a new tool, Anonymouth, for improving content-based anonymity. On the other hand, many services on the Internet do not support existing anonymity systems and provide degraded service to anonymous users, effectively relegating them to the role of second-class citizens on the Internet.
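
As a rough illustration of style-based attribution in general (not the specific attack or the Anonymouth tool from the talk; the texts, labels, and model choices below are toy assumptions), character n-gram features plus a simple classifier are often enough to link an "anonymous" text to a candidate author.

```python
# Illustrative stylometry sketch: character n-gram features and a linear
# classifier over candidate authors. Toy data and parameters only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I reckon the results speak for themselves, honestly.",
    "Honestly, I reckon we ought to rerun the whole experiment.",
    "The experimental results are summarized in Table 2 below.",
    "Below we summarize the results of the second experiment.",
]
train_authors = ["author_1", "author_1", "author_2", "author_2"]

attributor = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # style features
    LogisticRegression(max_iter=1000),
)
attributor.fit(train_texts, train_authors)

anonymous_post = "I reckon the second experiment ought to be rerun."
print(attributor.predict([anonymous_post]))   # likely "author_1"
```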

Sadia Afroz, PhD, is a research scientist at the International Computer Science Institute (ICSI). Before joining ICSI, she was a postdoc at UC Berkeley and a PhD student at Drexel University. Her work focuses on anti-censorship, anonymity, and adversarial learning. Her work on adversarial authorship attribution received the 2013 Privacy Enhancing Technologies (PET) Award, the best student paper award at the 2012 Privacy Enhancing Technologies Symposium (PETS), and the 2014 ACM SIGSAC Dissertation Award (runner-up).

Faculty Host: Lujo Bauer

In this talk, I will argue that understanding incentives of both attackers and targets has become critical to strengthening online security. I will advocate the need for an interdisciplinary research agenda, ranging from network measurements and large-scale data analysis to human factor modeling. Using case studies (online sale of unlicensed pharmaceutical drugs, and anonymous marketplaces), I will first describe how longitudinal, large-scale measurements and data analysis reveal important economic and structural properties of a priori complex criminal ecosystems. I will then discuss how these structural properties can be used to design successful interventions, both from a policy and from a technical angle.

On the policy side, I will show that our criminal ecosystem analysis evidences "concentration points," whose disruption could effectively hamper illicit operations. On the technical side, I will demonstrate how we can use adversaries' incentives to design and build systems that can proactively identify future attack targets. I will conclude by outlining a roadmap for security research combining measurements, mathematical modeling and behavioral aspects.

Nicolas Christin is an Assistant Research Professor in Electrical and Computer Engineering at Carnegie Mellon University, where he also has affiliations with CyLab, the university-wide information security institute, the Information Networking Institute, the Department of Engineering and Public Policy, and the Societal Computing doctoral program. He holds a Diplôme d'Ingénieur from École Centrale Lille, and M.S. and Ph.D. degrees in Computer Science from the University of Virginia.

He was a researcher in the School of Information at the University of California, Berkeley, prior to joining Carnegie Mellon in 2005. His research interests are in computer and information systems security; most of his work is at the boundary of systems and policy research. He has most recently focused on security analytics, online crime modeling, and economics and human aspects of computer security. His group's research won several awards including Honorable Mention at ACM CHI 2011 and 2016, and Best Student Paper Award at USENIX Security 2014.

Faculty Host: Lorrie Cranor

Program comprehension is a fundamental activity during software maintenance and evolution, accounting for almost half of the resources invested in software change. Together with the source code, software documentation is a critical resource when comprehending a software system. Documentation, however, is far from ideal: more often than not, it is missing or outdated, difficult to access, and lacking a standard format.

This talk will give an overview of my research on supporting developers during software (re)documentation through automatic summarization of various software artifacts. This work includes: the automatic summarization of classes in object-oriented systems, the automatic generation of release notes, and the mining of method API usage examples from existing software. The talk will primarily focus on the automatic generation of release notes, which are complex software artifacts that summarize the changes that occurred between two versions of a software system. I will discuss the different challenges of creating release notes and how my research addressed them by integrating static code analysis, software summarization, and software repository mining techniques. The empirical validation of the approach will also be presented. Finally, I will present future directions for research on automatic software (re)documentation.

Laura Moreno is a Ph.D. candidate at the University of Texas at Dallas, advised by Dr. Andrian Marcus. Her research interest is in Software Engineering, with particular emphasis on Program Comprehension and Software Maintenance and Evolution. The core of her research is empirical in nature and focuses on the development of tools, methodologies, and practices that help software developers better understand and change large-scale software. Her dissertation work, “Software Documentation through Summarization of Source Code Artifacts”, leverages information contained in various software artifacts and utilizes techniques from diverse fields such as natural language processing, data mining, software analysis, and information retrieval.

Papers resulting from her research have been published in top software engineering venues, including the IEEE/ACM International Conference on Software Engineering (ICSE), the ACM/SIGSOFT Symposium on the Foundations of Software Engineering (FSE), the IEEE/ACM International Conference on Automated Software Engineering (ASE), and the IEEE International Conference on Software Maintenance and Evolution (ICSME). She has served as organizing committee member and program committee member for several conferences in the field.

Faculty Host: Christian Kästner

Humans as well as information are organized in networks. Interacting with these networks is part of our daily lives: we talk to friends in our social network; we find information by navigating the Web; and we form opinions by listening to others and to the media. Thus, understanding, predicting, and enhancing human behavior in networks poses important research problems for computer and data science with practical applications of high impact. In this talk I will present some of my work in this area, focusing on (1) human navigation of information networks and (2) person-to-person opinions in social networks.

Network navigation constitutes a fundamental human behavior: in order to make use of the information and resources around us, we constantly explore, disentangle, and navigate networks such as the Web. Studying navigation patterns lets us understand better how humans reason about complex networks and lets us build more human-friendly information systems. As an example, I will present an algorithm for improving website hyperlink structure by mining raw web server logs. The resulting system is being deployed on Wikipedia's full server logs at terabyte scale, producing links that are clicked 15 times as frequently as the average link added by human Wikipedia editors.
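
As a rough illustration of the general idea (a toy sketch, not the deployed system; the paths, pages, and scoring rule below are assumptions), one can score candidate links by how often readers who visit a page later reach another page that it does not yet link to.

```python
# Illustrative sketch only: mine navigation paths from server logs and
# score candidate links (s, t) that do not yet exist by how often readers
# who visit s later reach t anyway.
from collections import Counter, defaultdict

# Each path is one reader's click trail, reconstructed from the logs.
paths = [
    ["Lisbon", "Portugal", "Fado", "Guitar"],
    ["Lisbon", "Portugal", "Fado"],
    ["Lisbon", "Tagus"],
    ["Porto", "Portugal", "Fado"],
]
existing_links = {("Lisbon", "Portugal"), ("Lisbon", "Tagus"),
                  ("Portugal", "Fado"), ("Porto", "Portugal"),
                  ("Fado", "Guitar")}

visits = Counter()
indirect_reach = defaultdict(Counter)
for path in paths:
    for i, source in enumerate(path):
        visits[source] += 1
        for target in path[i + 1:]:                 # pages reached downstream
            if (source, target) not in existing_links and source != target:
                indirect_reach[source][target] += 1

candidates = [((s, t), n / visits[s])
              for s, reached in indirect_reach.items()
              for t, n in reached.items()]
for (s, t), score in sorted(candidates, key=lambda c: -c[1]):
    print(f"suggest {s} -> {t}: reached indirectly in {score:.0%} of visits to {s}")
```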

Communication and coordination through natural language is another prominent human network behavior. Studying the interplay of social network structure and language has the potential to benefit both sociolinguistics and natural language processing. Intriguing opportunities and challenges have arisen recently with the advent of online social media, which produce large amounts of both network and natural language data. As an example, I will discuss my work on person-to-person sentiment analysis in social networks, which combines the sociological theory of structural balance with techniques from natural language processing, resulting in a machine learning model for sentiment prediction that clearly outperforms both text-only and network-only versions.
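
For readers unfamiliar with structural balance, the tiny sketch below shows the basic constraint it contributes (an illustrative simplification, not the talk's model): a signed triangle is balanced when the product of its edge signs is positive, so two known edge signs suggest the sign of the third.

```python
# Minimal sketch of the structural-balance constraint on signed triads.
def balanced_third_sign(sign_ab: int, sign_bc: int) -> int:
    """Predict sign(A, C) that makes the triangle A-B-C balanced."""
    return sign_ab * sign_bc   # (+,+)->+  (+,-)->-  (-,-)->+

# Suppose text-based sentiment analysis yields these person-to-person signs:
print(balanced_third_sign(+1, +1))   # friend of a friend -> likely +1
print(balanced_third_sign(+1, -1))   # friend of an enemy -> likely -1
print(balanced_third_sign(-1, -1))   # enemy of an enemy  -> likely +1
```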

I will conclude the talk by sketching interesting future directions for computational approaches to studying and enhancing human behavior in networks.

Robert West is a sixth-year Ph.D. candidate in Computer Science in the InfoLab at Stanford University, advised by Jure Leskovec. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. Previously, he obtained a Master's degree from McGill University in 2010 and a Diplom degree from Technische Universität München in 2007.

Faculty Host: Zico Kolter

Robotics is currently moving from repeating a few deterministic tasks a million times to learning millions of tasks, repeated just a few times in typically stochastic environments. This paradigm shift requires rethinking robot learning algorithms as all decision making processes are thus based on uncertain, incomplete observations obtained from high-dimensional sensory input.

Thus, data-driven action generation can no longer rely on simply reproducing good trajectories, but rather has to take the uncertainty of demonstrated and experienced movements into account. Using these insights, I will present probabilistic approaches to the representation, execution and learning of movement policies. Central to these approaches is a new skill representation called probabilistic movement primitives (ProMPs), which capture the variability and inherent correlations essential for better generalization of a task from few examples. With such ProMPs, difficult robot learning problems can be treated in a principled manner. For example, coupling of movements to selected perceptual input and prioritized concurrent execution of movements can be achieved using classical operators from probability theory.
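
To give a flavor of this kind of representation (a minimal numpy sketch in the spirit of the ProMP literature; the basis functions, weight distribution, and via-point below are illustrative assumptions, not the talk's implementation): a trajectory is modeled as a basis-function expansion with a Gaussian distribution over weights, and coupling the movement to an observation reduces to Gaussian conditioning.

```python
# Minimal sketch of a probabilistic movement primitive: y_t = Phi_t^T w
# with Gaussian weights w ~ N(mu_w, Sigma_w); coupling to an observed
# via-point is Gaussian conditioning. Toy numbers throughout.
import numpy as np

rng = np.random.default_rng(0)

n_basis, T = 8, 50
centers = np.linspace(0, 1, n_basis)
times = np.linspace(0, 1, T)
Phi = np.exp(-((times[:, None] - centers[None, :]) ** 2) / 0.02)  # RBF features
Phi /= Phi.sum(axis=1, keepdims=True)

# Distribution over weights, e.g. fitted from a handful of demonstrations.
mu_w = rng.normal(size=n_basis)
Sigma_w = 0.3 * np.eye(n_basis)

# Condition on passing (approximately) through y* = 1.0 at time index 25.
t_idx, y_star, sigma_star = 25, 1.0, 1e-4
phi_t = Phi[t_idx]
k = Sigma_w @ phi_t / (sigma_star + phi_t @ Sigma_w @ phi_t)
mu_w_new = mu_w + k * (y_star - phi_t @ mu_w)
Sigma_w_new = Sigma_w - np.outer(k, phi_t) @ Sigma_w

mean_traj = Phi @ mu_w_new
print(f"trajectory now passes through {mean_traj[t_idx]:.3f} at t={t_idx}")
```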

While the resulting probabilistic policies naturally enable learning from demonstrations, they cannot automatically address the exploration-exploitation dilemma. I will show that a new class of reinforcement learning algorithms arises from information-theoretic insights by bounding both the loss of information and the entropy during the policy updates in reward-related self-improvement. The resulting methods have been used to improve single-stroke movements and to learn complex non-parametric policies in hierarchical reinforcement learning problems.
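
The sketch below illustrates the general flavor of such an information-theoretically constrained update: sampled policy parameters are exponentially reweighted by reward, with the temperature chosen so that the change from the old policy stays within a KL bound. It is a generic toy example with assumed rewards and numbers, not necessarily the exact algorithm presented in the talk.

```python
# Generic sketch of a KL-bounded, reward-weighted policy update.
import numpy as np

rng = np.random.default_rng(1)

mu, sigma = 0.0, 1.0              # current Gaussian policy over one parameter
epsilon = 0.5                     # bound on the allowed policy change (KL)

for iteration in range(20):
    theta = rng.normal(mu, sigma, size=200)      # sample parameters
    reward = -(theta - 2.0) ** 2                 # toy reward, peak at theta = 2

    # Pick the temperature eta so the implied KL to the old policy stays
    # near epsilon (crude search standing in for the proper dual problem).
    for eta in np.logspace(2, -2, 200):
        w = np.exp((reward - reward.max()) / eta)
        w /= w.sum()
        kl = np.sum(w * np.log(w * len(w) + 1e-12))  # KL(weighted || uniform)
        if kl >= epsilon:
            break

    mu = np.sum(w * theta)                       # weighted maximum likelihood
    sigma = max(np.sqrt(np.sum(w * (theta - mu) ** 2)), 1e-3)

print(f"policy mean after updates: {mu:.2f} (optimum is 2.0)")
```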

To link these policies to high-dimensional partial observations obtained in the form of tactile feedback or visual point clouds, we need implicit feature representations. I will show how such representations can be used both in the robot learning architecture above and for model learning, filtering, smoothing and prediction. Results on both real and simulated robot systems underline the success of the presented approaches.

Gerhard Neumann has been an assistant professor at TU Darmstadt since September 2014, where he heads the Computational Learning for Autonomous Systems (CLAS) group. His research concentrates on policy search methods and movement representations for robotics, hierarchical reinforcement learning, multi-agent reinforcement learning, and planning and decision making under uncertainty. Before becoming an assistant professor, he joined the IAS group in Darmstadt in November 2011 and became Group Leader for Machine Learning for Control in October 2013. Gerhard completed his Ph.D. in Graz under the supervision of Wolfgang Maass in April 2012. He is principal investigator of the EU H2020 project "Romans" and project leader of the DFG project "LearnRobots" for the SPP "Autonomous Learning". His current group consists of 2 postdocs and 3 PhD students.

Faculty Host: Artur Dubrawski

