Assistant Professor · Carnegie Mellon University · Institute for Software Research

I am an assistant professor in the Institute of Software Research at the Carnegie Mellon University interested in controlling the complexity caused by variability in software systems. I develop mechanisms, languages, and tools to implement variability in a disciplined way, to detect errors, and to improve program comprehension in systems with a high amount of variability. Currently, I investigate approaches to parse and type check all compile-time configurations of the Linux kernel in the TypeChef project.
My research in keywords: variability and reuse; software product lines; multidimensional separation of concerns; modularity, cohesion, information hiding and module systems; program comprehension; virtual separation of concerns; feature-oriented software development; parsers, type systems, and lightweight program analyses; refactoring; aspect-orientation; program synthesis; feature interactions; feature location; Linux; and empirical methods.
Profiles: Curriculum vitae, Google Scholar, Microsoft Academic, ACM, dblp.

For a complete list of publications, see the publication page.
While standardization has empowered the software industry to substantially scale software development and to provide affordable software to a broad market, it often does not address smaller market segments, nor the needs and wishes of individual customers. Software product lines reconcile mass production and standardization with mass customization in software engineering. Ideally, based on a set of reusable parts, a software manufacturer can generate a software product based on the requirements of its customer. The concept of features is central to achieving this level of automation, because features bridge the gap between the requirements the customer has and the functionality a product provides. Thus features are a central concept in all phases of product-line development. The authors take a developer’s viewpoint, focus on the development, maintenance, and implementation of product-line variability, and especially concentrate on automated product derivation based on a user’s feature selection. The book consists of three parts. Part I provides a general introduction to feature-oriented software product lines, describing the product-line approach and introducing the product-line development process with its two elements of domain and application engineering. The pivotal Part II covers a wide variety of implementation techniques including design patterns, frameworks, components, feature-oriented programming, and aspect-oriented programming, as well as tool-based approaches including preprocessors, build systems, version-control systems, and virtual separation of concerns. Finally, Part III is devoted to advanced topics related to feature-oriented product lines like refactoring, feature interaction, and analysis tools specific to product lines. In addition, an Appendix lists various helpful tools for software product-line development, along with a description of how they relate to the topics covered in this book. To tie the book together, the authors use two running examples that are well documented in the product-line literature: data management for embedded systems, and variations of graph data structures. They start every chapter by explicitly stating the respective learning goals and finish it with a set of exercises; additional teaching material is also available online. All these features make the book ideally suited for teaching – both for academic classes and for professionals interested in self-study.
Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.
Software-product-line engineering aims at the development of variable and reusable software systems. In practice, software product lines are often implemented with preprocessors. Preprocessor directives are easy to use, and many mature tools are available for practitioners. However, preprocessor directives have been heavily criticized in academia and even referred to as “#ifdef hell”, because they introduce threats to program comprehension and correctness. There are many voices that suggest to use other implementation techniques instead, but these voices ignore the fact that a transition from preprocessors to other languages and tools is tedious, erroneous, and expensive in practice. Instead, we and others propose to increase the readability of preprocessor directives by using background colors to highlight source code annotated with ifdef directives. In three controlled experiments with over 70 subjects in total, we evaluate whether and how background colors improve program comprehension in preprocessor-based implementations. Our results demonstrate that background colors have the potential to improve program comprehension, independently of size and programming language of the underlying product. Additionally, we found that subjects generally favor background colors. We integrate these and other findings in a tool called FeatureCommander, which facilitates program comprehension in practice and which can serve as a basis for further research.
Customizable programs and program families provide user-selectable features to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is infeasible. Our work aims at predicting program performance based on selected features. The challenge is predicting performance accurately when features interact. An interaction occurs when a feature combination has an unexpected influence on performance. We present a method that automatically detects performance feature interactions to improve prediction accuracy. To this end, we propose three heuristics to reduce the number of measurements required to detect interactions. Our evaluation consists of six real-world case studies from varying domains (e.g. databases, compression libraries, and web server) using different configuration techniques (e.g., configuration files and preprocessor flags). Results show, on average, a prediction accuracy of 95 %.
Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general-purpose concept, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages. Furthermore, we offer a supporting framework and tool chain, called FeatureHouse. We use attribute grammars to automate the integration of additional languages. In particular, we have integrated Java, C#, C, Haskell, Alloy, and JavaCC. A substantial number of case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties that a language must have in order to be ready for superimposition. We discuss perspectives of our approach and demonstrate how we extended FeatureHouse with support for XML languages (in particular, XHTML, XMI/UML, and Ant) and alternative composition approaches (in particular, aspect weaving). Rounding off our previous work, we provide here a holistic view of the FeatureHouse approach based on rich experience with numerous languages and case studies and reflections on several years of research.
Modularity of feature representations has been a long standing goal of feature-oriented software development. While some researchers regard feature modules and corresponding composition mechanisms as a modular solution, other researchers have challenged the notion of feature modularity and pointed out that most feature-oriented implementation mechanisms lack proper interfaces and support neither modular type checking nor separate compilation. We step back and reflect on the feature-modularity discussion. We distinguish two notions of modularity, cohesion without interfaces and information hiding with interfaces, and point out the different expectations that, we believe, are the root of many heated discussions. We discuss whether feature interfaces should be desired and weigh their potential benefits and costs, specifically regarding crosscutting, granularity, feature interactions, and the distinction between closed-world and open-world reasoning. Because existing evidence for and against feature modularity and feature interfaces is shaky and inconclusive, more research is needed, for which we outline possible directions.
In many projects, lexical preprocessors are used to manage different variants of the project (using conditional compilation) and to define compile-time code transformations (using macros). Unfortunately, while being a simply way to implement variability, conditional compilation and lexical macros hinder automatic analysis, even though such analysis would be urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use heuristics, support only a subset of the language, or suffer from exponential explosion. As part of the TypeChef project, we contribute a novel variability-aware parser that can parse unpreprocessed code without heuristics in practicable time. Beyond the obvious task of detecting syntax errors, our parser paves the road for further analysis, such as variability-aware type checking. We implement variabilityaware parsers for Java and GNU C and demonstrate practicability by parsing the product line MobileMedia and the entire X86 architecture of the Linux kernel with 6065 variable features.
What is modularity? Which kind of modularity should developers strive for? Despite decades of research on modularity, these basic questions have no definite answer. We submit that the common understanding of modularity, and in particular its notion of information hiding, is deeply rooted in classical logic. We analyze how classical modularity, based on classical logic, fails to address the needs of developers of large software systems, and encourage researchers to explore alternative visions of modularity, based on nonclassical logics, and henceforth called nonclassical modularity.
Software-product-line engineering is an efficient means to generate a family of program variants for a domain from a single code base. However, because of the potentially high number of possible program variants, it is difficult to test them all and ensure properties like type safety for the entire product line. We present a product-line–aware type system that can type check an entire software product line without generating each variant in isolation. Specifically, we extend the Featherweight Java calculus with feature annotations for product-line development and prove formally that all program variants generated from a well-typed product line are well-typed. Furthermore, we present a solution to the problem of typing mutually exclusive features. We discuss how results from our formalization helped implementing our own product-line tool CIDE for full Java and report of experience with detecting type errors in four existing software-product-line implementations.
The C preprocessor cpp is a widely used tool for implementing variable software. It enables programmers to express variable code of features that may crosscut the entire implementation with conditional compilation. The C preprocessor relies on simple text processing and is independent of the host language (C, C++, Java, and so on). Language independent text processing is powerful and expressive|programmers can make all kinds of annotations in the form of #ifdefs but can render unpreprocessed code difficult to process automatically by tools, such as code aspect refactoring, concern management, and also static analysis and variability-aware type checking. We distinguish between disciplined annotations, which align with the underlying source-code structure, and undisciplined annotations, which do not align with the structure and hence complicate tool development. This distinction raises the question of how frequently programmers use undisciplined annotations and whether it is feasible to change them to disciplined annotations to simplify tool development and to enable programmers to use a wide variety of tools in the first place. By means of an analysis of 40 mediumsized to large-sized C programs, we show empirically that programmers use cpp mostly in a disciplined way: about 85 % of all annotations respect the underlying source-code structure. Furthermore, we analyze the remaining undisciplined annotations, identify patterns, and discuss how to transform them into a disciplined form.
Conditional compilation with preprocessors such as cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these annotated fragments can be created, which can be used (among others) to implement software product lines. Although, such annotation-based approaches are frequently used in practice, researchers often criticize them for their negative effect on code quality and maintainability. In contrast to modularized implementations such as components or aspects, annotation-based implementations typically neglect separation of concerns, can entirely obfuscate the source code, and are prone to introduce subtle errors. Our goal is to rehabilitate annotation-based approaches by showing how tool support can address these problems. With views, we emulate modularity; with a visual representation of annotations, we reduce source code obfuscation and increase program comprehension; and with disciplined annotations and a product-line–aware type system, we prevent or detect syntax and type errors in the entire software product line. At the same time we emphasize unique benefits of annotations, including simplicity, expressiveness, and being language independent. All in all, we provide tool-based separation of concerns without necessarily dividing source code into physically separated modules; we name this approach virtual separation of concerns. We argue that with these improvements over contemporary preprocessors, virtual separation of concerns can compete with modularized implementation mechanisms. Despite our focus on annotation-based approaches, we do intend not give a definite answer on how to implement software product lines. Modular implementations and annotation-based implementations both have their advantages; we even present an integration and migration path between them. Our goal is to rehabilitate preprocessors and show that they are not a lost cause as many researchers think. On the contrary, we argue that – with the presented improvements – annotation-based approaches are a serious alternative for product-line implementation.
Feature-Oriented Software Development (FOSD) provides a multitude of formalisms, methods, languages, and tools for building variable, customizable, and extensible software. Along different lines of research, different notions of a feature have been developed. Although these notions have similar goals, no common basis for evaluation, comparison, and integration exists. We present a feature algebra that captures the key ideas of feature orientation and provides a common ground for current and future research in this field, in which also alternative options can be explored. Furthermore, our algebraic framework is meant to serve as a basis for the upcoming development paradigms automatic feature-based program synthesis and architectural metaprogramming.
A feature-oriented product line is a family of programs that share a common set of features. A feature implements a stakeholder's requirement and represents a design decision or configuration option. When added to a program, a feature involves the introduction of new structures, such as classes and methods, and the refinement of existing ones, such as extending methods. A feature-oriented decomposition enables a generator to create an executable program by composing feature code solely on the basis of the feature selection of a user – no other information needed. A key challenge of product line engineering is to guarantee that only well-typed programs are generated. As the number of valid feature combinations grows combinatorially with the number of features, it is not feasible to type check all programs individually. The only feasible approach is to have a type system check the entire code base of the feature-oriented product line. We have developed such a type system on the basis of a formal model of a feature-oriented Java-like language. The type system guaranties type safety for feature-oriented product lines. That is, it ensures that every valid program of a well-typed product line is well-typed. Our formal model including type system is sound and complete.
Over 30 years ago, the preprocessor cpp was developed to extend the programming language C by lightweight metaprogramming capabilities. Despite its error-proneness and low abstraction level, the cpp is still widely being used in presentday software projects to implement variable software. However, not much is known about how the cpp is employed to implement variability. To address this issue, we have analyzed forty open-source software projects written in C. Specifically, we answer the following questions: How does program size influence variability? How complex are extensions made via cpp's variability mechanisms? At which level of granularity are extensions applied? What is the general type of extensions? These questions revive earlier discussions on understanding and refactoring of the preprocessor. To answer them, we introduce several metrics measuring the variability, complexity, granularity, and type of extensions. Based on the data obtained, we suggest alternative implementation techniques. The data we have collected can influence other research areas, such as language design and tool support.
Conditional compilation with preprocessors like cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these fragments can be created, which can be used (among others) to implement software product lines. Although, preprocessors are frequently used in practice, they are often criticized for their negative effect on code quality and maintainability. In contrast to modularized implementations, for example using components or aspects, preprocessors neglect separation of concerns, are prone to introduce subtle errors, can entirely obfuscate the source code, and limit reuse. Our aim is to rehabilitate the preprocessor by showing how simple tool support can address these problems and emulate some benefits of modularized implementations. At the same time we emphasize unique benefits of preprocessors, like simplicity and language independence. Although we do not have a definitive answer on how to implement variability, we want highlight opportunities to improve preprocessors and encourage research toward novel preprocessor-based approaches.
Physical separation with class refinements and method refinements à la AHEAD and virtual separation using annotations à la #ifdef or CIDE are two competing groups of implementation approaches for software product lines with complementary advantages. Although both groups have been mainly discussed in isolation, we strive for an integration to leverage the respective advantages. In this paper, we provide the basis for such an integration by providing a model that supports both, physical and virtual separation, and by describing refactorings in both directions. We prove the refactorings complete, such that every virtually separated product line can be automatically transformed into a physically separated one (replacing annotations by refinements) and vice versa. To demonstrate the feasibility of our approach, we have implemented the refactorings in our tool CIDE and conducted four case studies.
Feature-oriented software development (FOSD) is a paradigm for the construction, customization, and synthesis of large-scale software systems. In this survey, we give an overview and a personal perspective on the roots of FOSD, connections to other software development paradigms, and recent developments in this field. Our aim is to point to connections between different lines of research and to identify open issues.
A software product-line is a family of related programs that are distinguished in terms of features. A feature implements a stakeholders' requirement. Different program variants specified by distinct feature selections are produced from a common code base. The optional feature problem describes a common mismatch between variability intended in the domain and dependencies in the implementation. When this occurs, some variants that are valid in the domain cannot be produced due to implementation issues. There are many different solutions to the optional feature problem, but they all suffer from drawbacks such as reduced variability, increased development effort, reduced efficiency, or reduced source code quality. In this paper, we examine the impact of the optional feature problem in two case studies in the domain of embedded database systems, and we survey different state-of-the-art solutions and their trade-offs. Our intension is to raise awareness of the problem, to guide developers in selecting an appropriate solution for their product-line project, and to identify opportunities for future research.
A software product line (SPL) is a family of related program variants in a well-defined domain, generated from a set of features. A fundamental difference from classical application development is that engineers develop not a single program but a whole family with hundreds to millions of variants. This makes it infeasible to separately check every distinct variant for errors. Still engineers want guarantees on the entire SPL. A further challenge is that an SPL may contain artifacts in different languages (code, documentation, models, etc.) that should be checked. In this paper, we present CIDE, an SPL development tool that guarantees syntactic correctness for all variants of an SPL. We show how CIDE's underlying mechanism abstracts from textual representation and we generalize it to arbitrary languages. Furthermore, we automate the generation of safe plug-ins for additional languages from annotated grammars. To demonstrate the language-independent capabilities, we applied CIDE to a series of case studies with artifacts written in Java, C++, C, Haskell, ANTLR, HTML, and XML.
Features express the variabilities and commonalities among programs in a software product line (SPL). A feature model defines the valid combinations of features, where each combination corresponds to a program in an SPL. SPLs and their feature models evolve over time. We classify the evolution of a feature model via modifications as refactorings, specializations, generalizations, or arbitrary edits. We present an algorithm to reason about feature model edits to help designers determine how the program membership of an SPL has changed. Our algorithm takes two feature models as input (before and after edit versions), where the set of features in both models are not necessarily the same, and it automatically computes the change classification. Our algorithm is able to give examples of added or deleted products and efficiently classifies edits to even large models that have thousands of features.
Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general-purpose concept, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages, Furthermore, we offer a supporting framework and tool chain, called FEATUREHOUSE. We use attribute grammars to automate the integration of additional languages, in particular, we have integrated Java, C#, C, Haskell, JavaCC, and XML. Several case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties a language must have in order to be ready for superimposition.
A software product line (SPL) is an efficient means to generate a family of program variants for a domain from a single code base. However, because of the potentially high number of possible program variants, it is difficult to test all variants and ensure properties like type-safety for the entire SPL. While first steps to type-check an entire SPL have been taken, they are informal and incomplete. In this paper, we extend the Featherweight Java (FJ) calculus with feature annotations to be used for SPLs. By extending FJ's type system, we guarantee that – given a well-typed SPL – all possible program variants are welltyped as well. We show how results from this formalization reflect and help implementing our own language-independent SPL tool CIDE.
Building software product lines (SPLs) with features is a challenging task. Many SPL implementations support features with coarse granularity - e.g., the ability to add and wrap entire methods. However, fine-grained extensions, like adding a statement in the middle of a method, either require intricate workarounds or obfuscate the base code with annotations. Though many SPLs can and have been implemented with the coarse granularity of existing approaches, fine-grained extensions are essential when extracting features from legacy applications. Furthermore, also some existing SPLs could benefit from fine-grained extensions to reduce code replication or improve readability. In this paper, we analyze the effects of feature granularity in SPLs and present a tool, called Colored IDE (CIDE), that allows features to implement coarse-grained and fine-grained extensions in a concise way. In two case studies, we show how CIDE simplifies SPL development compared to traditional approaches.
Software product lines aim to create highly configurable programs from a set of features. Common belief and recent studies suggest that aspects are well-suited for implementing features. We evaluate the suitability of AspectJ with respect to this task by a case study that refactors the embedded database system Berkeley DB into 38 features. Contrary to our initial expectations, the results were not encouraging. As the number of aspects in a feature grows, there is a noticeable decrease in code readability and maintainability. Most of the unique and powerful features of AspectJ were not needed. We document where AspectJ is unsuitable for implementing features of refactored legacy applications and explain why.
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The cool wall was created and evolved during the yearly FOSD student meetings (see fosd.net). With it, we would like to encourage researchers to look for better tool names. Up to 2012, the listing was completely subjective, feel free to complain. Starting 2013, we started voting and giving out a Coolest Tool Name award.
