Christian Kästner

Associate Professor · Carnegie Mellon University · Institute for Software Research

Publications

Christian Kästner. Machine Learning in Production: From Models to Products. Cambridge, MA: The MIT Press, April 2025. [ open access, publisher, bib ]

A practical and innovative textbook detailing how to build real-world software products with machine learning components, not just models. Traditional machine learning texts focus on how to train and evaluate the machine learning model, while MLOps books focus on how to streamline model development and deployment. But neither focuses on how to build actual products that deliver value to users. This practical textbook, by contrast, details how to responsibly build products with machine learning components, covering the entire development lifecycle from requirements and design to quality assurance and operations. Machine Learning in Production brings an engineering mindset to the challenge of building systems that are usable, reliable, scalable, and safe within the context of real-world conditions of uncertainty, incomplete information, and resource constraints. Based on the author's popular class at Carnegie Mellon, this pioneering book integrates foundational knowledge in software engineering and machine learning to provide the holistic view needed to create not only prototype models but production-ready systems.
• Integrates coverage of cutting-edge research, existing tools, and real-world applications
• Provides students and professionals with an engineering view for production-ready machine learning systems
• Proven in the classroom
• Offers supplemental resources including slides, videos, exams, and further readings

Shyam Agarwal, Courtney Miller, Christian Kästner, and Bogdan Vasilescu. 3100 Opinions on Code Review in an AI World: Building Causal Theory from Practitioner Discourse. Technical Report 2607.07980, arXiv, July 2026. [ arXiv, bib ]

Coding agents now author entire pull requests, and practitioners sharply disagree about what this does to code review: whether it becomes the bottleneck, whether human review is still necessary, and whether it quietly erodes the understanding that it once built. Repository-mining studies measure surface trends but seldom explain the mechanisms beneath them, and the trends themselves prove unstable. A motivating observational analysis of public GitHub activity finds that agent-authored pull requests are reviewed less often, merged several times faster, and discussed less than human-authored ones, yet the direction of these trends flips under different but equally defensible analysis choices, so the traces establish what is changing without explaining why. To recover the mechanisms, we synthesize practitioner discourse at scale into an explanatory theory: we collect 38,709 grey-literature documents (engineering blogs and Reddit threads), filter to those substantively about code review, and code a stratified random sample of 3,100 with an LLM-assisted pipeline, from which we build a causal model of 26 constructs and 67 relationships (64 directed, 3 contested). Its organizing claim is that review is the control point through which a coding agent's effect on software is decided, and that AI does not fix the sign of that effect: the team sets it, through the expertise its humans bring and how it structures the review process. The theory makes the competing positions explicit and turns "AI is changing code review" into falsifiable propositions with named constructs and moderators. As a secondary contribution, we offer the underlying LLM-assisted, grey-literature theory-building method as a scalable template for software-engineering research, with a public implementation.

Chenyang Yang, Xinran Zhao, Tongshuang Wu, and Christian Kästner. Better Harnesses, Smaller Models: Building 90% Cheaper Agents via Automated Harness Adaptation. Technical Report 2607.08938, arXiv, July 2026. [ arXiv, bib ]

Frontier LLM agents are automating many business tasks, but their high inference cost makes large-scale deployment unsustainable. Small language models (SLMs) offer a cheaper alternative, yet they typically fall short when swapped into a harness designed for a frontier LLM. We show that for many routine business tasks, SLM agents can match LLM performance at 90% lower cost, when paired with an adapted harness that can be automatically discovered by a meta agent. The key insight is that much of the task difficulty is shared across instances and can be lifted from the model into the harness via tailored instructions, tools, and orchestration loops. To study this systematically, we create a framework that maps agent failure modes to harness adaptation strategies, and build a harness optimizer that automatically discovers effective adaptations from failure trajectories. Across seven business-oriented agentic tasks and three SLM families, we found optimized harnesses significantly improve performance on 16 of 21 task-SLM pairs, with seven pairs closing the SLM-LLM performance gap and the best SLM agent recovering 89.7% of LLM performance at 4% of the cost. Our analysis further shows that adaptation works best for tasks with more repetitive workflows and for SLMs with sufficient base capabilities. Together, these results suggest that harness adaptation can expand the practical deployment range of SLM agents in routine business tasks.

Yining Hong, Yining She, Eunsuk Kang, Christopher Timperley, and Christian Kästner. Don't Make Models Guess Security and Safety: Symbolic Guardrails for Domain-Specific AI Agents. Technical Report 2604.15579, arXiv, April 2026. [ arXiv, bib ]

There is increasing interest in integrating AI agents that invoke tools into domain-specific commercial software, where unintended tool calls can cause serious security and safety incidents. This has drawn growing research attention, and many agent security and safety benchmarks have emerged. They implicitly shape how the community approaches security and safety. Yet existing work exhibits a blind spot: it emphasizes training-based methods and neural guardrails, which reduce the likelihood of insecure or unsafe actions but cannot guarantee their prevention. It generally overlooks opportunities for deductive, symbolic guardrails grounded in standard software engineering practices, which can provide guarantees for some security and safety requirements. Our study has three parts: (1) a systematic review of 80 agent security and safety benchmarks finding that 85% of benchmarks do not state verifiable requirements (61% provide none, and 24% give only high-level goals); (2) an applicability analysis of which security and safety requirements symbolic guardrails can and cannot enforce on τ2-Bench, CAR-bench, and MedAgentBench, finding that 74% of requirements are symbolically enforceable and 95% of these need only simple, low-cost checks; and (3) an empirical evaluation of symbolic guardrails on the same three benchmarks, finding that symbolic guardrails improve security and safety without sacrificing utility, and often improve it. Our work draws attention to the potential for symbolic guardrails for AI agents, suggesting them as an overlooked but practical path toward deploying domain-specific AI agents in risk-averse commercial software.

Chenyang Yang, Yike Shi, Qianou Ma, Michael Xieyang Liu, Christian Kästner, and Tongshuang Wu. What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics -- Findings (ACL), pages 9072--9101, July 2026. [ .pdf, arXiv, http, bib ]

Prompt underspecification is a common challenge when interacting with LLMs. In this paper, we present an in-depth analysis of this problem, showing that while LLMs can often infer unspecified requirements by default (41.1%), such behavior is fragile: Under-specified prompts are 2x as likely to regress across model or prompt changes, sometimes with accuracy drops exceeding 20%. This instability makes it difficult to reliably build LLM applications. Moreover, simply specifying all requirements does not consistently help, as models have limited instruction-following ability and requirements can conflict. Standard prompt optimizers likewise provide little benefit. To address these issues, we propose requirements-aware prompt optimization mechanisms that improve performance by 4.8% on average over baselines. We further advocate for a systematic process of proactive requirements discovery, evaluation, and monitoring to better manage prompt underspecification in practice.

Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. In Proceedings of the 24th International Conference on Mining Software Repositories (MSR), New York, NY: ACM Press, April 2026. [ .pdf, arXiv, doi, bib ]

Large language models (LLMs) have demonstrated the promise to revolutionize the field of software engineering. Among other things, LLM agents are rapidly gaining momentum in software development, with practitioners reporting a multifold increase in productivity after adoption. Yet, empirical evidence is lacking around these claims. In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design comparing Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor. We find that the adoption of Cursor leads to a statistically significant, large, but transient increase in project-level development velocity, along with a substantial and persistent increase in static analysis warnings and code complexity. Further panel generalized-method-of-moments estimation reveals that increases in static analysis warnings and code complexity are major factors driving long-term velocity slowdown. Our study identifies quality assurance as a major bottleneck for early Cursor adopters and calls for it to be a first-class citizen in the design of agentic AI coding tools and AI-driven workflows.

Nadia Nahar, Chenyang Yang, Yanxin Chen, Wesley Hanwen Deng, Ken Holstein, Motahhare Eslami, and Christian Kästner. "I Don't Think RAI Applies to My Model" -- Engaging Non-champions with Sticky Stories for Responsible AI Work. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI), Article No. 1373, pages 1--23, New York, NY: ACM Press, April 2026. Best Paper Award. [ acm, arXiv, doi, bib ]

Responsible AI (RAI) tools—checklists, templates, and governance processes—often engage RAI champions, individuals intrinsically motivated to advocate ethical practices, but fail to reach non-champions, who frequently dismiss them as bureaucratic tasks. To explore this gap, we shadowed meetings and interviewed data scientists at an organization, finding that practitioners perceived RAI as irrelevant to their work. Building on these insights and theoretical foundations, we derived design principles for engaging non-champions, and introduced sticky stories—narratives of unexpected ML harms designed to be concrete, severe, surprising, diverse, and relevant, unlike widely circulated media to which practitioners are desensitized. Using a compound AI system, we generated and evaluated sticky stories through human and LLM assessments at scale, confirming they embodied the intended qualities. In a study with 29 practitioners, we found that, compared to regular stories, sticky stories significantly increased time spent on harm identification, broadened the range of harms recognized, and fostered deeper reflection.

Aarya Doshi, Yining Hong, Congying Xu, Eunsuk Kang, Alexandros Kapravelos, and Christian Kästner. Towards Verifiably Safe Tool Use for LLM Agents. In Proceedings of the International Conference on Software Engineering -- New Ideas Track (ICSE-NIER), pages 201--205, New York, NY: ACM Press, April 2026. [ acm, arXiv, doi, bib ]

Large language model (LLM)-based AI agents extend LLM capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents. While this empowers agents, unintended tool interactions may also introduce risks, such as leaking sensitive data or overwriting critical records, which are unacceptable in enterprise contexts. Current approaches, such as model-based safeguards, enhance reliability but cannot guarantee system safety. Methods like information flow control (IFC) and temporal constraints aim to provide guarantees but often require extensive human annotation. We propose a process that starts with applying System-Theoretic Process Analysis (STPA) to identify hazards in agent workflows, derive safety requirements, and formalize them as enforceable specifications on data flows and tool sequences. To enable this, we introduce a capability-enhanced Model Context Protocol (MCP) framework that requires structured labels on capabilities, confidentiality, and trust level. Together, these contributions aim to shift safety from ad hoc reliability fixes to proactive guardrails with guarantees, while reducing dependence on user confirmation and making autonomy a deliberate design choice.

Omid Gheibi, Christian Kästner, and Pooyan Jamshidi. Hardness, Structural Knowledge, and Opportunity: An Analytical Framework for Modular Performance Modeling. Technical Report 2509.11000, arXiv, September 2025. [ arXiv, bib ]

Performance-influence models are beneficial for understanding how configurations affect system performance, but their creation is challenging due to the exponential growth of configuration spaces. While gray-box approaches leverage selective "structural knowledge" (like the module execution graph of the system) to improve modeling, the relationship between this knowledge, a system's characteristics (we call them "structural aspects"), and potential model improvements is not well understood. This paper addresses this gap by formally investigating how variations in structural aspects (e.g., the number of modules and options per module) and the level of structural knowledge impact the creation of "opportunities" for improved "modular performance modeling". We introduce and quantify the concept of modeling "hardness", defined as the inherent difficulty of performance modeling. Through controlled experiments with synthetic system models, we establish an "analytical matrix" to measure these concepts. Our findings show that modeling hardness is primarily driven by the number of modules and configuration options per module. More importantly, we demonstrate that both higher levels of structural knowledge and increased modeling hardness significantly enhance the opportunity for improvement. The impact of these factors varies by performance metric; for ranking accuracy (e.g., in a debugging task), structural knowledge is more dominant, while for prediction accuracy (e.g., in a resource management task), hardness plays a stronger role. These results provide actionable insights for system designers, guiding them to strategically allocate time and select appropriate modeling approaches based on a system's characteristics and a given task's objectives.

Hao He, Haoqin Yang, Philipp Burckhardt, Alexandros Kapravelos, Bogdan Vasilescu, and Christian Kästner. Six Million (Suspected) Fake Stars on GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware. In Proceedings of the 48th International Conference on Software Engineering (ICSE), April 2026. [ arXiv, bib ]

GitHub, the de-facto platform for open-source software development, provides a set of social-media-like features to signal high-quality repositories. Among them, the star count is the most widely used popularity signal, but it is also at risk of being artificially inflated (i.e., faked), decreasing its value as a decision-making signal and posing a security risk to all GitHub users. In this paper, we present a systematic, global, and longitudinal measurement study of fake stars in GitHub. To this end, we build StarScout, a scalable tool able to detect anomalous starring behaviors across the entire GitHub metadata in the last five years. Analyzing the data collected using StarScout, we find that: (1) fake-star-related activities have rapidly surged since 2024; (2) the accounts and repositories in fake star campaigns have highly trivial activity patterns; (3) the majority of fake stars are used to promote short-lived phishing malware repositories; the remaining ones are mostly used to promote AI/LLM, blockchain, tool/application, and tutorial/demo repositories; (4) while repositories may have acquired fake stars for growth hacking, fake stars only have a promotion effect in the short term (i.e., less than two months) and become a liability in the long term. Our study has implications for platform moderators, open-source practitioners, and supply chain security researchers.

Courtney Miller, Hao He, Weigen Chen, Elizabeth Lin, Chenyang Yang, Bogdan Vasilescu, and Christian Kästner. Designing Abandabot: When Does Open Source Dependency Abandonment Matter? In Proceedings of the 48th International Conference on Software Engineering (ICSE), April 2026. [ .pdf, bib ]

Despite the inevitable risk that depending on abandoned open source dependencies poses, many developers feel a lack of resources and guidance on how to deal with this. Automated detection of abandonment is feasible, but not all abandoned dependencies impact a downstream project equally. In this paper, we perform a need-finding interview study with 22 open source maintainers to explore what makes the abandonment of certain dependencies impactful to their project, as well as their information needs and design requirements for such an automated notification tool. We find four main factors: the depth of integration, the availability of alternatives, the importance of the functionality, and external environmental pressures. Using this emerging theory, we then build an LLM-based classifier to predict the impact of a dependency's abandonment in a given context, and evaluate it with an independent user study with 124 open source maintainers. Our results show that the classifier is effective at predicting whether a dependency's abandonment would be impactful to a project, and that theory-based explanations given by the LLM are useful to developers when making judgments about the potential impactfulness of a given dependency's abandonment.

Zahra Abba Omar, Nadia Nahar, Jacob Tjaden, Inès M. Gilles, Fikir Mekonnen, Jane Hsieh, Christian Kästner, and Menon Alka. Policy alone is probably not the solution: A large-scale experiment on how developers struggle to design meaningful end-user explanations. Technical Report 2503.15512, arXiv, January 2025. [ arXiv, bib ]

Developers play a central role in determining how machine learning systems are explained in practice, yet they are rarely trained to design explanations for non-technical audiences. Despite this, transparency and explainability requirements are increasingly codified in regulation and organizational policy. It remains unclear how such policies influence developer behavior or the quality of the explanations they produce. We report results from two controlled experiments with 194 participants, typical developers without specialized training in human-centered explainable AI, who designed explanations for an ML-powered diabetic retinopathy screening tool. In the first experiment, differences in policy purpose and level of detail had little effect: policy guidance was often ignored and explanation quality remained low. In the second experiment, stronger enforcement increased formal compliance, but explanations largely remained poorly suited to medical professionals and patients. We further observed that across both experiments, developers repeatedly produced explanations that were technically flawed or difficult to interpret, framed for developers rather than end users, reliant on medical jargon, or insufficiently grounded in the clinical decision context and workflow, with developer-centric framing being the most prevalent. These findings suggest that policy and policy enforcement alone are insufficient to produce meaningful end-user explanations and that responsible AI frameworks may overestimate developers' ability to translate high-level requirements into human-centered designs without additional training, tools, or implementation support.

Jacob Tjaden. The Balancing Act of Policies in Developing Machine Learning Explanations. In Proceedings of the International Conference on Software Engineering (Companion) (ICSE-SRC), pages 237--238, New York, NY: ACM Press, 2025. ICSE student research competition, first place. [ doi, bib ]

Machine learning models are often criticized as opaque due to a lack of transparency in their decision-making process. This study examines how policy design impacts the quality of explanations in ML models. We conducted a classroom experiment with 124 participants and analyzed the effects of policy length and purpose on developer compliance with policy requirements. Our results indicate that while policy length affects engagement with some requirements, policy purpose has no effect, and explanation quality is generally poor. These findings highlight the challenge of effective policy development and the importance of addressing diverse stakeholder perspectives within explanations.

Hao He, Bogdan Vasilescu, and Christian Kästner. Pinning Is Futile: You Need More Than Local Dependency Versioning to Defend Against Supply Chain Attacks. Proceedings of the ACM on Software Engineering (FSE), 2(FSE):266--289, June 2025. Distinguished Paper Award. [ .pdf, doi, http, bib ]

Recent high-profile incidents in open-source software have greatly raised practitioner attention on software supply chain attacks. To guard against potential malicious package updates, security practitioners advocate pinning dependencies to specific versions rather than floating in version ranges. However, it remains controversial whether pinning carries a meaningful security benefit that outweighs the cost of maintaining outdated and possibly vulnerable dependencies. In this paper, we quantify, through counterfactual analysis and simulations, the security and maintenance impact of version constraints in the npm ecosystem. By simulating dependency resolutions over historical time points, we find that pinning direct dependencies not only (as expected) increases the cost of maintaining vulnerable and outdated dependencies, but also (surprisingly) even increases the risk of exposure to malicious package updates in larger dependency graphs due to the specifics of npm’s dependency resolution mechanism. Finally, we explore collective pinning strategies to secure the ecosystem against supply chain attacks, suggesting specific changes to npm to enable such interventions. Our study provides guidance for practitioners and tool designers to manage their supply chains more securely.

Yining Hong, Christopher Timperley, and Christian Kästner. From Hazard Identification to Control Design: Proactive and AI-Supported Safety Engineering for ML-powered Systems. In Proceedings of the International Conference on AI Engineering - Software Engineering for AI (CAIN), pages 113--118, April 2025. [ .pdf, doi, bib ]

Machine learning (ML) components are increasingly integrated into software products, yet their complexity and inherent uncertainty often lead to unintended and potentially hazardous consequences, both for individuals and society at large. Despite these risks, practitioners rarely adopt proactive approaches to anticipate and mitigate potential hazards before they occur. Traditional safety engineering approaches, such as Failure Mode and Effects Analysis (FMEA) and System Theoretic Process Analysis (STPA), offer promising frameworks for systematic early risk identification but are rarely adopted. In this position paper, we argue that hazard analysis should be an integral part of developing any ML-powered software product and that greater support is needed to make this process manageable for developers. By using large language models (LLMs) to partially automate a modified STPA process with human oversight at critical steps, we expect to address two key challenges: the heavy dependency on highly experienced safety engineering experts, and the time-consuming, labor-intensive nature of traditional hazard analysis, which often impedes its integration into real-world development workflows. We illustrate our approach with a running example, demonstrating that many seemingly unanticipated issues can, in fact, be anticipated. We conclude with a call to action for the software engineering community to adopt proactive safety engineering practices for ML-powered systems.

Haesue Baik, Chenyang Yang, Vasudev Vikram, Pooyan Jamshidi, Rohan Padhye, and Christian Kästner. Differential Performance Fuzzing of Configuration Options. In Proceedings of the International Workshop on Search-Based and Fuzz Testing (SBFT), pages 31--34, Los Alamitos, CA: IEEE Computer Society, April 2025. [ .pdf, doi, bib ]

Highly-configurable software often includes performance-sensitive configuration options. There are performance expectations across different configurations, but these expectations may not hold, due to inaccurate mental models, corner cases, or unanticipated interactions with other options. We propose differential performance fuzzing of configuration options, a fuzzing technique that uses differential performance feedback to automatically identify inputs that violate these expectations for specific configuration changes. By guiding fuzzing toward scenarios where a supposedly faster configuration performs worse, differential performance fuzzing reveals unexpected performance behavior effectively. In our preliminary evaluation, our method identified unexpected performance gains in configurations presumed slower for 4 configuration options in Closure, demonstrating the potential for detecting performance issues in real-world applications.

Chenyang Yang, Tesi Xiao, Michael Shavlovsky, Christian Kästner, and Tongshuang Wu. Orbit: A Framework for Designing and Evaluating Multi-objective Rankers. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), pages 1093--1106, March 2025. [ doi, http, bib ]

Machine learning in production needs to balance multiple objectives: This is particularly evident in ranking or recommendation models, where conflicting objectives such as user engagement, satisfaction, diversity, and novelty must be considered at the same time. However, designing multi-objective rankers is inherently a dynamic wicked problem – there is no single optimal solution, and the needs evolve over time. Effective design requires collaboration between cross-functional teams and careful analysis of a wide range of information. In this work, we introduce Orbit, a conceptual framework for Objective-centric Ranker Building and Iteration. The framework places objectives at the center of the design process, to serve as boundary objects for communication and guide practitioners for design and evaluation. We implement Orbit as an interactive system, which enables stakeholders to interact with objective spaces directly and supports real-time exploration and evaluation of design trade-offs. We evaluate Orbit through a user study involving twelve industry practitioners, showing that it supports efficient design space exploration, leads to more informed decision-making, and enhances awareness of the inherent trade-offs of multiple objectives. Orbit (1) opens up new opportunities of an objective-centric design process for any multi-objective ML models, as well as (2) sheds light on future designs that push practitioners to go beyond a narrow metric-centric or example-centric mindset.

Nadia Nahar, Christian Kästner, Jenna Butler, Chris Parnin, Thomas Zimmermann, and Christian Bird. Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products. In Proceedings of the International Conference on Software Engineering -- Software Engineering in Practice Track (ICSE-SEIP), pages 516--527, April 2025. [ arXiv, doi, bib ]

Large Language Models (LLMs) are increasingly embedded into software products across diverse industries, enhancing user experiences, but at the same time introducing numerous challenges for developers. Unique characteristics of LLMs force developers, who are accustomed to traditional software development and evaluation, out of their comfort zones as the LLM components shatter standard assumptions about software systems. This study explores the emerging solutions that software developers are adopting to navigate the encountered challenges. Leveraging mixed-method research, including 26 interviews and a survey with 332 responses, the study identifies 19 emerging solutions regarding quality assurance that practitioners across several product teams at Microsoft are exploring. The findings provide valuable insights that can guide the development and evaluation of LLM-based products more broadly in the face of these challenges.

Yining She, Sumon Biswas, Christian Kästner, and Eunsuk Kang. FairSense: Long-Term Fairness Analysis of ML-Enabled Systems. In Proceedings of the 47th International Conference on Software Engineering (ICSE), pages 782--794, April 2025. [ .pdf, arXiv, doi, bib ]

Algorithmic fairness of machine learning (ML) models has raised significant concern in recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. In particular, the framework targets systems with an ML model that is trained over tabular data using supervised learning. Given a fairness requirement, FairSense performs Monte-Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of system parameters to understand the impact of configuration decisions on long-term fairness of the system. We demonstrate FairSense's potential utility through three real-world case studies: Loan lending, opioids risk scoring, and predictive policing.

Giacomo Benedetti, Oreofe Solarin, Courtney Miller, Greg Tystahl, William Enck, Christian Kästner, Alexandros Kapravelos, Alessio Merlo, and Luca Verderame. An Empirical Study on Reproducible Packaging in Open-Source Ecosystems. In Proceedings of the 47th International Conference on Software Engineering (ICSE), pages 1052--1063, April 2025. [ .pdf, doi, http, bib ]

The integrity of software builds is fundamental to the security of the software supply chain. While Thompson first raised the potential for attacks on build infrastructure in 1984, limited attention has been given to build integrity in the past 40 years, enabling recent attacks on SolarWinds, event-stream, and xz. The best-known defense against build system attacks is creating reproducible builds; however, achieving them can be complex for both technical and social reasons and thus is often viewed as impractical to obtain. In this paper, we analyze reproducibility of builds in a novel context: reusable components distributed as packages in six popular software ecosystems (npm, Maven, PyPI, Go, RubyGems, and Cargo). Our quantitative study on a representative sample of 4000 packages in each ecosystem raises concerns: Rates of reproducible builds vary widely between ecosystems, with some ecosystems having all packages reproducible whereas others have issues in nearly every package. However, upon deeper investigation, we identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.

Chenyang Yang, Yining Hong, Grace Lewis, Tongshuang Wu, and Christian Kästner. What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 306--318, Los Alamitos, CA: IEEE Computer Society, November 2024. Acceptance rate: 26 % (155/587). [ .pdf, doi, http, bib ]

Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify examples relevant to their hypotheses. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models (LLMs) to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.

Menon Alka, Zahra Abba Omar, Nadia Nahar, Xenophon Papademetris, Lynn Fiellin, and Christian Kästner. Lessons from Clinical Communications for AI Systems. In Proceedings of the AAAI Conference on AI, Ethics, and Society (AIES), pages 958--970, October 2024. [ .pdf, doi, http, bib ]

One of the major challenges in the use of opaque, complex AI models is the need or desire to provide an explanation to the end-user (and other stakeholders) as to how the system arrived at the answer it did. While there is significant research in the development of explainability techniques for AI, the question remains as to who needs an explanation, what an explanation consists of, and how to communicate this to a lay user who lacks direct expertise in the area. In this position paper, an interdisciplinary team of researchers argue that the example of clinical communications offers lessons to those interested in improving the transparency and interpretability of AI systems. We identify five lessons from clinical communications: (1) offering explanations for AI systems and disclosure of their use recognizes the dignity of those using and impacted by it; (2) AI explanations can be productively targeted rather than totally comprehensive; (3) AI explanations can be enforced through codified rules but also norms, guided by core values; (4) what constitutes a “good” AI explanation will require repeated updating due to changes in technology and social expectations; (5) AI explanations will have impacts beyond defining any one AI system, shaping and being shaped by broader perceptions of AI. We review the history, debates and consequences surrounding the institutionalization of one type of clinical communication, informed consent, in order to illustrate the challenges and opportunities that may await attempts to offer explanations of opaque AI models. We highlight takeaways and implications for computer scientists and policymakers in the context of growing concerns and moves toward AI governance.

Nadia Nahar, Haoran Zhang, Grace Lewis, Shurui Zhou, and Christian Kästner. The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products. In Proceedings of the 47th International Conference on Software Engineering (ICSE), pages 1540--1552, April 2025. [ .pdf, arXiv, doi, http, bib ]

Machine learning (ML) components are increasingly incorporated into software products for end-users, but developers face challenges in transitioning from ML prototypes to products. Academics have limited access to the source of commercial ML products, challenging research progress. In this study, first, we contribute a novel process to identify 262 open-source ML products among more than half a million ML-related projects on GitHub. Then, we qualitatively and quantitatively analyze 30 open-source ML products to answer six broad research questions about development practices and system architecture. We find that the majority of the ML products in our sample represent startup-style development reported in past interview studies. We report 21 findings, including limited involvement of data scientists in many ML products, unusually low modularity between ML and non-ML code, diverse architectural choices on incorporating models into products, and limited prevalence of industry best practices such as model testing, pipeline automation, and monitoring. Additionally, we discuss 7 implications of this study on research, development, and education, including the need for tools to assist teams without data scientists, education opportunities, and open-source-specific research for privacy-preserving telemetry.

Courtney Miller, Mahmoud Jahanshahi, Audris Mockus, Bogdan Vasilescu, and Christian Kästner. Understanding the Response to Open-Source Dependency Abandonment in the npm Ecosystem. In Proceedings of the 47th International Conference on Software Engineering (ICSE), pages 2355--2367, April 2025. Distinguished Paper Award. [ .pdf, doi, http, bib ]

Many developers relying on open-source digital infrastructure expect continuous maintenance, but even the most critical packages can become unmaintained. Despite this, there is little understanding of the prevalence of abandonment of widely-used packages, of subsequent exposure, and of reactions to abandonment in practice, or the factors that influence them. We perform a large-scale quantitative analysis of all widely-used npm packages and find that abandonment is common among them, that abandonment exposes many projects which often do not respond, that responses correlate with other dependency management practices, and that removal is significantly faster when a project's end-of-life status is explicitly stated. We end with recommendations to both researchers and practitioners who are facing dependency abandonment or are sunsetting projects, such as opportunities for low-effort transparency mechanisms to help exposed projects make better, more informed decisions.

Nadia Nahar, Jenny Rowlett, Matthew Bray, Zahra Abba Omar, Xenophon Papademetris, Menon Alka, and Christian Kästner. Regulating Explainability in Machine Learning Applications -- Observations from a Policy Design Experiment. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT), pages 2101--2112, June 2024. [ .pdf, doi, bib ]

With the rise of artificial intelligence (AI), concerns about AI applications causing unforeseen harms to safety, privacy, security, and fairness are intensifying. While attempts to create regulations are underway, with initiatives such as the EU AI Act and the 2023 White House executive order, skepticism abounds as to the efficacy of such regulations. This paper explores an interdisciplinary approach to designing policy for the explainability of AI-based products, as the widely discussed "right to explanation" in the EU General Data Protection Regulation is ambiguous. To develop practical guidance for explainability, we conducted an experimental study that involved continuous collaboration among a team of researchers with AI and policy backgrounds over the course of ten weeks. The objective was to determine whether, through interdisciplinary effort, we can reach consensus on a policy for explainability in AI—one that is clearer, and more actionable and enforceable than current provisions. We share nine observations, derived from an iterative policy design process, which included drafting the policy, attempting to comply with it (or circumvent it), and collectively evaluating its effectiveness on a weekly basis. The observations include: iterative and continuous feedback was useful to improve policy drafts over time, discussing what evidence would satisfy policy was necessary during policy design, and human-subject studies were found to be necessary evidence to ensure effectiveness. We conclude with a note of optimism, arguing that meaningful policies can be achieved within a moderate time frame and with limited experience in policy design, as demonstrated by our student researchers on the team. This holds promising implications for policymakers, signaling that practical and effective regulation for AI applications is attainable.

Emily Nguyen. Do All Software Projects Die When Not Maintained? Analyzing Developer Maintenance to Predict OSS Usage. In Proceedings of the Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE-SRC), pages 2195--2197, New York, NY: ACM Press, 2023. FSE student research competition. [ doi, bib ]

Past research suggests software should be continuously maintained in order to remain useful in our digital society. To determine whether these studies on software evolution are supported in modern-day software libraries, we conduct a natural experiment on 26,050 GitHub repositories, statistically modeling library usage based on their package-level downloads against different factors related to project maintenance.

Katrina Wilson. Clearing the Trail: Motivations for Maintenance Work in Open Source. In Proceedings of the International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (Companion) (SPLASH-SRC), pages 34--36, New York, NY: ACM Press, 2023. SPLASH student research competition. [ doi, bib ]

Introducing new maintainers to established projects is critical to the long-term sustainability of open-source projects. Yet, we have little understanding of what motivates developers to join and maintain already established projects. Previous research on volunteering motivations emphasizes that individuals are motivated by a unique set of factors to volunteer in a specific area, suggesting that the motivations behind open-source contributions also depend on the nature of the work. We aim to determine correlations between types of open-source contributions and their specific motivators through surveys of open-source contributors.

Wanqin Ma, Chenyang Yang, and Christian Kästner. (Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs. In Proceedings of the International Conference on AI Engineering - Software Engineering for AI (CAIN), pages 166--171, April 2024. [ .pdf, doi, bib ]

Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.

Lina Boughton, Courtney Miller, Yasemin Acar, Dominik Wermke, and Christian Kästner. Decomposing and Measuring Trust in Open-Source Software Supply Chains. In Proceedings of the International Conference on Software Engineering -- New Ideas Track (ICSE-NIER), pages 57--61, April 2024. [ .pdf, doi, bib ]

Trust is integral for the successful and secure functioning of software supply chains, making it important to measure the state and evolution of trust in open source communities. However, existing security and supply chain research often studies the concept of trust without a clear definition and relies on obvious and easily available signals like GitHub stars without deeper grounding. In this paper, we explore how to measure trust in open source supply chains with the goal of developing robust measures for trust based on the behaviors of developers in the community. To this end, we contribute a process for decomposing trust in a complex large-scale system into key trust relationships, systematically identifying behavior-based indicators for the components of trust for a given relationship, and in turn operationalizing data-driven metrics for those indicators, allowing for the wide-scale measurement of trust in practice.

Chenyang Yang, Rishabh Rustogi, Rachel A Brower-Sinning, Grace Lewis, Christian Kästner, and Tongshuang Wu. Beyond Testers’ Biases: Guiding Model Testing with Knowledge Bases using LLMs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing -- Findings (EMNLP), pages 13504--13519, December 2023. [ .pdf, arXiv, doi, bib ]

Current model testing work has mostly focused on creating test cases. Identifying what to test is a step that is largely ignored and poorly supported. We propose Weaver, an interactive tool that supports requirements elicitation for guiding model testing. Weaver uses large language models to generate knowledge bases and recommends concepts from them interactively, allowing testers to elicit requirements for further testing. Weaver provides rich external knowledge to testers and encourages testers to systematically explore diverse concepts beyond their own biases. In a user study, we show that both NLP experts and non-experts identified more, as well as more diverse concepts worth testing when using Weaver. Collectively, they found more than 200 failing test cases for stance detection with zero-shot ChatGPT. Our case studies further show that Weaver can help practitioners test models in real-world settings, where developers define more nuanced application scenarios (e.g., code understanding and transcript summarization) using LLMs.

Courtney Miller, Christian Kästner, and Bogdan Vasilescu. "We Feel Like We're Winging It:" A Study on Navigating Open-Source Dependency Abandonment. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 1281--1293, New York, NY: ACM Press, December 2023. [ .pdf, doi, bib ]

While lots of research has explored how to prevent maintainers from abandoning the open-source projects that serve as our digital infrastructure, there are very few insights on addressing abandonment when it occurs. We argue open-source sustainability research must expand its focus beyond trying to keep particular projects alive, to also cover the sustainable use of open source by supporting users when they face potential or actual abandonment. We perform an interview study with 33 developers who have experienced open-source dependency abandonment and analyze the data using iterative thematic analysis. Often, multiple strategies were used to cope with abandonment, for example, first reaching out to the community to find potential alternatives, then switching to a community-accepted alternative if one exists. We found many developers felt they had little to no support or guidance when facing abandonment, leaving them to figure out what to do through a trial-and-error process on their own. Abandonment introduces cost for otherwise seemingly free dependencies, but users can decide whether and how to prepare for abandonment through a number of different strategies, such as dependency monitoring, building abstraction layers, and community involvement. In many cases, community members can invest in resources that help others facing the same abandoned dependency, but often do not because of the many other competing demands on their time – in a form of the volunteer's dilemma. We discuss cost reduction strategies and ideas to overcome this volunteer's dilemma. Our findings can be used directly by open-source users seeking resources on dealing with dependency abandonment, or by researchers to motivate future work supporting the sustainable use of open source.

Nadia Nahar, Haoran Zhang, Grace Lewis, Shurui Zhou, and Christian Kästner. A Meta-Summary of Challenges in Building Products with ML Components – Collecting Experiences from 4758+ Practitioners. In Proceedings of the International Conference on AI Engineering - Software Engineering for AI (CAIN), pages 171--183, May 2023. [ .pdf, arXiv, doi, bib ]

Incorporating machine learning (ML) components into software products raises new software-engineering challenges and elevates already existing challenges. Many researchers have invested significant effort into understanding the challenges of industry practitioners working on building products with ML components through interviews and surveys with practitioners. With the intention to aggregate and present their collective findings, we conduct a meta-summary study: We collect 50 relevant papers that together interacted with over 4758 practitioners using guidelines for systematic literature reviews and subsequently group and organize the over 500 mentions of challenges within those papers. We highlight the most commonly reported challenges and how this meta-summary will be a useful resource for the research community to prioritize research and education in this field.

Avinash Bhat, Austin Coursey, Grace Hu, Sixian Li, Nadia Nahar, Shurui Zhou, Christian Kästner, and Jin L.C. Guo. Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI), Article No.: 749, April 2023. [ .pdf, arXiv, doi, bib ]

The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on the actual practice is unclear. In this work, we systematically study the model documentation in the field and investigate how to encourage more responsible and accountable documentation practice. Our analysis of publicly available model cards reveals a substantial gap between the proposal and the practice. We then designed a tool named DocML aiming to (1) nudge the data scientists to comply with the model cards proposal during the model development, especially the sections related to ethics, and (2) assess and manage the documentation quality. A lab study reveals the benefit of our tool towards long-term documentation quality and accountability.

Chenyang Yang, Rachel A Brower-Sinning, Grace Lewis, Christian Kästner, and Tongshuang Wu. Capabilities for Better ML Engineering. In Proceedings of the AAAI-23 Workshop on Artificial Intelligence Safety (SafeAI), pages 1--8, February 2023. [ .pdf, http, bib ]

In spite of machine learning’s rapid growth, its engineering support is scattered in many forms, and tends to favor certain engineering stages, stakeholders, and evaluation preferences. We envision a capability-based framework, which uses fine-grained specifications for ML model behaviors to unite existing efforts towards better ML engineering. We use concrete scenarios (model design, debugging, and maintenance) to articulate capabilities’ broad applications across various different dimensions, and their impact on building safer, more generalizable and more trustworthy models that reflect human needs. Through preliminary experiments, we show the potential of capabilities for reflecting model generalizability, which can provide guidance for the ML engineering process. We discuss challenges and opportunities for the integration of capabilities into ML engineering.

Katherine R. Maffey, Kyle Dotterrer, Jennifer Niemann, Iain Cruickshank, Grace Lewis, and Christian Kästner. MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities. In Proceedings of the International Conference on Software Engineering -- New Ideas Track (ICSE-NIER), pages 31--36, May 2023. [ .pdf, doi, bib ]

Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as “melt”), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. The MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.

Chenyang Yang, Rachel A Brower-Sinning, Grace Lewis, and Christian Kästner. Data Leakage in Notebooks: Static Detection and Better Processes. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), Article No.: 30, New York, NY: ACM Press, October 2022. Acceptance rate: 22 % (116/527). [ .pdf, arXiv, doi, bib ]

Data science pipelines to train and evaluate models with machine learning may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model's accuracy during offline evaluations, possibly leading to deployment of low-quality models in production. Such leakage can happen easily by mistake or by following poor practices but may be tedious and challenging to detect manually. We develop a static analysis approach to detect common forms of data leakage in data science code. Our evaluation shows that our analysis accurately detects data leakage and that such leakage is pervasive among over 100,000 analyzed public notebooks. We discuss how our static analysis approach can help both practitioners and educators, and how leakage prevention can be designed into the development process.

Yuan Jiang, Christian Kästner, and Shurui Zhou. Elevating Jupyter Notebook Maintenance Tooling by Identifying and Extracting Notebook Structures. In Proceedings of the 38th International Conference on Software Maintenance and Evolution (ICSME), pages 399--403, October 2022. Acceptance rate: 44 % (17/39). [ .pdf, doi, bib ]

Data analysis is an exploratory, interactive, and often collaborative process. Computational notebooks have become a popular tool to support this process, in part because of their ability to interleave code, narrative text, and results. However, notebooks in practice are often criticized as hard to maintain and being of low code quality, including problems such as unused or duplicated code and out-of-order code execution. Data scientists can benefit from better tool support when maintaining and evolving notebooks. We argue that central to such tool support is identifying the structure of notebooks. We present a lightweight and accurate approach to extract notebook structure and outline several ways such structure can be used to improve maintenance tooling for notebooks, including navigation and finding alternatives.

Christian Kästner, Eunsuk Kang, and Sven Apel. Feature Interactions on Steroids: On the Composition of ML Models. IEEE Software (IEEE-Sw), 39(3):120--124, May 2022. [ .pdf, doi, bib ]

One of the key differences between traditional software engineering and machine learning (ML) is the lack of specifications for ML models. Traditionally, specifications provide a cornerstone for compositional reasoning and for the divide-and-conquer strategy of how we build large and complex systems from components, but these are hard to come by for machine learned components. While the lack of specification seems like a fundamental new problem at first sight, in fact, software engineers routinely deal with iffy specifications in practice. We face weak specifications, wrong specifications, and unanticipated interactions among specifications. ML may push us further, but the problems are not fundamentally new. Rethinking ML model composition from the perspective of the feature-interaction problem highlights the importance of software design.

Kimberly Truong, Courtney Miller, Bogdan Vasilescu, and Christian Kästner. The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks. In Proceedings of the 20th International Conference on Mining Software Repositories (MSR), pages 348--352, New York, NY: ACM Press, May 2022. [ .pdf, doi, bib ]

Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable source material for grey literature analysis. We curated a dataset of 24,669 talks from 87 open-source conferences between 2010 and 2021. We stored all relevant metadata from these conferences and provide scripts to collect the transcripts. We believe this data is useful for answering many kinds of questions, such as: What are the important/highly discussed topics within practitioner communities? How do practitioners interact? And how do they present themselves to the public? We demonstrate the usefulness of this data by reporting our findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within our dataset to gain a better understanding of why contributors leave open-source software.

Philip Gray. To Disengage or Not to Disengage: A Look at Contributor Disengagement in Open Source Software. In Proceedings of the International Conference on Software Engineering (Companion) (ICSE-SRC), pages 328--330, New York, NY: ACM Press, 2022. ICSE student research competition. [ .pdf, doi, bib ]

Contributors are vital to the sustainability of open source ecosystems, and disengagement threatens that sustainability. We seek to protect and strengthen open source communities through a better and more robust way of defining and identifying contributor disengagement in open source communities. To do this, we collected a large amount of gray literature on contributor disengagement and performed a qualitative analysis to better our understanding of why contributors disengage.

Kimberly Truong. Let’s Talk Open-Source — An Analysis of Conference Talks and Community Dynamics. In Proceedings of the International Conference on Software Engineering (Companion) (ICSE-SRC), pages 322--324, New York, NY: ACM Press, 2022. ICSE student research competition, first place. [ .pdf, doi, bib ]

Open-source software has integrated itself into our daily lives, impacting 78% of US companies in 2015. Past studies of open-source community dynamics have found motivations behind contributions and the significance of community engagement, but there are still many aspects not well understood. There’s a direct correlation between the success of an open-source project and the social interactions within its community. Most projects depend on a small group. A study by Avelino et al. on the 133 most popular GitHub projects found that 86% will fail if one or two of its core contributors leave. To sustain open-source, we need to better understand how contributors interact, what information is shared, and what concerns practitioners have. We study common topics, how these have changed over time (2011 - 2021), and what social issues have appeared within open-source communities. Our research is guided by the following questions: (1) How is open-source changing/evolving? (2) What changes do practitioners believe are necessary for open-source to be sustainable? ...

Huilian Sophie Qiu, Bogdan Vasilescu, Christian Kästner, Carolyn Egelman, Ciera Jaspan, and Emerson Murphy-Hill. Detecting Interpersonal Conflict in Issues and Code Review: Cross Pollinating Open- and Closed-Source Approaches. In Proceedings of the International Conference on Software Engineering -- Software Engineering in Society Track (ICSE-SEIS), pages 41--55, New York, NY: ACM Press, May 2022. Acceptance rate: 44 % (17/39). [ .pdf, doi, video, bib ]

Interpersonal conflict in code review, such as toxic language or an unnecessary pushback, is associated with negative outcomes such as stress and turnover. Automatic detection is one approach to prevent and mitigate interpersonal conflict. Two recent automatic detection approaches were developed in different settings: a toxicity detector using text analytics for open source issue discussions and a pushback detector using logs-based metrics for corporate code reviews. This paper tests how the toxicity detector and the pushback detector can be generalized beyond their respective contexts and discussion types, and how the combination of the two can help improve interpersonal conflict detection. The results reveal connections between the two concepts.

Nadia Nahar, Shurui Zhou, Grace Lewis, and Christian Kästner. Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process. In Proceedings of the 44th International Conference on Software Engineering (ICSE), pages 413--425, New York, NY: ACM Press, May 2022. Acceptance rate: 26 % (197/751). Distinguished Paper Award. [ .pdf, arXiv, doi, video, bib ]

The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.

Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner. "Did You Miss My Comment or What?" Understanding Toxicity in Open Source Discussions. In Proceedings of the 44th International Conference on Software Engineering (ICSE), pages 710--722, New York, NY: ACM Press, May 2022. Acceptance rate: 26 % (197/751). Distinguished Paper Award. [ .pdf, doi, video, bib ]

Online toxicity is ubiquitous across the internet and its negative impact on the people and online communities it affects has been well documented. However, toxicity manifests differently on various platforms and toxicity in open source communities, while frequently discussed, is not well understood. We take a first stride at understanding the characteristics of open source toxicity to better inform future work designing effective intervention and detection methods. To this end, we curate a sample of 100 toxic GitHub issue discussions combining multiple search and sampling strategies. We then qualitatively analyze the sample to gain an understanding of the characteristics of open-source toxicity. We find that the prevalent forms of toxicity in open source differ from those observed on other platforms like Reddit or Wikipedia. We find some of the most prevalent forms of toxicity in open source are entitled, demanding, and arrogant comments from project users and insults arising from technical disagreements. In addition, not all toxicity was written by people external to the projects; project members were also common authors of toxicity. We also provide in-depth discussions about the implications of our findings including patterns that may be useful for detection work and subsequent questions for future work.

Miguel Velez, Pooyan Jamshidi, Norbert Siegmund, Sven Apel, and Christian Kästner. On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support. In Proceedings of the 44th International Conference on Software Engineering (ICSE), pages 1571--1583, New York, NY: ACM Press, May 2022. Acceptance rate: 26 % (197/751). [ .pdf, arXiv, doi, video, bib ]

Determining whether a configurable software system has a performance bug or the system was misconfigured is often challenging. While there are numerous debugging techniques that can support developers in this task, there is limited empirical evidence of how useful the techniques are to address the actual needs that developers have when debugging the performance of configurable systems; most techniques are often evaluated in terms of technical accuracy instead of their usability. In this paper, we take a human-centered approach to identify, design, implement, and evaluate a solution to support developers in the process of debugging the performance of configurable software systems. We first conduct an exploratory study with 19 developers to identify the information needs that developers have during this process. Subsequently, we design and implement a tailored tool, building on relevant information provided by Global and Local performance-influence models, CPU profiling, and program slicing, to support those needs. Two user studies, with a total of 20 developers, validate and confirm that the information that we provide helps developers debug the performance of configurable software systems.

Helen Dong, Shurui Zhou, Jin L.C. Guo, and Christian Kästner. Splitting, Renaming, Removing: A Study of Common Cleaning Activities in Jupyter Notebooks. In Proceedings of the 9th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pages 114--119, Los Alamitos, CA: IEEE Computer Society, November 2021. [ .pdf, doi, bib ]

Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the scientist completes the code and finds the ideal model, he or she will have to dedicate time to clean up the code in order for others to easily understand it. In this paper, we perform a qualitative study on how scientists clean their code in hopes of being able to suggest a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who then can focus more on their actual work rather than the routine and tedious cleaning work. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown (e.g., adding header sections or descriptions) or comments (both deleting dead code and adding descriptions) as well as reordering cells. We also find that common cleaning activities differ depending on the intended purpose of the notebook. Our results provide a valuable foundation for tool builders and notebook users, as many identified cleaning activities could benefit from codification of best practices and dedicated tool support, possibly tailored depending on intended use.

Sophie Cohen. Contextualizing Toxicity in Open Source: A Qualitative Study. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE-SRC), pages 1669--1671, New York, NY: ACM Press, 2021. ESEC/FSE Student Research Competition. [ .pdf, doi, http, bib ]

In this paper, we study toxic online interactions in issue discussions of open-source communities. Our goal is to qualitatively understand how toxicity impacts an open-source community like GitHub. We are driven by users complaining about toxicity, which leads to burnout and disengagement from the site. We collect a substantial sample of toxic interactions and qualitatively analyze their characteristics to ground future discussions and intervention design.

Helen Dong. A Qualitative Study of Cleaning in Jupyter Notebooks. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE-SRC), pages 1663--1665, New York, NY: ACM Press, 2021. ESEC/FSE Student Research Competition. [ .pdf, doi, http, bib ]

Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the scientist completes the code and finds the ideal model, the data scientist will have to dedicate time to clean up the code in order for others to understand it. In this paper, we perform a qualitative study on how scientists clean their code in hopes of being able to suggest a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who can then focus more on their actual work rather than the routine and tedious cleaning duties.

Chenyang Yang, Shurui Zhou, Jin L.C. Guo, and Christian Kästner. Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 304--316, Los Alamitos, CA: IEEE Computer Society, November 2021. Acceptance rate: 27 % (120/440). [ .pdf, doi, bib ]

Data scientists reportedly spend a significant amount of their time in their daily routines on data wrangling, i.e., cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code, which do not result in crashes but reduce model quality. To support data scientists with data wrangling, we present a technique to generate interactive documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide documentation for many cells in popular notebooks and find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.

Chu-Pan Wong, Priscila Santiesteban, Christian Kästner, and Claire Le Goues. VarFix: Balancing Edit Expressiveness and Search Effectiveness in Automated Program Repair. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 354--366, New York, NY: ACM Press, August 2021. Acceptance rate: 24 % (97/396). [ .pdf, doi, bib ]

Automatically repairing a buggy program is essentially a search problem, searching for code transformations that pass a set of tests. Various search strategies have been explored, but they either navigate the search space in an ad hoc way using heuristics, or systematically but at the cost of limited edit expressiveness in the kinds of supported program edits. In this work, we explore the possibility of systematically navigating the search space without sacrificing edit expressiveness. The key enabler of this exploration is variational execution, a dynamic analysis technique that has been shown to be effective at exploring many similar executions in large search spaces. We evaluate our approach on IntroClassJava and Defects4J, showing that a systematic search is effective at leveraging and combining fixing ingredients to find patches, including many high-quality patches and multi-edit patches.

Bo Shen, Wei Zhang, Christian Kästner, Haiyan Zhao, Zhao Wei, Guangtai Liang, and Zhi Jin. SmartCommit: A Graph-based Interactive Assistant for Activity-Oriented Commits. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 379--390, New York, NY: ACM Press, August 2021. Acceptance rate: 24 % (97/396). [ .pdf, doi, bib ]

In collaborative software development, it is considered to be a best practice to submit code changes as a sequence of cohesive commits, each of which records the work result of a specific development activity, such as adding a new feature, bug fixing, and refactoring. However, rather than following this best practice, developers often submit a set of loosely related changes serving different development activities as a composite commit, due to the tedious manual work and lack of effective tool support to decompose such a tangled changeset. Composite commits often obfuscate the change history of software artifacts and bring challenges to efficient collaboration among developers. To encourage activity-oriented commits, we propose SmartCommit, a graph-partitioning-based interactive approach to tangled changeset decomposition that leverages not only the efficiency of algorithms but also the knowledge of developers. To evaluate the effectiveness of our approach, we (1) deployed SmartCommit in an international IT company, and analyzed usage data collected from a field study with 83 engineers over 9 months; and (2) conducted a controlled experiment on 3,000 synthetic composite commits from 10 diverse open-source projects. Results show that SmartCommit achieves a median accuracy between 71% and 83% when decomposing composite commits without developer involvement, and significantly helps developers follow the best practice of submitting activity-oriented commits with acceptable interaction effort and time cost in real collaborative software development.

Christian Kästner, Eunsuk Kang, and Sven Apel. Feature Interactions on Steroids: On the Composition of ML Models. Technical Report 2105.06449, arXiv, May 2021. [ .pdf, arXiv, bib ]

The lack of specifications is a key difference between traditional software engineering and machine learning. We discuss how it drastically impacts how we think about divide-and-conquer approaches to system design, and how it impacts reuse, testing and debugging activities. Traditionally, specifications provide a cornerstone for compositional reasoning and for the divide-and-conquer strategy of how we build large and complex systems from components, but those are hard to come by for machine-learned components. While the lack of specification seems like a fundamental new problem at first sight, in fact software engineers routinely deal with iffy specifications in practice: we face weak specifications, wrong specifications, and unanticipated interactions among components and their specifications. Machine learning may push us further, but the problems are not fundamentally new. Rethinking machine-learning model composition from the perspective of the feature interaction problem may even teach us a thing or two on how to move forward, including the importance of integration testing, of requirements engineering, and of design.

Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. When and how to make breaking changes: Policies and practices in 18 open source software ecosystems. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(4):Article No.: 42, pp 1--56, October 2021. [ .pdf, http, bib ]

Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.

Miguel Velez, Pooyan Jamshidi, Norbert Siegmund, Sven Apel, and Christian Kästner. White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems. In Proceedings of the 43rd International Conference on Software Engineering (ICSE), pages 1072--1084, Los Alamitos, CA: IEEE Computer Society, May 2021. Acceptance rate: 23 % (138/602). [ .pdf, arXiv, doi, bib ]

Performance-influence models can help stakeholders understand how and where configuration options and their interactions influence the performance of a system. With this understanding, stakeholders can debug performance and make deliberate configuration decisions. Current black-box techniques to build such models combine various sampling and learning strategies, resulting in trade-offs between measurement effort, accuracy, and interpretability. We present Comprex, a white-box approach to build performance-influence models for configurable systems, combining insights of local measurements, dynamic taint analysis to track options in the implementation, compositionality, and compression of the configuration space, without using machine learning to extrapolate incomplete samples. Our evaluation on 4 widely-used open-source projects demonstrates that Comprex builds similarly accurate performance-influence models to the most accurate and expensive black-box approach, but at a reduced cost and with additional benefits from interpretable and local models.

Gabriel Ferreira, Limin Jia, Joshua Sunshine, and Christian Kästner. Containing Malicious Package Updates in npm with a Lightweight Permission System. In Proceedings of the 43rd International Conference on Software Engineering (ICSE), pages 1334--1346, Los Alamitos, CA: IEEE Computer Society, May 2021. Acceptance rate: 23 % (138/602). [ .pdf, doi, bib ]

The large number of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting them from malicious updates. We discuss the design space and propose a lightweight permission system that protects Node.js/npm applications by enforcing package permissions at runtime. Our system makes a large number of packages much harder to exploit, almost for free.

João P. Diniz, Chu-Pan Wong, Christian Kästner, and Eduardo Figueiredo. Dissecting Strongly Subsuming Second-Order Mutants. In Proceedings of the International Conference on Software Testing, Verification and Validation (ICST), pages 171--181, Los Alamitos, CA: IEEE Computer Society, April 2021. [ doi, bib ]

Mutation testing is a fault-based technique commonly used to evaluate the quality of test suites in software systems. It consists of introducing syntactical changes, called mutations, into source code and checking whether the test cases distinguish them. Since there are dozens of distinct mutation types, one of the most challenging problems is the high computational effort required to test the whole test suite against each mutant. Since mutation testing was proposed, researchers have presented techniques aiming at effort reduction in the phases of its process. This study focuses on the potential reduction in the number of mutants provided by a special set of mutants generated by the introduction of two syntactical changes (strongly subsuming second-order mutants). In this work, we exhaustively searched for those second-order mutants. Our results show that they (i) are frequently generated by the "expression removal" mutation, (ii) are likely to be killed by the same test cases that kill their constituent mutants, and (iii) have the potential to reduce the number of mutants to be executed by about 22%.
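
As a rough illustration of the basic idea described in this abstract (a first-order mutant only, not the paper's second-order analysis), the sketch below applies one syntactic change to a small function and checks which tests distinguish the mutant from the original. The function `price_with_tax`, the `AddToSub` operator, and both tests are hypothetical names chosen for this example.

```python
import ast

# Hypothetical program under test, kept as source so it can be mutated.
SOURCE = """
def price_with_tax(net, tax):
    return net + tax
"""


class AddToSub(ast.NodeTransformer):
    """Illustrative mutation operator: replace every `+` with `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree):
    """Compile a module AST and return the function it defines."""
    namespace = {}
    code = compile(ast.fix_missing_locations(tree), "<program>", "exec")
    exec(code, namespace)
    return namespace["price_with_tax"]


def weak_test(fn):
    # A tax of 0 cannot tell `+` from `-`, so the mutant survives this test.
    return fn(10, 0) == 10


def strong_test(fn):
    return fn(10, 2) == 12


original = load(ast.parse(SOURCE))
mutant = load(AddToSub().visit(ast.parse(SOURCE)))


def killed(test):
    """A test kills the mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)
```

Here `killed(strong_test)` holds while `killed(weak_test)` does not, which is the sense in which surviving mutants point to gaps in a test suite.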

Larissa Rocha Soares, Ivan Machado, Eduardo Santana de Almeida, Christian Kästner, and Sarah Nadi. A Semi-Automated Iterative Process for Detecting Feature Interactions. In Proceedings of the 34th Brazilian Symposium on Software Engineering (SBES), pages 778--787, October 2020. [ doi, bib ]

For configurable systems, features developed and tested separately may present a different behavior when combined in a system. Since software products might be composed of thousands of features, developers should guarantee that all valid combinations work properly. However, features can interact in undesired ways, resulting in failures. A feature interaction is an unpredictable behavior that cannot be easily deduced from the individual features involved. We proposed VarXplorer to inspect feature interactions as they are detected and incrementally classify them as benign or problematic. Our approach provides an iterative analysis of feature interactions allowing developers to focus on suspicious cases. In this paper, we present an experimental study to evaluate our iterative process of test execution. We aim to understand how VarXplorer could be used for a faster and more objective feature interaction analysis. Our results show that VarXplorer may reduce by up to 50% the number of interactions a developer needs to check during the testing process.

Flávio Medeiros, Márcio Ribeiro, Rohit Gheyi, Larissa Braz, Christian Kästner, Sven Apel, and Kleber Santos. An Empirical Study on Configuration-Related Code Weaknesses. In Proceedings of the 34th Brazilian Symposium on Software Engineering (SBES), pages 193--202, October 2020. [ doi, bib ]

Developers often use the C preprocessor to handle variability and portability. However, many researchers and practitioners criticize the use of preprocessor directives because of their negative effect on code understanding, maintainability, and error proneness. This negative effect may lead to configuration-related code weaknesses, which appear only when we enable or disable certain configuration options. A weakness is a type of mistake in software that, in proper conditions, could contribute to the introduction of vulnerabilities within that software. Configuration-related code weaknesses may be harder to detect and fix than weaknesses that appear in all configurations, because variability increases complexity. To address this problem, we propose a sampling-based white-box technique to detect configuration-related weaknesses in configurable systems. To evaluate our technique, we performed an empirical study with 24 popular highly configurable systems that make heavy use of the C preprocessor, such as Apache Httpd and Libssh. Using our technique, we detected 57 configuration-related weaknesses in 16 systems. In total, we found occurrences of the following five kinds of weaknesses: 30 memory leaks, 10 uninitialized variables, 9 null pointer dereferences, 6 resource leaks, and 2 buffer overflows. The corpus of these weaknesses is a valuable source to better support further research on configuration-related code weaknesses.

Miguel Velez, Pooyan Jamshidi, Florian Sattler, Norbert Siegmund, Sven Apel, and Christian Kästner. ConfigCrusher: Towards White-Box Performance Analysis for Configurable Systems. Automated Software Engineering -- An International Journal (AUSE), 27:265--300, 2020. [ .pdf, arXiv, doi, bib ]

In configurable software systems, stakeholders are often interested in knowing how configuration options influence the performance of a system to facilitate, for example, the debugging and optimization processes of these systems. There are several black-box approaches to obtain this information, but they either sample the system end-to-end with a large number of configurations to make accurate predictions or miss important performance-influencing interactions when sampling few configurations. In addition, these approaches cannot pinpoint the parts of a program that are responsible for performance differences among configurations. This paper proposes ConfigCrusher, a white-box performance analysis that analyzes the implementation of a system to guide the performance analysis and exploits several insights about configurable systems in the process. ConfigCrusher employs a static data-flow analysis to identify how configuration options may influence control-flow decisions and instruments code regions corresponding to these decisions to dynamically analyze the influence of configuration options on the regions' performance. Our evaluation shows the feasibility of our white-box approach to more efficiently build performance models that are similar to or more accurate than current state-of-the-art approaches on 10 configurable systems. Overall, we showcase the benefits of white-box performance analyses and their potential to outperform black-box approaches and provide additional information for analyzing configurable systems.

Hemank Lamba, Asher Trockman, Daniel Armanios, Christian Kästner, Heather Miller, and Bogdan Vasilescu. Heard it Through the Gitvine: An Empirical Study of Tool Diffusion Across the npm Ecosystem. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 505--517, New York, NY: ACM Press, November 2020. Acceptance rate: 28 % (101/360). [ .pdf, doi, teaser, video, bib ]

Automation tools have become essential in contemporary software development. Tools like continuous integration services, code coverage reporters, style checkers, dependency managers, etc. are all known to provide significant improvements in developer productivity and software quality. Some of these tools are widespread, others are not. How do these automation "best practices" spread? And how might we facilitate the diffusion process for those that have seen slower adoption? In this paper, we rely on a recent innovation in transparency on code hosting platforms like GitHub–the use of repository badges–to track how automation tools spread in open-source ecosystems through different social and technical mechanisms over time. Using a large longitudinal data set, multivariate network science techniques, and survival analysis, we study which socio-technical factors can best explain the observed diffusion process of a number of popular automation tools. Our results show that factors such as social exposure, competition, and observability affect the adoption of tools significantly, and they provide a roadmap for software engineers and researchers seeking to propagate best practices and tools.

Chu-Pan Wong, Jens Meinicke, Leo Chen, João P. Diniz, Christian Kästner, and Eduardo Figueiredo. Efficiently Finding Higher-Order Mutants. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 1165--1177, New York, NY: ACM Press, November 2020. Acceptance rate: 28 % (101/360). [ .pdf, arXiv, doi, teaser, video, bib ]

Higher-order mutation has the potential for improving major drawbacks of traditional first-order mutation, such as by simulating more realistic faults or improving test optimization techniques. Despite interest in studying promising higher-order mutants, such mutants are difficult to find due to the exponential search space of mutation combinations. State-of-the-art approaches rely on genetic search, which is often incomplete and expensive due to its stochastic nature. First, we propose a novel way of finding a complete set of higher-order mutants by using variational execution, a technique that can, in many cases, explore large search spaces completely and often efficiently. Second, we use the identified complete set of higher-order mutants to study their characteristics. Finally, we use the identified characteristics to design and evaluate a new search strategy, independent of variational execution, that is highly effective at finding higher-order mutants even in large code bases.

Jens Meinicke, Juan David Hoyos Rentería, Christian Kästner, and Bogdan Vasilescu. Capture the Feature Flag: Detecting Feature Flags in Open-Source. In Proceedings of the 18th International Conference on Mining Software Repositories (MSR), pages 169--173, New York, NY: ACM Press, May 2020. Acceptance rate: 26 % (45/171). [ .pdf, doi, video, bib ]

Feature flags (a.k.a. feature toggles) are a mechanism to keep new features hidden behind a boolean option during development. Flags are used for many purposes, such as A/B testing and turning off a feature more easily in case of failures. While software engineering feature flags research is burgeoning, examples of software projects using flags rarely come from outside commercial and private projects, stifling academic progress. To address this gap, in this paper we present a novel mining software repositories approach to detect feature flagging open-source projects, based on analyzing the projects' commit messages. We apply our approach to all open-source GitHub projects, identifying 231,223 candidate feature flagging projects, and manually validating 100. We also report on an initial analysis of feature flags in the validated sample of 100 projects, investigating practices that correlate with shorter flag lifespans (typically desirable to reduce technical debt), such as using the issue tracker and having the flag owner (the developer introducing a flag) also be the one removing it.
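
For readers unfamiliar with the mechanism, a feature flag in its simplest form might look like the sketch below. This only illustrates the concept and is not the paper's detection approach; the flag name `new_checkout`, the `FLAGS` dictionary, and the `checkout` function are hypothetical.

```python
# Hypothetical in-memory flag store; real projects often load flags
# from configuration files or a remote flag service.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Return True if the flag is switched on; unknown flags default to off."""
    return flags.get(flag, False)


def checkout(cart, flags=FLAGS):
    """Route to the new code path only when the flag is enabled."""
    if is_enabled("new_checkout", flags):
        return f"new checkout for {len(cart)} items"
    return f"legacy checkout for {len(cart)} items"
```

Because the new path stays dark until the flag is flipped, it can ship with regular releases and be turned off again quickly if it misbehaves.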

Naveen Raman, Minxuan Cao, Yulia Tsvetkov, Christian Kästner, and Bogdan Vasilescu. Stress and Burnout in Open Source: Toward Finding, Understanding, and Mitigating Unhealthy Interactions. In Proceedings of the International Conference on Software Engineering -- New Ideas Track (ICSE-NIER), pages 57--60, May 2020. [ .pdf, doi, video, bib ]

Developers from open-source communities have reported high stress levels from frequent demands for features and bug fixes and the sometimes aggressive tone of these demands. Toxic conversations may demotivate and burn out developers, creating challenges for sustaining open source. We outline a path toward finding, understanding, and possibly mitigating such unhealthy interactions. We take a first step toward finding them, by developing and demonstrating a measurement instrument (an SVM classifier tailored towards the software engineering domain) to detect toxic discussions in GitHub issues. We used our classifier to analyze trends over time and in different GitHub communities, finding that toxicity varies by community and that toxicity decreased from 2012 to 2018.

Jens Meinicke, Chu-Pan Wong, Bogdan Vasilescu, and Christian Kästner. Exploring Differences and Commonalities between Feature Flags and Configuration Options. In Proceedings of the International Conference on Software Engineering -- Software Engineering in Practice Track (ICSE-SEIP), pages 233--242, May 2020. [ .pdf, doi, video, bib ]

Feature flags for continuous deployment and configuration options for customizing software share many similarities, both conceptually and technically. However, neither academic nor practitioner publications seem to distinguish these two concepts. We argue that a distinction is valuable, as applications, goals, and challenges differ fundamentally between feature flags and configuration options. In this work, we explore the differences and commonalities of both concepts to help understand practices and challenges and to help transfer existing solutions (e.g., for testing). To better understand feature flags and how they relate to configuration options, we performed nine semi-structured interviews with feature-flag experts. We discovered a number of distinguishing characteristics but also opportunities for knowledge and technology transfer across both communities. Overall, we think that both communities can learn from each other.

Cassandra Overney, Jens Meinicke, Christian Kästner, and Bogdan Vasilescu. How to Not Get Rich: An Empirical Study of Donations in Open Source. In Proceedings of the 42nd International Conference on Software Engineering (ICSE), pages 1209--1221, New York, NY: ACM Press, May 2020. Acceptance rate: 21 % (129/617). [ .pdf, doi, video, bib ]

Open source is ubiquitous and critical infrastructure, yet funding and sustaining it is challenging. While there are many different funding models for open-source donations and concerted efforts through foundations, donation platforms like Paypal, Patreon, or OpenCollective are popular and low-bar forms to raise funds for open-source development, for which GitHub recently even built explicit support. With a mixed-method study, we explore the emerging and largely unexplored phenomenon of donations in open source: We quantify how commonly open-source projects ask for donations, statistically model characteristics of projects that ask for and receive donations, analyze for what the requested funds are needed and used, and assess whether the received donations achieve the intended outcomes. We find 25,885 projects asking for donations on GitHub, often to support engineering activities; however, we also find no clear evidence that donations influence the activity level of a project. In fact, we find that donations are used in a multitude of ways, raising new research questions about effective funding.

Shurui Zhou, Bogdan Vasilescu, and Christian Kästner. How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub. In Proceedings of the 42nd International Conference on Software Engineering (ICSE), pages 445--456, New York, NY: ACM Press, May 2020. Acceptance rate: 21 % (129/617). [ .pdf, doi, video, bib ]

The notion of forking has changed with the rise of distributed version control systems and social coding environments, like GitHub. Traditionally forking refers to splitting off an independent development branch (which we call hard forks); research on hard forks, conducted mostly in pre-GitHub days, showed that hard forks were often seen critically as they may fragment a community. Today, in social forking environments, open-source developers are encouraged to fork a project in order to integrate contributions to the community (which we call social forks), which may have also influenced perceptions and practices around hard forks. To revisit hard forks, we identify, study and classify 15,306 hard forks on GitHub and interview 18 owners of hard forks or forked repositories. We find that, among others, hard forks often evolve out of social forks rather than being planned deliberately and that perceptions of hard forks have indeed changed dramatically, seeing them often as a positive non-competitive alternative to the original project.

Christian Kästner and Eunsuk Kang. Teaching Software Engineering for AI-Enabled Systems. In Proceedings of the International Conference on Software Engineering -- Software Engineering Education and Training Track (ICSE-SEET), pages 45--48, New York, NY: ACM Press, May 2020. Acceptance rate: 23 % (21/90). [ .pdf, arXiv, doi, video, bib ]

Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that scale and are responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers also ethics and fairness. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.

Markos Viggiato, Johnatan Oliveira, Eduardo Figueiredo, Pooyan Jamshidi, and Christian Kästner. How Do Code Changes Evolve in Different Platforms? A Mining-based Investigation. In Proceedings of the 35th International Conference on Software Maintenance and Evolution (ICSME), pages 218--222, September 2019. [ doi, http, bib ]

Software developed for different platforms has different characteristics and needs. More specifically, code changes are performed differently on the mobile platform than on non-mobile platforms (e.g., desktop and Web platforms). Prior work has investigated the differences in specific platforms. However, we still lack a deeper understanding of how code changes evolve across different software platforms. In this paper, we present a study investigating the frequency of changes and how source code changes, build changes, and test changes co-evolve in mobile and non-mobile platforms. We developed linear regression models to explain which factors influence the frequency of changes in different platforms and applied the Apriori algorithm to find types of changes that frequently occur together. Our findings show that non-mobile repositories have a higher number of commits per month than mobile repositories, and our regression models suggest that being mobile has a significant negative effect on the number of commits when controlling for confounding factors, such as code size. We also found that developers do not usually change source code files together with build files or test files. We argue that our results can provide valuable information for developers on how changes are performed in different platforms so that practices adopted in successful software systems can be followed.
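As a rough illustration of the co-change mining mentioned in this abstract, the following Python sketch computes the first two levels of an Apriori-style frequent-itemset search. The commit data, the change-type labels, and the support threshold are hypothetical and are not taken from the paper; the study's actual tooling is not described here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commits, each labeled with the kinds of files it touched.
commits = [
    {"source"},
    {"source", "test"},
    {"source"},
    {"build"},
    {"source", "test", "build"},
    {"source"},
]

def frequent_itemsets(transactions, min_support):
    """Return frequent 1- and 2-itemsets mapped to their support (fraction of transactions)."""
    n = len(transactions)
    # Level 1: support of each single change type.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c / n for i, c in counts.items() if c / n >= min_support}
    # Level 2: build candidate pairs only from frequent single items (Apriori pruning).
    items = sorted(i for s in frequent for i in s)
    for a, b in combinations(items, 2):
        support = sum(1 for t in transactions if a in t and b in t) / n
        if support >= min_support:
            frequent[frozenset([a, b])] = support
    return frequent

result = frequent_itemsets(commits, min_support=0.3)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

In this toy data, source and test changes co-occur often enough to pass the threshold, while build changes rarely accompany either.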

Shurui Zhou, Bogdan Vasilescu, and Christian Kästner. What the Fork: A Study of Inefficient and Efficient Forking Practices in Social Coding. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 350--361, New York, NY: ACM Press, August 2019. Acceptance rate: 24 % (74/303). [ .pdf, doi, http, bib ]

Forking and pull requests have been widely used in open-source communities as uniform development and contribution mechanisms, which give developers the flexibility to modify their own fork without affecting others. However, some projects observe severe inefficiencies, including lost and duplicate contributions and fragmented communities. We observed that different communities experience these inefficiencies to widely different degrees, and the practitioners we interviewed indicated several project characteristics and practices, including modularity and coordination mechanisms, that may encourage more efficient forking practices. In this paper, we explore how open-source projects on GitHub differ with regard to forking inefficiencies. Using logistic regression models, we analyzed the association of context factors with the inefficiencies and found that better modularity and centralized management can encourage more contributions and a higher fraction of accepted pull requests, suggesting specific good practices that project maintainers can adopt to reduce forking-related inefficiencies in their community.

David Widder, Michael Hilton, Christian Kästner, and Bogdan Vasilescu. Integrating and Testing the Literature: A Conceptual Replication of Continuous Integration Pain Points. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 647--658, New York, NY: ACM Press, August 2019. Acceptance rate: 24 % (74/303). [ .pdf, doi, http, bib ]

Continuous integration (CI) is an established software quality assurance practice, and the focus of much prior research with a diverse range of methods and populations. In this paper, we conduct a literature review of 37 papers on CI pain points. We then conduct a conceptual replication study on results from these papers using a triangulation design consisting of a survey with 132 responses, 12 interviews, and two logistic regressions predicting CI abandonment and switching on a dataset of 6,239 GitHub projects. We report and discuss which past results we were able to replicate, those for which we found conflicting evidence, those for which we did not find evidence, and the implications of these cases on future CI research, CI tool builders, and CI users.

Gabriel Ferreira, Christian Kästner, Joshua Sunshine, Sven Apel, and William Scherlis. Design Dimensions for Software Certification: A Grounded Analysis. Technical Report 1905.09760, arXiv, May 2019. [ .pdf, arXiv, bib ]

In many domains, software systems cannot be deployed until authorities judge them fit for use in an intended operating environment. Certification standards and processes have been devised and deployed to regulate operations of software systems and prevent their failures. However, practitioners are often unsatisfied with the efficiency and value proposition of certification efforts. In this study, we compare two certification standards, Common Criteria and DO-178C, and collect insights from literature and from interviews with subject-matter experts to identify design options relevant to the design of standards. The results of the comparison of certification efforts—leading to the identification of design dimensions that affect their quality—serve as a framework to guide the comparison, creation, and revision of certification standards and processes. This paper puts software engineering research in context and discusses key issues around process and quality assurance and includes observations from industry about relevant topics such as recertification, timely evaluations, but also technical discussions around model-driven approaches and formal methods. Our initial characterization of the design space of certification efforts can be used to inform technical discussions and to influence the directions of new or existing certification efforts. Practitioners, technical commissions, and government can directly benefit from our analytical framework.

Sergiy S. Kolesnikov, Norbert Siegmund, Christian Kästner, and Sven Apel. On the Relation of Control-flow and Performance Feature Interactions: A Case Study. Empirical Software Engineering (EMSE), (24):2410--2437, 2019. [ .pdf, arXiv, doi, http, bib ]

Detecting feature interactions is imperative for accurately predicting performance of highly-configurable systems. State-of-the-art performance prediction techniques rely on supervised machine learning for detecting feature interactions, which, in turn, relies on time-consuming performance measurements to obtain training data. By providing information about potentially interacting features, we can reduce the number of required performance measurements and make the overall performance prediction process more time efficient. We expect that information about potentially interacting features can be obtained by analyzing the source code of a highly-configurable system, which is computationally cheaper than performing multiple performance measurements. To this end, we conducted an in-depth qualitative case study on two real-world systems (mbedTLS and SQLite), in which we explored the relation between internal (precisely control-flow) feature interactions, detected through static program analysis, and external (precisely performance) feature interactions, detected by performance-prediction techniques using performance measurements. We found that a relation exists that can potentially be exploited to predict performance interactions.

Jonathan Aldrich, Joydeep Biswas, Javier Cámara, David Garlan, Arjun Guha, Jarrett Holtz, Pooyan Jamshidi, Christian Kästner, Claire Le Goues, Anahita Mohseni-Kabir, Ivan Ruchkin, Selva Samuel, Bradley Schmerl, Christopher Timperley, Manuela Veloso, and Ian Voysey. Model-based Adaptation for Robotics Software. IEEE Software (IEEE-Sw), 36(2):83--90, 2019. [ .pdf, doi, bib ]

We developed model-based adaptation, an approach that leverages models of software and its environment to enable automated adaptation. The goal of our approach is to build long-lasting software systems that can effectively adapt to changes in their environment.

Pooyan Jamshidi, Javier Cámara, Bradley Schmerl, Christian Kästner, and David Garlan. Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Autonomous Robots. In Proceedings of the 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), pages 39--50, May 2019. [ .pdf, doi, http, bib ]

Modern cyber-physical systems (e.g., robotics systems) are typically composed of physical and software components, the characteristics of which are likely to change over time. Assumptions about parts of the system made at design time may not hold at run time, especially when a system is deployed for long periods (e.g., over decades). Self-adaptation is designed to find reconfigurations of systems to handle such run-time inconsistencies. Planners can be used to find and enact optimal reconfigurations in such an evolving context. However, for systems that are highly configurable, such planning becomes intractable due to the size of the adaptation space. To overcome this challenge, in this paper we explore an approach that (a) uses machine learning to find Pareto-optimal configurations without needing to explore every configuration, and (b) restricts the search space to such configurations to make planning tractable. We explore this in the context of robot missions that need to consider task timeliness and energy consumption. An independent evaluation shows that our approach results in high-quality adaptation plans in uncertain and adversarial environments.
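To make the notion of Pareto-optimal configurations concrete, here is a minimal Python sketch that filters a small set of robot configurations by mission time and energy consumption. The configuration names and numbers are hypothetical, and the sketch enumerates every configuration by brute force; the approach in the abstract instead uses machine learning precisely to avoid exploring every configuration.

```python
from typing import List, Tuple

# (name, mission time in seconds, energy in watt-hours); all values are hypothetical.
Config = Tuple[str, float, float]

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on at least one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]

configs = [
    ("fast", 10.0, 9.0),
    ("balanced", 14.0, 6.0),
    ("frugal", 20.0, 4.0),
    ("wasteful", 21.0, 9.5),       # dominated by "fast"
    ("slow-balanced", 15.0, 6.5),  # dominated by "balanced"
]

front = pareto_front(configs)
print([name for name, _, _ in front])
```

A planner restricted to the resulting front only has to choose among trade-offs between timeliness and energy, not among all configurations.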

Markos Viggiato, Johnatan Oliveira, Eduardo Figueiredo, Pooyan Jamshidi, and Christian Kästner. Understanding Similarities and Differences in Software Development Practices Across Domains. In Proceedings of the 14th ACM/IEEE International Conference on Global Software Engineering (ICGSE), pages 74--84, May 2019. [ .pdf, doi, http, bib ]

Since software engineering is not a homogeneous whole, we expect that development practices are adopted differently across domains. However, little is known about how practices are followed in different software domains (e.g., healthcare, banking, and oil and gas). In this paper, we report the results of an exploratory and inductive study, in which we seek differences and similarities regarding the adoption of several practices across 13 domains. We interviewed 19 developers with experience in multiple domains (i.e., cross-domain developers) from large multinational companies, such as Facebook, Google, and Macy's. We also ran a Web survey to confirm (or not) the interview results. Our findings show that, in fact, different domains adopt practices in a different fashion. We identified that continuous integration practices are interrupted during important commerce periods (e.g., Black Friday) in the financial domains. We also noticed that the company's culture and policies, rather than the domain itself, strongly influence the adopted practices. Our study also has important implications for practice. For instance, companies should provide targeted training for their development teams, and new interdisciplinary courses in software engineering and other domains, such as healthcare, are highly recommended.

Kalil Garrett, Gabriel Ferreira, Christian Kästner, Joshua Sunshine, and Limin Jia. Detecting Suspicious Package Updates. In Proceedings of the International Conference on Software Engineering -- New Ideas Track (ICSE-NIER), pages 13--16, May 2019. [ .pdf, doi, bib ]

With an increased level of automation provided by package managers, which sometimes allow updates to be installed automatically, malicious package updates are becoming a real threat in software ecosystems. To address this issue, we propose an approach based on anomaly detection, to identify suspicious updates based on security-relevant features that attackers could use in an attack. We evaluate our approach in the context of Node.js/npm ecosystem, to show its feasibility in terms of reduced review effort and the correct identification of a confirmed malicious update attack. Although we do not expect it to be a complete solution in isolation, we believe it is an important security building block for software ecosystems.

Courtney Miller, David Widder, Christian Kästner, and Bogdan Vasilescu. Why Do People Give Up FLOSSing? A Study of Contributor Disengagement in Open Source. In Proceedings of the 15th International Conference on Open Source Systems (OSS), pages 116--129, May 2019. [ .pdf, doi, http, bib ]

Established contributors are the backbone of many free/libre open source software (FLOSS) projects. Previous research has shown that it is critically important to retain contributors and has also revealed motives behind why contributors choose to participate in FLOSS in the first place. However, there has been limited research done on the reasons why established contributors disengage, and factors (on an individual and project level) that predict their disengagement. In this paper, we conduct a mixed-methods empirical study, combining surveys and survival modeling, in order to identify reasons and predictive factors behind established contributor disengagement. We find that different groups of contributors tend to disengage for different reasons; overall, however, contributors most commonly cite some kind of transition (e.g., switching jobs or leaving academia). We also find that factors such as the popularity of the projects a contributor works on, whether they have experienced a transition, when they work, and how much they work can all be used to predict their disengagement from open source.

Luyao Ren, Shurui Zhou, Christian Kästner, and Andrzej Wąsowski. Identifying Redundancies in Fork-based Development. In Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 230--241, 2019. [ .pdf, doi, bib ]

Fork-based development is popular and easy to use, but makes it difficult to maintain an overview of the whole community when the number of forks increases, which leads to redundant development where multiple developers are solving the same problem in parallel without being aware of each other. Redundant development wastes effort for both maintainers and developers. In this paper, we designed an approach to identify redundant code changes in forks as early as possible by extracting clues indicating similarities between code changes, and building a machine learning model to predict redundancies. We evaluated the effectiveness from both the maintainer's and the developer's perspectives. The results show that we achieve 57%-83% precision for detecting duplicate code changes from the maintainer's perspective, and that we could save developers 1.9-3.0 commits of effort on average. We also show that our approach significantly outperforms the existing state of the art.

Marianne Huchard, Christian Kästner, and Gordon Fraser, editors. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018. New York, NY: ACM Press, September 2018. [ http, bib ]

Christian Kästner, and Aniruddha S. Gokhale, editors. Proceedings of the 2015 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2015, Pittsburgh, PA, USA, October 26-27, 2015. New York, NY: ACM Press, October 2015. [ http, bib ]

Jaakko Järvi, and Christian Kästner, editors. Generative Programming: Concepts and Experiences, GPCE'13, Indianapolis, IN, USA - October 27 - 28, 2013. New York, NY: ACM Press, October 2013. [ http, bib ]

Flávio Medeiros, Gabriel Lima, Guilherme Amaral, Sven Apel, Christian Kästner, Márcio Ribeiro, and Rohit Gheyi. An Investigation of Misunderstanding Code Patterns in C Open-Source Software Projects. Empirical Software Engineering (EMSE), 24(4):1693--1726, August 2019. [ .pdf, doi, http, bib ]

Maintenance consumes 40% to 80% of software development costs. It is therefore essential to write source code that is easy to understand in order to reduce maintenance costs. Improving code understanding is important because developers often mistake the meaning of code and misjudge the program behavior, which can lead to errors. There are patterns in source code, such as operator precedence and the comma operator, that have been shown to influence code understanding negatively. Despite initial results, however, these patterns have not been evaluated in a real-world setting. Thus, it is not clear whether developers agree that the patterns studied by researchers can cause substantial misunderstandings in real-world practice. To better understand the relevance of misunderstanding patterns, we applied a mixed-methods research approach, performing repository mining and a survey with developers, to evaluate misunderstanding patterns in 50 C open-source projects, including Apache, OpenSSL, and Python. Overall, we found more than 109K occurrences of the 12 patterns in practice. Our study shows that, according to developers, only some patterns considered previously by researchers may cause misunderstandings. Our results complement previous studies by taking the perception of developers into account.

Leo Chen. Finding Higher Order Mutants Using Variational Execution. Technical Report 1809.04563, arXiv, September 2018. SPLASH Student research competition. [ .pdf, arXiv, bib ]

Mutation testing is an effective but time consuming method for gauging the quality of a test suite. It functions by repeatedly making changes, called mutants, to the source code and checking whether the test suite fails (i.e., whether the mutant is killed). Recent work has shown cases in which applying multiple changes, called a higher order mutation, is more difficult to kill than a single change, called a first order mutation. Specifically, a special kind of higher order mutation, called a strongly subsuming higher order mutation (SSHOM), can enable equivalent accuracy in assessing the quality of the test suite with fewer executions of tests. Little is known about these SSHOMs, as they are difficult to find. Our goal in this research is to identify a faster, more reliable method for finding SSHOMs in order to characterize them in the future. We propose an approach based on variational execution to find SSHOMs. Preliminary results indicate that variational execution performs better than the existing genetic algorithm in terms of speed and completeness of results. Out of a set of 33 first order mutations, our variational execution approach finds all 38 SSHOMs in 4.5 seconds, whereas the genetic algorithm only finds 36 of the 38 SSHOMs in 50 seconds.
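The terminology in this abstract can be illustrated with a small Python sketch. It assumes the definition commonly used in the higher-order mutation testing literature (not stated in the abstract itself): a higher-order mutant is strongly subsuming if the set of tests that kill it is non-empty and is a subset of the tests that kill every one of its constituent first-order mutants. The toy program, mutants, and test inputs are hypothetical, and the sketch checks one candidate by brute force; it does not show variational execution.

```python
def original(x):
    return (x + 1) * 2

def fom_a(x):
    # First-order mutant: '+' replaced by '-'.
    return (x - 1) * 2

def fom_b(x):
    # First-order mutant: constant 2 replaced by -2.
    return (x + 1) * -2

def hom_ab(x):
    # Higher-order mutant: both changes applied together.
    return (x - 1) * -2

tests = [0, 1, 2]  # hypothetical test inputs; a test kills a mutant if outputs differ

def killing_tests(mutant):
    """Test inputs on which the mutant's output differs from the original's."""
    return {t for t in tests if mutant(t) != original(t)}

def is_strongly_subsuming(hom, foms):
    """Assumed definition: non-empty kill set contained in every constituent's kill set."""
    hom_kills = killing_tests(hom)
    common = set.intersection(*(killing_tests(m) for m in foms))
    return bool(hom_kills) and hom_kills <= common

print(killing_tests(fom_a), killing_tests(fom_b), killing_tests(hom_ab))
print(is_strongly_subsuming(hom_ab, [fom_a, fom_b]))
```

Here the two changes cancel each other for the input 0, so the combined mutant is killed by fewer tests than either single change, which is what makes it harder to kill.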

Asher Trockman. Adding sparkle to social coding: an empirical study of repository badges in the npm ecosystem. In Proceedings of the 40th International Conference on Software Engineering (ICSE-SRC), pages 524--526, New York, NY: ACM Press, 2017. ICSE Student research competition, first place. [ .pdf, doi, http, bib ]

Contemporary software development is characterized by increased reuse and speed. Open source software forges such as GitHub host millions of repositories of libraries and tools, which developers reuse liberally, creating complex and often fragile networks of interdependencies. Hence, developers must make more decisions at a higher speed, finding which libraries to depend on and which projects to contribute to. This decision-making process is supported by the transparency provided by social coding platforms like GitHub, where user profile pages display information on one's contributions, and repository pages provide information on a project's social standing (e.g., through stars and watchers).

Lukas Lazarek. How to Efficiently Process $2^{100}$ List Variations. In Proceedings of the 2017 ACM SIGPLAN Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH-SRC), pages 36--38, New York, NY: ACM Press, 2017. SPLASH Student research competition, first place. [ .pdf, doi, http, bib ]

Variational execution offers an avenue of efficiently analyzing configurable systems, but data structures like lists require special consideration. We implement automatic substitution of a more efficient list representation in a variational execution framework and evaluate its performance in micro-benchmarks. The results suggest that the substitution may offer substantial performance improvements to programs involving highly variational lists.

Larissa Rocha Soares, Jens Meinicke, Sarah Nadi, Christian Kästner, and Eduardo Santana de Almeida. Exploring Feature Interactions Without Specifications: A Controlled Experiment. In Proceedings of the 17th ACM International Conference on Generative Programming and Component Engineering (GPCE), pages 41--52, New York, NY: ACM Press, 2018. [ .pdf, doi, http, bib ]

In highly configurable systems, features may interact unexpectedly and produce faulty behavior. Those faults are not easily identified from the analysis of each feature separately, especially when feature specifications are missing. We propose VarXplorer, a dynamic and iterative approach to detect suspicious interactions. It provides information on how features impact the control and data flow of the program. VarXplorer supports developers with a graph that visualizes this information, mainly showing suppress and require relations between features. To evaluate whether VarXplorer helps improve the performance of identifying suspicious interactions, we perform a controlled study with 24 subjects. We find that with our proposed feature-interaction graphs, participants are able to identify suspicious interactions more than 3 times faster compared to the state-of-the-art tool.

Hung Viet Nguyen, Hung Dang Phan, Christian Kästner, and Tien N. Nguyen. Exploring Output-Based Coverage for Testing PHP Web Applications. Automated Software Engineering -- An International Journal (AUSE), 26(1):59--85, March 2019. [ .pdf, doi, bib ]

In software testing, different testers focus on different aspects of the software such as functionality, performance, design, and other attributes. While many tools and coverage metrics exist to support testers at the code level, not much support is targeted for testers who want to inspect the output of a program such as a dynamic web application. To support this category of testers, we propose a family of output-coverage metrics (similar to statement, branch, and path coverage metrics on code) that measure how much of the possible output has been produced by a test suite and what parts of the output are still uncovered. To do that, we first approximate the output universe using our existing symbolic execution technique. Then, given a set of test cases, we map the produced outputs onto the output universe to identify the covered and uncovered parts and compute output-coverage metrics. In our empirical evaluation on seven real-world PHP web applications, we show that selecting test cases by output coverage is more effective at identifying presentation faults such as HTML validation errors and spelling errors than selecting test cases by traditional code coverage. In addition, to help testers understand output coverage and augment test cases, we also develop a tool called WebTest that displays the output universe in one single web page and allows testers to visually explore covered and uncovered parts of the output.

Alexander von Rhein, Jörg Liebig, Andreas Janker, Christian Kästner, and Sven Apel. Variability-Aware Static Analysis at Scale: An Empirical Study. ACM Transactions on Software Engineering and Methodology (TOSEM), 27(4):Article No. 18, 2018. [ .pdf, doi, bib ]

The advent of variability management and generator technology enables users to derive individual system variants from a configurable code base by selecting desired configuration options. This approach gives rise to the generation of possibly billions of variants, which, however, cannot be efficiently analyzed for bugs and other properties with classic analysis techniques. To address this issue, researchers and practitioners have developed sampling heuristics and, recently, variability-aware analysis techniques. While sampling reduces the analysis effort significantly, the information obtained is necessarily incomplete, and it is unknown whether state-of-the-art sampling techniques scale to billions of variants. Variability-aware analysis techniques process the configurable code base directly, exploiting similarities among individual variants with the goal of reducing analysis effort. However, while being promising, so far, variability-aware analysis techniques have been applied mostly only to small academic examples. To learn about the mutual strengths and weaknesses of variability-aware and sample-based static-analysis techniques, we compared the two by means of seven concrete control-flow and data-flow analyses, applied to five real-world subject systems: BusyBox, OpenSSL, SQLite, the x86 Linux kernel, and uclibc. In particular, we compare the efficiency (analysis execution time) of the static analyses and their effectiveness (potential bugs found). Overall, we found that variability-aware analysis outperforms most sample-based static-analysis techniques with respect to efficiency and effectiveness. For example, checking all variants of OpenSSL with a variability-aware static analysis is faster than checking even only two variants with an analysis that does not exploit similarities among variants.

Chu-Pan Wong, Jens Meinicke, Lukas Lazarek, and Christian Kästner. Faster Variational Execution with Transparent Bytecode Transformation. Proceedings of the ACM on Programming Languages, Issue OOPSLA (OOPSLA), 2:117:1--117:30, 2018. [ .pdf, doi, bib ]

Variational execution is a novel dynamic analysis technique for exploring highly configurable systems and accurately tracking information flow. It is able to efficiently analyze many configurations by aggressively sharing redundancies of program executions. The idea of variational execution has been demonstrated to be effective in exploring variations in the program, especially when the configuration space grows out of control. Existing implementations of variational execution often require heavy lifting of the runtime interpreter, which is painstaking and error-prone. Furthermore, the performance of this approach is suboptimal. For example, the state-of-the-art variational execution interpreter for Java, VarexJ, slows down executions by 100 to 800 times over a single execution for small to medium size Java programs. Instead of modifying existing JVMs, we propose to transform existing bytecode to make it variational, so it can be executed on an unmodified commodity JVM. Our evaluation shows a dramatic improvement in performance over the state of the art, with a speedup of 2 to 46 times, and high efficiency in sharing computations.

Chu-Pan Wong, Jens Meinicke, and Christian Kästner. Beyond Testing Configurable Systems: Applying Variational Execution to Automatic Program Repair and Higher Order Mutation Testing. In Proceedings of the 26th International Symposium on Foundations of Software Engineering -- New Ideas Track (FSE-NIER), pages 749--753, November 2018. [ .pdf, doi, bib ]

Norman Peitek, Janet Siegmund, Sven Apel, Christian Kästner, Chris Parnin, Anja Bethmann, Thomas Leich, Gunter Saake, and André Brechmann. A Look into Programmers’ Heads. IEEE Transactions on Software Engineering (TSE), 46(4):442--462, April 2018. [ .pdf, doi, bib ]

Program comprehension is an important, but hard to measure cognitive process. This makes it difficult to provide suitable programming languages, tools, or coding conventions to support developers in their everyday work. Here, we explore whether functional magnetic resonance imaging (fMRI) is feasible for soundly measuring program comprehension. To this end, we observed 17 participants inside an fMRI scanner while they were comprehending source code. The results show a clear, distinct activation of five brain regions, which are related to working memory, attention, and language processing, which all fit well to our understanding of program comprehension. Furthermore, we found reduced activity in the default mode network, indicating the cognitive effort necessary for program comprehension. We also observed that familiarity with Java as underlying programming language reduced cognitive effort during program comprehension. To gain confidence in the results and the method, we replicated the study with 11 new participants and largely confirmed our findings. Our results encourage us and, hopefully, others to use fMRI to observe programmers and, in the long run, answer questions, such as: How should we train programmers? Can we train someone to become an excellent programmer? How effective are new languages and tools for program comprehension?

Jens Meinicke, Chu-Pan Wong, Christian Kästner, and Gunter Saake. Understanding Differences among Executions with Variational Traces. Technical Report 1807.03837, arXiv, July 2018. [ .pdf, arXiv, bib ]

One of the main challenges of debugging is to understand why the program fails for certain inputs but succeeds for others. This becomes especially difficult if the fault is caused by an interaction of multiple inputs. To debug such interaction faults, it is necessary to understand the individual effect of each input, how these inputs interact, and how these interactions cause the fault. The differences between two execution traces can explain why one input behaves differently than the other. We propose to compare execution traces of all input options to derive explanations of the behavior of all options and interactions among them. To make the relevant information stand out, we represent them as variational traces that concisely represent control-flow and data-flow differences among multiple concrete traces. While variational traces can be obtained from brute-force execution of all relevant inputs, we use variational execution to scale the generation of variational traces to the exponential space of possible inputs. We further provide an Eclipse plugin, Varviz, that enables users to use variational traces for debugging and navigation. In a user study, we show that users of variational traces finish debugging tasks more than twice as fast as users of the standard Eclipse debugger. We further show that variational traces can be scaled to programs with many options.

Pooyan Jamshidi, Miguel Velez, Christian Kästner, and Norbert Siegmund. Learning to Sample: Exploiting Similarities Across Environments to Learn Performance Models for Configurable Systems. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 71--82, New York, NY: ACM Press, November 2018. Acceptance rate: 21 % (61/289). [ .pdf, doi, bib ]

Most software systems provide options that allow users to tailor the system in terms of functionality and qualities. The increased flexibility raises challenges for understanding the configuration space and the effects of options and their interactions on performance and other non-functional properties. To identify how options and interactions affect the performance of a system, several sampling and learning strategies have been recently proposed. However, existing approaches usually assume a fixed environment (hardware, workload, version) such that learning has to be repeated when the environment changes. Repeating learning and measurement for each environment is expensive and often practically infeasible. Instead, we pursue a strategy that transfers knowledge across environments, but sidesteps heavyweight and expensive transfer-learning strategies. Based on empirical insights about common relationships regarding (i) influential options, (ii) their interactions, and (iii) their performance distributions, our approach L2S (Learning to Sample) selects better samples in the target environment based on information from the source environment. It progressively shrinks the configuration space and adaptively concentrates on interesting regions of the configuration space. With both synthetic benchmarks and several real systems, we demonstrate that L2S outperforms state-of-the-art performance learning and transfer-learning approaches in terms of measurement effort and learning accuracy.

Luyao Ren, Shurui Zhou, and Christian Kästner. Poster: Forks Insight: Providing an Overview of GitHub Forks. In Proceedings of the International Conference on Software Engineering (ICSE), pages 179--180, New York, NY: ACM Press, 2018. Poster. [ .pdf, doi, bib ]

Allan Mori, Gustavo Vale, Markos Viggiato, Johnatan Oliveira, Eduardo Figueiredo, Elder Cirilo, Pooyan Jamshidi, and Christian Kästner. Evaluating Domain-Specific Metric Thresholds: An Empirical Study. In Proceedings of the International Conference on Technical Debt (TechDebt), pages 41--50, New York, NY: ACM Press, May 2018. [ .pdf, doi, bib ]

Software metrics and thresholds provide means to quantify several quality attributes of software systems. Indeed, they have been used in a wide variety of methods and tools for detecting different sorts of technical debt, such as code smells. Unfortunately, these methods and tools do not take into account characteristics of software domains, such as the intrinsic complexity of geo-localization and scientific software systems or the simple protocols employed by messaging applications. Instead, they rely on generic thresholds that are derived from heterogeneous systems. Although derivation of reliable thresholds has long been a concern, we still lack empirical evidence about threshold variation across distinct software domains. To tackle this limitation, this paper investigates whether and how thresholds vary across domains by presenting a large-scale study on 3,107 software systems from 15 domains. We analyzed the derivation and distribution of thresholds based on 8 well-known source code metrics. As a result, we observed that software domain and size are relevant factors to be considered when building benchmarks for threshold derivation. Moreover, we also observed that domain-specific metric thresholds are more appropriate than generic ones for code smell detection.

David Widder, Michael Hilton, Christian Kästner, and Bogdan Vasilescu. I’m Leaving You, Travis: A Continuous Integration Breakup Story. In Proceedings of the 16th International Conference on Mining Software Repositories (MSR), pages 165--169, New York, NY: ACM Press, May 2018. Acceptance rate: 33 % (48/145). [ .pdf, doi, bib ]

Continuous Integration (CI) services, which can automatically build, test, and deploy software projects, are an invaluable asset in distributed teams, increasing productivity and helping to maintain code quality. Prior work has shown that CI pipelines can be sophisticated, and choosing and configuring a CI system involves tradeoffs. As CI technology matures, new CI tool offerings arise to meet the distinct wants and needs of software teams, as they negotiate a path through these tradeoffs, depending on their context. In this paper, we begin to uncover these nuances, and tell the story of open-source projects falling out of love with Travis, the earliest and most popular cloud-based CI system. Using logistic regression, we quantify the effects that open-source community factors and project technical factors have on the rate of Travis abandonment. We find that increased build complexity reduces the chances of abandonment, that larger projects abandon at higher rates, and that a project’s dominant language has significant but varying effects. Finally, we find the surprising result that metrics of configuration attempts and knowledge dispersion in the project do not affect the rate of abandonment.

Asher Trockman, Keenen Cates, Mark Mozina, Tuan Nguyen, Christian Kästner, and Bogdan Vasilescu. "Automatically Assessing Code Understandability" Reanalyzed: Combined Metrics Matter. In Proceedings of the 16th International Conference on Mining Software Repositories (MSR), pages 314--318, New York, NY: ACM Press, May 2018. Acceptance rate: 33 % (48/145). [ .pdf, doi, bib ]

Previous research shows that developers spend most of their time understanding code. Despite the importance of code understandability for maintenance-related activities, an objective measure of it remains an elusive goal. Recently, Scalabrino et al. reported on an experiment with 46 Java developers designed to evaluate metrics for code understandability. The authors collected and analyzed data on more than a hundred features describing the code snippets, the developers’ experience, and the developers’ performance on a quiz designed to assess understanding. They concluded that none of the metrics considered can individually capture understandability. Expecting that understandability is better captured by a combination of multiple features, we present a reanalysis of the data from the Scalabrino et al. study, in which we use different statistical modeling techniques. Our models suggest that some computed features of code, such as those arising from syntactic structure and documentation, have a small but significant correlation with understandability. Further, we construct a binary classifier of understandability based on various interpretable code features, which has a small amount of discriminating power. Our encouraging results, based on a small data set, suggest that a useful metric of understandability could feasibly be created, but more data is needed.

Sergiy S. Kolesnikov, Norbert Siegmund, Christian Kästner, Alexander Grebhahn, and Sven Apel. Tradeoffs in Modeling Performance of Highly-Configurable Software Systems. International Journal on Software and Systems Modeling (SOSYM), 18(3):2265--2283, 2019. [ .pdf, doi, http, bib ]

Modeling the performance of a highly-configurable software system requires capturing the influences of its configuration options and their interactions on the system's performance. Performance-influence models quantify these influences, explaining this way the performance behavior of a configurable system as a whole. To be useful in practice, a performance-influence model should have a low prediction error, small model size, and reasonable computation time. Because of the inherent tradeoffs among these properties, optimizing for one property may negatively influence the others. It is unclear, though, to what extent these tradeoffs manifest themselves in practice, that is, whether a large configuration space can be described accurately only with large models and significant resource investment. By means of 10 real-world highly-configurable systems from different domains, we have systematically studied the tradeoffs between the three properties. Surprisingly, we found that the tradeoffs between prediction error and model size and between prediction error and computation time are rather marginal. That is, we can learn accurate and small models in reasonable time, so that one performance-influence model can fit different use cases, such as program comprehension and performance prediction. We further investigated the reasons for why the tradeoffs are marginal. We found that interactions among four or more configuration options have only a minor influence on the prediction error and that ignoring them when learning a performance-influence model can save a substantial amount of computation time, while keeping the model small without considerably increasing the prediction error. This is an important insight for new sampling and learning techniques as they can focus on specific regions of the configuration space and find a sweet spot between accuracy and effort. 
We further analyzed the causes for the configuration options and their interactions having the observed influences on the systems' performance. We were able to identify several patterns across subject systems, such as dominant configuration options and data pipelines, that explain the influences of highly influential configuration options and interactions, and give further insights into the domain of highly-configurable systems.
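
As a rough illustration of the idea of a performance-influence model described above, the following minimal sketch represents a model as a sum of terms, one per option or interaction, and shows how dropping higher-order interaction terms shrinks the model. The option names, influence values, and the helper names `predict` and `prune` are hypothetical and not taken from the paper; the paper's actual learning procedure is not shown.

```python
# Hypothetical model: each term maps a set of options to its influence (seconds).
# The empty set is the base performance; sets with two options are interactions.
MODEL = {
    frozenset(): 10.0,
    frozenset({"compression"}): 4.0,
    frozenset({"encryption"}): 2.5,
    frozenset({"compression", "encryption"}): 1.5,
}


def predict(model, enabled):
    """Sum the influences of all terms whose options are all enabled."""
    enabled = set(enabled)
    return sum(value for options, value in model.items() if options <= enabled)


def prune(model, max_degree):
    """Drop terms that involve more than max_degree options."""
    return {opts: v for opts, v in model.items() if len(opts) <= max_degree}


if __name__ == "__main__":
    both = {"compression", "encryption"}
    print(predict(MODEL, both))            # full model
    print(predict(prune(MODEL, 1), both))  # model without interaction terms
```

Comparing the two predictions for the same configuration gives a feel for the tradeoff between model size and prediction error that the abstract discusses.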

Asher Trockman, Shurui Zhou, Christian Kästner, and Bogdan Vasilescu. Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem. In Proceedings of the 40th International Conference on Software Engineering (ICSE), pages 511--522, New York, NY: ACM Press, May 2018. Acceptance rate: 21 % (105/502). [ .pdf, doi, http, bib ]

In fast-paced, reuse-heavy software development, the transparency provided by social coding platforms like GitHub is essential to decision making. Developers infer the quality of projects using visible cues, known as signals, collected from personal profile and repository pages. We report on a large-scale, mixed-methods empirical study of npm packages that explores the emerging phenomenon of repository badges, with which maintainers signal underlying qualities about the project to contributors and users. We investigate which qualities maintainers intend to signal and how well badges correlate with those qualities. After surveying developers, mining 294,941 repositories, and applying statistical modeling and time series analysis techniques, we find that non-trivial badges, which display the build status, test coverage, and up-to-dateness of dependencies, are mostly reliable signals, correlating with more tests, better pull requests, and fresher dependencies. Displaying such badges correlates with best practices, but the effects do not always persist.

Shurui Zhou, Ștefan Stănciulescu, Olaf Leßenich, Yingfei Xiong, Andrzej Wąsowski, and Christian Kästner. Identifying Features in Forks. In Proceedings of the 40th International Conference on Software Engineering (ICSE), pages 105--116, New York, NY: ACM Press, May 2018. Acceptance rate: 21 % (105/502). [ .pdf, doi, http, bib ]

Fork-based development has been widely used both in the open-source community and in industry, because it gives developers flexibility to modify their own fork without affecting others. Unfortunately, this mechanism has downsides; when the number of forks becomes large, it is difficult for developers to get or maintain an overview of activities in the forks. Current tools provide little help. We introduce INFOX, an approach that automatically identifies unmerged features in forks and generates an overview of active forks in a project. The approach clusters cohesive code fragments using code and network analysis techniques and uses information-retrieval techniques to label clusters with keywords. The clustering is effective, with 90% accuracy on a set of known features. In addition, a human-subject evaluation shows that INFOX can provide actionable insight for developers of forks.

Larissa Rocha Soares, Jens Meinicke, Sarah Nadi, Christian Kästner, and Eduardo Santana de Almeida. VarXplorer: Lightweight Process for Dynamic Inspection of Feature Interactions. In Proceedings of the 12th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), pages 59--66, 2018. [ .pdf, doi, http, bib ]

Features in highly configurable systems can interact in undesired ways which may result in faults. However, most interactions are not easily detectable as specifications of feature interactions are usually missing. In this paper, we aim to detect interactions and to help create feature-interaction specifications. We use variational execution to observe internal interactions on control and data flow of highly configurable systems. The number of potential interactions can be large and hard to understand, especially as many interactions are benign. To help developers understand these interactions, we propose feature-interaction graphs as a concise representation of all pairwise interactions. We provide two analyses that report suspicious interactions, namely suppress and require interactions. Finally, we propose a specification language that enables developers to define different kinds of allowed and forbidden interactions. Our tool, VarXplorer, provides a visualization of feature-interaction graphs and supports the creation of feature interaction specifications. VarXplorer also provides an iterative analysis of feature interactions allowing developers to focus on suspicious cases.

Olaf Leßenich, Janet Siegmund, Sven Apel, Christian Kästner, and Claus Hunsen. Indicators for Merge Conflicts in the Wild: Survey and Empirical Study. Automated Software Engineering -- An International Journal (AUSE), 25(2):279--313, 2018. [ .pdf, doi, http, bib ]

While the creation of new branches and forks is easy and fast with modern version-control systems, merging is often time-consuming. Especially when dealing with many branches or forks, a prediction of merge costs based on lightweight indicators would be desirable to help developers recognize problematic merging scenarios before potential conflicts become too severe in the evolution of a complex software project. We analyze the predictive power of several indicators, such as the number, size or scattering degree of commits in each branch, derived either from the version-control system or directly from the source code. Based on a survey of 41 developers, we inferred 7 potential indicators to predict the number of merge conflicts. We tested corresponding hypotheses by studying 163 open-source projects, including 21,488 merge scenarios and comprising 49,449,773 lines of code. A notable (negative) result is that none of the 7 indicators suggested by the participants of the developer survey has a predictive power concerning the frequency of merge conflicts. We discuss this and other findings as well as perspectives thereof.

Max Lillack, Christian Kästner, and Eric Bodden. Tracking Load-time Configuration Options. IEEE Transactions on Software Engineering (TSE), 44(12):1269--1291, 2018. [ .pdf, doi, bib ]

Highly configurable software systems are pervasive, although configuration options and their interactions raise complexity of the program and increase maintenance effort. Especially load-time configuration options, such as parameters from command-line options or configuration files, are used with standard programming constructs such as variables and if-statements intermixed with the program’s implementation; manually tracking configuration options from the time they are loaded to the point where they may influence control-flow decisions is tedious and error prone. We design and implement LOTRACK, an extended static taint analysis to track configuration options automatically. LOTRACK derives a configuration map that explains for each code fragment under which configurations it may be executed. An evaluation on Android apps and Java applications from different domains shows that LOTRACK yields high accuracy with reasonable performance. We use LOTRACK to empirically characterize how much of the implementation of Android apps depends on the platform’s configuration options or interactions of these options.

Jafar Al-Kofahi, Suresh Kothari, and Christian Kästner. Four Languages and Lots of Macros: Analyzing Autotools Build Systems. In Proceedings of the 16th ACM International Conference on Generative Programming and Component Engineering (GPCE), pages 176--186, New York, NY: ACM Press, October 2017. Acceptance rate: 32 % (18/56). [ .pdf, doi, http, bib ]

Build systems are crucial for software system development; however, there is a lack of tool support to help with their high maintenance overhead. GNU Autotools are widely used in the open source community, but users face various challenges from their hard-to-comprehend nature and staging of multiple code generation steps, often leading to low-quality and error-prone build code. In this paper, we present a platform AutoHaven to provide a foundation for developers to create analysis tools to help them understand, maintain, and migrate their GNU Autotools build systems. Internally it uses approximate parsing and symbolic analysis of the build logic. We illustrate the use of the platform with two tools: ACSense helps developers to better understand their build systems and ACSniff detects build smells to improve build code quality. Our evaluation shows that AutoHaven can support most GNU Autotools build systems and can detect build smells in the wild.

Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, and Yuvraj Agarwal. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 497--508, Los Alamitos, CA: IEEE Computer Society, November 2017. Acceptance rate: 23 % (88/388). [ .pdf, arXiv, doi, bib ]

Modern software systems provide many configuration options which not only influence their functionality but also non-functional properties such as response-time. To understand and predict the effect of configuration options, several sampling, analysis, and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at lower cost, it is unclear until now why and when transfer learning works for performance modeling and analysis in highly configurable systems. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model of the source environment, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling in the target environment more efficient, e.g., by reducing the dimensionality of the configuration space.
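
To make the "linear transformation" case for small environmental changes more concrete, here is a minimal sketch under stated assumptions. It fits `target = a * source + b` by ordinary least squares from a few target measurements and then reuses the source model for the remaining configurations. The configuration names, performance values, and function names (`fit_linear`, `transfer`) are hypothetical; the study itself is an exploratory analysis and does not prescribe this exact procedure.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def transfer(source, target_measured):
    """Predict target performance for all source configurations.

    source: {config: performance in the source environment}
    target_measured: {config: performance} for a few configurations
    measured in the target environment (at least two distinct values).
    """
    shared = sorted(target_measured)
    a, b = fit_linear([source[c] for c in shared],
                      [target_measured[c] for c in shared])
    return a, b, {c: a * perf + b for c, perf in source.items()}


# Hypothetical data: five configurations known in the source environment,
# only three of them measured in the target environment.
SOURCE = {"c1": 10.0, "c2": 14.0, "c3": 12.0, "c4": 20.0, "c5": 16.0}
TARGET_MEASURED = {"c1": 21.0, "c2": 29.0, "c4": 41.0}

if __name__ == "__main__":
    a, b, predicted = transfer(SOURCE, TARGET_MEASURED)
    print(a, b, predicted["c3"], predicted["c5"])
```

For severe environmental changes, the abstract reports that such a direct mapping is not expected to hold, so a sketch like this would only apply to the mild case.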

Olaf Leßenich, Sven Apel, Christian Kästner, Georg Seibt, and Janet Siegmund. Renaming and Shifted Code in Structured Merging: Looking Ahead for Precision and Performance. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 543--553, Los Alamitos, CA: IEEE Computer Society, November 2017. Acceptance rate: 23 % (88/388). [ .pdf, doi, http, bib ]

Diffing and merging of source-code artifacts is an essential task when integrating changes in software versions. While state-of-the-art line-based tools (e.g., git merge) are fast and independent of the programming language used, they have only a low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language-aware, structured approach that works on abstract syntax trees. However, precise structured merging is NP-hard, especially when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed but their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent while maintaining performance.

Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann. Measuring Neural Efficiency of Program Comprehension. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 140--150, New York, NY: ACM Press, September 2017. [ .pdf, doi, http, bib ]

Most modern software programs cannot be understood in their entirety by a single programmer. Instead, programmers must rely on a set of cognitive processes that aid in seeking, filtering, and shaping relevant information for a given programming task. Several theories have been proposed to explain these processes, such as “beacons,” for locating relevant code, and “plans,” for encoding cognitive models. However, these theories are decades old and lack validation with modern cognitive-neuroscience methods. In this paper, we report on a study using functional magnetic resonance imaging (fMRI) with 11 participants who performed program comprehension tasks. We manipulated experimental conditions related to beacons and layout to isolate specific cognitive processes related to bottom-up comprehension and comprehension based on semantic cues. We found evidence of semantic chunking during bottom-up comprehension and lower activation of brain areas during comprehension based on semantic cues, confirming that beacons ease comprehension.

Christian Kästner. Differential Testing for Variational Analyses: Experience from Developing KConfigReader. Technical Report 1706.09357, arXiv, June 2017. [ .pdf, arXiv, bib ]

Differential testing to solve the oracle problem has been applied in many scenarios where multiple supposedly equivalent implementations exist, such as multiple implementations of a C compiler. If the multiple systems disagree on the output for a given test input, we have likely discovered a bug without ever having to specify what the expected output is. Research on variational analyses (or variability-aware or family-based analyses) can benefit from similar ideas. The goal of most variational analyses is to perform an analysis, such as type checking or model checking, over a large number of configurations much faster than an existing traditional analysis could by analyzing each configuration separately. Variational analyses are very suitable for differential testing, since the existing nonvariational analysis can provide the oracle for test cases that would otherwise be tedious or difficult to write. In this experience paper, I report how differential testing has helped in developing KConfigReader, a tool for translating the Linux kernel's kconfig model into a propositional formula. Differential testing allows us to quickly build a large test base and incorporate external tests that avoided many regressions during development and made KConfigReader likely the most precise kconfig extraction tool available.
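
The oracle idea described above can be sketched in a few lines: run the nonvariational analysis on every configuration by brute force and compare each result against what the variational analysis reports for that configuration. Everything in this sketch is a toy assumption for illustration (the three options, the counting "analysis", and the function names); it is not the KConfigReader test setup.

```python
from itertools import product

OPTIONS = ["A", "B", "C"]


def plain_analysis(config):
    """Reference (nonvariational) analysis of a single configuration.

    Toy stand-in: counts enabled options, where C only counts if A is enabled.
    """
    count = int(config["A"]) + int(config["B"])
    if config["A"] and config["C"]:
        count += 1
    return count


def variational_analysis():
    """Toy variational analysis: one pass yielding (condition, result) pairs.

    Conditions are predicates over a configuration and are expected to be
    mutually exclusive and to cover all configurations.
    """
    return [
        (lambda c: not c["A"] and not c["B"], 0),
        (lambda c: not c["A"] and c["B"], 1),
        (lambda c: c["A"] and not c["B"] and not c["C"], 1),
        (lambda c: c["A"] and (c["B"] != c["C"]), 2),
        (lambda c: c["A"] and c["B"] and c["C"], 3),
    ]


def differential_test(variational, plain, options):
    """Return the configurations on which the two analyses disagree."""
    result = variational()
    mismatches = []
    for values in product([False, True], repeat=len(options)):
        config = dict(zip(options, values))
        got = [r for condition, r in result if condition(config)]
        expected = plain(config)
        if got != [expected]:
            mismatches.append((config, got, expected))
    return mismatches


if __name__ == "__main__":
    print(differential_test(variational_analysis, plain_analysis, OPTIONS))
```

Brute-force enumeration only works for small option sets, which is acceptable for test cases: each test can use a small model while the variational analysis under test still targets large ones.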

Raman Goyal, Gabriel Ferreira, Christian Kästner, and James Herbsleb. Identifying Unusual Commits on GitHub. Journal of Software: Evolution and Process (JSEP), 30(1):, January 2018. [ .pdf, doi, http, bib ]

Transparent environments and social-coding platforms such as GitHub help developers to stay abreast of changes during the development and maintenance phase of a project. In particular, notification feeds can help developers to learn about relevant changes in other projects. Unfortunately, transparent environments can quickly overwhelm developers with too many notifications, such that they lose the important ones in a sea of noise. Complementing existing prioritization and filtering strategies based on binary compatibility and code ownership, we develop an anomaly-detection mechanism to identify unusual commits in a repository that stand out with respect to other changes in the same repository or by the same developer. Among others, we detect exceptionally large commits, commits at unusual times, and commits touching rarely changed file types given the characteristics of a particular repository or developer. We automatically flag unusual commits on GitHub through a browser plugin. In an interactive survey with 173 active GitHub users, rating commits in a project of their interest, we found that, though our unusual score is only a weak predictor of whether developers want to be notified about a commit, information about unusual characteristics of a commit changes how developers regard commits. Our anomaly-detection mechanism is a building block for scaling transparent environments.

Pooyan Jamshidi, Miguel Velez, Christian Kästner, Norbert Siegmund, and Prasad Kawthekar. Transfer Learning for Improving Model Predictions in Highly Configurable Software. In Proceedings of the 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), pages 31--41, Los Alamitos, CA: IEEE Computer Society, May 2017. Acceptance rate: 23 % (14/61). [ .pdf, doi, bib ]

Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost. We define a cost model that transforms the traditional view of model learning into a multi-objective problem that takes into account not only model accuracy but also measurement effort. We evaluate our cost-aware transfer learning solution using real-world configurable software including (i) a robotic system, (ii) 3 different stream processing applications, and (iii) a NoSQL database system. The experimental results demonstrate that our approach can achieve (a) high prediction accuracy as well as (b) high model reliability with only few samples from the target environment.

Flávio Medeiros, Márcio Ribeiro, Rohit Gheyi, Sven Apel, Christian Kästner, Bruno Ferreira, Luiz Carvalho, and Baldoino Fonseca. Discipline Matters: Refactoring of Preprocessor Directives in the #ifdef Hell. IEEE Transactions on Software Engineering (TSE), 44(5):453--469, May 2018. [ .pdf, doi, bib ]

The C preprocessor is used in many C projects to support variability and portability. However, researchers and practitioners criticize the C preprocessor because of its negative effect on code understanding and maintainability and its error proneness. More importantly, the use of the preprocessor hinders the development of tool support that is standard in other languages, such as automated refactoring. Developers aggravate these problems when using the preprocessor in undisciplined ways (e.g., conditional blocks that do not align with the syntactic structure of the code). In this article, we proposed a catalogue of refactorings and we evaluated the number of application possibilities of the refactorings in practice, the opinion of developers about the usefulness of the refactorings, and whether the refactorings preserve behavior. Overall, we found 5670 application possibilities for the refactorings in 63 real-world C projects. In addition, we performed an online survey among 246 developers, and we submitted 28 patches to convert undisciplined directives into disciplined ones. According to our results, 63% of developers prefer to use the refactored (i.e., disciplined) version of the code instead of the original code with undisciplined preprocessor usage. To verify that the refactorings are indeed behavior preserving, we applied them to more than 36 thousand programs generated automatically using a model of a subset of the C language, running the same test cases in the original and refactored programs. Furthermore, we applied the refactorings to three real-world projects: BusyBox, OpenSSL, and SQLite. This way, we detected and fixed a few behavioral changes, 62% caused by unspecified behavior in the C programming language.

Meng Meng, Jens Meinicke, Chu-Pan Wong, Eric Walkingshaw, and Christian Kästner. A Choice of Variational Stacks: Exploring Variational Data Structures. In Proceedings of the 11th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), pages 28--35, 2017. [ .pdf, doi, bib ]

Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make the variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little general understanding of how to design them or what tradeoffs exist among them. In this paper, we take a first step towards a more systematic exploration and analysis of a variational data structure. We want to know how different design decisions affect the performance and scalability of a variational data structure, and what properties of the underlying data and operation sequences need to be considered. Specifically, we study several alternative designs of a variational stack, a data structure that supports efficiently representing and computing with multiple variants of a plain stack, and that is a common building block in many algorithms. The different variational stacks are presented as a small product line organized by three design decisions. We analyze how these design decisions affect the performance of a variational stack with different usage profiles. Finally, we evaluate how these design decisions affect the performance of the variational stack in a real-world scenario: in the interpreter VarexJ when executing real software containing variability.
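
For readers unfamiliar with the concept, the following is a deliberately naive sketch of what a variational stack could look like: a single stack shared by several configurations, where every entry records the configurations in which it is present. This is an illustrative assumption, not necessarily one of the designs studied in the paper; in particular, it enumerates configurations explicitly, whereas a practical implementation would likely use symbolic conditions. The class and method names are hypothetical.

```python
class VariationalStack:
    """One stack shared by many configurations.

    Each entry is [configs, value]: the value is on the stack only in the
    configurations listed in `configs`.
    """

    def __init__(self):
        self._entries = []

    def push(self, value, configs):
        """Push `value` in every configuration of `configs`."""
        self._entries.append([frozenset(configs), value])

    def pop(self, configs):
        """Pop the top element in every configuration of `configs`.

        Returns {config: value}; configurations whose stack is empty
        are omitted from the result.
        """
        remaining = set(configs)
        popped = {}
        for entry in reversed(self._entries):
            hit = entry[0] & remaining
            if hit:
                for config in hit:
                    popped[config] = entry[1]
                entry[0] = entry[0] - hit  # value is gone in these configs
                remaining -= hit
            if not remaining:
                break
        # Discard entries that are no longer present in any configuration.
        self._entries = [e for e in self._entries if e[0]]
        return popped


if __name__ == "__main__":
    stack = VariationalStack()
    stack.push(1, {"a", "b"})   # pushed in both configurations
    stack.push(2, {"a"})        # pushed only in configuration a
    print(stack.pop({"a", "b"}))
    print(stack.pop({"a", "b"}))
```

Sharing the entry for value `1` between both configurations is where the potential savings come from; how entries are shared, split, and merged is the kind of design decision the abstract refers to.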

Jafar Al-Kofahi, Tien N. Nguyen, and Christian Kästner. Escaping AutoHell: A Vision For Automated Analysis and Migration of Autotools Build Systems. In Proceedings of the 4th International Workshop on Release Engineering (Releng), pages 12--15, New York, NY: ACM Press, November 2016. [ .pdf, doi, bib ]

GNU Autotools is a widely used build tool in the open source community. As open source projects grow more complex, maintaining their build systems becomes more challenging due to the lack of tool support. Here we propose a platform to mitigate this problem and aid developers by providing a foundation on which to build support tools for GNU Autotools build systems. The platform would provide an abstract approximation of the build system to be used in different analysis techniques.

Jens Meinicke, Chu-Pan Wong, Christian Kästner, Thomas Thüm, and Gunter Saake. On Essential Configuration Complexity: Measuring Interactions In Highly-Configurable Systems. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 483--494, New York, NY: ACM Press, September 2016. Acceptance rate: 19 % (57/298). [ .pdf, doi, bib ]

Quality assurance for highly-configurable systems is challenging due to the exponentially growing configuration space. Interactions among multiple options can lead to surprising behaviors, bugs, and security vulnerabilities. Analyzing all configurations systematically might be possible though if most options do not interact or interactions follow specific patterns that can be exploited by analysis tools. To better understand interactions in practice, we analyze program traces to identify where interactions occur on control flow and data. To this end, we developed a dynamic analysis for Java based on variability-aware execution and monitor executions of multiple mid-sized real-world programs. We find that the essential configuration complexity of these programs is indeed much lower than the combinatorial explosion of the configuration space indicates, but also that the interaction characteristics that allow scalable and complete analyses are more nuanced than what is exploited by existing state-of-the-art quality assurance strategies.

Prasad Kawthekar and Christian Kästner. Sensitivity Analysis For Building Evolving & Adaptive Robotic Software. In Proceedings of the IJCAI Workshop on Autonomous Mobile Service Robots (WSR), July 2016. [ .pdf, http, bib ]

There has been considerable growth in research and development of service robots in recent years. For deployment in diverse environment conditions for a wide range of service tasks, novel features and algorithms are developed and existing ones undergo change. However, developing and evolving the robot software requires making and revising many design decisions that can affect the quality of performance of the robots and that are non-trivial to reason about intuitively because of interactions among them. We propose to use sensitivity analysis to build models that relate the quality of performance to the different design decisions, to ease design and evolution. Moreover, we envision these models to be used for run-time adaptation in response to changing goals or environment conditions. Constructing these models is challenging due to the exponential size of the decision space. We build on previous work on performance influence models of highly-configurable software systems using a machine-learning-based approach to construct influence models for robotic software.

Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. How to Break an API: Cost Negotiation and Community Values in Three Software Ecosystems. In Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), pages 109--120, New York, NY: ACM Press, November 2016. Acceptance rate: 27 % (74/273). [ .pdf, doi, http, bib ]

Change introduces conflict into software ecosystems: breaking changes may ripple through the ecosystem and trigger rework for users of a package, but often developers can invest additional effort or accept opportunity costs to alleviate or delay downstream costs. We performed a multiple case study of three software ecosystems with different tooling and philosophies toward change, Eclipse, R/CRAN, and Node.js/npm, to understand how developers make decisions about change and change-related costs and what practices, tooling, and policies are used. We found that all three ecosystems differ substantially in their practices and expectations toward change and that those differences can be explained largely by different community values in each ecosystem. Our results illustrate that there is a large design space in how to build an ecosystem, its policies and its supporting infrastructure; and there is value in making community values and accepted tradeoffs explicit and transparent in order to resolve conflicts and negotiate change-related costs.

Gabriel Ferreira, Momin Malik, Christian Kästner, Juergen Pfeffer, and Sven Apel. Do #ifdefs Influence the Occurrence of Vulnerabilities? An Empirical Study of the Linux Kernel. In Proceedings of the 20th International Software Product Line Conference (SPLC), pages 65--74, New York, NY: ACM Press, September 2016. Acceptance rate: 39 % (17/44). [ .pdf, doi, bib ]

Preprocessors support the diversification of software products with #ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that #ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code. We extracted a variational call graph across all configurations of the Linux kernel, and used configuration complexity metrics to compare vulnerable and non-vulnerable functions considering their vulnerability history. Our goal was to learn whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities. Our results suggest, among others, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions that appear in frequently compiled product variants. We aim to raise developers’ awareness to address variability more systematically, since configuration complexity is an important, but often ignored aspect of software product lines.

Waqar Ahmad, Christian Kästner, Joshua Sunshine, and Jonathan Aldrich. Inter-app Communication in Android: Developer Challenges. In Proceedings of the 13th International Conference on Mining Software Repositories (MSR), pages 177--188, New York, NY: ACM Press, May 2016. Acceptance rate: 27 % (36/133). [ .pdf, doi, bib ]

The Android platform is designed to support mutually untrusted third-party apps, which run as isolated processes but may interact via platform-controlled mechanisms, called Intents. Interactions among third-party apps are intended and can contribute to a rich user experience, for example, the ability to share pictures from one app with another. The Android platform presents an interesting point in a design space of module systems that is biased toward isolation, extensibility, and untrusted contributions. The Intent mechanism essentially provides message channels among modules, in which the set of message types is extensible. However, the module system has design limitations including the lack of consistent mechanisms to document message types, very limited checking that a message conforms to its specification, the inability to explicitly declare dependencies on other modules, and the lack of checks for backward compatibility as message types evolve over time. In order to understand the degree to which these design limitations result in real issues, we studied a broad corpus of apps and cross-validated our results against app documentation and Android support forums. Our findings suggest that design limitations do indeed cause development problems. Based on our results, we outline further research questions and propose possible mitigation strategies.

Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Rohit Gheyi, and Sven Apel. A Comparison of 10 Sampling Algorithms for Configurable Systems. In Proceedings of the 38th International Conference on Software Engineering (ICSE), pages 643--654, New York, NY: ACM Press, May 2016. Acceptance rate: 19 % (101/530). [ .pdf, doi, bib ]

Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and size of sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that the sampling algorithms with larger sample sets detected higher numbers of faults. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we identified a number of technical challenges when trying to avoid the limiting assumptions, which question the practicality of certain sampling algorithms.

James Herbsleb, Christian Kästner, and Christopher Bogart. Intelligently Transparent Software Ecosystems. IEEE Software (IEEE-Sw), 33(1):89--96, 2016. [ .pdf, doi, http, bib ]

Today’s social coding tools foreshadow a transformation of the software industry, as it increasingly relies on open libraries, frameworks, and code fragments. Our vision calls for new “intelligently transparent” services that support rapid development of innovative products while managing risk and receiving early warnings of looming failures. Intelligent transparency is enabled by an infrastructure that applies analytics to data from all phases of the lifecycle of open source projects, from development to deployment, bringing stakeholders the information they need when they need it.

Christopher Bogart, Christian Kästner, and James Herbsleb. When it Breaks, it Breaks: How Ecosystem Developers Reason About the Stability of Dependencies. In Proceedings of the ASE Workshop on Software Support for Collaborative and Global Software Engineering (SCGSE), pages 86--89, November 2015. [ .pdf, doi, bib ]

Dependencies among software projects and libraries are an indicator of the often implicit collaboration among many developers in software ecosystems. Negotiating change can be tricky: changes to one module may cause ripple effects to many other modules that depend on it, yet insisting on only backward-compatible changes may incur significant opportunity cost and stifle change. We argue that awareness mechanisms based on various notions of stability can enable developers to make decisions that are independent yet wise and provide stewardship rather than disruption to the ecosystem. In ongoing interviews with developers in two software ecosystems (CRAN and Node.js), we are finding that developers in fact struggle with change, that they often use ad hoc mechanisms to negotiate change, and that existing awareness mechanisms like GitHub notification feeds are rarely used due to information overload. We study the state of the art and current information needs and outline a vision toward a change-based awareness system.

Waqar Ahmad, Joshua Sunshine, Christian Kästner, and Adam Wynne. Enforcing Fine-Grained Security and Privacy Policies in an Ecosystem within an Ecosystem. In Proceedings of the 3rd International Workshop on Mobile Development Lifecycle (MobileDeLi), pages 28--34, October 2015. [ .pdf, doi, bib ]

Smart home automation and IoT promise to bring many advantages but they also expose their users to certain security and privacy vulnerabilities. For example, leaking the information about the absence of a person from home or the medicine somebody is taking may have serious security and privacy consequences for home users and potential legal implications for providers of home automation and IoT platforms. We envision that a new ecosystem within an existing smartphone ecosystem will be a suitable platform for distribution of apps for smart home and IoT devices. Android is increasingly becoming a popular platform for smart home and IoT devices and applications. Built-in security mechanisms in ecosystems such as Android have limitations that can be exploited by malicious apps to leak users’ sensitive data to unintended recipients. For instance, Android enforces that an app requires the Internet permission in order to access a web server but it does not control which servers the app talks to or what data it shares with other apps. Therefore, sub-ecosystems that enforce additional fine-grained custom policies on top of existing policies of the smartphone ecosystems are necessary for smart home or IoT platforms. To this end, we have built a tool that enforces additional policies on inter-app interactions and permissions of Android apps. We have done preliminary testing of our tool on three proprietary apps developed by a future provider of a home automation platform. Our initial evaluation demonstrates that it is possible to develop mechanisms that allow definition and enforcement of custom security policies appropriate for ecosystems like smart home automation and IoT.

Hung Viet Nguyen, My Huu Nguyen, Son Cuu Dang, Christian Kästner, and Tien N. Nguyen. Detecting Semantic Merge Conflicts With Variability-Aware Execution. In Proceedings of the International Symposium on Foundations of Software Engineering -- New Ideas Track (ESEC/FSE-NIER), pages 926--929, New York, NY: ACM Press, August 2015. [ .pdf, doi, bib ]

In collaborative software development, when two or more developers incorporate their changes, a merge conflict may arise if the changes are incompatible. Previous research has shown that such conflicts are common and occur as textual conflicts or build/test failures, i.e., semantic conflicts. When a merge conflict occurs for a large number of parallel changes, it is desirable to identify the actual (minimum) set of changes that directly results in the conflict. Pinpointing the specific conflicting changes directly leading to test failure facilitates quick accountability and correction from developers. For semantic conflicts, identifying such a subset of the changes is challenging. A naive approach trying all possible subsets would not scale due to the exponential number of possible combinations. We propose Semex, a novel approach to detect semantic conflicts using variability-aware execution. In the first step, we encode all parallel changes into a single program with variability in which we use symbolic variables to represent whether a given change is applied to the original program. In the second step, we run the test cases via variability-aware execution. Variability-aware execution explores all possible executions of the combined program with regard to all possible values of the symbolic variables representing all changes, and returns a propositional formula over the set of variables representing the condition in which a test case fails. Due to our encoding algorithm, such a set corresponds to the minimum set of changes that are responsible for the conflict. In our preliminary experimental study on seven PHP applications with a total of 50 test cases and 19 semantic conflicts, Semex correctly detected all 19 conflicts.
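
To make the scaling problem concrete, the naive baseline mentioned in the abstract (trying all subsets of changes) can be sketched as follows. This is not Semex and not variability-aware execution; it is only the brute-force enumeration that such an approach is meant to avoid. The function name `minimal_conflicting_sets`, the change labels, and the `test_passes` callback are hypothetical.

```python
from itertools import combinations


def minimal_conflicting_sets(changes, test_passes):
    """Return every smallest subset of `changes` that makes the test fail.

    `test_passes(subset)` is assumed to apply exactly the changes in
    `subset` to the original program and report whether the test passes.
    In the worst case this runs the test 2**len(changes) - 1 times.
    """
    for size in range(1, len(changes) + 1):
        failing = [
            set(subset)
            for subset in combinations(changes, size)
            if not test_passes(frozenset(subset))
        ]
        if failing:
            return failing  # stop at the smallest failing size
    return []


# Hypothetical scenario: the test breaks only if c1 and c3 are both applied.
def example_test(applied):
    return not {"c1", "c3"} <= applied


conflicts = minimal_conflicting_sets(["c1", "c2", "c3"], example_test)
```

With three changes this needs at most seven test runs; with thirty parallel changes the same loop could need over a billion, which is the exponential cost the abstract refers to.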

Norbert Siegmund, Alexander Grebhahn, Christian Kästner, and Sven Apel. Performance-Influence Models for Highly Configurable Systems. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 284--294, New York, NY: ACM Press, August 2015. Acceptance rate: 25 % (74/291). [ .pdf, bib ]

Almost every complex software system today is configurable. While configurability has many benefits, it challenges performance prediction, optimization, and debugging. Often, the influence of individual configuration options on performance is unknown. Worse, configuration options may interact, giving rise to a configuration space of possibly exponential size. Addressing this challenge, we propose an approach that derives a performance-influence model for a given configurable system, describing all relevant influences of configuration options and their interactions. Such a model shall be useful for automatic performance prediction and optimization, on the one hand, and performance debugging for developers, on the other hand. Our approach combines machine-learning and sampling techniques in a novel way. Our approach improves over standard techniques in that it (1) represents influences of options and their interactions explicitly (which eases debugging), (2) smoothly integrates binary and numeric configuration options for the first time, (3) incorporates domain knowledge, if available (which eases learning and increases accuracy), (4) considers complex constraints among options, and (5) systematically reduces the solution space to a tractable size. A series of experiments demonstrates the feasibility of our approach in terms of the accuracy of the models learned as well as the accuracy of the performance predictions one can make with them. Using our approach, we were able to identify a number of real performance bugs and other problems in real-world systems.
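
One way to picture a model that represents the influences of options and their interactions explicitly is as a sum of terms, each a coefficient multiplied by the values of the options it involves. The Python sketch below only evaluates such a model; it does not show how the paper learns one. The option names, the coefficients, and the function `predict` are invented for illustration.

```python
def predict(model, config):
    """Evaluate a sum-of-terms model for one configuration.

    `model` maps a tuple of option names to a coefficient; the empty tuple
    is the base performance. Binary options take the value 0 or 1, numeric
    options take their numeric value, and missing options count as 0.
    """
    total = 0.0
    for options, coefficient in model.items():
        term = coefficient
        for option in options:
            term *= config.get(option, 0)
        total += term
    return total


# Hypothetical model: all names and numbers are made up for this example.
MODEL = {
    (): 10.0,                            # base execution time
    ("compression",): 5.0,               # binary option
    ("cache_size",): -0.25,              # numeric option, per unit
    ("compression", "encryption"): 3.0,  # interaction of two options
}

CONFIG = {"compression": 1, "encryption": 1, "cache_size": 8}
predicted = predict(MODEL, CONFIG)  # 10 + 5 - 0.25 * 8 + 3
```

Because each term names the options it depends on, a surprisingly large coefficient on an interaction term points directly at the options involved, which is the debugging benefit the abstract describes.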

Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen. Cross-language Program Slicing for Dynamic Web Applications. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 369--380, New York, NY: ACM Press, August 2015. Acceptance rate: 25 % (74/291). [ .pdf, bib ]

During software maintenance, program slicing is a useful technique to assist developers in understanding the impact of their changes. While different program-slicing techniques have been proposed for traditional software systems, program slicing for dynamic web applications is challenging since the client-side code is generated from the server-side code and data entities are referenced across different languages and are often embedded in string literals in the server-side program. To address those challenges, we introduce WebSlice, an approach to compute program slices across different languages for web applications. We first identify data-flow dependencies among data entities for PHP code based on symbolic execution. We also compute SQL queries and a conditional DOM that represents client-code variations and construct the data flows for embedded languages: SQL, HTML, and JavaScript. Next, we connect the data flows across different languages and those across PHP pages. Finally, we compute a program slice for any given entity based on the established data flows. Running WebSlice on five real-world PHP systems, we found that out of 40,670 program slices, 10 % cross languages, 38 % cross files, and 13 % cross string fragments, demonstrating the potential benefit of tool support for cross-language program slicing in web applications.

Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki. Where do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study. IEEE Transactions on Software Engineering (TSE), 41(8):820--841, 2015. [ .pdf, doi, bib ]

Highly configurable systems allow users to tailor software to specific needs. Valid combinations of configuration options are often restricted by intricate constraints. Describing options and constraints in a variability model allows reasoning about the supported configurations. To automate creating and verifying such models, we need to identify the origin of such constraints. We propose a static analysis approach, based on two rules, to extract configuration constraints from code. We apply it on four highly configurable systems to evaluate the accuracy of our approach and to determine which constraints are recoverable from the code. We find that our approach is highly accurate (93 % and 77 % respectively) and that we can recover 28 % of existing constraints. We complement our approach with a qualitative study to identify constraint sources, triangulating results from our automatic extraction, manual inspections, and interviews with 27 developers. We find that, apart from low-level implementation dependencies, configuration constraints enforce correct runtime behavior, improve users’ configuration experience, and prevent corner cases. While the majority of constraints is extractable from code, our results indicate that creating a complete model requires further substantial domain knowledge and testing. Our results aim at supporting researchers and practitioners working on variability model engineering, evolution, and verification techniques.

Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Sarah Nadi, and Rohit Gheyi. The Love/Hate Relationship with The C Preprocessor: An Interview Study. In Proceedings of the 29th European Conference on Object-Oriented Programming (ECOOP), volume 37 of Leibniz International Proceedings in Informatics, pages 495--518, Dagstuhl, Germany: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2015. [ .pdf, doi, bib ]

The C preprocessor has received strong criticism in academia, among others regarding separation of concerns, error proneness, and code obfuscation, but is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey among 202 developers, repository mining, and results from previous studies. In particular, we investigated four research questions related to why the preprocessor is still widely used in practice, common problems, alternatives, and the impact of undisciplined annotations. Our study shows that developers are aware of the criticism the C preprocessor receives, but use it nonetheless, mainly for portability and variability. They indicate that they regularly face preprocessor-related problems and preprocessor-related bugs. The majority of our interviewees do not see any current C-native technologies that can entirely replace the C preprocessor. However, developers tend to mitigate problems with guidelines, but those guidelines are not enforced consistently. We report the key insights gained from our study and discuss implications for practitioners and researchers on how to better use the C preprocessor to minimize its negative impact.

Shurui Zhou, Jafar Al-Kofahi, Tien N. Nguyen, Christian Kästner, and Sarah Nadi. Extracting Configuration Knowledge from Build Files with Symbolic Analysis. In Proceedings of the 3rd International Workshop on Release Engineering (Releng), pages 20--23, New York, NY: ACM Press, May 2015. [ .pdf, doi, bib ]

Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. Extracting such configuration knowledge is important for many tools analyzing highly-configurable systems, but very challenging due to the complex nature of build systems. We design an approach, based on SYMake, that symbolically evaluates Makefiles and extracts configuration knowledge in terms of file presence conditions and conditional parameters. We implement an initial prototype and demonstrate feasibility on small examples.

Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen. Varis: IDE Support for Embedded Client Code in PHP Web Applications. In Proceedings of the 37th International Conference on Software Engineering (Volume 2) (ICSE), pages 693--696, May 2015. Formal Demonstration paper, Best Demonstration Award. [ .pdf, doi, bib ]

In software development, IDE services such as syntax highlighting, code completion, and “jump to declaration” are used to assist developers in programming tasks. In dynamic web applications, however, since the client-side code is dynamically generated from the server-side code and is embedded in the server-side program as string literals, providing IDE services for such embedded code is challenging. In this work, we introduce Varis, a tool that provides editor services on the client-side code of a PHP-based web application, while it is still embedded within server-side code. Technically, we first perform symbolic execution on a PHP program to approximate all possible variations of the generated client-side code and subsequently parse this client code into a VarDOM that compactly represents all its variations. Finally, using the VarDOM, we implement various types of IDE services for embedded client code including syntax highlighting, code completion, and “jump to declaration”.

Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki. Where do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study. Technical Report GSDLAB-TR 2015-01-27, Waterloo, ON, Canada: Generative Software Development Laboratory, University of Waterloo, January 2015. [ .pdf, http, bib ]

Claus Hunsen, Janet Siegmund, Olaf Leßenich, Sven Apel, Bo Zhang, Christian Kästner, and Martin Becker. Preprocessor-Based Variability in Open-Source and Industrial Software Systems: An Empirical Study. Empirical Software Engineering (EMSE), Special Issue on Empirical Evidence on Software Product Line Engineering, 1--34, 2015. [ .pdf, doi, bib ]

Almost every sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. Especially, the C preprocessor (cpp) is very popular in practice, but it is also gaining (again) interest in academia. Although there have been several attempts to understand and improve cpp, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim at lowering this gap by comparing the use of cpp in open-source projects and industry (especially from the embedded-systems domain), based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that, regarding almost all aspects we studied, the analyzed open-source systems and the considered embedded systems from industry have comparable distributions regarding most metrics, including systems that have been developed in industry and made open source at some point. So, our study indicates that, regarding cpp as variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems, at least with respect to the metrics we considered.

Max Lillack, Christian Kästner, and Eric Bodden. Tracking Load-time Configuration Options. In Proceedings of the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 445--456, Los Alamitos, CA: IEEE Computer Society, September 2014. Acceptance rate: 20 % (55/276). [ .pdf, doi, bib ]

Highly-configurable software systems are pervasive, although configuration options and their interactions raise the complexity of the program and increase maintenance effort. Especially load-time configuration options, such as parameters from command-line options or configuration files, are used with standard programming constructs such as variables and if statements intermixed with the program’s implementation; manually tracking configuration options from the time they are loaded to the point where they may influence control-flow decisions is tedious and error prone. We design and implement Lotrack, an extended static taint analysis to automatically track configuration options. Lotrack derives a configuration map that explains for each code fragment under which configurations it may be executed. An evaluation on Android applications shows that Lotrack yields high accuracy with reasonable performance. We use Lotrack to empirically characterize how much of the implementation of Android apps depends on the platform’s configuration options or interactions of these options.

Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen. Building Call Graphs for Embedded Client-Side Code in Dynamic Web Applications. In Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), pages 518--529, New York, NY: ACM Press, November 2014. Acceptance rate: 22 % (61/273). [ .pdf, doi, bib ]

When developing and maintaining a software system, programmers often rely on IDEs to provide editor services such as syntax highlighting, auto-completion, and “jump to declaration”. In dynamic web applications, such tool support is currently limited to either the server-side code or to hand-written or generated client-side code. Our goal is to build a call graph for providing editor services on client-side code while it is still embedded as string literals within server-side code. First, we symbolically execute the server-side code to identify all possible client-side code variations. Subsequently, we parse the generated client-side code with all its variations into a VarDOM that compactly represents all DOM variations for further analysis. Based on VarDOM, we build conditional call graphs for embedded HTML, CSS, and JS. Our empirical evaluation on real-world web applications shows that our analysis achieves 100 % precision in identifying call-graph edges. 62 % of the edges cross PHP strings, and 17 % of them cross files; in both situations, navigation without tool support is tedious and error prone.

Eric Walkingshaw, Christian Kästner, Martin Erwig, Sven Apel, and Eric Bodden. Variational Data Structures: Exploring Tradeoffs in Computing with Variability. In Proceedings of the 13th SIGPLAN Symposium on New Ideas in Programming and Reflections on Software at SPLASH (Onward!), pages 213--226, New York, NY: ACM Press, 2014. [ .pdf, doi, bib ]

Variation is everywhere, but in the construction and analysis of customizable software it is paramount. In this context, there arises a need for variational data structures for efficiently representing and computing with related variants of an underlying data type. So far, variational data structures have been explored and developed ad hoc. This paper is a first attempt and a call to action for systematic and foundational research in this area. Research on variational data structures will benefit not only customizable software, but the many other application domains that must cope with variability. In this paper, we show how support for variation can be understood as a general and orthogonal property of data types, data structures, and algorithms. We begin a systematic exploration of basic variational data structures, exploring the tradeoffs between different implementations. Finally, we retrospectively analyze the design decisions in our own previous work where we have independently encountered problems requiring variational data structures.

Zack Coker, Samir Hasan, Jeffrey Overbey, Munawar Hafiz, and Christian Kästner. Integers In C: An Open Invitation to Security Attacks? Technical Report CSSE14-01, Auburn, AL: College of Engineering, Auburn University, February 2014. [ .pdf, bib ]

We performed an empirical study to explore how closely well-known, open source C programs follow the safe C standards for integer behavior, with the goal of understanding how difficult it is to migrate legacy code to these stricter standards. We performed an automated analysis on fifty-two releases of seven C programs (6 million lines of preprocessed C code), as well as releases of Busybox and Linux (nearly one billion lines of partially-preprocessed C code). We found that integer issues that are allowed by the C standard but not by the safer C standards are ubiquitous: one out of four integers was inconsistently declared, and one out of eight integers was inconsistently used. Integer issues did not improve over time as the programs evolved. Also, detecting the issues is complicated by a large number of integers whose types vary under different preprocessor configurations. Most of these issues are benign, but the chance of finding fatal errors and exploitable vulnerabilities among these many issues remains significant. A preprocessor-aware, tool-assisted approach may be the most viable way to migrate legacy C code to comply with the standards for secure programming.

Thomas Thüm, Sven Apel, Christian Kästner, Ina Schaefer, and Gunter Saake. A Classification and Survey of Analysis Strategies for Software Product Lines. ACM Computing Surveys (CSUR), 47(1):Article 6, June 2014. [ .pdf, doi, http, bib ]

Software-product-line engineering has gained considerable momentum in recent years, both in industry and in academia. A software product line is a set of software products that share a common set of features. Software product lines challenge traditional analysis techniques, such as type checking, model checking, and theorem proving, in their quest of ensuring correctness and reliability of software. Simply creating and analyzing all products of a product line is usually not feasible, due to the potentially exponential number of valid feature combinations. Recently, researchers began to develop analysis techniques that take the distinguishing properties of software product lines into account, for example, by checking feature-related code in isolation or by exploiting variability information during analysis. The emerging field of product-line analyses is both broad and diverse, such that it is difficult for researchers and practitioners to understand their similarities and differences. We propose a classification of product-line analyses to enable systematic research and application. Based on our insights with classifying and comparing a corpus of 76 articles, we infer a research agenda to guide future research on product-line analyses.

Janet Siegmund, Christian Kästner, Sven Apel, Chris Parnin, Anja Bethmann, Thomas Leich, Gunter Saake, and André Brechmann. Understanding Understanding Source Code with Functional Magnetic Resonance Imaging. In Proceedings of the 36th International Conference on Software Engineering (ICSE), pages 378--389, June 2014. Acceptance rate: 20 % (99/495). [ .pdf, doi, bib ]

Program comprehension is an important cognitive process that inherently eludes direct measurement. Thus, researchers are struggling with providing optimal programming languages, tools, or coding conventions to support developers in their everyday work. With our approach, we explore whether functional magnetic resonance imaging (fMRI), which is well established in cognitive neuroscience, is feasible to directly measure program comprehension. To this end, we observed 17 participants inside an fMRI scanner while comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing—all processes that fit well to our understanding of program comprehension. Based on the results, we propose a model of program comprehension. Our results encourage us to use fMRI in future studies to measure program comprehension and, in the long run, answer questions, such as: Can we predict whether someone will be an excellent programmer? How effective are new languages and tools for program understanding? How do we train someone to become an excellent programmer?

Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen. Exploring Variability-Aware Execution for Testing Plugin-Based Web Applications. In Proceedings of the 36th International Conference on Software Engineering (ICSE), pages 907--918, June 2014. Acceptance rate: 20 % (99/495). [ .pdf, doi, bib ]

In plugin-based systems, plugin conflicts may occur when two or more plugins interfere with one another, changing their expected behaviors. It is highly challenging to detect plugin conflicts due to the exponential explosion of the combinations of plugins (i.e., configurations). In this paper, we address the challenge of executing a test case over many configurations. Leveraging the fact that many executions of a test are similar, our variability-aware execution runs common code once. Only when encountering values that are different depending on specific configurations will the execution split to run for each of them. To evaluate the scalability of variability-aware execution on a large real-world setting, we built a prototype PHP interpreter called Varex and ran it on the popular WordPress blogging Web application. The results show that while plugin interactions exist, there is a significant amount of sharing that allows variability-aware execution to scale to 2^50 configurations within seven minutes of running time. During our study, with Varex, we were able to detect two plugin conflicts: one was recently reported on WordPress forum, and another one is not yet discovered.
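The sharing idea in this abstract can be illustrated with a minimal sketch. It is not the Varex implementation: the `Choice` type, the `vmap` and `select` helpers, and the `SMILEY` option are hypothetical names chosen for illustration. A plain value is processed once for all configurations, and execution splits only where a value depends on a configuration option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    """A value that differs depending on whether an option is enabled."""
    option: str
    on: object
    off: object


def vmap(fn, value):
    """Apply fn once to a shared value; split only on a Choice."""
    if isinstance(value, Choice):
        return Choice(value.option, vmap(fn, value.on), vmap(fn, value.off))
    return fn(value)


def select(value, config):
    """Resolve a (possibly nested) Choice for one concrete configuration."""
    while isinstance(value, Choice):
        value = value.on if value.option in config else value.off
    return value


calls = []


def shout(text):
    calls.append(text)  # record how often the function actually runs
    return text.upper()


title = "hello"                       # identical in every configuration
footer = Choice("SMILEY", ":-)", "")  # depends on a hypothetical plugin

shared = vmap(shout, title)  # runs once for all configurations
split = vmap(len, footer)    # runs once per alternative
```

Here `shared` is computed with a single call, while `split` keeps one result per alternative that `select` can later resolve for a concrete configuration.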

Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki. Mining Configuration Constraints: Static Analyses and Empirical Results. In Proceedings of the 36th International Conference on Software Engineering (ICSE), pages 140--151, June 2014. Acceptance rate: 20 % (99/495). [ .pdf, doi, bib ]

Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such configuration constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract configuration constraints from C code. We conduct an empirical study on four highly-configurable open-source systems with existing variability models having three objectives in mind: evaluate the accuracy of our approach, determine the recoverability of existing variability-model constraints using our analysis, and classify the sources of variability-model constraints. We find that both our extraction heuristics are highly accurate (93 % and 77 % respectively), and that we can recover 19 % of the existing variability-models using our approach. However, we find that many of the remaining constraints require expert knowledge or more expensive analyses. We argue that our approach, tooling, and experimental results support researchers and practitioners working on variability model re-engineering, evolution, and consistency-checking techniques.

Márcio Ribeiro, Paulo Borba, and Christian Kästner. Feature Maintenance with Emergent Interfaces. In Proceedings of the 36th International Conference on Software Engineering (ICSE), pages 989--1000, June 2014. Acceptance rate: 20 % (99/495). [ .pdf, doi, bib ]

Hidden code dependencies are responsible for many complications in maintenance tasks. With the introduction of variable features in product lines, dependencies may even cross feature boundaries and related problems are prone to be detected late. Many current implementation techniques for product lines lack proper interfaces, which could make such dependencies explicit. As an alternative to changing the implementation approach, we provide a comprehensive tool-based solution to support developers in recognizing and dealing with feature dependencies: emergent interfaces. Emergent interfaces are computed on demand, based on feature-sensitive interprocedural data-flow analysis. They emerge in the IDE and emulate benefits of modularity not available in the host language. To evaluate the potential of emergent interfaces, we conducted and replicated a controlled experiment, and found, in the studied context, that emergent interfaces can improve performance of code change tasks by up to 3 times while also reducing the number of errors.

Janet Feigenspan, Christian Kästner, Jörg Liebig, Sven Apel, and Stefan Hanenberg. Measuring and Modeling Programming Experience. Empirical Software Engineering (EMSE), 19(5):1299--1334, October 2014. [ .pdf, doi, bib ]

Programming experience is an important confounding parameter in controlled experiments regarding program comprehension. In literature, ways to measure or control programming experience vary. Often, researchers neglect it or do not specify how they controlled for it. We set out to find a well-defined understanding of programming experience and a way to measure it. From published comprehension experiments, we extracted questions that assess programming experience. In a controlled experiment, we compare the answers of computer-science students to these questions with their performance in solving program-comprehension tasks. We found that self estimation seems to be a reliable way to measure programming experience. Furthermore, we applied exploratory and confirmatory factor analyses to extract and evaluate a model of programming experience. With our analysis, we initiate a path toward validly and reliably measuring and describing programming experience to better understand and control its influence in program-comprehension experiments.

Sven Apel, Sergiy S. Kolesnikov, Norbert Siegmund, Christian Kästner, and Brady Garvin. Exploring Feature Interactions in the Wild: The New Feature-Interaction Challenge. In Proceedings of the 5th International Workshop on Feature-Oriented Software Development (FOSD), pages 1--8, New York, NY: ACM Press, October 2013. Acceptance rate: 75 % (6/8). [ .pdf, doi, bib ]

The feature-interaction problem has been keeping researchers and practitioners in suspense for years. Although there has been substantial progress in developing approaches for modeling, detecting, managing, and resolving feature interactions, we lack sufficient knowledge on the kind of feature interactions that occur in real-world systems. In this position paper, we set out the goal to explore the nature of feature interactions systematically and comprehensively, classified in terms of order and visibility. Understanding this nature will have significant implications on research in this area, for example, on the efficiency of interaction-detection or performance-prediction techniques. A set of preliminary results as well as a discussion of possible experimental setups and corresponding challenges give us confidence that this endeavor is within reach but requires a collaborative effort of the community.

Christian Kästner, Alexander Dreiling, and Klaus Ostermann. Variability Mining: Consistent Semiautomatic Detection of Product-Line Features. IEEE Transactions on Software Engineering (TSE), 40(1):67--82, 2014. [ .pdf, doi, http, bib ]

Software product line engineering is an efficient means to generate a set of tailored software products from a common implementation. However, adopting a product-line approach poses a major challenge and significant risks, since typically legacy code must be migrated toward a product line. Our aim is to lower the adoption barrier by providing semiautomatic tool support—called variability mining—to support developers in locating, documenting, and extracting implementations of product-line features from legacy code. Variability mining combines prior work on concern location, reverse engineering, and variability-aware type systems, but is tailored specifically for the use in product lines. Our work pursues three technical goals: (1) we provide a consistency indicator based on a variability-aware type system, (2) we mine features at a fine level of granularity, and (3) we exploit domain knowledge about the relationship between features when available. With a quantitative study, we demonstrate that variability mining can efficiently support developers in locating features.

Jörg Liebig, Alexander von Rhein, Christian Kästner, Sven Apel, Jens Dörre, and Christian Lengauer. Scalable Analysis of Variable Software. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 81--91, New York, NY: ACM Press, August 2013. Acceptance rate: 20 % (51/251). [ .pdf, doi, http, bib ]

The advent of proper variability management and generator technology enables users to derive individual variants from a variable code base solely based on a selection of desired configuration options. This approach gives rise to a huge configuration space, but the high degree of variability comes at a cost: classic analysis methods do not scale any more; there are simply too many potential variants to analyze. To address this issue, researchers and practitioners usually apply sampling techniques—only a subset of all possible variants is analyzed. While sampling promises to reduce the analysis effort significantly, the information obtained is necessarily incomplete. Furthermore, it is unknown whether sampling strategies scale to billions of variants, because even samples may be huge and expensive to compute. Recently, researchers have begun to develop variability-aware analyses that analyze the variable code base directly with the goal to exploit the similarities among individual variants to reduce analysis effort. However, while being promising, so far, variability-aware analyses have been applied mostly only to small academic systems. To learn about the mutual strengths and weaknesses of variability-aware and sampling-based analyses of large-scale, real-world software systems, we compared the two by means of two concrete analysis implementations (type checking and liveness analysis) applied to three subject systems: the Busybox tool suite, the x86 Linux kernel, and the cryptographic library OpenSSL. A key result is that in these settings already setting up sampling techniques is challenging while variability-aware analysis even outperforms most sampling approximations with respect to analysis time.

Sven Apel, Don Batory, Christian Kästner, and Gunter Saake. Feature-Oriented Software Product Lines: Concepts and Implementation. Berlin/Heidelberg: Springer-Verlag, 2013. 308 pages, ISBN 978-3-642-37520-0. [ http, bib ]

While standardization has empowered the software industry to substantially scale software development and to provide affordable software to a broad market, it often does not address smaller market segments, nor the needs and wishes of individual customers. Software product lines reconcile mass production and standardization with mass customization in software engineering. Ideally, based on a set of reusable parts, a software manufacturer can generate a software product based on the requirements of its customer. The concept of features is central to achieving this level of automation, because features bridge the gap between the requirements the customer has and the functionality a product provides. Thus features are a central concept in all phases of product-line development. The authors take a developer’s viewpoint, focus on the development, maintenance, and implementation of product-line variability, and especially concentrate on automated product derivation based on a user’s feature selection. The book consists of three parts. Part I provides a general introduction to feature-oriented software product lines, describing the product-line approach and introducing the product-line development process with its two elements of domain and application engineering. The pivotal Part II covers a wide variety of implementation techniques including design patterns, frameworks, components, feature-oriented programming, and aspect-oriented programming, as well as tool-based approaches including preprocessors, build systems, version-control systems, and virtual separation of concerns. Finally, Part III is devoted to advanced topics related to feature-oriented product lines like refactoring, feature interaction, and analysis tools specific to product lines. In addition, an Appendix lists various helpful tools for software product-line development, along with a description of how they relate to the topics covered in this book. 
To tie the book together, the authors use two running examples that are well documented in the product-line literature: data management for embedded systems, and variations of graph data structures. They start every chapter by explicitly stating the respective learning goals and finish it with a set of exercises; additional teaching material is also available online. All these features make the book ideally suited for teaching – both for academic classes and for professionals interested in self-study.

Sven Apel, Alexander von Rhein, Thomas Thüm, and Christian Kästner. Feature-Interaction Detection based on Feature-Based Specifications. Computer Networks (COMNET), Special Issue on Feature Interaction, 57(12):2399--2409, August 2013. [ .pdf, doi, bib ]

Formal specification and verification techniques have been used successfully to detect feature interactions. We investigate whether feature-based specifications can be used for this task. Feature-based specifications are a special class of specifications that aim at modularity in open-world, feature-oriented systems. The question we address is whether modularity of specifications impairs the ability to detect feature interactions, which cut across feature boundaries. In an exploratory study on 10 feature-oriented systems, we found that the majority of feature interactions could be detected based on feature-based specifications, but some specifications have not been modularized properly and require undesirable workarounds to modularization. Based on the study, we discuss the merits and limitations of feature-based specifications, as well as open issues and perspectives. A goal that underlies our work is to raise awareness of the importance and challenges of feature-based specification.

Janet Siegmund, Christian Kästner, Sven Apel, André Brechmann, and Gunter Saake. Experience from Measuring Program Comprehension -- Toward a General Framework. In Proceedings of the Software Engineering 2013 -- Fachtagung des GI-Fachbereichs Softwaretechnik (SE), volume P-213 of Lecture Notes in Informatics, pages 239--257, Bonn, Germany: Gesellschaft für Informatik (GI), February 2013. [ .pdf, http, bib ]

Program comprehension plays a crucial role during the software-development life cycle: Maintenance programmers spend most of their time comprehending source code, and maintenance is the main cost factor in software development. Thus, if we can improve program comprehension, we can save a considerable amount of time and cost. To improve program comprehension, we have to measure it first. However, program comprehension is a complex, internal cognitive process that we cannot observe directly. Typically, we need to conduct controlled experiments to soundly measure program comprehension. However, empirical research is applied only reluctantly in software engineering. To close this gap, we set out to support researchers in planning and conducting experiments regarding program comprehension. We report our experience with experiments that we conducted and present the resulting framework to support researchers in planning and conducting experiments. Additionally, we discuss the role of teaching for the empirical researchers of tomorrow.

Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify Your Collection Queries for Modularity and Speed! In Proceedings of the 12th ACM International Conference on Aspect-Oriented Software Development (AOSD), pages 1--12, New York, NY: ACM Press, March 2013. Acceptance rate: 24 % (17/72). [ .pdf, doi, bib ]

The collections API of a programming language forms an embedded domain-specific language to express queries and operations on collections. Unfortunately, the ordinary style of implementing such APIs does not allow automatic domain-specific analyses and optimizations such as fusion of collection traversals, usage of indexing, or reordering of filters. Performance-critical code using collections must instead be hand-optimized, leading to non-modular, brittle, and redundant code. We propose SQuOpt (the Scala Query Optimizer), a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same “look and feel” (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the Findbugs tool using SQuOpt and demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.
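As a rough illustration of what a deep embedding makes possible, the sketch below represents a query as a data structure so that an optimizer can rewrite it before it runs. It is written in Python rather than Scala and does not reproduce SQuOpt's API: the `Source` and `Filter` node types and the `optimize`, `run`, and `depth` functions are hypothetical names, and filter fusion stands in for the traversal-fusion style of optimization the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Source:
    """Leaf node: the collection being queried."""
    items: tuple


@dataclass(frozen=True)
class Filter:
    """Query node: keep the elements of child that satisfy pred."""
    child: object
    pred: Callable


def optimize(query):
    """Fuse directly nested filters into one filter with a combined predicate."""
    if isinstance(query, Filter):
        child = optimize(query.child)
        if isinstance(child, Filter):
            inner, outer = child.pred, query.pred
            return Filter(child.child, lambda x: inner(x) and outer(x))
        return Filter(child, query.pred)
    return query


def run(query):
    """Evaluate a query tree; each Filter node is one traversal."""
    if isinstance(query, Filter):
        return [x for x in run(query.child) if query.pred(x)]
    return list(query.items)


def depth(query):
    """Number of Filter nodes, i.e. traversals, in the query."""
    return 1 + depth(query.child) if isinstance(query, Filter) else 0


numbers = Source(tuple(range(10)))
query = Filter(Filter(numbers, lambda x: x % 2 == 0), lambda x: x > 4)
fused = optimize(query)
```

The query can still be written in a modular way as two separate filters, while `fused` evaluates to the same result with a single traversal.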

Leonardo Passos, Krzysztof Czarnecki, Sven Apel, Andrzej Wąsowski, Christian Kästner, and Jianmei Guo. Feature Oriented Software Evolution. In Proceedings of the 7th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ISBN 978-1-4503-1541-8, pages 17:1--17:8, New York, NY: ACM Press, January 2013. Acceptance rate: 42 % (19/45). [ .pdf, doi, bib ]

Sergiy S. Kolesnikov, Sven Apel, Norbert Siegmund, Stefan Sobernig, Christian Kästner, and Semah Senkaya. Predicting Quality Attributes of Software Product Lines Using Software and Network Measures and Feature Sampling. In Proceedings of the 7th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ISBN 978-1-4503-1541-8, pages 6:1--6:5, New York, NY: ACM Press, January 2013. Acceptance rate: 42 % (19/45). [ .pdf, doi, bib ]

Software product-line engineering aims at the development of families of related products that share common assets. An important aspect is that customers are often interested not only in particular functionalities (i.e., features), but also in non-functional quality attributes such as performance, reliability, and footprint. A naive approach is to measure quality attributes of every single product, and to deliver the products that fit the customers' needs. However, as product lines may consist of millions of products, this approach does not scale. In this research-in-progress report, we propose a systematic approach for the efficient and scalable prediction of quality attributes of products that consists of two steps. First, we generate predictors for certain categories of quality attributes (e.g., a predictor for low performance) based on software and network measures, and receiver operating characteristic analysis. Second, we use these predictors to guide a sampling process that takes the asset base of a product line as input and efficiently determines the products that fall into the category denoted by a given predictor (e.g., products with low performance). In other words, we use predictors to make the process of finding “acceptable” products more efficient. We discuss and compare several strategies to incorporate predictors in the sampling process.

Alexander von Rhein, Sven Apel, Christian Kästner, Thomas Thüm, and Ina Schaefer. The PLA Model: On the Combination of Product-Line Analyses. In Proceedings of the 7th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ISBN 978-1-4503-1541-8, pages 14:1--14:8, New York, NY: ACM Press, January 2013. Acceptance rate: 42 % (19/45). [ .pdf, doi, bib ]

Product-line analysis has received considerable attention in the past. As it is often infeasible to analyze each product of a product line individually, researchers have developed analyses, called variability-aware analyses, that consider and exploit variability manifested in a code base. Variability-aware analyses are often significantly more efficient than traditional analyses, but each of them has certain weaknesses regarding applicability or scalability, as we discuss in this paper. We present the Product-Line-Analysis Model, a formal model for the classification and comparison of existing analyses, including traditional and variability-aware analyses, and lay a foundation for formulating and exploring further, combined analyses. As a proof of concept, we discuss different examples of analyses in the light of our model, and demonstrate its benefits for systematic comparison and exploration of product-line analyses.

Jörg Liebig, Alexander von Rhein, Christian Kästner, Sven Apel, Jens Dörre, and Christian Lengauer. Large-Scale Variability-Aware Type Checking and Dataflow Analysis. Technical Report MIP-1212, Passau, Germany: Department of Informatics and Mathematics, University of Passau, November 2012. [ .pdf, bib ]

Janet Siegmund, André Brechmann, Sven Apel, Christian Kästner, Jörg Liebig, Thomas Leich, and Gunter Saake. Toward Measuring Program Comprehension with Functional Magnetic Resonance Imaging. In Proceedings of the 20th International Symposium on Foundations of Software Engineering -- New Ideas Track (FSE-NIER), pages 24:1--24:4, November 2012. Acceptance rate: 20 % (12/59). [ .pdf, doi, bib ]

Program comprehension is an often evaluated, internal cognitive process. In neuroscience, functional magnetic resonance imaging (fMRI) is used to visualize such internal cognitive processes. We propose an experimental design to measure program comprehension based on fMRI. In the long run, we hope to answer questions like "What distinguishes good programmers from bad programmers?" or "What makes a good programmer?"

Christian Kästner, Alexander von Rhein, Sebastian Erdweg, Jonas Pusch, Sven Apel, Tillmann Rendel, and Klaus Ostermann. Toward Variability-Aware Testing. In Proceedings of the 4th International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-4503-1309-4, pages 1--8, New York, NY: ACM Press, September 2012. Acceptance rate: 57 % (8/14). [ .pdf, doi, bib ]

We investigate how to execute a unit test in all configurations of a product line without generating each product in isolation in a brute-force fashion. Learning from variability-aware analyses, we (a) design and implement a variability-aware interpreter and (b) reencode variability of the product line to simulate the test cases with a model checker. The interpreter internally reasons about variability, executing paths not affected by variability only once for the whole product line. The model checker achieves similar results by reusing powerful off-the-shelf analyses. We experimented with a prototype implementation for each strategy. We compare both strategies and discuss trade-offs and future directions.

Janet Siegmund, Christian Kästner, Jörg Liebig, and Sven Apel. Comparing Program Comprehension of Physically and Virtually Separated Concerns. In Proceedings of the 4th International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-4503-1309-4, pages 17--24, New York, NY: ACM Press, September 2012. Acceptance rate: 57 % (8/14). [ .pdf, doi, bib ]

It is a common belief that separating source code along concerns or features improves program comprehension of source code. However, empirical evidence is mostly missing. In this paper, we design a controlled experiment to evaluate that belief for feature-oriented programming based on maintenance tasks with human participants. We validate our experiment with a pilot study, which already preliminarily confirms that students use different strategies to complete maintenance tasks.

Sebastian Erdweg, Tillmann Rendel, Christian Kästner, and Klaus Ostermann. Layout-Sensitive Generalized Parsing. In Proceedings of the International Conference on Software Language Engineering (SLE), ISBN 978-3-642-36088-6, pages 244--263, Berlin/Heidelberg: Springer-Verlag, September 2012. Acceptance rate: 32 % (20/62). [ .pdf, doi, bib ]

The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity. To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar as constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for most practical applications.
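The same-column example from this abstract can be sketched as a constraint that rejects candidate parses. This is only an illustration under simplifying assumptions, not the SDF integration described in the paper: the `Stmt` record, the `same_column` constraint, and the `filter_parses` helper are hypothetical, and a candidate parse is modeled as a flat list of blocks.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """A parsed statement with the position of its first token."""
    text: str
    line: int
    col: int


def same_column(block):
    """Layout constraint: all statements of a block start on the same column."""
    return len({stmt.col for stmt in block}) <= 1


def filter_parses(candidates):
    """Keep only candidate parses in which every block satisfies the constraint."""
    return [parse for parse in candidates
            if all(same_column(block) for block in parse)]


a = Stmt("if x:", 1, 0)
b = Stmt("y = 1", 2, 4)
c = Stmt("z = 2", 3, 4)

# Two ways a layout-insensitive parser might group the same statements.
nested = [[a], [b, c]]  # b and c form their own block
flat = [[a, b, c]]      # all three statements in one block

valid = filter_parses([nested, flat])
```

Only the `nested` grouping survives, because the `flat` grouping places statements from columns 0 and 4 in one block.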

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable Prediction of Non-functional Properties in Software Product Lines: Footprint and Memory Consumption. Information and Software Technology (IST), Special Issue on Software Reuse and Product Lines, 55(3):491--507, March 2013. [ .pdf, doi, http, bib ]

Context: A software product line is a family of related software products, typically created from a set of common assets. Users select features to derive a product that fulfills their needs. Users often expect a product to have specific non-functional properties, such as a small footprint or a bounded response time. Because a product line may have an exponential number of products with respect to its features, it is usually not feasible to generate and measure non-functional properties for each possible product. Objective: Our overall goal is to derive optimal products with respect to non-functional requirements by showing customers which features must be selected. Method: We propose an approach to predict a product’s non-functional properties based on the product’s feature selection. We aggregate the influence of each selected feature on a non-functional property to predict a product’s properties. We generate and measure a small set of products and, by comparing measurements, we approximate each feature’s influence on the non-functional property in question. As a research method, we conducted controlled experiments and evaluated prediction accuracy for the non-functional properties footprint and main-memory consumption. But, in principle, our approach is applicable for all quantifiable non-functional properties. Results: With nine software product lines, we demonstrate that our approach predicts the footprint with an average accuracy of 94 %, and an accuracy of over 99 % on average if feature interactions are known. In a further series of experiments, we predicted main memory consumption of six customizable programs and achieved an accuracy of 89 % on average. Conclusion: Our experiments suggest that, with only few measurements, it is possible to accurately predict non-functional properties of products of a product line. Furthermore, we show how already little domain knowledge can improve predictions and discuss trade-offs between accuracy and required number of measurements. With this technique, we provide a basis for many reasoning and product-derivation approaches.

Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Tillmann Rendel, and Christian Kästner. Reifying and Optimizing Collection Queries for Modularity. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 77--78, New York, NY: ACM Press, 2012. Poster. [ .pdf, bib ]

Christian Kästner, and Sven Apel. Feature-Oriented Software Development: A Short Tutorial on Feature-Oriented Programming, Virtual Separation of Concerns, and Variability-Aware Analysis. In GTTSE Summer School: Generative & Transformational Techniques in Software Engineering, volume 7680 of Lecture Notes in Computer Science, pages 346--382, Berlin/Heidelberg: Springer-Verlag, 2011. [ .pdf, http, bib ]

Feature-oriented software development is a paradigm for the construction, customization, and synthesis of large-scale and variable software systems, focusing on structure, reuse and variation. In this tutorial, we provide a gentle introduction to software product lines, feature-oriented programming, virtual separation of concerns, and variability-aware analysis. We provide an overview, show connections between the different lines of research, and highlight possible future research directions.

Thomas Thüm, Christian Kästner, Fabian Benduhn, Jens Meinicke, Gunter Saake, and Thomas Leich. FeatureIDE: An Extensible Framework for Feature-Oriented Software Development. Science of Computer Programming (SCP), Special Issue on Experimental Software and Toolkits, 79:70--85, 2014. [ .pdf, doi, bib ]

FeatureIDE is an open-source framework for feature-oriented software development (FOSD) based on Eclipse. FOSD is a paradigm for the construction, customization, and synthesis of software systems. Code artifacts are mapped to features, and a customized software system can be generated given a selection of features. The set of software systems that can be generated is called a software product line (SPL). FeatureIDE supports several FOSD implementation techniques such as feature-oriented programming, aspect-oriented programming, delta-oriented programming, and preprocessors. All phases of FOSD are supported in FeatureIDE, namely domain analysis, requirements analysis, domain implementation, and software generation.

Christian Kästner, Klaus Ostermann, and Sebastian Erdweg. A Variability-Aware Module System. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 773--792, New York, NY: ACM Press, October 2012. Acceptance rate: 25 % (57/228). [ .pdf, doi, bib ]

Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.

Christian Kästner, Klaus Ostermann, and Sebastian Erdweg. A Variability-Aware Module System. Technical Report 01/2012, Marburg, Germany: Department of Mathematics and Computer Science, Philipps University Marburg, April 2012. [ .pdf, bib ]

Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.

Thomas Thüm, Sven Apel, Christian Kästner, Martin Kuhlemann, Ina Schaefer, and Gunter Saake. Analysis Strategies for Software Product Lines. Technical Report FIN-2012-04, Magdeburg, Germany: University of Magdeburg, April 2012. [ .pdf, bib ]

Software-product-line engineering has gained considerable momentum in recent years, both in industry and in academia. A software product line is a set of software products that share a common set of features. Software product lines challenge traditional analysis techniques, such as type checking, testing, and formal verification, in their quest of ensuring correctness and reliability of software. Simply creating and analyzing all products of a product line is usually not feasible, due to the potentially exponential number of valid feature combinations. Recently, researchers began to develop analysis techniques that take the distinguishing properties of software product lines into account, for example, by checking feature-related code in isolation or by exploiting variability information during analysis. The emerging field of product-line analysis techniques is both broad and diverse such that it is difficult for researchers and practitioners to understand their similarities and differences (e.g., with regard to variability awareness or scalability), which hinders systematic research and application. We classify the corpus of existing and ongoing work in this field, we compare techniques based on our classification, and we infer a research agenda. A short-term benefit of our endeavor is that our classification can guide research in product-line analysis and, to this end, make it more systematic and efficient. A long-term goal is to empower developers to choose the right analysis technique for their needs out of a pool of techniques with different strengths and weaknesses.

Janet Feigenspan, Christian Kästner, Sven Apel, Jörg Liebig, Michael Schulze, Raimund Dachselt, Maria Papendieck, Thomas Leich, and Gunter Saake. Do Background Colors Improve Program Comprehension in the #ifdef Hell? Empirical Software Engineering (EMSE), 18(4):699--745, 2012. [ .pdf, doi, http, bib ]

Software-product-line engineering aims at the development of variable and reusable software systems. In practice, software product lines are often implemented with preprocessors. Preprocessor directives are easy to use, and many mature tools are available for practitioners. However, preprocessor directives have been heavily criticized in academia and even referred to as “#ifdef hell”, because they introduce threats to program comprehension and correctness. There are many voices that suggest using other implementation techniques instead, but these voices ignore the fact that a transition from preprocessors to other languages and tools is tedious, error-prone, and expensive in practice. Instead, we and others propose to increase the readability of preprocessor directives by using background colors to highlight source code annotated with #ifdef directives. In three controlled experiments with over 70 subjects in total, we evaluate whether and how background colors improve program comprehension in preprocessor-based implementations. Our results demonstrate that background colors have the potential to improve program comprehension, independently of size and programming language of the underlying product. Additionally, we found that subjects generally favor background colors. We integrate these and other findings in a tool called FeatureCommander, which facilitates program comprehension in practice and which can serve as a basis for further research.

Janet Feigenspan, Michael Schulze, Maria Papendieck, Christian Kästner, Raimund Dachselt, Veit Köppen, Mathias Frisch, and Gunter Saake. Supporting Program Comprehension in Large Preprocessor-Based Software Product Lines. IET Software, 6(6):488--501, December 2012. [ .pdf, doi, bib ]

Background: Software product line engineering provides an effective mechanism to implement variable software. However, the usage of preprocessors to realize variability, which is typical in industry, is heavily criticized, because it often leads to obfuscated code. Using background colours to highlight preprocessor statements to support comprehensibility has proven effective; however, scalability to large software product lines (SPLs) is questionable. Aim: Our goal is to implement and evaluate scalable usage of background colours for industrial-sized SPLs. Method: We designed and implemented scalable concepts in a tool called FeatureCommander. To evaluate its effectiveness, we conducted a controlled experiment with a large real-world SPL with over 99,000 lines of code and 340 features. We used a within-subjects design with the treatments colours and no colours. We compared correctness and response time of tasks for both treatments. Results: For certain kinds of tasks, background colours improve program comprehension. Furthermore, subjects generally favour background colours compared to no background colours. Additionally, subjects who worked with background colours had to use the search functions less frequently. Conclusion: We show that background colours can improve program comprehension in large SPLs. Based on these encouraging results, we will continue our work on improving program comprehension in large SPLs.

Janet Feigenspan, Christian Kästner, Jörg Liebig, Sven Apel, and Stefan Hanenberg. Measuring Programming Experience. In Proceedings of the 20th International Conference on Program Comprehension (ICPC), pages 73--82, Los Alamitos, CA: IEEE Computer Society, 2012. Acceptance rate: 41 % (21/51). Most Influential Paper Award at ICPC'22. [ .pdf, bib ]

Programming experience is an important confounding parameter in controlled experiments regarding program comprehension. In the literature, ways to measure or control programming experience vary. Often, researchers neglect it or do not specify how they controlled it. We set out to find a well-defined understanding of programming experience and a way to measure it. From published comprehension experiments, we extracted questions that assess programming experience. In a controlled experiment, we compare the answers of 128 students to these questions with their performance in solving program-comprehension tasks. We found that self-estimation seems to be a reliable way to measure programming experience. Furthermore, we applied exploratory factor analysis to extract a model of programming experience. With our analysis, we initiate a path toward measuring programming experience with a valid and reliable tool, so that we can control its influence on program comprehension.

Christian Kästner. Virtual Separation of Concerns: Toward Preprocessors 2.0. Information Technology (it), 54(1):42--46, 2012. [ .pdf, doi, bib ]

Norbert Siegmund, Sergiy S. Kolesnikov, Christian Kästner, Sven Apel, Don Batory, Marko Rosenmüller, and Gunter Saake. Predicting Performance via Automated Feature-Interaction Detection. In Proceedings of the 34th International Conference on Software Engineering (ICSE), ISBN 978-1-4673-1067-3, pages 167--177, Los Alamitos, CA: IEEE Computer Society, 2012. Acceptance rate: 21 % (87/408). [ .pdf, bib ]

Customizable programs and program families provide user-selectable features to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is infeasible. Our work aims at predicting program performance based on selected features. The challenge is predicting performance accurately when features interact. An interaction occurs when a feature combination has an unexpected influence on performance. We present a method that automatically detects performance feature interactions to improve prediction accuracy. To this end, we propose three heuristics to reduce the number of measurements required to detect interactions. Our evaluation consists of six real-world case studies from varying domains (e.g., databases, compression libraries, and web servers) using different configuration techniques (e.g., configuration files and preprocessor flags). Results show, on average, a prediction accuracy of 95 %.

Sven Apel, Christian Kästner, and Christian Lengauer. Language-Independent and Automated Software Composition: The FeatureHouse Experience. IEEE Transactions on Software Engineering (TSE), 39(1):63--79, 2013. [ .pdf, http, bib ]

Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general-purpose concept, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages. Furthermore, we offer a supporting framework and tool chain, called FeatureHouse. We use attribute grammars to automate the integration of additional languages. In particular, we have integrated Java, C#, C, Haskell, Alloy, and JavaCC. A substantial number of case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties that a language must have in order to be ready for superimposition. We discuss perspectives of our approach and demonstrate how we extended FeatureHouse with support for XML languages (in particular, XHTML, XMI/UML, and Ant) and alternative composition approaches (in particular, aspect weaving). Rounding off our previous work, we provide here a holistic view of the FeatureHouse approach based on rich experience with numerous languages and case studies and reflections on several years of research.

Mario Pukall, Christian Kästner, Walter Cazzola, Sebastian Götz, Alexander Grebhahn, Reimar Schröter, and Gunter Saake. JavAdaptor: Flexible Runtime Updates of Java Applications. Software: Practice and Experience (SPE), 43(2):153--185, February 2013. [ .pdf, doi, http, bib ]

Software is changed frequently during its life cycle. New requirements arise and bugs must be fixed. To update an application, it usually must be stopped, patched, and restarted. This causes periods of unavailability, which is always a problem for highly available applications. Even during the development of complex applications, restarts to test new program parts can be time consuming and annoying. Thus, we aim at dynamic software updates to update programs at runtime. There is a large body of research on dynamic software updates, but so far, existing approaches have shortcomings either in terms of flexibility or performance. In addition, some of them depend on specific runtime environments and dictate the program’s architecture. We present JavAdaptor, the first runtime update approach based on Java that (a) offers flexible dynamic software updates, (b) is platform independent, (c) introduces only minimal performance overhead, and (d) does not dictate the program architecture. JavAdaptor combines schema-changing class replacements by class renaming and caller updates with Java HotSwap using containers and proxies. It runs on top of all major standard Java virtual machines. We evaluate our approach’s applicability and performance in non-trivial case studies and compare it to existing dynamic software update approaches.

Christian Kästner, Alexander Dreiling, and Klaus Ostermann. Variability Mining with LEADT. Technical Report 01/2011, Marburg, Germany: Department of Mathematics and Computer Science, Philipps University Marburg, September 2011. [ .pdf, http, bib ]

Software product line engineering is an efficient means to generate a set of tailored software products from a common implementation. However, adopting a product-line approach poses a major challenge and significant risks, since typically legacy code must be migrated toward a product line. Our aim is to lower the adoption barrier by providing semiautomatic tool support—called variability mining—to support developers in locating, documenting, and extracting implementations of product-line features from legacy code. Variability mining combines prior work on concern location, reverse engineering, and variability-aware type systems, but is tailored specifically for the use in product lines. Our work extends prior work in three important aspects: (1) we provide a consistency indicator based on a variability-aware type system, (2) we mine features at a fine level of granularity, and (3) we exploit domain knowledge about the relationship between features when available. With a quantitative study, we demonstrate that variability mining can efficiently support developers in locating features.

Martin Kuhlemann, Christian Kästner, Sven Apel, and Gunter Saake. An Algebra for Refactoring and Feature-Oriented Programming. Technical Report FIN-2011-06, Magdeburg, Germany: University of Magdeburg, September 2011. [ .pdf, bib ]

Sebastian Erdweg, Lennart C.L. Kats, Tillmann Rendel, Christian Kästner, Klaus Ostermann, and Eelco Visser. Growing a Language Environment with Editor Libraries. In Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-4503-0689-8, pages 167--176, New York, NY: ACM Press, October 2011. Acceptance rate: 31 % (18/58). [ .pdf, doi, bib ]

Large software projects consist of code written in a multitude of different (possibly domain-specific) languages, which are often deeply interspersed even in single files. While many proposals exist on how to integrate languages semantically and syntactically, the question of how to support this scenario in integrated development environments (IDEs) remains open: How can standard IDE services, such as syntax highlighting, outlining, or reference resolving, be provided in an extensible and compositional way, such that an open mix of languages is supported in a single file? Based on our library-based syntactic extension language for Java, SugarJ, we propose to make IDEs extensible by organizing editor services in editor libraries. Editor libraries are libraries written in the object language, SugarJ, and hence activated and composed through regular import statements on a file-by-file basis. We have implemented an IDE for editor libraries on top of SugarJ and the Eclipse-based Spoofax language workbench. We have validated editor libraries by evolving this IDE into a fully-fledged and schema-aware XML editor as well as an extensible Latex editor, which we used for writing this paper.

Christian Kästner, Sven Apel, and Klaus Ostermann. The Road to Feature Modularity? In Proceedings of the 3rd International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-4503-0789-5, pages 5:1--5:8, New York, NY: ACM Press, September 2011. [ .pdf, doi, bib ]

Modularity of feature representations has been a long standing goal of feature-oriented software development. While some researchers regard feature modules and corresponding composition mechanisms as a modular solution, other researchers have challenged the notion of feature modularity and pointed out that most feature-oriented implementation mechanisms lack proper interfaces and support neither modular type checking nor separate compilation. We step back and reflect on the feature-modularity discussion. We distinguish two notions of modularity, cohesion without interfaces and information hiding with interfaces, and point out the different expectations that, we believe, are the root of many heated discussions. We discuss whether feature interfaces should be desired and weigh their potential benefits and costs, specifically regarding crosscutting, granularity, feature interactions, and the distinction between closed-world and open-world reasoning. Because existing evidence for and against feature modularity and feature interfaces is shaky and inconclusive, more research is needed, for which we outline possible directions.

Janet Feigenspan, Maria Papendieck, Christian Kästner, Mathias Frisch, and Raimund Dachselt. FeatureCommander: Colorful #ifdef World. In Proceedings of the 15th International Software Product Line Conference (SPLC), second volume (Demonstration), ISBN 978-1-4503-0789-5, pages 48:1--48:2, New York, NY: ACM Press, September 2011. [ .pdf, doi, bib ]

Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann, and Thorsten Berger. Variability-Aware Parsing in the Presence of Lexical Macros and Conditional Compilation. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), ISBN 978-1-4503-0940-0, pages 805--824, New York, NY: ACM Press, October 2011. Acceptance rate: 37 % (61/166). [ .pdf, doi, bib ]

In many projects, lexical preprocessors are used to manage different variants of the project (using conditional compilation) and to define compile-time code transformations (using macros). Unfortunately, while being a simple way to implement variability, conditional compilation and lexical macros hinder automatic analysis, even though such analysis would be urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use heuristics, support only a subset of the language, or suffer from exponential explosion. As part of the TypeChef project, we contribute a novel variability-aware parser that can parse unpreprocessed code without heuristics in practicable time. Beyond the obvious task of detecting syntax errors, our parser paves the road for further analysis, such as variability-aware type checking. We implement variability-aware parsers for Java and GNU C and demonstrate practicability by parsing the product line MobileMedia and the entire X86 architecture of the Linux kernel with 6065 variable features.

Sebastian Erdweg, Lennart C.L. Kats, Tillmann Rendel, Christian Kästner, Klaus Ostermann, and Eelco Visser. SugarJ: Library-Based Language Extensibility. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), ISBN 978-1-4503-0942-4, pages 187--188, New York, NY: ACM Press, 2011. Poster. [ doi, bib ]

Sebastian Erdweg, Lennart C.L. Kats, Tillmann Rendel, Christian Kästner, Klaus Ostermann, and Eelco Visser. Library-Based Model-Driven Software Development with SugarJ. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), ISBN 978-1-4503-0942-4, pages 17--18, New York, NY: ACM Press, 2011. Demonstration paper. [ doi, bib ]

Sebastian Erdweg, Tillmann Rendel, Christian Kästner, and Klaus Ostermann. SugarJ: Library-based Syntactic Language Extensibility. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), ISBN 978-1-4503-0940-0, pages 391--406, New York, NY: ACM Press, October 2011. Acceptance rate: 37 % (61/166). Distinguished Paper Award in 2011 and Most Influential Paper Award in 2021. [ .pdf, doi, bib ]

Existing approaches to extend a programming language with syntactic sugar often leave a bitter taste, because they cannot be used with the same ease as the main extension mechanism of the programming language—libraries. Sugar libraries are a novel approach for syntactically extending a programming language within the language. A sugar library is like an ordinary library, but can, in addition, export syntactic sugar for using the library. Sugar libraries maintain the composability and scoping properties of ordinary libraries and are hence particularly well-suited for embedding a multitude of domain-specific languages into a host language. They also inherit the self-applicability of libraries, which means that the syntax extension mechanism can be applied in the definition of sugar libraries themselves. To demonstrate the expressiveness and applicability of sugar libraries, we have developed SugarJ, a language on top of Java, SDF and Stratego that supports syntactic extensibility. SugarJ employs a novel incremental parsing mechanism that allows changing the syntax within a source file. We demonstrate SugarJ by five language extensions, including embeddings of XML and closures in Java, all available as sugar libraries. We illustrate the utility of self-applicability by embedding XML Schema, a metalanguage to define XML languages.

Sven Apel, Jörg Liebig, Benjamin Brandl, Christian Lengauer, and Christian Kästner. Semistructured Merge: Rethinking Merge in Revision Control Systems. In Proceedings of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pages 190--200, New York, NY: ACM Press, September 2011. Acceptance rate: 17 % (34/203). [ .pdf, bib ]

An ongoing problem in revision control systems is how to resolve conflicts in a merge of independently developed revisions. Unstructured revision control systems are purely text-based and solve conflicts based on textual similarity. Structured revision control systems are tailored to specific languages and use language-specific knowledge for conflict resolution. We propose semistructured revision control systems that inherit the strengths of both classes of systems: generality and expressiveness. The idea is to provide structural information of the underlying software artifacts—declaratively, in the form of annotated grammars. This way, a wide variety of languages can be supported and the information provided can assist the automatic resolution of two classes of conflicts: ordering conflicts and semantic conflicts. The former can be resolved independently of the language and the latter can be resolved using specific conflict handlers supplied by the user. We have been developing a tool that supports semistructured merge and conducted an empirical study on 24 software projects developed in Java, C#, and Python comprising 180 merge scenarios. We found that semistructured merge reduces the number of conflicts in 60 % of the sample merge scenarios by, on average, 34 %. Our study reveals that renaming is challenging in that it can significantly increase the number of conflicts during semistructured merge, which we discuss.

Norbert Siegmund, Marko Rosenmüller, Martin Kuhlemann, Christian Kästner, Sven Apel, and Gunter Saake. SPL Conqueror: Toward Optimization of Non-functional Properties in Software Product Lines. Software Quality Journal (SQJ), Special Issue on Quality Engineering for Software Product Lines, 20(3):487--517, 2011. [ .pdf, doi, http, bib ]

A software product line (SPL) is a family of related programs of a domain. The programs of an SPL are distinguished in terms of features, which are end-user-visible characteristics of programs. Based on a selection of features, stakeholders can derive tailor-made programs that satisfy functional requirements. Besides functional requirements, different application scenarios raise the need for optimizing non-functional properties of a variant. The diversity of application scenarios leads to heterogeneous optimization goals with respect to non-functional properties (e.g., performance vs. footprint vs. energy optimized variants). Hence, an SPL has to satisfy different and sometimes contradicting requirements regarding non-functional properties. Usually, the actually required non-functional properties are not known before product derivation and can vary for each application scenario and customer. Allowing stakeholders to derive optimized variants requires measuring non-functional properties after the SPL is developed. Unfortunately, the high variability provided by SPLs complicates measurement and optimization of non-functional properties due to a large variant space. With SPL Conqueror, we provide a holistic approach to optimize non-functional properties in SPL engineering. We show how non-functional properties can be qualitatively specified and quantitatively measured in the context of SPLs. Furthermore, we discuss the variant-derivation process in SPL Conqueror that reduces the effort of computing an optimal variant. We demonstrate the applicability of our approach by means of nine case studies of a broad range of application domains (e.g., database management and operating systems). Moreover, we show that SPL Conqueror is implementation and language independent by using SPLs that are implemented with different mechanisms, such as conditional compilation and feature-oriented programming.

Ateeq Khan, Christian Kästner, Veit Köppen, and Gunter Saake. Service Variability Patterns. In Proceedings of the ER Workshop on Software Variability Management (Variability@ER), volume 6999 of Lecture Notes in Computer Science, pages 130--140, Berlin/Heidelberg: Springer-Verlag, 2011. [ http, bib ]

Christian Kästner. Virtuelle Trennung von Belangen. In Ausgezeichnete Informatikdissertationen 2010, ISBN 9783885794158, pages 121--130, Bonn, Germany: Gesellschaft für Informatik (GI), 2011. Invited paper. [ .pdf, bib ]

Conditional compilation is a simple and frequently used means of implementing variability in software product lines, but it is heavily criticized for its negative effects on code quality and maintainability. We show how tool support (views, visualization, controlled annotations, and a product-line type system) can fix the essential problems and emulate many benefits of modular development. We thereby offer an alternative to the classical separation of concerns by means of modules. Instead of necessarily separating source code into files, we achieve a virtual separation of concerns through appropriate tool support.

Janet Feigenspan, Sven Apel, Jörg Liebig, and Christian Kästner. Exploring Software Measures to Assess Program Comprehension. In Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1--10, paper 3, Los Alamitos, CA: IEEE Computer Society, September 2011. Acceptance rate: 31 % (33/105). [ .pdf, bib ]

Software measures are often used to assess program comprehension, although their applicability is discussed controversially. Often, their application is based on plausibility arguments, which, however, are not sufficient to decide whether and how software measures are good predictors for program comprehension. Our goal is to evaluate whether and how software measures and program comprehension correlate. To this end, we carefully designed an experiment. We used four different measures that are often used to judge the quality of source code: complexity, lines of code, concern attributes, and concern operations. We measured how subjects understood two comparable software systems that differ in their implementation, such that one implementation promised considerable benefits in terms of better software measures. We did not observe the difference in our subjects' program comprehension that the software measures suggested. To explore how software measures and program comprehension could correlate, we used several variants of computing the software measures. This brought them closer to our observed result, but not close enough to confirm a relationship between software measures and program comprehension. Having failed to establish a relationship, we present our findings as an open issue to the community and initiate a discussion on the role of software measures as comprehensibility predictors.

Thomas Thüm, Christian Kästner, Sebastian Erdweg, and Norbert Siegmund. Abstract Features in Feature Modeling. In Proceedings of the 15th International Software Product Line Conference (SPLC), pages 191--200, Los Alamitos, CA: IEEE Computer Society, August 2011. Acceptance rate: 29 % (20/69). [ .pdf, bib ]

A software product line is a set of program variants, typically generated from a common code base. Feature models describe variability in product lines by documenting features and their valid combinations. In product-line engineering, we need to reason about variability and program variants for many different tasks. For example, given a feature model, we might want to determine the number of all valid feature combinations or detect specific feature combinations for testing. However, we found that contemporary reasoning approaches can only reason about feature combinations, not about program variants, because they do not take abstract features into account. Abstract features are features used to structure a feature model that, however, do not have any impact at implementation level. Using existing feature-model reasoning mechanisms for product variants leads to incorrect results. We raise awareness of the problem of abstract features for different kinds of analyses on feature models. We argue that, in order to reason about program variants, abstract features should be made explicit in feature models. We present a technique based on propositional formulas to reason about program variants. In practice, our technique can save effort that is caused by considering the same program variant multiple times, for example, in product-line testing.
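The gap between feature combinations and program variants can be illustrated with a minimal sketch. The feature model below is hypothetical (the names Base, Storage, File, and Database are assumptions for illustration), and the sketch enumerates configurations by brute force rather than using the propositional-formula technique the paper presents.

```python
from itertools import product

# Hypothetical feature model: "Base" is mandatory, "Storage" is an optional
# abstract feature with two optional concrete children.
FEATURES = ["Base", "Storage", "File", "Database"]
ABSTRACT = {"Storage"}  # structures the model only; no implementation artifacts


def is_valid(cfg):
    """Feature-model constraints written as a propositional formula."""
    return (
        cfg["Base"]                                   # root is mandatory
        and (not cfg["File"] or cfg["Storage"])       # child implies parent
        and (not cfg["Database"] or cfg["Storage"])   # child implies parent
    )


def valid_configurations():
    """Enumerate all assignments and keep those satisfying the model."""
    for values in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if is_valid(cfg):
            yield cfg


def program_variant(cfg):
    """A program variant is determined by the selected concrete features only."""
    return frozenset(f for f in FEATURES if cfg[f] and f not in ABSTRACT)


configs = list(valid_configurations())
variants = {program_variant(c) for c in configs}
print(len(configs), "valid feature combinations")
print(len(variants), "distinct program variants")
```

In this toy model, selecting Storage without any child yields the same program as not selecting it at all, so counting feature combinations overestimates the number of program variants by one.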

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable Prediction of Non-functional Properties in Software Product Lines. In Proceedings of the 15th International Software Product Line Conference (SPLC), pages 160--169, Los Alamitos, CA: IEEE Computer Society, August 2011. Acceptance rate: 29 % (20/69). Best Paper Award. [ .pdf, bib ]

A software product line (SPL) is a family of related software products, from which users can derive a product that fulfills their needs. Often, users expect a product to have specific non-functional properties, for example, to not exceed a footprint limit or to respond in a given time frame. Unfortunately, it is usually not feasible to generate and measure non-functional properties for each possible product of an SPL in isolation, because an SPL can contain millions of products. Hence, we propose an approach to estimate each product's non-functional properties in advance, based on the product's configuration. To this end, we approximate non-functional properties per feature and per feature interaction. We generate and measure a small set of products and approximate non-functional properties by comparing the measurements. Our approach is implementation independent and language independent. We present three different approaches with different trade-offs regarding accuracy and required number of measurements. With nine case studies, we demonstrate that our approach can predict non-functional properties with an accuracy of 2 %.
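The idea of approximating a property per feature by comparing measurements can be sketched as follows. This is a simplified illustration, not the paper's implementation: the feature names and footprint values are invented, and feature interactions are ignored.

```python
# Hypothetical footprint measurements in KB, keyed by the selected features.
MEASURED = {
    frozenset(): 100.0,                  # base product, no optional features
    frozenset({"Compression"}): 130.0,   # base plus one feature
    frozenset({"Encryption"}): 150.0,    # base plus one feature
}


def feature_deltas(measured):
    """Approximate each feature's contribution as the difference to the base."""
    base = measured[frozenset()]
    deltas = {
        next(iter(cfg)): value - base
        for cfg, value in measured.items()
        if len(cfg) == 1
    }
    return base, deltas


def predict(config, base, deltas):
    """Predict a product's property as the base value plus per-feature deltas."""
    return base + sum(deltas[f] for f in config)


base, deltas = feature_deltas(MEASURED)
predicted = predict({"Compression", "Encryption"}, base, deltas)
print(predicted)
```

If a later measurement of the combined product deviated from this prediction, the difference could be attributed to an interaction between the two features, which is the kind of refinement the abstract alludes to.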

Sven Apel, Florian Heidenreich, Christian Kästner, and Marko Rosenmüller. Third International Workshop on Feature-Oriented Software Development (FOSD 2011). In Proceedings of the 15th International Software Product Line Conference (SPLC), pages 337--338, Los Alamitos, CA: IEEE Computer Society, August 2011. [ .pdf, http, bib ]

Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. Revisiting Information Hiding: Reflections on Classical and Nonclassical Modularity. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP), volume 6813 of Lecture Notes in Computer Science, pages 155--178, Berlin/Heidelberg: Springer-Verlag, 2011. Acceptance rate: 26 % (26/100). [ .pdf, doi, epub, bib ]

What is modularity? Which kind of modularity should developers strive for? Despite decades of research on modularity, these basic questions have no definite answer. We submit that the common understanding of modularity, and in particular its notion of information hiding, is deeply rooted in classical logic. We analyze how classical modularity, based on classical logic, fails to address the needs of developers of large software systems, and encourage researchers to explore alternative visions of modularity, based on nonclassical logics, and henceforth called nonclassical modularity.

Janet Feigenspan, Michael Schulze, Maria Papendieck, Christian Kästner, Raimund Dachselt, Veit Köppen, and Mathias Frisch. Using Background Colors to Support Program Comprehension in Software Product Lines. In Proceedings of the 15th International Conference on Evaluation and Assessment in Software Engineering (EASE), pages 66--75, Institution of Engineering and Technology, 2011. Acceptance rate: 40 % (20/50). [ .pdf, bib ]

Background: Software product line engineering provides an effective mechanism to implement variable software. However, the usage of preprocessors, which is typical in industry, is heavily criticized, because it often leads to obfuscated code. Using background colors to support comprehensibility has been shown to be effective; however, scalability to large software product lines (SPLs) is questionable. Aim: Our goal is to implement and evaluate scalable usage of background colors for industrial-sized SPLs. Method: We designed and implemented scalable concepts in a tool called FeatureCommander. To evaluate its effectiveness, we conducted a controlled experiment with a large real-world SPL with over 160,000 lines of code and 340 features. We used a within-subjects design with treatments colors and no colors. We compared correctness and response time of tasks for both treatments. Results: For certain kinds of tasks, background colors improve program comprehension. Furthermore, subjects generally favor background colors. Conclusion: We show that background colors can improve program comprehension in large SPLs. Based on these encouraging results, we will continue our work improving program comprehension in large SPLs.

Michael Stengel, Janet Feigenspan, Mathias Frisch, Christian Kästner, Sven Apel, and Raimund Dachselt. View Infinity: A Zoomable Interface for Feature-Oriented Software Development. In Proceedings of the 33rd International Conference on Software Engineering (Demonstration Track) (ICSE), ISBN 978-1-4503-0445-0, pages 1031--1033, New York, NY: ACM Press, 2011. Acceptance rate: 37 % (22/60). [ .pdf, acm, doi, bib ]

Mario Pukall, Alexander Grebhahn, Reimar Schröter, Christian Kästner, Walter Cazzola, and Sebastian Götz. JavaAdaptor: Unrestricted Dynamic Software Updates for Java. In Proceedings of the 33rd International Conference on Software Engineering (Demonstration Track) (ICSE), ISBN 978-1-4503-0445-0, pages 989--991, New York, NY: ACM Press, 2011. Acceptance rate: 37 % (22/60). [ .pdf, acm, doi, bib ]

Christian Kästner, Sven Apel, Thomas Thüm, and Gunter Saake. Type Checking Annotation-Based Product Lines. ACM Transactions on Software Engineering and Methodology (TOSEM), 21(3):Article 14, 2012. [ .pdf, doi, epub, bib ]

Software-product-line engineering is an efficient means to generate a family of program variants for a domain from a single code base. However, because of the potentially high number of possible program variants, it is difficult to test them all and ensure properties like type safety for the entire product line. We present a product-line–aware type system that can type check an entire software product line without generating each variant in isolation. Specifically, we extend the Featherweight Java calculus with feature annotations for product-line development and prove formally that all program variants generated from a well-typed product line are well-typed. Furthermore, we present a solution to the problem of typing mutually exclusive features. We discuss how results from our formalization helped us implement our own product-line tool CIDE for full Java and report on our experience with detecting type errors in four existing software-product-line implementations.
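One core check of such a type system is that a reference never survives in a variant where its target has been removed. The sketch below illustrates that condition under stated assumptions: the features, the unconstrained feature model, and the two references are hypothetical, and configurations are enumerated by brute force for brevity, whereas a practical tool would typically hand the formula to a SAT solver.

```python
from itertools import product

FEATURES = ["Logging", "Network"]


def feature_model(cfg):
    """Hypothetical feature model without constraints: every selection is valid."""
    return True


# Each entry: (description, annotation of the use site, annotation of the declaration).
REFERENCES = [
    ("send() calls log()", lambda c: c["Network"], lambda c: c["Logging"]),
    ("log() calls format()", lambda c: c["Logging"], lambda c: c["Logging"]),
]


def dangling_references():
    """Report valid configurations in which a use is present but its target is not."""
    errors = []
    for values in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if not feature_model(cfg):
            continue
        for name, use_present, decl_present in REFERENCES:
            if use_present(cfg) and not decl_present(cfg):
                errors.append((name, cfg))
    return errors


errors = dangling_references()
for name, cfg in errors:
    print(name, "is dangling when", cfg)
```

The check works on annotations alone and never materializes a variant's source code; adding a constraint such as "Network requires Logging" to `feature_model` would make the reported error disappear.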

Jörg Liebig, Christian Kästner, and Sven Apel. Analyzing the Discipline of Preprocessor Annotations in 30 Million Lines of C Code. In Proceedings of the 10th ACM International Conference on Aspect-Oriented Software Development (AOSD), pages 191--202, New York, NY: ACM Press, March 2011. Acceptance rate: 23 % (21/92). [ .pdf, acm, bib ]

The C preprocessor cpp is a widely used tool for implementing variable software. It enables programmers to express variable code of features that may crosscut the entire implementation with conditional compilation. The C preprocessor relies on simple text processing and is independent of the host language (C, C++, Java, and so on). Language-independent text processing is powerful and expressive (programmers can make all kinds of annotations in the form of #ifdefs) but can render unpreprocessed code difficult to process automatically by tools, such as code aspect refactoring, concern management, and also static analysis and variability-aware type checking. We distinguish between disciplined annotations, which align with the underlying source-code structure, and undisciplined annotations, which do not align with the structure and hence complicate tool development. This distinction raises the question of how frequently programmers use undisciplined annotations and whether it is feasible to change them to disciplined annotations to simplify tool development and to enable programmers to use a wide variety of tools in the first place. By means of an analysis of 40 medium-sized to large-sized C programs, we show empirically that programmers use cpp mostly in a disciplined way: about 85 % of all annotations respect the underlying source-code structure. Furthermore, we analyze the remaining undisciplined annotations, identify patterns, and discuss how to transform them into a disciplined form.

Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann. Partial Preprocessing C Code for Variability Analysis. In Proceedings of the 5th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), ISBN 978-1-4503-0570-9, pages 137--140, New York, NY: ACM Press, January 2011. Acceptance rate: 55 % (21/38). [ .pdf, acm, bib ]

The C preprocessor is commonly used to implement variability. Given a feature selection, code fragments can be excluded from compilation with #ifdef and similar directives. However, the token-based nature of the C preprocessor makes variability implementation difficult and error-prone. Additionally, variability mechanisms are intertwined with macro definitions, macro expansion, and file inclusion. To determine whether a code fragment is compiled, the entire file must be preprocessed. We present a partial preprocessor that preprocesses file inclusion and macro expansion, but retains variability information for further analysis. We describe the mechanisms of the partial preprocessor, provide a full implementation, and present some initial experimental results. The partial preprocessor is part of a larger endeavor in the TypeChef project to check variability implementations (syntactic correctness, type correctness) in C projects such as the Linux kernel.

Sven Apel, Don Batory, Krzysztof Czarnecki, Florian Heidenreich, Christian Kästner, and Oscar Nierstrasz, editors. Proceedings of the Second International Workshop on Feature-Oriented Software Development (FOSD), October 10, 2010, Eindhoven, The Netherlands. New York, NY: ACM Press, October 2010. [ .pdf, http, bib ]

Andy Kenner, Christian Kästner, Steffen Haase, and Thomas Leich. TypeChef: Toward Type Checking #ifdef Variability in C. In Proceedings of the 2nd International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-4503-0208-1, pages 25--32, New York, NY: ACM Press, October 2010. Acceptance rate: 55 % (11/20). [ .pdf, acm, bib ]

Software product lines have gained momentum as an approach to generate many variants of a program, each tailored to a specific use case, from a common code base. However, the implementation of product lines raises new challenges, as potentially millions of program variants are developed in parallel. In prior work, we and others have developed product-line–aware type systems to detect type errors in a product line, without generating all variants. With TypeChef, we build a similar type checker for product lines written in C that implements variability with #ifdef directives of the C preprocessor. However, a product-line–aware type system for C is more difficult than expected due to several peculiarities of the preprocessor, including lexical macros and unrestricted use of #ifdef directives. In this paper, we describe the problems faced and our progress to solve them with TypeChef. Although TypeChef is still under development and cannot yet process arbitrary C code, we demonstrate its capabilities so far with a case study: By type checking the open-source web server Boa with potentially 2^110 variants, we found type errors in several variants.

Sven Apel, Wolfgang Scholz, Christian Lengauer, and Christian Kästner. Language-Independent Reference Checking in Software Product Lines. In Proceedings of the 2nd International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-4503-0208-1, pages 64--71, New York, NY: ACM Press, October 2010. Acceptance rate: 55 % (11/20). [ .pdf, acm, bib ]

Feature-Oriented Software Development (FOSD) is a paradigm for the development of software product lines. A challenge in FOSD is to guarantee that all software systems of a software product line are correct. Recent work on type checking product lines can provide a guarantee of type correctness without generating all possible systems. We generalize previous results by abstracting from the specifics of particular programming languages. In a first attempt, we present a reference-checking algorithm that performs key tasks of product-line type checking independently of the target programming language. Experiments with two sample product lines written in Java and C are encouraging and give us confidence that this approach is promising.

Sven Apel, Sergiy S. Kolesnikov, Jörg Liebig, Christian Kästner, Martin Kuhlemann, and Thomas Leich. Access Control in Feature-Oriented Programming. Science of Computer Programming (SCP), Special Issue on Feature-Oriented Software Development, 77(3):174--187, March 2012. [ .pdf, doi, bib ]

In feature-oriented programming (FOP) a programmer decomposes a program in terms of features. Ideally, features are implemented modularly so that they can be developed in isolation. Access control is an important ingredient to attain feature modularity as it provides mechanisms to hide and expose internal details of a module's implementation. But developers of contemporary feature-oriented languages have not considered access control mechanisms so far. The absence of a well-defined access control model for FOP breaks encapsulation of feature code and leads to unexpected program behaviors and inadvertent type errors. We raise awareness of this problem, propose three feature-oriented access modifiers, and present a corresponding access modifier model. We offer an implementation of the model on the basis of a fully-fledged feature-oriented compiler. Finally, by analyzing ten feature-oriented programs, we explore the potential of feature-oriented modifiers in FOP.

Sven Apel, Wolfgang Scholz, Christian Lengauer, and Christian Kästner. Dependences and Interactions in Feature-Oriented Design. In Proceedings of the 21st IEEE International Symposium on Software Reliability Engineering (ISSRE), pages 161--170, Los Alamitos, CA: IEEE Computer Society, October 2010. Acceptance rate: 31 % (40/130). [ .pdf, bib ]

Feature-oriented software development (FOSD) aims at the construction, customization, and synthesis of large-scale software systems. We propose a novel software design paradigm, called feature-oriented design, which takes the distinguishing properties of FOSD into account, especially the clean and consistent mapping between features and their implementations as well as the tendency of features to interact inadvertently. We extend the lightweight modeling language Alloy with support for feature-oriented design and call the extension FeatureAlloy. By means of an implementation and four case studies, we demonstrate how feature-oriented design with FeatureAlloy facilitates separation of concerns, variability, and reuse of models of individual features and helps in defining and detecting semantic dependences and interactions between features.

Sandro Schulze, Sven Apel, and Christian Kästner. Code Clones in Feature-Oriented Software Product Lines. In Proceedings of the 9th ACM International Conference on Generative Programming and Component Engineering (GPCE), pages 103--112, New York, NY: ACM Press, October 2010. Acceptance rate: 31 % (18/59). [ .pdf, acm, bib ]

Some limitations of object-oriented mechanisms are known to cause code clones (e.g., extension using inheritance). Novel programming paradigms such as feature-oriented programming (FOP) aim at alleviating these limitations. However, it is an open issue whether FOP is really able to avoid code clones or whether it even facilitates (FOP-specific) clones. To address this issue, we conduct an empirical analysis on ten feature-oriented software product lines with respect to code cloning. We found that there is a considerable amount of clones in feature-oriented software product lines and that a large fraction of these clones is FOP-specific (i.e., caused by limitations of feature-oriented mechanisms). Based on our results, we initiate a discussion on the reasons for FOP-specific clones and on how to cope with them. We show by example how such clones can be removed by the application of refactoring.

Christian Kästner. Virtual Separation of Concerns: Toward Preprocessors 2.0. PhD thesis, Magdeburg, Germany: University of Magdeburg, May 2010. Logos Verlag Berlin, ISBN 978-3-8325-2527-9. [ .pdf, http, bib ]

Conditional compilation with preprocessors such as cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these annotated fragments can be created, which can be used (among others) to implement software product lines. Although such annotation-based approaches are frequently used in practice, researchers often criticize them for their negative effect on code quality and maintainability. In contrast to modularized implementations such as components or aspects, annotation-based implementations typically neglect separation of concerns, can entirely obfuscate the source code, and are prone to introduce subtle errors. Our goal is to rehabilitate annotation-based approaches by showing how tool support can address these problems. With views, we emulate modularity; with a visual representation of annotations, we reduce source code obfuscation and increase program comprehension; and with disciplined annotations and a product-line–aware type system, we prevent or detect syntax and type errors in the entire software product line. At the same time we emphasize unique benefits of annotations, including simplicity, expressiveness, and being language independent. All in all, we provide tool-based separation of concerns without necessarily dividing source code into physically separated modules; we name this approach virtual separation of concerns. We argue that with these improvements over contemporary preprocessors, virtual separation of concerns can compete with modularized implementation mechanisms. Despite our focus on annotation-based approaches, we do not intend to give a definitive answer on how to implement software product lines. Modular implementations and annotation-based implementations both have their advantages; we even present an integration and migration path between them.
Our goal is to rehabilitate preprocessors and show that they are not a lost cause as many researchers think. On the contrary, we argue that – with the presented improvements – annotation-based approaches are a serious alternative for product-line implementation.
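How annotated fragments yield different program variants can be sketched with a deliberately tiny stand-in for conditional compilation. The annotated C snippet and the LOCKING feature are invented for illustration, and the function handles only #ifdef and #endif (no #else, #if expressions, or macros), so it is not a substitute for cpp.

```python
# Hypothetical annotated source: the LOCKING feature crosscuts push().
ANNOTATED = """\
void push(int v) {
#ifdef LOCKING
    lock();
#endif
    data[top++] = v;
#ifdef LOCKING
    unlock();
#endif
}
"""


def derive_variant(source, selected):
    """Keep a line only if every enclosing #ifdef names a selected feature."""
    output, stack = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            stack.append(stripped.split()[1] in selected)
        elif stripped.startswith("#endif"):
            stack.pop()
        elif all(stack):  # all([]) is True, so unannotated lines are always kept
            output.append(line)
    return "\n".join(output)


without_locking = derive_variant(ANNOTATED, set())
with_locking = derive_variant(ANNOTATED, {"LOCKING"})
print(without_locking)
print(with_locking)
```

The same single code base produces both variants, which is the simplicity the abstract credits to annotations; the scattering of the LOCKING fragments across the function is the separation-of-concerns problem that views and visualization are meant to mitigate.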

Janet Feigenspan, Christian Kästner, Mathias Frisch, Raimund Dachselt, and Sven Apel. Visual Support for Understanding Product Lines. In Proceedings of the 18th International Conference on Program Comprehension (ICPC), ISBN 978-1-4244-7604-6, pages 34--35, Los Alamitos, CA: IEEE Computer Society, 2010. Demonstration paper. [ .pdf, doi, bib ]

The C preprocessor is often used in practice to implement variability in software product lines. Using #ifdef statements provokes problems such as obfuscated source code, yet they will still be used in practice at least in the medium-term future. With CIDE, we demonstrate a tool to improve understanding and maintaining code that contains #ifdef statements by visualizing them with colors and providing different views on the code.

Sven Apel, Christian Lengauer, Bernhard Möller, and Christian Kästner. An Algebraic Foundation for Automatic Feature-Based Program Synthesis. Science of Computer Programming (SCP), 75(11):1022--1047, November 2010. [ .pdf, doi, bib ]

Feature-Oriented Software Development (FOSD) provides a multitude of formalisms, methods, languages, and tools for building variable, customizable, and extensible software. Along different lines of research, different notions of a feature have been developed. Although these notions have similar goals, no common basis for evaluation, comparison, and integration exists. We present a feature algebra that captures the key ideas of feature orientation and provides a common ground for current and future research in this field, in which also alternative options can be explored. Furthermore, our algebraic framework is meant to serve as a basis for the upcoming development paradigms automatic feature-based program synthesis and architectural metaprogramming.

Sven Apel, Christian Kästner, Armin Größlinger, and Christian Lengauer. Type Safety for Feature-Oriented Product Lines. Automated Software Engineering -- An International Journal (AUSE), 17(3):251--300, 2010. [ .pdf, doi, http, bib ]

A feature-oriented product line is a family of programs that share a common set of features. A feature implements a stakeholder's requirement and represents a design decision or configuration option. When added to a program, a feature involves the introduction of new structures, such as classes and methods, and the refinement of existing ones, such as extending methods. A feature-oriented decomposition enables a generator to create an executable program by composing feature code solely on the basis of the feature selection of a user, with no other information needed. A key challenge of product line engineering is to guarantee that only well-typed programs are generated. As the number of valid feature combinations grows combinatorially with the number of features, it is not feasible to type check all programs individually. The only feasible approach is to have a type system check the entire code base of the feature-oriented product line. We have developed such a type system on the basis of a formal model of a feature-oriented Java-like language. The type system guarantees type safety for feature-oriented product lines. That is, it ensures that every valid program of a well-typed product line is well-typed. Our formal model including type system is sound and complete.

Jörg Liebig, Sven Apel, Christian Lengauer, Christian Kästner, and Michael Schulze. An Analysis of the Variability in Forty Preprocessor-Based Software Product Lines. In Proceedings of the 32nd International Conference on Software Engineering (ICSE), pages 105--114, New York, NY: ACM Press, May 2010. Acceptance rate: 14 % (52/380). [ .pdf, acm, doi, bib ]

Over 30 years ago, the preprocessor cpp was developed to extend the programming language C by lightweight metaprogramming capabilities. Despite its error-proneness and low abstraction level, cpp is still widely used in present-day software projects to implement variable software. However, not much is known about how cpp is employed to implement variability. To address this issue, we have analyzed forty open-source software projects written in C. Specifically, we answer the following questions: How does program size influence variability? How complex are extensions made via cpp's variability mechanisms? At which level of granularity are extensions applied? What is the general type of extensions? These questions revive earlier discussions on understanding and refactoring of the preprocessor. To answer them, we introduce several metrics measuring the variability, complexity, granularity, and type of extensions. Based on the data obtained, we suggest alternative implementation techniques. The data we have collected can influence other research areas, such as language design and tool support.

Sven Apel, Jörg Liebig, Christian Lengauer, Christian Kästner, and William R. Cook. Semistructured Merge in Revision Control Systems. In Proceedings of the 4th Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), pages 13--20, Essen, Germany: University of Duisburg-Essen, January 2010. [ .pdf, bib ]

Revision control systems are a major means to manage versions and variants of today's software systems. An ongoing problem in these systems is how to resolve conflicts when merging independently developed revisions. Unstructured revision control systems are purely text-based and solve conflicts based on textual similarity. Structured revision control systems are tailored to specific languages and use language-specific knowledge for conflict resolution. We propose semistructured revision control systems to inherit the strengths of both classes of systems: generality and expressiveness. The idea is to provide structural information of the underlying software artifacts in the form of annotated grammars, which is motivated by recent work on software product lines. This way, a wide variety of languages can be supported and the information provided can assist the resolution of conflicts. We have implemented a preliminary tool and report on our experience with merging Java artifacts. We believe that drawing a connection between revision control systems and product lines has benefits for both fields.

Christian Kästner, Sven Apel, and Gunter Saake. Virtuelle Trennung von Belangen (Präprozessor 2.0). In Proceedings of the Software Engineering 2010 -- Fachtagung des GI-Fachbereichs Softwaretechnik (SE), volume P-159 of Lecture Notes in Informatics, pages 165--176, Bonn, Germany: Gesellschaft für Informatik (GI), February 2010. Acceptance rate: 36 % (17/47). [ .pdf, bib ]

Conditional compilation with preprocessors such as cpp is a simple but effective means of implementing variability in software product lines. By annotating code fragments with #ifdef and #endif, different program variants with or without these fragments can be generated. Although preprocessors are frequently used in practice, they are often criticized for their negative effects on code quality and maintainability. In contrast to modular implementations, for example with components or aspects, preprocessors neglect the separation of concerns in the source code, are prone to subtle errors, and degrade the readability of the source code. We show how simple tool support can address these problems and partially fix them, or emulate the benefits of a modular implementation. At the same time, we point out advantages of preprocessors such as simplicity and language independence.

Martin Kuhlemann, Christian Kästner, and Sven Apel. Reducing Code Replication in Delegation-Based Java Programs. In Java Software and Embedded Systems, ISBN 978-1-60741-661-6, pages 171--183, Hauppauge, NY: Nova Science Publishers, Inc., 2010. [ http, bib ]

Mario Pukall, Christian Kästner, Sebastian Götz, Walter Cazzola, and Gunter Saake. Flexible Runtime Program Adaptations in Java -- A Comparison. Technical Report FIN-2009-14, Magdeburg, Germany: University of Magdeburg, November 2009. [ .pdf, bib ]

Sven Apel, William R. Cook, Krzysztof Czarnecki, Christian Kästner, Neil Loughran, and Oscar Nierstrasz, editors. Proceedings of the First International Workshop on Feature-Oriented Software Development (FOSD), October 6, 2009, Denver, Colorado, USA. New York, NY: ACM Press, October 2009. [ http, bib ]

Janet Feigenspan, Christian Kästner, Sven Apel, and Thomas Leich. How to Compare Program Comprehension in FOSD Empirically -- An Experience Report. In Proceedings of the 1st International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-60558-567-3, pages 55--62, New York, NY: ACM Press, October 2009. [ .pdf, doi, bib ]

There are many different implementation approaches to realize the vision of feature-oriented software development, ranging from simple preprocessors through feature-oriented programming to sophisticated aspect-oriented mechanisms. Their impact on readability and maintainability (or program comprehension in general) has caused a debate among researchers, but sound empirical results are missing. We report experience from our endeavor to conduct experiments to measure the influence of different implementation mechanisms on program comprehension. We describe how to design such experiments and report on the possibilities and pitfalls we encountered. Finally, we present some early results of our first experiment on comparing CPP with CIDE.

Sven Apel, Jörg Liebig, Christian Kästner, Martin Kuhlemann, and Thomas Leich. An Orthogonal Access Modifier Model for Feature-Oriented Programming. In Proceedings of the 1st International Workshop on Feature-Oriented Software Development (FOSD), ISBN 978-1-60558-567-3, pages 27--34, New York, NY: ACM Press, October 2009. [ .pdf, doi, bib ]

In feature-oriented programming (FOP), a programmer decomposes a program in terms of features. Ideally, features are implemented modularly so that they can be developed in isolation. Access control is an important ingredient to attain feature modularity as it provides mechanisms to hide and expose internal details of a module's implementation. But developers of contemporary feature-oriented languages have not considered access control mechanisms so far. The absence of a well-defined access control model for FOP breaks the encapsulation of feature code and leads to unexpected and undefined program behaviors as well as inadvertent type errors, as we will demonstrate. The reason for these problems is that common object-oriented modifiers, typically provided by the base language, are not expressive enough for FOP and interact in subtle ways with feature-oriented language mechanisms. We raise awareness of this problem, propose three feature-oriented modifiers for access control, and present an orthogonal access modifier model.

Christian Kästner and Sven Apel. Virtual Separation of Concerns -- A Second Chance for Preprocessors. Journal of Object Technology (JOT), 8(6):59--78, September 2009. Refereed Column. [ .pdf, http, bib ]

Conditional compilation with preprocessors like cpp is a simple but effective means to implement variability. By annotating code fragments with #ifdef and #endif directives, different program variants with or without these fragments can be created, which can be used (among others) to implement software product lines. Although preprocessors are frequently used in practice, they are often criticized for their negative effect on code quality and maintainability. In contrast to modularized implementations, for example using components or aspects, preprocessors neglect separation of concerns, are prone to introduce subtle errors, can entirely obfuscate the source code, and limit reuse. Our aim is to rehabilitate the preprocessor by showing how simple tool support can address these problems and emulate some benefits of modularized implementations. At the same time we emphasize unique benefits of preprocessors, like simplicity and language independence. Although we do not have a definitive answer on how to implement variability, we want to highlight opportunities to improve preprocessors and encourage research toward novel preprocessor-based approaches.

Christian Kästner, Sven Apel, and Martin Kuhlemann. A Model of Refactoring Physically and Virtually Separated Features. In Proceedings of the 8th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-60558-828-5, pages 157--166, New York, NY: ACM Press, October 2009. Acceptance rate: 31 % (19/62). [ .pdf, acm, doi, bib ]

Physical separation with class refinements and method refinements à la AHEAD and virtual separation using annotations à la #ifdef or CIDE are two competing groups of implementation approaches for software product lines with complementary advantages. Although both groups have been mainly discussed in isolation, we strive for an integration to leverage the respective advantages. In this paper, we provide the basis for such an integration by providing a model that supports both physical and virtual separation, and by describing refactorings in both directions. We prove the refactorings complete, such that every virtually separated product line can be automatically transformed into a physically separated one (replacing annotations by refinements) and vice versa. To demonstrate the feasibility of our approach, we have implemented the refactorings in our tool CIDE and conducted four case studies.

Martin Kuhlemann, Don Batory, and Christian Kästner. Safe Composition of Non-Monotonic Features. In Proceedings of the 8th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-60558-828-5, pages 177--185, New York, NY: ACM Press, October 2009. Acceptance rate: 31 % (19/62). [ acm, doi, bib ]

Programs can be composed from features. We want to verify automatically that all legal combinations of features can be composed safely without errors. Prior work on this problem assumed that features add code monotonically. We generalize prior work to enable features to both add and remove code, describe our analyses and implementation, and review case studies. We observe that more expressive features can increase the complexity of developed programs rapidly – up to the point where automated concepts as presented in this paper are not a helpful tool but a necessity for verification.

Sven Apel, and Christian Kästner. An Overview of Feature-Oriented Software Development. Journal of Object Technology (JOT), 8(5):49--84, July/August 2009. Refereed Column. [ .pdf, http, bib ]

Feature-oriented software development (FOSD) is a paradigm for the construction, customization, and synthesis of large-scale software systems. In this survey, we give an overview and a personal perspective on the roots of FOSD, connections to other software development paradigms, and recent developments in this field. Our aim is to point to connections between different lines of research and to identify open issues.

Sven Apel, Christian Kästner, Armin Größlinger, and Christian Lengauer. Type-Safe Feature-Oriented Product Lines. Technical Report MIP-0909, Passau, Germany: Department of Informatics and Mathematics, University of Passau, June 2009. [ .pdf, arXiv, bib ]

Christian Kästner, Sven Apel, and Martin Kuhlemann. LJ^AR: A Model of Refactoring Physically and Virtually Separated Features. Technical Report FIN-2009-08, Magdeburg, Germany: University of Magdeburg, May 2009. [ .pdf, bib ]

Christian Kästner, Sven Apel, Syed Saif ur Rahman, Marko Rosenmüller, Don Batory, and Gunter Saake. On the Impact of the Optional Feature Problem: Analysis and Case Studies. In Proceedings of the 13th International Software Product Line Conference (SPLC), ISBN 978-0-9786956-2-0, pages 181--190, Pittsburgh, PA: SEI, August 2009. Acceptance rate: 36 % (30/83). [ .pdf, bib ]

A software product-line is a family of related programs that are distinguished in terms of features. A feature implements a stakeholder's requirement. Different program variants specified by distinct feature selections are produced from a common code base. The optional feature problem describes a common mismatch between variability intended in the domain and dependencies in the implementation. When this occurs, some variants that are valid in the domain cannot be produced due to implementation issues. There are many different solutions to the optional feature problem, but they all suffer from drawbacks such as reduced variability, increased development effort, reduced efficiency, or reduced source code quality. In this paper, we examine the impact of the optional feature problem in two case studies in the domain of embedded database systems, and we survey different state-of-the-art solutions and their trade-offs. Our intention is to raise awareness of the problem, to guide developers in selecting an appropriate solution for their product-line project, and to identify opportunities for future research.

Christian Kästner, Sven Apel, and Gunter Saake. Sichere Produktlinien: Herausforderungen für Syntax- und Typ-Prüfungen. In Proceedings of the 26. Workshop der GI-Fachgruppe Programmiersprachen und Rechenkonzepte, pages 37--38, Kiel, Germany: University of Kiel, May 2009. [ http, bib ]

Friedrich Steimann, Thomas Pawlitzki, Sven Apel, and Christian Kästner. Types and Modularity for Implicit Invocation with Implicit Announcement. ACM Transactions on Software Engineering and Methodology (TOSEM), 20(1):Article 1; 43 pages, June 2010. [ .pdf, acm, doi, bib ]

Through implicit invocation, procedures are called without explicitly referencing them. Implicit announcement adds to this implicitness by not only keeping implicit which procedures are called, but also where or when – under implicit invocation with implicit announcement, the call site contains no signs of that, or what it calls. Recently, aspect-oriented programming has popularized implicit invocation with implicit announcement as a possibility to separate concerns that lead to interwoven code if conventional programming techniques are used. However, as has been noted elsewhere, as currently implemented it establishes strong implicit dependencies between components, hampering independent software development and evolution. To address this problem, we present a type-based modularization of implicit invocation with implicit announcement that is inspired by how interfaces and exceptions are realized in JAVA. By extending an existing compiler and by rewriting several programs to make use of our proposed language constructs, we found that the imposed declaration clutter tends to be moderate; in particular, we found that for general applications of implicit invocation with implicit announcement, fears that programs utilizing our form of modularization become unreasonably verbose are unjustified.

Sven Apel, Florian Janda, Salvador Trujillo, and Christian Kästner. Model Superimposition in Software Product Lines. In Proceedings of the 2nd International Conference on Model Transformation (ICMT), ISBN 978-3-642-02407-8, pages 4--19, Berlin/Heidelberg: Springer-Verlag, June 2009. Acceptance rate: 21 % (14/67). [ .pdf, doi, http, bib ]

In software product line engineering, feature composition generates software tailored to specific requirements from a common set of artifacts. Superimposition is a popular technique to merge code pieces belonging to different features. The advent of model-driven development raises the question of how to support the variability of software product lines in modeling techniques. We propose to use superimposition as a model composition technique in order to support variability. We analyze the feasibility of superimposition as a model composition technique, offer a corresponding tool for model composition, and discuss our experiences with three case studies (including one industrial study) using this tool.

Sven Apel, Christian Kästner, Armin Größlinger, and Christian Lengauer. Feature (De)composition in Functional Programming. In Proceedings of the 8th International Conference on Software Composition (SC), ISBN 978-3-642-02654-6, pages 9--26, Berlin/Heidelberg: Springer-Verlag, July 2009. Acceptance rate: 33 % (10/30). [ .pdf, doi, http, bib ]

The separation of concerns is a fundamental principle in software engineering. Crosscutting concerns are concerns that do not align with hierarchical and block decomposition supported by mainstream programming languages. In the past, crosscutting concerns have been studied mainly in the context of object orientation. Feature orientation is a novel programming paradigm that supports the (de)composition of crosscutting concerns in a system with a hierarchical block structure. By means of two case studies we explore the problem of crosscutting concerns in functional programming and propose two solutions based on feature orientation.

Stefan Boxleitner, Sven Apel, and Christian Kästner. Language-Independent Quantification and Weaving for Feature Composition. In Proceedings of the 8th International Conference on Software Composition (SC), ISBN 978-3-642-02654-6, pages 45--54, Berlin/Heidelberg: Springer-Verlag, July 2009. Acceptance rate: 33 % (10/30). Short Paper. [ .pdf, doi, http, bib ]

Based on a general model of feature composition, we present a composition language that enables programmers by means of quantification and weaving to formulate extensions to programs written in different languages. We explore the design space of composition languages that rely on quantification and weaving and discuss our choices. We outline a tool that extends an existing infrastructure for feature composition and discuss results of three initial case studies.

Christian Kästner, Sven Apel, Salvador Trujillo, Martin Kuhlemann, and Don Batory. Guaranteeing Syntactic Correctness for all Product Line Variants: A Language-Independent Approach. In Proceedings of the 47th International Conference Objects, Models, Components, Patterns (TOOLS EUROPE), volume 33 of Lecture Notes in Business Information Processing, pages 175--194, Berlin/Heidelberg: Springer-Verlag, June 2009. Acceptance rate: 28 % (19/67). [ .pdf, doi, http, bib ]

A software product line (SPL) is a family of related program variants in a well-defined domain, generated from a set of features. A fundamental difference from classical application development is that engineers develop not a single program but a whole family with hundreds to millions of variants. This makes it infeasible to separately check every distinct variant for errors. Still engineers want guarantees on the entire SPL. A further challenge is that an SPL may contain artifacts in different languages (code, documentation, models, etc.) that should be checked. In this paper, we present CIDE, an SPL development tool that guarantees syntactic correctness for all variants of an SPL. We show how CIDE's underlying mechanism abstracts from textual representation and we generalize it to arbitrary languages. Furthermore, we automate the generation of safe plug-ins for additional languages from annotated grammars. To demonstrate the language-independent capabilities, we applied CIDE to a series of case studies with artifacts written in Java, C++, C, Haskell, ANTLR, HTML, and XML.

Christian Kästner, Thomas Thüm, Gunter Saake, Janet Feigenspan, Thomas Leich, Fabian Wielgorz, and Sven Apel. FeatureIDE: Tool Framework for Feature-Oriented Software Development. In Proceedings of the 31st International Conference on Software Engineering (ICSE), ISBN 978-1-4244-3452-7, pages 611--614, Los Alamitos, CA: IEEE Computer Society, May 2009. Acceptance rate: 33 % (24/72). Formal Demonstration paper. [ .pdf, bib ]

Tool support is crucial for the acceptance of a new programming language. However, providing such tool support is a huge investment that can usually not be provided for a research language. With FeatureIDE, we have built an IDE for AHEAD that integrates all phases of feature-oriented software development. To reuse this investment for other tools and languages, we refactored FeatureIDE into an open source framework that encapsulates the common ideas of feature-oriented software development and that can be reused and extended beyond AHEAD. Among others, we implemented extensions for FeatureC++ and FeatureHouse, but in general, FeatureIDE is open for everybody to showcase new research results and make them usable to a wide audience of students, researchers, and practitioners.

Marko Rosenmüller, Christian Kästner, Norbert Siegmund, Sagar Sunkle, Sven Apel, Thomas Leich, and Gunter Saake. SQL à la Carte -- Toward Tailor-made Data Management. In Proceedings of the 13. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW), ISBN 978-3-88579-238-3, pages 117--136, Bonn, Germany: Gesellschaft für Informatik (GI), March 2009. [ .pdf, http, bib ]

The size of the structured query language (SQL) continuously increases. Extensions of SQL for special domains like stream processing or sensor networks come with their own extensions, more or less unrelated to the standard. In general, underlying DBMS support only a subset of SQL plus vendor-specific extensions. In this paper, we analyze application domains where special SQL dialects are needed or are already in use. We show how SQL can be decomposed to create an extensible family of SQL dialects. Concrete dialects, e.g., a dialect for web databases, can be generated from such a family by choosing SQL features à la carte. A family of SQL dialects simplifies analysis of the standard when deriving a concrete dialect, makes it easy to understand parts of the standard, and eases extension for new application domains. It is also the starting point for developing tailor-made data management solutions that support only a subset of SQL. We outline how such customizable DBMS can be developed and what benefits, e.g., improved maintainability and performance, we can expect from this.

Norbert Siegmund, Christian Kästner, Marko Rosenmüller, Florian Heidenreich, Sven Apel, and Gunter Saake. Bridging the Gap between Variability in Client Application and Database Schema. In Proceedings of the 13. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW), ISBN 978-3-88579-238-3, pages 297--306, Bonn, Germany: Gesellschaft für Informatik (GI), March 2009. [ .pdf, http, bib ]

Database schemas are used to describe the logical design of a database. Diverse groups of users have different perspectives on the schema which leads to different local schemas. Research has focused on view integration to generate a global, consistent schema out of different local schemas or views. However, this approach seems to be too constrained when the generated global view should be variable and only a certain subset is needed. Variable schemas are needed in software product lines in which products are tailored to the needs of stakeholders. We claim that traditional modeling techniques are not sufficient for expressing a variable database schema. We show that software product line methodologies, when applied to the database schemas, overcome existing limitations and allow the generation of tailor-made database schemas.

Sven Apel, Christian Kästner, and Christian Lengauer. Vergleich und Integration von Komposition und Annotation zur Implementierung von Produktlinien. In Proceedings of the Software Engineering 2009 -- Fachtagung des GI-Fachbereichs Softwaretechnik (SE), volume P-143 of Lecture Notes in Informatics, pages 101--112, Bonn, Germany: Gesellschaft für Informatik (GI), March 2009. [ .pdf, http, bib ]

There is a multitude of very different techniques, languages, and tools for developing software product lines. Nevertheless, they share common underlying mechanisms that allow a classification into compositional and annotative approaches. While the compositional approach receives much attention in research, the annotative approach is predominantly used in industrial settings. We analyze and compare both approaches using three representative examples and identify individual strengths and weaknesses based on six criteria. We find that the respective strengths and weaknesses are complementary. For this reason, we propose integrating the compositional and annotative approaches in order to combine the advantages of both, to offer developers a broader spectrum of implementation mechanisms, and to ease the introduction of product-line technology into existing software projects.

Sven Apel, Christian Kästner, Armin Größlinger, and Christian Lengauer. On Feature Orientation and Functional Programming. Technical Report MIP-0806, Passau, Germany: Department of Informatics and Mathematics, University of Passau, November 2008. [ .pdf, bib ]

Thomas Thüm, Don Batory, and Christian Kästner. Reasoning about Edits to Feature Models. In Proceedings of the 31st International Conference on Software Engineering (ICSE), ISBN 978-1-4244-3452-7, pages 254--264, Los Alamitos, CA: IEEE Computer Society, May 2009. Acceptance rate: 12 % (50/405). [ .pdf, bib ]

Features express the variabilities and commonalities among programs in a software product line (SPL). A feature model defines the valid combinations of features, where each combination corresponds to a program in an SPL. SPLs and their feature models evolve over time. We classify the evolution of a feature model via modifications as refactorings, specializations, generalizations, or arbitrary edits. We present an algorithm to reason about feature model edits to help designers determine how the program membership of an SPL has changed. Our algorithm takes two feature models as input (before and after edit versions), where the sets of features in the two models are not necessarily the same, and it automatically computes the change classification. Our algorithm is able to give examples of added or deleted products and efficiently classifies edits to even large models that have thousands of features.
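
The four edit classes named in this abstract can be illustrated by comparing the sets of valid products before and after an edit. The Python sketch below is a brute-force illustration, not the algorithm from the paper: it enumerates every feature selection and so only works for very small models. The helper names `products` and `classify_edit` and the features Base, Cache, and Log are hypothetical, and a feature model is assumed to be given as a set of features plus a validity predicate.

```python
from itertools import combinations


def products(features, is_valid):
    """Enumerate every valid feature selection of a (small) feature model."""
    features = sorted(features)
    result = set()
    for size in range(len(features) + 1):
        for selection in combinations(features, size):
            if is_valid(set(selection)):
                result.add(frozenset(selection))
    return result


def classify_edit(before, after):
    """Classify an edit by comparing the product sets before and after it."""
    if before == after:
        return "refactoring"      # no products added or deleted
    if after < before:
        return "specialization"   # products only deleted
    if before < after:
        return "generalization"   # products only added
    return "arbitrary edit"       # products both added and deleted


# Hypothetical model: Base is mandatory, Cache is optional.
old = products({"Base", "Cache"}, lambda s: "Base" in s)
# Edit 1: Cache becomes mandatory, so the product without Cache disappears.
mandatory_cache = products({"Base", "Cache"}, lambda s: {"Base", "Cache"} <= s)
# Edit 2: an optional feature Log is added, so new products appear.
with_log = products({"Base", "Cache", "Log"}, lambda s: "Base" in s)
# Edit 3: Cache is replaced by Log, so products are both added and deleted.
swapped = products({"Base", "Log"}, lambda s: "Base" in s)
```

The set differences `after - before` and `before - after` give concrete examples of added and deleted products, which corresponds to the kind of feedback the abstract mentions.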

Sven Apel, Christian Kästner, and Christian Lengauer. FeatureHouse: Language-Independent, Automated Software Composition. In Proceedings of the 31st International Conference on Software Engineering (ICSE), ISBN 978-1-4244-3452-7, pages 221--231, Los Alamitos, CA: IEEE Computer Society, May 2009. Acceptance rate: 12 % (50/405). [ .pdf, bib ]

Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general-purpose concept, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages. Furthermore, we offer a supporting framework and tool chain, called FEATUREHOUSE. We use attribute grammars to automate the integration of additional languages; in particular, we have integrated Java, C#, C, Haskell, JavaCC, and XML. Several case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties a language must have in order to be ready for superimposition.
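
To make the idea of superimposing tree structures concrete, here is a minimal Python sketch that merges two trees by matching node names. It is not the FEATUREHOUSE implementation: trees are assumed to be nested dictionaries, terminal nodes are plain strings, the function name `superimpose` and the example features are hypothetical, and the rule that a later feature's terminal simply replaces the earlier one is a simplifying assumption.

```python
def superimpose(base, refinement):
    """Merge two trees: inner nodes with the same name are merged recursively."""
    merged = dict(base)  # shallow copy, so the inputs are left unchanged
    for name, child in refinement.items():
        if (name in merged
                and isinstance(merged[name], dict)
                and isinstance(child, dict)):
            merged[name] = superimpose(merged[name], child)
        else:
            # New node or terminal node: the refinement's version wins (assumption).
            merged[name] = child
    return merged


# Hypothetical features: package -> class -> member -> body.
base = {"util": {"Stack": {"push": "body of push", "pop": "body of pop"}}}
undo = {"util": {"Stack": {"undo": "body of undo", "push": "push with backup"}}}

composed = superimpose(base, undo)
```

Because the merge only looks at node names and nesting, the same function applies to any artifact that can be represented as such a tree, which is the language-independence argument made in the abstract.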

Christian Kästner, and Sven Apel. Integrating Compositional and Annotative Approaches for Product Line Engineering. In Proceedings of the GPCE Workshop on Modularization, Composition and Generative Techniques for Product Line Engineering (McGPLE), pages 35--40, Passau, Germany: Department of Informatics and Mathematics, University of Passau, October 2008. [ .pdf, bib ]

Software product lines can be implemented with many different approaches. However, there are common underlying mechanisms which allow a classification into compositional and annotative approaches. While research focuses mainly on composition approaches like aspect- or feature-oriented programming because those support feature traceability and modularity, in practice annotative approaches like preprocessors are common as they are easier to adopt. In this paper, we compare both groups of approaches and find complementary strengths. We propose an integration of compositional and annotative approaches to combine advantages, increase flexibility for the developer, and ease adoption.

Marko Rosenmüller, Norbert Siegmund, Syed Saif ur Rahman, and Christian Kästner. Modeling Dependent Software Product Lines. In Proceedings of the GPCE Workshop on Modularization, Composition and Generative Techniques for Product Line Engineering (McGPLE), pages 13--18, Passau, Germany: Department of Informatics and Mathematics, University of Passau, October 2008. [ .pdf, bib ]

Software product line development is a mature technique to implement similar programs tailored to serve the needs of multiple users while providing a high degree of reuse. This approach also scales for larger product lines that use smaller product lines to fulfill special tasks. In such compositions of SPLs, the interacting product lines depend on each other and programs generated from these product lines have to be correctly configured to ensure correct communication between them. Constraints between product lines can be used to allow only valid combinations of generated programs. This, however, is not sufficient if multiple instances of one product line are involved. In this paper we present an approach that uses UML and OO concepts to model compositions of SPLs. The model extends the approach of constraints between SPLs to constraints between instances of SPLs and integrates SPL specialization. Based on this model we apply a feature-oriented approach to simplify the configuration of complete SPL compositions.

Norbert Siegmund, Marko Rosenmüller, Martin Kuhlemann, Christian Kästner, and Gunter Saake. Measuring Non-functional Properties in Software Product Lines for Product Derivation. In Proceedings of the 15th Asia-Pacific Software Engineering Conference (APSEC), ISBN 978-0-7695-3446-6, pages 187--194, Los Alamitos, CA: IEEE Computer Society, December 2008. Acceptance rate: 30 % (66/221). [ .pdf, bib ]

Software product lines (SPLs) enable stakeholders to derive different software products for a domain while providing a high degree of reuse of their code units. Software products are derived in a configuration process by combining different code units. This configuration process becomes complex if SPLs contain hundreds of features. In many cases, a stakeholder is not only interested in functional but also in resulting non-functional properties of a desired product. Because SPLs can be used in different application scenarios, alternative implementations of already existing functionality are developed to meet special non-functional requirements, like restricted binary size and performance guarantees. To enable these complex configurations, we discuss and present techniques to measure non-functional properties of software modules and use these values to compute SPL configurations optimized to the user's needs.

Mario Pukall, Christian Kästner, and Gunter Saake. Towards Unanticipated Runtime Adaptation of Java Applications. In Proceedings of the 15th Asia-Pacific Software Engineering Conference (APSEC), ISBN 978-0-7695-3446-6, pages 85--92, Los Alamitos, CA: IEEE Computer Society, December 2008. Acceptance rate: 30 % (66/221). [ .pdf, bib ]

Modifying an application usually means to stop the application, apply the changes, and start the application again. That means, the application is not available for at least a short time period. This is not acceptable for highly available applications. One reasonable approach which faces the problem of unavailability is to change highly available applications at runtime. To allow extensive runtime adaptation the application must be enabled for unanticipated changes even of already executed program parts. This is due to the fact that it is not predictable what changes become necessary and when they have to be applied. Since Java is commonly used for developing highly available applications, we discuss its shortcomings and opportunities regarding unanticipated runtime adaptation. We present an approach based on Java HotSwap and object wrapping which overcomes the identified shortcomings and evaluate it in a case study.

Christian Kästner, Salvador Trujillo, and Sven Apel. Visualizing Software Product Line Variabilities in Source Code. In Proceedings of the 2nd International SPLC Workshop on Visualisation in Software Product Line Engineering (ViSPLE), ISBN 978-1-905952-06-9, pages 303--313, September 2008. [ .pdf, bib ]

Implementing software product lines is a challenging task. Depending on the implementation technique the code that realizes a feature is often scattered across multiple code units. This way it becomes difficult to trace features in source code which hinders maintenance and evolution. While previous effort on visualization technologies in software product lines has focused mainly on the feature model, we suggest tool support for feature traceability in the code base. With our tool CIDE, we propose an approach based on filters and views on source code in order to visualize and trace features in source code.

Sven Apel, Christian Kästner, and Christian Lengauer. Feature Featherweight Java: A Calculus for Feature-Oriented Programming and Stepwise Refinement. In Proceedings of the 7th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-60558-267-2, pages 101--112, New York, NY: ACM Press, August 2008. Acceptance rate: 29 % (16/55). [ .pdf, acm, doi, bib ]

Feature-oriented programming (FOP) is a paradigm that incorporates programming language technology, program generation techniques, and stepwise refinement. In their GPCE'07 paper, Thaker et al. suggest the development of a type system for FOP to guarantee safe feature composition, i.e., to guarantee the absence of type errors during feature composition. We present such a type system along with a calculus for a simple feature-oriented, Java-like language, called Feature Featherweight Java (FFJ). Furthermore, we explore four extensions of FFJ and how they affect type soundness.

Sven Apel, Christian Kästner, and Don Batory. Program Refactoring using Functional Aspects. In Proceedings of the 7th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-60558-267-2, pages 161--170, New York, NY: ACM Press, August 2008. Acceptance rate: 29 % (16/55). [ .pdf, acm, doi, bib ]

A functional aspect is an aspect that has the semantics of a transformation; it is a function that maps a program to an advised program. Functional aspects are composed by function composition. In this paper, we explore functional aspects in the context of aspect-oriented refactoring. We show that refactoring legacy applications using functional aspects is just as flexible as traditional aspects in that (a) the order in which aspects are refactored does not matter, and (b) the number of potential aspect interactions is decreased. We analyze several aspect-oriented programs of different sizes to support our claims.

Chang Hwan Peter Kim, Christian Kästner, and Don Batory. On the Modularity of Feature Interactions. In Proceedings of the 7th ACM International Conference on Generative Programming and Component Engineering (GPCE), ISBN 978-1-60558-267-2, pages 23--34, New York, NY: ACM Press, August 2008. Acceptance rate: 29 % (16/55). [ .pdf, acm, doi, bib ]

Feature modules are the building blocks of programs in software product lines (SPLs). A foundational assumption of feature-based program synthesis is that features are composed in a predefined order. Recent work on virtual separation of concerns reveals a new model of feature interactions that shows that feature modules can be quantized as compositions of smaller modules called derivatives. We present this model and examine some of its unintuitive consequences, namely, that (1) a given program can be reconstructed by composing features in any order, and (2) the contents of a feature module (as expressed as a composition of derivatives) is determined automatically by a feature order. We show that different orders allow one to “adjust” the contents of a feature module to isolate and study the impact of interactions that a feature has with other features. Using derivatives, we show the utility of generalizing safe composition (SC), a basic analysis of SPLs that verifies program type-safety, to prove that every legal composition of derivatives (and thus any composition order of features) produces a type-safe program, which is a much stronger SC property.

Christian Kästner, and Sven Apel. Type-checking Software Product Lines -- A Formal Approach. In Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE), ISBN 978-1-4244-2187-9, pages 258--267, Los Alamitos, CA: IEEE Computer Society, September 2008. Acceptance rate: 11 % (30/280). [ .pdf, doi, bib ]

A software product line (SPL) is an efficient means to generate a family of program variants for a domain from a single code base. However, because of the potentially high number of possible program variants, it is difficult to test all variants and ensure properties like type-safety for the entire SPL. While first steps to type-check an entire SPL have been taken, they are informal and incomplete. In this paper, we extend the Featherweight Java (FJ) calculus with feature annotations to be used for SPLs. By extending FJ's type system, we guarantee that – given a well-typed SPL – all possible program variants are well-typed as well. We show how results from this formalization reflect and help implement our own language-independent SPL tool CIDE.

Sven Apel, Christian Lengauer, Bernhard Möller, and Christian Kästner. An Algebra for Features and Feature Composition. In Proceedings of the 12th International Conference on Algebraic Methodology and Software Technology (AMAST), volume 5140 of Lecture Notes in Computer Science, pages 36--50, Berlin/Heidelberg: Springer-Verlag, July 2008. Acceptance rate: 47 % (27/58). [ .pdf, doi, bib ]

Feature-Oriented Software Development (FOSD) provides a multitude of formalisms, methods, languages, and tools for building variable, customizable, and extensible software. Along different lines of research, different notions of a feature have been developed. Although these notions have similar goals, no common basis for evaluation, comparison, and integration exists. We present a feature algebra that captures the key ideas of feature orientation and provides a common ground for current and future research in this field, in which also alternative options can be explored.

Sven Apel, Christian Kästner, and Christian Lengauer. An Overview of Feature Featherweight Java. Technical Report MIP-0802, Passau, Germany: Department of Informatics and Mathematics, University of Passau, April 2008. [ .pdf, bib ]

Christian Kästner, Sven Apel, Salvador Trujillo, Martin Kuhlemann, and Don Batory. Language-Independent Safe Decomposition of Legacy Applications into Features. Technical Report FIN-2008-02, Magdeburg, Germany: University of Magdeburg, March 2008. [ .pdf, bib ]

Sven Apel, Christian Kästner, and Christian Lengauer. Research Challenges in the Tension Between Features and Services. In Proceedings of the ICSE Workshop on Systems Development in SOA Environments (SDSOA), ISBN 978-1-60558-029-6, pages 53--58, New York, NY: ACM Press, May 2008. [ .pdf, doi, bib ]

We present a feature-based approach, known from software product lines, to the development of service-oriented architectures. We discuss five benefits of such an approach: improvements in modularity, variability, uniformity, specifiability, and typeability. Subsequently, we review preliminary experiences and results, and propose an agenda for further research in this direction.

Christian Kästner, Sven Apel, and Martin Kuhlemann. Granularity in Software Product Lines. In Proceedings of the 30th International Conference on Software Engineering (ICSE), ISBN 978-1-60558-079-1, pages 311--320, New York, NY: ACM Press, May 2008. Acceptance rate: 15 % (56/371). Most Influential Paper Award at SPLC'19. [ .pdf, acm, doi, epub, bib ]

Building software product lines (SPLs) with features is a challenging task. Many SPL implementations support features with coarse granularity - e.g., the ability to add and wrap entire methods. However, fine-grained extensions, like adding a statement in the middle of a method, either require intricate workarounds or obfuscate the base code with annotations. Though many SPLs can be and have been implemented with the coarse granularity of existing approaches, fine-grained extensions are essential when extracting features from legacy applications. Furthermore, some existing SPLs could also benefit from fine-grained extensions to reduce code replication or improve readability. In this paper, we analyze the effects of feature granularity in SPLs and present a tool, called Colored IDE (CIDE), that allows features to implement coarse-grained and fine-grained extensions in a concise way. In two case studies, we show how CIDE simplifies SPL development compared to traditional approaches.

Norbert Siegmund, Martin Kuhlemann, Marko Rosenmüller, Christian Kästner, and Gunter Saake. Integrated Product Line Model for Semi-Automated Product Derivation Using Non-Functional Properties. In Proceedings of the 2nd Int'l Workshop on Variability Modelling of Software-Intensive Systems (VaMoS), pages 25--23, Essen, Germany: University of Duisburg-Essen, January 2008. [ .pdf, http, bib ]

Software product lines (SPLs) allow generating tailor-made software by manually configuring reusable core assets. However, SPLs with hundreds of features and millions of possible products require appropriate support for semi-automated product derivation. This derivation has to be based on non-functional properties that are related to core assets and domain features. Both elements are part of different models connected via complex mappings. We propose a model that integrates features and core assets in order to allow semi-automated product derivation.

Martin Kuhlemann and Christian Kästner. Reducing the Complexity of AspectJ Mechanisms for Recurring Extensions. In Proceedings of the GPCE Workshop on Aspect-Oriented Product Line Engineering (AOPLE), pages 14--19, 2007. [ .pdf, bib ]

Aspect-Oriented Programming (AOP) aims at modularizing crosscutting concerns. AspectJ is a popular AOP language extension for Java that includes numerous sophisticated mechanisms for implementing crosscutting concerns modularly in one aspect. The language allows expressing complex extensions, but at the same time the complexity of some of those mechanisms hampers the writing of simple and recurring extensions, as they are often needed especially in software product lines. In this paper we propose an AspectJ extension that introduces a simplified syntax for simple and recurring extensions. We show that our syntax proposal improves evolvability and modularity in AspectJ programs by avoiding those mechanisms that may harm evolution and modularity if misused. We show that the syntax is applicable for up to 74% of all pointcut and advice mechanisms by analysing three AspectJ case studies.

Sven Apel, Christian Kästner, Martin Kuhlemann, and Thomas Leich. Pointcuts, Advice, Refinements, and Collaborations: Similarities, Differences, and Synergies. Innovations in Systems and Software Engineering -- A NASA Journal (ISSE), 3(3-4):281--289, December 2007. [ .pdf, http, bib ]

Aspect-oriented programming (AOP) is a novel programming paradigm that aims at modularizing complex software. It embraces several mechanisms including (1) pointcuts and advice as well as (2) refinements and collaborations. Though all these mechanisms deal with crosscutting concerns, i.e., a special class of design and implementation problems that challenge traditional programming paradigms, they do so in different ways. In this article we explore their relationship and their impact on software modularity. This helps researchers and practitioners understand their differences and guides them in using the right mechanism for the right problem.

Salvador Trujillo, Christian Kästner, and Sven Apel. Product Lines that supply other Product Lines: A Service-Oriented Approach. In Proceedings of the SPLC Workshop on Service-Oriented Architectures and Product Lines (SOAPL), pages 69--76, Pittsburgh, PA: SEI, September 2007. [ bib ]

Software product line is a paradigm to develop a family of software products with the goal of reuse. In this paper, we focus on a scenario in which different products from different product lines are combined together in a third product line to yield more elaborate products, i.e., a product line consumes products from third product line suppliers. The issue is not how different products can be produced separately, but how they can be combined together. We propose a service-oriented architecture where product lines are regarded as services, yielding a service-oriented product line. This paper illustrates the approach with an example for a service-oriented architecture of a web portal product line supplied by portlet product lines.

Sven Apel, Christian Lengauer, Don Batory, Bernhard Möller, and Christian Kästner. An Algebra for Feature-Oriented Software Development. Technical Report MIP-0706, Passau, Germany: Department of Informatics and Mathematics, University of Passau, July 2007. [ .pdf, bib ]

Christian Kästner, Martin Kuhlemann, and Don Batory. Automating Feature-Oriented Refactoring of Legacy Applications. In Proceedings of the ECOOP Workshop on Refactoring Tools (WRT), pages 62--63, Berlin, Germany: TU Berlin, July 2007. [ .pdf, bib ]

Creating a software product line from a legacy application is a difficult task. We propose a tool that helps automate tedious tasks of refactoring legacy applications into features and frees the developer from the burden of performing laborious routine implementations.

Christian Kästner. CIDE: Decomposing Legacy Applications into Features. In Proceedings of the 11th International Software Product Line Conference, second volume (Demonstration) (SPLC), ISBN 978-4-7649-0342-5, pages 149--150, 2007. [ .pdf, bib ]

Taking an extractive approach to decompose a legacy application into features is difficult and laborious with current approaches and tools. We present a prototype of a tool-driven approach that largely hides the complexity of the task.

Christian Kästner, Sven Apel, and Don Batory. A Case Study Implementing Features Using AspectJ. In Proceedings of the 11th International Software Product Line Conference (SPLC), pages 223--232, Los Alamitos, CA: IEEE Computer Society, September 2007. Acceptance rate: 35 % (28/80). [ .pdf, bib ]

Software product lines aim to create highly configurable programs from a set of features. Common belief and recent studies suggest that aspects are well-suited for implementing features. We evaluate the suitability of AspectJ with respect to this task by a case study that refactors the embedded database system Berkeley DB into 38 features. Contrary to our initial expectations, the results were not encouraging. As the number of aspects in a feature grows, there is a noticeable decrease in code readability and maintainability. Most of the unique and powerful features of AspectJ were not needed. We document where AspectJ is unsuitable for implementing features of refactored legacy applications and explain why.

Sven Apel, Christian Kästner, Thomas Leich, and Gunter Saake. Aspect Refinement - Unifying AOP and Stepwise Refinement. Journal of Object Technology (JOT), Special Issue on TOOLS EUROPE 2007, 6(9):13--33, October 2007. [ .pdf, http, bib ]

Stepwise refinement (SWR) is fundamental to software engineering. As aspect-oriented programming (AOP) is gaining momentum in software development, aspects should be considered in the light of SWR. In this paper, we elaborate the notion of aspect refinement that unifies AOP and SWR at the architectural level. To reflect this unification to the programming language level, we present an implementation technique for refining aspects based on mixin composition along with a set of language mechanisms for refining all kinds of structural elements of aspects in a uniform way (methods, pointcuts, advice). To underpin our proposal, we contribute a fully functional compiler on top of AspectJ, present a non-trivial, medium-sized case study, and derive a set of programming guidelines.

Sven Apel, Christian Kästner, and Salvador Trujillo. On the Necessity of Empirical Studies in the Assessment of Modularization Mechanisms for Crosscutting Concerns. In Proceedings of the ICSE Workshop on Assessment of Contemporary Modularization Techniques (ACoM), Los Alamitos, CA: IEEE Computer Society, May 2007. [ .pdf, bib ]

Collaborations are a frequently occurring class of crosscutting concerns. Prior work has argued that collaborations are better implemented using Collaboration Languages (CLs) rather than AspectJ-like Languages (ALs). The main argument is that aspects flatten the object-oriented structure of a collaboration, and introduce more complexity rather than benefits – in other words, CLs and ALs differ with regard to program comprehension. To explore the effects of CL and AL modularization mechanisms on program comprehension, we propose to conduct a series of experiments. We present ideas on how to arrange such experiments that should serve as a starting point and foster a discussion with other researchers.

Christian Kästner. Aspect-Oriented Refactoring of Berkeley DB. Diplomarbeit, Magdeburg, Germany: University of Magdeburg, March 2007. [ .pdf, bib ]

Sven Apel, Christian Kästner, Martin Kuhlemann, and Thomas Leich. Modularität von Softwarebausteinen: Aspekte versus Merkmale. iX Magazin für Professionelle Informationstechnik (iX), (10):116--122, October 2006. [ http, bib ]

Aspect-oriented programming has been attracting attention for several years now. More recently, feature-oriented programming has also been drawing attention. Both pursue similar goals in improving the modularity of software building blocks, but realize them in different ways, each with its own advantages and disadvantages.

Sven Apel, Christian Kästner, Thomas Leich, and Gunter Saake. Aspect Refinement. Technical Report FIN-2006-10, Magdeburg, Germany: University of Magdeburg, August 2006. [ .pdf, bib ]

Christian Kästner, Sven Apel, and Gunter Saake. Implementing Bounded Aspect Quantification in AspectJ. In Proceedings of the 4th Workshop on Reflection, AOP and Meta-Data for Software Evolution (RAM-SE), pages 111--122, Magdeburg, Germany: University of Magdeburg, July 2006. [ .pdf, bib ]

The integration of aspects into the methodology of stepwise software development and evolution is still an open issue. This paper focuses on the global quantification mechanism of today's aspect-oriented languages, which contradicts basic principles of this methodology. One potential solution to this problem is to bound the potentially global effects of aspects to a set of local development steps. We discuss several alternatives to implement such bounded aspect quantification in AspectJ. Afterwards, we describe a concrete approach that relies on meta-data and pointcut restructuring in order to control the quantification of aspects. Finally, we discuss open issues and further work.

Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Supervised Theses

Jens Meinicke. VarexJ: A Variability-Aware Interpreter for Java Applications. Master's Thesis, University of Magdeburg, Germany, December 2014. [ bib | .pdf]

Jonas Pusch. Variability-Aware Interpretation. Bachelor's Thesis, University of Marburg, Germany, November 2012. [ bib | .pdf]

Markus Kreutzer. Statische Analyse von Produktlinien. Bachelor's Thesis, University of Marburg, Germany, April 2012. [ bib | .pdf]

Steffen Haase. A Program Slicing Approach to Feature Identification in Legacy C Code. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, February 2012. [ bib | .pdf]

Constanze Adler. Optional Composition – A Solution to the Optional Feature Problem? Master's Thesis, University of Magdeburg, Germany, December 2010. [ bib | .pdf]

Matthias Ritter. Softwareschutz auf Quellcode-Ebene durch Techniken der Softwareproduktlinienentwicklung. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, September 2010. [ bib | .pdf]

Andy Kenner. Statische Referenzanalyse in C-Präprozessor-konfigurierten Anwendungen. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, August 2010. Results published as a workshop paper at FOSD 2010 [ bib | .pdf]

Alexander Dreiling. Feature Mining: Semiautomatische Transition von (Alt-)Systemen zu Software-Produktlinien. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, July 2010. A journal paper about the results is currently under review [ bib | .pdf]

Christian Becker. Entwicklung eines nativen Compilers für Feature-orientierte Programmierung. Master's Thesis, University of Magdeburg, Germany, June 2010. [ bib | .pdf]

Thomas Thüm. A Machine-Checked Proof for a Product-Line-Aware Type System. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, January 2010. Best-thesis award of the Denert Foundation for Software Engineering. Results published as part of a journal paper in ACM Transactions on Software Engineering and Methodology (TOSEM), 2011 [ bib | .pdf]

Andreas Schulze. Systematische Analyse von Feature-Interaktionen in Softwareproduktlinien. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, November 2009. [ bib | .pdf]

Dirk Aporius. Verringerung des redundanten Softwareentwicklungsaufwandes für Portable Systeme. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, October 2009. [ bib | .pdf]

Janet Feigenspan. Empirical Comparison of FOSD Approaches Regarding Program Comprehension – A Feasibility Study. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, August 2009. Best-thesis award by Metop Research Center and Research Award by IHK Magdeburg. The results were published as part of a journal paper in Empirical Software Engineering, 2012. [ bib | .pdf]

Malte Rosenthal. Alternative Features in Colored Featherweight Java. Master's Thesis (Diplomarbeit), University of Passau, Germany, July 2009. [ bib | .pdf]

Chau Le Minh. Evaluation feature-basierter service-orientierter Architekturen am Beispiel eines Domotic-Szenarios. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, June 2009. [ bib | .pdf]

Stefan Kegel. Streamed verification of a data stream management benchmark. Bachelor's Thesis (Studienarbeit), University of Magdeburg, Germany, April 2009. [ bib | .pdf]

Janet Feigenspan. Requirements and design for a language-independent IDE framework to support feature-oriented programming. Bachelor's Thesis (Studienarbeit), University of Magdeburg, Germany, February 2009. [ bib | .pdf]

Christian Hübner. Unterstützung der Requirementsanalyse von Navigationssoftware auf Grundlage feature-basierter Domänen-Modelle. Master's Thesis (Diplomarbeit), University of Magdeburg, Germany, December 2008. [ bib | .pdf]

Axel Hoffmann. Nachvollziehbare Bewirtschaftung gewachsener Datenbestände großer Unternehmen für das Controlling. Bachelor's Thesis (Studienarbeit), University of Magdeburg, Germany, August 2008. [ bib | .pdf]

Thomas Thüm. Reasoning about Feature Model Edits. Bachelor's Thesis (Studienarbeit), University of Magdeburg, Germany, June 2008. Results published as conference paper at the International Conference on Software Engineering (ICSE), 2009. [ bib | .pdf]

Jens Meinicke. Variational Debugging: Understanding Differences among Executions. PhD Dissertation, University of Magdeburg, Germany, January 2019. [ bib | .pdf]

Shurui Zhou. Improving Collaboration Efficiency in Fork-based Development. PhD Dissertation, Carnegie Mellon University, USA, May 2020. [ bib | .pdf]