I am a 4th year PhD at Institute for Software Research, School of Computer Science at Carnegie Mellon University.
My advisor is Professor Christian Kästner.
I am interested in understanding fork-based software development, helping developers better collaborate in distributed settings with forks. I discover and evaluate existing interventions and develop new ones that steer collaborative development with forks toward better practices, such as better coordination among otherwise independent developers.
Our paper Poster: Forks Insight: Providing an Overview of GitHub Forks. got accepted for ICSE 2018 Poster Track!
Our paper Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem got accepted for ICSE 2018 !
Fork-based development has been widely used both in open source community and industry, because it gives developers flexibility to modify their own fork without affecting others. Unfortunately, this mechanism has downsides; when the number of forks becomes large, it is difficult for developers to get or maintain an overview of activities in the forks. Current tools provide little help. We introduced INFOX, an approach to automatically identifies not-merged features in forks and generates an overview of active forks in a project. The approach clusters cohesive code fragments using code and network analysis techniques and uses information-retrieval techniques to label clusters with keywords. The clustering is effective, with 90% accuracy on a set of known features. In addition, a human-subject evaluation shows that INFOX can provide actionable insight for developers of forks.
In fast-paced, reuse-heavy software development, the transparency provided by social coding platforms like GitHub is essential to decision making. Developers infer the quality of projects using visible cues, known as signals, collected from personal profile and repository pages. We report on a large-scale, mixed-methods empirical study of npm packages that explores the emerging phenomenon of repository badges, with which maintainers signal underlying qualities about the project to contributors and users. We investigate which qualities maintainers intend to signal and how well badges correlate with those qualities. After surveying developers, mining 294,941 repositories, and applying statistical modeling and time series analysis techniques, we find that non-trivial badges, which display the build status, test coverage, and up-to-dateness of dependencies, are mostly reliable signals, correlating with more tests, better pull requests, and fresher dependencies. Displaying such badges correlates with best practices, but the effects do not always persist.
Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. Extracting such configuration knowledge is important for many tools analyzing highly-configurable systems, but very challenging due to the complex nature of build systems. We design an approach, based on SYMake, that symbolically evaluates Makefiles and extracts configuration knowledge in terms of file presence conditions and conditional parameters. We implement an initial prototype and demonstrate feasibility on small examples.
Elastic resource management is one of the key characteristics of cloud computing systems. Existing elastic approaches focus mainly on single resource consumption such as CPU consumption, rarely considering comprehensively various features of applications. Applications deployed on a PaaS are usually heterogeneous. While sharing the same resource, these applications are usually quite different in resource consuming. How to deploy these heterogeneous applications on the smallest size of hardware thus becomes a new research topic. In this paper, we take into consideration application's CPU consumption, I/O consumption, consumption of other server resources and application's request rate, all of which are defined as application features. This paper proposes a practical and effective elasticity approach based on the analysis of application features. The evaluation experiment shows that, compared with traditional approach, our approach can save up to 32.8% VMs without significant increase of average response time and SLA violation.