Mining Software Engineering Data Bibliography
What kinds of software engineering data can be mined?
Static code bases
- Data Mining Library Reuse Patterns in User-Selected
Applications. Amir Michail. ASE 1999. [PDF]
- Data Mining Library Reuse Patterns using Generalized
Association Rules. Amir Michail. ICSE 2000. [PDF]
- Bugs as Deviant Behavior: A General Approach to Inferring
Errors in Systems Code. Dawson Engler, David Yu Chen, Seth Hallem, Andy
Chou, and Benjamin Chelf. SOSP 2001. [PDF]
- A New Approach to Data Mining for Software Design. Walid
Taha, Scott Crosby, and Kedar Swadi. CSITeA 2004. [PDF]
- CP-Miner: A Tool for Finding Copy-paste and Related Bugs
in Operating System Code. Zhenmin Li, Shan Lu, Suvda Myagmar, and
Yuanyuan Zhou. OSDI 2004. [PDF]
- MUDABlue: An
Automatic Categorization System for Open Source Repositories. Shinji
Kawaguchi, Pankaj K. Garg, Makoto Matsushita, and Katsuro Inoue. APSEC
2004. [PDF]
- PR-Miner: Automatically Extracting Implicit Programming
Rules and Detecting Violations in Large Software Code. Zhenmin Li and
Yuanyuan Zhou. ESEC/FSE 2005. [PDF]
- Mining Jungloids: Helping to Navigate the API Jungle. David Mandelin, Lin
Xu, Rastislav Bodik, and Doug Kimelman. PLDI 2005. [PDF]
[Prospector
tool web interface]
- Mining Temporal Specifications for Error Detection.
Westley Weimer and George Necula. TACAS 2005. [PDF]
- Synthesis of Interface Specifications for Java Classes.
Rajeev Alur, Pavol Cerny, Gunjan Gupta, P. Madhusudan, Wonhong Nam, and
Anshuman Srivastava. POPL 2005. [PDF]
- Timna: A Framework for Automatically Combining Aspect
Mining Analyses. David Shepherd, Jeffrey Palm, Lori Pollock and Mark
Chu-Carroll. ASE 2005. [PDF]
- Permissive Interfaces. Thomas A. Henzinger, Ranjit Jhala,
and Rupak Majumdar. ESEC/FSE 2005. [PDF]
- Understanding Software Application Interfaces via String
Analysis. Evan Martin and Tao Xie. ICSE 2006 ER. [PDF]
- MAPO: Mining API Usages from
Open Source Repositories. Tao Xie and Jian Pei. MSR 2006. [PDF]
- Context-Sensitive Domain-Independent Algorithm Composition and Selection. Troy A. Johnson and Rudolf Eigenmann. PLDI 2006. [PDF]
- GPLAG: Detection of Software Plagiarism by Procedure
Dependency Graph Analysis. Chao Liu, Chen Chen, Jiawei Han, and
Philip Yu. KDD 2006. [PDF]
- XSnippet: Mining for Sample Code. Naiyana Tansalarak and
Kajal T. Claypool. OOPSLA 2006 [PDF]
- Mica: A Web-Search Tool for Finding API Components and Examples. Jeffrey Stylos and Brad A. Myers, VL/HCC 2006. [PDF] [Web]
- Mining Security-sensitive Operations in Legacy Code using Concept
Analysis. Vinod Ganapathy, David King, Trent Jaeger and Somesh Jha.
ICSE 2007. [PDF]
- Automated Inference of Pointcuts in Aspect-Oriented Refactoring. Prasanth
Anbalagan and
Tao Xie. ICSE 2007 [PDF]
- Path-Sensitive Inference of Function Precedence Protocols. Murali
Krishna Ramanathan, Ananth Grama, Suresh Jagannathan. ICSE 2007. [PDF]
- Static Specification Inference Using Predicate Mining. Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan. PLDI 2007 [PDF]
- Static Error Detection Using Semantic Inconsistency Inference. Isil Dillig, Thomas Dillig and Alex Aiken. PLDI 2007. [PDF]
- Finding What's Not There: A New Approach to Revealing Neglected
Conditions in Software. Ray-Yaung Chang, Andy Podgurski, and Jiong
Yang. ISSTA 2007 [PDF]
- Static Specification Mining Using Automata-Based Abstractions. Sharon
Shoham, Eran Yahav, Stephen Fink and Marco Pistoia. ISSTA 2007 [PDF]
- Efficient Mining of Iterative Patterns for Software
Specification Discovery. David Lo, Siau-Cheng Khoo and Chao Liu. KDD
2007. [PDF]
- Recommending Random Walks, Zachary M Saul, Vladimir Filkov, Premkumar Devanbu, Christian Bird. ESEC/FSE 2007 [PDF] [FRAN implementation]
- Mining API Patterns as Partial Orders from Source Code: From Usage
Scenarios to Specifications, Mithun Acharya, Tao Xie, Jian Pei, Jun Xu.
ESEC/FSE 2007. [PDF]
- Detecting Object Usage Anomalies. Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. ESEC/FSE 2007. [PDF]
- Assieme:
Finding and Leveraging Implicit References in a Web Search Interface
for Programmers. Raphael Hoffmann, James Fogarty, and Daniel S. Weld. UIST 2007. [PDF]
- Searching the Library and Asking the Peers: Learning to Use Java APIs
on Demand. Yunwen Ye, Yasuhiro Yamamoto, Kumiyo Nakakoji, Yoshiyuki
Nishinaka, and Mitsuhiro Asada. PPPJ 2007. [PDF]
- Mining Concepts from Code with Probabilistic Topic Models. Erik
Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, and Pierre
Baldi. ASE 2007 [PDF]
- PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. Suresh Thummalapenta and Tao Xie. ASE 2007. [PDF]
- NEGWeb: Static Defect Detection via Searching Billions of Lines
of Open Source Code. Suresh Thummalapenta and Tao Xie. NCSU CSC
2007. [PDF]
- Mining Patterns and Violations Using Concept Analysis. Christian Lindig. [PDF]
Software change history
- CVSSearch: Searching through Source Code using CVS
Comments. Annie Chen, Eric
Chou, Joshua Wong, Andrew Y. Yao, Qing Zhang, Shao Zhang, and Amir
Michail. ICSM 2001. [PDF]
- Mining the Maintenance History of a Legacy Software
System. Jelber Sayyad-Shirabad, Timothy Lethbridge, Stan Matwin. ICSM
2003 [PDF]
- Automatic Categorization Algorithm for Evolvable Software Archive. Shinji Kawaguchi, Pankaj K. Garg, Makoto Matsushita, and Katsuro Inoue. IWPSE 2003. [PDF] [More papers]
- Mining Version Histories to Guide Software Changes. Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, and Andreas Zeller. ICSE 2004. [PDF] [eRose tool implementation]
- Predicting Source Code Changes by Mining Change History.
Annie Ying, Gail Murphy, Raymond Ng, and Mark Chu-Carroll. TSE 2004. [PDF]
- Recovering System Specific Rules from Software
Repositories. Chadd Williams and Jeffrey K. Hollingsworth. MSR 2005. [PDF]
- Mining Change and Version Management Histories to Evaluate
an Analysis Tool, Danhua Shao, Sarfraz Khurshid and Dewayne E. Perry.
2005. [PDF]
- DynaMine: Finding Common Error Patterns by Mining Software
Revision Histories. Benjamin
Livshits and Thomas Zimmermann. ESEC/FSE 2005. [PDF]
- An Empirical Study of Code Clone Genealogies.
Miryung Kim, Vibha Sazawal, David Notkin, and
Gail C. Murphy. ESEC/FSE 2005. [PDF]
- When Do Changes Induce Fixes? Jacek Sliwerski, Thomas
Zimmermann, and Andreas Zeller. MSR 2005. [PDF]
[slides]
- Locating Matching Method Calls by Mining Revision History
Data. Benjamin Livshits and
Thomas Zimmermann. BUG 2005. [PDF]
- Automatic Mining of Source Code Repositories to Improve
Bug Finding Techniques. Chadd C. Williams and Jeffrey K.
Hollingsworth. TSE 2005. [PDF]
- Mining Eclipse for Cross-Cutting Concerns. Silvia Breu,
Thomas Zimmermann, and Christian Lindig. MSR 2006. [PDF]
- Mining Aspects from Version History. Silvia Breu and
Thomas Zimmermann. ASE 2006. [PDF]
- Automatic Inference of Structural Changes for Matching
Across Program Versions. Miryung Kim, David Notkin, and Dan Grossman.
ICSE 2007. [PDF]
- Predicting Faults from Cached History. Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., and Andreas Zeller. ICSE 2007. [PDF]
Profiled program states
- Dynamically Discovering Likely Program Invariants to Support Program Evolution. Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. TSE 2001. [PDF] [Daikon tool implementation] [Publications using Daikon]
- Tracking Down Software Bugs Using Automatic Anomaly
Detection, Sudheendra Hangal and Monica S. Lam. ICSE
2002. [PDF]
[DIDUCE tool
implementation]
- Discovering Algebraic Specifications from Java Classes.
Johannes Henkel and Amer Diwan. ECOOP 2003. [PDF]
- Bug Isolation via Remote Program Sampling. Ben Liblit,
Alex Aiken, Alice X. Zheng, and Michael I. Jordan. PLDI 2003. [PDF]
- Automatic Extraction of Object-Oriented Observer
Abstractions from Unit-Test Executions. Tao Xie and David Notkin. ICFEM
2004. [PDF]
- Automatic Extraction of Sliced Object State Machines for
Component Interfaces. Tao Xie and David Notkin. SAVCBS 2004. [PDF]
- Scalable Statistical Bug Isolation. Ben Liblit, Mayur
Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. PLDI 2005. [PDF]
- Automatic Extraction of Abstract-Object-State Machines
Based on Branch Coverage. Hai Yuan and Tao Xie. RETR 2005. [PDF]
- Automatically Identifying
Special and Common Unit Tests for Object-Oriented Programs. Tao Xie and
David Notkin. ISSRE 2005. [PDF]
- Behavior Capture and
Test: Automated Analysis of Component Integration, Leonardo Mariani and
Mauro Pezzè. ICECCS 2005. [PDF]
- SOBER: Statistical Model-based Bug Localization. Chao Liu,
Xifeng Yan, Long Fei, Jiawei Han, and Samuel P. Midkiff. ESEC/FSE 2005. [PDF]
- Mining Object Behavior with ADABU. Valentin Dallmeier and
Christian Lindig and Andrzej Wasylkowski and Andreas Zeller. WODA 2006.
[PDF]
- Automatic Extraction of Abstract-Object-State Machines
from Unit-Test Executions. Tao Xie, Evan Martin, and Hai Yuan. ICSE
2006 Demo. [PDF]
- Inferring Access-Control
Policy Properties via Machine Learning, Evan Martin and Tao Xie. POLICY
2006. [PDF]
- Inference and Enforcement of Data Structure Consistency
Specifications. Brian Demsky, Michael D. Ernst, Philip J. Guo, Stephen
McCamant, Jeff H. Perkins, and Martin Rinard. ISSTA 2006. [PDF]
- Statistical Debugging Using Compound Boolean Predicates. Piramanayagam
Arumuga Nainar, Ting Chen, Jake Rosin, and Ben Liblit. ISSTA 2007 [PDF]
Profiled structural entities
- Execution branches/paths
- The Concept of Dynamic Analysis. Tom Ball. ESEC/FSE 1999. [PDF]
- Finding Failures by Cluster Analysis of Execution
Profiles. William Dickinson and David Leon and Andy Podgurski. ICSE
2001. [PDF]
- Pursuing Failure: the Distribution of Program Failures in
a Profile Space. William Dickinson, David Leon, and Andy Podgurski. FSE
2001. [PDF]
- Detecting AAA Vulnerabilities by Mining Execution
Profiles. Zhan Xu, David Leon, Andy Podgurski, and Vincenzo Liberatore.
IEEE S&P 2004. [PDF]
- Active Learning for Automatic Classification of Software
Behavior. James Bowring, James Rehg, and Mary Jean Harrold. ISSTA 2004.
[PDF]
- Mining Control Patterns from Java Program Corpora.
Deng-Jyi Chen, Chung-Chien Hwang, Shih-Kun Huang, and David T. K. Chen.
JISE 2004. [PDF]
- Improving the Classification of Software Behaviors using
Ensembles of Control-Flow and Data-Flow Classifiers. James Bowring,
Mary Jean Harrold, and James Rehg. GIT-CERCS-05-10. [PDF]
- Data Mining and Cross-checking of Execution Traces.
Tristan Denmat, Mireille Ducasse, and Olivier Ridoux. ASE 2005.
[PDF]
- Mining Control Flow Abnormality for Logic Error Isolation.
Chao Liu, Xifeng Yan, and Jiawei Han. SDM 2006 [PDF]
- How Bayesians Debug. Chao Liu, Zeng Lian and Jiawei Han. ICDM 2006 [PDF]
- QUARK: Empirical Assessment of Automaton-based Specification Miners. David Lo and Siau-Cheng Khoo. WCRE 2006. [PDF]
- Failure Proximity: A Fault Localization-Based Approach. Chao Liu and Jiawei Han. FSE 2006 [PDF]
- SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner. David Lo and Siau-Cheng Khoo. FSE 2006 [PDF]
- Mining Specifications of Malicious Behavior.
Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. ESEC/FSE 2007. [PDF]
- Function calls
- Discovering Models of Software Processes from Event-Based
Data. Jonathan E. Cook and Alexander L. Wolf. TOSEM 1998 [PDF]
- Encoding Program Executions. Steven P. Reiss and Manos Renieris.
ICSE 2001 [PDF]
- Automatic Extraction of Object-Oriented Component
Interfaces.
John Whaley, Michael C. Martin, and Monica S. Lam. ISSTA 2002 [PDF]
- Mining Specifications. Glenn Ammons, Rastislav Bodik, and
James R. Larus. POPL 2002 [PDF]
- Interaction-Pattern Mining: Extracting Usage Scenarios from Run-time Behavior Traces. M. El-Ramly, E. Stroulia, P. Sorenson. KDD 2002. [PDF] [CELLEST] [Mohammad El-Ramly] [Stroulia]
- Debugging Temporal Specifications with Concept Analysis. Glenn Ammons, David Mandelin, Rastislav
Bodik, James Larus. PLDI 2003. [PDF]
- Aspect Mining through the Formal Concept Analysis of
Execution Traces. Paolo Tonella, Mariano Ceccato. WCRE 2004. [PDF]
- Dynamically Inferring Temporal Properties. Jinlin Yang and
David Evans. PASTE 2004. [PDF]
[Terracotta
tool implementation]
- Automatically Inferring Temporal Properties for Program
Evolution. Jinlin Yang and David Evans. ISSRE 2004. [PDF]
- Inference of Component Protocols by the kBehavior
Algorithm. Leonardo Mariani and Mauro Pezzè. 2004 Tech
Report [PDF]
[kBehavior
tool implementation]
- Aspect Mining Using Event Traces. Silvia Breu and Jens
Krinke. ASE 2004. [PDF]
- Mining System-User Interaction Logs for Interaction
Patterns. Mohammad El-Ramly and Eleni Stroulia. MSR 2004. [PDF]
- Applying Webmining Techniques to Execution Traces to
Support the Program Comprehension Process. Andy Zaidman, Toon Calders, Serge
Demeyer, and Jan Paredaens. CSMR 2005. [PDF]
- Behavior Capture and
Test: Automated Analysis of Component Integration. Leonardo Mariani and
Mauro Pezzè. ICECCS 2005. [PDF]
- Mining Behavior Graphs for Backtrace
of Noncrashing Bugs.
Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and Philip S. Yu. SDM
2005. [PDF]
- Helping Users Avoid Bugs in GUI
Applications. Amir Michail and Tao Xie. ICSE 2005. [PDF]
- Lightweight Defect Localization for Java. Valentin
Dallmeier, Christian Lindig, and Andreas Zeller. ECOOP 2005. [PDF]
[Ample tool
implementation]
- Detecting
Failure-Related Anomalies in Method Call Sequences. Valentin Dallmeier.
Thesis 2005. [PDF]
- Data Mining Approaches to Software Fault Diagnosis. Bose,
R.P.J.C.; Srinivasan, S.H. RIDE-SDMA 2005. [PDF]
- Extending Dynamic Aspect
Mining with Static Information. Silvia Breu. SCAM 2005. [PDF]
- Perracotta: Mining Temporal API Rules from Imperfect
Traces. Jinlin Yang, David Evans, Deepali Bhardwaj, Thirumalesh Bhat,
and Manuvir Das. ICSE 2006. [PDF]
- Cost-Sensitive Decision Tree Learning for Forensic Classification.
Jason V. Davis, Jungwoo Ha, Hany E. Ramadan, Christopher J. Rossbach,
and Emmett Witchel. ECML 2006. [PDF]
- Improved Error Reporting for Software that Uses Black-Box Components.
Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany
E. Ramadan, Don E. Porter, David L. Chen and Emmett Witchel. PLDI 2007 [PDF]
Bug reports/natural languages
- Bug reports:
- Automated Support for Classifying Software Failure Reports.
Andy Podgurski, David Leon, Patrick Francis, Wes Masri, Melinda Minch,
Jiayang Sun, and Bin Wang. ICSE 2003. [PDF]
- Tree-Based Methods for Classifying Software Failures.
Patrick
Francis, David Leon, Melinda Minch, Andy Podgurski. ISSRE 2004. [PDF]
- Using FogBUGZ to Get
Crash Reports From Users - Automatically! Joel Spolsky. [HTML]
- Automatic Bug Triage Using Text
Classification. Davor Cubranic and Gail Murphy. SEKE 2004. [PDF]
- When Do Changes Induce Fixes? Jacek Sliwerski, Thomas
Zimmermann, and Andreas Zeller. MSR 2005. [PDF]
[slides]
- Software Defect Association Mining and Defect Correction
Effort Prediction. Qinbao
Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair. TSE 2006.
[PDF]
- Who Should Fix This Bug? John Anvik, Lyndon Hiew, and Gail
C. Murphy. ICSE 2006. [PDF]
[BugTriage
project]
- Coping With Open Bug Repositories. John Anvik, Lyndon Hiew, and Gail C. Murphy. eTX 2006. [PDF] [BugTriage project]
- Mining Metrics to Predict Component Failures. Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. ICSE 2006. [PDF]
- How Long Did It Take to Fix Bugs? Sunghun Kim and E. James Whitehead, Jr. MSR 2006. [PDF]
- If Your Bug Database Could Talk. Adrian Schröter, Thomas Zimmermann, Rahul Premraj, and Andreas Zeller. Technical Report, Saarland University, June 2006. [PDF] [Eclipse Bug Data]
- Automatic Identification of Bug-Introducing Changes. Sunghun Kim, Thomas Zimmermann, Kai Pan, and E. James Whitehead, Jr. ASE 2006. [PDF]
- A Linguistic Analysis of How People Describe Software Problems in Bug Reports. Andrew Ko, Brad Myers, and Duen Horng Chau. VL/HCC 2006. [PDF]
- Detection of Duplicate Defect Reports Using Natural Language
Processing. Per Runeson, Magnus Alexandersson, and Oskar Nyholm. ICSE
2007. [PDF]
- Which Warnings Should I Fix First? Sunghun Kim and Michael D. Ernst. ESEC/FSE 2007. [PDF]
- An Approach to Detecting Duplicate Bug Reports using Natural Language and Execution Information. Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. ICSE 2008. [PDF]
- Code comments:
- Examining the Evolution of Code Comments in PostgreSQL. Zhen Ming Jiang and Ahmed E. Hassan. MSR 2006 [PDF]
- /* iComment: Bugs or Bad Comments? */. Lin Tan, Ding Yuan, Gopal Krishna and Yuanyuan Zhou. SOSP 2007. [PDF]
- Emails:
- Mining Email Social Networks. Christian Bird, Alex Gourley, Prem Devanbu, Michael Gertz, and Anand Swaminathan. MSR 2006 [PDF]
- What Can OSS Mailing Lists Tell Us? A Preliminary Psychometric Text Analysis of the Apache Developer Mailing List. Peter C. Rigby and Ahmed E. Hassan. MSR 2007. [PDF]
- Code identifiers:
- Mining the Lexicon Used by Programmers during Software Evolution. G. Antoniol, Y.-G. Guéhéneuc, E. Merlo, and Paolo Tonella. ICSM 2007. [PDF]
Misc.
- Mining Aspects in Requirements. Américo Sampaio, Neil Loughran, Awais Rashid, and Paul Rayson. Workshop on Early Aspects 2005. [PDF]
- EA-Miner: A Tool for Automating Aspect-Oriented Requirements Identification. Américo Sampaio, Ruzanna Chitchyan, Awais Rashid, and Paul Rayson. ASE 2005. [PDF]