Improving Software Productivity and Quality via Mining Program Source Code
PROJECT SUMMARY
Since the late 1990s,
various data mining techniques have been applied to analyze software
engineering data, with many notable successes. The accumulated
experience and lessons of data mining for software engineering
pose interesting challenges and opportunities for new research and
development. This project develops novel techniques and tools for mining
program source code in three ways: exploiting code search engines to expand the mining
scope; adapting program model checking to collect static program traces that
simulate runtime program behavior (without requiring system runtime setup or
system test suites); and applying data mining techniques to the collected program
source code or its static traces to mine API usage patterns or properties.
The mined API usage patterns or properties are used to find bugs in
software systems or to help developers write API client code correctly when developing
software systems.
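As a rough illustration of the last step, the sketch below mines ordered API-call pairs from a handful of call sequences and then flags a sequence that breaks a mined pattern. It is a deliberately simplified stand-in, not the algorithm used by any of the project's tools: the function names `mine_ordered_pairs` and `find_violations`, the `min_support` threshold, and the sample traces are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_ordered_pairs(traces, min_support):
    """Return ordered call pairs (a, b), with a occurring before b,
    that appear in at least a min_support fraction of the traces."""
    counts = Counter()
    for trace in traces:
        pairs = set()  # count each pair at most once per trace
        for i, j in combinations(range(len(trace)), 2):
            if trace[i] != trace[j]:
                pairs.add((trace[i], trace[j]))
        counts.update(pairs)
    threshold = min_support * len(traces)
    return {pair for pair, n in counts.items() if n >= threshold}


def find_violations(trace, patterns):
    """Return the patterns whose first call occurs in the trace
    without the second call anywhere after it."""
    violations = []
    for first, second in sorted(patterns):
        if first in trace:
            idx = trace.index(first)
            if second not in trace[idx + 1:]:
                violations.append((first, second))
    return violations


# Hypothetical static traces: each list is one call sequence.
TRACES = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fgets", "fclose"],
    ["fopen", "fread"],  # never closes the file
]

PATTERNS = mine_ordered_pairs(TRACES, min_support=0.75)
for trace in TRACES:
    print(trace, find_violations(trace, PATTERNS))
```

With these sample traces, the only pair that reaches the support threshold is `fopen` followed by `fclose`, so the fourth trace is reported as a likely missing-call bug. A real miner would need richer patterns (for example partial orders or conditions on return values) and far larger inputs.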
PEOPLE
Faculty
Tao Xie (Principal Investigator)
Students
Mithun Acharya (PhD Candidate)
Suresh Thummalapenta (PhD Student)
PUBLICATIONS
- Suresh Thummalapenta and Tao Xie. SpotWeb: Detecting Framework Hotspots and Coldspots via Mining Open Source Code on the Web. To appear in Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2008), L'Aquila, Italy, September 2008. [BibTeX]
A previous version appeared in Proceedings of MSR 2008 as a Position Paper.
- Suresh Thummalapenta and Tao Xie. NEGWeb: Detecting Neglected Conditions via Mining Programming Rules from Open Source Code. To be presented as a Student Poster at International Symposium on Software Testing and Analysis (ISSTA 2008), Seattle, Washington, July 2008.
- Christoph Csallner, Yannis Smaragdakis, and Tao Xie. DSD-Crasher: A hybrid analysis tool for bug finding. ACM Transactions on Software Engineering and Methodology, Vol. 17, Issue 2, pp. 345-371, July 2008. [PDF][BibTeX]
- Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An Approach to Detecting Duplicate Bug Reports using Natural Language and Execution Information. In Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, pp. 461-470, May 2008. [PDF][BibTeX]
- Suresh Thummalapenta and Tao Xie. SpotWeb: Detecting Framework Hotspots via Mining Open Source Repositories on the Web. In Proceedings of the 5th Working Conference on Mining Software Repositories (MSR 2008), Position Paper, Leipzig, Germany, pp. 109-112, May 2008. [PDF][BibTeX]
- Tao Xie, Mithun Acharya, Suresh Thummalapenta, and Kunal Taneja. Improving Software Reliability and Productivity via Mining Program Source Code. In Proceedings of the NSF Next Generation Software Program Workshop at IPDPS 2008 (NSFNGS 2008), Miami, Florida, April 2008. [PDF][BibTeX]
- Suresh Thummalapenta and Tao Xie. PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), Atlanta, Georgia, pp. 204-213, November 2007. [PDF][BibTeX]
- Mithun Acharya and Tao Xie. Static Detection of API Error-Handling Bugs via Mining Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-35, October 15, 2007. [PDF][BibTeX]
- Yoonki Song, Suresh Thummalapenta, and Tao Xie. UnitPlus: Assisting Developer Testing in Eclipse. In Proceedings of the Eclipse Technology eXchange Workshop at OOPSLA 2007 (ETX 2007), Montreal, Canada, pp. 26-30, October 2007. (Best Student Paper Award) [PDF][BibTeX]
- Suresh Thummalapenta. Exploiting code search engines to improve programmer productivity. In Proceedings of the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, pp. 921-922, October 2007. [PDF][PPT]
- Suresh Thummalapenta and Tao Xie. NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-24, September 16, 2007. [PDF][BibTeX]
- Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, pp. 25-34, September 2007. [PDF][BibTeX]
- Mithun Acharya, Tao Xie, and Jun Xu. Mining Interface Specifications for Generating Checkable Robustness Properties. In Proceedings of the 17th IEEE International Conference on Software Reliability Engineering (ISSRE 2006), Raleigh, NC, pp. 311-320, November 2006. [PDF][BibTeX]
- Mithun Acharya. Automatic Inference of Interface Properties from Program Source Code. In Proceedings of the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium, Portland, Oregon, USA, November 2006. [PDF]
- Mithun Acharya. Automatic Generation and Inference of Interface Properties from Program Source Code. In Proceedings of the 20th Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2006), ACM SIGPLAN Student Research Competition, Portland, Oregon, USA, pp. 750-751, October 2006.
- Mithun Acharya, Tanu Sharma, Jun Xu, and Tao Xie. Effective Generation of Interface Robustness Properties for Static Analysis. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), Short Paper, Tokyo, Japan, pp. 293-296, September 2006. [PDF][BibTeX]
- Mithun Acharya. Automatic Generation of Robustness and Security Properties from Program Source Code. In Supplemental Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA, USA, pp. 166-168, June 2006. [PDF]
- Tao Xie and Jian Pei. MAPO: Mining API Usages from Open Source Repositories. In Proceedings of the 3rd International Workshop on Mining Software Repositories (MSR 2006), Shanghai, China, pp. 54-57, May 2006. [PDF][BibTeX][Slides]
TUTORIALS/COURSE MODULES
- Ahmed E. Hassan and Tao Xie. Mining Software Engineering Data. In Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), Companion Volume, Tutorials, Leipzig, Germany, May 2008. [Tutorial Web][BibTeX]
- Chao Liu, Tao Xie, and Jiawei Han. Mining for Software Reliability. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, October 2007. [Tutorial Web][BibTeX]
- Tao Xie, Jian Pei, and Ahmed E. Hassan. Mining Software Engineering Data. In Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), Companion Volume, Tutorials, Minneapolis, MN, pp. 172-173, May 2007. [Tutorial Web][PDF][BibTeX]
- Tao Xie and Jian Pei. Data Mining for Software Engineering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), Tutorial, Philadelphia, Pennsylvania, August 2006. [Tutorial Web][Slides][BibTeX]
- Tao Xie. Data Mining III - Text Mining course module. The Master of Science in Analytics (MSA) program, the Institute for Advanced Analytics, North Carolina State University, January-February 2008.
PRESENTATIONS
- Tao Xie. Searching and Mining Open Source Code from the Web. ASE 2008 PC Workshop presentation. Google Tech Talks. Mountain View, CA, June 2008.
- Tao Xie. SpotWeb: Detecting Framework Hotspots via Mining Open Source Repositories on the Web. Conference presentation, the 5th Working Conference on Mining Software Repositories (MSR 2008), Leipzig, Germany, May 2008.
- Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY, April 2008.
- Tao Xie. Improving Software Reliability and Productivity via Mining Program Source Code. Workshop presentation, the NSF Next Generation Software Program Workshop at IPDPS 2008 (NSFNGS 2008), Miami, Florida, April 2008.
- Tao Xie. Recommendation Systems for Code Reuse. Workshop talk, Bellairs Workshop On Software Analysis for Recommendation Systems (SARS 2008), Barbados, February 2008. [Slides]
- Suresh Thummalapenta. PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. Conference presentation, the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), Atlanta, Georgia, November 2007.
- Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Accenture Labs, Chicago, IL, October 2007.
- Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Motorola Labs, Schaumburg, IL, October 2007.
- Tao Xie. Improving Automation in Developer Testing: Achievements and Challenges. Conference talk, International Verify Conference (Verify 2007), Arlington, VA, October 2007.
- Suresh Thummalapenta. Exploiting code search engines to improve programmer productivity. Conference ACM SIGPLAN SRC presentation, the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, October 2007.
- Tao Xie. Improving Automation in Developer Testing: Achievements and Challenges. Conference talk, Triangle Information Systems Quality Association Conference (TISQA 2007), Chapel Hill, NC, September 2007.
- Mithun Acharya. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. Conference presentation, the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, September 2007.
- Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, September 2007.
- Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, China, August 2007.
- Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, August 2007.
- Tao Xie. Mining Software Engineering Data. Invited talk, Software Engineering Institute, Peking University, Beijing, China, July 2007.
- Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science, University of Calgary, Canada, May 2007.
- Mithun Acharya. Mining Interface Specifications for Generating Checkable Robustness Properties. Conference presentation, the 17th IEEE International Conference on Software Reliability Engineering (ISSRE 2006), Raleigh, NC, November 2006.
- Mithun Acharya. Automatic Inference of Interface Properties from Program Source Code. Conference doctoral symposium presentation, the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium, Portland, Oregon, USA, November 2006.
- Mithun Acharya. Automatic Generation and Inference of Interface Properties from Program Source Code. Conference ACM SIGPLAN SRC presentation, the 20th Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2006), ACM SIGPLAN Student Research Competition, Portland, Oregon, USA, October 2006.
- Mithun Acharya. Effective Generation of Interface Robustness Properties for Static Analysis. Conference poster presentation, the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), Tokyo, Japan, September 2006.
- Mithun Acharya. Automatic Generation of Robustness and Security Properties from Program Source Code. Conference student forum presentation, the IEEE International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA, USA, June 2006.
- Tao Xie. Data Mining for Software Engineering. Visit talk, Fudan University, China, May 2006.
- Tao Xie. MAPO: Mining API Usages from Open Source Repositories. Workshop presentation, the 3rd International Workshop on Mining Software Repositories (MSR 2006), Shanghai, China, May 2006. [Slides]
SOFTWARE
- XWeb: Detecting Exception-Handling Defects via Mining Open Source Code
- NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code
- SpotWeb: Detecting Framework Hotspots and Coldspots via Mining Open Source Code on the Web
- PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web
- MAPO: Mining API Usages from Open Source Repositories
- UnitPlus: Assisting Developer Testing in Eclipse
LINKS
Bibliography on Mining Software Engineering Data
SPONSORS
National Science Foundation Award CNS-0720641, Computer Systems Research (CSR) Program (08/01/2007-07/31/2008)
Army Research Office Award W911NF-07-1-0431, Short Term Innovative Research (STIR) Program (06/18/2007-03/17/2008)