Improving Software Productivity and Quality via Mining Program Source Code
[Summary] [People] [Publications] [Tutorials] [Presentations] [Software] [Links] [Sponsors]
PROJECT SUMMARY
Since the late 1990s, various data mining techniques have been applied to analyze software engineering data, with many notable successes. The experience and lessons accumulated in data mining for software engineering pose interesting challenges and opportunities for new research and development. This project develops novel techniques and tools for mining program source code in three ways: exploiting code search engines to expand the mining scope; adapting program model checking to collect static program traces that simulate runtime program behavior (without requiring system runtime setup or system test suites); and applying data mining techniques to the collected program source code or its static traces to mine API usage patterns or properties. The mined API usage patterns or properties are used to find bugs in software systems or to help developers write API client code correctly when developing software systems.
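As a rough illustration of the last step (mining API usage patterns from traces and using them to flag likely bugs), the sketch below mines simple "call a is later followed by call b" rules from a handful of call sequences and reports sequences that break a mined rule. It is a minimal sketch under stated assumptions, not the project's actual tools or algorithms: the function names, the thresholds, and the sample traces are all hypothetical, and real static traces would come from a code search engine or a model checker rather than a hand-written list.

```python
from collections import Counter


def mine_followed_by_rules(traces, min_support=3, min_confidence=0.8):
    """Mine rules of the form (a, b): 'a call to a is later followed by b'.

    traces is a list of API-call sequences (lists of call names).
    Support is the number of traces in which a occurs before b;
    confidence is support divided by the number of traces containing a.
    Both thresholds are illustrative assumptions.
    """
    traces_with_call = Counter()   # number of traces containing each call
    traces_with_pair = Counter()   # number of traces where a precedes b
    for trace in traces:
        traces_with_call.update(set(trace))
        ordered_pairs = set()
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                if a != b:
                    ordered_pairs.add((a, b))
        traces_with_pair.update(ordered_pairs)

    rules = {}
    for (a, b), support in traces_with_pair.items():
        confidence = support / traces_with_call[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_violations(traces, rules):
    """Return (trace_index, a, b) for traces that call a but never call b afterwards."""
    violations = []
    for index, trace in enumerate(traces):
        for (a, b) in rules:
            if a in trace and b not in trace[trace.index(a) + 1:]:
                violations.append((index, a, b))
    return violations


# Hypothetical traces: the last one opens a file but never closes it.
traces = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread"],
]
rules = mine_followed_by_rules(traces)
violations = find_violations(traces, rules)
```

With these sample traces, the only rule that passes both thresholds is that `fopen` is followed by `fclose`, and the last trace is the one reported as a violation. Treating a rare deviation from a frequent pattern as a bug candidate is the general idea; the published techniques listed below use richer pattern forms such as partial orders and sequence association rules.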
PEOPLE
Faculty
Tao Xie (Principal Investigator)
Graduate Students
Madhuri R Marri (MS Student)
Suresh Thummalapenta (PhD Student)
Undergraduate Student (REU project)
Justin W. Gorham
Collaborators
Mithun Acharya (to join ABB Research June 2009)
PUBLICATIONS
- Suresh Thummalapenta and Tao Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. To appear in Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), Auckland, New Zealand, November 2009. [BibTeX]
- Hao Zhong, Lu Zhang,
Tao Xie, and Hong Mei. Inferring Resource Specifications from Natural Language API Documentation. To appear in Proceedings
of the 24th IEEE/ACM International Conference on Automated
Software Engineering (ASE 2009), Auckland, New Zealand, November 2009. [BibTeX]
- Nuo Li, Tao Xie, Nikolai
Tillmann, Jonathan de Halleux, and Wolfram Schulte. Reggae: Automated Test Generation for Programs using Complex Regular Expressions. To appear in Proceedings
of the 24th IEEE/ACM International Conference on Automated
Software Engineering (ASE 2009), Short Paper, Auckland, New Zealand, November 2009. [BibTeX]
- Suresh Thummalapenta, Tao Xie, Nikolai Tillmann, Peli de Halleux, and Wolfram Schulte. MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code. To appear in Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2009), Amsterdam, the Netherlands, August 2009. [PDF][BibTeX]
- Tao Xie, Suresh Thummalapenta, David Lo, and Chao Liu. Data Mining for Software Engineering. IEEE Computer, 42(8), pp.35-42, August 2009. [BibTeX]
- Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei. MAPO: Mining and Recommending API Usage Patterns. To appear in Proceedings
of the 23rd European Conference on Object-Oriented Programming (ECOOP 2009), Genova, Italy, July 2009. [PDF][BibTeX]
- Tao Xie, Nikolai Tillmann,
Peli de Halleux, and Wolfram Schulte. Fitness-Guided Path Exploration
in Dynamic Symbolic Execution. To appear in Proceedings
of the 39th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks (DSN 2009),
Lisbon, Portugal, June-July 2009. [PDF][BibTeX]
- Suresh Thummalapenta and Tao Xie. Mining Exception-Handling Rules as Sequence Association Rules. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
- Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, and Jiasu Sun. Locating Need-to-Translate Constant Strings for Software Internationalization. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
- Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, and Jiasu Sun. TranStrL: An Automatic Need-to-Translate String Locator for Software Internationalization. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Formal Demonstration, Vancouver, Canada, May 2009. [PDF][BibTeX]
- Wujie Zheng, Michael R. Lyu, and Tao Xie. Test Selection for Result Inspection via Mining Predicate Rules. In Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results, Vancouver, Canada, pp. 219-222, May 2009.
- Madhuri R Marri, Suresh Thummalapenta, and Tao Xie. Improving Software Quality via Code Searching and Mining. To appear in Proceedings of the First International Workshop on Search-Driven Development – Users, Infrastructure, Tools and Evaluation (SUITE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
- Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Mutation Analysis of Parameterized Unit Tests. In Proceedings of the 4th International Workshop on Mutation Analysis (Mutation 2009), Denver, Colorado, pp. 177-181, April 2009. [PDF][BibTeX]
- Mithun Acharya and Tao Xie. Mining API Error-Handling Specifications from Source Code. To appear in Proceedings of International Conference on Fundamental Approaches to Software Engineering (FASE 2009), York, UK, March 2009. [PDF][BibTeX]
- Suresh Thummalapenta and Tao Xie. SpotWeb: Detecting Framework Hotspots and Coldspots via Mining Open Source Code on the Web. In Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2008), L'Aquila, Italy, pp. 327-336, September 2008. [BibTeX] A previous version appeared in Proceedings of MSR 2008 as a Position Paper.
- Suresh Thummalapenta and Tao Xie. NEGWeb: Detecting Neglected Conditions via Mining Programming Rules from Open Source Code. Presented as a Student Poster at International Symposium on Software Testing and Analysis (ISSTA 2008), Seattle, Washington, July 2008.
- Christoph Csallner, Yannis Smaragdakis, and Tao Xie. DSD-Crasher: A Hybrid Analysis Tool for Bug Finding. ACM Transactions on Software Engineering and Methodology, Vol. 17, Issue 2, pp. 345-371, July 2008. [PDF][BibTeX]
- Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An Approach to Detecting Duplicate Bug
Reports using Natural Language and Execution Information. In Proceedings of the 30th International
Conference on Software Engineering (ICSE
2008), Leipzig, Germany, pp. 461-470, May 2008. [PDF][BibTeX]
- Tao Xie, Mithun Acharya, Suresh Thummalapenta, and Kunal Taneja. Improving Software Reliability and
Productivity via Mining Program Source Code. In Proceedings of the NSF Next Generation Software Program
Workshop at IPDPS 2008 (NSFNGS
2008), Miami, Florida, April 2008. [PDF][BibTeX]
- Suresh Thummalapenta and Tao Xie. PARSEWeb: A Programmer Assistant for
Reusing Open Source Code on the Web. In Proceedings of the 22nd IEEE/ACM
International Conference on Automated Software Engineering (ASE 2007), Atlanta, Georgia, pp.
204-213, November 2007. [PDF][BibTeX]
- Mithun Acharya and Tao Xie. Static Detection of API Error-Handling Bugs via Mining Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-35, October 15, 2007. [PDF][BibTeX]
- Yoonki Song, Suresh Thummalapenta, and Tao Xie. UnitPlus: Assisting Developer Testing in Eclipse. In Proceedings of the Eclipse Technology eXchange Workshop at OOPSLA 2007 (ETX 2007), Montreal, Canada, pp. 26-30, October 2007. (Best Student Paper Award) [PDF][BibTeX]
- Suresh Thummalapenta. Exploiting Code Search Engines to Improve Programmer Productivity. In Proceedings of the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, pp. 921-922, October 2007. [PDF][PPT]
- Suresh Thummalapenta and Tao Xie. NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-24, September 16, 2007. [PDF][BibTeX]
- Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, pp. 25-34, September 2007. [PDF][BibTeX]
- Mithun Acharya, Tao Xie, and Jun Xu. Mining Interface
Specifications for Generating Checkable Robustness Properties.
In Proceedings
of the 17th IEEE International Conference on Software Reliability
Engineering (ISSRE
2006), Raleigh, NC, pp.
311-320, November 2006. [PDF][BibTeX]
- Mithun Acharya. Automatic Inference of
Interface Properties from Program Source Code. In Proceedings of the 14th ACM SIGSOFT
Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium,
Portland, Oregon, USA, November 2006. [PDF]
- Mithun Acharya. Automatic
Generation and Inference of Interface Properties from Program Source
Code. In Proceedings
of the 20th Annual ACM SIGPLAN International Conference on
Object-Oriented Programming, Systems, Languages, and Applications
(Companion) (OOPSLA 2006), ACM
SIGPLAN Student Research Competition, Portland, Oregon, USA, pp.
750-751, October 2006.
- Mithun Acharya, Tanu Sharma, Jun Xu, and Tao
Xie. Effective Generation of
Interface Robustness Properties for Static Analysis. In
Proceedings of the 21st IEEE/ACM International Conference on Automated
Software Engineering (ASE 2006),
Short Paper, Tokyo, Japan, pp. 293-296, September 2006. [PDF][BibTeX]
- Mithun Acharya. Automatic Generation of
Robustness and Security Properties from Program Source Code. In Supplemental Proceedings of the IEEE
International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA,
USA, pp. 166-168, June 2006. [PDF]
- Tao Xie and Jian Pei. MAPO: Mining API Usages
from Open Source Repositories. In Proceedings of the 3rd
International Workshop on Mining Software Repositories (MSR 2006), Shanghai,
China, pp. 54-57, May 2006. [PDF][BibTeX][Slides]
TUTORIALS/COURSE MODULES/ORGANIZED EVENTS
- Tao Xie and Ahmed E. Hassan. Mining Software Engineering Data. To be presented at the 31st International Conference on Software Engineering (ICSE 2009), Tutorials, Vancouver, Canada, May 2009. [Tutorial Web][BibTeX]
- Ahmed E. Hassan and Tao Xie. Mining
Software Engineering Data. In Proceedings of the 30th
International Conference on Software
Engineering (ICSE 2008), Companion
Volume, Tutorials, Leipzig, Germany, May 2008. [Tutorial Web][BibTeX]
- Chao Liu, Tao Xie, and Jiawei Han. Mining for Software Reliability. In Proceedings
of the 2007 IEEE International Conference on Data Mining (ICDM 2007),
Omaha, NE, October 2007. [Tutorial Web][BibTeX]
- Tao Xie, Jian Pei, and Ahmed E. Hassan. Mining Software Engineering Data. In
Proceedings of the 29th International Conference on
Software Engineering (ICSE 2007), Companion
Volume, Tutorials, Minneapolis,
MN, pp. 172-173, May 2007. [Tutorial Web][PDF][BibTeX]
- Tao Xie and Jian Pei. Data Mining for Software Engineering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), Tutorial, Philadelphia, Pennsylvania, August 2006.
- Tao Xie, Co-Organizer (with Abraham Bernstein, Harald Gall, and Andreas Zeller), Dagstuhl Seminar on Mining Programs and Processes (Dagstuhl Seminar 07491). [Tutorial Web][Slides][BibTeX]
- Tao Xie. Data Mining III - Text Mining course module, the Master of Science in Analytics (MSA) program, the Institute for Advanced Analytics, North Carolina State University, January-February 2008.
PRESENTATIONS
- Tao Xie. Improving
Software Reliability via Automated Static
and Dynamic Analysis. Invited talk, FDA Office of Science and
Engineering Laboratories, Electrical and Software Engineering Group,
October 2008.
- Tao
Xie. Improving Software Productivity and Quality via Mining Program
Source Code. Invited talk, Department of Computer Science, Drexel
University, Philadelphia, PA, September 2008.
- Tao Xie. Searching
and Mining Open Source Code from the Web. ASE 2008 PC Workshop
presentation. Google Tech Talks. Mountain View, CA, June 2008.
- Tao Xie. Mining Software
Engineering Data. Tutorial presentation, the 30th International
Conference on Software Engineering (ICSE
2008), Leipzig, Germany, May 2008.
- Tao Xie. SpotWeb: Detecting Framework Hotspots via Mining Open Source Repositories on the Web. Conference presentation, the 5th Working Conference on Mining Software Repositories (MSR 2008), Leipzig, Germany, May 2008.
- Tao Xie.
Improving
Software Productivity and Quality via Mining Program Source Code.
Invited talk, Department of Electrical and Computer Engineering,
Clarkson University, Potsdam, NY, April 2008.
- Tao Xie. Improving Software Reliability and Productivity via Mining Program Source Code. Workshop presentation, the NSF Next Generation Software Program Workshop at IPDPS 2008 (NSFNGS 2008), Miami, Florida, April 2008.
- Tao Xie. Recommendation Systems for Code Reuse. Workshop talk, Bellairs Workshop on Software Analysis for Recommendation Systems (SARS 2008), Barbados, February 2008. [Slides]
- Suresh
Thummalapenta. PARSEWeb: A Programmer Assistant for
Reusing Open Source Code on the Web. Conference presentation, the
22nd IEEE/ACM International Conference on Automated Software Engineering
(ASE 2007), Atlanta, Georgia,
November 2007.
- Tao
Xie. Improving Software Productivity and
Quality via Mining Program Source Code. Invited talk, Accenture Labs,
Chicago, IL, October 2007.
- Tao
Xie. Improving Software Productivity and
Quality via Mining Program Source Code. Invited talk, Motorola Labs,
Schaumburg, IL, October 2007.
- Tao
Xie. Improving Automation in Developer Testing: Achievements and
Challenges. Conference talk, International Verify Conference (Verify 2007), Arlington, VA,
October 2007.
- Suresh Thummalapenta. Exploiting Code Search Engines to Improve Programmer Productivity. ACM SIGPLAN SRC presentation, the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, October 2007.
- Tao Xie. Improving Automation
in Developer Testing: Achievements and Challenges. Conference talk,
Triangle Information Systems Quality Association Conference (TISQA 2007), Chapel Hill, NC, September 2007.
- Mithun Acharya. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. Conference presentation, the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, September 2007.
- Tao
Xie. Improving Software Productivity and Quality via Mining Program
Source Code. Invited talk, Lane Department of Computer Science and
Electrical Engineering, West Virginia University, Morgantown, WV,
September 2007.
- Tao
Xie. Improving Programmer Productivity via
Mining Program Source Code. Invited talk, Department of Computer
Science and Engineering, Hong Kong University of Science and
Technology, China, August 2007.
- Tao
Xie. Improving Programmer Productivity via
Mining Program Source Code. Invited talk, Department of Computer
Science and Engineering, The Chinese University of Hong Kong, Hong
Kong, China, August 2007.
- Tao
Xie. Mining Software Engineering Data. Invited talk, Software
Engineering Institute, Peking University, Beijing, China, July 2007.
- Tao
Xie. Improving Programmer Productivity via
Mining Program Source Code. Invited talk, Department of Computer Science, University of Calgary,
Canada, May 2007.
- Mithun
Acharya. Mining Interface
Specifications for Generating Checkable Robustness Properties. Conference presentation,
the 17th IEEE International Conference on
Software Reliability Engineering (ISSRE 2006),
Raleigh, NC, November 2006.
- Mithun Acharya. Automatic Inference of Interface Properties from Program Source Code. Doctoral symposium presentation, the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium, Portland, Oregon, USA, November 2006.
- Mithun Acharya. Automatic Generation and Inference of Interface Properties from Program Source Code. ACM SIGPLAN SRC presentation, the 20th Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2006), ACM SIGPLAN Student Research Competition, Portland, Oregon, USA, October 2006.
- Mithun Acharya. Effective Generation of
Interface Robustness
Properties for Static Analysis. Conference poster presentation, the 21st IEEE/ACM International Conference on Automated
Software Engineering (ASE
2006), Tokyo, Japan, September 2006.
- Mithun Acharya. Automatic Generation of Robustness and Security Properties from Program Source Code. Student forum presentation, the IEEE International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA, USA, June 2006.
- Tao Xie. Data Mining for Software Engineering.
Visit talk, Fudan University, China, May 2006.
- Tao Xie. MAPO: Mining API Usages from Open Source
Repositories. Workshop presentation, the
3rd International Workshop on Mining Software Repositories (MSR
2006), Shanghai, China,
May 2006. [Slides]
SOFTWARE & EVALUATION SUBJECTS/RESULTS
- C/C++ Code Mining
- Java Code Mining
- C# Code Mining
- Tool development in progress
LINKS
Bibliography on Mining Software Engineering Data
SPONSORS
Army Research Office Award W911NF-08-1-0443 (09/08/2008-08/30/2011)
National Science Foundation Award CNS-0720641, Computer Systems Research (CSR) Program (08/01/2007-07/31/2008)
Army Research Office Award W911NF-07-1-0431, Short Term Innovative Research (STIR) Program (06/18/2007-03/17/2008)
College of Engineering and Department of Computer Science, North Carolina State University Undergraduate Research Award (01/24/2008-05/16/2008)