Photo by Aswin Anand on Unsplash: https://unsplash.com/photos/0Hmh461Goog
A friend recently asked whether I knew of any published statistics or papers on Defect Density (DD) metrics - something that might provide guidance on an industry average for the expected number of defects per 1K Lines of Code (LOC).
I think that is a fairly hard number to pin down - and it may very well vary greatly depending on a number of factors:
- Business / Industry (e.g. NASA, Emergency Medicine Support Systems, Military-grade Avionics and Flight Control Systems, etc. vs. Social Media applications)
- Level of expertise of development team members (not necessarily years of experience)
- Programming Language (although this is a weaker indicator/correlation factor)
- Size of the code base, in LOC
- Size of the team
- Number of Classes/Modules
- Complexity of the application/problem domain
- Level of Regulatory Compliance for the business/problem domain
There are also other considerations in obtaining a relatively meaningful/accurate Defect Density average:
- Accounting for averages being skewed by the level of clustering in Defect Severity Levels - for a given code base
- Accounting for averages being skewed by the level of Code Duplication - for a given code base
There is excellent Open Source tooling/reporting available for determining and monitoring both of those. It is easily integrated into Continuous Integration build pipelines, and automated alerts can be configured from it when defined tolerance/threshold levels are exceeded.
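To make that concrete, here is a minimal sketch, in Python, of the kind of quality-gate step that could run in a CI pipeline. Everything in it is hypothetical: the defect counts, severity weights, duplication percentage, and threshold are made-up placeholders, and in a real pipeline those inputs would be parsed from your issue tracker export and static-analysis reports rather than hard-coded.

#!/usr/bin/env python3
# Hypothetical CI quality gate for Defect Density (DD) -- a sketch, not a
# drop-in tool. Inputs would normally come from tracker/static-analysis reports.

# Defect counts per severity (made-up numbers).
defects_by_severity = {"blocker": 2, "critical": 5, "major": 18, "minor": 40}

# Optional weights so a cluster of high-severity defects skews the figure
# more than a pile of cosmetic ones (the "severity clustering" concern above).
severity_weights = {"blocker": 10.0, "critical": 5.0, "major": 2.0, "minor": 1.0}

total_loc = 120_000      # delivered lines of code (placeholder)
duplicated_pct = 12.5    # % of duplicated lines reported by the tooling (placeholder)

# Discount duplicated lines so copy/paste does not dilute the density
# (the "code duplication" concern above).
effective_kloc = total_loc * (1.0 - duplicated_pct / 100.0) / 1000.0

raw_dd = sum(defects_by_severity.values()) / effective_kloc
weighted_dd = sum(
    count * severity_weights.get(severity, 1.0)
    for severity, count in defects_by_severity.items()
) / effective_kloc

print(f"raw defect density:      {raw_dd:.2f} defects/KLOC")
print(f"weighted defect density: {weighted_dd:.2f} points/KLOC")

# Fail the build (non-zero exit) when the defined tolerance is exceeded,
# so the CI server can raise an automated alert.
RAW_DD_THRESHOLD = 5.0   # illustrative threshold only
if raw_dd > RAW_DD_THRESHOLD:
    raise SystemExit(f"Defect density {raw_dd:.2f} exceeds threshold {RAW_DD_THRESHOLD}")

The same shape of check can gate on the duplication percentage, or on any other metric the tooling already reports.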
Some Suggested Background Reading, re: Defect Density:
- v-SVR Polynomial Kernel for Predicting the Defect Density in New Software Projects, 6 pages, accepted at the Special Session: ML for Predictive Models in Engineering Applications, 17th IEEE International Conference on Machine Learning and Applications (ICMLA 2018)
- "An important product measure to determine the effectiveness of software processes is the defect density (DD). In this study, we propose the application of support vector regression (SVR) to predict the DD of new software projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2018 data set. Two types of SVR (e-SVR and v-SVR) were applied to train and test these projects. Each SVR used four types of kernels. The prediction accuracy of each SVR was compared to that of a statistical regression (i.e., a simple linear regression, SLR). Statistical significance test showed that v-SVR with polynomial kernel was better than that of SLR when new software projects were developed on mainframes and coded in programming languages of third generation"
- "Verma and Kumar[14] use simple and multiple linear regression models topredict the DD of 62 open source software projects. They conclude that there isstatistically significant level of acceptance for DD prediction using few repository metrics individually and jointly"
- "Yadav and Yadav [15] apply a fuzzy logic model for predicting DD at each phase of development life cycle of 20 software projects from the top most reliability relevant metrics of each phase. They conclude that the predicted DD are found very near to the actual defects detected during testing."
- "Mandhan et al. [16] predict DD by using simple and multiple regression modelsgenerated from seven different software static metrics(i.e., coupling, depth, cohesion, response, weighted methods, comments,and lines of code). They conclude that there is a significant level of acceptance for DD prediction with these static metrics individually and jointly."
- "Rahmani and Khazanchi [11]applysimple and multiple regression models to predict DD of 44 open source software projects. They conclude that there isa statistically significant relationship between DD and number of developers and software size jointly"
- "Knab et al. [18]use a decision tree model to predict DD of seven releases of an open source web browser project.They conclude that (1) it is feasible to predict DD with acceptable accuracies with metrics from the same release, (2) to use lines of code has little predictive power with regard to DD, (3) size metrics such as number of functions are of little value for predicting DD, (4) it is feasible predict DD with satisfactory accuracy by using evolution data such as the number of modification reports, and that (5) change couplings are of little value for the prediction of DD"
- "The new software projects used in our study were obtained from the public ISBSG data set Release 2018. This release contains 8,261 projects developed between the years 1989 and 2016. The data of these projects were submitted to the ISBSG from 26 different countries [31]"
- "Regarding limitations of our study, although the last version of the ISBSG release 2018 consists of 2,557 new software projects of the total (8,261 projects), after we followed the criteria suggested by the ISBSG for selecting the data sets for new software projects, we could only use a data set of 21 projects to train and test the models."
- "DD is defined as the number of defects by 1000 functional size units of delivered software in the first month of use of the software. It is expressed as defects by 1000 function points"
- Quality of Open Source Systems from Product Metrics Perspective, arXiv:1511.03194 [cs.SE], (Submitted on 10 Nov 2015)
- "Several previous researchers reported their answers to the question, “What is the typical defect density of a project?” Akiyama [15] reported that for each thousand lines of code (KLOC), there were 23 defects. McConnell [16] reported 1 to 25 defects, and Chulani [17] reported 12 defects."
- "Phipps [21] compared C++ and Java programs and found that C++ programs had two to three times as many defects per line of code as Java programs had."
- A Study on Defect Density of Open Source Software, 9th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2010), 18-20 August 2010, Yamagata, Japan
- "...number of developers and software project size together present greater promise in explaining defect density of OSS projects"
- Prediction of Defect Density for Open Source Software Using Repository Metrics, Journal of Web Engineering, Vol. 16, No. 3&4 (2017), pp. 293-310 (a minimal regression sketch appears after this reading list)
- "In this work, a relationship of defect density with different repository metrics of open source software has been established with the significance level. Five repository metrics namely Size of project, Number of defects, Number of developers, Number of downloads, and the Number of commits have been identified for predicting the defect density of open source project. This relationship can be used to predict the defect density of open source software. An analysis has been performed on 62 open source software available at sourceforge.net. Simple and multiple linear regression statistical methods have been used for analysis. The result reveals a statistically significant level of acceptance for prediction of defect density by some repository metrics individually and jointly"
- "As part of a Department of Homeland Security (DHS)federally-funded analysis, Coverity established a new baseline for security and quality in open source software based on sophisticated scans of 17.5 million lines of source code using the latest research from Stanford University’s Computer Science department. The LAMP stack — popular open source packages Linux, Apache,MySQL, and Perl/PHP/Python — showed significantly better software security and quality above the base line with 0.290 defects per thousand lines of code compared to an average of 0.434 for 32 open source software projects analyzed"
- https://scan.coverity.com/projects/
- Review projects that others have submitted - that have been scanned by Coverity - and note their DD values...
- You can also register a Github project for a scan...
(a) Industry Average: "about 15 - 50 errors per 1000 lines of delivered code."
(b) Microsoft Applications: "about 10 - 20 defects per 1000 lines of code during in-house testing, and 0.5 defect per KLOC (KLOC = 1,000 lines of code) in released product (Moore 1992)."
(c) "Harlan Mills pioneered 'cleanroom development', a technique that has been able to achieve rates as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defect per 1000 lines of code in released product (Cobb and Mills 1990). A few projects - for example, the space-shuttle software - have achieved a level of 0 defects in 500,000 lines of code using a system of formal development methods, peer reviews, and statistical testing."
- "This observation is very old, and comes from a very venerable source, namely Fred Brooks in his book "The Mythical Man Month". He was a top manager at IBM, and managed many programming projects including the millions-of-lines operating system OS/360. In fact he reported that the number of bugs in a program is not proportional to the length of code, but quadratic! According to his research, the number of bugs was proportional to the length of the program to the power 1.5. In other words, a program that is ten times longer has 30 times more bugs. And he reported that this held over all programming languages, and levels of programming languages."
- The Personal Software Process: Experiences from Denmark, Proceedings of the 28th Euromicro Conference (2002)
- "The focus of the research and practice in software process improvement (SPI) is shifting from traditional large-scale assessment based improvement initiatives to smaller sized, tailored initiatives where the emphasis is on the development personnel and their personal abilities. Personal software process (PSP/sup SM/) is a method designed for improving the personal capabilities of the individual software engineer. This paper contributes to the body of knowledge within this area by reporting experiences from Denmark. The findings indicate an improvement in effort estimation skills and an increase in the resulting product quality in terms of reduced total defect density. The data shows that even with a relatively small effort (i.e., 10%) used in defect prevention activities (i.e., design and code reviews) almost one third of all defects could be removed and, consequently, the time required for the testing was reduced by 50%. On the basis of this data, the use of the PSP method in the software industry is discussed"
- "Software failure is becoming a serious issue. Ariane 5 provided a recent spectacular example of how a simple mistake, entirely avoidable, was allowed to sneak through the software verification stage and cause an immensely expensive failure. However, it is not just the aerospace industry which suffers such traumas. Here, the author discusses some common misconceptions."
2019-11-16 Saturday Addendum:
Today, I learned about the Stella Report - and recommend it as additional reading:
https://snafucatchers.github.io/
2020-09-15 Tuesday Addendum:
A tip of the hat to Pete Jarvis for his LinkedIn post pointing to this paper:
How Do Fixes Become Bugs?
A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and Open Source Operating Systems