Welcome! This page is a companion to the paper "Understanding and Classifying Code Harmfulness". Here we present the results and provide the datasets used.



Results RQ2 (Survey Results):

6 - Could you please justify your answer to question 5?
Code smells often indicate or lead to bigger problems. Those bigger problems can make a code base fragile, difficult to maintain, and prone to errors.
It makes it difficult to read and maintain your code base.
It hinders the readability of the code, which can lead to the introduction of bugs.
They tend to indicate over-complication or a lack of SOLID principles.
Code smells impact code maintainability, especially when it is not clear what the code should do. This leads to misinterpretation of what the code should do, and if there is no automated testing to guide the developer, we often end up rewriting the whole functionality, because it is simpler to develop from scratch.
It can have a very bad impact in the long run of software development, causing degradation of code quality.
Code smells are a way to detect bad decisions that can directly impact the runtime and the business around the software.
This can hardly be evaluated out of context. Some harmfulness is caused by the broken window effect anyway.
Smells are indications something MIGHT be wrong and it's LIKELY that you are violating some aspect of good OO design. But then again maybe not.
The presence of code smells usually impacts maintainability; when the time comes to modify the existing code, code smells make changes more likely to break it. I don't mark this as very harmful because well-tested code with some code smells is preferable to untested software without code smells.
Code smells are one form of technical debt. Missing unit tests are another harmful one.
It will have direct consequences on quality and maintenance if the smell is not fixed.
On a day-to-day basis we work with scalable and maintainable software, so any code smell probably means a huge refactoring in the future. There are worse code smells, like the Blob, but I consider any of them a problem.
They always represent a relevant risk to maintenance and, most of them, to future implementations, demanding a lot of time in refactoring work.
If this problem isn't dealt with, it'll spread, and the cost and effort to fix it will increase.
At first, this wouldn't be a big problem, but in the future it will cause problems in maintaining the application.
It usually means you are working with people who don't care about the quality of the work or are at an unconscious-incompetence level, which means they think they know what they are doing but they are clueless.
They increase the complexity of the codebase, and the cognitive load on engineers. This leads to decreased productivity and increased bugs.
Code smells make software readability and comprehensibility worse. That itself already degrades software quality. Moreover, code smells can make it harder to find bugs in the software, since you can only find bugs in code that you can read and understand.
Code smells tend to make code understanding difficult and software maintenance very difficult.
Your code is going to be confusing to read.
Long methods require more time for maintenance and, many times, require more experienced devs.
Code smells usually indicate that something is wrong. When you have, for example, long methods, code readability decreases and therefore system maintenance is adversely affected.
Complex and long methods are difficult to manage. I like to split complexity in smaller chunks.
I do believe that code smells are harmful, but not usually harmful. There are specific cases that make them harmful, for example, when specific types of smells (or smells from similar categories) co-occur. For instance, if a class is a God Class and also exhibits Intensive Coupling (or Dispersed Coupling) and more than one Feature Envy, the smells are harmful, since they indicate that the class has a bigger structural problem. On the other hand, there are cases where smells are not harmful, for example, when a class is a Lazy Class.
I believe that if you have unit tests around it, it'd be Harmful or even Somewhat Harmful. Code smells are usually easy to refactor when you have unit tests.
If I need to improve it, I can introduce a bug.
A piece of bad code tends to degrade fast after a long time without proper refactoring/maintenance, leading to costly code evolution in the future.
I don't think there is a mandatory relation between code smells and harmfulness. But, naturally, in some cases code smells can contain characteristics of harmful code.
Some code can be ugly, but it works.
Not all code smells are so harmful.
It is difficult to introduce changes in the business logic.
The presence of code smell could increase code maintenance work.
Code smell instances eventually hinder some major development tasks, especially when it comes to maintaining and evolving systems. I've had a hard time reading and changing some large classes, complex methods, and intricate hierarchies as well.
8 - Could you please describe Code Smells Detection tools that you have used?
FindBugs, PMD, SonarQube
Sonar, tslint
PMD and JDeodorant
Tools for detecting code smells are important because humans are subject to failure; practices that are clearly seen as code smells can always be detected by tools. Another point is that when you work on a project on your own, reviewing your own code for smells has a much smaller impact than someone else's review, and putting a tool in place to help with code review helps in those cases.
At some point in development I consider checking the whole project to find possible code smells. The problem is still the lack of good software tools to do so.
Sonarqube, findbugs
Reek
Sonar, Findbugs, jacoco
Sonar, pmd, owasp, findbugs, vpav
Rubocop, Reek
PMD, CheckStyle
Idea, SonarQube, Js/TsLint are the ones I feel most comfortable with.
Sonarqube and Sonarlint
Sometimes the changed code in a pull request doesn't give the full picture, e.g., regarding duplication and the like.
Analyze Inspect Code - Android Studio
Most of them were used as part of a CI solution, like Sonar, and work by pointing out code smells during the development process and suggesting improvements to solve them.
Rubocop, Pronto and Merge Requests with my team
Static code analyzers
I have read research on automatic code smell detection, but I don't recall any tool names.
During development you don't always get a macro view of the program, so it is likely that some code smells will not be identified.
PMD
Throughout my career I've used some tools to help me with code smells, like Checkstyle; more recently, we use Sonar.
After long years of experience I directly avoid all possible smells. Whatever I don't catch the first time will be handled in a code refactoring.
I use one that has been developed in the research group that I work with. The name is Organic 2.0
For Java I've used PMD and since I use IntelliJ IDEA, it has embedded static code analysis in it
Linters in general
lints, code formatters, code analyzers.
In the past I have used some tools such as PMD and JDeodorant. In both cases, the tools were used in non-graphical ways, just for research purposes.
Sonarqube
Sonar and ESLint
I've published a literature review on these tools at EASE 2016 ("A review-based comparative study of bad smell detection tools"). Part of my work was using tools like PMD, inFusion, JDeodorant, etc. I've also used some of these tools to detect smells for study purposes. But, to be honest, mostly I've written my own detection scripts to run on spreadsheets with code metrics data... LOL
10 - Could you please justify your answer to question 9?
I refactor to improve reusability and reduce fragile or error-prone code. How the code got to that state is irrelevant.
Code smells should be found by static analysis tools or reviewer.
Buggy code is already problematic to the software, so it should be prioritized.
You may have code that has code smells, but if it has good automated tests, refactoring becomes a simpler and less risky task. And there are different code smells; some impact the readability of implementation details more. If the public interface of the code is well written but the internal code is poorly made, refactoring becomes even simpler. The bigger problem is code that is badly done and does not work. We should prioritize covering it with tests, then adjusting it as guided by the tests, discovering the current code problems, and then refactoring it.
I always consider improving the code quality when fixing a bug.
We usually run sonarqube reports and try to fix all the warnings that are found.
Not all problems are solved in a refactoring session, but it is your mission to deliver the best in that situation.
Short term business value. Smells are best addressed by coaching and boy scout approaches.
Since a smell only indicates there might be a problem and harmful code definitely does have a problem, you should look at that first.
I would fix all of it but time is limited so priorities have to be set.
The question is really tricky because you will always refactor the worst part first; otherwise, the code will not work. I see what you want to achieve with that question, but the way you are asking it is really biased.
Code blocks that already have a bug and a code smell are much more problematic than a code smell alone.
Always leave the code better than before.
When touching working code, the risk of ruining something is greater than when fixing already buggy code.
I usually look for problems that already exist or might become one in the future
Working on code smells means working on the prevention of most of the bugs that could appear, so worrying only about code that already has bugs attached to it is never enough.
All information is important.
Because if you are refactoring, you are trying to improve the code. It does not matter if it's horrible or somewhat acceptable.
Any improvements are worth the time.
I usually don't refactor buggy code when a bug is identified; I only fix it asap and leave possible refactoring for later. I usually apply a refactoring when the code is badly structured. If a bug is found in the middle of the refactoring, it will also be fixed during the refactoring process, asap.
Code that allows bugs is problematic because it is harmful to execution.
The refactoring process can introduce new bugs, so I avoid refactoring code that is working.
When we refactor source code, we focus on long and complex code, to facilitate future maintenance or upgrades.
When my team works with software maintenance, we usually get a problem and start reviewing it. Usually this problem represents a bug; therefore, we have the opportunity to improve some parts of the code.
In real life we have to stay within the budget: so when, for whatever reason, I put my hands on a piece of code, I try to fix/improve it.
I usually refactor my code when it is hard to understand/maintain, however, if I have to prioritize refactorings, I refactor elements that contain bugs and have smells. Since I'm already refactoring the code to get rid of the bug, I take the opportunity to remove the smell as well. The goal is to kill two birds with one stone.
I usually give priority to duplication, shotgun surgery, large class, naming, long methods, too many params.
In general, I tend to fix the bug first, as fast as possible, which means I do not think about code quality at first.
Economics. Buggy code has harsh effects on end users, who tend to abandon your product after working for some time with a bug.
When refactoring, it is ideal to deal with all modules of the code, but due to limitations such as time, it is most important to look at fragile parts, such as harmful code.
The second criterion would be harmfulness.
There is no priority.
Well-written code avoids Code smells.
My academic background made me see refactorings as a means to "clean the house" in terms of bad code structures rather than a means to actually fix bugs. Eventually, refactorings can help fix a bug. Nevertheless, as a practitioner, I've tried my best to avoid refactoring buggy code elements. These elements should be changed only if necessary; otherwise, the damage can get worse (other bugs may emerge).
11 - Are you aware of some issues in which code smells are harmful to the software quality? If yes, could you please describe it below or put the issue link?
Code smells often indicate or lead to bigger problems. Those bigger problems can make a code base fragile, difficult to maintain, and prone to errors.
Yes. Security or maintenance issues.
Most code smells are dangerous in their own ways. They usually make the codebase hard to maintain and modify, or they introduce subtle bugs in certain scenarios.
Sometimes it can lead to early complex architecture.
Bad decisions in choosing patterns; the false senior developer (probably an older person on your team whom you trusted for a period, until you see the disaster they made). In terms of code: inheritance, bad encapsulation, repeating yourself.
Misunderstandings waiting to happen.
Maintainability, readability.
As I said before, maintainability. You can code with code smells if it is a single-person codebase, because you know what you have done. But when more developers are involved in development, code smells make them doubt the intentions or purpose behind that smell (is it there because otherwise the code doesn't work?). Those kinds of questions make developers lose time and commit errors.
Coupled code, fuzzy parameters names.
Maintenance.
God classes are the first example that comes to my mind. They lack separation of concerns and do more than one job; when maintaining that code you will affect many points... there are many quality risks. It is even hard to focus quality assurance on just one part of the application, forcing you to do a full regression.
Yes. It makes code harder to read and understand and this contributes to more bugs and more time debugging.
Code smells tend to decrease code readability.
Yes. Sometimes, when you have a method or function that is too long, this is normally related to centralizing many tasks in only one place, and that method/function ends up responsible for activities that actually shouldn't be its responsibility. So, this decreases software quality in terms of future maintenance or possible software evolution.
Efficiency, security and maintainability.
Fault proneness, software degradation, error proneness, maintenance difficulty.
Sure, and there are many. As stated in Q.6, I've struggled to read and change smelly code many times before. Too-messy code is irritating, right? Once I had to prepare a project for migration across programming languages. It was a hell of a job to read some pretty large classes with dozens of lengthy and complex methods. And what can I say about the uncountable dependencies among classes that made it hard to reorganize the code every now and then?
12 - Please, let us know if you have any additional comments about Harmful Code.
If you have SonarQube around and code reviews in the delivery pipeline, harmful code should be caught there in the first place.
Most of the projects that I worked on got new bugs after refactorings. Maybe the concept of refactoring should be upgraded to something like: code smells + bug = refactoring.
The more harmful, the more costly.
It's very expensive to fix a bug that is in production; devs should analyze their solution many times before submitting their code. Testing is a very important phase of development, and the scenarios need to be real and complex ones, covering edge cases and large data inputs.
Bad-quality code is an effect of programmers with poor knowledge of programming: not only programming languages, but also logic, abstractions, and modeling.


Results RQ3 (How effective are Machine Learning techniques to detect harmful code?):

Algorithm Smell Harmful
Switch Statements
KNeighborsClassifier 0.29 0.80
RandomForestClassifier 0.25 1.00
DecisionTreeClassifier 0.50 1.00
AdaBoostClassifier 0.33 1.00
GradientBoostingClassifier 0.25 1.00
SVM 0.50 0.89
GaussianNB 0.50 0.86
MIN 0.25 0.80
MAX 0.50 1.00
Magic Number
KNeighborsClassifier 0.50 0.73
RandomForestClassifier 0.80 1.00
DecisionTreeClassifier 0.67 0.96
AdaBoostClassifier 0.56 0.85
GradientBoostingClassifier 0.72 0.93
SVM 0.72 0.80
GaussianNB 0.25 0.68
MIN 0.25 0.68
MAX 0.80 1.00
Long Identifier
KNeighborsClassifier 0.67 0.80
RandomForestClassifier 0.75 1.00
DecisionTreeClassifier 0.75 1.00
GradientBoostingClassifier 0.75 1.00
SVM 0.67 0.80
GaussianNB 0.33 0.86
MIN 0.33 0.80
MAX 0.75 1.00
Insufficient Modularization
KNeighborsClassifier 0.64 0.90
RandomForestClassifier 0.60 1.00
DecisionTreeClassifier 0.50 0.97
AdaBoostClassifier 0.60 0.97
GradientBoostingClassifier 0.57 1.00
SVM 0.69 0.93
GaussianNB 0.67 0.62
MIN 0.50 0.62
MAX 0.69 1.00
Long Parameter List
KNeighborsClassifier 0.92 1.00
RandomForestClassifier 0.93 1.00
DecisionTreeClassifier 0.78 1.00
AdaBoostClassifier 0.93 1.00
GradientBoostingClassifier 1.00 1.00
SVM 0.92 1.00
GaussianNB 0.92 1.00
MIN 0.78 1.00
MAX 1.00 1.00
Unutilized Abstraction
KNeighborsClassifier 0.70 0.80
RandomForestClassifier 0.81 1.00
DecisionTreeClassifier 0.72 0.96
GradientBoostingClassifier 0.86 1.00
SVM 0.63 0.88
GaussianNB 0.55 0.70
MIN 0.55 0.70
MAX 0.86 1.00
Cyclic-Dependent Modularization
KNeighborsClassifier 0.50 0.74
RandomForestClassifier 0.67 0.97
DecisionTreeClassifier 0.67 0.97
AdaBoostClassifier 0.61 0.88
GradientBoostingClassifier 0.71 0.97
SVM 0.45 0.88
GaussianNB 0.52 0.58
MIN 0.45 0.58
MAX 0.71 0.97
Deficient Encapsulation
KNeighborsClassifier 0.73 0.90
RandomForestClassifier 0.65 0.98
DecisionTreeClassifier 0.61 1.00
AdaBoostClassifier 0.67 0.83
GradientBoostingClassifier 0.63 0.98
SVM 0.70 0.96
GaussianNB 0.56 0.55
MIN 0.56 0.55
MAX 0.73 1.00
Long Method
KNeighborsClassifier 0.40 0.75
RandomForestClassifier 0.75 1.00
DecisionTreeClassifier 0.29 0.86
AdaBoostClassifier 0.86 1.00
GradientBoostingClassifier 0.86 1.00
SVM 0.50 0.75
GaussianNB 0.86 1.00
MIN 0.29 0.75
MAX 0.86 1.00
Long Statement
KNeighborsClassifier 0.79 0.96
RandomForestClassifier 0.86 1.00
DecisionTreeClassifier 0.81 1.00
AdaBoostClassifier 0.64 1.00
GradientBoostingClassifier 0.86 1.00
SVM 0.75 1.00
GaussianNB 0.13 0.75
MIN 0.13 0.75
MAX 0.86 1.00
Empty Catch Clause
KNeighborsClassifier 0.86 0.86
RandomForestClassifier 0.50 1.00
DecisionTreeClassifier 0.00 0.86
AdaBoostClassifier 0.00 0.86
GradientBoostingClassifier 0.44 1.00
SVM 1.00 1.00
GaussianNB 0.00 1.00
MIN 0.00 0.86
MAX 1.00 1.00
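
For readers who want to reproduce this kind of evaluation, here is a minimal sketch of scoring the classifiers above with scikit-learn. It illustrates the general recipe, not the paper's exact pipeline: the feature matrix X (code and developer metrics), the binary labels y (harmful or not), and the cross-validation setup are assumptions.

# Minimal sketch (not the paper's exact pipeline). X and y are
# placeholders for a metrics matrix and binary "harmful" labels.
from sklearn.ensemble import (AdaBoostClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=42),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "GaussianNB": GaussianNB(),
}

def evaluate(X, y, cv=10):
    # Mean cross-validated F1 score of the positive ("harmful") class.
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
        print(f"{name:30s} {scores.mean():.2f}")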


Results RQ4 (Which metrics are most influential on detecting harmful code?):

Smell Feature Harmful
Cyclic-Dependent Modularization returns 0.0699
variables 0.0408
unique_words_qty 0.0373
line 0.0321
numbers_qty 0.0213
string_literals_qty 0.0191
rfc 0.0170
cbo 0.0161
annonymous_classes_qty 0.0097
parameters 0.0064
Deficient Encapsulation wmc 0.0776
parameters 0.0709
unique_words_qty 0.0404
returns 0.0171
variables 0.0129
rfc 0.0119
line 0.0113
number_commits 0.0016
cbo 0.0014
total_methods 0.0011
Empty Catch Clause wmc 0.1467
rfc 0.0807
unique_words_qty 0.0371
cbo 0.0368
parameters 0.0281
line 0.0267
variables 0.0108
returns 0.0097
numbers_qty 0.0012
annonymous_classes_qty 0.0008
Insufficient Modularization wmc 0.0446
unique_words_qty 0.0329
rfc 0.0140
string_literals_qty 0.0072
variables 0.0051
total_methods 0.0019
total_fields 0.0012
sub_classes_qty 0.0007
static_methods 0.0002
Long Identifier math_operations_qty 0.1035
annonymous_classes_qty 0.0516
cbo 0.0494
rfc 0.0251
loc 0.0227
numbers_qty 0.0118
unique_words_qty 0.0074
line 0.0058
string_literals_qty 0.0028
variables 0.0009
Long Method wmc 0.0145
unique_words_qty 0.0261
string_literals_qty 0.0023
rfc 0.0187
returns 0.0257
numbers_qty 0.0119
number_days 0.0001
number_commits 0.0005
median_files 0.0002
loc 0.0938
Long Parameter List numbers_qty 0.1211
parameters 0.1176
loc 0.0671
wmc 0.0241
cbo 0.0146
parenthesized_exps_qty 0.0138
line 0.0121
rfc 0.0073
string_literals_qty 0.0039
max_nested_blocks 0.0002
Long Statement parameters 0.1951
line 0.0321
rfc 0.0180
variables 0.0151
wmc 0.0136
unique_words_qty 0.0104
math_operations_qty 0.0074
returns 0.0014
max_nested_blocks 0.0005
numbers_qty 0.0005
Magic Number cbo 0.0668
returns 0.0430
line 0.0396
annonymous_classes_qty 0.0316
rfc 0.0254
parameters 0.0210
numbers_qty 0.0188
wmc 0.0156
variables 0.0148
unique_words_qty 0.0118
Switch Statements math_operations_qty 0.2281
rfc 0.0000
wmc 0.0000
numbers_qty 0.0000
line 0.0000
unique_words_qty 0.0000
number_commits 0.0000
median_files 0.0000
cbo 0.0000
parameters 0.0000
Unutilized Abstraction line 0.0723
unique_words_qty 0.0260
rfc 0.0243
cbo 0.0184
string_literals_qty 0.0179
math_operations_qty 0.0155
parameters 0.0088
variables 0.0060
wmc 0.0060
max_nested_blocks 0.0057
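
The rankings above are feature-importance scores. As a hedged sketch of how such a ranking can be produced (assuming a tree-based model such as a Random Forest; X, y, and feature_names are placeholders), consider:

# Sketch: rank features by importance with a tree-based model.
from sklearn.ensemble import RandomForestClassifier

def rank_features(X, y, feature_names, top=10):
    model = RandomForestClassifier(random_state=42).fit(X, y)
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    for name, importance in ranked[:top]:
        print(f"{name:25s} {importance:.4f}")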


Software Metrics (CK):

CBO (Coupling Between Objects): Counts the number of dependencies a class has. The tool checks for any type used in the entire class (field declarations, method return types, variable declarations, etc.). It ignores dependencies on Java itself (e.g., java.lang.String).
DIT (Depth of Inheritance Tree): Counts the number of "fathers" a class has. All classes have a DIT of at least 1 (everyone inherits from java.lang.Object). For the count to go deeper, classes must exist in the project (i.e., if a class depends upon X, which resides in a jar/dependency file, and X depends upon other classes, DIT is counted as 2).
Number of fields: Counts the number of fields, with specific counts for total, static, public, private, protected, default, final, and synchronized fields.
Number of methods: Counts the number of methods, with specific counts for total, static, public, abstract, private, protected, default, final, and synchronized methods.
NOSI (Number of Static Invocations): Counts the number of invocations of static methods. It can only count the ones that can be resolved by the JDT.
RFC (Response For a Class): Counts the number of unique method invocations in a class. As invocations are resolved via static analysis, this implementation fails when a method has overloads with the same number of parameters but different types.
WMC (Weighted Method Class), or McCabe's complexity: Counts the number of branch instructions in a class.
LOC (Lines Of Code): Counts the lines of code, ignoring empty lines.
LCOM (Lack of Cohesion of Methods): Calculates the LCOM metric. This is the very first version of the metric, which is not reliable. LCOM-HS can be better (hopefully, you will send us a pull request).
Quantity of returns: The number of return instructions.
Quantity of loops: The number of loops (i.e., for, while, do while, enhanced for).
Quantity of comparisons: The number of comparisons (i.e., ==).
Quantity of try/catches: The number of try/catches.
Quantity of parenthesized expressions: The number of expressions inside parentheses.
String literals: The number of string literals (e.g., "John Doe"). Repeated strings count as many times as they appear.
Quantity of numbers: The number of number literals (i.e., int, long, double, float).
Quantity of math operations: The number of math operations (times, divide, remainder, plus, minus, left shift, right shift).
Quantity of variables: The number of declared variables.
Max nested blocks: The highest number of blocks nested together.
Quantity of anonymous classes, subclasses, and lambda expressions: The quantity of anonymous classes, subclasses, and lambda expressions.
Number of unique words: The number of unique words in the source code. See the WordCounter class for details on the implementation.
Usage of each variable: How much each variable was used inside each method.
Usage of each field: How much each field was used inside each method.

These metrics were collected using the CK tool: https://github.com/mauricioaniche/ck
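
To explore CK's output yourself, the sketch below loads the per-class CSV that CK writes. It assumes CK has already been run on a project (see the repository README for the command line) and that column names such as class, loc, wmc, rfc, and cbo match your CK version.

# Sketch: inspect CK's per-class output with pandas.
import pandas as pd

classes = pd.read_csv("class.csv")  # produced by a prior CK run
# e.g., the ten largest classes by lines of code:
print(classes.sort_values("loc", ascending=False)
             .head(10)[["class", "loc", "wmc", "rfc", "cbo"]])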



Developers Metrics:

Name | Type | Tool Used | Description
Number of Commits (NC) | Developers' Experience | PyDriller | The number of commits authored by a developer.
Number of Active Days in Project (NADP) | Developers' Experience | PyDriller | How many days a developer has been active, i.e., committing.
Number of Days in Project (NDP) | Developers' Experience | PyDriller | The number of days a developer has been associated with a project, regardless of whether they are contributing or not.
Number of Issues Activities (NIA) | Developers' Experience | GitHub API | The number of issues opened or closed by a developer.
Number of Pull Requests Activities (NPRA) | Developers' Experience | GitHub API | The number of pull requests opened or closed by a developer.
Number of Tests Included (TI) | Technical Contribution Norms | PyDriller | The quantity of commits that contain tests. To extract it, we adopted the procedure defined by [18]: first, we retrieve all the files modified in a commit authored by a developer; then, we check how many files contain the word "test" in their pathname.
Median of Modified Files (MMF) | Technical Contribution Norms | PyDriller | The median number of modified files among all the commits authored by a developer.
Median of Lines Changed (MLC) | Technical Contribution Norms | PyDriller | The median number of changed lines among all the commits authored by a developer. A changed line can be an addition or a deletion in a commit.
Number of Followers (NF) | General Community Status | WebCrawler | The number of followers a developer has on GitHub.
Number of Public Repositories (NPR) | General Community Status | WebCrawler | The number of public repositories owned by a developer on GitHub.
Number of Public Gists (NPG) | General Community Status | WebCrawler | The number of public Gists owned by a developer. A Gist is a tool designed to share single files, parts of source code, or full applications. Such a tool may be very important in stimulating the reuse of software artifacts.

PyDriller: https://github.com/ishepard/pydriller
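
As a rough illustration (assuming PyDriller 2.x; the repository path is a placeholder), the commit-based metrics above, such as NC, NADP, and TI, can be computed along these lines:

# Sketch: compute NC, NADP, and TI per developer with PyDriller 2.x.
from collections import defaultdict
from pydriller import Repository

nc = defaultdict(int)           # Number of Commits (NC)
active_days = defaultdict(set)  # distinct days with commits (NADP)
ti = defaultdict(int)           # commits touching a "test" file (TI)

for commit in Repository("path/to/repo").traverse_commits():
    author = commit.author.name
    nc[author] += 1
    active_days[author].add(commit.author_date.date())
    if any("test" in (m.new_path or "") for m in commit.modified_files):
        ti[author] += 1

for author in nc:
    print(author, nc[author], len(active_days[author]), ti[author])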

GitHub API: https://developer.github.com/v3/
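
The General Community Status metrics (NF, NPR, NPG) map directly onto fields of the GitHub users endpoint. A minimal sketch follows (unauthenticated requests are heavily rate-limited; a real crawler should send an auth token):

# Sketch: fetch NF, NPR, and NPG for one GitHub user via the REST API.
import requests

def community_status(login):
    user = requests.get(f"https://api.github.com/users/{login}").json()
    return {"followers": user["followers"],        # NF
            "public_repos": user["public_repos"],  # NPR
            "public_gists": user["public_gists"]}  # NPG

print(community_status("octocat"))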

Study Design:

[Figure: study design diagram]

Harmful Code Representation:

[Figure: harmful code representation]

Study Design Effectiveness:

[Figure: study design for the effectiveness evaluation]

Files:

The dataset is available here.