commit fb19605146 (parent 10c92e128a): "submit version"
@@ -75,7 +75,7 @@ Several academic studies focus on ITS
 to free the managers from some cumbersome and repetitive work,
 %improve the efficiency of project maintenance,
 %because of the key role that ITS plays in the development of project,
-such as automatically classifying issue reports to bug-prone and nonbug-prone~\cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining},
+such as automatically classifying issue reports to bug-prone and nonbug-prone~\cite{antoniol2008bug,herzig2013s,zhou2014combining},
 bug assignment~\cite{anvik2006should,baysal2009bug},
 duplicate issue detection~\cite{wang2008approach,sun2010discriminative},
 fixing time prediction~\cite{weiss2007long}, and so on.
@@ -221,7 +221,7 @@ especially for the semantic complexity metrics extracted from textual descriptio
 
 %\subsection{Relative Research in Automatically Classifying Issue Reports}
 \subsection{Related Work}
-Many studies have investigated bug classification \cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining}
+Many studies have investigated bug classification \cite{antoniol2008bug,herzig2013s,zhou2014combining}
 to predict whether an issue is about a bug.
 
 Antoniol et al.~\cite{antoniol2008bug} investigated the automatic classification of issue reports
Binary file not shown.
@@ -618,3 +618,5 @@ This research is supported by National Science Foundation of China (Grant No.614
 \end{document}
 
 
+
+
Binary file not shown.
@@ -28,7 +28,7 @@ The quantitative evaluations show that the classification performance can achiev
 
 In our future work, we plan to explore the relationship between the performance of classifier and topics of the issue report.
 We believe that the distribution of categories is associate with topics of issue reports.
-Reusing classification model is another study we are interested in.
+% Reusing classification model is another study we are interested in.
 Otherwise, a more outperforming preprocessing is our sustained progressing task.
 
 % That is, issue reports about one component will more likely be a bug and issue reports about one function may be more probability to be a feature.
@@ -178,7 +178,7 @@ Facing the huge changes described in Section~\ref{ITS_GH},
 determining whether the regular pattern of free text found in research~\cite{antoniol2008bug} still works,
 and whether the performance of the text-based classification model in dealing with large-scale projects is efficient, are required.
 Many text-based classifications are used to classify issues in different
-studies~\cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining}.
+studies~\cite{antoniol2008bug,herzig2013s,zhou2014combining}.
 In this paper, various types of widely used text-based classifications,
 such as \textit{Naive Bayes}, \textit{Logistic Regression},
 were selected to know which classifier performs best.
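The hunk above concerns selecting among widely used text-based classifiers (Naive Bayes, Logistic Regression, and so on). As an illustration of such a comparison, here is a minimal scikit-learn sketch; the toy issue titles, labels, and the training-set accuracy check are invented for this example and are not the paper's data:

```python
# Sketch: comparing widely used text classifiers on issue summaries.
# Toy titles and labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

titles = [
    "crash when opening settings dialog",
    "null pointer exception on startup",
    "add dark mode theme support",
    "feature request: export to CSV",
    "error 500 when saving profile",
    "support keyboard shortcuts in editor",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = bug-prone, 0 = nonbug-prone

for name, clf in [("NaiveBayes", MultinomialNB()),
                  ("LogReg", LogisticRegression(max_iter=1000)),
                  ("LinearSVM", LinearSVC())]:
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(titles, labels)
    acc = model.score(titles, labels)  # training accuracy, only a sanity check
    print(f"{name}: {acc:.2f}")
```

A real comparison would of course score each pipeline with cross-validation on held-out folds rather than on the training set.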
@@ -207,7 +207,8 @@ A ten-fold cross-validation was applied to separate dataset samples into trainin
 % One subset is retained as the testing set for evaluating the classifier out of the 10 subsets,
 % and the remaining 9 subsets were used as the training set to build the classifier.
 The ten-fold cross-validation has a minimal effect on the sample characteristics
-and can investigate the stability of item loading on multiple factors~\cite{van2006five}.
+and can investigate the stability of item loading on multiple factors.
+% ~\cite{van2006five}.
 
 % The advantage of this validation method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once .
 
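The hunk above describes ten-fold cross-validation. A minimal pure-Python sketch of the index splitting it refers to (a real experiment would shuffle the sample order first; the function name is ours):

```python
# Sketch: how k-fold cross-validation partitions n samples so that each
# observation appears in exactly one test fold.
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs covering all samples."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(100, k=10))
assert len(folds) == 10
# every observation is used for validation exactly once,
# as the commented-out sentence in the hunk notes
assert sorted(i for _, test in folds for i in test) == list(range(100))
```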
@@ -273,7 +274,6 @@ More words are likely to contain more information, which may help for classifier
 \subsection{Tow-stage Classification}
 \label{section:improvingmodel}
 
-{\color{red}{
 Omitting structured information in GitHub, as described in section~\ref{ITS_GH},
 results in that less information can be used to build a synthesized classification model.
 The firsthand features can be extracted, except for free text of issues,
@@ -282,7 +282,7 @@ are relating to the historical information about issue contributors,
 Thus, we propose a two-stage classification approach to combine textual summary information
 and developer information, which could be expected to improve the performance of classification.
 Each stage of our approach is explained in the following paragraphs,
-and the overview of our approach is shown in Figure~\ref{figure:framework}.}}
+and the overview of our approach is shown in Figure~\ref{figure:framework}.
 % which consists of the following steps:}}
 
 % 1) The first stage uses textual summary information, including title and description of issues,
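The hunks above describe a two-stage approach: a text classifier on the issue summary, whose output is then combined with developer information in a second model. A minimal sketch of that wiring with scikit-learn; the toy titles, labels, and the two developer features are invented for illustration and are not the paper's feature set:

```python
# Sketch: a two-stage classifier. Stage 1 scores the issue text with an SVM;
# stage 2 feeds that score plus developer features to Logistic Regression.
# All data below is toy data, invented for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

titles = ["crash on startup", "add export feature", "exception in parser",
          "support new locale", "segfault when saving", "improve docs layout"]
labels = np.array([1, 0, 1, 0, 1, 0])  # 1 = bug-prone
# Hypothetical per-reporter history: [#bug issues filed, #feature issues filed]
dev_feats = np.array([[12, 3], [2, 8], [9, 1],
                      [1, 7], [15, 2], [0, 5]])

# Stage 1: text classifier over the summary.
vec = TfidfVectorizer()
X_text = vec.fit_transform(titles)
stage1 = LinearSVC().fit(X_text, labels)
text_score = stage1.decision_function(X_text).reshape(-1, 1)

# Stage 2: Logistic Regression over the stage-1 score + developer features.
X_stage2 = np.hstack([text_score, dev_feats])
stage2 = LogisticRegression(max_iter=1000).fit(X_stage2, labels)
print(stage2.predict(X_stage2))
```

In practice the stage-1 scores fed to stage 2 should come from held-out predictions (e.g. cross-validation folds), not from the same data the SVM was fit on, to avoid leakage.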
@@ -84,8 +84,8 @@ Finally, a structure tag information is obtained through the aforementioned proc
 \end{table}
 
 
-{\color{red}{Through the prior process, we extract 149 tags, which can indicate the category of issues, as group ``$type$''.
-Finally, we filter total 252,084 issues with tags in group group ``$type$''.}}
+Through the prior process, we extract 149 tags, which can indicate the category of issues, as group ``$type$''.
+Finally, we filter total 252,084 issues with tags in group group ``$type$''.
 Table~\ref{tag:datacollection} shows the most used tags in group ``$type$'', and how many projects and issues they appear.
 These tags were divided into bug-prone or nonbug-prone by manually distinguishing.
 The most used tags are bug, enhancement, and feature, which were observed in 46.9\%, 17.8\%, and 5.9\% of the labeled issues, respectively.
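The hunk above filters issues by tags in group "type" and maps each tag to bug-prone or nonbug-prone. A minimal sketch of that filtering step; the three tags shown are only the most-used ones named in the hunk, not the paper's full 149-tag group, and the issue records are invented:

```python
# Sketch: keep only issues carrying a "type" tag, and label them
# bug-prone / nonbug-prone by the tag's manual category.
TYPE_TAGS = {"bug": "bug-prone",
             "enhancement": "nonbug-prone",
             "feature": "nonbug-prone"}

issues = [
    {"id": 1, "tags": ["bug", "ui"]},
    {"id": 2, "tags": ["question"]},   # no "type" tag -> filtered out
    {"id": 3, "tags": ["feature"]},
]

def categorize(issue):
    """Return the category of the first 'type' tag, or None."""
    for tag in issue["tags"]:
        if tag in TYPE_TAGS:
            return TYPE_TAGS[tag]
    return None

labeled = [(i["id"], categorize(i)) for i in issues
           if categorize(i) is not None]
print(labeled)  # [(1, 'bug-prone'), (3, 'nonbug-prone')]
```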
@@ -146,7 +146,7 @@ thereby indicating the absence of multicollinearity~\cite{gharehyazie2014develop
 Moreover, no interaction among the variables in the models is observed,
 making the interpretation of our results easy and maintaining the cleanliness of the models.
 For project-level measures, the number of issues ($log(issue\_num)$) is highly significant,
-which means that {\color{red}{the number of issues in train set is far from sufficient. Based on current data set,}} the more training sets used,
+which means that the number of issues in train set is far from sufficient. Based on current data set, the more training sets used,
 the higher is the \textit{average F-measure} that the classification achieves.
 Moreover, no statistical significance is observed in other project-level measures with regard to the influencing the performance of the classification model.
 For issue-level measures, the number of confused issues ($log(confuse\_count + 0.5)$) contained in the dataset is highly significant.
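The hunk above refers back to a multicollinearity check on the regression measures. The standard diagnostic is the variance inflation factor, VIF_j = 1/(1 - R²_j), where R²_j comes from regressing measure j on the other measures. A self-contained NumPy sketch (the random data is purely illustrative):

```python
# Sketch: variance inflation factors to check regression measures for
# multicollinearity. Pure NumPy; the data below is random, for illustration.
import numpy as np

def vif(X):
    """VIF per column of X; values near 1 indicate no multicollinearity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # regress col j on rest
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))                   # VIF_j = 1 / (1 - R^2_j)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # independent columns -> VIFs close to 1
print([round(v, 2) for v in vif(X)])
```

A common rule of thumb treats VIF values below 5 (sometimes 10) as indicating no problematic multicollinearity.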
@@ -229,8 +229,8 @@ Furthermore, too many confused issues in the training set will seriously affect
 
 
 An experiment based on the two-stage classifier was conducted to validate our approach.
-{\color{red}{In the first stage, SVM was used as text classifier based on the conclusion of RQ1.
-In the second stage, we selected Logistic Regression as our prediction model, which performed better than other classifier in table~\ref{tag:packages}.}}
+In the first stage, SVM was used as text classifier based on the conclusion of RQ1.
+In the second stage, we selected Logistic Regression as our prediction model, which performed better than other classifier in table~\ref{tag:packages}.
 As projects that achieve a high $f_{avg}$ (\ie average F-measure) contain few confused issues,
 our approach has a slight effect on these projects.
 Thus, to explore the performance of our approach for different projects, the project selection has two cases.
@@ -3,7 +3,7 @@ The fist threat concerned the study design about category extraction on our data
 We utilized tags used most in GitHub to distinguish the category of issues.
 We trained model on issues with these tags, so that we could clearly know the category of samples in training set.
 We ignored the issues without these tags, which might introduce bias to dataset.
-However, many researches~\cite{antoniol2008bug,maalej2015bug,zhou2014combining} selected labeled issues as training set.
+However, many researches~\cite{antoniol2008bug,zhou2014combining} selected labeled issues as training set.
 What's more, unlabeled issues was only a small part of the data set, which was 9.8\% in our study.
 
 The second threat is the number of projects we used.