submit version

qiangge 2017-11-15 10:24:01 +08:00
parent 10c92e128a
commit fb19605146
9 changed files with 15 additions and 13 deletions


@ -75,7 +75,7 @@ Several academic studies focus on ITS
to free managers from cumbersome and repetitive work,
%improve the efficiency of project maintenance,
%because of the key role that ITS plays in the development of project,
such as automatically classifying issue reports to bug-prone and nonbug-prone~\cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining},
such as automatically classifying issue reports into bug-prone and nonbug-prone~\cite{antoniol2008bug,herzig2013s,zhou2014combining},
bug assignment~\cite{anvik2006should,baysal2009bug},
duplicate issue detection~\cite{wang2008approach,sun2010discriminative},
fixing time prediction~\cite{weiss2007long}, and so on.
@ -221,7 +221,7 @@ especially for the semantic complexity metrics extracted from textual descriptio
%\subsection{Relative Research in Automatically Classifying Issue Reports}
\subsection{Related Work}
Many studies have investigated bug classification \cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining}
Many studies have investigated bug classification \cite{antoniol2008bug,herzig2013s,zhou2014combining}
to predict whether an issue is about a bug.
Antoniol et al.~\cite{antoniol2008bug} investigated the automatic classification of issue reports

Binary file not shown.


@ -618,3 +618,5 @@ This research is supported by National Science Foundation of China (Grant No.614
\end{document}

BIN latex/bare_conf_c.pdf Normal file

Binary file not shown.


@ -28,7 +28,7 @@ The quantitative evaluations show that the classification performance can achiev
In our future work, we plan to explore the relationship between classifier performance and the topics of issue reports.
We believe that the distribution of categories is associated with the topics of issue reports.
Reusing classification model is another study we are interested in.
% Reusing classification model is another study we are interested in.
In addition, improving the preprocessing step remains an ongoing task.
% That is, issue reports about one component will more likely be a bug and issue reports about one function may be more probability to be a feature.


@ -178,7 +178,7 @@ Facing the huge changes described in Section~\ref{ITS_GH},
it is necessary to determine whether the regular pattern of free text found in prior research~\cite{antoniol2008bug} still holds,
and whether text-based classification models remain efficient when dealing with large-scale projects.
Many text-based classification techniques have been used to classify issues in different
studies~\cite{antoniol2008bug,herzig2013s,maalej2015bug,zhou2014combining}.
studies~\cite{antoniol2008bug,herzig2013s,zhou2014combining}.
In this paper, various widely used text-based classifiers,
such as \textit{Naive Bayes} and \textit{Logistic Regression},
were selected to determine which classifier performs best.
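For readers who want a concrete picture of this step, the following is a minimal sketch (our assumption, not the paper's code) of comparing such text classifiers on issue summaries with scikit-learn; the issue texts and labels are made up for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

# Hypothetical labeled issue summaries; the real study uses GitHub issues.
texts = ["NullPointerException when saving settings",
         "App crashes on startup after update",
         "Add dark mode to the editor",
         "Support exporting reports as PDF"]
labels = ["bug", "bug", "nonbug", "nonbug"]

candidates = {
    "NaiveBayes": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVM": LinearSVC(),
}

for name, clf in candidates.items():
    model = Pipeline([("tfidf", TfidfVectorizer(stop_words="english")),
                      ("clf", clf)])
    model.fit(texts, labels)   # in practice, fit on the training folds only
    print(name, model.predict(["crash when opening a large file"]))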
@ -207,7 +207,8 @@ A ten-fold cross-validation was applied to separate dataset samples into trainin
% One subset is retained as the testing set for evaluating the classifier out of the 10 subsets,
% and the remaining 9 subsets were used as the training set to build the classifier.
The ten-fold cross-validation has a minimal effect on the sample characteristics
and can investigate the stability of item loading on multiple factors~\cite{van2006five}.
and can investigate the stability of item loading on multiple factors.
% ~\cite{van2006five}.
% The advantage of this validation method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once .
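A sketch of the ten-fold cross-validation step under the same assumptions (scikit-learn, a hypothetical corpus with at least ten issues per class); the macro F1 score here stands in for the average F-measure used later.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical corpus with enough samples per class for ten folds.
texts = (["crash when opening file %d" % i for i in range(10)]
         + ["please add feature %d" % i for i in range(10)])
labels = ["bug"] * 10 + ["nonbug"] * 10

model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="f1_macro")
print(scores.mean())   # each issue is used for validation exactly once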
@ -273,7 +274,6 @@ More words are likely to contain more information, which may help for classifier
\subsection{Two-stage Classification}
\label{section:improvingmodel}
{\color{red}{
Omitting structured information in GitHub, as described in Section~\ref{ITS_GH},
means that less information can be used to build a synthesized classification model.
The only firsthand features that can be extracted, apart from the free text of issues,
@ -282,7 +282,7 @@ are relating to the historical information about issue contributors,
Thus, we propose a two-stage classification approach that combines textual summary information
and developer information, which is expected to improve the performance of classification.
Each stage of our approach is explained in the following paragraphs,
and the overview of our approach is shown in Figure~\ref{figure:framework}.}}
and the overview of our approach is shown in Figure~\ref{figure:framework}.
% which consists of the following steps:}}
% 1) The first stage uses textual summary information, including title and description of issues,
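As a rough illustration of the two-stage idea described above (not the implementation behind Figure~\ref{figure:framework}; the developer features and the way the stage-one score is passed on are our assumptions), one plausible sketch is:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Hypothetical data: issue summaries, contributor-history features
# (e.g. issues previously filed, share of them that were bugs), and labels.
texts = ["crash on startup", "support markdown export",
         "NPE in the parser", "add a new UI theme"]
dev_features = np.array([[12, 0.8], [3, 0.2], [20, 0.9], [1, 0.1]])
y = np.array([1, 0, 1, 0])          # 1 = bug-prone, 0 = nonbug-prone

# Stage 1: score the free text with an SVM-style text classifier.
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts)
stage1 = LinearSVC().fit(X_text, y)
text_score = stage1.decision_function(X_text)

# Stage 2: combine the stage-1 score with the developer information.
X_stage2 = np.column_stack([text_score, dev_features])
stage2 = LogisticRegression().fit(X_stage2, y)
print(stage2.predict(X_stage2))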


@ -84,8 +84,8 @@ Finally, a structure tag information is obtained through the aforementioned proc
\end{table}
{\color{red}{Through the prior process, we extract 149 tags, which can indicate the category of issues, as group ``$type$''.
Finally, we filter total 252,084 issues with tags in group group ``$type$''.}}
Through the prior process, we extract 149 tags that can indicate the category of issues, as group ``$type$''.
Finally, we filter a total of 252,084 issues with tags in group ``$type$''.
Table~\ref{tag:datacollection} shows the most used tags in group ``$type$'', and in how many projects and issues they appear.
These tags were manually divided into bug-prone and nonbug-prone.
The most used tags are bug, enhancement, and feature, which were observed in 46.9\%, 17.8\%, and 5.9\% of the labeled issues, respectively.
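A minimal sketch of this tag-based labeling step, assuming a hypothetical issue record structure; only three of the 149 tags are shown, and the bug-prone/nonbug-prone mapping is our guess at the manual division.

# Map tags in group "type" to labels and keep only issues carrying such a tag.
TYPE_TAGS = {                       # tiny excerpt; the study uses 149 tags
    "bug": "bug-prone",
    "enhancement": "nonbug-prone",
    "feature": "nonbug-prone",
}

issues = [                          # hypothetical issue records
    {"id": 1, "tags": ["bug", "ui"]},
    {"id": 2, "tags": ["question"]},
    {"id": 3, "tags": ["feature"]},
]

labeled = [
    (issue["id"], TYPE_TAGS[tag])
    for issue in issues
    for tag in issue["tags"]
    if tag in TYPE_TAGS
]
print(labeled)   # issues without a "type" tag (e.g. id 2) are filtered out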


@ -146,7 +146,7 @@ thereby indicating the absence of multicollinearity~\cite{gharehyazie2014develop
Moreover, no interaction among the variables in the models is observed,
which makes the interpretation of our results easy and keeps the models clean.
For project-level measures, the number of issues ($log(issue\_num)$) is highly significant,
which means that {\color{red}{the number of issues in train set is far from sufficient. Based on current data set,}} the more training sets used,
which means that the number of issues in the training set is far from sufficient. Based on the current dataset, the more training data used,
the higher the \textit{average F-measure} that the classification achieves.
Moreover, no statistical significance is observed in the other project-level measures with regard to influencing the performance of the classification model.
For issue-level measures, the number of confused issues ($log(confuse\_count + 0.5)$) contained in the dataset is highly significant.
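The exact model specification is not spelled out here; the sketch below assumes an ordinary least squares regression of the average F-measure on the log-transformed measures, with made-up per-project values, using statsmodels.

import numpy as np
import statsmodels.api as sm

# Hypothetical per-project measures; the study's real variables and values differ.
issue_num = np.array([120, 450, 80, 900, 300])
confuse_count = np.array([4, 30, 2, 55, 12])
f_avg = np.array([0.81, 0.68, 0.84, 0.60, 0.74])

X = sm.add_constant(np.column_stack([np.log(issue_num),
                                     np.log(confuse_count + 0.5)]))
model = sm.OLS(f_avg, X).fit()
print(model.summary())              # coefficients and significance levels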
@ -229,8 +229,8 @@ Furthermore, too many confused issues in the training set will seriously affect
An experiment based on the two-stage classifier was conducted to validate our approach.
{\color{red}{In the first stage, SVM was used as text classifier based on the conclusion of RQ1.
In the second stage, we selected Logistic Regression as our prediction model, which performed better than other classifier in table~\ref{tag:packages}.}}
In the first stage, SVM was used as the text classifier, based on the conclusion of RQ1.
In the second stage, we selected Logistic Regression as our prediction model, which performed better than the other classifiers in Table~\ref{tag:packages}.
As projects that achieve a high $f_{avg}$ (\ie average F-measure) contain few confused issues,
our approach has a slight effect on these projects.
Thus, to explore the performance of our approach on different projects, we considered two cases for project selection.
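Reading $f_{avg}$ as the F-measure averaged over the bug-prone and nonbug-prone classes (our assumption), it can be computed from held-out predictions as follows; the labels and predictions below are hypothetical.

from sklearn.metrics import f1_score

# Hypothetical held-out labels and predictions (1 = bug-prone, 0 = nonbug-prone).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
f_avg = f1_score(y_true, y_pred, average="macro")   # F-measure averaged over both classes
print(f_avg)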


@ -3,7 +3,7 @@ The first threat concerned the study design about category extraction on our data
We utilized the most frequently used tags in GitHub to distinguish the category of issues.
We trained the model on issues with these tags, so that we could clearly know the category of the samples in the training set.
We ignored issues without these tags, which might introduce bias into the dataset.
However, many researches~\cite{antoniol2008bug,maalej2015bug,zhou2014combining} selected labeled issues as training set.
However, many studies~\cite{antoniol2008bug,zhou2014combining} selected labeled issues as the training set.
Moreover, unlabeled issues were only a small part of the dataset (9.8\% in our study).
The second threat is the number of projects we used.