IMPROVING ANALOGY SOFTWARE EFFORT ESTIMATION USING FUZZY FEATURE SUBSET SELECTION ALGORITHM. Mohammad Azzeh [email_address], Dr. Daniel Neagu [email_address], Prof. Peter Cowling [email_address]. Computing Department, School of Informatics, University of Bradford. Promise'08
Agenda: Motivation, The Problem, The Proposed Solution, Results, Conclusions.
Motivation: Estimation by Analogy (EA) is a widely used technique for software cost estimation. Users are often willing to accept this kind of estimate because it mimics human problem solving, is based on actual project data and experience, and can model complex relationships between project features and effort.
The Problem: Data quality is an important issue in analogy-based software estimation, since it is a precondition for obtaining quality knowledge and accurate estimates. Typically, data sets are not collected with a particular prediction task in mind [Kirsopp & Shepperd, 2002]. This estimation approach is sensitive to incomplete and noisy data, and to irrelevant and misleading features.
Feature Subset Selection (FSS). Benefits of FSS: reducing the time needed for training and utilization, improving data understanding and visualization, and reducing data dimensionality. FSS maps a dataset D(K x M) to a dataset D(K x N), where N < M.
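In matrix terms, FSS simply keeps a subset of the columns of the project-by-feature matrix. A toy Python illustration (the selected indices are arbitrary, standing in for whatever an FSS algorithm returns):

```python
import numpy as np

D = np.random.rand(100, 14)      # K = 100 projects, M = 14 features
selected = [0, 3, 5, 9]          # column indices chosen by some FSS algorithm
D_reduced = D[:, selected]       # K x N dataset with N = 4 < M
print(D_reduced.shape)           # (100, 4)
```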
FSS in software estimation. Existing search techniques: wrappers use machine learning algorithms [Kirsopp & Shepperd, 2002] (exhaustive search, random search, hill climbing, forward selection, backward selection); filters use statistical approaches [Briand et al., 2000]. The fitness criterion of wrappers is often MMRE; a minimal wrapper sketch follows below.
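As a concrete illustration of the wrapper idea, here is a minimal sketch of forward selection driven by MMRE, with a one-analogy (nearest-neighbour) estimator evaluated leave-one-out. The function names and the estimator are illustrative assumptions, not the exact setup used in the paper or in the cited studies.

```python
import numpy as np

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))

def loo_analogy_mmre(X, effort, features):
    """Leave-one-out MMRE of a one-analogy (1-nearest-neighbour) estimator
    restricted to the given feature indices."""
    Xs = X[:, list(features)]
    effort = np.asarray(effort, dtype=float)
    preds = []
    for i in range(len(Xs)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)   # Euclidean distance to all projects
        d[i] = np.inf                            # exclude the project itself
        preds.append(effort[np.argmin(d)])       # effort of the closest analogue
    return mmre(effort, preds)

def forward_selection(X, effort):
    """Greedy wrapper: repeatedly add the feature that most reduces MMRE."""
    remaining = set(range(X.shape[1]))
    selected, best_score = [], np.inf
    while remaining:
        score, feature = min((loo_analogy_mmre(X, effort, selected + [f]), f)
                             for f in remaining)
        if score >= best_score:
            break                                # no further improvement
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```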
The Proposed Solution: The algorithm is based on Fuzzy c-Means and fuzzy logic. It selects the optimal feature subset based on the similarity between fuzzy clusters: the feature subset that presents the smallest similarity degree between its clusters has the potential to deliver accurate estimates. The approach reflects the data structure and handles both numerical and categorical data.
The Proposed Solution (workflow): the dataset with M features is clustered using Fuzzy c-Means; the resulting partition matrix and cluster centres are then used to select a feature subset of dimension N. A minimal Fuzzy c-Means sketch follows below.
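The fuzzification step relies on Fuzzy c-Means, which produces the cluster centres and partition matrix used in the later steps. Below is a minimal, NumPy-only textbook implementation offered as an illustration; it is not the authors' code, and the number of clusters and fuzziness exponent are arbitrary defaults.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook Fuzzy c-Means. Returns (centres, U) where U is the c x n
    partition matrix holding the membership of each point in each cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centres = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(centres[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        U_new = d ** (-2.0 / (m - 1.0))          # standard FCM membership update
        U_new /= U_new.sum(axis=0, keepdims=True)
        if np.max(np.abs(U_new - U)) < tol:      # converged
            return centres, U_new
        U = U_new
    return centres, U
```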
The Proposed Solution. Definition 1: the similarity between two clusters over a given m-dimensional feature subset, where the weights are normalized factors representing the importance of some features over others. Definition 2: the overall similarity E_Si between all clusters in a feature subset S_i.
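The two definitions were shown as equations on the original slide; only their captions survive in the transcript. One plausible form consistent with the surrounding text is sketched below, where the per-feature similarity $s_k$ and the averaging over cluster pairs are assumptions, and $w_k$ is the normalized weighting factor mentioned in Definition 1:

$$\operatorname{sim}(C_p, C_q) = \sum_{k=1}^{m} w_k \, s_k\!\big(C_p^{(k)}, C_q^{(k)}\big), \qquad \sum_{k=1}^{m} w_k = 1,$$

$$E_{S_i} = \frac{2}{c(c-1)} \sum_{p < q} \operatorname{sim}(C_p, C_q),$$

where $C_p^{(k)}$ denotes the fuzzy set of cluster $C_p$ on feature $k$ and $c$ is the number of clusters.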
FFSS algorithm
Input:  D(F1, F2, ..., FM)   // input dataset (N x M)
        Out                  // output variable (N x 1)
Output: D_best               // feature subset of highly predictive features
begin
  do:
    Step 1: Select the feature subset S_i to be searched.
    Step 2: Fuzzify the feature subset S_i.
    Step 3: For S_i, assess the similarity degree between all pairs of clusters (i.e. fuzzy sets) in all features of S_i.
  until all feature subsets are searched.
  Step 4: Evaluate each feature subset S_i using E_Si.
  Step 5: The best feature subset is the one with minimum E_Si.
end
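A compact Python sketch of this search loop, reusing the fuzzy_c_means helper from the earlier sketch. Because the exact similarity measure of Definition 1 is not reproduced in the transcript, a Jaccard-style overlap of membership values stands in for it; treat this as an illustration of the search strategy, not a faithful reimplementation.

```python
import itertools
import numpy as np

def cluster_similarity(U):
    """Mean pairwise similarity between fuzzy clusters (rows of the partition
    matrix U), using a Jaccard-style overlap of memberships as a stand-in
    for the paper's Definition 1."""
    pairs = itertools.combinations(range(U.shape[0]), 2)
    sims = [np.minimum(U[p], U[q]).sum() / np.maximum(U[p], U[q]).sum()
            for p, q in pairs]
    return float(np.mean(sims))

def ffss(X, subset_size=3, c=3):
    """Score every feature subset of the given size and return the one whose
    fuzzy clusters are least similar (minimum E_Si)."""
    best_subset, best_e = None, np.inf
    for subset in itertools.combinations(range(X.shape[1]), subset_size):
        Xs = X[:, subset]
        # normalise each feature to [0, 1] so no single feature dominates
        span = np.maximum(np.ptp(Xs, axis=0), 1e-12)
        Xs = (Xs - Xs.min(axis=0)) / span
        _, U = fuzzy_c_means(Xs, c=c)            # helper from the earlier sketch
        e = cluster_similarity(U)                # E_Si for this subset
        if e < best_e:
            best_subset, best_e = subset, e
    return best_subset, best_e
```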
Empirical validation: We built an analogy estimation model for each FSS algorithm, using Euclidean distance as the similarity measure. Validation strategy: 10-fold cross-validation. Evaluation criteria: Mean Magnitude of Relative Error (MMRE), Median MRE (MdMRE), and the performance indicator Pred(25%).

Dataset            | Number of features | Number of projects
ISBSG (release 10) | 14                 | 400
Desharnais         | 10                 | 77
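For reference, the three evaluation criteria can be computed as follows; this is a straightforward sketch of the standard definitions, not code from the study.

```python
import numpy as np

def evaluation_metrics(actual, predicted, level=0.25):
    """MMRE, MdMRE and Pred(25%) from actual and estimated effort values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual           # magnitude of relative error
    return {
        "MMRE": float(np.mean(mre)),                    # mean MRE
        "MdMRE": float(np.median(mre)),                 # median MRE
        "Pred(25%)": float(np.mean(mre <= level)),      # share of estimates within 25%
    }

# Example: two of the three estimates fall within 25% of the actual effort.
print(evaluation_metrics([100, 200, 400], [110, 260, 390]))
```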
Kirsopp & Shepperd noted: "It is difficult to compare algorithms used for classification problems with algorithms that deal with prediction problems, because the measures of accuracy used are different."
Results: ISBSG

Algorithm used in analogy model                  | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                     | 37.74%      | 41.0%               | 29.4%
Exhaustive search, hill climbing, random search  | 28.25%      | 30.3%               | 30.2%
Forward subset selection                         | 33.3%       | 30.4%               | 31.2%
Backward subset selection                        | 34.7%       | 38%                 | 34.4%
FFSS                                             | 28.7%       | 30.6%               | 32.2%
Results: ISBSG (continued)

Algorithm used in analogy model                  | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                     | 31.6%       | 33%                 | 20.62%
Exhaustive search, hill climbing, random search  | 21.9%       | 24%                 | 20.8%
Forward subset selection                         | 21.0%       | 21.4%               | 20.7%
Backward subset selection                        | 22.6%       | 28.7%               | 25.2%
FFSS                                             | 21.8%       | 22.3%               | 22.7%
Results: Desharnais

Algorithm used in analogy model                                            | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                                               | 60.1%       | 51.5%               | 50.0%
Exhaustive search, forward subset selection, hill climbing, random search  | 38.2%       | 39.4%               | 36.4%
Backward subset selection                                                  | 42.4%       | 43.9%               | 46.6%
FFSS                                                                       | 40.2%       | 40.3%               | 38.5%
Results: Desharnais (continued)

Algorithm used in analogy model                                            | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                                               | 41.7%       | 41.0%               | 36.1%
Exhaustive search, forward subset selection, hill climbing, random search  | 30.8%       | 38.0%               | 30.9%
Backward subset selection                                                  | 38.4%       | 37.4%               | 34.6%
FFSS                                                                       | 32.4%       | 33.3%               | 31.7%
Conclusions: Fuzzy feature subset selection has a significant impact on the accuracy of EA. Our FFSS algorithm produces results comparable to exhaustive search, hill climbing, and forward selection. It reduces uncertainty when categorical data is involved.
Conclusions (continued): Which FSS is suitable? If accuracy is the only concern, exhaustive search and hill climbing are preferable. If less search time with reasonable accuracy is needed and the data set is large, fuzzy feature subset selection, forward feature selection, or backward feature selection are suitable. Dataset size also matters when choosing the wrapper search: random search for small data sets, hill climbing for large ones.
Threats to experiment validity: selecting a representative sample of ISBSG data; MMRE was used as the fitness criterion for all feature selection algorithms except our FFSS; the number of projects; outliers and extreme values.
References
Briand, L., Langley, T., Wieczorek, I. Using the European Space Agency data set: a replicated assessment and comparison of common software cost modelling techniques. Proc. 22nd IEEE Intl. Conf. on Software Engineering, Limerick, Ireland, 2000.
Kirsopp, C., Shepperd, M. Case and feature subset selection in case-based software project effort prediction. Proc. 22nd SGAI Int'l Conf. on Knowledge-Based Systems and Applied Artificial Intelligence, 2002.
Questions
