IMPROVING ANALOGY SOFTWARE EFFORT ESTIMATION USING FUZZY FEATURE SUBSET SELECTION ALGORITHM. Mohammad Azzeh [email_address], Dr. Daniel Neagu [email_address], Prof. Peter Cowling [email_address]. Computing Department, School of Informatics, University of Bradford. Promise'08
Agenda: Motivation, The Problem, The Proposed Solution, Results, Conclusions.
Motivation: Estimation by Analogy (EA) is a widely used technique for software cost estimation. Users are often willing to accept this kind of estimate because it mimics human problem solving, is based on actual project data and experience, and can model complex relationships between project features and effort.
The Problem: Data quality is an important issue in analogy-based software estimation, since it is a precondition for obtaining quality knowledge and accurate estimates. Typically, data sets are not collected with a particular prediction task in mind [Kirsopp & Shepperd, 2002]. This estimation approach is sensitive to incomplete and noisy data, and to irrelevant and misleading features.
Feature Subset Selection (FSS). Benefits of FSS: reducing the time needed for training and utilization, improving data understanding and visualization, and reducing data dimensionality. FSS maps a dataset D(K x M) to a dataset D(K x N), where N < M.
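In matrix terms, FSS simply keeps a subset of the columns of the project-by-feature matrix. A toy Python illustration (the selected indices are arbitrary, standing in for whatever an FSS algorithm returns):

```python
import numpy as np

D = np.random.rand(100, 14)      # K = 100 projects, M = 14 features
selected = [0, 3, 5, 9]          # column indices chosen by some FSS algorithm
D_reduced = D[:, selected]       # K x N dataset with N = 4 < M
print(D_reduced.shape)           # (100, 4)
```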
FSS in software estimation. Existing search techniques: wrappers use machine learning algorithms [Kirsopp & Shepperd, 2002] (exhaustive search, random search, hill climbing, forward selection, backward selection); filters use statistical approaches [Briand et al., 2000]. The fitness criterion of wrappers is often MMRE; a minimal wrapper sketch follows below.
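As a concrete illustration of the wrapper idea, here is a minimal sketch of forward selection driven by MMRE, with a one-analogy (nearest-neighbour) estimator evaluated leave-one-out. The function names and the estimator are illustrative assumptions, not the exact setup used in the paper or in the cited studies.

```python
import numpy as np

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))

def loo_analogy_mmre(X, effort, features):
    """Leave-one-out MMRE of a one-analogy (1-nearest-neighbour) estimator
    restricted to the given feature indices."""
    Xs = X[:, list(features)]
    effort = np.asarray(effort, dtype=float)
    preds = []
    for i in range(len(Xs)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)   # Euclidean distance to all projects
        d[i] = np.inf                            # exclude the project itself
        preds.append(effort[np.argmin(d)])       # effort of the closest analogue
    return mmre(effort, preds)

def forward_selection(X, effort):
    """Greedy wrapper: repeatedly add the feature that most reduces MMRE."""
    remaining = set(range(X.shape[1]))
    selected, best_score = [], np.inf
    while remaining:
        score, feature = min((loo_analogy_mmre(X, effort, selected + [f]), f)
                             for f in remaining)
        if score >= best_score:
            break                                # no further improvement
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```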
The Proposed Solution: The algorithm is based on Fuzzy c-Means and fuzzy logic. It selects the optimal feature subset based on the similarity between fuzzy clusters: the feature subset that presents the smallest similarity degree between its clusters has the potential to deliver accurate estimates. The approach reflects the data structure and handles both numerical and categorical data.
The Proposed Solution (workflow): the dataset with M features is clustered using Fuzzy c-Means; the resulting partition matrix and cluster centres are then used to select a feature subset of dimension N. A minimal Fuzzy c-Means sketch follows below.
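The fuzzification step relies on Fuzzy c-Means, which produces the cluster centres and partition matrix used in the later steps. Below is a minimal, NumPy-only textbook implementation offered as an illustration; it is not the authors' code, and the number of clusters and fuzziness exponent are arbitrary defaults.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook Fuzzy c-Means. Returns (centres, U) where U is the c x n
    partition matrix holding the membership of each point in each cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centres = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(centres[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        U_new = d ** (-2.0 / (m - 1.0))          # standard FCM membership update
        U_new /= U_new.sum(axis=0, keepdims=True)
        if np.max(np.abs(U_new - U)) < tol:      # converged
            return centres, U_new
        U = U_new
    return centres, U
```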
The Proposed Solution. Definition 1: the similarity between two clusters over a given m-dimensional feature subset, where the weights are normalized factors representing the importance of some features over others. Definition 2: the overall similarity E_Si between all clusters in a feature subset S_i.
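The two definitions were shown as equations on the original slide; only their captions survive in the transcript. One plausible form consistent with the surrounding text is sketched below, where the per-feature similarity $s_k$ and the averaging over cluster pairs are assumptions, and $w_k$ is the normalized weighting factor mentioned in Definition 1:

$$\operatorname{sim}(C_p, C_q) = \sum_{k=1}^{m} w_k \, s_k\!\big(C_p^{(k)}, C_q^{(k)}\big), \qquad \sum_{k=1}^{m} w_k = 1,$$

$$E_{S_i} = \frac{2}{c(c-1)} \sum_{p < q} \operatorname{sim}(C_p, C_q),$$

where $C_p^{(k)}$ denotes the fuzzy set of cluster $C_p$ on feature $k$ and $c$ is the number of clusters.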
FFSS algorithm
Input:  D(F1, F2, ..., FM)   // input dataset (N x M)
        Out                  // output variable (N x 1)
Output: D_best               // feature subset of highly predictive features
begin
  do:
    Step 1: Select the feature subset S_i to be searched.
    Step 2: Fuzzify the feature subset S_i.
    Step 3: For S_i, assess the similarity degree between all pairs of clusters (i.e. fuzzy sets) in all features of S_i.
  until all feature subsets are searched.
  Step 4: Evaluate each feature subset S_i using E_Si.
  Step 5: The best feature subset is the one with minimum E_Si.
end
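A compact Python sketch of this search loop, reusing the fuzzy_c_means helper from the earlier sketch. Because the exact similarity measure of Definition 1 is not reproduced in the transcript, a Jaccard-style overlap of membership values stands in for it; treat this as an illustration of the search strategy, not a faithful reimplementation.

```python
import itertools
import numpy as np

def cluster_similarity(U):
    """Mean pairwise similarity between fuzzy clusters (rows of the partition
    matrix U), using a Jaccard-style overlap of memberships as a stand-in
    for the paper's Definition 1."""
    pairs = itertools.combinations(range(U.shape[0]), 2)
    sims = [np.minimum(U[p], U[q]).sum() / np.maximum(U[p], U[q]).sum()
            for p, q in pairs]
    return float(np.mean(sims))

def ffss(X, subset_size=3, c=3):
    """Score every feature subset of the given size and return the one whose
    fuzzy clusters are least similar (minimum E_Si)."""
    best_subset, best_e = None, np.inf
    for subset in itertools.combinations(range(X.shape[1]), subset_size):
        Xs = X[:, subset]
        # normalise each feature to [0, 1] so no single feature dominates
        span = np.maximum(np.ptp(Xs, axis=0), 1e-12)
        Xs = (Xs - Xs.min(axis=0)) / span
        _, U = fuzzy_c_means(Xs, c=c)            # helper from the earlier sketch
        e = cluster_similarity(U)                # E_Si for this subset
        if e < best_e:
            best_subset, best_e = subset, e
    return best_subset, best_e
```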
Empirical validation: We built an analogy estimation model for each FSS algorithm, using Euclidean distance as the similarity measure. Validation strategy: 10-fold cross-validation. Evaluation criteria: Mean Magnitude of Relative Error (MMRE), Median MRE (MdMRE), and the performance indicator Pred(25%).

Dataset            | Number of features | Number of projects
ISBSG (release 10) | 14                 | 400
Desharnais         | 10                 | 77
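For reference, the three evaluation criteria can be computed as follows; this is a straightforward sketch of the standard definitions, not code from the study.

```python
import numpy as np

def evaluation_metrics(actual, predicted, level=0.25):
    """MMRE, MdMRE and Pred(25%) from actual and estimated effort values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual           # magnitude of relative error
    return {
        "MMRE": float(np.mean(mre)),                    # mean MRE
        "MdMRE": float(np.median(mre)),                 # median MRE
        "Pred(25%)": float(np.mean(mre <= level)),      # share of estimates within 25%
    }

# Example: two of the three estimates fall within 25% of the actual effort.
print(evaluation_metrics([100, 200, 400], [110, 260, 390]))
```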
Kirsopp & Shepperd noted: "It is difficult to compare algorithms used for classification problems with algorithms that deal with prediction problems, because the measures of accuracy used are different."
Results: ISBSG

Algorithm used in analogy model                  | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                     | 37.74%      | 41.0%               | 29.4%
Exhaustive search, hill climbing, random search  | 28.25%      | 30.3%               | 30.2%
Forward subset selection                         | 33.3%       | 30.4%               | 31.2%
Backward subset selection                        | 34.7%       | 38%                 | 34.4%
FFSS                                             | 28.7%       | 30.6%               | 32.2%
Results: ISBSG (continued)

Algorithm used in analogy model                  | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                     | 31.6%       | 33%                 | 20.62%
Exhaustive search, hill climbing, random search  | 21.9%       | 24%                 | 20.8%
Forward subset selection                         | 21.0%       | 21.4%               | 20.7%
Backward subset selection                        | 22.6%       | 28.7%               | 25.2%
FFSS                                             | 21.8%       | 22.3%               | 22.7%
Results: Desharnais

Algorithm used in analogy model                                            | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                                               | 60.1%       | 51.5%               | 50.0%
Exhaustive search, forward subset selection, hill climbing, random search  | 38.2%       | 39.4%               | 36.4%
Backward subset selection                                                  | 42.4%       | 43.9%               | 46.6%
FFSS                                                                       | 40.2%       | 40.3%               | 38.5%
Results: Desharnais (continued)

Algorithm used in analogy model                                            | One analogy | Mean of 2 analogies | Mean of 3 analogies
All features                                                               | 41.7%       | 41.0%               | 36.1%
Exhaustive search, forward subset selection, hill climbing, random search  | 30.8%       | 38.0%               | 30.9%
Backward subset selection                                                  | 38.4%       | 37.4%               | 34.6%
FFSS                                                                       | 32.4%       | 33.3%               | 31.7%
Conclusions: Fuzzy feature subset selection has a significant impact on the accuracy of EA. Our FFSS algorithm produces results comparable to exhaustive search, hill climbing, and forward selection. It reduces uncertainty when categorical data is involved.
Conclusions (continued): Which FSS is suitable? If accuracy is the only concern, exhaustive search and hill climbing are preferable. If less search time with reasonable accuracy is needed and the data set is large, fuzzy feature subset selection, forward feature selection, or backward feature selection are suitable. Dataset size also matters when choosing the wrapper search: random search for small data sets, hill climbing for large ones.
Threats to experiment validity: selecting a representative sample of ISBSG data; MMRE was used as the fitness criterion for all feature selection algorithms except our FFSS; the number of projects; outliers and extreme values.
References
Briand, L., Langley, T., Wieczorek, I. Using the European Space Agency data set: a replicated assessment and comparison of common software cost modelling techniques. Proc. 22nd IEEE Intl. Conf. on Software Engineering, Limerick, Ireland, 2000.
Kirsopp, C., Shepperd, M. Case and feature subset selection in case-based software project effort prediction. Proc. 22nd SGAI Int'l Conf. on Knowledge-Based Systems and Applied Artificial Intelligence, 2002.
Questions
