A FAST CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM FOR
HIGH-DIMENSIONAL DATA
ABSTRACT:
Feature selection involves identifying a subset of the most useful features that produces
results compatible with those of the original, entire set of features. A feature selection
algorithm may be evaluated from both the efficiency and effectiveness points of view. Efficiency
concerns the time required to find a subset of features, while effectiveness concerns the quality
of that subset. Based on these criteria, a fast clustering-based feature selection
algorithm (FAST) is proposed and experimentally evaluated in this paper.
The FAST algorithm works in two steps.
In the first step, features are divided into clusters by using graph-theoretic clustering methods.
In the second step, the most representative feature that is strongly related to target classes is
selected from each cluster to form a subset of features.
Because features in different clusters are relatively independent, the clustering-based strategy
of FAST has a high probability of producing a subset of useful and independent features. To
ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method.
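The two steps described above can be illustrated with a minimal sketch. The abstract does not state the correlation measure, the edge weights, or the rule for splitting the tree, so everything below is an assumption made for illustration: features are discrete, correlation is measured by symmetric uncertainty (SU), the edge weight is 1 - SU, the tree is built with Prim's algorithm, and an edge is cut when its SU is lower than the SU of both endpoints with the class. The function names (`symmetric_uncertainty`, `prim_mst`, `select_features`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = identical."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def prim_mst(nodes, weight):
    """Edges of a minimum spanning tree of the complete graph on `nodes`."""
    if not nodes:
        return []
    in_tree, edges = {nodes[0]}, []
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree (simple O(n^3) Prim variant).
        u, v = min(
            ((a, b) for a in in_tree for b in nodes if b not in in_tree),
            key=lambda edge: weight(*edge),
        )
        in_tree.add(v)
        edges.append((u, v))
    return edges


def select_features(columns, target, relevance_threshold=0.0):
    """columns: dict of feature name -> list of discrete values."""
    relevance = {f: symmetric_uncertainty(col, target) for f, col in columns.items()}
    # Assumed pre-filter: drop features with no measurable relation to the class.
    nodes = [f for f in columns if relevance[f] > relevance_threshold]

    def su(a, b):
        return symmetric_uncertainty(columns[a], columns[b])

    # Step 1: build the tree on distance 1 - SU, then cut weakly correlated edges.
    tree = prim_mst(nodes, lambda a, b: 1.0 - su(a, b))
    kept = [(a, b) for a, b in tree
            if not (su(a, b) < relevance[a] and su(a, b) < relevance[b])]

    # Connected components of the remaining forest are the clusters (union-find).
    parent = {f: f for f in nodes}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in kept:
        parent[find(a)] = find(b)
    clusters = {}
    for f in nodes:
        clusters.setdefault(find(f), []).append(f)

    # Step 2: keep the feature most strongly related to the class in each cluster.
    return sorted(max(members, key=relevance.get) for members in clusters.values())


# Toy data: "a_noisy" is a corrupted copy of "a"; "constant" carries no information.
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_noisy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "constant": [0, 0, 0, 0, 0, 0, 0, 0],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]  # class = 2 * a + b
selected = select_features(columns, target)
print(selected)  # prints ['a', 'b']
```

In this toy run, `a` and `a_noisy` fall into one cluster and only `a` is kept, `b` forms its own cluster, and `constant` is dropped before clustering. The brute-force Prim loop is chosen for readability; a heap-based version would be the natural replacement for large feature sets.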
The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical
study. Extensive experiments are carried out to compare FAST with several representative feature
selection algorithms. The results, on 35 publicly available real-world high-dimensional image,
microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of
features but also improves the performance of the four types of classifiers used in the evaluation.
ECWAY TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE
CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111
VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com
