The International Journal Of Engineering And Science (IJES)
||Volume||2 ||Issue|| 6 ||Pages||01-04||2013||
ISSN(e): 2319 – 1813 ISSN(p): 2319 – 1805
www.theijes.com The IJES Page 1
Increasing the Efficiency of Credit Card Fraud Detection using
Attribute Reduction
Geetha Mary A, Arun Kodnani, Harshit Singhal, Swati Kothari
School of Computing Science and Engineering, VIT University, Vellore, Tamil Nadu, India
----------------------------------------------------------ABSTRACT------------------------------------------------------------
The detection of fraudulent credit card usage is of immense importance for banks as well as for card users, and
it requires highly efficient techniques to increase the chance of correctly classifying each transaction as fraudulent or
genuine. One of the techniques used to perform this classification is the decision tree. Attribute reduction, guided by
entropy, is used to increase the efficiency of this technique.
INDEX TERMS: Data Mining, Decision Tree, Data Cleaning, Attribute Reduction, Entropy
---------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: 15 May 2013                     Date of Publication: 10 June 2013
-----------------------------------------------------------------------------------------------------------------------------------------
I. INTRODUCTION
In today's economy, credit cards hold utmost importance in many sectors. Credit card fraud detection is
a topic applicable to many industries, including the banking and financial sectors, insurance, government
agencies, etc. Fraudulent transactions are a significant problem, and one that will grow in importance as the number
of access points grows. Transactions involving known illegal use are, of course, not authorised.
Nevertheless, there are transactions which appear to be valid but which experienced people can tell are
probably fraudulent, arising from stolen cards or fake merchants. The task, therefore, is to flag a fraudulent credit card
transaction before it is confirmed as illegal.
This paper deals with the problems specific to this data mining application and addresses them by
performing data cleaning [3] through attribute reduction and then applying the decision tree technique to improve
the accuracy of the output.
II. METHODOLOGY
Decision trees classify data into discrete classes using tree-structured algorithms. The main
purpose of a decision tree is to expose the structural information contained in the data.
A decision tree is built by tentatively selecting an attribute to place at the root node and making one branch for
each possible value of that attribute [1]. The data set at the root node is thus split and moves into daughter nodes,
producing a partial tree. An assessment is then made of the quality of the split. This process is repeated for all
the attributes: each attribute chosen for splitting produces a partial tree, and depending on the quality of the partial
trees, one is selected. This effectively means selecting an attribute for splitting. The process is repeated
for the data in each daughter node of the selected partial tree. If at any point all instances at a node have the
same classification, that part of the tree is not developed further.
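The recursive procedure described above can be sketched in Python. This is a minimal, illustrative ID3-style builder, not the exact algorithm of [1]; the toy records and attribute names are chosen for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(records, attributes, class_key):
    """Pick the split whose daughter nodes have the lowest weighted entropy."""
    def weighted_entropy(attr):
        groups = {}
        for r in records:
            groups.setdefault(r[attr], []).append(r[class_key])
        return sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return min(attributes, key=weighted_entropy)

def build_tree(records, attributes, class_key="class"):
    labels = [r[class_key] for r in records]
    # Pure node (or no attributes left): stop and emit a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(records, attributes, class_key)
    branches = {}
    for r in records:
        branches.setdefault(r[attr], []).append(r)
    rest = [a for a in attributes if a != attr]
    return {attr: {v: build_tree(g, rest, class_key) for v, g in branches.items()}}

records = [
    {"housing": "own",  "job": "skilled",   "class": "good"},
    {"housing": "rent", "job": "skilled",   "class": "good"},
    {"housing": "rent", "job": "unskilled", "class": "bad"},
]
print(build_tree(records, ["housing", "job"]))
# → {'job': {'skilled': 'good', 'unskilled': 'bad'}}
```

Here `job` wins the root because it splits the toy records into two pure daughter nodes, so the recursion stops immediately below it.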
The assessment is made depending upon the purity of the daughter nodes produced; the most widely
used measure of this purity is information entropy [1].
Entropy is a concept that originated in thermodynamics and later found its way into information theory. In the
decision tree construction process, the definition of entropy as a measure of disorder fits well. If the records
in a node are divided equally among the possible class values, entropy is at its
maximum. If all records in a node share the same class value, entropy is at its minimum.
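These two extremes can be checked with a few lines of Python, a sketch using the usual Shannon entropy in bits:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Records split equally between the classes: entropy is at its maximum (1 bit).
print(entropy(["good", "bad", "good", "bad"]))    # → 1.0
# All records share one class: entropy is at its minimum (zero bits).
print(entropy(["good", "good", "good", "good"]))  # → -0.0, i.e. zero
```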
Through splitting, nodes should be made as pure as possible; this corresponds to reducing the
entropy of the system. However, this is not as simple as it sounds, since nothing prevents the training
set from containing two examples with identical sets of attribute values but different classes. To overcome this
problem, the concept of information gain is used. In addition to information entropy, information
gain [1] also takes into account the number and size of the daughter nodes into which an attribute
splits the data set.
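As a sketch, information gain can be computed as the parent node's entropy minus the size-weighted entropy of the daughter nodes; the toy `housing` attribute and records below are illustrative, not from the paper's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, class_key="class"):
    """Entropy reduction from splitting on `attribute`, with each
    daughter node weighted by its share of the records."""
    parent = entropy([r[class_key] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[class_key])
    weighted = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return parent - weighted

records = [
    {"housing": "own",  "class": "good"},
    {"housing": "own",  "class": "good"},
    {"housing": "rent", "class": "bad"},
    {"housing": "rent", "class": "good"},
]
print(information_gain(records, "housing"))  # ≈ 0.311 bits
```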
III. APPROACH
During the building of the decision tree, there are cases in which an attribute claims the root position
of a certain partial tree on the basis of information gain and entropy while having the same value for the
majority of the records. Although such an attribute may appear to be an appropriate candidate for
that position, its value barely varies over the entire dataset, so considering it for classification
reduces the accuracy of the algorithm.
Thus, such unnecessary attributes are removed first, and then the decision tree algorithm is applied. This is
demonstrated through the following analysis.
Architecture Diagram
(Module here represents our approach of attribute reduction)
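A minimal Python sketch of such an attribute-reduction module is given below. The thresholds are illustrative assumptions of this sketch, not values fixed by the paper.

```python
from collections import Counter

def reduce_attributes(records, attributes, max_threshold=0.70, val_threshold=3):
    """Drop attributes whose most frequent value covers at least
    `max_threshold` of the records AND whose number of distinct values
    is at most `val_threshold`; keep everything else.
    Both thresholds are illustrative assumptions."""
    kept = []
    for attr in attributes:
        values = [r[attr] for r in records]
        val = len(set(values))                                        # VAL
        max_pct = Counter(values).most_common(1)[0][1] / len(values)  # MAX
        if max_pct >= max_threshold and val <= val_threshold:
            continue  # near-constant attribute: remove it before building the tree
        kept.append(attr)
    return kept

purposes = ["car", "tv", "radio", "education", "repairs",
            "business", "car", "tv", "radio", "car"]
records = [{"foreign_worker": "yes" if i < 9 else "no", "purpose": p}
           for i, p in enumerate(purposes)]
print(reduce_attributes(records, ["foreign_worker", "purpose"]))  # → ['purpose']
```

In the toy data, `foreign_worker` is "yes" for 90% of the records with only two distinct values, so it is dropped, while the varied `purpose` attribute survives.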
IV. ANALYSIS USING WEKA AND ORANGE
1) Dataset description:
The experimental data considered for analysis contains 1000 records, each consisting of values for
attributes such as over draft, credit usage, existing credits, number of dependents, employment of the user, and more;
finally, there is a class label which classifies the transaction based on these values into two classes,
good and bad, indicating whether the transaction was legal or illegal. A symbolic field can contain as few as two
values (e.g. the kind of credit card) or up to several hundred thousand values (as the code) [2].
Note that the dataset used is a sample dataset, as an actual credit card dataset cannot be accessed:
it is the protected property of the banks or other concerned financial organisations.
The idea proposed in this paper is valid for any classification algorithm, but for the sake of analysis only
J48 (an implementation based on C4.5) is used to show the results.
ATTRIBUTE              VAL   MAX (%)
over_draft               4    39.4
avg_credit_balance       5    60.3
other_parties            3    90.7
personal_status          5    54.8
Housing                  3    71.3
own_telephone            2    59.6
foreign_worker           2    96.3
other_payment_plans      3    81.4
Employment               5    33.9
Credit_History           5    53
Property_magnitude       4    33.2
Purpose                 11    28
Job                      4    63
(Table 1)
VAL: the number of distinct values of the attribute
MAX: the percentage share of the most frequent value of the attribute
Snapshot of attribute statistics for the attribute over_draft using ORANGE:
Snapshot of attribute statistics for the attribute foreign_worker using ORANGE:
Using the attribute statistics widget in the ORANGE software, the maximum percentage of occurrence of a
value in an attribute is obtained. For instance, for over_draft the MAX% is very low, so the attribute should not
be removed, whereas for foreign_worker the value of VAL is very low and that of MAX is very
high; hence this attribute will decrease the accuracy of the output and should be removed.
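The same VAL and MAX statistics can also be computed outside ORANGE; a small Python sketch follows, where the toy column merely mimics the foreign_worker figures from Table 1.

```python
from collections import Counter

def attribute_stats(values):
    """VAL (number of distinct values) and MAX (percentage share of the
    most frequent value) for one attribute column."""
    counts = Counter(values)
    val = len(counts)
    max_pct = 100.0 * counts.most_common(1)[0][1] / len(values)
    return val, round(max_pct, 1)

column = ["yes"] * 96 + ["no"] * 4   # toy column resembling foreign_worker
print(attribute_stats(column))       # → (2, 96.0)
```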
2) Comparison of different classification results done on the dataset:
All the following tests were performed with the help of WEKA software:
Test 1 - The J48 algorithm (an extension of C4.5) is applied to the dataset without
making any changes to its attributes.
Test 2 - The foreign_worker attribute is removed (as it has the highest value in the MAX column)
and then the same algorithm is applied to compute the result.
Test 3 - The other_payment_plans attribute is removed from the original dataset (note that this
test is independent of the previous one, i.e. the foreign_worker attribute is not removed) and then the same
algorithm is applied to compute the result.
Test 4 - Similarly, the other_parties attribute is removed (without changing any other attribute)
and then the same algorithm is applied to compute the result.
Test 5 - The housing attribute is removed (without changing any other attribute) and then the
same algorithm is applied to compute the result.
Test 6 - The foreign_worker, other_payment_plans, other_parties and housing attributes are all
removed (without changing any other attribute) and then the same algorithm is applied to compute the
result.
After each removal, the decision tree algorithm is applied; the following table contains the
outcomes of the six tests.
Test     % of correctly classified instances using the J48 algorithm
Test 1   70.5
Test 2   72.0
Test 3   71.0
Test 4   72.1
Test 5   71.0
Test 6   72.9
(Table 2)
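The tests above were run in WEKA. As a rough, runnable analogue (an assumption of this sketch, not the paper's setup), the code below uses scikit-learn's DecisionTreeClassifier, a CART-style tree rather than C4.5/J48, on synthetic categorical data to compare cross-validated accuracy with and without a near-constant attribute; all attribute names and values here are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
data = {
    "over_draft": rng.choice(["<0", "0<=X<200", ">=200", "no checking"], n),
    "foreign_worker": rng.choice(["yes", "no"], n, p=[0.96, 0.04]),  # near-constant
}
# Synthetic labels loosely tied to over_draft so the tree has some signal.
labels = np.where(data["over_draft"] == "no checking",
                  "good", rng.choice(["good", "bad"], n))

def score(columns):
    """10-fold cross-validated accuracy on the chosen attribute subset."""
    X = OrdinalEncoder().fit_transform(np.column_stack([data[c] for c in columns]))
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X, labels, cv=10).mean()

print("all attributes:", round(score(["over_draft", "foreign_worker"]), 3))
print("reduced set:   ", round(score(["over_draft"]), 3))
```

The paper's protocol is the same loop in spirit: score the full attribute set, then each reduced set, and compare the percentages of correctly classified instances.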
Snapshot of the partial decision tree after Test 1 using WEKA:
Snapshot of the partial decision tree after Test 6 using WEKA:
(Comparison of Test Case 1 with all the test cases)
V. CONCLUSION
From the results shown in Table 2, it can be seen that the percentage of correctly classified
instances of the J48 algorithm increases whenever an attribute with a very high percentage of occurrence of a
single value (i.e. a high MAX value in Table 1) and a very low number of distinct values (i.e. a low VAL
value in Table 1) is removed. This also means that even if the
value of MAX is considerably high, the attribute should not be removed if the value of VAL is also high. In the
last test, when all four attributes satisfying this criterion were deleted, the percentage of correctly classified
instances jumped from 70.5% (the result of classification without deleting any
attribute) to 72.9%. This is an increase of 2.4 percentage points, which is significant considering the sensitivity of the
application and its impact on preventing major monetary loss.
Hence, by removing such attributes, the accuracy of the classification can be increased, resulting in
a better and more efficient way to identify the fraudulent transactions among all the credit card transactions
specified in the dataset.
REFERENCES
[1].
[2]. K.P. Soman, Shyam Diwakar, V. Ajay, "Insight into Data Mining: Theory and Practice", Prentice-Hall of
India, 2006.
[3]. R. Brause, T. Langsdorf, M. Hepp, "Neural Data Mining for Credit Card Fraud Detection", Frankfurt,
Germany.
[4]. Dipti Thakur, Shalini Bhatia, "Distributive Data Mining Approach to Credit Card Fraud Detection",
SPIT-IEEE Colloquium and International Conference, Mumbai, India.