International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013
DOI : 10.5121/ijdms.2013.5203
PRIVACY PRESERVING DATA MINING BASED ON VECTOR QUANTIZATION
D. Aruna Kumari1, Dr. K. Rajasekhara Rao2, M. Suman3
1,3 Department of Electronics and Computer Engineering, Associate Professors, CSI Life Member, K.L. University, Vaddeswaram, Guntur
1 Aruna_D@kluniversity.in and 3 suman.maloji@gmail.com
2 Department of Computer Science and Engineering, Professor, K.L. University, Vaddeswaram, Guntur
2 krr@kluniversity.in
1,2,3 CSI Life Members, 3 CSI-AP Student Co-coordinator
ABSTRACT
Huge volumes of detailed personal data are continuously collected and analyzed by different types of
applications using data mining. Analysing such data benefits the application users: it is an important
asset for business organizations and governments in taking effective decisions. But
analysing such data opens threats to privacy if not done properly. This work aims to reveal useful information
while protecting sensitive data. Various methods, including randomization, k-anonymity and data hiding, have
been suggested for this purpose. In this work, a novel technique is suggested that makes use of the LBG design
algorithm to preserve the privacy of data along with compression of data. Quantization is performed
on the training data to produce a transformed data set. It provides individual privacy while allowing
extraction of useful knowledge from the data; hence privacy is preserved. Distortion measures are used to
analyze the accuracy of the transformed data.
KEYWORDS
Vector quantization, code book generation, privacy preserving data mining, k-means clustering.
1. INTRODUCTION
Privacy preserving data mining (PPDM) is an important area of data mining that aims to
protect secret information from unsolicited or unsanctioned disclosure. Data mining
techniques analyze data and predict useful information, and analyzing such data may open threats to
privacy. The concept of privacy preserving data mining is primarily concerned with protecting
secret data against unsolicited access. It is important because the threat to privacy is
now real: data mining techniques are able to predict highly sensitive knowledge from
huge volumes of data [1].
Agrawal & Srikant introduced the problem of “privacy preserving data mining”, and it
was also introduced by Lindell & Pinkas. Those papers concentrated on privacy preserving
data mining using randomization and cryptographic techniques. Lindell and Pinkas designed a new
approach to PPDM using cryptography, but the cryptographic solution does not provide the expected
accuracy in the mining result. Agrawal and Srikant focused on randomization and on preserving
privacy when the data is taken from multiple parties; privacy should be maintained even when the data comes from multiple
sources. Privacy preserving data mining
is becoming an area of focus because data mining predicts valuable information that
may be beneficial to business, education systems, the medical field, politics, etc.
2. RELATED WORK
Many data modification techniques are discussed in [1][3][4].
A. Perturbation or Randomization:
Agrawal and Srikant (2000) introduced the randomization algorithm for PPDM. Randomization
allows a large number of users to submit their sensitive data for effective centralized data
mining while limiting the disclosure of sensitive values. It is a relatively easy and effective
technique for protecting sensitive electronic data from unauthorized use. In this setting there is one
server and multiple clients. Clients are supposed to send their data to the server for
mining; in this approach each client adds some random noise to its data before sending it to the
server, so the server performs mining on the randomized data.
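The client-side perturbation step described above can be sketched as follows. This is only an illustration under stated assumptions: the uniform noise distribution, the `noise_scale` parameter, the function name `randomize` and the sample values are all hypothetical and are not taken from Agrawal and Srikant's paper, and the distribution-reconstruction step performed by the server is not shown.

```python
import random

def randomize(values, noise_scale, rng):
    """Return a perturbed copy of `values`: each value plus independent
    uniform noise drawn from [-noise_scale, +noise_scale]."""
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]

rng = random.Random(0)                            # fixed seed for reproducibility
client_values = [23, 35, 41, 29, 52, 38, 47, 31]  # hypothetical sensitive values
sent_to_server = randomize(client_values, noise_scale=10.0, rng=rng)

# The server only sees the perturbed values, but aggregates such as the
# mean stay close to the true ones because the noise has zero mean.
true_mean = sum(client_values) / len(client_values)
noisy_mean = sum(sent_to_server) / len(sent_to_server)
print(true_mean, noisy_mean)
```

A larger `noise_scale` hides individual values better but makes aggregates computed by the server less accurate.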
B. Suppression
Another way of preserving privacy is to suppress the sensitive data before any disclosure or
before the actual mining takes place. Data generally contains several attributes, some of
which may hold personal information while others predict valuable
information. So we can suppress, in a particular fashion, the attributes that reveal personal
information.
There are different types of suppression:
1. Rounding
2. Generalization
In the rounding process, values are rounded to a coarser precision: 23.56 is rounded to 24, 25.77 to
26, etc.
In the generalization process, values are replaced by more general ones; for example, an address is represented by its zip
code.
If data mining requires full access to the entire database, then these privacy
preserving data mining techniques are not required.
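A minimal sketch of rounding and generalization on a single record is given below. The record layout, the field names (`name`, `street`, `zip`, `score`) and the function `sanitize` are hypothetical and chosen only for illustration.

```python
def sanitize(record):
    """Suppress identifying attributes, generalize the address to its
    zip code, and round the numeric attribute to the nearest integer."""
    return {
        "zip": record["zip"],            # generalization: address -> zip code
        "score": round(record["score"])  # rounding: 23.56 -> 24
    }                                    # name and street are suppressed

record = {"name": "A. Person", "street": "12 Main Street",
          "zip": "522502", "score": 23.56}
released = sanitize(record)
print(released)
```

Only the generalized and rounded attributes are released; the name and street never leave the data owner.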
C. Cryptography
This is also one of the well-known approaches to data modification. Here the original data
is encrypted and the encrypted data is given to the data miners. If the data owners require the original
data back, they apply decryption techniques.
3. VECTOR QUANTIZATION
This is the new technique proposed by the authors (D. Aruna Kumari, Dr. K. Rajasekhara Rao, M. Suman); it
transforms the original data to a new form using LBG. The design of a vector quantizer
consists of the following steps:
• Design a codebook from the input training data set;
• Encode each original data point with the index of the nearest code vector in the
codebook;
• Use the index representation to reconstruct the data by looking it up in the codebook.
For our PPDM problem, reconstructing the original data is not required, so only the first two steps are
involved. It is then difficult to get the original data back; hence privacy is preserved.
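The encoding and lookup steps listed above can be illustrated with a small sketch. The two-vector codebook and the sample points are hypothetical, and the squared Euclidean distance is assumed as the nearest-neighbour criterion.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(data, codebook):
    """Replace each point by the index of its nearest code vector."""
    return [min(range(len(codebook)), key=lambda n: sqdist(p, codebook[n]))
            for p in data]

def decode(indices, codebook):
    """Look each index up in the codebook (lossy reconstruction)."""
    return [codebook[i] for i in indices]

codebook = [(1.0, 1.0), (5.0, 5.0)]                      # hypothetical codebook
data = [(0.9, 1.2), (1.1, 0.8), (4.7, 5.1), (5.3, 4.9)]  # hypothetical points
indices = encode(data, codebook)
released = decode(indices, codebook)
print(indices)   # [0, 0, 1, 1]
print(released)
```

Every point in a cell is mapped to the same code vector, so the exact original values cannot be recovered from the indices alone.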
LBG Design Algorithm
1. Given the training set $\mathcal{T} = \{x_1, x_2, \ldots, x_M\}$ of $k$-dimensional vectors. Fix $\epsilon > 0$ to be a "small" number.
2. Let $N = 1$ and
$$c_1^* = \frac{1}{M} \sum_{m=1}^{M} x_m .$$
Calculate
$$D_{ave}^* = \frac{1}{Mk} \sum_{m=1}^{M} \| x_m - c_1^* \|^2 .$$
3. Splitting: For $i = 1, 2, \ldots, N$, set
$$c_i^{(0)} = (1 + \epsilon)\, c_i^*, \qquad c_{N+i}^{(0)} = (1 - \epsilon)\, c_i^* .$$
Set $N = 2N$.
4. Iteration: Let $D_{ave}^{(0)} = D_{ave}^*$. Set the iteration index $i = 0$.
i. For $m = 1, 2, \ldots, M$, find the minimum value of $\| x_m - c_n^{(i)} \|^2$
over all $n = 1, 2, \ldots, N$. Let $n^*$ be the index which achieves the minimum. Set $Q(x_m) = c_{n^*}^{(i)}$.
ii. For $n = 1, 2, \ldots, N$, update the codevector
$$c_n^{(i+1)} = \frac{\sum_{Q(x_m) = c_n^{(i)}} x_m}{\sum_{Q(x_m) = c_n^{(i)}} 1} .$$
iii. Set $i = i + 1$.
iv. Calculate
$$D_{ave}^{(i)} = \frac{1}{Mk} \sum_{m=1}^{M} \| x_m - Q(x_m) \|^2 .$$
v. If $(D_{ave}^{(i-1)} - D_{ave}^{(i)}) / D_{ave}^{(i-1)} > \epsilon$, go back to Step (i).
vi. Set $D_{ave}^* = D_{ave}^{(i)}$. For $n = 1, 2, \ldots, N$, set $c_n^* = c_n^{(i)}$
as the final codevectors.
5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained.
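The splitting and iteration steps of the LBG design algorithm can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Matlab implementation. It assumes that the requested number of code vectors is a power of two, that the distortion in step iv is evaluated against the updated code vectors, and that a code vector with an empty cell is left unchanged. The function names and the sample data are hypothetical.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    k = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(k))

def avg_distortion(data, codebook, assign):
    """Average squared error per vector component, 1/(M*k) * sum ||x - Q(x)||^2."""
    k = len(data[0])
    total = sum(sqdist(x, codebook[n]) for x, n in zip(data, assign))
    return total / (len(data) * k)

def lbg(data, num_codevectors, eps=0.01):
    """Design a codebook; num_codevectors is assumed to be a power of two."""
    codebook = [centroid(data)]                            # step 2: N = 1
    d_prev = avg_distortion(data, codebook, [0] * len(data))
    while len(codebook) < num_codevectors:                 # step 5
        # Step 3: split every code vector into (1 + eps)c and (1 - eps)c.
        codebook = ([tuple((1 + eps) * c for c in cv) for cv in codebook] +
                    [tuple((1 - eps) * c for c in cv) for cv in codebook])
        while True:                                        # step 4
            # i. assign each training vector to its nearest code vector
            assign = [min(range(len(codebook)),
                          key=lambda n: sqdist(x, codebook[n])) for x in data]
            # ii. move each code vector to the centroid of its cell
            for n in range(len(codebook)):
                members = [x for x, a in zip(data, assign) if a == n]
                if members:                                # empty cell: keep as is
                    codebook[n] = centroid(members)
            # iv./v. stop when the relative improvement drops to eps or below
            d_cur = avg_distortion(data, codebook, assign)
            done = d_prev == 0 or (d_prev - d_cur) / d_prev <= eps
            d_prev = d_cur
            if done:
                break
    return codebook

data = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]   # hypothetical 1-D data
codebook = sorted(lbg(data, 2))
print(codebook)   # two code vectors, close to 1.0 and 5.0
```

Releasing only the code vectors (cluster centroids) in place of the individual points is what provides the privacy property discussed in this section.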
Once the codebook is generated, one can perform the transformation using quantization.
Results:
We have implemented the above LBG algorithm using Matlab software and tested the results. In the
output screenshots, the blue line represents the original data and the red line represents the codebook, which is a
compressed form of the original data; hence it does not reveal the complete original information and
reveals only the cluster centroids.
Figure 1: VQ based on LBG design algorithm, y=x
Figure 2: VQ based on LBG design algorithm, y=exp(x)
Figure 3: VQ based on LBG design algorithm, y=x^2
Figure 4: VQ based on LBG design algorithm, y=sin(x)
Backup and Recovery of Information Loss
In privacy preserving data mining, we generally apply some technique to modify the data, and
the modified data is given to the data miners.
In this paper we also concentrate on keeping the original data as it is, so that whenever the data miners or
owners of the data require the original data, they can get it from a backup copy of that
data.
1. Load all the data into a database DB1.
2. Copy all the data in DB1 to DB2 (DB1 = DB2).
3. Perform data modification using the LBG design algorithm and store the modified data in DB1.
4. Write a program to delete all the contents in the backup of DB1 (so that no one except the
users can access the information in DB1).
5. To see the actual details and to modify them, the admin should access the
information in DB2.
6. Thus the users know only DB1, while the database owners can access DB2.
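The DB1/DB2 scheme above can be sketched with Python's built-in `sqlite3` module. This is an illustration under several assumptions: two tables in one in-memory database stand in for the two databases, the `salary` column and its values are hypothetical, the codebook is fixed by hand where the scheme would obtain it from the LBG design algorithm, and step 4 (deleting other backups of DB1) is not shown.

```python
import sqlite3

def nearest(value, codebook):
    """Return the code value closest to `value` (scalar quantization)."""
    return min(codebook, key=lambda c: abs(value - c))

conn = sqlite3.connect(":memory:")

# Step 1: load the original data into DB1 (hypothetical table and values).
conn.execute("CREATE TABLE db1 (id INTEGER PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO db1 (id, salary) VALUES (?, ?)",
                 [(1, 1000.0), (2, 1200.0), (3, 5000.0), (4, 5200.0)])

# Step 2: copy DB1 to DB2, which keeps the original values for the owners.
conn.execute("CREATE TABLE db2 AS SELECT * FROM db1")

# Step 3: overwrite DB1 with quantized values.
codebook = [1100.0, 5100.0]   # hypothetical; would come from the LBG design
rows = conn.execute("SELECT id, salary FROM db1").fetchall()
conn.executemany("UPDATE db1 SET salary = ? WHERE id = ?",
                 [(nearest(s, codebook), i) for i, s in rows])

# Steps 5 and 6: users query DB1, owners query DB2.
user_view = conn.execute("SELECT salary FROM db1 ORDER BY id").fetchall()
owner_view = conn.execute("SELECT salary FROM db2 ORDER BY id").fetchall()
print(user_view)
print(owner_view)
```

In a real deployment the two copies would live in separate databases with separate access rights, so that ordinary users cannot reach DB2 at all.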
Bit Error Rate
In data transmission, the number of bit errors is the number of received bits of a data stream
over a communication channel that have been altered due to noise, interference, distortion or bit
synchronization errors. The bit error rate is the number of bit errors divided by the total number of bits.
In our problem, we transform the original data to another form using vector
quantization. Hence we need to calculate the bit error rate for the compressed data.
We always try to minimise the bit error rate for the sake of accuracy.
For example, the original data is
1 0 1 0 1 1 0 0 1 0
and after the transformation, the received data is
1 1 1 0 1 1 0 1 1 0
(There are two bit errors in ten bits, i.e. we do not receive the exact data: the bit error rate is 20% and only 80% accuracy
is achieved.)
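The worked example above can be checked with a few lines of Python. The function name `bit_error_rate` is chosen here for illustration only.

```python
def bit_error_rate(original, received):
    """Fraction of positions at which the two bit sequences differ."""
    if len(original) != len(received):
        raise ValueError("sequences must have the same length")
    errors = sum(o != r for o, r in zip(original, received))
    return errors / len(original)

original = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
received = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]

ber = bit_error_rate(original, received)
accuracy = 1 - ber
print(ber, accuracy)   # 0.2 0.8
```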
4. CONCLUSIONS
This work is based on vector quantization, a new approach for privacy preserving data
mining. After applying this encoding procedure, one cannot recover the original data; hence privacy
is preserved. At the same time, one can still obtain accurate clustering results. Finally, we would like to
conclude that the efficiency depends on the codebook generation.
REFERENCES
[1] D. Aruna Kumari, Dr. K. Rajasekhar Rao, M. Suman, “Privacy preserving distributed data mining using
steganography”, in Proc. of CNSA-2010, Springer Library.
[2] T. Anuradha, M. Suman, D. Aruna Kumari, “Data obscuration in privacy preserving data mining”, in Proc.
International Conference on Web Sciences, ICWS 2009.
[3] Agrawal, R. & Srikant, R. (2000). Privacy Preserving Data Mining. In Proc. of ACM SIGMOD
Conference on Management of Data (SIGMOD’00), Dallas, TX.
[4] Alexandre Evfimievski, Tyrone Grandison Privacy Preserving Data Mining. IBM Almaden Research
Center 650 Harry Road, San Jose, California 95120, USA
[5] Agarwal Charu C., Yu Philip S., Privacy Preserving Data Mining: Models and Algorithms, New
York, Springer, 2008.
[6] Oliveira S.R.M, Zaiane Osmar R., A Privacy-Preserving Clustering Approach Toward Secure and
Effective Data Analysis for Business Collaboration, In Proceedings of the International Workshop on
Privacy and Security Aspects of Data Mining in conjunction with ICDM 2004, Brighton, UK,
November 2004.
[7] Wang Qiang , Megalooikonomou, Vasileios, A dimensionality reduction technique for efficient time
series similarity analysis, Inf. Syst. 33, 1 (Mar.2008), 115- 132.
[8] UCI Repository of machine learning databases, University of California,
Irvine. http://archive.ics.uci.edu/ml/
[9] Wikipedia. Data mining. http://en.wikipedia.org/wiki/Data_mining
[10] Kurt Thearling, Information about data mining and analytic technologies. http://www.thearling.com/
[11] Flavius L. Gorgônio and José Alfredo F. Costa, “Privacy-Preserving Clustering on Distributed
Databases: A Review and Some Contributions”.
[12] D.Aruna Kumari, Dr.K.rajasekhar rao,M.Suman “Privacy preserving distributed data mining: a new
approach for detecting network traffic using steganography” in international journal of systems and
technology(IJST) june 2011.
[13] Binit kumar Sinha “Privacy preserving clustering in data mining”.
[14] C. W. Tsai, C. Y. Lee, M. C. Chiang, and C. S. Yang, A Fast VQ Codebook Generation Algorithm
via Pattern Reduction, Pattern Recognition Letters, vol. 30, pp. 653-660, 2009.
[15] K.Somasundaram, S.Vimala, “A Novel Codebook Initialization Technique for Generalized Lloyd
Algorithm using Cluster Density”, International Journal on Computer Science and Engineering, Vol.
2, No. 5, pp. 1807-1809, 2010.
[16] K.Somasundaram, S.Vimala, “Codebook Generation for Vector Quantization with Edge Features”,
CiiT International Journal of Digital Image Processing, Vol. 2, No.7, pp. 194-198, 2010.
[17] Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino State-of-the-art in Privacy Preserving Data
Mining in SIGMOD Record, Vol. 33, No. 1, March 2004.
[18] Maloji Suman,Habibulla Khan,M. Madhavi Latha,D. Aruna Kumari “Speech Enhancement and
Recognition of Compressed Speech Signal in Noisy Reverberant Conditions “ Springer -Advances in
Intelligent and Soft Computing (AISC) Volume 132, 2012, pp 379-386

More Related Content

PDF
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION
PDF
VECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MINING
PDF
Survey paper on Big Data Imputation and Privacy Algorithms
PDF
Reversible Data Hiding In Encrypted Images And Its Application To Secure Miss...
PDF
A Survey on Privacy-Preserving Data Aggregation Without Secure Channel
PDF
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
PDF
Dft based individual extraction of steganographic compression of images
PDF
Dft based individual extraction of steganographic compression of images
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION
VECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MINING
Survey paper on Big Data Imputation and Privacy Algorithms
Reversible Data Hiding In Encrypted Images And Its Application To Secure Miss...
A Survey on Privacy-Preserving Data Aggregation Without Secure Channel
IRJET- Top-K Query Processing using Top Order Preserving Encryption (TOPE)
Dft based individual extraction of steganographic compression of images
Dft based individual extraction of steganographic compression of images

What's hot (20)

PDF
Privacy preserving and delegated access control for cloud applications
DOCX
High performance intrusion detection using modified k mean & naïve bayes
PDF
PRIVACY PRESERVING DATA MINING BY USING IMPLICIT FUNCTION THEOREM
PDF
Frequency and similarity aware partitioning for cloud storage based on space ...
PDF
Framework to Avoid Similarity Attack in Big Streaming Data
PDF
A Survey on Different Data Hiding Techniques in Encrypted Images
PDF
Histogram-based multilayer reversible data hiding method for securing secret ...
PDF
Performance Analysis of Hashing Mathods on the Employment of App
PDF
LOSSLESS RECONSTRUCTION OF SECRET IMAGE USING THRESHOLD SECRET SHARING AND TR...
PDF
Protecting Data by Improving Quality of Stego Image based on Enhanced Reduced...
PDF
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System
PDF
Paper id 212014109
PDF
Jj3616251628
PDF
Additive gaussian noise based data perturbation in multi level trust privacy ...
PDF
Privacy Preserving Data Mining Technique to Recover Association Rules Using H...
PDF
A Review on Reversible Data Hiding Scheme by Image Contrast Enhancement
PDF
Fast Range Aggregate Queries for Big Data Analysis
PDF
1699 1704
DOCX
Implementation of digital image watermarking techniques using dwt and dwt svd...
PDF
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
Privacy preserving and delegated access control for cloud applications
High performance intrusion detection using modified k mean & naïve bayes
PRIVACY PRESERVING DATA MINING BY USING IMPLICIT FUNCTION THEOREM
Frequency and similarity aware partitioning for cloud storage based on space ...
Framework to Avoid Similarity Attack in Big Streaming Data
A Survey on Different Data Hiding Techniques in Encrypted Images
Histogram-based multilayer reversible data hiding method for securing secret ...
Performance Analysis of Hashing Mathods on the Employment of App
LOSSLESS RECONSTRUCTION OF SECRET IMAGE USING THRESHOLD SECRET SHARING AND TR...
Protecting Data by Improving Quality of Stego Image based on Enhanced Reduced...
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System
Paper id 212014109
Jj3616251628
Additive gaussian noise based data perturbation in multi level trust privacy ...
Privacy Preserving Data Mining Technique to Recover Association Rules Using H...
A Review on Reversible Data Hiding Scheme by Image Contrast Enhancement
Fast Range Aggregate Queries for Big Data Analysis
1699 1704
Implementation of digital image watermarking techniques using dwt and dwt svd...
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
Ad

Similar to PRIVACY PRESERVING DATA MINING BASED ON VECTOR QUANTIZATION (20)

PDF
A Review on Privacy Preservation in Data Mining
PDF
A Review on Privacy Preservation in Data Mining
PDF
A Review on Privacy Preservation in Data Mining
PDF
A review on privacy preservation in data mining
PDF
Privacy Preserving Mining in Code Profiling Data
PDF
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)
PDF
Data attribute security and privacy in Collaborative distributed database Pub...
PDF
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
PDF
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
DOCX
High performance intrusion detection using modified k mean & naïve bayes
PDF
Comparison Between WEKA and Salford System in Data Mining Software
PDF
IRJET - A Secure AMR Stganography Scheme based on Pulse Distribution Mode...
PDF
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
DOCX
Seminar Report Vaibhav
PDF
IRJET - Security Model for Preserving the Privacy of Medical Big Data in ...
PDF
Different Classification Technique for Data mining in Insurance Industry usin...
PDF
Detecting Password brute force attack and Protecting the cloud data with AES ...
PDF
ACTIVITY SPOTTER DURING MEDICAL TREATMENT USING VISUAL CRYPTOGRAPHY TECHNIQUE
PDF
Ej24856861
PDF
Distributed Framework for Data Mining As a Service on Private Cloud
A Review on Privacy Preservation in Data Mining
A Review on Privacy Preservation in Data Mining
A Review on Privacy Preservation in Data Mining
A review on privacy preservation in data mining
Privacy Preserving Mining in Code Profiling Data
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)
Data attribute security and privacy in Collaborative distributed database Pub...
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
High performance intrusion detection using modified k mean & naïve bayes
Comparison Between WEKA and Salford System in Data Mining Software
IRJET - A Secure AMR Stganography Scheme based on Pulse Distribution Mode...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
Seminar Report Vaibhav
IRJET - Security Model for Preserving the Privacy of Medical Big Data in ...
Different Classification Technique for Data mining in Insurance Industry usin...
Detecting Password brute force attack and Protecting the cloud data with AES ...
ACTIVITY SPOTTER DURING MEDICAL TREATMENT USING VISUAL CRYPTOGRAPHY TECHNIQUE
Ej24856861
Distributed Framework for Data Mining As a Service on Private Cloud
Ad

Recently uploaded (20)

PDF
Heart disease approach using modified random forest and particle swarm optimi...
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
TLE Review Electricity (Electricity).pptx
PDF
Mushroom cultivation and it's methods.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
cloud_computing_Infrastucture_as_cloud_p
PDF
Machine learning based COVID-19 study performance prediction
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
OMC Textile Division Presentation 2021.pptx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Machine Learning_overview_presentation.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
Heart disease approach using modified random forest and particle swarm optimi...
Programs and apps: productivity, graphics, security and other tools
MIND Revenue Release Quarter 2 2025 Press Release
TLE Review Electricity (Electricity).pptx
Mushroom cultivation and it's methods.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
cloud_computing_Infrastucture_as_cloud_p
Machine learning based COVID-19 study performance prediction
Diabetes mellitus diagnosis method based random forest with bat algorithm
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Building Integrated photovoltaic BIPV_UPV.pdf
A comparative study of natural language inference in Swahili using monolingua...
Advanced methodologies resolving dimensionality complications for autism neur...
OMC Textile Division Presentation 2021.pptx
Per capita expenditure prediction using model stacking based on satellite ima...
Reach Out and Touch Someone: Haptics and Empathic Computing
Machine Learning_overview_presentation.pptx
Network Security Unit 5.pdf for BCA BBA.

PRIVACY PRESERVING DATA MINING BASED ON VECTOR QUANTIZATION

  • 1. International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 DOI : 10.5121/ijdms.2013.5203 29 PRIVACY PRESERVING DATA MINING BASED ON VECTOR QUANTIZATION D.Aruna Kumari1 , Dr.K.Rajasekhara Rao2 , M.Suman3 1,3 Department of Electronics and Computer Engineering, Associate professors ,CSI Life Member K.L.University, Vaddeswaram,Guntur 1 [email protected] and 3 [email protected] 2 Department of Computer Science and Engineering ,professor, K.L.University, Vaddeswaram,Guntur 2 [email protected] 1,2,3 CSI LIFE MEMEBERS, 3 CSI-AP Student CO-coordinator ABSTRACT Huge Volumes of detailed personal data is continuously collected and analyzed by different types of applications using data mining, analysing such data is beneficial to the application users. It is an important asset to application users like business organizations, governments for taking effective decisions. But analysing such data opens treats to privacy if not done properly. This work aims to reveal the information by protecting sensitive data. Various methods including Randomization, k-anonymity and data hiding have been suggested for the same. In this work, a novel technique is suggested that makes use of LBG design algorithm to preserve the privacy of data along with compression of data. Quantization will be performed on training data it will produce transformed data set. It provides individual privacy while allowing extraction of useful knowledge from data, Hence privacy is preserved. Distortion measures are used to analyze the accuracy of transformed data. KEYWORDS Vector quantization, code book generation, privacy preserving data mining ,k-means clustering. 1. INTRODUCTION Privacy preserving data mining (PPDM) is one of the important area of data mining that aims to provide security for secret information from unsolicited or unsanctioned disclosure. Data mining techniques analyzes and predicts useful information. 
Analyzing such data may opens treat to privacy .The concept of privacy preserving data mining is primarily concerned with protecting secret data against unsolicited access. It is important because Now a days Treat to privacy is becoming real since data mining techniques are able to predict high sensitive knowledge from huge volumes of data[1]. Authors Agrawal & Srikant introduced the problem of “privacy preserving data mining” and it was also introduced by Lindell & Pinkas. Those papers have concentrated on privacy preserving data mining using randomization and cryptographic techniques. Lindell and Pinkas designed new approach to PPDM using Cryptography but cryptography solution does not provides expected
  • 2. International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 30 accuracy after mining result. And agrawal and srikanth focused on randomization and preserving privacy when the data is taken from multiple parties. When the data is coming from multiple sources then also privacy should be maintained. Now a days this privacy preserving data mining is becoming one of the focusing area because data mining predicts more valuable information that may be beneficial to the business, education systems, medical field, political ,…etc. 2. RELATED WORK Many Data modification techniques are discussed in [1 ][3][4] A. Perturbation or Randomization: Agrawal and Srikant (2000) Introduced the randomization algorithm for PPDM, Randomization allows a several number of users to submit their sensitive data for effective centralized data mining while limiting the disclosure of sensitive values. It is relatively easy and effective technique for protecting sensitive electronic data from unauthorized use. In this case there is one server and multiple clients will operate ,Clients are supposed to send their data to server for mining purpose , in this approach each client adds some random noise before sending it to the server. So Sever will perform mining on that randomized data. B. Suppression Another way of preserving the privacy is suppressing the sensitive data before any disclosure or before actual mining takes place. Generally Data contains several attributes, where some of the attributes may poses personal information and some of the attributes predicts valuable information. So we can suppress the attributes in particular fashion that reveals the personal information. They are different types are there 1. Rounding 2. Generalization In rounding process the values like 23.56 will be rounded to 23 and 25.77 rounded to 26,…etc In generalization process, values will be generalized like an address is represented with zip code. 
if data mining requires full access to the entire database at that time all this privacy preserving data mining techniques are not required. C. Cryptography This is Also one of the famous approach for data modification techniques, Here Original Data will be encrypted and encrypted data will be given to data miners. If data owners require original data back they will apply decryption techniques. 3. VECTOR QUANTIZATION This is the new technique proposed by (D.Aruna Kumari, Dr.k.Rajasekhara Rao, Suman,)), it transforms the original data to a new form using LBG. The design of a Vector Quantization consist of following steps:
  • 3. International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 • Design a codebook from input training data set; • Encoding the original point of data with the indices of the nearest code vectors in the codebook; • Use index representation so as to reconstruct the data by looking up in the codebook. For our PPDM problem , reconstructing the original data is not required, involved such that it is difficult to get the original data back hence privacy is preserved. LBG Design Algorithm 1. Given . Fixed 2. Let and Calculate 3. Splitting: For Set . 4. Iteration: Let i. For over all . Let ii. For iii. Set . iv. Calculate International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 Design a codebook from input training data set; ding the original point of data with the indices of the nearest code vectors in the Use index representation so as to reconstruct the data by looking up in the codebook. For our PPDM problem , reconstructing the original data is not required, so above two steps are involved such that it is difficult to get the original data back hence privacy is preserved. to be a ``small'' number. , set . Set the iteration index . , find the minimum value of . Let be the index which achieves the minimum. Set , update the codevector . International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 31 ding the original point of data with the indices of the nearest code vectors in the Use index representation so as to reconstruct the data by looking up in the codebook. so above two steps are involved such that it is difficult to get the original data back hence privacy is preserved. be the index which achieves the minimum. Set
  • 4. International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 v. If vi. Set as the final codevectors. 5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained. codebook is generated, one can perform transformation using quantization Results : We have implemented above LBG algorithm using Matlab output screen shots Blue line represents original data and red line represents Codebook that is compressed form of original data , hence it does not reveal the complete original information and it will reveal only cluster centroids Figure 1: VQ based on LBG design Figure 2: VQ based on LBG design Algorithm y=x; Algorithm y=exp(x); Figure 3: VQ based on LBG design Algorithm y=x2; y=sin(x); International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 , go back to Step (i). . For , set as the final codevectors. Repeat Steps 3 and 4 until the desired number of codevectors is obtained. codebook is generated, one can perform transformation using quantization We have implemented above LBG algorithm using Matlab Software, and tested the results. In the output screen shots Blue line represents original data and red line represents Codebook that is compressed form of original data , hence it does not reveal the complete original information and luster centroids. Figure 1: VQ based on LBG design Figure 2: VQ based on LBG design Algorithm y=x; Figure 3: VQ based on LBG design Algorithm Figure 4: VQ based on LBG design Algorithm International Journal of Database Management Systems ( IJDMS ) Vol.5, No.2, April 2013 32 Repeat Steps 3 and 4 until the desired number of codevectors is obtained. Once the Software, and tested the results. 
In the output screen shots Blue line represents original data and red line represents Codebook that is compressed form of original data , hence it does not reveal the complete original information and Figure 1: VQ based on LBG design Figure 2: VQ based on LBG design Algorithm y=x; Algorithm
  • 5. Backup and Recovery of Information Loss
In privacy preserving data mining, the data is generally modified by some technique and the modified data is given to data miners. In this paper we also keep the original data intact, so that whenever data miners or owners of the data require the original, they can obtain it from a backup copy.
1. Load all the data into DB1.
2. Copy all the data in DB1 to DB2 (DB1=DB2).
3. Perform data modification using the LBG design algorithm and store the modified data in DB1.
4. Write a program to delete all the contents in the backup of DB1, so that users see only the modified information in DB1.
5. To view or modify the actual details, the admin accesses the information in DB2.
6. Thus users know only DB1, while database owners have access to DB2.
Bit Error Rate
In data transmission, the number of bit errors is the number of received bits of a data stream over a communication channel that have been altered due to noise, interference, distortion or bit synchronization errors. The bit error rate is the number of bit errors divided by the total number of bits transferred. In our problem, we transform the original data into another form using vector quantization, so we need to calculate the bit error rate for the compressed data. We always try to minimise the bit error rate to preserve accuracy.
For example, if the original data is 1 0 1 0 1 1 0 0 1 0 and the data received after the transformation is 1 1 1 0 1 1 0 1 1 0, there are two bit errors (the second and eighth bits). The bit error rate is therefore 2/10 = 20%, and only 80% accuracy is achieved.
4. CONCLUSIONS
This work is based on vector quantization, a new approach to privacy preserving data mining. Once this encoding procedure is applied, the original data cannot be revealed, hence privacy is preserved. At the same time, accurate clustering results can still be obtained.
Finally, we conclude that efficiency depends on the codebook generation.
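The bit error rate example given in the Bit Error Rate section can be checked with a short Python sketch. The function name `bit_error_rate` and the list representation of the bit streams are assumptions for illustration. The two bit sequences are the ones from the text.

```python
def bit_error_rate(original, received):
    """Fraction of positions at which the received bits differ from the original."""
    if len(original) != len(received):
        raise ValueError("bit streams must have the same length")
    errors = sum(o != r for o, r in zip(original, received))
    return errors / len(original)

# Bit streams from the worked example in the text.
original = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
received = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]

# Positions (1-based) where the two streams disagree.
error_positions = [i + 1 for i, (o, r) in enumerate(zip(original, received)) if o != r]
ber = bit_error_rate(original, received)
accuracy = 1 - ber

print("bit errors at positions:", error_positions)
print("bit error rate:", ber, "accuracy:", accuracy)
```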