International Journal of Trend in Scientific Research and Development (IJTSRD)
International Open Access Journal
ISSN No: 2456-6470 | www.ijtsrd.com | Volume 2 | Issue 3 | Mar-Apr 2018
Outlier Detection using Reverse Nearest Neighbor for Unsupervised Data
V. V. R. Manoj, V. Aditya Rama Narayana, A. Lakshmi Prasanna, A. Bhargavi, Md. Aakhila Bhanu
1 Assistant Professor,
Dhanekula Institute of Engineering and Technology, Ganguru, Vijayawada, Andhra Pradesh, India
ABSTRACT
Data mining has become one of the most popular new technologies and
has gained a lot of attention in recent times. With its growing
popularity and usage come a number of problems, one of which is
outlier detection: handling data that does not follow the expected
patterns in a dataset. To distinguish outliers from normal behavior,
detection techniques rely on key assumptions. We present the reverse
nearest neighbor technique. There is a connection between hubs,
antihubs, outliers, and the existing unsupervised detection methods.
With the KNN method, it is possible to identify outliers and to
examine the influence of the antihub methods on real-life and
synthetic datasets. From this, we provide insight into the role of
the reverse neighbor count in unsupervised outlier detection.
Keywords: Reverse nearest neighbor; Outlier detection
INTRODUCTION:
Outliers are extreme values that differ from the other observations
in the data; they may indicate variability in measurement or
experimental error. That is, an outlier is an observation that
separates from an overall pattern. Outliers can be divided into two
types: univariate and multivariate. Univariate outliers can be found
among the values of a single feature. Multivariate outliers can be
found in a multi-dimensional space. Identifying outliers in
multi-dimensional distributions can be very difficult for the human
brain, which is why we need to train a model to do it for us.
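To make the distinction between univariate and multivariate outliers concrete, the following sketch (an illustration, not part of the original study) builds a synthetic two-feature dataset and adds one point whose individual values are ordinary but whose combination breaks the correlation between the features. The z-score check, the Mahalanobis-distance check, the threshold of 3, and all names are assumptions chosen for the example.

```python
import numpy as np

def zscore_flags(x, threshold=3.0):
    """Univariate check: flag values far from the mean of one feature."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

def mahalanobis_flags(X, threshold=3.0):
    """Multivariate check: flag rows far from the mean, accounting for correlation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared Mahalanobis distance of every row to the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(d2) > threshold

# Synthetic data (assumption): two strongly correlated features.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])
# Hypothetical test point: each coordinate is ordinary, the combination is not.
X = np.vstack([X, [1.5, -1.5]])

univariate_hit = zscore_flags(X[:, 0])[-1] or zscore_flags(X[:, 1])[-1]
multivariate_hit = mahalanobis_flags(X)[-1]
print("flagged by a single-feature check:", bool(univariate_hit))
print("flagged by the multivariate check:", bool(multivariate_hit))
```

The added point passes both single-feature checks and is caught only when the two features are considered jointly, which is the situation that is hard to spot by inspection.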
As the rate of events decreases relative to the parameters present,
the expected background data is much smaller than the prediction of
the Higgs theorem.
Outlier detection methods can be divided into three categories:
supervised, semi-supervised, and unsupervised, depending on the
availability of labels for outliers. Of these categories,
unsupervised methods are the most widely used, because the other
categories require accurate and representative labels that are
expensive to obtain. Unsupervised methods include distance-based
methods, which depend on a measure of distance or
similarity to detect outliers. It is well known that, under the curse
of dimensionality, distance becomes less meaningful: pairwise
distances become increasingly hard to tell apart as dimensionality
increases. Even so, distance-based unsupervised outlier detection can
still be effective in high-dimensional settings.
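The loss of contrast between distances can be observed numerically. The sketch below is an illustration under stated assumptions (uniformly random synthetic points, Euclidean distance, and a hypothetical helper named relative_contrast); it compares the nearest and farthest distances from a random query point in a low-dimensional and a high-dimensional space.

```python
import numpy as np

def relative_contrast(dim, n_points=1000, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random query
    point to n_points uniformly random points in the unit cube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

low = relative_contrast(dim=2)
high = relative_contrast(dim=1000)
print(f"relative contrast in 2 dimensions:    {low:.2f}")
print(f"relative contrast in 1000 dimensions: {high:.2f}")
```

In the high-dimensional case the farthest point is only slightly farther away than the nearest one, so a raw distance threshold separates points much less clearly than it does in two dimensions.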
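Outlier Detection from Antihubs:
Antihubs are found by searching the nearest-neighbor lists of the
data points: they are the points that rarely appear among the nearest
neighbors of other points. Outliers can be detected using
distance-based methods; for example, the KNN method identifies the
last (k-th) nearest neighbor of each point.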
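Both ideas can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it computes, for a small hypothetical dataset, the distance from each point to its k-th nearest neighbor (a common distance-based outlier score) and the reverse neighbor count, that is, how many other points list a given point among their k nearest neighbors. The function names, the brute-force distance computation, and the toy data are assumptions made for the example.

```python
import numpy as np

def knn_indices_and_distances(X, k):
    """Brute-force k nearest neighbors of every row, excluding the row itself."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, idx, axis=1)
    return idx, knn_dist

def kth_neighbor_distance(X, k):
    """Distance-based score: distance to the k-th (last) nearest neighbor."""
    _, knn_dist = knn_indices_and_distances(X, k)
    return knn_dist[:, -1]

def reverse_neighbor_count(X, k):
    """How often each point occurs in the k-nearest-neighbor lists of the others.
    Points with a very low count are antihubs."""
    idx, _ = knn_indices_and_distances(X, k)
    return np.bincount(idx.ravel(), minlength=len(idx))

# Hypothetical toy data: five clustered points and one isolated point (index 5).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]])
scores = kth_neighbor_distance(X, k=2)
counts = reverse_neighbor_count(X, k=2)
print("k-th neighbor distances:", np.round(scores, 2))
print("reverse neighbor counts:", counts)
```

In this toy example the isolated point has the largest k-th neighbor distance and never appears in another point's neighbor list, so both views agree. The brute-force distance matrix needs memory quadratic in the number of points, so a larger dataset would call for an indexed neighbor search instead.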
System Architecture:
A system architecture is a conceptual model that defines the
structure, behavior, and other views of a system. An architecture
description is a representation of the system, organized so that it
helps to evaluate the processes present in the system and supports
reasoning about the structures and behavior of the system.
In this system architecture, we first select the specific data we
require from the data dump; this becomes the target data, to which
preprocessing is applied. Preprocessing converts the collected raw
data into an understandable format. Because the data may be
incomplete or insufficient, applying a preprocessing technique may
help to resolve the issue. The preprocessed data is then transformed,
and the transformed data is mined. Applying the data mining technique
to the transformed data yields patterns, and evaluating these
patterns gives us the actual, factual data that we require.
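The stages described above (selection, preprocessing, transformation, mining, evaluation) can be sketched as a chain of small functions. This is only an illustration under assumptions: the paper does not specify the concrete operations, so dropping incomplete rows, standardizing features, and scoring rows by the distance to the k-th nearest neighbor are stand-ins, and every name and data value below is hypothetical.

```python
import numpy as np

def select(raw, columns):
    """Selection: keep only the columns needed (the target data)."""
    return raw[:, columns]

def preprocess(target):
    """Preprocessing: drop incomplete rows (those containing NaN)."""
    return target[~np.isnan(target).any(axis=1)]

def transform(clean):
    """Transformation: standardize each feature to zero mean and unit variance."""
    return (clean - clean.mean(axis=0)) / clean.std(axis=0)

def mine(transformed, k=2):
    """Mining: score each row by the distance to its k-th nearest neighbor."""
    diff = transformed[:, None, :] - transformed[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a row to itself
    return np.sort(dist, axis=1)[:, k - 1]

def evaluate(scores, top=1):
    """Evaluation: report the rows with the highest outlier scores."""
    return np.argsort(scores)[::-1][:top]

# Hypothetical data dump: the third column is not needed, one row is incomplete.
raw = np.array([
    [1.0, 2.0, 99.0],
    [1.1, 2.1, 98.0],
    [0.9, 1.9, 97.0],
    [np.nan, 2.0, 96.0],
    [1.0, 2.2, 95.0],
    [8.0, 9.0, 94.0],
])
target = select(raw, columns=[0, 1])
clean = preprocess(target)
flagged = evaluate(mine(transform(clean), k=2), top=1)
print("rows kept after preprocessing:", len(clean))
print("row flagged as outlier (index in cleaned data):", flagged.tolist())
```

Keeping each stage as a separate function mirrors the architecture description: any stage can be replaced (for example, imputing missing values instead of dropping rows) without touching the others.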
CONCLUSION:
In conclusion, we can calculate the ratio of change between the
original data and the preprocessed data in a high-dimensional
dataset. When data is preprocessed, it is reformatted from the raw
data dump into an understandable form with the help of the methods
above, that is, distance-based methods, KNN, and other methods.
From the figure, we can see that the original data had lost some of
its content, whereas with the KNN approach all of the formatted data
can be read and understood by everyone.