SlideShare a Scribd company logo
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 963
Classifying Twitter Data in Multiple Classes Based On Sentiment Class
Labels
Richa Jain1, Namrata Sharma2
1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.), India
2Assistant Professor, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.), India
---------------------------------------------------------------------------***---------------------------------------------------------------------------
Abstract–The applications of sentiment analysis of text data is
increases day by day. Now in these days it is used for
conducting product reviews, as helping function of stock
market data analysis, product reviews and feedback analysis
of e-commerce. Therefore need of accurate classification of
text data in their sentiment classes become more crucial. In
this presented work the data mining techniques are employed
for evaluating text data in multiple sub-classes of positive and
negative data. In this context a new model is proposed for
investigation and design. This model usages twitter dataset as
experimental dataset. The initial data is preprocessed first for
eliminating the stop words and special characters from text
data. In next process the negation filter is applied to group the
data into positive and negative classes. In this context the
negation operator in given text sentences are considered. This
grouped data is used with the NLP parser for finding the POS
(part of speech) information from text data. This information
is works here as the text features and can be used classifying
the entire data samples. Based on the NLP attributes the data
is transformed into unstructured to structured data source.
After conversion of data SVM classifier is used for performing
training and testing of trained data model. Based on different
experiments the performance of implemented technique is
measured and found efficient and accurate for micro-blog
data analysis in terms of multiple classes.
Keywords: NLP, Sentiment analysis, text classification,
SVM classification, performance evaluation.
1.INTRODUCTION
The scope of text data mining is increases day by day
because the text data is generated in a huge amount and the
analysis and pattern extraction from such kind of huge and
unstructured data format is a complicated task. Therefore
the different data mining techniques and algorithms are
applied on text data for recovering the valuable patterns. The
text mining approaches are little bit different from the
traditional data mining techniques because of the nature and
format of data. Therefore we need a technique that process
the initial information and convert it in such format by which
the data is become acceptable for the learning and
classification algorithms. In this presented work the text data
mining is the key area of interest. Therefore the different
approaches and techniques of text mining are evaluated. In
addition of that the sentiment based classification for text
data is proposed for design and development.
The sentiment analysis of text data is performed in order to
find the author’s emotions in the given text. Thus this
technique is suitable for evaluation of user’s product
reviews, social text analysis, terror or unsolicited message
analysis and spam filtering in micro-blogs. The most of such
techniques basically classify the data in two major classes
namely positive and negative. There are very few techniques
available that works on sub-classes classification of text data.
Additionally the available multiclass text classification
techniques are not much accurate. Therefore the proposed
technique is wishes to enhance the performance of
sentiment based multi-class classification technique in terms
of accuracy. Using the classification techniques first the text
data is learned by some classification or data mining
algorithm and then classify when the similar text patterns
are appeared to classify.
2.PROPOSED WORK
This chapter provides the understanding about the proposed
methodology for improved sentiment based text
classification. Therefore the proposed methodology and the
proposed algorithm are discussed in this chapter.
A. System Overview
The data mining and their techniques are employed on
different formats of data for process the data and obtain the
required information from raw data. In this context different
algorithms and processes are involved for refining the
information, evaluation of information and group then
similar kinds of patterns. For performing these tasks the data
mining techniques support the supervised and unsupervised
techniques of learning. The supervised learning techniques
are efficient and accurate as compared to the unsupervised
learning techniques. The supervised learning techniques
basically learn on the predefined patterns where the output
is known to correct and adjust the errors during the learning
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 964
and in the techniques of unsupervised learning the
algorithms directly employed on data to cluster the
information into groups. In this work the text data is tried to
classify using the supervised learning technique in addition
of that the sentiment based analysis is also performed on
data for finding the authors emotions in the given text
blocks.
The proposed work is aimed to find an effective combination
of techniques that can improve the performance of multi-
class classification of text data according to their hidden
sentiment patterns. In this context a new model is proposed
for design and implementation. That text classification model
implements the SVM (support vector machine) as the
classification algorithm. Additionally the NLP (natural
language processing) parser which is used to find the part of
speech tagging from the text sentences are used for feature
extraction. In addition of that the negation based grouping of
data is also performed in initial phases for improving the
classifiers performance. This section the basic overview of
the proposed text classification technique is provided and in
next section the detailed modeling of the proposed system is
described.
B. Methodology
The proposed methodology of the proposed system is
explained using figure 2.1. This diagram contains the
different intermediate processes that help to process the
information and extract the desired information from raw
data.
Input dataset: the supervised learning techniques required
the training samples by which algorithm becomes trained.
Therefore some initial input samples are required for
initiating the learning process. In this presented work the
sentiment or emotional class labels are required to be
predicting therefore the twitter micro blog data is provided
as input to the system. The twitter dataset contains the
user’s information and the twitted data by the users.
Data preprocessing: after input of the initial training
samples the data preprocessing is performed. The
preprocessing of data is essential step of data mining
technique. Using this filtering of information is performed by
which unwanted data is removed and the target information
is recovered from the input data. In addition of that the aim
of data is to manipulate is such a way by which the target
algorithm can learn with the processed data. Therefore
transformation of data, mapping of data, and other
techniques are applied. In this presented work the text data
is accepted as input therefore two basic operations are
involved. First is used for removing the stop words and
second is used for removing the special characters from data.
In order to achieve the required function the find and replace
function is implemented with the list of stop words and
special characters. And using the find and replace function
the target data is removed from the input dataset.
Figure 2.1 Proposed system architecture
Negation filter: after preprocessing of data the data is
cleaned and now can be used for classification task. In this
context a negation filter is implemented to group entire text
data into two groups i.e. negative and positive data groups.
The filtering of data is performed in such manner by which
the sentences which consumes the negation such no, not are
targeted to cluster the data. After this filter the data is
grouped in two classes namely the positive data and negative
data.
Positive data: the set of positive data contains those
sentences that are not utilizing negations.
Negative data: the text data with the negation is clustered in
this group. Both the data samples are now used with the next
process for finding the part of speech information.
NLP tagging: the input data is basically the unstructured
source of information and the length and other properties of
each data instances are different from each other. Therefore
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 965
to make the data suitable the text feature in terms of POS
(part of speech) is used for classifiers training and testing.
The Stanford NLP (natural language processing) parser is
employed in this phase of data processing. The NLP parser
first maps all the data into the part of speech information.
And the implemented function counts the POS information.
Using the counted information from the input data is
transformed in a two dimension vector. The example of
parsed data is given using table 2.1.
Noun Pronoun Verb Adverb ….
1 2 1 0
Table 2.1 example of POS tagging
SVM training: the SVM (support vector machine) is a
supervised learning algorithm. This algorithm is basically
used for classifying two classes. This classifier basically
works on distance based technique, in this context the entire
data is first scattered into a N dimensional space where the
dimensions of space are equivalent to the number of
attributes in the given dataset. Additionally the dataset
instances are treated as a single point in space. In order to
classify the data instances the hyper plan is plotted in
optimal distance by which the data become separable. In this
presented work the primarily positive and negative data
groups are classified in again sub classes therefore the SMO
(Sequential minimal optimization) technique is utilized for
classifying the sub-classes of data.
Class label prediction: after training of the SVM classifier
the test data samples are produced to the classifier. The test
set is prepared on the basis of input training sample and
30% randomly selected data instances are selected for
preparing the test dataset.
C. Proposed Algorithm
This section summarizes the processes involved in the
methodology section in form of algorithm steps. Table 2.2
contains the processes involved in the proposed
methodology.
Input: training dataset D
Output : class labels of text data C
Processes:
1. ( )
2. ( )
a. ( )
b. ( )
3.
4. [ ] ( )
5. ([ ])
6. ( )
7. ( )
8. Return C
Table 2.2 proposed algorithm
3.RESULT ANALYSIS
This chapter provides the results analysis in the form of
measured performance parameters. The different computed
parameters and relevant outcomes are reported in this
chapter.
A. Accuracy
Accuracy is the measurement of the data mining system
performance in terms of how accurately a data mining
algorithm performs the classification. That is computed by
the ratio of total correctly classified data and total samples to
be classified. The following formula can be used for
computing classifier’s accuracy.
Figure 3.1 Accuracy in percentage
78
80
82
84
86
88
90
92
94
50 100 200 300 500 800 1000
Accuracy%
Dataset instances
Accuracy
Accuracy
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 966
Dataset Size Accuracy
50 86.4
100 84.1
200 85.9
300 89.7
500 88.3
800 90.6
1000 92.5
Table 3.1 accuracy in percentage
Accuracy of the proposed sentiment based text classification
technique is evaluated and reported using table 3.1 and
figure 3.1. In this diagram the X axis contains the size of
dataset in terms of number of instances additionally the Y
axis contains the corresponding accuracy of the proposed
approach in terms of percentage. According to the obtained
performance of classification system the proposed technique
improves their accuracy as the amount of data instances for
training and testing is increases. Therefore the proposed
system is acceptable for real world application usages.
B. Error Rate
Error rate is also an essential parameter of classifier
performance using this parameter it is measured how much
amount a classifier misclassify the data. That is the ratio of
data misclassified and total samples provided for
classification to algorithm. The error rate of a classifier is
evaluated using the following formula:
Or
Figure 3.2 error rate percentage
Dataset Size Error rate
50 13.6
100 16
200 14.1
300 10.3
500 11.7
800 9.4
1000 7.5
Table 3.2 error rate percentage
The computed error rate of the proposed text classification
data model is demonstrated using table 3.2 and figure 3.2. In
this diagram the X axis shows the amount of data supplied
for classification in terms of number of instances of data
objects and the Y axis of the diagram shows the error rate
percentage. According to the obtained performance the
proposed technique minimizes the amount of error
percentage as the amount of data size is increases. Therefore
the proposed approach is acceptable for real world
application usages.
C. Memory Usages
Memory usages of a process are also known as the space
complexity of algorithm. A data mining algorithm for
classification how much amount of main memory consume is
termed here as the memory usages of the process. The java
based process how much memory usages can be estimate
using the following formula:
0
5
10
15
20
50 100 200 300 500 800 1000
ErrorRate%
Dataset instances
Error rate
Error rate
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 967
Figure 3.3 memory consumption in KB
The memory consumption of the proposed classification
technique of text is demonstrated using figure 3.3 and table
3.3. According to the given graph the X axis contains the
amount of data set instances to be classified and the Y axis
represents the computed memory usages of the given
process. The memory consumption of the target algorithm is
computed here in terms of KB (kilobytes). According to the
obtained results the memory consumption of the system is
increases as the amount of data instances for classification is
increases. But the increment in memory is not much higher
therefore it is acceptable in cost of their classification
accuracy. Therefore the memory consumption is directly
depends upon the size of data to be classify.
Dataset Size Memory in KB
50 28474
100 29747
200 30084
300 31837
500 32994
800 33913
1000 34927
Table 3.3 memory consumption in KB
D. Time Consumption
Time consumption is the measurement of time required to
classify the input data. In this given system the time
consumption is computed using the following formula:
Dataset Size Time in MS
50 36
100 43
200 57
300 74
500 95
800 120
1000 146
Table 3.4 time consumption in MS
Figure 3.4 time consumption in MS
The time consumption of the proposed classification
technique is demonstrated using figure 3.4 and table 3.4. The
time is measured here in terms of milliseconds (MS). The
diagram shows in X axis the amount of dataset size input to
classification and the Y axis shows the time consumed for
performing the classification. According to the given graph
the time consumption of the system is increases as the
amount of data size is increases. Additionally that is also
observed the amount of time is depends on the amount of
data supplied for classification.
0
5000
10000
15000
20000
25000
30000
35000
40000
50 100 200 300 500 800 1000
MemoryinKB
Dataset Instances
Memory in KB
Memory in KB
0
50
100
150
200
50 100 200 300 500 800 1000
TimeinMS
Dataset instances
Time in MS
Time in MS
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 968
4 .CONCLUSION
This chapter provides the conclusion of the proposed
sentiment analysis based text classification. The
experimentations and observations are used for providing
the conclusion of the proposed work in addition of that the
future extension of the proposed work is also discussed in
this chapter.
A. Conclusion
The trend of text classification approaches are increases day
by day. As the amount of text data requirements is increases
we need some efficient and accurate data mining techniques
that help for data analysis. In addition of that the
applications of text classification techniques are also
increases in different purpose sometimes it is used for
making groups of text data and sometimes it is required to
find the emotions from data. The classification of text data is
complicated as compared to structured data source because
the data in text format is not regular in size and amount.
Therefore to find suitable attributes and features from data
additional efforts are required to implement.
The proposed work is tried to design and develop an
enhanced method by which the text is classified in sentiment
based classes. Therefore first the data is preprocessed and
unwanted data is removed from the initial data. In next the
data is filtered on the basis of negation operators available in
text sentences. After applying the negation filter the data is
clustered into positive and negative data groups. In further
the data is transformed into the structured information in
this context NLP features of text data is used. In order to
identify the NLP features the POS tagging on data is
performed and based on part of speech information the data
is transformed into two dimensional vector. This 2D vector is
used with the SVM classifier for training and testing of
implemented method.
The implementation of the proposed approach is performed
using JAVA technology and NetBean IDE. After
implementation of the proposed technique the performance
of the proposed classification system is computed and
reported. The table 4.1 contains the summary of the
conducted experiment based results analysis.
S. No. Parameters Remark
1 Accuracy Accuracy of the proposed
approach is also improved with
size of training data
additionally after sometimes it
remains consistent between 90-
95% of classification rate
2 Error rate The error rate is reduces with
the amount of training samples
increases but sometimes it
fluctuating because of the noise
composition of data
3 Memory usages The memory usages of system
are not varying with the
amount of data but still it
depends on the size of input
data.
4 Time complexity The time complexity of the
proposed approach is
acceptable and it is depends on
the size or number of instances
of data produced for
classification
Table 4.1 performance summary
According to the obtained performance of the proposed
multi-class sentiment analysis technique the proposed
technique found the optimal performance and efficient for
computing the class labels of text data. Thus the proposed
technique is acceptable for the data classification and
utilization with the real world applications.
B. Future Work
The main aim of the proposed work is to find an optimum
classification technique to improve the performance of
multiclass sentiment classification. In this context a new data
model is proposed for implementation and the performance
is improved. In near future the following work is extended
using the following concepts.
1. The proposed technique implemented with the
traditional SVM classification technique the
performance can more optimized with the help of
ensemble based learn approach in near future the
ensemble learning is proposed for implementation
2. The proposed technique just consider the negation
feature for filtering the data in two groups in near
future more characteristics are recovered for
optimizing the negation filter
REFERENCES
[1] MondherBouazizi and TomoakiOhtsuki, “A Pattern-
Based Approach for Multi-Class Sentiment Analysis in
Twitter”, 2169-3536, 2017 IEEE, VOLUME 5, 2017
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 969
[2] Han, Jiawei, Jian Pei, and MichelineKamber, “Data
mining: concepts and techniques”, Elsevier, 2011.
[3] Bharati M. Ramageri, “Data Mining Techniques and
Applications”, Indian Journal of Computer Science and
Engineering, Volume 1 Number 4, pp. 301-305.
[4] Dunham, M.H. Data mining introductory and advanced
topics. Upper Saddle River, NJ: Pearson Education, New
Delhi, 2003. Print. ISBN: 81-7758-785-4, 2006.
[5] Ian H. Witten; Eibe Frank; Mark A. Hall, ―Data Mining:
Practical Machine Learning Tools and Techniques (3rd
Ed.), Elsevier, 30 January 2011.
[6] Kantardzic and Mehmed, "Preparing the Data." Data
Mining: Concepts, Models, Methods, and Algorithms,
Second Edition (2003): 26-52.
[7] Sebastiani, F., “Machine learning in automated text
categorization.” ACM Computing Surveys, Vol. 34, No. 1,
pp. 1–47, 2002.
[8] Sonali Vijay Gaikwad and ArchanaChaugule “Text
Mining Methods and Techniques”, International Journal
of Computer Applications (IJCA), Volume 85 – No 17,
January 2014.
[9] Ian H. Witten, “Text mining”, 2004, available online at:
https://www.cs.waikato.ac.nz/~ihw/papers/04-IHW-
Textmining.pdf
[10] Miss LatikaKaushik, “Text Mining - Scope and
Applications”, Journal of Computer Science and
Applications, Volume 5, Number 2 (2013), pp. 51-55
[11] RamzanTalib and Muhammad KashifHanif, “Text
Mining: Techniques, Applications and Issues”,
International Journal of Advanced Computer Science
and Applications, Vol. 7 No. 11, 2016
[12] A.H. Tan, Text Mining: The State of the Art and the
Challenges, in PAKDD99 Whorkshop on Knowledge
Discovery from advanced Databases, Beijing, China, and
April 1999.c.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072

More Related Content

What's hot (19)

PDF
Document Classification Using Expectation Maximization with Semi Supervised L...
ijsc
 
PDF
M43016571
IJERA Editor
 
PDF
Enhancing Keyword Query Results Over Database for Improving User Satisfaction
ijmpict
 
PDF
A survey of modified support vector machine using particle of swarm optimizat...
Editor Jacotech
 
PDF
Jurnal REMIK 2019
Sita Anggraeni
 
PPTX
Lec1,2
alaa223
 
PPTX
A Fuzzy Logic Intelligent Agent for Information Extraction
TarekMourad8
 
PPTX
Internship project presentation_final_upload
Suraj Rathore
 
PPTX
Lectures 1,2,3
alaa223
 
PDF
Feature Selection Algorithm for Supervised and Semisupervised Clustering
Editor IJCATR
 
PDF
Iaetsd an enhanced feature selection for
Iaetsd Iaetsd
 
PDF
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...
AM Publications
 
PDF
Performance Comparision of Machine Learning Algorithms
Dinusha Dilanka
 
PPTX
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Geetika Gautam
 
PDF
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
ijnlc
 
PDF
[IJET-V1I3P11] Authors : Hemangi Bhalekar, Swati Kumbhar, Hiral Mewada, Prati...
IJET - International Journal of Engineering and Techniques
 
PDF
Opinion mining on newspaper headlines using SVM and NLP
IJECEIAES
 
PDF
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
PDF
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
IRJET Journal
 
Document Classification Using Expectation Maximization with Semi Supervised L...
ijsc
 
M43016571
IJERA Editor
 
Enhancing Keyword Query Results Over Database for Improving User Satisfaction
ijmpict
 
A survey of modified support vector machine using particle of swarm optimizat...
Editor Jacotech
 
Jurnal REMIK 2019
Sita Anggraeni
 
Lec1,2
alaa223
 
A Fuzzy Logic Intelligent Agent for Information Extraction
TarekMourad8
 
Internship project presentation_final_upload
Suraj Rathore
 
Lectures 1,2,3
alaa223
 
Feature Selection Algorithm for Supervised and Semisupervised Clustering
Editor IJCATR
 
Iaetsd an enhanced feature selection for
Iaetsd Iaetsd
 
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...
AM Publications
 
Performance Comparision of Machine Learning Algorithms
Dinusha Dilanka
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Geetika Gautam
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
ijnlc
 
[IJET-V1I3P11] Authors : Hemangi Bhalekar, Swati Kumbhar, Hiral Mewada, Prati...
IJET - International Journal of Engineering and Techniques
 
Opinion mining on newspaper headlines using SVM and NLP
IJECEIAES
 
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
IRJET Journal
 

Similar to IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class Labels (20)

PDF
Methods for Sentiment Analysis: A Literature Study
vivatechijri
 
PDF
Study on Relavance Feature Selection Methods
IRJET Journal
 
PDF
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
IRJET Journal
 
PDF
Sentiment Analysis and Classification of Tweets using Data Mining
IRJET Journal
 
PDF
Twitter Sentiment Analysis: An Unsupervised Approach
IRJET Journal
 
PDF
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
IRJET Journal
 
PDF
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET Journal
 
PDF
Sentiment Analysis on Twitter Data
IRJET Journal
 
PDF
Automated Essay Grading using Features Selection
IRJET Journal
 
PPTX
Poster (2)
Mukund Krishna Ravi
 
PDF
IRJET- Classification of Food Recipe Comments using Naive Bayes
IRJET Journal
 
PDF
Development of Information Extraction for Data Analysis using NLP
IRJET Journal
 
PDF
IRJET- Analysis of Brand Value Prediction based on Social Media Data
IRJET Journal
 
PDF
Machine learning for text document classification-efficient classification ap...
IAESIJAI
 
PDF
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
ijtsrd
 
PDF
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET Journal
 
PDF
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET Journal
 
PDF
Sentiment Analysis of Twitter Data
IRJET Journal
 
PDF
An efficient feature selection in
csandit
 
PDF
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
cscpconf
 
Methods for Sentiment Analysis: A Literature Study
vivatechijri
 
Study on Relavance Feature Selection Methods
IRJET Journal
 
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
IRJET Journal
 
Sentiment Analysis and Classification of Tweets using Data Mining
IRJET Journal
 
Twitter Sentiment Analysis: An Unsupervised Approach
IRJET Journal
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
IRJET Journal
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET Journal
 
Sentiment Analysis on Twitter Data
IRJET Journal
 
Automated Essay Grading using Features Selection
IRJET Journal
 
IRJET- Classification of Food Recipe Comments using Naive Bayes
IRJET Journal
 
Development of Information Extraction for Data Analysis using NLP
IRJET Journal
 
IRJET- Analysis of Brand Value Prediction based on Social Media Data
IRJET Journal
 
Machine learning for text document classification-efficient classification ap...
IAESIJAI
 
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
ijtsrd
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET Journal
 
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET Journal
 
Sentiment Analysis of Twitter Data
IRJET Journal
 
An efficient feature selection in
csandit
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
cscpconf
 
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PDF
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
PDF
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
PPTX
Precooling and Refrigerated storage.pptx
ThongamSunita
 
PDF
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
PPT
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
PDF
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
PDF
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
PPT
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
PDF
01-introduction to the ProcessDesign.pdf
StiveBrack
 
PDF
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
PPTX
CST413 KTU S7 CSE Machine Learning Neural Networks and Support Vector Machine...
resming1
 
PPTX
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
PDF
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
PPTX
LECTURE 7 COMPUTATIONS OF LEVELING DATA APRIL 2025.pptx
rr22001247
 
PPTX
WHO And BIS std- for water quality .pptx
dhanashree78
 
PDF
Rapid Prototyping for XR: Lecture 6 - AI for Prototyping and Research Directi...
Mark Billinghurst
 
PPTX
Computer network Computer network Computer network Computer network
Shrikant317689
 
PDF
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
PPTX
Work at Height training for workers .pptx
cecos12
 
PDF
Validating a Citizen Observatories enabling Platform by completing a Citizen ...
Diego López-de-Ipiña González-de-Artaza
 
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
Precooling and Refrigerated storage.pptx
ThongamSunita
 
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
01-introduction to the ProcessDesign.pdf
StiveBrack
 
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
CST413 KTU S7 CSE Machine Learning Neural Networks and Support Vector Machine...
resming1
 
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
LECTURE 7 COMPUTATIONS OF LEVELING DATA APRIL 2025.pptx
rr22001247
 
WHO And BIS std- for water quality .pptx
dhanashree78
 
Rapid Prototyping for XR: Lecture 6 - AI for Prototyping and Research Directi...
Mark Billinghurst
 
Computer network Computer network Computer network Computer network
Shrikant317689
 
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
Work at Height training for workers .pptx
cecos12
 
Validating a Citizen Observatories enabling Platform by completing a Citizen ...
Diego López-de-Ipiña González-de-Artaza
 

IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class Labels

  • 1. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 963 Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain1, Namrata Sharma2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.), India 2Assistant Professor, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.), India ---------------------------------------------------------------------------***--------------------------------------------------------------------------- Abstract–The applications of sentiment analysis of text data is increases day by day. Now in these days it is used for conducting product reviews, as helping function of stock market data analysis, product reviews and feedback analysis of e-commerce. Therefore need of accurate classification of text data in their sentiment classes become more crucial. In this presented work the data mining techniques are employed for evaluating text data in multiple sub-classes of positive and negative data. In this context a new model is proposed for investigation and design. This model usages twitter dataset as experimental dataset. The initial data is preprocessed first for eliminating the stop words and special characters from text data. In next process the negation filter is applied to group the data into positive and negative classes. In this context the negation operator in given text sentences are considered. This grouped data is used with the NLP parser for finding the POS (part of speech) information from text data. This information is works here as the text features and can be used classifying the entire data samples. Based on the NLP attributes the data is transformed into unstructured to structured data source. After conversion of data SVM classifier is used for performing training and testing of trained data model. Based on different experiments the performance of implemented technique is measured and found efficient and accurate for micro-blog data analysis in terms of multiple classes. Keywords: NLP, Sentiment analysis, text classification, SVM classification, performance evaluation. 1.INTRODUCTION The scope of text data mining is increases day by day because the text data is generated in a huge amount and the analysis and pattern extraction from such kind of huge and unstructured data format is a complicated task. Therefore the different data mining techniques and algorithms are applied on text data for recovering the valuable patterns. The text mining approaches are little bit different from the traditional data mining techniques because of the nature and format of data. Therefore we need a technique that process the initial information and convert it in such format by which the data is become acceptable for the learning and classification algorithms. In this presented work the text data mining is the key area of interest. Therefore the different approaches and techniques of text mining are evaluated. In addition of that the sentiment based classification for text data is proposed for design and development. The sentiment analysis of text data is performed in order to find the author’s emotions in the given text. Thus this technique is suitable for evaluation of user’s product reviews, social text analysis, terror or unsolicited message analysis and spam filtering in micro-blogs. The most of such techniques basically classify the data in two major classes namely positive and negative. There are very few techniques available that works on sub-classes classification of text data. Additionally the available multiclass text classification techniques are not much accurate. Therefore the proposed technique is wishes to enhance the performance of sentiment based multi-class classification technique in terms of accuracy. Using the classification techniques first the text data is learned by some classification or data mining algorithm and then classify when the similar text patterns are appeared to classify. 2.PROPOSED WORK This chapter provides the understanding about the proposed methodology for improved sentiment based text classification. Therefore the proposed methodology and the proposed algorithm are discussed in this chapter. A. System Overview The data mining and their techniques are employed on different formats of data for process the data and obtain the required information from raw data. In this context different algorithms and processes are involved for refining the information, evaluation of information and group then similar kinds of patterns. For performing these tasks the data mining techniques support the supervised and unsupervised techniques of learning. The supervised learning techniques are efficient and accurate as compared to the unsupervised learning techniques. The supervised learning techniques basically learn on the predefined patterns where the output is known to correct and adjust the errors during the learning International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 2. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 964 and in the techniques of unsupervised learning the algorithms directly employed on data to cluster the information into groups. In this work the text data is tried to classify using the supervised learning technique in addition of that the sentiment based analysis is also performed on data for finding the authors emotions in the given text blocks. The proposed work is aimed to find an effective combination of techniques that can improve the performance of multi- class classification of text data according to their hidden sentiment patterns. In this context a new model is proposed for design and implementation. That text classification model implements the SVM (support vector machine) as the classification algorithm. Additionally the NLP (natural language processing) parser which is used to find the part of speech tagging from the text sentences are used for feature extraction. In addition of that the negation based grouping of data is also performed in initial phases for improving the classifiers performance. This section the basic overview of the proposed text classification technique is provided and in next section the detailed modeling of the proposed system is described. B. Methodology The proposed methodology of the proposed system is explained using figure 2.1. This diagram contains the different intermediate processes that help to process the information and extract the desired information from raw data. Input dataset: the supervised learning techniques required the training samples by which algorithm becomes trained. Therefore some initial input samples are required for initiating the learning process. In this presented work the sentiment or emotional class labels are required to be predicting therefore the twitter micro blog data is provided as input to the system. The twitter dataset contains the user’s information and the twitted data by the users. Data preprocessing: after input of the initial training samples the data preprocessing is performed. The preprocessing of data is essential step of data mining technique. Using this filtering of information is performed by which unwanted data is removed and the target information is recovered from the input data. In addition of that the aim of data is to manipulate is such a way by which the target algorithm can learn with the processed data. Therefore transformation of data, mapping of data, and other techniques are applied. In this presented work the text data is accepted as input therefore two basic operations are involved. First is used for removing the stop words and second is used for removing the special characters from data. In order to achieve the required function the find and replace function is implemented with the list of stop words and special characters. And using the find and replace function the target data is removed from the input dataset. Figure 2.1 Proposed system architecture Negation filter: after preprocessing of data the data is cleaned and now can be used for classification task. In this context a negation filter is implemented to group entire text data into two groups i.e. negative and positive data groups. The filtering of data is performed in such manner by which the sentences which consumes the negation such no, not are targeted to cluster the data. After this filter the data is grouped in two classes namely the positive data and negative data. Positive data: the set of positive data contains those sentences that are not utilizing negations. Negative data: the text data with the negation is clustered in this group. Both the data samples are now used with the next process for finding the part of speech information. NLP tagging: the input data is basically the unstructured source of information and the length and other properties of each data instances are different from each other. Therefore International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 3. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 965 to make the data suitable the text feature in terms of POS (part of speech) is used for classifiers training and testing. The Stanford NLP (natural language processing) parser is employed in this phase of data processing. The NLP parser first maps all the data into the part of speech information. And the implemented function counts the POS information. Using the counted information from the input data is transformed in a two dimension vector. The example of parsed data is given using table 2.1. Noun Pronoun Verb Adverb …. 1 2 1 0 Table 2.1 example of POS tagging SVM training: the SVM (support vector machine) is a supervised learning algorithm. This algorithm is basically used for classifying two classes. This classifier basically works on distance based technique, in this context the entire data is first scattered into a N dimensional space where the dimensions of space are equivalent to the number of attributes in the given dataset. Additionally the dataset instances are treated as a single point in space. In order to classify the data instances the hyper plan is plotted in optimal distance by which the data become separable. In this presented work the primarily positive and negative data groups are classified in again sub classes therefore the SMO (Sequential minimal optimization) technique is utilized for classifying the sub-classes of data. Class label prediction: after training of the SVM classifier the test data samples are produced to the classifier. The test set is prepared on the basis of input training sample and 30% randomly selected data instances are selected for preparing the test dataset. C. Proposed Algorithm This section summarizes the processes involved in the methodology section in form of algorithm steps. Table 2.2 contains the processes involved in the proposed methodology. Input: training dataset D Output : class labels of text data C Processes: 1. ( ) 2. ( ) a. ( ) b. ( ) 3. 4. [ ] ( ) 5. ([ ]) 6. ( ) 7. ( ) 8. Return C Table 2.2 proposed algorithm 3.RESULT ANALYSIS This chapter provides the results analysis in the form of measured performance parameters. The different computed parameters and relevant outcomes are reported in this chapter. A. Accuracy Accuracy is the measurement of the data mining system performance in terms of how accurately a data mining algorithm performs the classification. That is computed by the ratio of total correctly classified data and total samples to be classified. The following formula can be used for computing classifier’s accuracy. Figure 3.1 Accuracy in percentage 78 80 82 84 86 88 90 92 94 50 100 200 300 500 800 1000 Accuracy% Dataset instances Accuracy Accuracy International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 4. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 966 Dataset Size Accuracy 50 86.4 100 84.1 200 85.9 300 89.7 500 88.3 800 90.6 1000 92.5 Table 3.1 accuracy in percentage Accuracy of the proposed sentiment based text classification technique is evaluated and reported using table 3.1 and figure 3.1. In this diagram the X axis contains the size of dataset in terms of number of instances additionally the Y axis contains the corresponding accuracy of the proposed approach in terms of percentage. According to the obtained performance of classification system the proposed technique improves their accuracy as the amount of data instances for training and testing is increases. Therefore the proposed system is acceptable for real world application usages. B. Error Rate Error rate is also an essential parameter of classifier performance using this parameter it is measured how much amount a classifier misclassify the data. That is the ratio of data misclassified and total samples provided for classification to algorithm. The error rate of a classifier is evaluated using the following formula: Or Figure 3.2 error rate percentage Dataset Size Error rate 50 13.6 100 16 200 14.1 300 10.3 500 11.7 800 9.4 1000 7.5 Table 3.2 error rate percentage The computed error rate of the proposed text classification data model is demonstrated using table 3.2 and figure 3.2. In this diagram the X axis shows the amount of data supplied for classification in terms of number of instances of data objects and the Y axis of the diagram shows the error rate percentage. According to the obtained performance the proposed technique minimizes the amount of error percentage as the amount of data size is increases. Therefore the proposed approach is acceptable for real world application usages. C. Memory Usages Memory usages of a process are also known as the space complexity of algorithm. A data mining algorithm for classification how much amount of main memory consume is termed here as the memory usages of the process. The java based process how much memory usages can be estimate using the following formula: 0 5 10 15 20 50 100 200 300 500 800 1000 ErrorRate% Dataset instances Error rate Error rate International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 5. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 967 Figure 3.3 memory consumption in KB The memory consumption of the proposed classification technique of text is demonstrated using figure 3.3 and table 3.3. According to the given graph the X axis contains the amount of data set instances to be classified and the Y axis represents the computed memory usages of the given process. The memory consumption of the target algorithm is computed here in terms of KB (kilobytes). According to the obtained results the memory consumption of the system is increases as the amount of data instances for classification is increases. But the increment in memory is not much higher therefore it is acceptable in cost of their classification accuracy. Therefore the memory consumption is directly depends upon the size of data to be classify. Dataset Size Memory in KB 50 28474 100 29747 200 30084 300 31837 500 32994 800 33913 1000 34927 Table 3.3 memory consumption in KB D. Time Consumption Time consumption is the measurement of time required to classify the input data. In this given system the time consumption is computed using the following formula: Dataset Size Time in MS 50 36 100 43 200 57 300 74 500 95 800 120 1000 146 Table 3.4 time consumption in MS Figure 3.4 time consumption in MS The time consumption of the proposed classification technique is demonstrated using figure 3.4 and table 3.4. The time is measured here in terms of milliseconds (MS). The diagram shows in X axis the amount of dataset size input to classification and the Y axis shows the time consumed for performing the classification. According to the given graph the time consumption of the system is increases as the amount of data size is increases. Additionally that is also observed the amount of time is depends on the amount of data supplied for classification. 0 5000 10000 15000 20000 25000 30000 35000 40000 50 100 200 300 500 800 1000 MemoryinKB Dataset Instances Memory in KB Memory in KB 0 50 100 150 200 50 100 200 300 500 800 1000 TimeinMS Dataset instances Time in MS Time in MS International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 6. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 968 4 .CONCLUSION This chapter provides the conclusion of the proposed sentiment analysis based text classification. The experimentations and observations are used for providing the conclusion of the proposed work in addition of that the future extension of the proposed work is also discussed in this chapter. A. Conclusion The trend of text classification approaches are increases day by day. As the amount of text data requirements is increases we need some efficient and accurate data mining techniques that help for data analysis. In addition of that the applications of text classification techniques are also increases in different purpose sometimes it is used for making groups of text data and sometimes it is required to find the emotions from data. The classification of text data is complicated as compared to structured data source because the data in text format is not regular in size and amount. Therefore to find suitable attributes and features from data additional efforts are required to implement. The proposed work is tried to design and develop an enhanced method by which the text is classified in sentiment based classes. Therefore first the data is preprocessed and unwanted data is removed from the initial data. In next the data is filtered on the basis of negation operators available in text sentences. After applying the negation filter the data is clustered into positive and negative data groups. In further the data is transformed into the structured information in this context NLP features of text data is used. In order to identify the NLP features the POS tagging on data is performed and based on part of speech information the data is transformed into two dimensional vector. This 2D vector is used with the SVM classifier for training and testing of implemented method. The implementation of the proposed approach is performed using JAVA technology and NetBean IDE. After implementation of the proposed technique the performance of the proposed classification system is computed and reported. The table 4.1 contains the summary of the conducted experiment based results analysis. S. No. Parameters Remark 1 Accuracy Accuracy of the proposed approach is also improved with size of training data additionally after sometimes it remains consistent between 90- 95% of classification rate 2 Error rate The error rate is reduces with the amount of training samples increases but sometimes it fluctuating because of the noise composition of data 3 Memory usages The memory usages of system are not varying with the amount of data but still it depends on the size of input data. 4 Time complexity The time complexity of the proposed approach is acceptable and it is depends on the size or number of instances of data produced for classification Table 4.1 performance summary According to the obtained performance of the proposed multi-class sentiment analysis technique the proposed technique found the optimal performance and efficient for computing the class labels of text data. Thus the proposed technique is acceptable for the data classification and utilization with the real world applications. B. Future Work The main aim of the proposed work is to find an optimum classification technique to improve the performance of multiclass sentiment classification. In this context a new data model is proposed for implementation and the performance is improved. In near future the following work is extended using the following concepts. 1. The proposed technique implemented with the traditional SVM classification technique the performance can more optimized with the help of ensemble based learn approach in near future the ensemble learning is proposed for implementation 2. The proposed technique just consider the negation feature for filtering the data in two groups in near future more characteristics are recovered for optimizing the negation filter REFERENCES [1] MondherBouazizi and TomoakiOhtsuki, “A Pattern- Based Approach for Multi-Class Sentiment Analysis in Twitter”, 2169-3536, 2017 IEEE, VOLUME 5, 2017 International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
  • 7. © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 969 [2] Han, Jiawei, Jian Pei, and MichelineKamber, “Data mining: concepts and techniques”, Elsevier, 2011. [3] Bharati M. Ramageri, “Data Mining Techniques and Applications”, Indian Journal of Computer Science and Engineering, Volume 1 Number 4, pp. 301-305. [4] Dunham, M.H. Data mining introductory and advanced topics. Upper Saddle River, NJ: Pearson Education, New Delhi, 2003. Print. ISBN: 81-7758-785-4, 2006. [5] Ian H. Witten; Eibe Frank; Mark A. Hall, ―Data Mining: Practical Machine Learning Tools and Techniques (3rd Ed.), Elsevier, 30 January 2011. [6] Kantardzic and Mehmed, "Preparing the Data." Data Mining: Concepts, Models, Methods, and Algorithms, Second Edition (2003): 26-52. [7] Sebastiani, F., “Machine learning in automated text categorization.” ACM Computing Surveys, Vol. 34, No. 1, pp. 1–47, 2002. [8] Sonali Vijay Gaikwad and ArchanaChaugule “Text Mining Methods and Techniques”, International Journal of Computer Applications (IJCA), Volume 85 – No 17, January 2014. [9] Ian H. Witten, “Text mining”, 2004, available online at: https://www.cs.waikato.ac.nz/~ihw/papers/04-IHW- Textmining.pdf [10] Miss LatikaKaushik, “Text Mining - Scope and Applications”, Journal of Computer Science and Applications, Volume 5, Number 2 (2013), pp. 51-55 [11] RamzanTalib and Muhammad KashifHanif, “Text Mining: Techniques, Applications and Issues”, International Journal of Advanced Computer Science and Applications, Vol. 7 No. 11, 2016 [12] A.H. Tan, Text Mining: The State of the Art and the Challenges, in PAKDD99 Whorkshop on Knowledge Discovery from advanced Databases, Beijing, China, and April 1999.c. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072