International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 737
Classification of Student Query using Machine Learning
Voore Saithanish1, K. Sai Varun2, Dr. M. Senthil Kumaran3
1-2Student, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, Tamil Nadu, India
3Professor, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, Tamil Nadu, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract – Educational institutions and universities receive a
large volume of queries from students about academic and
administrative issues. Classifying, sorting and resolving such a
volume manually takes a great deal of time. This project
classifies the queries into their respective departments with a
machine learning algorithm: each query is represented by
weighted keywords and then assigned to a category. Because
queries are routed directly to the responsible department,
students get them resolved in a shorter time.
Key Words: Classification, Text Processing, Machine
Learning, TF-IDF (term frequency-inverse document
frequency), Data Analysis, SVM (support vector machine)
1. INTRODUCTION
Universities receive queries from students in bulk on a daily
basis, which makes it difficult to sort them by department;
classifying them manually is time-consuming and complex.
Student queries span every department, for example fee
issues, transportation and the library. Finding and resolving
such queries in time is difficult, so students face problems
and the resolution of their queries is delayed. This project is
designed to classify the queries into departments: the text of
each query is reduced to weighted keywords, and the
algorithm uses them to separate the queries by department,
which makes them easier to sort. Queries are raised through a
website and stored in a database with the following fields:
student name, class, registration number, department,
e-mail, category, complaint text and priority.
The query, with its priority, is then delivered to the
department responsible for its category, and the student is
notified of its status. The department is informed of the
query, the time it was posted and its priority, which makes it
easier to resolve. The student can follow the status of the
query (solved, in progress, on hold, etc.) until it is resolved.
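The workflow above (a query record with its metadata, forwarded to a department and tracked through a status) can be sketched as a small data model. This is a minimal illustration under our own assumptions: the class names, field names, the `notify`, `submit` and `update_status` helpers and the example addresses are hypothetical, not taken from the authors' system, and the category would in practice come from the classifier described later.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """States a query moves through (the text mentions solved, in progress and hold)."""
    SUBMITTED = "submitted"
    IN_PROGRESS = "in progress"
    ON_HOLD = "on hold"
    SOLVED = "solved"


@dataclass
class Query:
    """One student query with the fields collected by the website form (names assumed)."""
    student_name: str
    student_class: str
    reg_no: str
    department: str
    mail: str
    category: str
    complaint: str
    priority: int
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)


def notify(recipient, message, outbox):
    """Hypothetical notifier: appends to an outbox list instead of sending e-mail."""
    outbox.append((recipient, message))


def submit(query, department_contacts, outbox):
    """Forward a new query to the department responsible for its category."""
    contact = department_contacts[query.category]
    notify(contact, f"New {query.category} query (priority {query.priority}): {query.complaint}", outbox)


def update_status(query, new_status, outbox):
    """Record a status change and notify the student who raised the query."""
    query.status = new_status
    query.history.append(new_status)
    notify(query.mail, f"Your {query.category} query is now: {new_status.value}", outbox)


# Usage with made-up data.
outbox = []
q = Query("A. Student", "III CSE", "1234", "CSE", "a.student@example.edu",
          "library", "Book return not updated", priority=2)
submit(q, {"library": "library@example.edu"}, outbox)
update_status(q, Status.IN_PROGRESS, outbox)
update_status(q, Status.SOLVED, outbox)
print(outbox[-1])  # ('a.student@example.edu', 'Your library query is now: solved')
```

Keeping notifications in an outbox list makes the two-way flow (department informed on submission, student informed on every status change) easy to inspect; a real deployment would replace `notify` with e-mail or website messages.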
TF-IDF (term frequency-inverse document frequency) is used
to convert each query into a numerical vector, and a
classifier then maps these vectors to the category labels
(numbers and names). This makes it faster to find the
queries that belong to each category, so students' issues are
resolved in time and the task is simpler for the management.
1.1 Objective
The main objective is to make query handling easier and
faster, helping both students and management:
• Students get their queries resolved in a short time.
• Management finds it easy to classify and resolve the
queries.
• Machine learning and other current technologies are
applied to everyday situations to make them easier and
faster.
2. Problem Statement
Every educational institution receives many student queries
about technical, administrative and other matters. To clear
them quickly and easily, this algorithm helps the institution
classify the posted queries into the respective departments.
This reduces the delay in resolving problems and makes the
process clearer, avoiding the confusion of clashing queries
and of queries that cannot be found in a bulk file.
3. Algorithm
Input:
D: complaint data (consisting of all the grievances)
Output:
Weight matrix (consisting of the weights of all the terms,
called vectors)
Method:
1- for every complaint document c_i do
2-   for each term t_j in c_i do
3-     compute the TF-IDF score of term t_j in document c_i:
$$\mathrm{TFIDF}(c_i, t_j) = \mathrm{TF}(c_i, t_j) \times \mathrm{IDF}(t_j)$$
where TF is the term frequency and IDF the inverse document frequency:
$$\mathrm{TF}(c_i, t_j) = \frac{\text{number of occurrences of } t_j \text{ in } c_i}{\text{total number of words in } c_i}$$
$$\mathrm{IDF}(t_j) = \log_2 \frac{\text{total number of documents}}{\text{number of documents containing } t_j}$$
4- end for (terms)
5- end for (complaint documents)
6- The vectors are stored in an array and used for training and
testing during classification.
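The steps above can be sketched in plain Python. This is a minimal illustration assuming lower-casing and whitespace tokenization (the text does not specify a tokenizer); the function name `tf_idf_matrix` and the sample complaints are hypothetical. Each row is computed over the whole vocabulary so that all complaint vectors have the same length, and terms absent from a complaint get weight 0.

```python
import math
from collections import Counter


def tf_idf_matrix(documents):
    """Return (vocabulary, weight matrix) using the TF and log2 IDF defined above."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Number of complaint documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vocabulary = sorted(doc_freq)

    weights = []
    for tokens in tokenized:              # step 1: every complaint document c_i
        counts = Counter(tokens)
        total = len(tokens)               # total words in c_i
        row = []
        for term in vocabulary:           # step 2: every term t_j (absent terms get 0)
            tf = counts[term] / total if total else 0.0
            idf = math.log2(n_docs / doc_freq[term])
            row.append(tf * idf)          # step 3: TF x IDF
        weights.append(row)               # step 6: vectors kept for training/testing
    return vocabulary, weights


complaints = ["fee receipt not received", "library fee pending", "bus late today"]
vocab, W = tf_idf_matrix(complaints)
for text, row in zip(complaints, W):
    print(text, [round(w, 3) for w in row])
```

In the first complaint, "receipt" (found in one document) receives a higher weight than "fee" (found in two), which is exactly the effect of the IDF factor.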
Chart -1: Flow Chart
4. Project Description
The complaints are in text format. To classify them with a
classification method, the text must first be translated into
vectors from which the class can be predicted; we use TF-IDF
for this. TF-IDF is a statistical measure of how relevant a
term is to a document within a corpus (a collection of
documents), and it identifies the terms that are most relevant
to a particular complaint. The TF-IDF of a word in a document
is the product of two quantities: TF (term frequency) and IDF
(inverse document frequency). The term frequency is
calculated by counting the number of times a word appears in
a document and normalizing that count by the document's
length in words. The inverse document frequency of a word
measures how rare the word is across the entire corpus; it is
computed as the logarithm of the total number of documents
divided by the number of documents that contain the word.
A word that appears many times in a document receives a high
TF. A word that is very common across the corpus receives an
IDF at or near 0, while a rarer word receives a larger IDF.
Multiplying the two gives the TF-IDF score.
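A small worked example with hypothetical numbers (not taken from the authors' dataset), using the log2 IDF of Section 3: suppose the corpus holds N = 4 complaints, and one complaint c reads "the fee receipt is missing" (5 words). The term "fee" occurs once in c and in only 1 of the 4 complaints, while "the" occurs once in c and in all 4 complaints.

```latex
\begin{aligned}
\mathrm{TF}(c,\text{fee}) &= \tfrac{1}{5} = 0.2, & \mathrm{IDF}(\text{fee}) &= \log_2\tfrac{4}{1} = 2, & \mathrm{TFIDF}(c,\text{fee}) &= 0.2 \times 2 = 0.4\\
\mathrm{TF}(c,\text{the}) &= \tfrac{1}{5} = 0.2, & \mathrm{IDF}(\text{the}) &= \log_2\tfrac{4}{4} = 0, & \mathrm{TFIDF}(c,\text{the}) &= 0.2 \times 0 = 0
\end{aligned}
```

Both terms have the same TF, but the word that appears in every complaint is scaled to 0, while the rarer, more informative term keeps a positive weight.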
The greater the TF-IDF score, the more relevant the term is to
the document. After vectorizing the text, we classify it with
Random Forest Classifier, Linear SVC, Multinomial NB and
Logistic Regression. The "complaints.csv" dataset contains
attributes such as Token No., Date, Year, Student-ID, Email
Id, Category of Complaints, Issue Resolver, Counselor Name,
Issue Date, Issue, Number of Days to Resolve and Status (e.g.
Completed). From this file we create a new DataFrame with
two elements: the Category (for example health issues,
examinations, detention) and the Grievance, which contains
the full text of the complaint. Duplicate rows are then
removed.
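A minimal pandas sketch of this preparation step, together with the category IDs and lookup dictionaries mentioned next. The column names `Category` and `Grievance` and the rows of the small inline CSV are assumptions made for illustration; with the real file one would read `pd.read_csv("complaints.csv")` and use its actual headers.

```python
import io

import pandas as pd

# Stand-in for complaints.csv; the real file has more columns
# (Date, Student-ID, Issue Resolver, ...). Column names here are assumptions.
raw_csv = io.StringIO(
    "Token No.,Category,Grievance,Status\n"
    "1,Fees,Fee receipt not received,Completed\n"
    "2,Library,Library book not issued,Completed\n"
    "3,Fees,Fee receipt not received,Completed\n"
    "4,Transport,Bus did not arrive on time,Completed\n"
)
complaints = pd.read_csv(raw_csv)

# Keep only the label and the complaint text, then drop duplicate rows.
df1 = complaints[["Category", "Grievance"]].drop_duplicates().reset_index(drop=True)

# Give every category a numeric id and keep lookup dictionaries for later use.
df1["category_id"] = df1["Category"].factorize()[0]
category_to_id = dict(zip(df1["Category"], df1["category_id"]))
id_to_category = {v: k for k, v in category_to_id.items()}

# Number of complaints per category (which department receives the most).
print(df1["Category"].value_counts())
```

`factorize` numbers the categories in order of first appearance, so the IDs are stable for a given file order; the two dictionaries allow predictions (IDs) to be mapped back to department names.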
Fig -01: The accuracy and deviation shown as output
We assign a unique ID (category_id) to each category in the
newly formed DataFrame, which we call "df1", and keep
dictionaries that map categories to IDs (and back) for future
use. We can now see which section or department receives
the most complaints from students. Next, TfidfVectorizer
converts each complaint into a vector after the stop words
are removed; the vectors are stored in an array for later use.
We can also find the unigrams and bigrams most correlated
with each category. The data is then divided for training and
testing: 'X' holds the grievances and 'y' holds the grievance
categories, the target labels to be predicted. Finally, we apply
several machine learning classification methods to predict
the category of each complaint.
The second part of the project maintains a database for
sending notification messages, since notification about
complaints must flow in both directions. When classification
is finished, the predicted category is displayed and used to
send a notification to the employee of that department who is
responsible for resolving the issue. Once the complaint has
been resolved and the update has been posted on the website,
the student who raised the issue is notified. The work is thus
carried out quickly and without wasted time, through
one-to-one interaction between the two parties, and we
expect it to compare favourably with other complaint
classifiers.
Fig -02: The output showing the data sorted by
category_id
5. Result
The queries received from the students were converted to
vectors, analyzed to identify their category, and classified
into the corresponding departments. The resulting
departments are shown in Fig -03.
Fig -03: The Queries Classified into Respective
Departments
6. Conclusion
The student query classification system combines Linear SVC
with TF-IDF (Term Frequency-Inverse Document Frequency):
the queries in the database are represented as vectors and
classified by category, which makes sorting the data easier. A
Jupyter notebook is used to read the data and present the
output as tables and graphs for the respective queries.
Machine learning, a widely used technology today, makes
query collection and classification simple. The model
achieves an accuracy of 89% and handles data in bulk
efficiently. This reduces the time needed to resolve queries,
benefiting both students and organizations.
Fig -1: Accuracy Graph

More Related Content

Similar to Classification of Student Query using Machine Learning (20)

PDF
K017626773
IOSR Journals
 
PDF
IRJET - Recommendation of Branch of Engineering using Machine Learning
IRJET Journal
 
PDF
University Recommendation Support System using ML Algorithms
IRJET Journal
 
PDF
IRJET- Analysis of Student Performance using Machine Learning Techniques
IRJET Journal
 
PDF
IRJET- Educational Data Mining for Prediction of StudentsPerformance using Cl...
IRJET Journal
 
PPTX
Data Mining Email SPam Detection PPT WITH Algorithms
deepika90811
 
PDF
Fd33935939
IJERA Editor
 
PDF
Fd33935939
IJERA Editor
 
PDF
Student Performance Predictor
IRJET Journal
 
PDF
An Empirical Study of the Applications of Classification Techniques in Studen...
IJERA Editor
 
PDF
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
IRJET Journal
 
PDF
IRJET- Using Data Mining to Predict Students Performance
IRJET Journal
 
PDF
Big data project
Kedar Kumar
 
PDF
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
ijtsrd
 
PDF
IRJET- Placement Recommender and Evaluator
IRJET Journal
 
PDF
Using Naive Bayesian Classifier for Predicting Performance of a Student
ijtsrd
 
PDF
Survey on Techniques for Predictive Analysis of Student Grades and Career
IRJET Journal
 
PDF
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Editor IJCATR
 
PDF
IRJET- Evaluation Technique of Student Performance in various Courses
IRJET Journal
 
K017626773
IOSR Journals
 
IRJET - Recommendation of Branch of Engineering using Machine Learning
IRJET Journal
 
University Recommendation Support System using ML Algorithms
IRJET Journal
 
IRJET- Analysis of Student Performance using Machine Learning Techniques
IRJET Journal
 
IRJET- Educational Data Mining for Prediction of StudentsPerformance using Cl...
IRJET Journal
 
Data Mining Email SPam Detection PPT WITH Algorithms
deepika90811
 
Fd33935939
IJERA Editor
 
Fd33935939
IJERA Editor
 
Student Performance Predictor
IRJET Journal
 
An Empirical Study of the Applications of Classification Techniques in Studen...
IJERA Editor
 
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
IRJET Journal
 
IRJET- Using Data Mining to Predict Students Performance
IRJET Journal
 
Big data project
Kedar Kumar
 
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction
ijtsrd
 
IRJET- Placement Recommender and Evaluator
IRJET Journal
 
Using Naive Bayesian Classifier for Predicting Performance of a Student
ijtsrd
 
Survey on Techniques for Predictive Analysis of Student Grades and Career
IRJET Journal
 
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Editor IJCATR
 
IRJET- Evaluation Technique of Student Performance in various Courses
IRJET Journal
 

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PPTX
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
DOCX
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
PPSX
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
PPTX
Work at Height training for workers .pptx
cecos12
 
PDF
01-introduction to the ProcessDesign.pdf
StiveBrack
 
PDF
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
PDF
June 2025 Top 10 Sites -Electrical and Electronics Engineering: An Internatio...
elelijjournal653
 
PDF
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
PPTX
Artificial Intelligence jejeiejj3iriejrjifirirjdjeie
VikingsGaming2
 
PDF
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
PDF
June 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
PPTX
Functions in Python Programming Language
BeulahS2
 
PDF
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
PPTX
CST413 KTU S7 CSE Machine Learning Introduction Parameter Estimation MLE MAP ...
resming1
 
PDF
FSE-Journal-First-Automated code editing with search-generate-modify.pdf
cl144
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PPTX
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PPT
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
PDF
Module - 4 Machine Learning -22ISE62.pdf
Dr. Shivashankar
 
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
Work at Height training for workers .pptx
cecos12
 
01-introduction to the ProcessDesign.pdf
StiveBrack
 
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
June 2025 Top 10 Sites -Electrical and Electronics Engineering: An Internatio...
elelijjournal653
 
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
Artificial Intelligence jejeiejj3iriejrjifirirjdjeie
VikingsGaming2
 
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
June 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
Functions in Python Programming Language
BeulahS2
 
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
CST413 KTU S7 CSE Machine Learning Introduction Parameter Estimation MLE MAP ...
resming1
 
FSE-Journal-First-Automated code editing with search-generate-modify.pdf
cl144
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
Module - 4 Machine Learning -22ISE62.pdf
Dr. Shivashankar
 
Ad

Classification of Student Query using Machine Learning

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 737 Classification of Student Query using Machine Learning Voore Saithanish1, K. Sai Varun2, Dr. M. Senthil Kumaran3 1-2Student, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, Tamil Nadu, India 3Professor, Dept. Of CSE, SCSVMV (Deemed to be University), Kanchipuram, Tamil Nadu, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract – The Educational institutions and universities were getting bulk amount of data in the form of queries send by students regarding their academics and educational issues. Because of this huge data it is difficult for the universities to classify, sort and resolve which takes much amount of time. This Project algorithm which works for classifying the data into their respective departments using Machine Learning Algorithm in the way assigning Keywords for the data then sorting them into the category. So, the students get resolved their queries in short span of time by classifying their quires directly to their respective Departments. Key Words: Classification, Text Processing, Machine Learning, TF-IDF (term frequency-inverse document frequency), Data Analysis, SVM (support vector machine) 1.INTRODUCTION The data received from students to the universities in daily bias in the bulk form which makes the universities difficult to sort out the queries according the departments, taking huge amount of time and complexity in classifying the data. The data in the fields of students queries in every department as the fee issues, transportation, library and many more in this form. This type of data is much complex to find out and resolve in a period of time. 
The students facing problems as well as the time period of resolving their queries is delay too. So, by this project where it is designed to classify the data into the departments by giving the data keywords and making into the sub groups which the algorithm differentiates the data into types of departments that makes them easier to sort them out. The query raised by the students is stored in a database where it is received from a website, having the terms as student name, class, reg no, department, mail, category, and the complaint data, priority. The data given by the student is then received by the category department with the priority and the students receives the notification of his/her status of the query. The department gets informed regarding the query, time posted, priority which makes the department easier to resolve the query. After the query resolved the status of the query is seen by the student whether it is solved, in progress, hold, etc. The TF-IDF (term frequency-inverse document frequency) classification algorithm is used to classify the data into the category using the label number and names given in form of vectors which are converted from the data form by the algorithm. This makes the task easier and faster in finding the query related to the category that makes the students issues resolve in time and making the task simple for the management. 1.1 Objective The main objective is to make the task easy and in short span time and in the way helping both the students and management as  Students get their queries resolved in short time and,  Managements find it easy to classify the data and resolving them.  Using the Machine learning and cutting-edge technologies in daily life situations and making them easier and faster. 2 Problem Statement In every educational institution, there will be Many queries for students regarding the technical or administration and other categories. 
So, to clear the student query in a quick and easy manner this algorithm helps the institution to classify the student posted queries to respective departments. The time delay in resolving the problems is no more and the process is in a lucid way. No more confusions and complex situations as clashing the queries and not able to find one in a bulk file. 3. Algorithm Input: D: grumblings information (comprises of the relative multitude of grievances) Yield: Weight Matrix (which comprises of the multitude of loads of terms are
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 738 called vectors) Method: 1-for every grumbling archive (ci) do 2-for each term (tj) in ci do 3-TF-IDF score for term tj in record ci = TF (ci , tj) * IDF (tj) Where, IDF = Inverse Document Frequency TF = Term Frequency TF (ci , tj) = (Term tj recurrence in record ci) -------------- (Complete words in archive ci) IDF (ci) = log2 ((Total Documents)/ (records With term tj)) 4-End for of term 5-End for of objection record 6-The vectors are put away in an exhibit for preparing and testing purposes, during arrangement. Chart -1: Flow Chart 4. Project Description The complaints are in text format; in order to classify them using a classification method, the text must be translated into vectors.to be able to foresee the class We use TF-IDF to accomplish this.TF-IDF is a method for converting text to vectors. The inverse document frequency is used to find the frequency of a document. Determine which terms are the most relevant to a particular issue. It's a unique situation. Statistics are used to determine how relevant a term or word is. refers to a document in a corpus or a collection of documents. The TF-IDF of a word in a document is determined using two indices IDF (inverse term frequency) and TF (term frequency) The term frequency (TF) is calculated by counting the number of times a word appears in a document and adjusting the frequency for the document's length or number of words. IDF (inverse document frequency) of a word or phrase the term denotes how uncommon or uncommon a word is throughout the entire dictionary. A corpus is a group of documents. This can be computed by dividing the number of papers by the total number of documents. The word occurred in a significant number of documents. 
If a word or term appears in a large number of places in the
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 739 manuscript, it's a good sign. If it's highly common, it's scaled to '0,' else it's scaled to '1.' We can get the result by multiplying the two terms together. The TD-IDF score the greater the score, the more relevant it is. After translating the text, we use techniques such as Random Forest Classifier, Linear SVC, Multinomial NB, and Logistic Regression to classify it. Regression The "complaints.csv" data collection will contain the Token No., Date, Year, Student-ID, Email Id, and other attributes Category of Complaints Cat, Issue Resolver, Counselor Name, Issue Date, and Issue, Number of Days to Resolve Status: Completed, Status: Completed, Status: Completed, Status: Completed, Status: Completed, Using the "complaints.csv" file dataset, we'll create a new Data Frame with the following elements: (Categories includes health issues, the examination part, and so on detention, etc.) and the Grievance Category, which includes a comprehensive grievance Now we'll get rid of the duplicates in the database. Fig -01: The accuracy and deviation shown as output Assign a unique Id to the newly formed Data Frame, let's call it "df1." making a temporary for each category in other works a dictionary for future use. We can now see which section or department is receiving the most complaints from students. Now we'll put the theory into practice. TFID-Vectorizer, which converts each complaint into a vector. We'll store the vectors in an array, and we'll use them later. can find out how many Unigrams and Bigrams there are. Following that, we will create a map. the Unigrams and Bigrams with the most connected Remove the stop words from each complaint. 
Division of data for training and testing: the features 'X' are collected from all of the grievances, and 'y' consists of the target labels, the grievance categories, that we need to predict. The data is then split into training and test sets, and we apply a variety of machine learning classification methods to predict the category of each complaint.

The other element of the project is maintaining the database used for sending messages, since notifications about complaints must be bidirectional. When the classification procedure is finished, the predicted results are displayed, and the predicted category triggers a notification to the employee of that department who is responsible for resolving the issue. Finally, once the complaint has been resolved and the resolution has been posted on the website, the person who raised the issue is notified. The work is thus performed quickly and without wasting time, and, compared with other complaint classifiers, the system offers direct one-to-one interaction between the two parties.

Fig -02: The output shows the data sorted by category_id

5. Result
The queries received from the students were converted to vectors, analyzed to identify their category, and classified into the corresponding departments, as shown in Fig -03.
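The training, testing, and notification flow described above can be sketched in standard-library Python. This is an illustrative substitute, not the authors' pipeline: the Linear SVC model reported in the conclusion is not reproduced here. Instead the sketch uses multinomial naive Bayes, another of the four classifiers the section lists, because it fits in a few lines; it also trains on raw word counts rather than TF-IDF vectors. The DEPARTMENT_CONTACTS table and its contact names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and split texts X and labels y."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


class MultinomialNB:
    """Multinomial naive Bayes on word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word counts
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        best_label, best_score = None, -math.inf
        v = len(self.vocab)
        for label, n_label in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_label / self.n_docs)   # log prior of the category
            for word in text.lower().split():
                if word in self.vocab:                # unseen words are ignored
                    score += math.log((counts[word] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical routing table: predicted category -> responsible staff contact.
DEPARTMENT_CONTACTS = {"Fees": "accounts-office", "Transport": "transport-office"}


def route(complaint, model):
    """Predict the category and return a (contact, message) notification."""
    category = model.predict_one(complaint)
    contact = DEPARTMENT_CONTACTS.get(category, "general-office")
    return contact, f"New {category} complaint: {complaint}"
```

In practice the model would be fitted on the training split and scored with accuracy() on the held-out split; swapping in a linear SVM only changes the classifier, while the split, evaluation, and routing steps stay the same.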
Fig -03: The Queries Classified into Respective Departments

6. Conclusion
The student query classification system, which combines Linear SVC with TF-IDF (term frequency-inverse document frequency), classifies the data in the database by category; representing each query as a vector makes the data easier to sort. A Jupyter notebook serves as the interface for reading the data and presenting the output as tables and graphs for the respective queries. Machine learning, a widely used technology today, makes query collection and classification simple. The model achieves an accuracy of 89% and handles data in bulk efficiently, which reduces the time needed to resolve queries and benefits both students and organizations.

Fig -1: Accuracy Graph