Visualizing UML’s Sequence and
Class Diagrams Using Graph-Based
Clusters
Paper ID 65
Nakul Sharma, Dr. Prasanth Yalla
Department of Computer Science and Engineering
Koneru Lakshmaiah Education Foundation
Vaddeswaram, Guntur-522502, India
Agenda
• Abstract
• Introduction
• Literature Review
• Proposed Methodology
• Results & Discussion
• Conclusion & Future Scope
Abstract
The paper describes a UML-diagram-based recommendation
system that takes Java source files and class files as
input. Existing systems do not use text-mining techniques
to create UML diagrams. The proposed methodology uses
keyphrase extraction, contextual similarity calculation,
and graph-based clustering to create UML diagrams. A
survey of state-of-the-art UML diagram generation
techniques and of keyphrase extraction is provided, along
with a comparative analysis of the existing tools for
generating UML diagrams. The resulting recommendation
system is useful to maintenance engineers and software
developers.
Introduction
• This work reviews the literature on constructing UML
diagrams from text or source code and presents a
comparative analysis of the methods used.
• The authors propose a contextual similarity approach
combined with graph construction and clustering. The
multi-step approach involves keyphrase extraction,
graph construction, clustering of related documents,
and finally creation of UML class and sequence
diagrams.
Literature Review
• Several methodologies are used to develop
UML diagrams. The diagrams most commonly
generated are use-case and class diagrams.
• However, little work has been done on using
text, source code, or API documentation to
generate UML diagrams. In addition, text-mining
techniques are not used extensively for this
purpose.
Existing Systems Developed (UML Diagram Generation)

| Sr. No. | Name of Tool Generated | Title of Publication | Name of Author | Publication Venue |
|---|---|---|---|---|
| 1 | Extended ForUML (2019) | Extended ForUML for Automatic Generation of UML Sequence Diagrams from Object-Oriented Fortran | Aziz Nanthaamornphong, Anawat Leatongkam | Scientific Programming, Hindawi, 2019 |
| 2 | Automatic Builder of Class Diagram (ABCD) (2016) | Automatic Builder of Class Diagram (ABCD): an application of UML generation from functional requirements | Wahiba Ben Abdessalem Karaa, Zeineb Ben Azzouz, Aarti Singh, Nilanjan Dey, Amira S. Ashour, Henda Ben Ghezala | Software: Practice and Experience, Wiley, 2016 |
| 3 | RECAA (2015) | From requirements to UML models and back: How automatic processing of text can support requirements engineering | Mathias Landhäußer, Sven J. Körner, Walter F. Tichy | Software Quality Journal, Springer, 2014 |
| 4 | ForUML (2015) | Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML | Aziz Nanthaamornphong, Jeffrey Carver, Karla Morris, Salvatore Filippone | Scientific Programming, Hindawi, 2015 |
| 5 | Class-Gen (2010) | Parsed use case descriptions as a basis for object-oriented class model generation | Mosa Elbendak, Paul Vickers, Nick Rossiter | The Journal of Systems and Software, Springer, 2011 |
| 6 | UMGAR (2008) | An Automated Tool for Generating UML Models from Natural Language Requirements | D. K. Deeptimahanti, M. A. Babar | IEEE Conference, 2008 |
| 7 | ER convertor (2008) | Heuristics-based entity relationship modeling through natural language processing | Nazlia Omar, Paul Hanna, Paul Mc Kevitt | 15th Artificial Intelligence and Cognitive Science Conference, Ireland |
Overall Architecture of Proposed Methodology
[Figure: block diagram with the stages Input files (source code), Text pre-processing, Key-phrase extraction, SDG representation, Calculation of similarity measures, Clustering, and Constructing the UML class and sequence diagrams using clusters]
Module-1: A Conceptual Dependency Graph Based Keyword Extraction Model for
Source Code to API Documentation Mapping
Algorithm 1: Data Filtering
Input: Source code files SC, class files CF.
Step 1: Read the input source code files SC.
Step 2: Read the input class files CF.
Step 3: for each source code file SCi in SC[]
do
    Parse source code SCi to obtain its methods M and fields F.
    Mi = ExtractMethods(SCi)
    Fi = ExtractFields(SCi)
    Map (Mi, Fi) to SCi:
        SC1 (M1, F1)
        SC2 (M2, F2)
        ...
        SCn (Mn, Fn)
done
Step 4: for each class file CFi in CF[]
do
    Parse class file CFi to obtain its methods M and fields F.
    Mi = ExtractMethods(CFi)
    Fi = ExtractFields(CFi)
    Map (Mi, Fi) to CFi:
        CF1 (M1, F1)
        CF2 (M2, F2)
        ...
        CFn (Mn, Fn)
done
Step 5: // Remove the duplicate methods and fields in each source code and class file
for each code $C_i$ in $SC_i \cup CF_j$
do
    $M_i = \mathrm{Prob}(M_i \cap M_j / C_{i,j});$
    $F_i = \mathrm{Prob}(F_i \cap F_j / C_{i,j});$
    if (Mi != 0 AND Fi != 0)
    then
        Remove Mi in Ci or Cj
        Remove Fi in Ci or Cj
    end if
done
Step 6: // Pre-process the source code comments using the Stanford NLP parser.
for each document di in D
do
    T[] = Tokenize(di)
    for each token t in T[]
    do
        Apply stemming and stopword removal using the Stanford NLP library.
    done
done
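The data-filtering steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the regular expressions are a simplified, hypothetical stand-in for a real Java parser, duplicates are resolved by keeping the first file in which a method or field appears (the slide does not say which copy is kept), and a tiny hand-written stopword list stands in for the Stanford NLP library. All function names are illustrative.

```python
import re

# Hypothetical, simplified patterns; real Java parsing needs a proper parser.
MODIFIERS = r"\b(?:public|protected|private)\s+(?:static\s+)?(?:final\s+)?"
METHOD_RE = re.compile(MODIFIERS + r"[\w<>\[\]]+\s+(\w+)\s*\(")
FIELD_RE = re.compile(MODIFIERS + r"[\w<>\[\]]+\s+(\w+)\s*(?:=|;)")

# Illustrative stopword list (assumption, not the Stanford NLP list).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and"}


def extract_methods(source):
    """Return the set of method names declared with an access modifier."""
    return set(METHOD_RE.findall(source))


def extract_fields(source):
    """Return the set of field names declared with an access modifier."""
    return set(FIELD_RE.findall(source))


def build_mapping(files):
    """Steps 3-4: map each file name to its (methods, fields) pair."""
    return {name: (extract_methods(src), extract_fields(src))
            for name, src in files.items()}


def remove_duplicates(mapping):
    """Step 5 (assumed policy): keep a method or field only in the first
    file where it occurs and drop it from every later file."""
    seen_methods, seen_fields = set(), set()
    filtered = {}
    for name, (methods, fields) in mapping.items():
        filtered[name] = (methods - seen_methods, fields - seen_fields)
        seen_methods |= methods
        seen_fields |= fields
    return filtered


def preprocess(comment):
    """Step 6 stand-in: lowercase, tokenize, and drop stopwords (no stemming)."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in STOPWORDS]


FILES = {
    "A.java": "public class A { private int count; "
              "public int size() { return count; } "
              "public void add(int i) { count++; } }",
    "B.java": "public class B { private int count; "
              "public int size() { return 0; } "
              "public void clear() { } }",
}
FILTERED = remove_duplicates(build_mapping(FILES))
```

With these two toy files, `size` and `count` survive only in `A.java`, so `B.java` is left with the single method `clear`.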
Module-2 & 3 Source Code Dependency Graph Based Contextual Probabilistic
Clustering Approach for class dependency Diagrams
Probabilistic weighted contextual similarity measure for the source
code and class file dependency graphs
Input: Project source codes SC, project class files CF, project source metrics
(SMi, SFi), and project class metrics (CMi, CFi).
Procedure:
Step 1: Read the source code metrics sci(SMi, SFi) and the project class metrics
cfi(CMi, CFi).
Step 2: Construct a source code dependency graph SDG(V, E) with vertex set V
and edge set E using the source code metrics. The vertex set V represents the source
code methods and fields, and the edge set E represents the weighted rank between the
vertices.
Step 3: The probabilistic weights of the edges are computed using the vertex terms
$t_i$ and $t_j$, where $t_i \in V_i$ and $t_j \in V_j$.

Edge weight:
\[
w(i,j) = \frac{\mathrm{Prob}(t_i, t_j)}{2 \cdot \max\{\mathrm{Prob}(t_i), \mathrm{Prob}(t_j)\} - \mathrm{Prob}(t_i, t_j)}
\]

$\mathrm{Prob}(t_i, t_j)$ is the number of times both terms $(t_i, t_j)$ occur together.
$\mathrm{Prob}(t_i)$ is the number of occurrences of $t_i$ in vertex $V_i$.
$\mathrm{Prob}(t_j)$ is the number of occurrences of $t_j$ in vertex $V_j$.
Step 4: The vertices with positive edge weights are sorted in ascending order in the
dependency graph to find the contextual similarity between the source code metrics.
Step 5: The source code dependency graph SDG is used to find the contextual similarity
between each vertex and its neighboring metrics using the following proposed
measure.
Let $U(SM_i) = (m_1, m_2, \ldots, m_p)$ denote the source code metrics vector at vertex i
and $V(SM_j) = (m_1, m_2, \ldots, m_q)$ denote the source code metrics vector at vertex j.
\[
|U(SM_i)| = \sqrt{U(m_1)^2 + U(m_2)^2 + \cdots + U(m_p)^2}
\]
\[
|V(SM_j)| = \sqrt{V(m_1)^2 + V(m_2)^2 + \cdots + V(m_q)^2}
\]
\[
|U(SM_i) \cdot V(SM_j)| = U(m_1) \cdot V(m_1) + U(m_2) \cdot V(m_2) + \cdots + U(m_p) \cdot V(m_q)
\]
The proposed contextual source code dependency graph dissimilarity index
is computed as
\[
\mathrm{CSDGDI} = \frac{U(SM_i) \cdot V(SM_j) \cdot \tan^{-1}\!\left(\sqrt[3]{|U(SM_i)| + |V(SM_j)|}\right)}{2 \cdot \left(|U(SM_i)| \cdot |V(SM_j)|\right)}, \quad \text{where } i \neq j
\]
Contextual source code dependency graph similarity index:
\[
\mathrm{CSDGSI} = 1 - \mathrm{CSDGDI}
\]
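A small Python sketch may help make the edge weight and the similarity index concrete. Several operators are not legible in the slide, so the forms below are assumptions: the edge-weight denominator is taken as `2 * max(...)` minus the co-occurrence count, the argument of the inverse tangent is taken as the cube root of the sum of the two vector norms, and a zero-length vector is treated as fully dissimilar from nothing (similarity 1.0). The function names are illustrative, not taken from the authors' implementation.

```python
import math


def edge_weight(count_i, count_j, count_ij):
    """Edge weight between two vertex terms from raw occurrence counts.

    Assumed form: count_ij / (2 * max(count_i, count_j) - count_ij).
    """
    denominator = 2 * max(count_i, count_j) - count_ij
    if denominator <= 0:
        return 0.0  # assumption: no usable counts means no edge
    return count_ij / denominator


def similarity_index(u, v):
    """Similarity index 1 - DI for two metric vectors of equal length.

    Assumed form of the dissimilarity index DI:
        (u . v) * atan(cbrt(|u| + |v|)) / (2 * |u| * |v|)
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: an all-zero vector contributes no dissimilarity
    cube_root = (norm_u + norm_v) ** (1.0 / 3.0)
    dissimilarity = dot * math.atan(cube_root) / (2 * norm_u * norm_v)
    return 1.0 - dissimilarity
```

Under these assumptions, two terms that always occur together (`edge_weight(3, 3, 3)`) get weight 1.0, and two orthogonal metric vectors get similarity index 1.0 because their dot product is zero.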

 

 
Contextual source code graph based clustering algorithm
Step 1: Read the number of clusters c.
Step 2: Read the number of iterations I.
Step 3: Initialize c random clusters as centroids.
Step 4: for each document at vertex V in the graph
do
    TF-IDF[] = compute the term frequency-inverse document frequency (tf-idf)
done
Step 5: Repeat until c clusters are formed
    Find the nearest cluster using the following distance measure.
    Let V1 and V2 be two document vectors.
\[
\mathrm{Dist}(V1, V2) = \frac{\sqrt[3]{\mathrm{Cosine}(V1[i], V2[i])}}{\mathrm{Correlation}(V1, V2) \cdot \left(V1[i]^2 + V2[i]^2\right)}
\]
done
Step 6: Merge the graph nodes using the nearest-distance measure.
Step 7: Update each cluster centroid using the mean distance.
Step 8: Construct the class diagram for the filtered top-k clusters C[k]
using the PlantUML library.
Step 9: for each source code file SC[i]
do
    // Check whether the source code file has a distance metric > 0
    if (dist(SC[i], C[k]) > 0)
    then
        Display the class diagram for source code file SC[i].
    end if
done
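The clustering loop can be illustrated with a short Python sketch. It is not the authors' implementation: plain cosine distance is swapped in for the slide's combined cosine/correlation distance, the first c documents are used as initial centroids instead of a random choice so that the result is reproducible, and the smoothed tf-idf weighting is one common variant. The helper `to_plantuml` is a hypothetical function that only emits PlantUML class-diagram text; rendering is left to the PlantUML tool.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Turn token lists into sparse {term: weight} tf-idf vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in Counter(doc).items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors (stand-in distance)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(members):
    """Term-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        total.update(vec)  # adds the weights term by term
    return {t: w / len(members) for t, w in total.items()}


def cluster(vectors, c, iterations):
    """Assign each vector to its nearest centroid, then update centroids."""
    centroids = [dict(v) for v in vectors[:c]]  # deterministic init (assumption)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(c), key=lambda k: cosine_distance(v, centroids[k]))
                  for v in vectors]
        for k in range(c):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


def to_plantuml(names, labels):
    """Emit PlantUML class-diagram text with one package per cluster."""
    lines = ["@startuml"]
    for k in sorted(set(labels)):
        lines.append(f'package "Cluster-{k + 1}" {{')
        lines.extend(f"  class {name}"
                     for name, lab in zip(names, labels) if lab == k)
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


DOCS = [["estimator", "bayes", "add"], ["parser", "token", "read"],
        ["estimator", "bayes", "update"], ["parser", "token", "write"]]
LABELS = cluster(tf_idf(DOCS), c=2, iterations=5)
UML = to_plantuml(["EstA", "ParserA", "EstB", "ParserB"], LABELS)
```

On the toy documents, the two estimator-like token lists fall into one cluster and the two parser-like lists into the other; the class names passed to `to_plantuml` are placeholders.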
Step 6: The class file dependency graph CDG is used to find the contextual similarity
between each vertex and its neighboring metrics using the following proposed
measure.
Let $U(CM_i) = (m_1, m_2, \ldots, m_p)$ denote the class metrics vector at vertex i
and $V(CM_j) = (m_1, m_2, \ldots, m_q)$ denote the class metrics vector at vertex j.
\[
|U(CM_i)| = \sqrt{U(m_1)^2 + U(m_2)^2 + \cdots + U(m_p)^2}
\]
\[
|V(CM_j)| = \sqrt{V(m_1)^2 + V(m_2)^2 + \cdots + V(m_q)^2}
\]
\[
|U(CM_i) \cdot V(CM_j)| = U(m_1) \cdot V(m_1) + U(m_2) \cdot V(m_2) + \cdots + U(m_p) \cdot V(m_q)
\]
The proposed contextual class code dependency graph dissimilarity index
is computed as
\[
\mathrm{CCDGDI} = \frac{U(CM_i) \cdot V(CM_j) \cdot \cos\!\left(\sqrt[3]{|U(CM_i)| + |V(CM_j)|}\right)}{2 \cdot \left(|U(CM_i)| \cdot |V(CM_j)|\right)}, \quad \text{where } i \neq j
\]
Contextual class code dependency graph similarity index:
\[
\mathrm{CCDGSI} = 1 - \mathrm{CCDGDI}
\]
for each class file in CF[]
do
    Add the class file to the sequence diagram designer S.
done
Visualize the sequence diagram for all the class files in the given relational packages.
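As a rough illustration of the final visualization step, the sketch below turns an ordered list of calls into PlantUML sequence-diagram text. The input format (caller, callee, method name) and the function name `to_sequence_diagram` are assumptions made for this example; the slides do not describe how the sequence diagram designer S stores its calls. The participant and method names are borrowed from the key-phrase output shown in the results.

```python
def to_sequence_diagram(calls):
    """Emit PlantUML sequence-diagram text for (caller, callee, method) triples.

    The triples are assumed to be listed in the order the calls occur.
    """
    lines = ["@startuml"]
    for caller, callee, method in calls:
        lines.append(f"{caller} -> {callee}: {method}()")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical call sequence for a loop over a collection.
CALLS = [
    ("Client", "m_items", "iterator"),
    ("Client", "i", "hasNext"),
    ("Client", "i", "next"),
]
DIAGRAM = to_sequence_diagram(CALLS)
```

The resulting text can be passed to the PlantUML tool to render the diagram; each arrow line has the form `Client -> m_items: iterator()`.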
Thursday, March 4, 2021
Key Phrases in SDG: {m_items.iterator()} {m_items.add(i)}  Score: 0.9073701027137411
Key Phrases in SDG: {m_items.iterator()} {Collections.sort(m_items)}  Score: 0.9073701027137411
Key Phrases in SDG: {m_items.iterator()} {m_items.get(index)}  Score: 0.9073701027137411
Key Phrases in SDG: {m_items.iterator()} {m_items.size()}  Score: 0.8626786872190586
Key Phrases in SDG: {m_items.iterator()} {m_items.iterator()}  Score: 0.826985987428094
Key Phrases in SDG: {m_items.iterator()} {i.hasNext()}  Score: 1.0
Key Phrases in SDG: {m_items.iterator()} {i.next()}  Score: 1.0
Key Phrases in SDG: {m_items.iterator()} {i.next().toString()}  Score: 1.0
Key Phrases in SDG: {m_items.iterator()} {buff.append(i.next().toString() + " ")}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {Collections.sort(m_items)}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {m_items.add(i)}  Score: 0.9073701027137411
Key Phrases in SDG: {i.hasNext()} {Collections.sort(m_items)}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {m_items.get(index)}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {m_items.size()}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {m_items.iterator()}  Score: 1.0
Key Phrases in SDG: {i.hasNext()} {i.hasNext()}  Score: 0.826985987428094
Key Phrases in SDG: {i.hasNext()} {i.next()}  Score: 0.8626786872190586
Result in Form of
Creation of Clusters
Cluster-1{
MultiNomialBMAEstimator.java
SimpleEstimator.java
}
[D@c88a32 = [2]
Cluster-2{
DiscreteEstimatorBayes.java
}
[D@17c2f4f = [0]
Cluster-3{
BayesNetEstimator.java
}
[D@80cdf3 = [1]
Cluster-4{
BMAEstimator.java
}
[D@f9296d = [3]
Cluster-5{
DiscreteEstimatorFullBayes.java
Results in Form of Diagrams Generated
• Class Diagram
• Sequence Diagram
[Figures: generated class diagram and sequence diagram]
Analysis of Existing Systems
Techniques / input files used for conversion:

| Name of UML Tool | NLP Software | NLP and Rules (Heuristics) | XMI/XML Representation | Source Code | API Documentation |
|---|---|---|---|---|---|
| Automatic Builder of Class Diagram (2016) | No | Yes | Yes | No | Yes |
| RECAA (2015) | Yes | Yes | No | No | Yes |
| CM-Builder (2000) | Yes | Yes | No | No | No |
| UMGAR (2008) | Yes | Yes | No | No | No |
| SENSE (2007) | Yes | Yes | No | No | No |
| ER convertor (2008) | No | Yes | No | No | No |
| LIDA (2001) | No | Yes | No | No | No |
| ForUML (2015) | Yes | Yes | Yes | Yes | No |
| Extended ForUML (2019) | Yes | Yes | Yes | Yes | No |
| SDG Graph Based | Yes | Yes | No | Yes | Yes |
Conclusion
• The paper discusses how UML diagrams can be
used as a tool for recommending the most
essential classes within a given set of projects.
A large-scale open-source project cannot be
assessed using the existing similarity
measures. Hence, a new hybrid probabilistic
model is proposed for large open-source
projects.
References
• Radoslav Kirkov, Gennady Agre, "Source Code Analysis – An Overview", Cybernetics and Information Technologies, Volume 10, No 2, Bulgarian Academy of Sciences, 2010.
• Mohammed J. Zaki, Wagner Meira Jr., "Data Mining and Analysis: Fundamental Concepts and Algorithms", Chapter 13, page 370.
• "About the Unified Modeling Language Specification Version 2.5", https://www.omg.org/spec/UML/2.5/About-UML/
• Nakul Sharma, Prasanth Yalla, "A Hybrid Weighted Probabilistic based source code graph clustering algorithm for class diagram and sequence diagram visualization", under review.
• Mariem Abdouli, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala, "Survey of Works that Transform Requirements into UML Diagrams", SERA 2016, June 8-10, 2016, Baltimore, USA, ISBN: 978-1-5090-0809-4.
• B.A.K. Wahiba, B.A. Zeineb, S. Aarti, D. Nilanjan, A. Amira, B.G. Henda, "Automatic builder of class diagram (ABCD): an application of UML generation from functional requirements", Software: Practice and Experience (2015). Published online in Wiley Online Library.
• Mathias Landhäußer, Sven J. Körner, Walter F. Tichy, "From Requirements to UML Models & Back: How automatic processing of text can support requirements engineering", Software Qual J, DOI 10.1007/s11219-013-9210-6, pp 1-29.
• Harmain Mohamed Harmain and Robert J. Gaizauskas, "CM-Builder: An automated NL-based CASE tool", in ASE, pages 45-54, 2000.
• Herchi H, Ben Abdessalem W (2012), "From user requirements to UML class diagram", International Conference on Computer Related Knowledge, 4 Nov 2012.
• Deeptimahanti, D. K. and Babar, M. A., "An Automated Tool for Generating UML Models from Natural Language Requirements", IEEE/ACM Int. Conf. on ASE, 2009.
• Fabbrini F., Fusani M., Gnesi S., Lami G., "An automatic quality evaluation for natural language requirements", 7th International Workshop on Requirements Engineering: Foundation for Software Quality, pp. 150-164, Interlaken, Switzerland, 4-5 June 2001.
• Omar N, Hanna P, Mc Kevitt P (2004), "Heuristics-based entity relationship modeling through natural language processing", Proceedings of the 15th Irish Conference on Artificial Intelligence and Cognitive Science (AICS-04), 302-313.
• Zhenchang Xing and Eleni Stroulia, "UMLDiff: an algorithm for object-oriented design differencing", in Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, pages 54-65, New York, NY, USA, 2005. ACM. ISBN 1-58113-993-4.
• Overmyer, S., Benoit, L., Rambow, O., "Conceptual Modeling through Linguistic Analysis Using LIDA", 23rd International Conference on Software Engineering, 2001.
• Aziz Nanthaamornphong, Jeffrey Carver, Karla Morris, Salvatore Filippone, "Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML", Hindawi Publishing Corporation, Scientific Programming, Volume 2015, 15 pages, http://dx.doi.org/10.1155/2015/421816
• Aziz Nanthaamornphong, Anawat Leatongkam, "Extended ForUML for Automatic Generation of UML Sequence Diagrams from Object-Oriented Fortran", Hindawi, Scientific Programming, Volume 2019, https://doi.org/10.1155/2019/2542686
Thank You,
Any Questions ?

Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters

  • 1. Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters Paper ID 65 Nakul Sharma, Dr. Prasanth Yalla Department of Computer Science and Engineering Koneru Lakshmiah Education Foundation Vaddeswaram,Guntur-522502, India
  • 2. Agenda • Abstract • Introduction • Literature Review • Proposed Methodology • Results & Discussion • Conclusion & Future Scope
  • 3. Abstract The paper discusses the creation of UML diagram based recommendation system using java and class files as the input. The existing systems do not make use of techniques available in text-mining for creating UML diagrams. The overall methodology makes use of keyphrase extraction, contextual similarity calculation, and graph-based clusters in creating UML diagrams. The existing systems survey of state-of-art UML diagram generation techniques and keyphrase extraction survey is also provided. A comparative analysis of the existing tools for generating UML-diagrams is also provided. The recommendation system generated is useful to maintenance engineers and software developers.
  • 4. Introduction • In the current work, a literature review of UML diagram construction from text or source code is done. A comparative analysis of different methods used in UML diagram construction is also proposed. • In this paper, the authors propose a contextual similarity approach combined with cluster and graph creation. A multi-step approach involves keyphrase extraction, graph construction, clustering of related documents together and finally creation of UML class & sequence diagrams.
  • 5. Literature Review • There are several methodologies being used in developing UML diagrams. The most common diagrams which are developed are use-case and class diagrams. • However little work has been done wrt using text, source code, API documentation for generating UML diagrams. In addition text mining techniques are not used extensively in generating UML diagrams.
  • 6. Existing System Developed (UML Diagram Generation) Sr. No. Name of Tool Generated Title of Publication Name of Author Publication Venue 1 Extended ForUML (2019) Extended ForUML for Automatic Generation of UML Sequence Diagrams from Object-Oriented Fortran Aziz Nanthaamornphong, Anawat Leatongkam Scientific Programming, Hindwai Publications, 2019 2 Automatic Builder of Class Diagram (ABCD) (2016) Automatic Builder of Class Diagram (ABCD): an application of UML generation from functional requirements Wahiba Ben Abdessalem, Karaa Zeineb Ben Azzouz, Aarti Singh, Nilanjan Dey, Amira S. Ashour, Henda Ben Ghazala Software Practice and Experience: Wiley Publication, 2016 3 RECAA (2015) From requirements to UML models and back: How automatic processing of text can support requirements engineering Mathias Landhaußer , Sven J. Korner, Walter F. Tichy Software Qual J,Springer, 2014 4 ForUML (2015) Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML Aziz Nanthaamornphong,Jeffrey Carver, KarlaMorris, Salvatore Filippone Scientific Programming, Hindwai Publications, 2015 5 Class-Gen (2010) Parsed use case descriptions as a basis for object-oriented class model generation Mosa Elbendak, Paul Vickers∗, Nick Rossiter The Journal of Systems and Software, Springer, 2011 6 UMGAR (2008) An Automated Tool for Generating UML Models from Natural Language Requirements. Deeptimahanti, D. K. and Babar, M. A IEEE Conference, 2008 7 ER convertor (2008) Heuristics-based entity relationship modeling through natural language processing. Nazlia Omar , Paul Hanna, and Paul Mc Kevitt 15th Artificial Intelligence and Cognitive Science Conference, Ireland
  • 7. Input files Text Pre- processing Calculation of Similarity Measures Source Code Clustering Constructing the UML Class and Sequence Diagram using Clusters SDG Representation Key-phrase extraction Overall Architecture Of Proposed Methodology
  • 8. Module-1:A Conceptual Dependency Graph Based Keyword Extraction Model for Source Code to API Documentation Mapping
  • 9. Algorithm1: Data Filtering Input : Source code files SC, Class files CF. Step 1: Read input source codes files SC. Step 2: Read input class files CF. Step 3:for each source code SCi in SC[] Do Parse source code SCi with methods M and Fields F. Mi=ExtractMethods(SCi) Fi=ExtractFields(SCi) Mapping (Mi , Fi) to SCi SC1 (M1,F1) SC2 (M2,F2) … ….. SCn (Mn,Fn) done Step 4: for each class file CFi in CF[] Do Parse class files CFi with methods M and Fields F. Mi=ExtractMethods(CFi) Fi=ExtractFields(CFi) Mapping (Mi , Fi) to CFi CF1 (M1,F1) CF2 (M2,F2) … ….. CFn (Mn,Fn) done
  • 10. Step 5: // Remove the duplicate methods and fields in each source code and class files For each code Ci in i j SC CF  Do i i j i i j M Prob(M M / C);i j F Prob(F F / C);i j       If( Mi!=0 AND Fi!=0) Then Remove Mi in Ci or Cj Remove Fi in Ci or Cj End if Done Step 6: //Pre-processing source code comments using Stanford NLP parser. For each document di in D Do T[]=Tokenize(di) For each token t in T[] Do Apply stemming, stopword removal using Stanford NLP library. Done Done
  • 11. Module-2 & 3 Source Code Dependency Graph Based Contextual Probabilistic Clustering Approach for class dependency Diagrams
  • 12. Probabilistic Weighted based contextual similarity measure for Source code and class files dependency graph Input : Project source codes SC, Project class files CF, Project source metrics (SMi,SFi) and Project class metrics (CMi,CFi). Procedure: Step 1: Read source code metrics , sci(SMi,SFi) and Project class metrics cfi(CMi,CFi) Step 2: Constructing a source code dependency graph SDG(V,E) with vertex set V and Edge set E using source code metrics. Here vertex set V is represented with source code methods and fields and edge set E is represented as weighted rank between the vertices. Step 3: The probabilistic weights of the edges are computed using the vertex terms ti and tj where i i t V  and j j t V  .
  • 13. i, j i,j i j i j Prob(t t ) Edgeweight : w(i, j) 2.max{Prob(t ),Prob(t )} Prob(t ,t )     i j Prob(t , t ) is the number of times both terms i j (t , t ) occurred together. i Prob(t ) is the number of occurrence of i t in vertex Vi j Prob(t ) is the number of occurrence of j t in vertex Vj Step 4: The vertices with positive edge weights are sorted in ascending order in the dependency graph to find the contextual similarity between the source code metrics. Step 5: Source code dependency graph SDG is used to find the contextual similarity between the vertex nodes to the neighbor metrics using the following proposed measure. Let U(SMi)  (m1,m2,….mn) denotes the source codes metrics vector at vertex i. V(SMj)  (m1,m2,….mr) denotes the source code metrics vector at vertex j.
  • 14.   2 2 2 i 1 2 p 2 2 2 1 2 q i j 1 1 2 2 p q j | U(SM ) | U(m ) U(m ) ....U(m ) | | V(m ) V(m ) ....V(m ) | U(SM ).V(SM ) | U(m ).V(m ) U(m ).V(m )... U(m ).V(m ) Pr oposed Contextual source code dependency graph dissimilarity index is computed as SM C V        1 3 i j i j i j U(SM ).V(SM )*tan (| U(SM ) | | V(SM ) |) SDGDI= ;where i j 2*(| U(SM ) |*| V(SM ) |) Contextual source code dependency graph similarity index CSDGSI 1 CSDGDI;      
  • 15. Contextual source code graph based clustering algorithm Step 1: Read number of clusters c. Step 2: Read number of iterations I. Step 3: Initialize k random clusters as centroids. Step 4: for each document at vertex V in graph Do TF-ID[]= Compute term frequency tf-id Done Step 5: Repeat until c clusters Find nearest cluster distance metrics using the following equation Let Document vector one V1, document vector two V2 2 2 3 Cosine(V1[i],V2[i]) Dist(V1,V2) Correlation(V1,V2). V1[i] V2[i]     Done Step 6: Merge the graph nodes using the nearest distance measure. Step 7: Update cluster centroid using mean distance. Step 8: Construct the class diagram using the plant UML library to the filtered top k-clusters C[k]. Step 9: For each source code file SC[i] do Check the source code file has distance metric >0 If(dist(SC[i],C[k])>0) Then Display class diagram in source code file SC[i]. End if Step 10. done
  • 16. Step 6: Class file dependency graph CDG is used to find the contextual similarity between the vertex nodes to the neighbor metrics using the following proposed measure. Let U(CMi)  (m1,m2,….mn) denotes the source codes metrics vector at vertex i. V(CMj)  (m1,m2,….mr) denotes the source code metrics vector at vertex j.   2 2 2 i 1 2 p 2 2 2 1 2 q i j 1 1 2 2 p q j | U(CM ) | U(m ) U(m ) ....U(m ) | | V(m ) V(m ) ....V(m ) | U(CM ).V(CM ) | U(m ).V(m ) U(m ).V(m )... U(m ).V(m ) Pr oposed Contextual class code depenedency graph dissimilarity index is computed as CM C V        3 i j i j i j U(CM ).V(CM )*cos(| U(CM ) | | V(CM ) |) CDGDI= ;where i j 2*(| U(CM ) |*| V(CM ) |) Contextual class code depenedency graph similarity index CCDGSI 1 CCDGDI;     For each class file in CF[i] Do Add to Sequence diagram designer S. Done Visualize sequence diagram to all the class files in the given relational packages.
  • 17. Thursday, March 4, 2021 Key Phrases in SDG :{m_items.iterator()} {m_items.add(i)}  Score :0.9073701027137411 Key Phrases in SDG :{m_items.iterator()} {Collections.sort(m_items)}  Score :0.9073701027137411 Key Phrases in SDG :{m_items.iterator()} {m_items.get(index)}  Score :0.9073701027137411 Key Phrases in SDG :{m_items.iterator()} {m_items.size()}  Score :0.8626786872190586 Key Phrases in SDG :{m_items.iterator()} {m_items.iterator()}  Score :0.826985987428094 Key Phrases in SDG :{m_items.iterator()} {i.hasNext()}  Score :1.0 Key Phrases in SDG :{m_items.iterator()} {i.next()}  Score :1.0 Key Phrases in SDG :{m_items.iterator()} {i.next().toString()}  Score :1.0 Key Phrases in SDG :{m_items.iterator()} {buff.append(i.next().toString() + “ “)}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {Collections.sort(m_items)}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {m_items.add(i)}  Score :0.9073701027137411 Key Phrases in SDG :{i.hasNext()} {Collections.sort(m_items)}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {m_items.get(index)}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {m_items.size()}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {m_items.iterator()}  Score :1.0 Key Phrases in SDG :{i.hasNext()} {i.hasNext()}  Score :0.826985987428094 Key Phrases in SDG :{i.hasNext()} {i.next()}  Score :0.8626786872190586
  • 18. Result in the Form of Clusters
    Cluster-1{ MultiNomialBMAEstimator.java SimpleEstimator.java } [D@c88a32 = [2]
    Cluster-2{ DiscreteEstimatorBayes.java } [D@17c2f4f = [0]
    Cluster-3{ BayesNetEstimator.java } [D@80cdf3 = [1]
    Cluster-4{ BMAEstimator.java } [D@f9296d = [3]
    Cluster-5{ DiscreteEstimatorFullBayes.java
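To illustrate how class files might be grouped from pairwise similarities, the sketch below uses a deliberately simple stand-in: connected components over all pairs whose similarity reaches a threshold, computed with union-find. This is not the hybrid probabilistic clustering algorithm proposed in the paper, which the slides do not spell out. The file names, similarity values, and threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: threshold-based connected-component clustering. */
final class ThresholdClusteringSketch {

    /** Finds the representative of x, halving the path along the way. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Groups files into clusters: two files share a cluster when a chain of
     * pairs with similarity >= threshold connects them.
     */
    static List<List<String>> cluster(String[] files, double[][] sim, double threshold) {
        int n = files.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (sim[i][j] >= threshold) {
                    int rootI = find(parent, i);
                    int rootJ = find(parent, j);
                    if (rootI != rootJ) {
                        parent[rootJ] = rootI;
                    }
                }
            }
        }
        // LinkedHashMap keeps clusters in order of their first member.
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(files[i]);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // Illustrative inputs only; not results from the paper.
        String[] files = {"FileA.java", "FileB.java", "FileC.java"};
        double[][] sim = {
            {1.0, 0.9, 0.1},
            {0.9, 1.0, 0.2},
            {0.1, 0.2, 1.0}
        };
        List<List<String>> clusters = cluster(files, sim, 0.8);
        for (int c = 0; c < clusters.size(); c++) {
            System.out.println("Cluster-" + (c + 1) + " " + clusters.get(c));
        }
    }
}
```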
  • 19. Results in the Form of Generated Diagrams
    [Figures: generated class diagram; generated sequence diagram]
  • 21. Analysis of Existing Systems
    Techniques / input files used for conversion, by tool:

    | Name of UML Tool | NLP Software | NLP and Rules (Heuristics) | XMI/XML Representation | Source Code | API Documentation |
    |---|---|---|---|---|---|
    | Automatic Builder of Class Diagram (2016) | No | Yes | Yes | No | Yes |
    | RECAA (2015) | Yes | Yes | No | No | Yes |
    | CM-Builder (2000) | Yes | Yes | No | No | No |
    | UMGAR (2008) | Yes | Yes | No | No | No |
    | SENSE (2007) | Yes | Yes | No | No | No |
    | ER Converter (2008) | No | Yes | No | No | No |
    | LIDA (2001) | No | Yes | No | No | No |
    | ForUML (2015) | Yes | Yes | Yes | Yes | No |
    | Extended ForUML (2019) | Yes | Yes | Yes | Yes | No |
    | SDG Graph Based | Yes | Yes | No | Yes | Yes |
  • 22. Conclusion • The paper discusses how UML diagrams can be used as a tool for recommending the most essential classes within a given set of projects. Large-scale open-source projects cannot be assessed using the existing similarity measures. Hence, a new hybrid probabilistic model is proposed for such projects.
  • 23. References
    • Radoslav Kirkov, Gennady Agre, "Source Code Analysis – An Overview", Cybernetics and Information Technologies, Volume 10, No. 2, Bulgarian Academy of Sciences, 2010.
    • Mohammed J. Zaki, Wagner Meira Jr., "Data Mining and Analysis: Fundamental Concepts and Algorithms", Chapter 13, page 370.
    • "About the Unified Modeling Language Specification Version 2.5", https://www.omg.org/spec/UML/2.5/About-UML/
    • Nakul Sharma, Prasanth Yalla, "A Hybrid Weighted Probabilistic based source code graph clustering algorithm for class diagram and sequence diagram visualization" (under review).
    • Mariem Abdouli, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala, "Survey of Works that Transform Requirements into UML Diagrams", SERA 2016, June 8-10, 2016, Baltimore, USA, ISBN: 978-1-5090-0809-4.
    • B. A. K. Wahiba, B. A. Zeineb, S. Aarti, D. Nilanjan, A. Amira, B. G. Henda, "Automatic builder of class diagram (ABCD): an application of UML generation from functional requirements", Software: Practice and Experience (2015). Published online in Wiley Online Library.
    • Mathias Landhäußer, Sven J. Körner, Walter F. Tichy, "From Requirements to UML Models & Back: How automatic processing of text can support requirements engineering", Software Qual J, DOI 10.1007/s11219-013-9210-6, pp. 1-29.
    • Harmain Mohamed Harmain, Robert J. Gaizauskas, "CM-Builder: An automated NL-based CASE tool", in ASE, pages 45-54, 2000.
    • Herchi H., Ben Abdessalem W., "From user requirements to UML class diagram", International Conference on Computer Related Knowledge, 4 Nov 2012.
    • Deeptimahanti, D. K., Babar, M. A., "An Automated Tool for Generating UML Models from Natural Language Requirements", IEEE/ACM Int. Conf. on ASE, 2009.
    • Fabbrini F., Fusani M., Gnesi S., Lami G., "An automatic quality evaluation for natural language requirements", 7th International Workshop on Requirements Engineering: Foundation for Software Quality, pp. 150-164, Interlaken, Switzerland, 4-5 June 2001.
    • Omar N., Hanna P., Mc Kevitt P., "Heuristics-based entity relationship modeling through natural language processing", Proceedings of the 15th Irish Conference on Artificial Intelligence and Cognitive Science (AICS-04), pp. 302-313, 2004.
    • Zhenchang Xing, Eleni Stroulia, "UMLDiff: an algorithm for object-oriented design differencing", in Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, pages 54-65, New York, NY, USA, 2005. ACM. ISBN 1-58113-993-4.
    • Overmyer, S., Benoit, L., Rambow, O., "Conceptual Modeling through Linguistic Analysis Using LIDA", 23rd International Conference on Software Engineering, 2001.
    • Aziz Nanthaamornphong, Jeffrey Carver, Karla Morris, Salvatore Filippone, "Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML", Hindawi Publishing Corporation, Scientific Programming, Volume 2015, 15 pages, http://dx.doi.org/10.1155/2015/421816
    • Aziz Nanthaamornphong, Anawat Leatongkam, "Extended ForUML for Automatic Generation of UML Sequence Diagrams from Object-Oriented Fortran", Hindawi, Scientific Programming, Volume 2019, https://doi.org/10.1155/2019/2542686
