© 2022 Thoughtworks | Confidential
CaD: (Code As Data)
How data insights
on legacy codebases
can fill the knowledge gap
in complex modernization projects
Why is legacy modernization always so challenging?
We usually rely on the people who built the software (i.e. DEVs) or on those who work with it (e.g. SMEs, users) to collect knowledge about it (e.g. what does it do, and why?).
But people come and go, and knowledge gets scattered or lost…
What if source code could tell some interesting facts about itself too?
What People Think Code Is: CaG, C(ode) a(s) G(ibberish)
What Developers Think Code Is: CaL, C(ode) a(s) L(iterature)
What Computers Think Code Is: CaD, C(ode) a(s) D(ata)
Hello, World (developer view)
#include <stdio.h>
main( ) {
    printf("hello, world");
}
Kernighan, Brian W.; Ritchie, Dennis M. (1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. ISBN 0-13-110163-3.
Hello, World (computer view)
$ clang -Xclang -ast-dump -fsyntax-only hello-world.c
[Figure: the source of hello-world.c laid out on a line/column grid and annotated with the AST (Abstract Syntax Tree). Labels shown: where, who, what, token, syntax, semantics.]
Hello, World (data view)
$ clang -Xclang -ast-dump -fsyntax-only hello-world.c
TYPE              ID              PARENT_ID       FILE           LINE  COLUMN  TEXT
FunctionDecl      0x7fedd489f830                  hello-world.c  3             main
CompoundStmt      0x7fedd489fa10  0x7fedd489f830  hello-world.c  3     12
CallExpr          0x7fedd489f9b8  0x7fedd489fa10  hello-world.c  4     3
ImplicitCastExpr  0x7fedd489f9a0  0x7fedd489f9b8  hello-world.c  4     3
DeclRefExpr       0x7fedd489f8d0  0x7fedd489f9a0  hello-world.c  4     3       printf
ImplicitCastExpr  0x7fedd489f9f8  0x7fedd489f9b8  hello-world.c  4     10
ImplicitCastExpr  0x7fedd489f9e0  0x7fedd489f9f8  hello-world.c  4     10
StringLiteral     0x7fedd489f928  0x7fedd489f9e0  hello-world.c  4     10      Hello, World!\n
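The idea of flattening an AST into rows can be tried out in a few lines. This is only a minimal sketch, assuming Python's standard `ast` module in place of clang (so the node types differ from the clang ones) and a hypothetical `ast_rows` helper; ids are simple counters rather than memory addresses.

```python
import ast

def ast_rows(source, filename="hello_world.py"):
    """Flatten a Python AST into rows: type, id, parent_id, file, line, column, text."""
    tree = ast.parse(source, filename=filename)
    rows = []
    next_id = 0

    def visit(node, parent_id):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        # Only some nodes carry a name or a literal worth showing as text.
        text = getattr(node, "id", None) or getattr(node, "name", None)
        if isinstance(node, ast.Constant):
            text = repr(node.value)
        rows.append({
            "type": type(node).__name__,
            "id": node_id,
            "parent_id": parent_id,
            "file": filename,
            "line": getattr(node, "lineno", None),        # not every node has a position
            "column": getattr(node, "col_offset", None),  # 0-based in Python's ast
            "text": text,
        })
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)
    return rows

rows = ast_rows('print("hello, world")')
for row in rows:
    print(row)
```

Each row keeps a pointer to its parent, so the tree can be rebuilt or queried with ordinary table tools.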
Java
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, world!");
    }
}
C#
using System;
class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello, world!");
    }
}
Python
print("Hello, world!")
Ruby
puts "Hello, world!"
Scala
object HelloWorld extends App {
    println("Hello, world!")
}
ASP.NET
Response.Write("Hello World!");
Lisp
(princ "Hello, world!")
Haskell
main = putStrLn "Hello, world!"
Malbolge
('&%:9]!~}|z2Vxwv-,POqponl$Hjig%eB@@>}=<M:9wv6WsU
2T|nm-,jcL(I&%$#"
`CB]V?Tx<uVtT`Rpo3NlF.Jh++FdbCBA@?]!~|4XzyTT43Qsq
q(Lnmkj"Fhg${z@>
Hello, (all) World
there is a parser for every programming language…
abb, abnf, acme, agc, alef, fix, algol60, alloy, alpaca, angelscript, space, antlr, apex, apt, argus, arithmetic, asl, asm, asn, aspectj, atl, b, basic, bcl, bcl, bcpl,
bdf, bibcode, bnf, brainflak, brainfuck, c, calculator, callable, capnproto, cayenne, symbol conflicts, clf, clif, clojure, clu, cmake, cobol85, cookie, cool, cpp,
cql, cql3, creole, csharp, css3, csv, ctl, cto, dart2, databank, dcm, dgol, dice, dif, doiurl, dot, edif300, edn, erlang, fasta, fdo91, fen, flatbuffers, flowmatic,
fixes, focal, fol, fortran77, fusion-tables, gdscript, gedcom, gff3, gml, golang, graphql, graphstream-dgs, gtin, guido, guitartab, haskell, html, http, hypertalk,
icalendar, icon, idl, inf, informix, infosapient, iri, iso8601, istc, itn, first commit, janus, java, javadoc, javascript, joss, jpa, json, json5, karel, karel, kirikiri-tjs,
kotlin, kquery, kuka, lambda, lark, lcc, less, limbo, lisa, logo, lolcode, loop, lpc, lrc, ltl, lua, lucene, matlab, mckeeman-form, mdx, memcached_protocol,
metamath, metric, microc, modelica, modula2pim4, molecule, moo, morsecode, mps, muddb, mumath, mumps, muparser, nanofuck, newick, oberon, objc,
oncrpc, orwell, p, parkingsign, pascal, pcre, pddl, pdn, peoplecode, pgn, php, pii, pike, pl0, plucid, fix, ply, pmmn, postalcode, powerbuilder, powerbuilderdw,
powerquery, prolog, promql, propcalc, properties, protobuf2, protobuf3, prov-n, python, qif, quakemap, r, racket-bsl, racket-isl, rcs, redcode, refal,
comments, rego, restructuredtext, rexx, rfc1035, rfc1960, rfc3080, rfc822, robotwars, romannumerals, rpn, ruby, rust, last month, scala, scotty, scss, last
month, sexpression, sgf, sharc, sici, sickbay, sieve, smalltalk, smiles, smtlibv2, snobol, snowball, more, solidity, sparql, spass, sql, stacktrace, stellaris, stl,
stringtemplate, suokif, swift-fin, swift, tcpheader, teal, telephone, terraform, thrift, tiny, tinybasic, tinyc, tinymud, tinyos_nesc, tl, tnsnames, tnt, toml, trac,
tsv, ttm, turing, turtle-doc, turtle, unicode, unreal_angelscript, upnp, url, useragent, v, vb6, vba, velocity, verilog, vhdl, vmf, wat, wavefront, webidl, wkt, wln,
wren, xml, xpath, xsd-regex, xyz, z
Open-source ANTLR grammars are available at https://github.com/antlr/grammars-v4
…or you can use
What is all this for?
Compilers use all the metadata to translate code into executable instructions.
Static code analysis tools (e.g. linters) use AST metadata to identify potential issues (e.g. programming errors, bugs, non-idiomatic code, suspicious constructs) and to compute metrics.
IDEs use filtered metadata (e.g. variables, functions, classes, methods) to provide navigation, hints, and code completion.
DEVs actually use the same metadata (unconsciously) to read the code!
Is there anything that can be leveraged by Business Analysts, Project Managers, or IT managers at large*?
* and DEVs too…
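To make the linter case concrete, here is a minimal sketch of a single check driven by AST metadata. It assumes Python's standard `ast` module and a hypothetical `find_bare_excepts` helper; real linters work along similar lines but with far more rules.

```python
import ast

# Hypothetical snippet to analyse; the bare `except:` is the suspicious construct.
SOURCE = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''

def find_bare_excepts(source):
    """Return (line, column) of every `except:` clause that names no exception type."""
    tree = ast.parse(source)
    return [
        (node.lineno, node.col_offset)
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

issues = find_bare_excepts(SOURCE)
print(issues)
```

The check never looks at the text of the code, only at node types and their positions, which is exactly the metadata the parser already produced.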
Legacy Modernization Challenges… where Agile helps
Known Knowns (KK): Identified Knowledge
Known Unknowns (KU): Identified Risk
Unknown Knowns (UK): Untapped Knowledge
Unknown Unknowns (UU): Unidentified Risk
Agile tools & techniques: discoveries, inceptions, user stories, acceptance criteria; spikes, RAIDs, modernization patterns; cross-functional teams, short iterations, IPMs.
Agile tools & techniques help to address KK & KU proactively, and UK & UU reactively.
Legacy Modernization Challenges… where can CaD help?
CaD can help to proactively mitigate the risks around UK & UU.
Example n. 1 - Unknown Knowns
project-level support for BAs & tech analysis
Project-level risk mitigation
Use Case: Modernization of a Pricing Engine
We were asked to replace a pricing engine that had been under development for the past 30 years.
We went through an inception and several workshops with stakeholders, SMEs, and DEVs.
We collected all the available knowledge (KK) and identified all the grey areas that would require further investigation (KU).
We found out that business rules were encoded as table rows (e.g. exception/inclusion rules) or field values (e.g. operation rules and values), referenced and manipulated inside legacy code.
SMEs and DEVs told us that there were only 60 tables to care about…
Are these really all the business rules?
Use Case: Modernization of a pricing engine
Proactively untap knowledge
[Flow diagram. Steps shown: Inception; Parse legacy code; Explore legacy code (identify missing tables/procedures, identify referenced fields); Explore grey areas; Trigger SMEs conversations; Consolidate SMEs' knowledge; Refine the project scope. Roles shown: DEVs, BAs, PO.]
CaD Pipeline & Tools
Goals:
● Tactical: it should not require a huge investment in time and resources (i.e. DEVs should not have to become legacy code experts)
● Pragmatic: just search for possible clues (e.g. tables, fields, procedures not mentioned in the workshops)
● Accessible: BAs and SMEs should be able to use and explore the outcomes (i.e. use tools they already know)
CaD Pipeline Goals
● Tactical: ~300 Clojure LoC leveraging an existing open-source ANTLR-based parser for the legacy language.
● Pragmatic: leveraging semantic features of the legacy language to filter tokens (never underestimate the expressiveness of an old programming language).
● Accessible: the output was a spreadsheet that could be easily filtered by table and column name, or explored with pivot tables. All the tokens were connected (via Excel hyperlinks) to the tables' documentation and to the specific line/column of the source code in VS Code (with syntax coloring thanks to an open-source plugin).
[Pipeline diagram:
DB Catalog → Parse → Tables & Columns Metadata (tables/fields names): 1.1k tables, 11k fields
Table List (SMEs, project scope) → Parse → 60 tables, ??? fields
Source Code → Parse: 4M LoC, 22k LoC, 117k tokens
Filter & Merge → Tokens referencing Tables & Fields: 128 tables, 1k fields, 4.7k tokens
Outputs: XLSX in Excel (BAs); Online Docs in the browser (BAs/SMEs); VS Code with a 4th gen language plugin (DEVs/SMEs)]
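The "Filter & Merge" step can be illustrated with a minimal sketch. It is written in Python rather than the Clojure used on the project, and every table, column, file, and function name below is hypothetical: tokens coming out of the parser are kept only if they match a table or column in the DB catalog, and each match is flagged against the scope declared by the SMEs.

```python
# Hypothetical DB catalog: table name -> column names.
CATALOG = {
    "PRICE_LIST": {"ID", "AMOUNT", "CURRENCY"},
    "DISCOUNT_RULE": {"ID", "RATE"},
    "AUDIT_LOG": {"ID", "MESSAGE"},
}
# Hypothetical list of tables the SMEs declared in scope.
SME_SCOPE = {"PRICE_LIST"}
# Hypothetical parser output: (file, line, column, text).
TOKENS = [
    ("pricing.src", 10, 5, "PRICE_LIST"),
    ("pricing.src", 10, 16, "AMOUNT"),
    ("pricing.src", 42, 9, "DISCOUNT_RULE"),
    ("pricing.src", 42, 23, "RATE"),
    ("pricing.src", 57, 1, "TOTAL"),
]

def filter_and_merge(tokens, catalog, sme_scope):
    """Keep tokens that name a catalog table or column and annotate them."""
    tables_by_column = {}
    for table, columns in catalog.items():
        for column in columns:
            tables_by_column.setdefault(column, set()).add(table)

    rows = []
    for file, line, col, text in tokens:
        name = text.upper()
        if name in catalog:
            tables, column_name = {name}, None
        elif name in tables_by_column:
            tables, column_name = tables_by_column[name], name
        else:
            continue  # not a table or column reference: drop the token
        rows.append({
            "file": file, "line": line, "col": col, "text": text,
            "table_name": next(iter(tables)) if len(tables) == 1 else None,
            "column_name": column_name,
            "ambiguous": len(tables) > 1,  # same column name in several tables
            "in_scope": any(t in sme_scope for t in tables),
        })
    return rows

rows = filter_and_merge(TOKENS, CATALOG, SME_SCOPE)
missing = {r["table_name"] for r in rows if r["column_name"] is None and not r["in_scope"]}
print(missing)
```

Tables that show up in `missing` are referenced by the code but were never mentioned in the workshops, which makes them good conversation starters with the SMEs.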
Why Clojure?
Clojure is a fast modern Lisp that runs on top of the JVM (and CLR/V8 too).
Easy interoperability with Java libraries: easy access to ANTLR objects and attributes, and to XLSX libraries to read/write large files.
Fast in-memory parallel data transformations: Clojure transducers and the core.async library provide easy & fast parallel in-memory transformations without requiring huge resources or infra.
REPL-driven development: the REPL allows an instant feedback workflow that can dramatically speed up exploring Java libraries and data structures.
…and because we love parentheses ;)
What it looks like
[Figure: an Excel spreadsheet (BAs) whose Token rows carry two hyperlinks, one to the token context in the VS Code editor (DEVs/SMEs) and one to the table's online docs (BAs/SMEs). Inputs shown: Source Code, DB, SME.]
Dataset
● parse unit path: source file path
● file path: original source file path (may be different in case of an include file)
● line: token line inside the source file
● column: token starting column inside the line
● source docs: link to VS Code to highlight the token inside the source file
● type: token semantics tag
● text: token actual text
● node id: token id (parse unit context)
● parent id: token AST parent
● level: token AST indentation
● procedure id: procedure uuid
● procedure name: procedure name
● table name: matching table
● column name: matching column
● table docs: link to table's online docs
● ambiguous term: true/false
● in scope: true/false
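The two hyperlink columns can be sketched as follows. This is a minimal sketch under several assumptions: it writes CSV instead of XLSX to stay within the Python standard library, it relies on the spreadsheet application evaluating `=HYPERLINK(...)` formulas on import, it uses the `vscode://file/<path>:<line>:<column>` URL form, and the documentation base URL, file paths, and helper names are all hypothetical.

```python
import csv
import io

# Hypothetical filtered tokens: (absolute file path, line, column, text, table name).
TOKENS = [
    ("/repo/pricing.src", 10, 5, "PRICE_LIST", "PRICE_LIST"),
    ("/repo/pricing.src", 42, 9, "DISCOUNT_RULE", "DISCOUNT_RULE"),
]

def excel_hyperlink(url, label):
    """Build an Excel HYPERLINK formula; embedded double quotes are doubled."""
    return '=HYPERLINK("{}","{}")'.format(url.replace('"', '""'), label.replace('"', '""'))

def vscode_url(path, line, column):
    """URL that asks VS Code to open an absolute path at a given line and column."""
    return f"vscode://file{path}:{line}:{column}"

def write_dataset(tokens, docs_base_url, out):
    """Write one row per token, with links to the source position and the table docs."""
    writer = csv.writer(out)
    writer.writerow(["file path", "line", "column", "text",
                     "table name", "source docs", "table docs"])
    for path, line, column, text, table in tokens:
        writer.writerow([
            path, line, column, text, table,
            excel_hyperlink(vscode_url(path, line, column), "open in VS Code"),
            excel_hyperlink(docs_base_url + table, table),
        ])

buffer = io.StringIO()
write_dataset(TOKENS, "https://docs.example.com/tables/", buffer)
print(buffer.getvalue())
```

Keeping the links as plain formulas means BAs can filter and pivot the sheet as usual and still jump to the code or the docs with one click.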
Use Case: Modernization of a pricing engine
CaD outcomes
Are these really all the business rules?
● Tables: 40% more tables in scope (some covered edge cases, others were seldom used)
● Fields: scope reduced to 36% of fields (most fields were used for other purposes)
● Business Rules: whenever there was a computation issue we could go to the exact point in the source code to clarify assumptions and behaviors
Example n. 2 - Unknown Unknowns
program-level support for legacy modernization
Program-level risk mitigation
Use Case: legacy ERP modernization
We were asked to replace an existing monolithic on-prem ERP-like system, made of several modules, that had been under development for the past 30 years.
We went through a discovery and several workshops with DEVs, OPSs, DBAs, SMEs, and business stakeholders.
We defined a target functional & tech architecture (KK), and identified modernization patterns & RAIDs (KU) with tentative mitigations.
We found out that the system was integrated with several business processes, exchanging data with many applications, and everybody was scared of breaking something…
How much is it going to cost?
Where should we start?
There should be a wire somewhere on your left…
Or maybe it's on the right… but don't cut the other ones!
We collected an amazing amount of information on stickies of different colors and shapes.
What if we could translate stickies into data?
Use Case: legacy ERP modernization
Proactively uncover unknown risks
[Flow diagram. Steps shown: Discovery & Workshops; Convert Stickies to Data; Parse Code & Merge with Stickies Data; Explore & Compute KPIs; Collect Projects Metrics; Consolidate Findings; Refine the program roadmap; Start several Inceptions/deliveries; Learn from mistakes. Roles shown: DEVs, BAs, PO.]
CaD Pipeline & Tools
Goals:
● Strategic: it should help define a long-term plan backed by data and KPIs that can evolve over time.
● Comprehensive: it should cover the entire application landscape, not just a single project.
● Flexible: it should quickly provide answers to basic questions, but also support further investigations.
From stickies… to data (visualization)
[Figure: the monolith split into four Areas (Area 1 to Area 4), each with Modules M1 to M3, with Tables, APIs, and Source Code layers added step by step.]
1. We split the monolith into logical Areas & Modules (both existing and new ones)
2. We map Tables belonging to just a single Module (if any)
3. We map APIs belonging to just a single Module (if any)
4. We parse source code to identify chains of calls (who calls whom) and access to tables (who reads/writes data where)
5. We map APIs to source code entry points (e.g. functions)
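Steps 4 and 5 can be sketched as a small graph walk. This is a minimal sketch in Python (the project itself used Pharo for this stage), and all function, table, and API names are hypothetical: starting from the entry point mapped to an API, we follow the chain of calls and collect every table that is read or written along the way.

```python
from collections import deque

# Hypothetical call graph (who calls whom).
CALLS = {
    "api_get_price": ["compute_price"],
    "compute_price": ["load_discounts", "round_amount"],
    "legacy_report": ["load_discounts"],
}
# Hypothetical table access (who reads/writes data where).
TABLE_ACCESS = {
    "compute_price": [("read", "PRICE_LIST")],
    "load_discounts": [("read", "DISCOUNT_RULE")],
    "legacy_report": [("write", "AUDIT_LOG")],
}
# Hypothetical mapping of APIs to source code entry points.
API_ENTRY_POINTS = {"GET /price": "api_get_price"}

def reachable(entry, calls):
    """Breadth-first walk of the call graph starting from one entry point."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        function = queue.popleft()
        for callee in calls.get(function, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def tables_behind_api(api):
    """Every (mode, table) pair touched by code reachable from the API entry point."""
    functions = reachable(API_ENTRY_POINTS[api], CALLS)
    return sorted({
        access
        for function in functions
        for access in TABLE_ACCESS.get(function, [])
    })

print(tables_behind_api("GET /price"))
```

The same walk, run for every API, gives a first estimate of how much code and data sit behind each feature.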
We can now explore each area, module, table, procedure, or API and interactively follow all the trails that connect stickies to source code.
CaD Pipeline Goals
● Strategic: by integrating workshop outcomes with technical catalogs, we can intersect the target and current state (e.g. sizing the complexity of target features by slicing the current implementation).
● Comprehensive: the pipeline can be easily extended to include more languages, projects, or artefacts (e.g. configuration files, parsable documentation).
● Flexible: leveraging meta-models, we can explore source code in a guided way or build our own way through it, and identify risks and areas to deep-dive (e.g. shared dependencies, domain bleeding, domain complexity).
[Pipeline diagram. Inputs, each parsed: Source Code; Discovery Stickies (SMEs), yielding Tables & APIs Annotations; DB Catalog; APIs Catalog. Stages: Build Annotated Graph; Explore Graph & Build KPIs in Pharo (DEVs/SMEs); Merge KPIs & Metrics in Excel; Refine Roadmap (BAs/SMEs).]
Why Pharo?
Pharo is a fast modern Smalltalk focused on simplicity and immediate feedback.
Complex data structure visualization tools: Pharo integrates the Roassal library to display complex interactive graph-oriented data structures.
Dynamic graphical inspector: every object can be explored with a graphical inspector and may define custom views based on Roassal.
Pause & resume support: at any moment, we can pause the exploration, save objects/views to disk, and restart later from where we left off.
…and because we love object soup ;)
Lessons learned so far
Do we have answers? Not yet, but we have started to collect evidence, not just gut feelings.
Look for what overlaps (e.g. shared libraries, tables accessed by several modules) to anticipate possible issues.
Look for what matches to collect data about effort (e.g. LoC).
Look for what doesn't match (e.g. tables not accessed by code, code not invoked by other code or APIs) to uncover unseen risks.
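The "overlaps" and "doesn't match" checks reduce to a few set operations once code and catalogs are data. A minimal sketch in Python, with hypothetical module, function, and table names:

```python
# Hypothetical annotations: which module each function belongs to.
MODULE_OF = {
    "compute_price": "Pricing",
    "load_discounts": "Pricing",
    "post_invoice": "Billing",
    "old_export": "Billing",
}
# Hypothetical parser output: (function, table accessed).
ACCESS = [
    ("compute_price", "PRICE_LIST"),
    ("load_discounts", "DISCOUNT_RULE"),
    ("post_invoice", "PRICE_LIST"),
    ("post_invoice", "INVOICE"),
]
CATALOG_TABLES = {"PRICE_LIST", "DISCOUNT_RULE", "INVOICE", "AUDIT_LOG"}
CALLS = {"compute_price": ["load_discounts"], "post_invoice": ["compute_price"]}
API_ENTRY_POINTS = {"post_invoice"}

def shared_tables(access, module_of):
    """Overlap: tables accessed by functions from more than one module."""
    modules_by_table = {}
    for function, table in access:
        modules_by_table.setdefault(table, set()).add(module_of[function])
    return {t: m for t, m in modules_by_table.items() if len(m) > 1}

def unused_tables(access, catalog_tables):
    """Mismatch: tables in the catalog that no parsed code touches."""
    return catalog_tables - {table for _, table in access}

def uninvoked_functions(module_of, calls, entry_points):
    """Mismatch: functions called by no other code and exposed by no API."""
    callees = {callee for targets in calls.values() for callee in targets}
    return set(module_of) - callees - entry_points

print(shared_tables(ACCESS, MODULE_OF))
print(unused_tables(ACCESS, CATALOG_TABLES))
print(uninvoked_functions(MODULE_OF, CALLS, API_ENTRY_POINTS))
```

None of these results is an answer by itself; each one is a piece of evidence to bring back to the SMEs.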
“In contrast to visual programming and diagramming for software design, software visualization is not so much concerned with the construction, but with the analysis of programs and their development process.”
S. Diehl, Software Visualization, Springer, 1998, ISBN 9783540465041
“Challenges in data visualization does not actually involve visualizing Data. [...] The challenge is in crafting a visualization that is easily reusable, composable, and extensible.”
A. Bergel, Agile Visualization, Lulu Press, 2016, ISBN 978136531409
$ tail -f questions
Alessandro Confetti
Tech Principal
aconfet@thoughtworks.com
WIP
Hello, World
main( ) {
extern a, b, c;
putchar(a); putchar(b);
putchar(c); putchar('!*n');
}
a 'hell';
b 'o, w';
c 'orld';
Kernighan, Brian W. (1972). A Tutorial Introduction to the Language B. Bell Laboratories (p 4)
What is a meta-model?
When we need to explore and reason about complex systems, we need to find the right kind of representation (i.e. the right questions we need answers to).
A meta-model is a pragmatic way to find the right balance between accuracy and outcome.
If we oversimplify it, we may end up with a lot of underestimated or unmitigated risks.
If we overcomplicate it, we may easily enter never-ending rabbit-holes and struggle to deliver the overall picture.
Layers, from abstract and simplified to concrete and complex:
● Meta-[...]-Model: symbols and grammar to represent the structure and vocabulary of a valid meta-model (e.g. alphabet, numbers, units, colors, cartographic projection). It describes the meta-model.
● Meta-Model: structure and vocabulary of a valid model (e.g. map legend and conventions). It describes the model.
● Model: simplified representation of the problem, driven by the questions we need answered (e.g. street map). It represents the subject.
● Subject/Problem: something we want to reason about (e.g. route between two cities).
See for reference: J. Bezivin and O. Gerbe, Towards a precise definition of the OMG/MDA framework, Proceedings 16th AICASE (ASE 2001), 2001, pp. 273-280, doi: 10.1109/ASE.2001.989813.
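The relation between a meta-model and a model can be sketched in code. This is only an illustration in Python with hypothetical names, not FAMIX or FAME: the meta-model declares which entity and relation types are valid, and a model is checked against it.

```python
from dataclasses import dataclass, field

@dataclass
class MetaModel:
    """Vocabulary and structure of a valid model (the 'map legend')."""
    entity_types: set
    relation_types: dict  # relation name -> (source type, target type)

@dataclass
class Model:
    """Simplified representation of the subject (the 'street map')."""
    entities: dict = field(default_factory=dict)   # entity name -> entity type
    relations: list = field(default_factory=list)  # (relation, source, target)

def validate(model, meta):
    """Return the list of ways in which the model violates its meta-model."""
    errors = []
    for name, entity_type in model.entities.items():
        if entity_type not in meta.entity_types:
            errors.append(f"{name}: unknown entity type {entity_type}")
    for relation, source, target in model.relations:
        if relation not in meta.relation_types:
            errors.append(f"{relation}: unknown relation")
            continue
        want_source, want_target = meta.relation_types[relation]
        if (model.entities.get(source), model.entities.get(target)) != (want_source, want_target):
            errors.append(f"{relation}({source}, {target}): expected {want_source} -> {want_target}")
    return errors

# Hypothetical meta-model for the questions "who calls whom" and "who reads what".
META = MetaModel(
    entity_types={"Function", "Table"},
    relation_types={"calls": ("Function", "Function"), "reads": ("Function", "Table")},
)
ENTITIES = {"compute_price": "Function", "PRICE_LIST": "Table"}
good = Model(ENTITIES, [("reads", "compute_price", "PRICE_LIST")])
bad = Model(ENTITIES, [("calls", "compute_price", "PRICE_LIST")])
print(validate(good, META), validate(bad, META))
```

The meta-model stays small on purpose: it only contains the entity and relation types needed by the questions we want answered.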
FAME meta-meta-model
Description: a family of meta-meta-models for describing and defining meta-models.
● All meta-models share a series of common features and basic querying capabilities.
● Can be used to describe different kinds of common diagrams (e.g. E/R, UML), semantics for hierarchical structures (e.g. XML, JSON), or programming languages (e.g. procedural, functional, object-oriented).
FAMIX meta-model
Description: a family of meta-models for representing the structure of software projects.
● Supports both procedural and object-oriented languages.
● Plugins are available for many languages (e.g. C/C++, C#, Clojure, Java, JavaScript, JSX/React, PHP).
● The MSE file format can be used to export/import models based on FAMIX.
● All models share a series of common features and basic querying capabilities (e.g. dependency trees).
We are not alone…
There is a growing community of researchers and tools.

XConf 2022 - Code As Data: How data insights on legacy codebases can fill the knowledge gap in complex modernization projects.

  • 1. © 2022 Thoughtworks | Confidential CaD: (Code As Data) How data insights on legacy codebases can fill the knowledge gap in complex modernization projects
  • 2. © 2022 Thoughtworks | Confidential Why legacy modernization is always so challenging? 2 We usually rely on people who built the software (i.e. DEVs) or the ones dealing with it (e.g. SMEs, Users) to collect knowledge about it (e.g. what it does?, why?). But people come and go and knowledge is scattered or lost… What if source code could tell some interesting facts about itself too? 1998 2016
  • 3. © 2022 Thoughtworks | Confidential 3 What People Think Code Is CaG C(ode) a(s) G(ibberish) What Developers Think Code Is CaL C(ode) a(s) L(iterature) What Computers Think Code Is CaD C(ode) a(s) D(data)
  • 4. © 2022 Thoughtworks | Confidential 4 #include <stdio.h> main( ) { printf("hello, world"); } Hello, World Kernighan, Brian W.; Ritchie, Dennis M. (1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. ISBN 0-13-110163-3. (developer view)
  • 5. © 2022 Thoughtworks | Confidential $ clang -Xclang -ast-dump -fsyntax-only hello-world.c 1 5 10 15 20 25 30 1 # i n c l u d e < s t d i o . h > 2 3 m a i n ( ) { 4 p r i n t f ( “ h e l l o , w o r l d “ ) ; 5 } 5 where token who what semantics Hello, World (computer view) AST syntax Abstract Syntax Tree ↳ ↳ ↳ ↳ ↳ ↳ ↳
  • 6. © 2022 Thoughtworks | Confidential 6 TYPE ID PARENT_ID FILE LINE COLUMN TEXT FunctionDecl 0x7fedd489f830 hello-world.c 3 main CompoundStmt 0x7fedd489fa10 0x7fedd489f830 hello-world.c 3 12 CallExpr 0x7fedd489f9b8 0x7fedd489fa10 hello-world.c 4 3 ImplicitCastExpr 0x7fedd489f9a0 0x7fedd489f9b8 hello-world.c 4 3 DeclRefExpr 0x7fedd489f8d0 0x7fedd489f9a0 hello-world.c 4 3 printf ImplicitCastExpr 0x7fedd489f9f8 0x7fedd489f9b8 hello-world.c 4 10 ImplicitCastExpr 0x7fedd489f9e0 0x7fedd489f9f8 hello-world.c 4 10 StringLiteral 0x7fedd489f928 0x7fedd489f9e0 hello-world.c 4 10 Hello, World!n $ clang -Xclang -ast-dump -fsyntax-only hello-world.c 1 5 10 15 20 25 30 1 # i n c l u d e < s t d i o . h > 2 3 m a i n ( ) { 4 p r i n t f ( “ h e l l o , w o r l d “ ) ; 5 } where token who what semantics Hello, World (data view) AST syntax Abstract Syntax Tree ↳ ↳ ↳ ↳ ↳ ↳ ↳
  • 7. Hello, (all) World. Java: class HelloWorld { public static void main(String[] args) { System.out.println("Hello, world!"); } } C#: using System; class Program { public static void Main(string[] args) { Console.WriteLine("Hello, world!"); } } Python: print("Hello, world!") Ruby: puts "Hello, world!" Scala: object HelloWorld extends App { println("Hello, world!") } ASP.NET: Response.Write("Hello World!"); Lisp: (princ "Hello, world!") Haskell: main = putStrLn "Hello, world!" Malbolge: ('&%:9]!~}|z2Vxwv-,POqponl$Hjig%eB@@>}=<M:9wv6WsU 2T|nm-,jcL(I&%$#" `CB]V?Tx<uVtT`Rpo3NlF.Jh++FdbCBA@?]!~|4XzyTT43Qsq q(Lnmkj"Fhg${z@> There is a parser for every programming language… or you can use the open-source ANTLR grammars available at https://github.com/antlr/grammars-v4: abb, abnf, acme, agc, alef, algol60, alloy, alpaca, angelscript, antlr, apex, apt, argus, arithmetic, asl, asm, asn, aspectj, atl, b, basic, bcl, bcpl, bdf, bibcode, bnf, brainflak, brainfuck, c, calculator, callable, capnproto, cayenne, clf, clif, clojure, clu, cmake, cobol85, cookie, cool, cpp, cql, cql3, creole, csharp, css3, csv, ctl, cto, dart2, databank, dcm, dgol, dice, dif, doiurl, dot, edif300, edn, erlang, fasta, fdo91, fen, flatbuffers, flowmatic, focal, fol, fortran77, fusion-tables, gdscript, gedcom, gff3, gml, golang, graphql, graphstream-dgs, gtin, guido, guitartab, haskell, html, http, hypertalk, icalendar, icon, idl, inf, informix, infosapient, iri, iso8601, istc, itn, janus, java, javadoc, javascript, joss, jpa, json, json5, karel, kirikiri-tjs, kotlin, kquery, kuka, lambda, lark, lcc, less, limbo, lisa, logo, lolcode, loop, lpc, lrc, ltl, lua, lucene, matlab, mckeeman-form, mdx, memcached_protocol, metamath, metric, microc, modelica, modula2pim4, molecule, moo, morsecode, mps, muddb, mumath, mumps, muparser, nanofuck, newick, oberon, objc, oncrpc, orwell, p, parkingsign, pascal, pcre, pddl, pdn, peoplecode, pgn, php, pii, pike, pl0, plucid, ply, pmmn, postalcode, powerbuilder, powerbuilderdw, powerquery, prolog, promql, propcalc, properties, protobuf2, protobuf3, prov-n, python, qif, quakemap, r, racket-bsl, racket-isl, rcs, redcode, refal, rego, restructuredtext, rexx, rfc1035, rfc1960, rfc3080, rfc822, robotwars, romannumerals, rpn, ruby, rust, scala, scotty, scss, sexpression, sgf, sharc, sici, sickbay, sieve, smalltalk, smiles, smtlibv2, snobol, snowball, solidity, sparql, spass, sql, stacktrace, stellaris, stl, stringtemplate, suokif, swift-fin, swift, tcpheader, teal, telephone, terraform, thrift, tiny, tinybasic, tinyc, tinymud, tinyos_nesc, tl, tnsnames, tnt, toml, trac, tsv, ttm, turing, turtle-doc, turtle, unicode, unreal_angelscript, upnp, url, useragent, v, vb6, vba, velocity, verilog, vhdl, vmf, wat, wavefront, webidl, wkt, wln, wren, xml, xpath, xsd-regex, xyz, z
  • 8. What is all this for? Compilers use all the metadata to translate code into executable instructions. Static code analysis tools (e.g. linters) use AST metadata to compute metrics and identify potential issues (e.g. programming errors, bugs, non-idiomatic code, suspicious constructs). IDEs use filtered metadata (e.g. variables, functions, classes, methods) to provide navigation, hints, and code completion. DEVs actually use the same metadata (unconsciously) to read the code! Is there anything that can be leveraged by Business Analysts, Project Managers, or IT managers at large*? (* and DEVs too…)
  • 9. Legacy Modernization Challenges… where Agile helps. Quadrants: Known Knowns (Identified Knowledge), Known Unknowns (Identified Risk), Unknown Knowns (Untapped Knowledge), Unknown Unknowns (Unidentified Risk). Proactively: discoveries, inceptions, user stories, acceptance criteria; spikes, RAIDs, modernization patterns. Reactively: cross-functional teams, short iterations, IPMs. Agile tools & techniques help to address KK & KU proactively and UK & UU reactively.
  • 10. Legacy Modernization Challenges… where can CaD help? Quadrants: Known Knowns (Identified Knowledge), Known Unknowns (Identified Risk), Unknown Knowns (Untapped Knowledge), Unknown Unknowns (Unidentified Risk). CaD can help to proactively mitigate the risks around UK & UU.
  • 11. Example n. 1 - Unknown Knowns: project-level support for BAs & tech analysis
  • 12. Project-level risk mitigation. Use Case: Modernization of a Pricing Engine. We were asked to replace a pricing engine that had been under development for the past 30 years. We went through an inception and several workshops with stakeholders, SMEs, and DEVs. We collected all the available knowledge (KK) and identified all the grey areas that would require further investigation (KU). We found out that business rules were encoded as table rows (e.g. exception/inclusion rules) or field values (e.g. operation rules and values), referenced and manipulated inside legacy code. SMEs and DEVs told us that there were only 60 tables to care about… Are these really all the business rules?
  • 13. Use Case: Modernization of a pricing engine. Proactively untap knowledge. Project flow: Inception; Consolidate SMEs' knowledge (BAs); Explore grey areas (BAs); Trigger SMEs conversations (BAs); Refine the project scope (PO). CaD Pipeline & Tools (DEVs, BAs): parse legacy code; explore legacy code; identify missing tables/procedures; identify referenced fields. Goals: ● Tactical: it should not require a huge investment in time and resources (i.e. DEVs should not have to become legacy code experts) ● Pragmatic: just search for possible clues (e.g. tables, fields, procedures not mentioned in the workshops) ● Accessible: BAs and SMEs should be able to use and explore the outcomes (i.e. use tools they already know)
  • 14. CaD Pipeline Goals ● Tactical: ~300 Clojure LoC leveraging an existing open-source ANTLR-based parser for the legacy language. ● Pragmatic: leveraging semantic features of the legacy language to filter tokens (never underestimate the expressiveness of an old programming language). ● Accessible: the output was a spreadsheet that could be easily filtered by table and column name, or explored with pivot tables. All the tokens were connected (via Excel hyperlinks) to the tables' documentation and to the specific line/column of the source code in VS Code (with syntax coloring thanks to an open-source plugin). Pipeline: DB Catalog → Parse → Tables & Columns Metadata, Tables/Fields Names (1.1k tables, 11k fields). Table List (SMEs) → Parse → Project Scope (60 tables, ??? fields). Source Code → Parse, restricted to the Project Scope (4M LoC, 22k LoC, 117k tokens). Filter & Merge → Tokens referencing Tables & Fields (128 tables, 1k fields, 4.7k tokens). Outputs: XLSX in Excel (BAs); Online Docs in the Browser (BAs/SMEs); VS Code with a 4th gen language plugin (DEVs/SMEs).
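The Filter & Merge step can be sketched roughly as follows. This is a hedged Python illustration, not the ~300 lines of Clojure used in the project: the Token shape, the table and column names, and the matching rule (case-insensitive name equality) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A parsed source token (hypothetical shape for this sketch)."""
    file: str
    line: int
    column: int
    text: str


def merge_tokens(tokens, catalog, scope_tables):
    """Keep only tokens naming a catalog table or column, and annotate them.

    catalog maps table name -> set of column names (from the DB catalog).
    scope_tables is the table list agreed with the SMEs.
    """
    scope = {name.lower() for name in scope_tables}
    tables = {name.lower() for name in catalog}
    owners = defaultdict(set)  # column name -> tables that define it
    for table, columns in catalog.items():
        for column in columns:
            owners[column.lower()].add(table.lower())

    rows = []
    for tok in tokens:
        name = tok.text.lower()
        if name in tables:
            rows.append({"token": tok, "table": name, "column": None,
                         "ambiguous": False, "in_scope": name in scope})
        elif name in owners:
            candidates = sorted(owners[name])
            rows.append({"token": tok,
                         # A column shared by several tables cannot be resolved here.
                         "table": candidates[0] if len(candidates) == 1 else None,
                         "column": name,
                         "ambiguous": len(candidates) > 1,
                         "in_scope": any(t in scope for t in candidates)})
    return rows


def tables_outside_scope(rows, scope_tables):
    """Tables referenced by the code but missing from the SMEs' list."""
    scope = {name.lower() for name in scope_tables}
    found = {row["table"] for row in rows if row["column"] is None}
    return sorted(found - scope)


# Hypothetical data for illustration only.
CATALOG = {"price_list": {"item_id", "amount"},
           "discount_rule": {"item_id", "pct"},
           "audit_log": {"ts"}}
SCOPE = ["price_list"]
TOKENS = [Token("calc.src", 10, 5, "PRICE_LIST"),
          Token("calc.src", 11, 9, "amount"),
          Token("calc.src", 12, 9, "item_id"),
          Token("calc.src", 20, 5, "DISCOUNT_RULE"),
          Token("calc.src", 21, 1, "total")]

ROWS = merge_tokens(TOKENS, CATALOG, SCOPE)
print(tables_outside_scope(ROWS, SCOPE))
```

Each resulting row is flat, so it can be written directly to a spreadsheet and filtered by table, column, ambiguity, or scope.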
  • 15. Why Clojure? Because we love parentheses ;) Clojure is a fast modern Lisp that runs on top of the JVM (and CLR/V8 too). Easy interoperability with Java libraries: easy access to ANTLR objects and attributes, and to XLSX libraries to read/write large files. Fast in-memory parallel data transformations: Clojure transducers and the core.async library provide easy & fast parallel in-memory transformations without requiring huge resources or infra. REPL-driven development: the REPL allows an instant-feedback workflow that can dramatically speed up exploring Java libraries and data structures.
  • 16. What it looks like. Excel Spreadsheet (BAs): each token hyperlinks to its context in the VS Code editor (DEVs/SMEs) and to the table's online docs (BAs/SMEs). Sources: Source Code, DB, SME. Dataset: ● parse unit path: source file path ● file path: original source file path (may be different in case of an include file) ● line: token line inside the source file ● column: token starting column inside the line ● source docs: link to VS Code to highlight the token inside the source file ● type: token semantics tag ● text: token actual text ● node id: token id (parse unit context) ● parent id: token AST parent ● level: token AST indentation ● procedure id: procedure uuid ● procedure name: procedure name ● table name: matching table ● column name: matching column ● table docs: link to the table's online docs ● ambiguous term: true/false ● in scope: true/false
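As a rough illustration of the two hyperlink columns (source docs and table docs), the Python sketch below builds spreadsheet cells for one token. It assumes POSIX-style absolute paths, VS Code's vscode://file/ URL scheme with a :line:column suffix, and Excel's HYPERLINK formula; the docs base URL and the sample file name are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URL for the tables' online documentation.
DOCS_BASE_URL = "https://docs.example.invalid/tables/"


def vscode_link(path, line, column):
    """URL that asks VS Code to open a file at a given line and column.

    Assumes a POSIX-style absolute path such as /src/pricing/calc.src.
    """
    return f"vscode://file/{quote(path.lstrip('/'))}:{line}:{column}"


def excel_hyperlink(url, label):
    """Excel HYPERLINK formula; double quotes inside strings are doubled."""
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}","{safe_label}")'


def dataset_row(path, line, column, text, table_name):
    """One spreadsheet row with the two hyperlink columns filled in."""
    return {
        "file path": path,
        "line": line,
        "column": column,
        "text": text,
        "table name": table_name,
        "source docs": excel_hyperlink(vscode_link(path, line, column), text),
        "table docs": excel_hyperlink(DOCS_BASE_URL + quote(table_name), table_name),
    }


row = dataset_row("/src/pricing/calc.src", 120, 7, "PRICE_LIST", "price_list")
print(row["source docs"])
print(row["table docs"])
```

Writing formulas rather than plain URLs keeps the spreadsheet readable: the cell shows the token or table name, and the click does the navigation.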
  • 17. Use Case: Modernization of a pricing engine. Are these really all the business rules? CaD outcomes ● Tables: 40% more tables in scope (some were edge cases, others seldom used) ● Fields: scope reduced to 36% of fields (most fields were used for other purposes) ● Business Rules: whenever there was a computation issue, we could go to the exact point in the source code to clarify assumptions and behaviors
  • 18. Example n. 2 - Unknown Unknowns: program-level support for legacy modernization
  • 19. Program-level risk mitigation. Use Case: legacy ERP modernization. We were asked to replace an existing monolithic on-prem ERP-like system made of several modules, which had been under development for the past 30 years. We went through a discovery and several workshops with DEVs, OPSs, DBAs, SMEs, and business stakeholders. We defined a target functional & tech architecture (KK), and identified modernisation patterns & RAIDs (KU) with tentative mitigations. We found out that the system was integrated with several business processes, exchanging data with many applications, and everybody was scared of breaking something… How much is it going to cost? Where should we start from?
  • 21. There should be a wire somewhere on your left… Or maybe it’s on the right… but don’t cut the other ones!
  • 22. We collected an amazing amount of information on stickies of different colors and shapes. What if we could translate stickies into data?
  • 23. Use Case: legacy ERP modernization. Proactively uncover unknown risks. Program flow: Discovery & Workshops; Consolidate Findings (BAs); Refine the program roadmap (PO); Start several Inceptions/deliveries (BAs); Learn from mistakes (BAs). CaD Pipeline & Tools: Convert Stickies to Data (DEVs); Parse Code & Merge with Stickies Data (DEVs); Explore & Compute KPIs (DEVs); Collect Projects Metrics (DEVs, BAs). Goals: ● Strategic: it should help define a long-term plan backed by data and KPIs that can evolve over time. ● Comprehensive: it should cover the entire application landscape, not just a single project. ● Flexible: it should quickly provide answers to basic questions, but also support further investigations.
  • 24. From stickies… to data (visualization). Diagram: Areas 1 to 4, each with Modules M1 to M3. 1. We split the monolith into logical Areas & Modules (both existing and new ones)
  • 25. From stickies… to data (visualization). Diagram: Areas 1 to 4, each with Modules M1 to M3, plus Tables. 1. We split the monolith into logical Areas & Modules (both existing and new ones) 2. We map Tables belonging to just a single Module (if any)
  • 26. From stickies… to data (visualization). Diagram: Areas 1 to 4, each with Modules M1 to M3, plus Tables and APIs. 1. We split the monolith into logical Areas & Modules (both existing and new ones) 2. We map Tables belonging to just a single Module (if any) 3. We map APIs belonging to just a single Module (if any)
  • 27. From stickies… to data (visualization). Diagram: Areas 1 to 4, each with Modules M1 to M3, plus Tables, APIs, and Source Code. 1. We split the monolith into logical Areas & Modules (both existing and new ones) 2. We map Tables belonging to just a single Module (if any) 3. We map APIs belonging to just a single Module (if any) 4. We parse source code to identify chains of calls (who calls whom) and access to tables (who reads/writes data where)
  • 28. From stickies… to data (visualization). Diagram: Areas 1 to 4, each with Modules M1 to M3, plus Tables, APIs, and Source Code. 1. We split the monolith into logical Areas & Modules (both existing and new ones) 2. We map Tables belonging to just a single Module (if any) 3. We map APIs belonging to just a single Module (if any) 4. We parse source code to identify chains of calls (who calls whom) and access to tables (who reads/writes data where) 5. We map APIs to source code entry points (e.g. functions)
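Steps 4 and 5 can be pictured as a small graph problem. The Python sketch below is an illustration under assumptions, not the Pharo-based tooling used in the program: the procedure, table, and API names are invented, and the call graph is given as plain dictionaries rather than extracted by a parser.

```python
from collections import deque

# Hypothetical call graph (who calls whom), as a parser might extract it.
CALLS = {
    "api_create_order": ["check_stock", "save_order"],
    "check_stock": ["read_stock"],
    "save_order": [],
    "read_stock": [],
    "nightly_sync": ["read_stock"],
}

# Hypothetical table access per procedure (who reads/writes data where).
TABLE_ACCESS = {
    "save_order": [("orders", "write")],
    "read_stock": [("stock", "read")],
}


def reachable(calls, entry_point):
    """All procedures reachable from an entry point (breadth-first search)."""
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def tables_touched(calls, table_access, entry_point):
    """Every (table, mode) pair an API entry point can end up touching."""
    return {
        access
        for procedure in reachable(calls, entry_point)
        for access in table_access.get(procedure, [])
    }


print(sorted(tables_touched(CALLS, TABLE_ACCESS, "api_create_order")))
```

Mapping each API to its entry point and following the calls gives, per API, the set of tables it reads or writes, which is the trail that connects a sticky to the source code.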
  • 29. We can now explore each area, module, table, procedure, or API and interactively follow all the trails that connect stickies to source code.
  • 30. CaD Pipeline Goals ● Strategic: by integrating workshop outcomes with technical catalogs, we can intersect target and current state (e.g. sizing the complexity of target features by slicing the current implementation). ● Comprehensive: the pipeline can be easily extended to include more languages, projects, or artefacts (e.g. configuration files, parsable documentation). ● Flexible: leveraging meta-models, we can explore source code in a guided way or build our own way through it, and identify risks and areas to deep-dive (e.g. shared dependencies, domain bleeding, domain complexity). Pipeline: Source Code, Discovery Stickies (SMEs), DB Catalog, and APIs Catalog → Parse → Tables & APIs Annotations → Build Annotated Graph → Explore Graph & Build KPIs in Pharo (DEVs/SMEs) → Merge KPIs & Metrics in Excel (BAs/SMEs) → Refine Roadmap.
  • 31. Why Pharo? Because we love object soup ;) Pharo is a fast modern Smalltalk focused on simplicity and immediate feedback. Complex data structures visualization tools: Pharo integrates the Roassal library to display complex interactive graph-oriented data structures. Dynamic graphical inspector: every object can be explored with a graphical inspector and may define custom views based on Roassal. Pause & resume support: at any moment, we can pause the exploration, save objects/views to disk, and restart later from where we left off.
  • 32. Lessons learned so far. Do we have answers? Not yet, but we have started to collect evidence, not just gut feelings. Look for what overlaps (e.g. shared libraries, tables accessed by several modules) to anticipate possible issues. Look for what matches to collect data about effort (e.g. LoC). Look for what doesn’t match (e.g. tables not accessed by code, code not invoked by other code or APIs) to uncover unseen risks.
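The "overlaps" and "doesn't match" checks above boil down to set comparisons once the data is merged. The Python sketch below shows one possible shape; the module, procedure, and table names are hypothetical, and the input dictionaries stand in for whatever the parsing step produces.

```python
from collections import defaultdict

# Hypothetical inputs for illustration only.
MODULE_OF = {"save_order": "sales", "read_stock": "warehouse",
             "bill_order": "billing", "old_report": "billing"}
ACCESS = {"save_order": {"orders"}, "bill_order": {"orders", "invoices"},
          "read_stock": {"stock"}}
CATALOG_TABLES = {"orders", "invoices", "stock", "legacy_fax"}
CALLS = {"save_order": {"read_stock"}, "bill_order": set(),
         "read_stock": set(), "old_report": set()}
ENTRY_POINTS = {"save_order", "bill_order"}  # e.g. procedures exposed as APIs


def shared_tables(access, module_of):
    """Overlaps: tables accessed by procedures from more than one module."""
    modules_by_table = defaultdict(set)
    for procedure, tables in access.items():
        for table in tables:
            modules_by_table[table].add(module_of[procedure])
    return {t: m for t, m in modules_by_table.items() if len(m) > 1}


def unused_tables(catalog_tables, access):
    """Mismatch: tables in the catalog that no parsed code touches."""
    touched = set().union(*access.values()) if access else set()
    return set(catalog_tables) - touched


def uninvoked_procedures(calls, entry_points):
    """Mismatch: procedures that nothing calls and no API exposes."""
    called = set().union(*calls.values()) if calls else set()
    return set(calls) - called - set(entry_points)


print(shared_tables(ACCESS, MODULE_OF))
print(unused_tables(CATALOG_TABLES, ACCESS))
print(uninvoked_procedures(CALLS, ENTRY_POINTS))
```

A table or procedure flagged here is only a clue to discuss with SMEs: it may be dead, or it may be used by something the parser did not cover.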
  • 33. “In contrast to visual programming and diagramming for software design, software visualization is not so much concerned with the construction, but with the analysis of programs and their development process.” S. Diehl, Software Visualization, Springer, 1998, ISBN 9783540465041
  • 34. “Challenges in data visualization does not actually involve visualizing Data. [...] The challenge is in crafting a visualization that is easily reusable, composable, and extensible.” A. Bergel, Agile Visualization, Lulu Press, 2016, ISBN 978136531409
  • 35. $ tail -f questions. Alessandro Confetti, Tech Principal
  • 37. WIP
  • 38. Hello, World (B): main( ) { extern a, b, c; putchar(a); putchar(b); putchar(c); putchar('!*n'); } a 'hell'; b 'o, w'; c 'orld'; Source: Kernighan, Brian W. (1972). A Tutorial Introduction to the Language B. Bell Laboratories (p. 4)
  • 39. What is a meta-model? When we need to explore and reason about complex systems, we need to find the right kind of representation (i.e. the right questions we need answers to): a pragmatic way to find the right balance between accuracy and outcome. If we oversimplify it, we may end up with a lot of underestimated or unmitigated risks. If we overcomplicate it, we may easily enter never-ending rabbit holes and struggle to deliver the overall picture. Diagram axes: Concrete vs Abstract, Complex vs Simplified. Layers: Meta-[...]-Model (e.g. alphabet, numbers, units, colors, cartographic projection): symbols and grammar to represent the structure and vocabulary of a valid meta-model; it describes the Meta-Model. Meta-Model (e.g. map legend and conventions): structure and vocabulary of a valid model; it describes the Model. Model (e.g. street map): simplified representation of the problem, driven by the questions we need answered; it represents the Subject/Problem. Subject/Problem (e.g. route between two cities): something we want to reason about. See for reference: J. Bezivin and O. Gerbe, Towards a precise definition of the OMG/MDA framework, Proceedings 16th AICASE (ASE 2001), 2001, pp. 273-280, doi: 10.1109/ASE.2001.989813.
  • 40. What is a meta-model? When we need to explore and reason about complex systems, we need to find the right kind of representation (i.e. the right questions we need answers to): a pragmatic way to find the right balance between accuracy and outcome. If we oversimplify it, we may end up with a lot of underestimated or unmitigated risks. If we overcomplicate it, we may easily enter never-ending rabbit holes and struggle to deliver the overall picture. FAME meta-meta-model: can be used to describe different kinds of common diagrams (e.g. E/R, UML), semantics for hierarchical structures (e.g. XML, JSON), or programming languages (e.g. procedural, functional, object-oriented). FAMIX meta-model: supports both procedural and object-oriented languages. Plugins are available for many languages (e.g. C/C++, C#, Clojure, Java, JavaScript, JSX/React, PHP). The MSE file format can be used to export/import models based on FAMIX.
  • 41. FAME meta-meta-model. Description: a family of meta-meta-models for describing and defining meta-models. ● All meta-models share a series of common features and basic enquiring capabilities. ● Can be used to describe different kinds of common diagrams (e.g. E/R, UML), semantics for hierarchical structures (e.g. XML, JSON), or programming languages (e.g. procedural, functional, object-oriented).
  • 42. FAMIX meta-model. Description: a family of meta-models for representing the structure of software projects. ● Supports both procedural and object-oriented languages. ● Plugins are available for many languages (e.g. C/C++, C#, Clojure, Java, JavaScript, JSX/React, PHP). ● The MSE file format can be used to export/import models based on FAMIX. ● All models share a series of common features and basic enquiring capabilities (e.g. dependency trees).
  • 43. We are not alone… There is a growing community of researchers and tools.