IF4IT
AUTOMATIC AND RAPID
GENERATION OF MASSIVE
KNOWLEDGE REPOSITORIES,
DIRECTLY FROM DATA
Author/Presenter: Frank Guerino
Chairman for The International Foundation for Information Technology (IF4IT)
Email: Frank.Guerino@if4it.com
LinkedIn: https://www.linkedin.com/in/frankguerino/
Follow Us on Twitter: @IF4IT
Co-Author: Dr. Joel Kline, PhD.
Board of Advisors, The International Foundation for Information Technology (IF4IT)
Professor, Lebanon Valley College, PA-USA
The Future is Automated Synthesis of Knowledge Repositories
Read More: https://www.if4it.com/knowledge-management-automated-content-generation-and-curation/
ACTOR / ACTIONS / RESULTS
✖ Meet Bob. Bob is very competent.
  Actions: Bob outperforms other people by generating one great knowledge article per hour.
  Results: Few knowledge repositories, limited content, poor curation, lots of dead links, and no semantic relationships.
✔ Meet Bob’s replacement: Automated Content Generation Software.
  Actions: Bob’s replacement generates millions of higher quality, highly curated, and semantically inter-linked knowledge articles, in the time it takes Bob to create just one… at a fraction of the cost.
  Results: More knowledge repositories, far more content, greater curation, almost no dead links, and semantic relationships.
The Wikipedia Problem
• The Wikipedia Community is NOT like an
Enterprise Work Community
- About 17 years to develop,
- Over 130M voluntary editors (i.e. free labor),
- Over 6M content articles
• People believe they can build internal
knowledge repositories (like libraries and intranets) using the same
manual content development paradigm as Wikipedia
• The end result is almost always the same… “Relatively empty and
low value Knowledge/Content Repositories”
People often can’t find the answers they need.
Read More: https://www.if4it.com/wikipedia-problem-understanding-enterprise-knowledge-repositories-fail/
The Problem is Manual Labor
Quantity: Low quantities of artifact delivery.
Quality: Higher levels of human-introduced errors.
Time: Longer artifact delivery times.
Money: High costs for delivery of artifacts.
Trend: Knowledge Repository Automation is very important because,
more often than not, the teams that build repositories have very limited
resources (people & finances).
Trend: With the move to “Digital”, expectations of Knowledge
Repositories are even higher.
The Solution = Automation via Compilation
• The process is called Synthesis (a.k.a. Compilation)
• Compilation is the word used by software developers
• Synthesis is the word used by non-software developers
• Specifically, we use and recommend Data Driven
Synthesis (DDS)
• We use Compiler-based DDS to generate content, curate
content, interlink content, and automatically build and
provision Knowledge/Content repositories
Read More: https://www.if4it.com/understanding-data-driven-synthesis/
Many Decades of Successful Synthesis
• Synthesis/Compilation of Software (Since 1970s)
• Synthesis of Integrated Circuit Schematics (Since 1992)
- Inputs are Hardware Description Languages (HDLs) like VHDL and Verilog.
- Outputs are used for Simulation, Acceleration, Emulation, and Fabrication
• Synthesis of APIs and software code (i.e. Scaffolding for Software
Developers, such as for Java Spring and Ruby on Rails)
• Synthesis of large volumes of test data to exercise complex systems
• Synthesis of chemical Compounds for Drug Discovery
• Synthesis of Health Care Pathways (Diagnosis + Treatments)
• Synthesis of (computer generated) Music and Art
• Synthesis of Electronic Documentation
(i.e. data driven content)
• Synthesis of Digital Libraries (massive web sites)
• Synthesis of Semantic Data Graphs (SDGs)
Who cares about DDS-based automation?
• Internet and Intranet Web Content Managers & Developers
• Technical Writers / Technical Communicators
• Architects (Enterprise/Solutions/Business/Applications/Data/etc.)
- Enterprise Models
• Software Developers (Using Compilation for about 5 Decades)
- API Documentation
- Software Configuration Documentation
• Engineers (Using Synthesis for about 3 Decades)
- Hardware, Network, Communications, & Semiconductor Documentation
• Anyone who documents topics, curates them, and publishes the results to
web pages in some Content/Knowledge Repository
Common Use Cases Driving DDS
• Strategic Planning – Enterprise Portfolio Impact Analysis
• Faster Domain Documentation – More inter-linked documentation,
with interactive data and fewer errors, at far lower costs
• Better Customer Support – Rapid and more accurate Incident Impact
Analysis
• Better Operational Work – Faster Knowledge Discovery = faster &
better work decisions
• Lower Development Costs – Synthesis helps eliminate significant
Software Development effort
• Better Search & Discovery – Synthesis helps yield better & more
accurate Search Results
Higher Levels of Customer / End-User Satisfaction
Synthesis is Compiler-based
Software Compilation/Synthesis: Source Code Files → Software Compiler/Synthesizer → Compiled Software
Data Compilation/Synthesis: Baseline Input Data + Processing Rules → Data Compiler/Synthesizer → Synthesized Output(s)
• Baseline Input Data: Flat files like *.csv sourced from spreadsheets and systems.
• Processing Rules: Control ontologies, formatting, view controls, report generation, semantic relationship harvesting, etc.
• Synthesized Output(s): Outputs are used for machines like computers AND for humans.
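The data compilation flow on this slide (baseline input data plus processing rules in, synthesized outputs out) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the NOUNZ compiler: the CSV columns, the RULES dictionary, and the compile_pages function are all hypothetical names invented for this example.

```python
import csv
import html
import io

# Hypothetical baseline input data: a flat CSV inventory exported from a spreadsheet.
APPLICATIONS_CSV = """name,owner,description
Billing,Finance,Generates customer invoices
Payroll,HR,Pays employees
"""

# Hypothetical processing rules: the noun type the rows describe, the column
# that names each page, and the columns to display (in order).
RULES = {
    "noun_type": "Application",
    "title_field": "name",
    "fields": ["owner", "description"],
}


def compile_pages(csv_text, rules):
    """Compile CSV rows into a dict of {file name: HTML page}."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row[rules["title_field"]]
        items = "".join(
            f"<li>{html.escape(field)}: {html.escape(row[field])}</li>"
            for field in rules["fields"]
        )
        file_name = f"{rules['noun_type'].lower()}-{title.lower()}.html"
        pages[file_name] = (
            f"<h1>{html.escape(rules['noun_type'])}: {html.escape(title)}</h1>"
            f"<ul>{items}</ul>"
        )
    return pages


pages = compile_pages(APPLICATIONS_CSV, RULES)
for file_name in sorted(pages):
    print(file_name)
```

Changing a row in the CSV or a field in the rules and re-running the function regenerates every affected page, which is the sense in which the approach resembles software compilation.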
Benefits of DDS
Agile: Changes can be made iteratively and in
seconds/minutes
• Simple CSV flat files can be compiled
• No long software development cycles
Scalable: Hundreds of Thousands or Millions of content
pages can be generated in minutes
Stable: Elimination of human errors, like dead links, leads
to far higher levels of quality.
Affordable: The cost per content page (including both
Quantity and Quality) is a small fraction of manually
generated content
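The "Stable" benefit (no dead links) follows from generating links rather than typing them by hand. A minimal sketch, assuming hypothetical page names and a hypothetical link_references helper: a link is emitted only when its target is in the set of pages being compiled, and anything else is reported instead of published.

```python
def link_references(pages, references):
    """Turn (source, target) references into HTML links to existing pages only.

    Returns the generated links and the references that could not be resolved.
    """
    links = []
    unresolved = []
    for source, target in references:
        if source in pages and target in pages:
            links.append((source, f'<a href="{target}.html">{target}</a>'))
        else:
            # Report the problem at compile time instead of publishing a dead link.
            unresolved.append((source, target))
    return links, unresolved


# Hypothetical compiled page set and the references found in the input data.
pages = {"billing", "payroll", "finance"}
references = [("billing", "finance"), ("payroll", "hr")]

links, unresolved = link_references(pages, references)
print(links)
print(unresolved)
```

Because the check runs on every compile, a renamed or deleted record shows up as an unresolved reference in the next run rather than as a broken link in the repository.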
The Synthesis Sequence of Events
1. Synthesizer Inputs (from spreadsheets and systems), each e.g. a .CSV file:
• Application Data
• Capability Data
• Human Resource Data
• Product Data
• Service Data
• Facility Data
• Organization Data
• Etc. Data
2. Processing Rules for
• Relationship Discovery
• Data Formatting
• View Generation
• Report Calculations
• Etc.
3. Data Synthesizer / Data Compiler
4. Intranet / Digital Library
• Node Views
• Data Graph/Network Relationships (e.g. CI (x), CI (y), CI (z))
• Business Intelligence: Inventories, Reports, Graphs & Charts, Glossaries, Dashboards, Visualizations, Abbreviations, Acronyms
• Data Indexes
• Catalogs
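The four steps of the sequence (inputs, rules, synthesizer, outputs) can be sketched as follows. This is an illustrative sketch only: the input tables, the RELATIONSHIP_RULES mapping, and the synthesize and node_views functions are hypothetical and are not taken from any IF4IT product. It shows relationship discovery producing (subject, predicate, object) triples, and node views that list both outbound and inbound relationships.

```python
import csv
import io
from collections import defaultdict

# Step 1 (hypothetical inputs): one CSV per noun type.
INPUTS = {
    "Application": "name,owned_by,runs_in\nBilling,Finance,Newark\nPayroll,HR,Newark\n",
    "Organization": "name\nFinance\nHR\n",
    "Facility": "name\nNewark\n",
}

# Step 2 (hypothetical rules): column name -> (predicate, target noun type).
RELATIONSHIP_RULES = {
    "owned_by": ("is owned by", "Organization"),
    "runs_in": ("runs in", "Facility"),
}


def synthesize(inputs, rules):
    """Step 3: load every table and discover relationships between records."""
    tables = {
        noun: list(csv.DictReader(io.StringIO(text))) for noun, text in inputs.items()
    }
    known = {(noun, row["name"]) for noun, rows in tables.items() for row in rows}
    triples = []
    for noun, rows in tables.items():
        for row in rows:
            for column, (predicate, target_noun) in rules.items():
                value = row.get(column)
                # Only relate records that actually exist in the inputs.
                if value and (target_noun, value) in known:
                    triples.append(((noun, row["name"]), predicate, (target_noun, value)))
    return known, triples


def node_views(known, triples):
    """Step 4: build one view per node with outbound and inbound relationships."""
    views = defaultdict(list)
    for subject, predicate, obj in triples:
        views[subject].append(f"{predicate} {obj[1]}")
        views[obj].append(f"{subject[1]} {predicate} this")
    return {node: sorted(views.get(node, [])) for node in known}


known, triples = synthesize(INPUTS, RELATIONSHIP_RULES)
views = node_views(known, triples)
for node in sorted(views):
    print(node, views[node])
```

The inbound entries on the Facility and Organization nodes are never entered by anyone; they fall out of the Application data, which is how a compiler can produce far more links than there are input cells.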
Real Business Impacts
✖ The Traditional Way = $$$$$$$$$$$$$$$$$$$ (Complexity)
(Too many complex, expensive, difficult to deliver & operate systems
and tools… just to get to a comprehensive view of your enterprise!)
• Intranets / Content Management Systems (Confluence, Jive, Drupal, MediaWiki, etc.)
• Architecture Modeling Tools (AMTs) (Troux, Mega, Adaptive, System Architect, etc.)
• Configuration Management Databases (CMDBs) (HP, BMC, ServiceNow, etc.)
• Stand-Alone Knowledge Management Systems (Madcap, KPS, Bitrix, SalesForce, ServiceNow, etc.)
• Library Management Systems (LMSs) (Koha, Soft Link, NGL, LibSys, Folet, etc.)
• Semantic Data Systems (Cambridge Semantics, Protégé, Swoop, LDIF, etc.)
• Expensive Integration
• Expensive Business Intelligence & Reporting
• Expensive People with Specific Skills
Many Years & Countless Resources
✔ DDS Results = $ (Simplicity)
(A very simple, very quick, and very affordable “Compiler Based Approach”)
1. Your Data + Your Rules
2. Data Synthesizer / Data Compiler (Your Compiler)
3. Your Data
4. Your Branded Digital Libraries (Complete with Catalogs, Indexes,
Relationships, Data Views, Reports, Dashboards, Visualizations, etc.)
Minutes/Hours & Small # of Resources
Compiler-based DDS helps generate
“Knowledge Structures”
1. Content – High quantities, richly formatted, highly
structured, and strongly inter-linked
2. Interactive Data Visualizations - for Interactive
Analytics, Data Science, and Visual Discovery
3. Knowledge Repositories – fully curated structures
like advanced Intranets and Digital Libraries
Read More: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/
1. Content: SFN over LFN
✖ Raw and unstructured human narrative in the form of “content”
(not “data”).
✔ Highly structured data, based on Name/Value pair paradigms
(e.g. CSV, JSON, etc.).
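To make the Name/Value pair idea concrete, the sketch below reads one CSV row and shows the same record as JSON. The column names and values are hypothetical; the point is only that each value stays attached to a name, which is what lets a compiler format, index, and interlink it.

```python
import csv
import io
import json

# Hypothetical structured record: every value is paired with a column name.
CSV_TEXT = "name,owner,status\nBilling,Finance,Active\n"

# csv.DictReader yields one name/value mapping per row.
records = [dict(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]

# The same name/value pairs, serialized as JSON.
as_json = json.dumps(records, indent=2)
print(as_json)
```

A free-text sentence carrying the same facts would need to be parsed or read by a person before the owner or status could be reused elsewhere.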
2. Interactive Data Visualizations
VisualComplexity.com D3js.org
• Data Science and Data Scientists are VERY expensive.
• DDS creates a common set of fully integrated Data Visualizations
• DDS automatically creates many more out-of-the-box and ready-
to-use Data Visualizations, faster and at far lower costs.
Interactive Data Visualization Examples…
• Geographic Maps
• Force Directed Graphs
• Bubbles
• Condegram Spirals
• Bars, Pies, Lines
• Sankey Flows
• Chords
• Multivariate Grids
See many interactive examples in the gallery at: http://www.d3js.org
3. Knowledge Repositories
Read More: https://www.if4it.com/nounz/
Generic Example: http://nounz.if4it.com
Domain-Specific Example: http://km.if4it.com
The Spectrum of Synthesizable Knowledge Structures
Range of Synthesizable Knowledge Structures (Simplest → Most Complex)
Simplest Knowledge Structures
• Bits and Bytes
• Built-In Types and Constants
• Lists, Arrays, and Hash Tables
• Stacks and Heaps
• For Loops, Do Loops, and While Loops
• Formulas and Algorithms
• Buffers, Streams and Files
• Classes and Objects
Simple Knowledge Structures
• Data Records/Nodes
• Tables & Inventories
• Charts (Pie, Bar, Area, Bubble, etc.)
• Graphs (Line, Multi-Line, etc.)
• Web Pages
• Catalogs
• Indexes
• Reports
• Semantic Relationships
• Semantic Predicates
Moderately Complex Knowledge Structures
• Dashboards
• Data Visualizations (many different visualizations)
• Semantic Data Graphs (SDGs) / Semantic Data Networks (SDNs)
• HTML Link Networks
• Navigation Taxonomies
• Classification Taxonomies
Complex Knowledge Structures
• General Web Sites
• Intranets
• Architecture Models
• Architecture Repositories
• Configuration Management Databases (CMDBs)
• Domain-specific Knowledge Repositories
Super Complex Knowledge Structures
• Multi-Context/Multi-Domain Digital Libraries that include all other
structures in the spectrum (all categories listed above)
• Industry Specific Determinations…
- Automatic Claim Processing
- New Viable Drugs
- Health Care Pathways
- High Frequency Auto-Investing
- Etc.
Example Formats = TXT, CSV, TSV, JSON, XML, HTML, SVG, PDF, Etc.
Read More: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/
DDS Solves the Wikipedia Problem for Enterprises...
Quantity: Much higher quantities of artifact delivery.
Quality: Much higher levels of quality.
Time: Much shorter times for artifact delivery (i.e.
much higher quantities with higher quality).
Money: Much lower costs to deliver artifacts
(especially for Data Science & Data Visualizations).
FASTER & BETTER
KNOWLEDGE DISCOVERY
AND DECISION MAKING
The Benefits of DDS
• More and Better Knowledge Repositories
- Far higher quantities of more advanced content
- More advanced features and capabilities
- Dynamic integration of data with content
- Higher quality of content (e.g. far fewer dead links)
- Far less investment of time and funds
• Higher stakeholder satisfaction and engagement
Getting Started with DDS
1. Acquire a Data Compiler/Synthesizer
• Contact IF4IT for a free NOUNZ Lite compiler: https://www.if4it.com/contact-us/
2. Start with simple Spreadsheet-based Inventories (and SharePoint List
Structure extracts)
3. Incrementally customize small data sets to meet your needs and your
desired look-and-feel
4. Slowly progress to more complicated Data Extracts (from proprietary
systems)
5. Keep in mind that Time-To-Learn is “incremental” [you don’t have to
start with big projects]
Crawl Walk Run
Questions and Discussion
Frank Guerino
CEO & Chairman
The International Foundation for
Information Technology (IF4IT)
Email: Frank.Guerino@if4it.com
Twitter: @IF4IT
Read More:
• Automated Content Generation & Curation: https://www.if4it.com/knowledge-management-automated-content-generation-and-curation/
• The Wikipedia Problem: https://www.if4it.com/wikipedia-problem-understanding-enterprise-knowledge-repositories-fail/
• Understanding Data Driven Synthesis: https://www.if4it.com/understanding-data-driven-synthesis/
• Understanding Knowledge Structures: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/
• Learn about D3 and Interactive Visualizations: http://www.d3js.org
• Learn about the IF4IT NOUNZ Data Compilation Platform: https://www.if4it.com/nounz/
• See Interactive Example of DDS-generated Generic Digital Library: http://nounz.if4it.com (Less than 3 minutes to generate.)
• See Interactive Example of DDS-generated KM Body of Knowledge: http://km.if4it.com (Only seconds to generate.)
APPENDIX
Real Case Studies
Global Biopharmaceutical
-- TOTAL Administration Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Assay Noun Instances = 749: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Biological Matrix Category Noun Instances = 42: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Biomarker Noun Instances = 42: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Company Noun Instances = 18: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Disease Mechanism Noun Instances = 17: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Facility Noun Instances = 3: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Immunoassay Platform Noun Instances = 6: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Instrument Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Instrument Noun Instances = 37: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Offering Noun Instances = 516: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Program Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Study Type Noun Instances = 17: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL White Paper Noun Instances = 28: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Application Noun Instances = 1000: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Business Domain Noun Instances = 9: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Capability Noun Instances = 32: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Computing Server Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Contract Noun Instances = 1166: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Country Noun Instances = 251: Time = Wednesday June 15, 2016 at 10:04:09
-- TOTAL Customer Noun Instances = 150: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Database Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Data Transport Technology Noun Instances = 4: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Environment Noun Instances = 8: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Frequently Asked Question Noun Instances = 32: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Information Category Noun Instances = 16: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Interface Noun Instances = 99: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Language Code Noun Instances = 504: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Letter Noun Instances = 26: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Location Noun Instances = 50: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Market Sector Noun Instances = 2: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Market Segment Noun Instances = 2: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL News Article Noun Instances = 6: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Number Noun Instances = 9: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Organization Noun Instances = 29: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Policy Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Process Noun Instances = 26: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Product Noun Instances = 25: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Project Noun Instances = 1000: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Resource Noun Instances = 14: Time = Wednesday June 15, 2016 at 10:04:10
-- TOTAL Sales Transaction Noun Instances = 886: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL SDLC Activity Noun Instances = 353: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL SDLC Phase Noun Instances = 14: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL Service Noun Instances = 561: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL Software Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL Glossary Term Noun Instances = 235: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL Vendor Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:11
-- TOTAL Undefined Noun Type Noun Instances = 1: Time = Wednesday June 15, 2016 at 10:04:11
TOTAL Number of Unique Noun Types = 48: Time = Wednesday June 15, 2016 at 10:04:11
TOTAL Noun Instances registered = 8500: Time = Wednesday June 15, 2016 at 10:04:11
TOTAL Number of Unique Abbreviations or Acronyms = 655: Time = Wednesday June 15, 2016 at 10:04:11
TOTAL Number of Unique Semantic Relationships = 30767: Time = Wednesday June 15, 2016 at 10:04:15
TOTAL Number of Unique Semantic Relationship Predicates = 97: Time = Wednesday June 15, 2016 at 10:04:15
TOTAL Minimum Number of HTML Links = 113536: Time = Wednesday June 15, 2016 at 10:07:27
Spreadsheets were used to easily and quickly
collect, organize, and supply data to the NOUNZ
Compiler in 1st Normal Form CSV format.
Vertical industry and business data was collected
from a public Biopharma web site, then organized and
cleansed in about 5 hours.
Generic IT Data was intentionally commingled with
Biopharma vertical industry and business data, in
order to show the effects of mixing different data
types.
TOTALS:
Total unique Noun Types (Data Types) = 48
Total Catalogs = 50
Total Noun Instances (across all Noun Types) = 8500
Total Semantic Relationships = 30767
Total Semantic Predicates = 97
Total Abbreviations and Acronyms = 655
Total “minimum” # of HTML links = 113536
Total Compile Time = 3 Minutes and 27 Seconds
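The per-type lines in the compiler log can be cross-checked against the reported totals. The sketch below is a hypothetical helper, not part of NOUNZ: the regular expression is an assumption based only on the log lines shown on this slide, and the sample uses three of those lines. Applied to the full log, the same sum should reproduce the reported 8500 Noun Instances across 48 Noun Types.

```python
import re

# Assumed line format, inferred from the log shown on this slide:
# "-- TOTAL <Noun Type> Noun Instances = <count>: Time = ..."
LINE = re.compile(r"^-- TOTAL (?P<noun>.+) Noun Instances = (?P<count>\d+):")


def noun_counts(log_text):
    """Return {noun type: instance count} for every per-type log line."""
    counts = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group("noun")] = int(match.group("count"))
    return counts


# Three per-type lines and one summary line copied from the log above.
SAMPLE_LOG = """\
-- TOTAL Assay Noun Instances = 749: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Facility Noun Instances = 3: Time = Wednesday June 15, 2016 at 10:04:08
-- TOTAL Undefined Noun Type Noun Instances = 1: Time = Wednesday June 15, 2016 at 10:04:11
TOTAL Noun Instances registered = 8500: Time = Wednesday June 15, 2016 at 10:04:11
"""

counts = noun_counts(SAMPLE_LOG)
print(len(counts), "noun types,", sum(counts.values()), "instances")
```

The summary line is skipped because it does not start with "-- ", so only per-type counts enter the sum.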
Regional Health Care Payer/Insurer
• 47 defined Noun Types (a.k.a. Data Types)
• Almost 49,000 Noun Instances (a.k.a. Data Instances or Records) that are sourced
from the different Noun Types
• Almost 294,000 automatically synthesized web pages with different views of data
and information
• Over 300K automatically discovered and harvested Semantic Relationships that
translate directly to over 1,100,000 contextual and meaningful HTML links
• 46 total Catalogs, including a Master Catalog, 47 Noun Domain Specific Catalogs
(one for each Noun Type), an Abbreviations/Acronyms Catalog, and a Relationship
Predicates Catalog
• 288 unique Indexing Categories with 2582 unique Data Indexes
• 869 harvested and curated Abbreviations and Acronyms
• Over 1,600 unique semantic relationship descriptors (i.e. Predicates)
• 47 Domain Specific Dashboards (one for each Noun Type).
Total Compiler Time = Approximately 15 minutes

More Related Content

PDF
Content Analytics
PDF
Re-examining the Jennex Olfman KM Success Model
PDF
Benchmarking IT Agility Final Report
PDF
Data science governance and GDPR
PDF
Data Prep - A Key Ingredient for Cloud-based Analytics
PDF
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
PDF
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
PDF
How to Get Started with Your MongoDB Pilot Project
Content Analytics
Re-examining the Jennex Olfman KM Success Model
Benchmarking IT Agility Final Report
Data science governance and GDPR
Data Prep - A Key Ingredient for Cloud-based Analytics
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
How to Get Started with Your MongoDB Pilot Project

What's hot (20)

PPTX
Introduction to Big Data Analytics
PDF
Focus on Your Analysis, Not Your SQL Code
PDF
Data science governance : what and how
PPT
Chapter12
PDF
Data-Ed: Data Architecture Requirements
PDF
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
PPTX
Key Elements for a Successful Service Analytics Program
PDF
Data-Ed Online: Emerging Trends in Data Jobs
PPTX
BlueBrain Nexus Technical Introduction
PDF
Groundbreaking and Game-changing Enterprise Search Webinar
PDF
The Key to Big Data Modeling: Collaboration
PPTX
A Year in Review - Building a Comprehensive Data Management Program
PDF
Strategic imperative the enterprise data model
PDF
DI&A Slides: Data Lake vs. Data Warehouse
PDF
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
PDF
ARMA Calgary Spring Seminar: The Nuts and Bolts of Metadata Tagging and Taxon...
PDF
Building the Modern Data Hub
PDF
DataOps - The Foundation for Your Agile Data Architecture
PDF
“Semantic Technologies for Smart Services”
PDF
Data Systems Integration & Business Value Pt. 1: Metadata
Introduction to Big Data Analytics
Focus on Your Analysis, Not Your SQL Code
Data science governance : what and how
Chapter12
Data-Ed: Data Architecture Requirements
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
Key Elements for a Successful Service Analytics Program
Data-Ed Online: Emerging Trends in Data Jobs
BlueBrain Nexus Technical Introduction
Groundbreaking and Game-changing Enterprise Search Webinar
The Key to Big Data Modeling: Collaboration
A Year in Review - Building a Comprehensive Data Management Program
Strategic imperative the enterprise data model
DI&A Slides: Data Lake vs. Data Warehouse
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ARMA Calgary Spring Seminar: The Nuts and Bolts of Metadata Tagging and Taxon...
Building the Modern Data Hub
DataOps - The Foundation for Your Agile Data Architecture
“Semantic Technologies for Smart Services”
Data Systems Integration & Business Value Pt. 1: Metadata
Ad

Similar to Automatic and rapid generation of massive knowledge repositories from data (20)

PPTX
The exciting new world of code & data
PDF
Getting Started with Unstructured Data
PPTX
Hybrid systems
PPTX
Lecture 01-1-IIS.pptx
PPTX
BEST DIGITAL MARKETING IN PASCHIM VIHAR
PPT
Big Data Ecosystem for Data-Driven Decision Making
PPTX
Where the data jobs are? A Data PDX talk
PDF
From Rocket Science to Data Science
PDF
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
PDF
AI 101 for km washington november 2018 km world workshop
PPTX
Workshop_Presentation.pptx
PDF
Smart Data for Smart Labs
PPTX
ET Ch - 2.pptx
PDF
Architecting for Data Science
PDF
How good are you working with intelligent machines?
PDF
ISDC_2015_Glenn Brouwer_Digital Transformation
PPTX
Intel 20180608 v2
PDF
IRJET- A Survey on Soft Computing Techniques and Applications
PDF
Data and analytic strategies for developing ethical it
PPTX
Job Openings in IT and Decision Sciences
The exciting new world of code & data
Getting Started with Unstructured Data
Hybrid systems
Lecture 01-1-IIS.pptx
BEST DIGITAL MARKETING IN PASCHIM VIHAR
Big Data Ecosystem for Data-Driven Decision Making
Where the data jobs are? A Data PDX talk
From Rocket Science to Data Science
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
AI 101 for km washington november 2018 km world workshop
Workshop_Presentation.pptx
Smart Data for Smart Labs
ET Ch - 2.pptx
Architecting for Data Science
How good are you working with intelligent machines?
ISDC_2015_Glenn Brouwer_Digital Transformation
Intel 20180608 v2
IRJET- A Survey on Soft Computing Techniques and Applications
Data and analytic strategies for developing ethical it
Job Openings in IT and Decision Sciences
Ad

More from SIKM (20)

PDF
Knowledge Retention Framework and Maturity Model
PDF
To ISO or not to ISO?
PPTX
Accelerating Knowledge at Scale
PDF
The crossroads of Information Architecture and Knowledge Management
PPTX
A system-thinking approach to a learning organization transformation
PDF
Resilience and KM
PPTX
Expert Knowledge Transfer - Reflections and Panel Discussion
PDF
The Value of Knowledge
PPTX
Communities of Practice - Challenges, Curiosity and Dragons
PDF
Data Curation - Data probity in a time of COVID
PPTX
AI and Big Data in KM
PPTX
Tips & Tricks for Your Lessons Learned Program
PDF
Integration of Knowledge and Innovation Standards
PPTX
Behavioral DNA of Collaborative Leadership
PPTX
More Than a Feeling: Emotions and Knowledge Management
PPTX
Applied Knowledge Services: A New Approach for Management and Leadership in t...
PPTX
Could a Rural Island Inspire KM Approaches?
PPTX
Tom Barfield - Navigating Knowledge to the User
PDF
The Impact of Data Analytics in Digital Transformation Programs
PDF
Alchemy of Data Elements - Top Down Meets Bottom Up
Knowledge Retention Framework and Maturity Model
To ISO or not to ISO?
Accelerating Knowledge at Scale
The crossroads of Information Architecture and Knowledge Management
A system-thinking approach to a learning organization transformation
Resilience and KM
Expert Knowledge Transfer - Reflections and Panel Discussion
The Value of Knowledge
Communities of Practice - Challenges, Curiosity and Dragons
Data Curation - Data probity in a time of COVID
AI and Big Data in KM
Tips & Tricks for Your Lessons Learned Program
Integration of Knowledge and Innovation Standards
Behavioral DNA of Collaborative Leadership
More Than a Feeling: Emotions and Knowledge Management
Applied Knowledge Services: A New Approach for Management and Leadership in t...
Could a Rural Island Inspire KM Approaches?
Tom Barfield - Navigating Knowledge to the User
The Impact of Data Analytics in Digital Transformation Programs
Alchemy of Data Elements - Top Down Meets Bottom Up

Recently uploaded (20)

PDF
Digital Marketing & E-commerce Certificate Glossary.pdf.................
PPTX
Probability Distribution, binomial distribution, poisson distribution
PDF
NISM Series V-A MFD Workbook v December 2024.khhhjtgvwevoypdnew one must use ...
PDF
Tata consultancy services case study shri Sharda college, basrur
PDF
Module 3 - Functions of the Supervisor - Part 1 - Student Resource (1).pdf
PDF
Keppel_Proposed Divestment of M1 Limited
PDF
Comments on Crystal Cloud and Energy Star.pdf
PDF
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
PDF
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
PDF
Family Law: The Role of Communication in Mediation (www.kiu.ac.ug)
DOCX
Business Management - unit 1 and 2
PDF
NEW - FEES STRUCTURES (01-july-2024).pdf
PPTX
3. HISTORICAL PERSPECTIVE UNIIT 3^..pptx
PPTX
Belch_12e_PPT_Ch18_Accessible_university.pptx
PPTX
2025 Product Deck V1.0.pptxCATALOGTCLCIA
PDF
Nidhal Samdaie CV - International Business Consultant
PDF
How to Get Business Funding for Small Business Fast
PPT
Lecture 3344;;,,(,(((((((((((((((((((((((
PPTX
Business Ethics - An introduction and its overview.pptx
PDF
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
Digital Marketing & E-commerce Certificate Glossary.pdf.................
Probability Distribution, binomial distribution, poisson distribution
NISM Series V-A MFD Workbook v December 2024.khhhjtgvwevoypdnew one must use ...
Tata consultancy services case study shri Sharda college, basrur
Module 3 - Functions of the Supervisor - Part 1 - Student Resource (1).pdf
Keppel_Proposed Divestment of M1 Limited
Comments on Crystal Cloud and Energy Star.pdf
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
Stem Cell Market Report | Trends, Growth & Forecast 2025-2034
Family Law: The Role of Communication in Mediation (www.kiu.ac.ug)
Business Management - unit 1 and 2
NEW - FEES STRUCTURES (01-july-2024).pdf
3. HISTORICAL PERSPECTIVE UNIIT 3^..pptx
Belch_12e_PPT_Ch18_Accessible_university.pptx
2025 Product Deck V1.0.pptxCATALOGTCLCIA
Nidhal Samdaie CV - International Business Consultant
How to Get Business Funding for Small Business Fast
Lecture 3344;;,,(,(((((((((((((((((((((((
Business Ethics - An introduction and its overview.pptx
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi

Automatic and rapid generation of massive knowledge repositories from data

  • 1. IF4IT AUTOMATIC AND RAPID GENERATION OF MASSIVE KNOWLEDGE REPOSITORIES, DIRECTLY FROM DATA Author/Presenter: Frank Guerino Chairman for The International Foundation for Information Technology (IF4IT) Email: Frank.Guerino @ if4it.com LinkedIn: https://www.linkedin.com/in/frankguerino/ Follow Us on Twitter: @IF4IT Co-Author: Dr. Joel Kline, PhD. Board of Advisors, The International Foundation for Information Technology (IF4IT) Professor, Lebanon Valley College, PA-USA 1
  • 2. IF4IT The Future isAutomated Synthesis of Knowledge Repositories Read More: https://www.if4it.com/knowledge-management-automated-content-generation-and-curation/ Meet Bob. Bob is very competent. Bob outperforms other people by generating one great knowledge article per hour. Automated Content Generation Software Meet Bob’s replacement. Bob’s replacement generates millions of higher quality, highly curated, and semantically inter-linked knowledge articles, in the time it takes Bob to create just one… at a fraction of the cost. 2 Few knowledge repositories, limited content, poor curation, lots of dead links, and no semantic relationships. More knowledge repositories, far more content, greater curation, almost no dead links, and semantic relationships. ✖ ✔ ACTOR ACTIONS RESULTS
  • 3. IF4IT The Wikipedia Problem • The Wikipedia Community is NOT like an Enterprise Work Community - About 17 years to develop, - Over 130M voluntary editors (i.e. free labor), - Over 6M content articles • People believe they can build internal knowledge repositories (like libraries and intranets) using the same manual content development paradigm as Wikipedia • The end result is almost always the same… “Relatively empty and low value Knowledge/Content Repositories” People often can’t find the answers they need. Read More: https://www.if4it.com/wikipedia-problem-understanding-enterprise-knowledge-repositories-fail/ 3
  • 4. IF4IT The Problem is Manual Labor Quantity: Low quantities of artifact delivery. Quality: Higher levels of human-introduced errors. Time: Longer artifact delivery times. Money: High costs for delivery of artifacts. Trend: Knowledge Repository Automation is very important because, more often than not, teams that build them have very limited resource (people & finances). Trend: With the move to “Digital” the expectation of Knowledge Repositories is even higher. 4
  • 5. IF4IT The Solution = Automation via Compilation • The process is called Synthesis (a.k.a. Compilation) • Compilation is the word used by software developers • Synthesis is the word used by non-software developers • Specifically, we use and recommend Data Driven Synthesis (DDS) • We use Compiler-based DDS to generate content, curate content, interlink content, and automatically build and provision Knowledge/Content repositories Read More: https://www.if4it.com/understanding-data-driven-synthesis/ 5
  • 6. IF4IT Many Decades of Successful Synthesis  Synthesis/Compilation of Software (Since 1970s)  Synthesis of Integrated Circuit Schematics (Since 1992) - Inputs are Hardware Descriptive Languages (HDLs) like VHDL and Verilog. - Outputs are used for Simulation, Acceleration, Emulation, and Fabrication  Synthesis of APIs and software code (i.e. Scaffolding for Software Developers, such as for Java Spring and Ruby on Rails)  Synthesis of large volumes of test data to exercise complex systems  Synthesis of chemical Compounds for Drug Discovery  Synthesis of Health Care Pathways (Diagnosis + Treatments)  Synthesis of (computer generated) Music and Art  Synthesis of Electronic Documentation (i.e. data driven content)  Synthesis of Digital Libraries (massive web sites)  Synthesis of Semantic Data Graphs (SDGs) 6
  • 7. IF4IT Who cares about DDS-based automation? • Internet and Intranet Web Content Managers & Developers • Technical Writers / Technical Communicators • Architects (Enterprise/Solutions/Business/Applications/Data/etc.) • Enterprise Models • Software Developers (Using Compilation for about 5 Decades) • API Documentation • Software Configuration Documentation • Engineers (Using Synthesis for about 3 Decades) • Hardware, Network, Communications, & Semiconductor Documentation • Anyone who documents topics, curates, and who publishes results to web pages in some Content/Knowledge Repository 7
  • 8. IF4IT Common Use Cases Driving DDS • Strategic Planning – Enterprise Portfolio Impact Analysis • Faster Domain Documentation, - More inter-linked documentation, with interactive data and with fewer errors, @ far lower costs • Better Customer Support – Rapid and more accurate Incident Impact Analysis • Better Operational Work - Faster Knowledge Discovery = faster & better work decisions • Lower Development Costs – Synthesis helps eliminate significant Software Development • Better Search & Discovery – Synthesis helps yield better & more accurate Search Results Higher Levels of Customer / End-User Satisfaction 8
  • 9. IF4IT Synthesis is Compiler-based Data Compilation/Synthesis: Baseline Input Data (flat files like *.csv sourced from spreadsheets and systems) + Processing Rules (control ontologies, formatting, view controls, report generation, semantic relationship harvesting, etc.) → Data Compiler/Synthesizer → Synthesized Output(s) (outputs are used for machines like computers AND for humans). Software Compilation/Synthesis: Source Code Files → Software Compiler/Synthesizer → Compiled Software. 9
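The compiler analogy on this slide can be illustrated with a minimal sketch. It is not the NOUNZ compiler or its rule format; the CSV columns, the `RULES` dictionary, and the `compile_pages` function are all hypothetical names chosen for illustration. It only shows the general shape: baseline input data plus processing rules go in, and synthesized pages come out.

```python
import csv
import html
import io

# Hypothetical baseline input: a flat CSV inventory, as exported from a spreadsheet.
APPLICATIONS_CSV = """name,owner,status
Billing,Finance,Active
Payroll,HR,Retired
"""

# Hypothetical processing rules: which column titles a page and which columns it shows.
RULES = {
    "noun_type": "Application",
    "title_column": "name",
    "show_columns": ["owner", "status"],
}


def compile_pages(csv_text, rules):
    """Compile each CSV row into one HTML page; returns {filename: html_text}."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row[rules["title_column"]]
        # Render the configured columns as name/value list items.
        items = "".join(
            f"<li>{html.escape(col)}: {html.escape(row[col])}</li>"
            for col in rules["show_columns"]
        )
        filename = f"{rules['noun_type'].lower()}-{title.lower()}.html"
        pages[filename] = (
            f"<h1>{html.escape(rules['noun_type'])}: {html.escape(title)}</h1>"
            f"<ul>{items}</ul>"
        )
    return pages


pages = compile_pages(APPLICATIONS_CSV, RULES)
```

Changing the look of every page then means editing `RULES` and recompiling, rather than editing pages by hand, which is the property the deck attributes to compiler-based synthesis.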
  • 10. IF4IT Benefits of DDS Agile: Changes can be made iteratively and in seconds/minutes • Simple CSV flat files can be compiled • No long software development cycles Scalable: Hundreds of Thousands or Millions of content pages can be generated in minutes Stable: Elimination of human errors, like dead links, leads to far higher levels of quality. Affordable: The cost per content page (including both Quantity and Quality) is a small fraction of manually generated content 10
  • 11. IF4IT The Synthesis Sequence of Events 1. Synthesizer Inputs (from spreadsheets and systems): Application Data, Capability Data, Human Resource Data, Product Data, Service Data, Facility Data, Organization Data, Etc. Data (each e.g. a .CSV File) 2. Processing Rules for • Relationship Discovery • Data Formatting • View Generation • Report Calculations • Etc. 3. Data Synthesizer/Data Compiler 4. Intranet/Digital Library: Node Views; Data Graph/Network Relationships (CI (x), CI (y), CI (z)); Business Intelligence • Inventories • Reports • Graphs & Charts • Glossaries • Dashboards • Visualizations • Abbreviations • Acronyms; Data Indexes; Catalogs 11
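The sequence on this slide (inputs, rules, compiler, outputs) can be sketched in a few lines. This is an assumption-laden illustration, not the deck's actual algorithm: the input files, the column names, and the matching rule (a cell value that equals the name of a known instance becomes a relationship, with the column name as its predicate) are all hypothetical.

```python
import csv
import io

# Hypothetical inputs: one CSV inventory per noun type.
INPUTS = {
    "Application": "name,runs_on,owned_by\nBilling,srv01,Finance\nPayroll,srv02,HR\n",
    "Server": "name,located_in\nsrv01,Dublin\nsrv02,Austin\n",
    "Organization": "name\nFinance\nHR\n",
}


def load_records(inputs):
    """Parse every CSV inventory into (noun_type, record) pairs."""
    records = []
    for noun_type, text in inputs.items():
        for row in csv.DictReader(io.StringIO(text)):
            records.append((noun_type, row))
    return records


def discover_relationships(records):
    """Emit (subject, predicate, object) when a cell names a known instance."""
    known = {row["name"]: noun_type for noun_type, row in records}
    triples = []
    for _noun_type, row in records:
        for column, value in row.items():
            # Only link to targets that exist, so no dead links are produced.
            if column != "name" and value in known:
                triples.append((row["name"], column, value))
    return triples


def build_catalog(records):
    """Group instance names by noun type, like a per-type catalog page."""
    catalog = {}
    for noun_type, row in records:
        catalog.setdefault(noun_type, []).append(row["name"])
    return catalog


records = load_records(INPUTS)
triples = discover_relationships(records)
catalog = build_catalog(records)
```

Because links are only emitted for names that resolve to a registered instance, a value such as "Dublin" (which has no inventory of its own here) produces no link at all rather than a broken one.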
  • 12. IF4IT Real Business Impacts The Traditional Way = $$$$$$$$$$$$$$$$$$$ (Too many complex, expensive, difficult to deliver & operate systems and tools… just to get to a comprehensive view of your enterprise!): Intranets / Content Management Systems (Confluence, Jive, Drupal, MediaWiki, etc.); Architecture Modeling Tools (AMTs) (Troux, Mega, Adaptive, System Architect, etc.); Configuration Management Databases (CMDBs) (HP, BMC, ServiceNow, etc.); Stand-Alone Knowledge Management Systems (Madcap, KPS, Bitrix, SalesForce, ServiceNow, etc.); Library Management Systems (LMSs) (Koha, Soft Link, NGL, LibSys, Folet, etc.); Semantic Data Systems (Cambridge Semantics, Protégé, Swoop, LDIF, etc.). Expensive Integration, Expensive Business Intelligence & Reporting, Expensive People with Specific Skills. ✖ Complexity: Many Years & Countless Resources. DDS Results = $ (A very simple, very quick, and very affordable “Compiler Based Approach”): 1. Your Data + Your Rules 2. Data Synthesizer/Data Compiler (Your Compiler) 3. Your Data 4. Your Branded Digital Libraries (Complete with Catalogs, Indexes, Relationships, Data Views, Reports, Dashboards, Visualizations, etc.). ✔ Simplicity: Minutes/Hours & Small # of Resources. 12
  • 13. IF4IT Compiler-based DDS helps generate “Knowledge Structures” 1. Content – High quantities, richly formatted, highly structured, and strongly inter-linked 2. Interactive Data Visualizations - for Interactive Analytics, Data Science, and Visual Discovery 3. Knowledge Repositories – fully curated structures like advanced Intranets and Digital Libraries Read More: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/ 13
  • 14. IF4IT 1. Content: SFN over LFN Raw and unstructured human narrative in the form of “content” (not “data”). Highly structured data, based on Name/Value pair paradigms (e.g. CSV, JSON, etc.). ✖ ✔ 14
  • 15. IF4IT 2. Interactive Data Visualizations VisualComplexity.com D3js.org • Data Science and Data Scientists are VERY expensive. • DDS creates a common set of fully integrated Data Visualizations • DDS automatically creates many more out-of-the-box and ready- to-use Data Visualizations, faster and at far lower costs. 15
  • 16. IF4IT Interactive Data Visualization Examples… Geographic Maps, Force Directed Graphs, Bubbles, Condegram Spirals, Bars, Pies, Lines, Sankey Flows, Chords, Multivariate Grids. See many interactive examples in the gallery at: http://www.d3js.org 16
  • 17. IF4IT 3. Knowledge Repositories Read More: https://www.if4it.com/nounz/ Generic Example: http://nounz.if4it.com Domain-Specific Example: http://km.if4it.com 17
  • 18. IF4IT The Spectrum of Synthesizable Knowledge Structures Range of Synthesizable Knowledge Structures, from Simplest to Most Complex. Simplest Knowledge Structures: • Bits and Bytes • Built-In Types and Constants • Lists, Arrays, and Hash Tables • Stacks and Heaps • For Loops, Do Loops, and While Loops • Formulas and Algorithms • Buffers, Streams and Files • Classes and Objects. Simple Knowledge Structures: • Data Records/Nodes • Tables & Inventories • Charts (Pie, Bar, Area, Bubble, etc.) • Graphs (Line, Multi-Line, etc.) • Web Pages • Catalogs • Indexes • Reports • Semantic Relationships • Semantic Predicates. Moderately Complex Knowledge Structures: • Dashboards • Data Visualizations (many different visualizations) • Semantic Data Graphs (SDGs) / Semantic Data Networks (SDNs) • HTML Link Networks • Navigation Taxonomies • Classification Taxonomies. Complex Knowledge Structures: • General Web Sites • Intranets • Architecture Models • Architecture Repositories • Configuration Management Databases (CMDBs) • Domain-specific Knowledge Repositories. Super Complex Knowledge Structures: • Multi-Context/Multi-Domain Digital Libraries that include all other structures in the spectrum (all columns to the left) • Industry Specific Determinations… - Automatic Claim Processing - New Viable Drugs - Healthcare Care Pathways - High Frequency Auto-Investing - Etc. Example Formats = TXT, CSV, TSV, JSON, XML, HTML, SVG, PDF, Etc. Read More: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/ 18
  • 19. IF4IT DDS Solves the Wikipedia Problem for Enterprises... Quantity: Much higher quantities of artifact delivery. Quality: Much higher levels of quality. Time: Much shorter times for artifact delivery (i.e. much higher quantities with higher quality). Money: Much lower costs to deliver artifacts (especially for Data Science & Data Visualizations). FASTER & BETTER KNOWLEDGE DISCOVERY AND DECISION MAKING 19
  • 20. IF4IT The Benefits of DDS • More and Better Knowledge Repositories - Far higher quantities of more advanced content - More advanced features and capabilities - Dynamic integration of data with content - Higher quality of content (e.g. far fewer dead links) - Far less investment of time and funds • Higher stakeholder satisfaction and engagement 20
  • 21. IF4IT Getting Started with DDS 1. Acquire a Data Compiler/Synthesizer • Contact IF4IT for a free NOUNZ Lite compiler https://www.if4it.com/contact-us/ 2. Start with simple Spreadsheet-based Inventories (and SharePoint List Structure extracts) 3. Incrementally customize small data sets to meet your needs and your desired look-and-feel 4. Slowly progress to more complicated Data Extracts (from proprietary systems) 5. Keep in mind that Time-To-Learn is “incremental” [you don’t have to start with big projects] Crawl, Walk, Run 21
  • 22. IF4IT Questions and Discussion Frank Guerino, CEO & Chairman, The International Foundation for Information Technology (IF4IT) Email: Frank.Guerino @ if4it.com Twitter: @IF4IT 22
  • 23. IF4IT Read More: • Automated Content Generation & Curation: https://www.if4it.com/knowledge-management-automated-content-generation-and-curation/ • The Wikipedia Problem: https://www.if4it.com/wikipedia-problem-understanding-enterprise-knowledge-repositories-fail/ • Understanding Data Driven Synthesis: https://www.if4it.com/understanding-data-driven-synthesis/ • Understanding Knowledge Structures: https://www.if4it.com/knowledge-management-understanding-knowledge-structures/ • Learn about D3 and Interactive Visualizations: http://www.d3js.org • Learn about the IF4IT NOUNZ Data Compilation Platform: https://www.if4it.com/nounz/ • See Interactive Example of DDS-generated Generic Digital Library: http://nounz.if4it.com (Less than 3 minutes to generate.) • See Interactive Example of DDS-generated KM Body of Knowledge: http://km.if4it.com (Only seconds to generate.) 23
  • 25. IF4IT Global Biopharmaceutical
    Compiler log:
    -- TOTAL Administration Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Assay Noun Instances = 749: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Biological Matrix Category Noun Instances = 42: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Biomarker Noun Instances = 42: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Company Noun Instances = 18: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Disease Mechanism Noun Instances = 17: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Facility Noun Instances = 3: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Immunoassay Platform Noun Instances = 6: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Instrument Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Instrument Noun Instances = 37: Time = Wednesday June 15, 2016 at 10:04:08
    -- TOTAL Offering Noun Instances = 516: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Program Category Noun Instances = 5: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Study Type Noun Instances = 17: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL White Paper Noun Instances = 28: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Application Noun Instances = 1000: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Business Domain Noun Instances = 9: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Capability Noun Instances = 32: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Computing Server Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Contract Noun Instances = 1166: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Country Noun Instances = 251: Time = Wednesday June 15, 2016 at 10:04:09
    -- TOTAL Customer Noun Instances = 150: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Database Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Data Transport Technology Noun Instances = 4: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Environment Noun Instances = 8: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Frequently Asked Question Noun Instances = 32: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Information Category Noun Instances = 16: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Interface Noun Instances = 99: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Language Code Noun Instances = 504: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Letter Noun Instances = 26: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Location Noun Instances = 50: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Market Sector Noun Instances = 2: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Market Segment Noun Instances = 2: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL News Article Noun Instances = 6: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Number Noun Instances = 9: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Organization Noun Instances = 29: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Policy Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Process Noun Instances = 26: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Product Noun Instances = 25: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Project Noun Instances = 1000: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Resource Noun Instances = 14: Time = Wednesday June 15, 2016 at 10:04:10
    -- TOTAL Sales Transaction Noun Instances = 886: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL SDLC Activity Noun Instances = 353: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL SDLC Phase Noun Instances = 14: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL Service Noun Instances = 561: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL Software Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL Glossary Term Noun Instances = 235: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL Vendor Noun Instances = 100: Time = Wednesday June 15, 2016 at 10:04:11
    -- TOTAL Undefined Noun Type Noun Instances = 1: Time = Wednesday June 15, 2016 at 10:04:11
    TOTAL Number of Unique Noun Types = 48: Time = Wednesday June 15, 2016 at 10:04:11
    TOTAL Noun Instances registered = 8500: Time = Wednesday June 15, 2016 at 10:04:11
    TOTAL Number of Unique Abbreviations or Acronyms = 655: Time = Wednesday June 15, 2016 at 10:04:11
    TOTAL Number of Unique Semantic Relationships = 30767: Time = Wednesday June 15, 2016 at 10:04:15
    TOTAL Number of Unique Semantic Relationship Predicates = 97: Time = Wednesday June 15, 2016 at 10:04:15
    TOTAL Minimum Number of HTML Links = 113536: Time = Wednesday June 15, 2016 at 10:07:27
    Notes: Spreadsheets were used to easily and quickly collect, organize, and supply data to the NOUNZ Compiler in 1st Normal Form CSV formats. Vertical industry and business data was collected from a public Biopharma web site, then organized and cleansed in about 5 hours. Generic IT Data was intentionally comingled with Biopharma vertical industry and business data, in order to show the effects of mixing different data types.
    TOTALS: Total unique Noun Types (Data Types) = 48; Total Catalogs = 50; Total Noun Instances (across all Noun Types) = 8500; Total Semantic Relationships = 30767; Total Semantic Predicates = 97; Total Abbreviations and Acronyms = 655; Total “minimum” # of HTML links = 113536; Total Compile Time = 3 Minutes and 27 Seconds 25
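As a rough worked example, the totals on this slide can be turned into throughput and density figures. The numbers below are copied from the slide; the variable names are illustrative only. Note that the first and last logged timestamps span slightly less than the stated compile time, presumably because some work happens before the first log line is written (an assumption, not something the slide states).

```python
from datetime import datetime

# Figures copied from the slide's compiler log and totals.
NOUN_INSTANCES = 8500
RELATIONSHIPS = 30767
HTML_LINKS = 113536
COMPILE_SECONDS = 3 * 60 + 27  # stated total: 3 minutes and 27 seconds

# Span between the first and last timestamps that appear in the log.
first_stamp = datetime.strptime("10:04:08", "%H:%M:%S")
last_stamp = datetime.strptime("10:07:27", "%H:%M:%S")
logged_seconds = (last_stamp - first_stamp).total_seconds()

# Derived rates, using the stated compile time as the denominator.
links_per_second = HTML_LINKS / COMPILE_SECONDS
relationships_per_instance = RELATIONSHIPS / NOUN_INSTANCES
links_per_instance = HTML_LINKS / NOUN_INSTANCES

print(f"stated compile time: {COMPILE_SECONDS} s, logged span: {logged_seconds:.0f} s")
print(f"HTML links per second: {links_per_second:.1f}")
print(f"relationships per noun instance: {relationships_per_instance:.2f}")
print(f"HTML links per noun instance: {links_per_instance:.2f}")
```

On these figures the compiler emits roughly 550 HTML links per second, and each noun instance carries between three and four semantic relationships on average.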
  • 26. IF4IT Regional Health Care Payer/Insurer • 47 defined Noun Types (a.k.a. Data Types) • Almost 49,000 Noun Instances (a.k.a. Data Instances or Records) that are sourced from the different Noun Types • Almost 294,000 automatically synthesized web pages with different views of data and information • Over 300K automatically discovered and harvested Semantic Relationships that translate directly to over 1,100,000 contextual and meaningful HTML links • 46 total Catalogs, including a Master Catalog, 47 Noun Domain Specific Catalogs (one for each Noun Type), an Abbreviations/Acronyms Catalog, and a Relationship Predicates Catalog • 288 unique Indexing Categories with 2582 unique Data Indexes • 869 harvested and curated Abbreviations and Acronyms • Over 1,600 unique semantic relationship descriptors (i.e. Predicates) • 47 Domain Specific Dashboards (one for each Noun Type) Total Compiler Time = Approximately 15 minutes 26