Talend Data Integration and Management
Data Integration



   Data Integration involves combining data
 residing in different sources and providing the
        user with a unified view of the data


Data Management combines different disciplines
    to manage data as a valuable resource




                                         www.robertomarchetto.com
Talend


●   Talend is a company focused on Data
    Integration and Data Management solutions
●   Gartner named Talend a "Cool Vendor" (2010)
●   Present in more than 12 locations around the
    world
●   Fast-growing company




Talend Open Studio




Talend Open Studio

●   Open Source, professional tool
●   Draw procedures by linking components; each
    component performs one operation
●   DB vendor-specific optimized components
●   Produces fully editable Java (or Perl) code
●   Deploy as small, fast compiled Java
    or as a Web Service
●   Eclipse-based IDE, excellent flexibility
●   BI platform independent, DB vendor independent
Automatic code generation, different
           deployment options




Extraction, Transformation, Loading (ETL)


●   ETL is a common process in Data Integration
    ●   Extract: read data from different data sources
        (databases, flat files, spreadsheets, web
        services, etc.)
    ●   Transform: convert the data into a form that can
        be placed in another container (database, web
        services, files, etc.). Cleaning, computations and
        verifications are also performed in this step
    ●   Load: write the data in the target format
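
The three ETL steps can be sketched in a few lines of plain Python. This is only an illustration of the process, not the code Talend generates: the sample CSV, the `sales` table and all function names are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, a database or a web service.
SOURCE_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3,Carol,7.25
"""

def extract(text):
    """Extract: read rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean names, convert types, drop rows that fail verification."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # verification: skip rows with a missing amount
        cleaned.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write the rows into the target table (hypothetical 'sales' table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three steps in order and return what ended up in the target."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid rows into an in-memory database; the row with the missing amount is rejected during the transform step.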



Tutorial, Source data




Tutorial, Destination data (Data warehouse)




Tutorial, Metadata


●   Talend requires a preliminary definition of the
    metadata
●   As in programming languages, a strong metadata
    definition often means fast, robust and
    maintainable applications
●   ..demo..




Tutorial, Talend jobs basics



●   Place components on the designer
●   Link components to build a transformation
●   Main type of link: row flow
●   Schema metadata is propagated and must be
    consistent
●   ..demo..
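
What "schema metadata must be consistent" means can be shown with a small check between the output schema of one component and the input schema of the next. This is a rough sketch, not Talend's API: the schema representation (lists of name and type pairs) and the function `schema_mismatches` are hypothetical.

```python
# Hypothetical schemas: ordered lists of (column name, type name) pairs.
def schema_mismatches(output_schema, input_schema):
    """Return readable differences between the schemas of two linked components."""
    problems = []
    out_cols = dict(output_schema)
    in_cols = dict(input_schema)
    # Columns produced upstream must exist downstream with the same type.
    for name, col_type in out_cols.items():
        if name not in in_cols:
            problems.append(f"column '{name}' is missing downstream")
        elif in_cols[name] != col_type:
            problems.append(
                f"column '{name}': {col_type} upstream, {in_cols[name]} downstream"
            )
    # Columns expected downstream must be produced upstream.
    for name in in_cols:
        if name not in out_cols:
            problems.append(f"column '{name}' is not produced upstream")
    return problems
```

An empty list means the two schemas agree; every entry in a non-empty list points at a column that would break the row flow between the two components.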



Tutorial, users_dimension




Test the job




Tutorial, accounts_dimension




Tutorial, dates_dimension




Tutorial, write a Java library




Tutorial, opportunities_fact




Tutorial, define a root job




Deploy and run




Extensibility, community plugins


                ●   Many official
                    components
                ●   Components for
                    many tasks released
                    by the community
                ●   Geospatial
                    components, log
                    analysis, Google
                    analytics, data
                    encryption, etc

Scheduler




And now... reports, dashboards, OLAP,
        geoanalysis, KPIs...




Do you trust your data?




What about data quality?

●   Customer A appears 5 times under different
    names
●   Null values can skew statistics such as
    the mean
●   Duplicate records
●   Blank values
●   Some records can contain errors (e.g. -1 field
    values)
●   Some records can be garbage
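
The problems listed above can be counted with a very small profiling routine. The sketch below is not how Talend Open Profiler works internally; the sample records, the field names and both functions are assumptions chosen to mirror the bullets (`None` stands for a NULL, `""` for a blank, `-1` for an error value).

```python
from collections import Counter
from statistics import mean

# Hypothetical records: None models a NULL, "" a blank, -1 an error value.
RECORDS = [
    {"customer": "ACME Corp", "amount": 100},
    {"customer": "Acme corp.", "amount": 100},
    {"customer": "ACME Corp", "amount": 100},
    {"customer": "", "amount": None},
    {"customer": "Beta Ltd", "amount": -1},
]

def profile(records):
    """Count the simple quality problems listed above."""
    seen = Counter((r["customer"], r["amount"]) for r in records)
    amounts = [r["amount"] for r in records]
    return {
        "rows": len(records),
        "duplicates": sum(n - 1 for n in seen.values()),
        "null_amounts": sum(a is None for a in amounts),
        "blank_customers": sum(r["customer"] == "" for r in records),
        "negative_amounts": sum(a is not None and a < 0 for a in amounts),
    }

def mean_amount(records, null_as_zero):
    """Show how the treatment of NULLs changes the mean."""
    if null_as_zero:
        values = [r["amount"] or 0 for r in records]
    else:
        values = [r["amount"] for r in records if r["amount"] is not None]
    return mean(values)
```

Note that exact-match duplicate detection does not catch "ACME Corp" versus "Acme corp."; recognising the same customer under different names needs fuzzy matching, which is outside this sketch. Comparing `mean_amount(RECORDS, True)` with `mean_amount(RECORDS, False)` shows how much one NULL shifts the result.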

Talend Open Profiler




What about data storage size?


●   Some fields can be oversized for the data they
    contain
●   Some fields are related and can be
    calculated from others
●   Some keys or values are never used
●   As data grows, garbage grows too
●   Data storage is not free (disks, electricity,
    backups, DB licenses)
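
Oversized fields can be spotted by comparing the declared width of each column with the longest value it actually holds. The following is a minimal sketch under assumed inputs: the declared widths, the sample rows and the 50% threshold are all hypothetical.

```python
# Hypothetical declared column widths (e.g. VARCHAR sizes) and sample rows.
DECLARED_WIDTHS = {"code": 50, "city": 255, "notes": 20}
ROWS = [
    {"code": "A1", "city": "Rome", "notes": "call back on Monday"},
    {"code": "B22", "city": "Bolzano", "notes": ""},
]

def oversized_columns(rows, declared, max_fill_ratio=0.5):
    """Return columns whose longest value uses at most max_fill_ratio of the declared width."""
    report = {}
    for column, width in declared.items():
        longest = max((len(row[column]) for row in rows), default=0)
        if longest <= width * max_fill_ratio:
            report[column] = (longest, width)
    return report
```

Each entry in the result maps a column to its longest observed value and its declared width. Whether shrinking such a column actually saves space depends on the database engine, so the report is a starting point for review rather than a list of changes to apply.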

Data is "the black gold" that can produce
                knowledge


●   Data is a resource from which you can extract knowledge
●   A lot of data can be distilled into concise information
●   Data storage is not free, and a lot of data can
    slow systems down
●   Data cleansing is a central process in statistical
    analysis and Data Mining




Talend Master Data Management





