Turn Data into Actionable Insights!!
About me:
Vishnu Alavur Kannan
Analytics Technical Platforms Lead
• 15+ years in IT, software engineer @heart
• Led engineering teams throughout my career
• Platform is just vaporware without passionate people
• A players make all the difference in software engineering
✓ 50:1, 100:1, rarely seen in any other profession
@Monsanto for two reasons:
• I strongly believe in our commitment to sustainable agriculture
• I am able to do top-flight Engineering R&D
✓ Complex engineering challenges keep me going
✓ Freedom to operate:
o Use the right tool for the right job
o Solve problems using cutting-edge technologies
o Open-source friendly, open environment to contribute back
Monsanto: A Sustainable Agriculture Company
• Bringing a broad range of solutions to help nourish our growing world
• Collaborating to help tackle some of the world’s biggest challenges
• >20,000 employees in 66 countries
• >50% of employees based outside of the United States
• One of the 25 World’s Best Multinational Workplaces by Great Place to Work Institute
Our systems approach integrates technology platforms to
maximize farmer effectiveness
Crop Protection
• Weed Control (Roundup® Branded Agricultural Herbicides)
• Insect Control
• Disease Control
Breeding
• Stress Tolerance
• Disease Control
• Yield
• Vegetables, corn, cotton, soybeans, wheat, canola
Biologicals
• Weed Control
• Insect Control
• Virus Control
• Plant Health
Biotechnology
• Weed Control
• Insect Control
• Stress Tolerance
• Yield / Yield Protection
• Nutrients
Data Science
• Planting Script Creator
• Increased production
• Efficient land and water use
• Efficient nutrient use
https://www.youtube.com/watch?v=l5Tw0PGcyN0
Why do you do what you do?!
What’s the purpose?
How do you do what you do?
What the hell do you do?
THE GOLDEN CIRCLE
Simon Sinek
Identify the signals from the noise @SCALE
Volume
DATA AT SCALE
Variety
VARIOUS FORMS OF DATA
Velocity
STREAMING IOT
Veracity
DATA UNCERTAINTY
Digital media: 280 exabytes
FB: 300+ petabytes per day
* Information from multiple sources is adapted and incorporated
Points, polygons, rasters, vectors
Connecting data across sources is where analysts spend most of their time
Sensors are the de facto way to gather data and detect anomalies across domains
Monsanto re-inventing Agriculture through Analytics
Other providers:
Cost
Quality
Agility
• No hardware administration, less software administration
• Eleven 9’s of data durability
• Harness state-of-the-art software services
• DevOps moving towards NoOps
• Provision infrastructure in seconds: infrastructure as code, automation
• Grow or shrink compute to match seasonal workloads and pay smartly as we go
Scale: MON has ~10^16+ bytes of data, growing rapidly
• Global presence: taking data-driven products & services closer to the business
• Ability to accelerate feature development, integrating analytics rapidly into our workflows @scale
• Ingest, store & retrieve massive data sets, using the right data store to our competitive advantage (NoSQL/SQL)
• Service diversity, organizational maturity
IOT, Imagery, Geo-spatial, Genomics, Molecular Breeding…..
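The "infrastructure as code" bullet above can be illustrated with a minimal Python sketch. It assumes a declarative spec that is diffed against the running state to produce a plan of actions. The `plan` function and the resource names are hypothetical; they are not taken from the talk or from any particular provisioning tool.

```python
def plan(desired: dict, current: dict) -> list:
    """Diff a declared spec against what is running and list the actions to reconcile."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    # Anything running that is no longer declared gets removed.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name, current[name]))
    return actions


# Hypothetical resource names and sizes, for illustration only.
desired = {"spark-cluster": {"nodes": 20}, "api-gateway": {"nodes": 2}}
current = {"spark-cluster": {"nodes": 5}, "old-batch-box": {"nodes": 1}}

for action in plan(desired, current):
    print(action)
```

Because the spec is plain data, it can be versioned and reviewed like any other code, and growing or shrinking compute for a seasonal workload becomes a one-line change to the declared node count.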
Vision
A year ago as we started…
Enable Analytics @SCALE for the Enterprise
Integrated, Extended, Enhanced, Scalable, Reliable, Open
[Diagram: Field, Devices, Apps, Data and Models across Business Unit 1, Business Unit 2, Business Unit 3 and Digital Business]
Integrate Analytics with Product Platforms
[Diagram: Data, Data Science@scale, Analytical Models and APIs, under the banner "Turn Data Into Actionable Insights"]
Predictive Product Placement @scale
Location Data Assets, Geo-spatial Catalog
[Diagram: PFO, topography, site boundary, zones, experiment metadata, planter A/B line, automap, elevation, soil, weather]
Analytics as a Service
In Collaboration with IT & Business
Scale across teams internalizing a self-service model
Internalize the needs to stay ahead of the curve
Addressing analytics needs based on persona
• Descriptive: What happened?
• Diagnostic: Why did it happen?
• Predictive: What will happen?
• Prescriptive: What should I do?
• Cognitive: What can be learnt?
Hindsight, Insight, Foresight, Outsight
Science@Scale
[Diagram, audience sizes: 10’s K of users, 1’s K, 100’s, 10’s]
[Diagram, personas: Information Consumers, Information Pro-Consumers, Power Users, Business Users, Business Analysts, Statisticians, Data Scientists, Computational Biologists]
[Diagram, capabilities: Business Intelligence, Reports, Dashboards, Drill Down, Data Discovery, Ad-hoc Analysis, Statistical Analysis, Exploratory, Inferential, Causal, Machine Learning, Neural Networks, Natural Language Processing, Machine, Systems]
Discovery Analytics – Development Environments
Non-prime
Exploratory
Prime
R & D
Development Environments @SCALE
• Big-data Infra. & DevOps
• Data Provisioning @scale
• Model Deployments @scale
• Big-data workloads
• Computational pipelines
• Transformation pipelines
• Training pipelines
• Sizing & Auto-scaling
• Cloud Best practices
• 24/7 availability
• Monitoring
• Alerting
• ELK stack
• ….
Analytical models @SCALE
• Co-engineering
• Involve us sooner
• Thinking scale ahead, accelerating Time to Market
• Model development & refactoring
• R, ASReml, Python, OPL…
• Java, Scala, Clojure…
• Infrastructure as code
• AWS, GCP, AzureML
• Docker, Kubernetes
• Distributed computing
• Architecture
• Solutions Design
• Development
• API integrations
• KAFKA integrations
• OAUTH2 Integrations
• Security/ISO collaborations
Build it once, deploy frameworks as needed for user groups: Bundled in a centralized eco-system
Non-prod to Prod
BLUE / GREEN
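A minimal Python sketch of the blue/green promotion named above, assuming two identical environments and a caller-supplied health check. `BlueGreenRouter` and the version labels are hypothetical and only illustrate the pattern; the talk does not describe how the switch is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Routes traffic to one of two identical environments (hypothetical model)."""
    versions: dict = field(default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check) -> bool:
        """Install `version` on the idle side; switch traffic only if it is healthy."""
        target = self.idle
        previous = self.versions[target]
        self.versions[target] = version
        if not health_check(version):
            self.versions[target] = previous  # keep the idle side as a valid fallback
            return False
        self.live = target  # cut over in a single step
        return True

    def rollback(self) -> None:
        """Point traffic back at the previous environment."""
        self.live = self.idle


router = BlueGreenRouter()
router.versions["blue"] = "model-v1"
router.deploy("model-v2", health_check=lambda v: True)
print(router.live, router.versions[router.live])
```

The live side is never modified during a deployment, so a failed health check leaves production untouched and a rollback is just a pointer switch.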
Discovery Analytics Development Environments
Data Scientists, Developers and Novice Users
From Discovery to Production
Culture, approach and adoption
Know Your Users
For Community, By Community
Tailor by Needs
Balance Freedom with Governance
Drive User Adoption
Environments iteratively served to everyone @monsanto
Enable analytical capabilities @scale for the enterprise, integrated with Product Platforms
As of today, # of unique data scientists across groups utilizing our discovery analytics environments
Model maturity, Global Scalability
Core teams: train the trainee to share knowledge and best practices utilizing the environment
Business Capabilities
Make the platform robust, sharing a few use cases
Environmental Classification @scale
Engineered using Discovery Analytics - Development Environment
[Diagram: Data Provisioning APIs, Data Transformation, QA/QC Rules (Scala, Python, Scikit), API]
Molecular Breeding: Training Pipeline @Scale
Engineered using Discovery Analytics - Development Environment
• Collaboration with Data Science teams: co-engineering an R-based machine learning model into a Scala-based model training pipeline for scalability
• EMR (Amazon Hadoop) & Dataproc (GCP) using the Apache Spark computation engine @scale
• Iterative ON-DEMAND framework, auto-scaling up to N nodes
• Training pipeline integration with APIs & co-engineering continuum
[Diagram: Data, Data Learner, Model 1]
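The on-demand auto-scaling idea for this training pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's actual policy: `nodes_needed`, the per-node capacity and the cap of 50 nodes are all hypothetical.

```python
import math


def nodes_needed(pending_tasks: int, tasks_per_node: int,
                 max_nodes: int, min_nodes: int = 1) -> int:
    """Return a cluster size for the current backlog, clamped to [min_nodes, max_nodes]."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    wanted = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical backlog sizes over successive training iterations.
for backlog in (0, 40, 400, 4000):
    print(backlog, nodes_needed(backlog, tasks_per_node=16, max_nodes=50))
```

The upper bound plays the role of N in the slide: the cluster grows with the backlog but never past the configured ceiling, and it shrinks back to the minimum when the work drains.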
Cognitive Analytics Pipeline
• Collaboration with the Cognitive Analytics Data Science team to build:
• An integrated Predictive Product Pipeline from inception to commercialization
Built on:
• Apache Airflow (incubating): DAG-based model chaining & workflow management platform
• Models written in Python, R
• Parallelism achieved via Celery workers
• Being customized now to utilize Spark
• Apache Parquet: columnar storage format on a file system; extremely parallelizable
• Facebook Presto query engine to query Parquet files via SQL through REST APIs; highly performant
• Cloud Analytics platform integration
• Co-engineering solutions @scale, mining millions of data points to derive actionable insights
[Diagram: Workflow, DAGs, Libraries]
Engineered using Discovery Analytics - Development Environment
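DAG-based model chaining can be sketched with Python's standard library alone. The example below is not the Airflow API; it uses `graphlib.TopologicalSorter` as a stand-in to show how a dependency graph yields "waves" of steps that could be handed to parallel workers. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical model chain: each key runs after the steps in its set.
pipeline = {
    "extract": set(),
    "model_a": {"extract"},
    "model_b": {"extract"},
    "ensemble": {"model_a", "model_b"},
}


def run_in_waves(dag, run_step):
    """Execute a DAG wave by wave; steps within a wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # candidates for parallel workers
        for step in ready:
            run_step(step)
            sorter.done(step)
        waves.append(ready)
    return waves


waves = run_in_waves(pipeline, run_step=lambda name: None)
print(waves)
```

Here `model_a` and `model_b` land in the same wave because both depend only on `extract`; in a scheduler such as Airflow with Celery workers, steps like these are the ones that can run concurrently.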
Deep learning @SCALE
Discovery Analytics Development Environments integrated with CloudML on GCP
[Diagram: training pipeline: Collect, Store, Train, Predict, Evaluate, Retrain]
• First-ever Deep Learning platform for the Enterprise
• Perform Deep Learning @scale on CloudML using TensorFlow via Jupyter from the Prime environment
• Integrated with data, inputs, outputs and metadata, including TensorBoard to monitor your model training runs
Discovery Analytics - Workflow
Production Deployment - Workflow
DATA INGESTION AND TRANSFORMATION VIA APIs AND STREAMS
Streaming
Business Intelligence
RUN ANALYTICS@SCALE IN THE CLOUD
Collaborative Data Science - DISCOVERY ANALYTICS
DATA DRIVEN PRODUCTS
KAFKA Streams, Data Warehouse, Big-data
Model outputs via APIs & Streams
In-house/Third Party: Platforms
AWS, GCP, Cloudera, DataStax, IBM, Azure, Domino labs…
Prescriptive, Predictive, Cognitive, Historical
Models - Deep Learning, Computational Pipelines, Classification & Simulation Engines
Turn Data into Actionable Insights
Our Journey of Transformation
We have just scratched the surface:
• Science@scale: our Cloud Analytics Platform is only a year old
• Talent, Behavior and Platform as our 3 key pillars of focus
• Talent:
• Building a big-data and cloud analytics engineering team from the ground up: 150+ interviews, a 15-person team now
• Targeting A players, nurturing the team on new technologies, building leaders
• Behavior/Cultural Mind shift: Data Science & IT Engineering operating as ONE TEAM
• Two extremes of the spectrum
• Finding the sweet spot in the middle has been the cultural shift
• Data science teams have been very supportive, adapting to change
• Bringing in IT best practices: Agile methodologies, versioning, CI….
• Train the trainee approach to enable adoption across the enterprise
• Leverage the best of both worlds by co-engineering solutions
• Collaboration is our new competitive advantage
• Platform: We are at ground zero now, continuing to deliver Minimum Viable Products each sprint
• Continue to mature & stay cutting edge on technologies
• Build vs. Buy [Cost, Time, Quality]
• Miles to go before we sleep
https://www.youtube.com/watch?v=l5Tw0PGcyN0
Why do you do what you do?!
What’s the purpose?
How do you do what you do?
What the hell do you do?
THE GOLDEN CIRCLE
Simon Sinek
• Help identify the signals from the noise @scale
An Enterprise Cloud Analytics platform to serve:
• Analytics as a service enabling Discovery Analytics environments for the data science community
• Predictive, prescriptive, streaming, cognitive, IOT edge analytical capabilities @scale
• Big Data Cloud Analytics Engineering
• Internalize data science needs, thinking scale ahead
Thank You
Visit us at engineering.monsanto.com
We are looking for passionate big data cloud analytics engineers to join our team.
https://www.linkedin.com/in/vishnukannan

Turn Data Into Actionable Insights - StampedeCon 2016
