Open Data Learn-up
@_opendatahack
About opendatahack.org
Open Data Hack is a collaborative effort to solve the day-to-day difficulties
faced by local communities, civic bodies and non-profit institutions.
Technologists, designers, innovators and government bodies with strong
social insight come together for a day to build technology-based
solutions using freely accessible open data.
Our current projects in India
● Real-time environment vitals monitoring system with suggestions
● Health factors heat map of urban localities in India
● Mapping of all quality abortion clinics in India
Introduction to Open Data
What is Data?
A collection of facts, information and statistics that
can be analyzed to develop new knowledge
What is Open Data?
Definition by the Open Knowledge Foundation (OKF)
A piece of data or content is open if anyone is
free to use, reuse, and redistribute it -
subject only, at most, to the requirement to
attribute and/or share-alike.
Definition by the Open Data Institute (ODI)
Open data is data that is made available by
organizations, businesses and individuals for
anyone to access, use and share.
Let’s define it...
Open Data is accessible public data that people,
companies and organizations can use to launch new
ventures, analyze patterns and trends, make data-
driven decisions, and solve complex problems.
Benefits of Open Data
● Data Driven Decision Making
● Performance Measurement
● Reduction of Government Costs
● Support an Open Government Initiative
– e.g. Transparency
● Economic Development
● Increased Citizen Engagement
● Talent Attraction / Retention
Types of Open Data
● Government data
● Commercial data
● Crowdsourced data
A few Open Data projects...
Open Data sources
Open Data Licenses
● Open Data Commons Public Domain Dedication
and License (ODC PDDL) – Public domain
● Creative Commons CCZero (CC0) – Public domain
● Open Data Commons Attribution License (ODC-By) –
Attribution for data(bases)
● Open Data Commons Open Database License
(ODbL) – Attribution-ShareAlike for data(bases)
Introduction to Data Science
What is Data Science?
Data science ~ computer science +
mathematics/statistics + visualization
Data is just like crude oil
● It’s valuable, but if unrefined it cannot really be used.
● Crude oil has to be changed into gas, plastic, chemicals, etc.
to create a valuable entity that drives profitable
activity.
● Likewise, data must be broken down and analyzed for it to
have value.
Outline
● Harvesting
● Cleaning
● Analyzing
● Visualizing
● Publishing
DATA
Data harvesting
● Locally available data
● Data dumps from Web
● Data through Web APIs
● Structured data in Web documents
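As a rough illustration of the last item, structured data in a Web document can be pulled out with Python's standard-library HTML parser. This is only a sketch: the page fragment, the station names and the `CellCollector` class are all invented for the example, and a real harvest would download the page first.

```python
from html.parser import HTMLParser

# Hypothetical page fragment (invented for illustration).
PAGE = """
<table>
  <tr><th>Station</th><th>Status</th></tr>
  <tr><td>Station A</td><td>active</td></tr>
  <tr><td>Station B</td><td>offline</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())


collector = CellCollector()
collector.feed(PAGE)
# The header row contains only <th> cells, so it stays empty and is dropped.
rows = [row for row in collector.rows if row]
print(rows)
```

Libraries such as BeautifulSoup make the same job shorter; the standard-library version is used here only so the sketch runs without extra installs.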
Data cleansing
● Harvested data may come with lots of noise or
interesting anomalies.
● The goal is to produce a structured representation for
analysis.
- Network(graph)
- Values with dimension
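A minimal sketch of the cleansing step, assuming the harvested data arrives as CSV text with stray whitespace and unusable readings. The sample rows, the column names and the `clean_rows` helper are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical raw harvest (values invented for illustration): stray
# whitespace, one missing reading and one non-numeric reading.
RAW = """city, pm25
 Delhi , 153.0
Mumbai,
Chennai, n/a
Kolkata ,88.5
"""


def clean_rows(text):
    """Return (city, reading) tuples, skipping rows whose reading is not a number."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    cleaned = []
    for row in reader:
        if len(row) != 2:
            continue  # malformed line
        city, value = row[0].strip(), row[1].strip()
        try:
            cleaned.append((city, float(value)))
        except ValueError:
            continue  # empty or non-numeric reading
    return cleaned


rows = clean_rows(RAW)
print(rows)
```

Dropping bad rows is only one possible policy; depending on the analysis, anomalies may be worth keeping and flagging instead.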
Data Science Tools
Data harvesting
● urllib & BeautifulSoup
● Scrapy
Some tips & ethics
● Use the mobile version of the sites if available
● No cookies
● Respect robots.txt
● Identify yourself
● If possible, download bulk data first, process it later
● Prefer dumps over APIs, APIs over scraping
● Be polite and request permission to gather the data
● Worth checking: https://scraperwiki.com/
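The "Respect robots.txt" and "Identify yourself" tips can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `example.org` URLs and the user-agent name below are assumptions made up for the example; a real crawler would fetch the file from the target site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (invented for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical name that identifies the crawler to the site owner.
USER_AGENT = "opendata-learnup-bot"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """Return True if the rules above let USER_AGENT fetch the given URL."""
    return parser.can_fetch(USER_AGENT, url)


print(allowed("https://example.org/data/air.csv"))
print(allowed("https://example.org/private/raw.csv"))
print(parser.crawl_delay(USER_AGENT))  # seconds to wait between requests
```

Checking `allowed(url)` before every request and sleeping for the crawl delay between requests keeps a harvester within the site's stated limits.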
Data analyzing
● Numpy
- Offers efficient multidimensional array object, ndarray
- Basic linear algebra operations and data types
- Prebuilt packages need no compiler (a Fortran compiler matters only when building SciPy from source)
● Scipy
- Builds on top of NumPy
- Modules for statistics, optimization, signal processing, ...
- Add-ons (called SciKits) for machine learning, data mining, etc
● For analysing networks
- NetworkX
- igraph
Data visualizing
● Matplotlib
● NetworkX
● PyGraphviz
NumPy + SciPy + Matplotlib + IPython
● Provides a MATLAB-like environment
● IPython provides an extended interactive interpreter
(tab completion, magic functions for object querying,
debugging, ...)
Some convenient data formats
● JSON (import json, or simplejson)
● XML (import xml.etree.ElementTree)
● RDF (import rdflib, SPARQLWrapper)
● GraphML (import networkx)
● CSV (import csv)
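Two of the formats above, JSON and CSV, can be compared with a small round trip using only the standard library. The records are invented for illustration; the point of the sketch is that JSON preserves value types while CSV returns every field as a string.

```python
import csv
import io
import json

# Hypothetical records (invented for illustration).
records = [
    {"city": "Delhi", "stations": 12},
    {"city": "Pune", "stations": 7},
]

# JSON: serialise to text and parse it back.
json_text = json.dumps(records)
from_json = json.loads(json_text)

# CSV: write the same records as rows, then read them back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "stations"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

print(from_json == records)       # JSON keeps the integer type
print(from_csv[0]["stations"])    # CSV gives back the string "12"
```

When reading CSV, numeric columns therefore need an explicit conversion step, which fits naturally into the cleansing stage.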
Resource Description Framework (RDF)
● Collection of W3C standards for modeling complex
relations and exchanging information
● Allows data from multiple sources to combine nicely
● RDF describes data with triples
- each triple has the form subject - predicate - object
- e.g. PyconIndia2017 (subject) is organized in (predicate) Delhi (object)
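The triple model can be illustrated without any RDF library by storing each statement as a plain tuple. This is a toy sketch, not how rdflib represents data: the predicate names and the `match` helper are made up for the example, and real RDF would use URIs for subjects and predicates.

```python
# Each triple is (subject, predicate, object); names are illustrative only.
triples = [
    ("PyconIndia2017", "isOrganizedIn", "Delhi"),
    ("PyconIndia2017", "type", "Conference"),
    ("Delhi", "locatedIn", "India"),
]


def match(pattern, store):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        triple for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# Everything known about PyconIndia2017.
print(match(("PyconIndia2017", None, None), triples))
# Every "locatedIn" statement, whatever its subject and object.
print(match((None, "locatedIn", None), triples))
```

Because every source reduces to the same three-part shape, triples from different datasets can simply be appended to one store, which is what lets RDF data combine easily.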
Why R for Data Science?
● Algorithms
● Visualizations
● Data manipulation
● Integrations
● Easily scalable
Simple R code for bar graph
# Create the data for the chart.
H <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "barchart.png")
# Plot the bar chart.
barplot(H)
# Save the file.
dev.off()
Shiny R
https://shiny.rstudio.com/gallery/
A few commonly used algorithms
● Naïve Bayes Classifier Algorithm
● K Means Clustering Algorithm
● Support Vector Machine Algorithm
● Apriori Algorithm
● Linear Regression
● Logistic Regression
● Artificial Neural Networks
● Random Forests
● Decision Trees
● Nearest Neighbours
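To make one entry of the list concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The `fit_line` helper and the sample points are invented for the example (the points lie exactly on y = 2x + 1); in practice a library such as SciPy or scikit-learn would normally be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample points chosen to lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fitted line can then predict a new value as `slope * x + intercept`.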
Anaconda
Thank you
opendatahack.org
fb.com/opendatahack

Introduction to Open Data and Data Science