Access Open Data 
with Open Source 
Software Tools 
Sammy Fung 
sammy@sammy.hk
Sammy Fung 
● Developer 
● Founder, JobFOL 
● President of Open Source Hong Kong
Creating 
value for us 
and the community
Open Data
Open Data 
● Discoverable 
– Available and searchable on the Internet. 
● Structured 
– Published in open, machine-readable formats. 
● Unconditional 
– A legal framework allows anyone to reproduce and repurpose the data.
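To show what "machine-readable" buys you: a program can load structured data directly, with no manual copying or retyping. The following is a minimal sketch using Python's standard `csv` module. The dataset, its column names and its values are hypothetical and for illustration only.

```python
import csv
import io

# Hypothetical sample of an open dataset published as CSV
# (column names and values are illustrative only).
SAMPLE_CSV = """district,station,reading
Central,Station A,3
Kwun Tong,Station B,5
Central,Station C,4
"""

def load_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def max_by_district(records, field):
    """Return the highest numeric value of `field` seen in each district."""
    result = {}
    for row in records:
        value = int(row[field])
        district = row["district"]
        result[district] = max(value, result.get(district, value))
    return result

if __name__ == "__main__":
    rows = load_records(SAMPLE_CSV)
    print(max_by_district(rows, "reading"))  # {'Central': 4, 'Kwun Tong': 5}
```

The same few lines would work unchanged on any file that follows the same column layout. That reuse is the practical payoff of an open, structured format.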
Open Source
Open Source 
● Software Development Model 
● Free Software (1985) 
– Free = Freedom 
– Run the program (Freedom 0) 
– Study the source code and change it (Freedom 1) 
– Redistribute copies (Freedom 2) 
– Distribute copies of your modified versions (Freedom 3) 
● Open Source (1998)
Open Source Web Application 
Software Stack 
● LAMP 
– Linux (1991): Operating System 
– Apache (1995): Web Server 
– MySQL (1995): Database Server 
– PHP (1995): Server-side Scripting Language 
● Other Alternatives: 
– LNMP: Replacing Apache with Nginx 
– Another M of LAMP: MariaDB, MongoDB
Python 
● Programming Language 
– Since 1991 
– Widely used, general-purpose 
– High-level 
– Open Source 
● Another P of LAMP
My Open Data-related Projects 
● TV Timetable of Live Football Matches (2004) 
● Weather Information (2006) 
● Public Transportation Information (2006) 
● LegCo Vote Information (2013) 
● Air Quality Information (2014) 
● Restaurant Information (2014)
TCTrack 
● Plots the typhoon tracks reported by different observation agencies on one map 
● Google Maps API 
– First typhoon map in HK built with the Google Maps API 
– Sammy.HK TCTrack → Weather Underground → Hong Kong Observatory 
● Twitter API 
– Posts typhoon updates on any potential tropical cyclone formation in the Northwest Pacific Ocean. 
● Data Sources: HKO, JTWC.
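A map like TCTrack needs each agency's forecast track drawn as its own line. The following is a minimal sketch and is not TCTrack's actual code. It assumes track fixes arrive as `(agency, hour, latitude, longitude)` tuples. The sample fixes are hypothetical. The output is GeoJSON, a format that web map libraries can load, including the Data layer of the Google Maps JavaScript API.

```python
import json
from collections import defaultdict

# Hypothetical track fixes: (agency, hours since first fix, latitude, longitude).
FIXES = [
    ("HKO", 0, 13.1, 130.2),
    ("JTWC", 0, 13.3, 130.0),
    ("HKO", 6, 13.6, 128.9),
    ("JTWC", 6, 13.8, 128.7),
]

def tracks_by_agency(fixes):
    """Group fixes per agency, in time order, as [lon, lat] points."""
    tracks = defaultdict(list)
    for agency, hour, lat, lon in sorted(fixes, key=lambda f: (f[0], f[1])):
        # GeoJSON coordinate order is [longitude, latitude].
        tracks[agency].append([lon, lat])
    return dict(tracks)

def to_geojson(tracks):
    """Build a FeatureCollection with one LineString per agency."""
    features = [
        {
            "type": "Feature",
            "properties": {"agency": agency},
            "geometry": {"type": "LineString", "coordinates": coords},
        }
        for agency, coords in sorted(tracks.items())
    ]
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    print(json.dumps(to_geojson(tracks_by_agency(FIXES)), indent=2))
```

Keeping one feature per agency, with the agency stored as a property, lets the map style each agency's track differently. This is what makes the tracks from different agencies easy to compare at a glance.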
Interview by MetroPop in 2009
Open Data on 
Hong Kong 
Restaurant & 
Food Licenses
Licensed Restaurants in Hong Kong 
● Open Data from Data.One PSI 
● Open Source Software Tools 
– Python 
– Scrapy Web Scraping Framework 
● Source code is released on GitHub 
– https://github.com/sammyfung/LP_Restaurants_Scrapy
Creating the environment for 
a Scrapy project 
● Requirements 
– Python, Python-Dev, virtualenv, pip 
● Creating a virtual environment for the Python project 
– virtualenv ~/env 
– source ~/env/bin/activate 
– pip install scrapy
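The same environment can also be created with Python's standard library. Since Python 3.3 the built-in `venv` module covers what `virtualenv` does here, and `python3 -m venv ~/env` is its command-line equivalent. The following is a minimal sketch. The path `~/env` follows the slide, and the helper name `make_env` is hypothetical.

```python
import os
import venv
from pathlib import Path

def make_env(path, with_pip=True):
    """Create a virtual environment at `path`, like `virtualenv ~/env`.

    with_pip=True bootstraps pip through ensurepip, so that
    `pip install scrapy` works inside the new environment.
    Returns the path of the activate script.
    """
    env_dir = Path(path).expanduser()
    venv.create(env_dir, with_pip=with_pip)
    # `source ~/env/bin/activate` uses the POSIX layout;
    # on Windows the scripts live under Scripts\ instead.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return env_dir / bin_dir / "activate"

if __name__ == "__main__":
    print(make_env("~/env"))
```

After this, `source ~/env/bin/activate` and `pip install scrapy` proceed exactly as on the slide.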
Creating a Scrapy project 
● Creating a new Scrapy project with a spider 
– scrapy startproject LP_Restaurants_Scrapy 
– cd LP_Restaurants_Scrapy 
– scrapy genspider rlxml fehd.gov.hk 
● Creating a Scrapy data model (an Item class) 
● Running some tests in the Scrapy shell 
– scrapy shell <URL> 
– http://www.fehd.gov.hk/english/licensing/license/text/LP_Restaurants_EN.XML 
● Writing the parse function of the Scrapy spider 
● Running and testing the spider 
– scrapy crawl rlxml -t json -o restaurant_licenses.json
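The data model and parse steps above boil down to one idea: turn each record element of the XML file into one item with fixed fields. The following is a stdlib-only sketch of that logic, so it runs without Scrapy installed. In the real spider, the fields would be declared as `scrapy.Field()` on a `scrapy.Item` subclass, and `parse()` would `yield` items built from `response.xpath(...)` selectors. The element names `RECORD`, `LICENSE_NO`, `NAME` and `ADDRESS`, the field names and the sample data are all hypothetical placeholders. They are not the actual schema of LP_Restaurants_EN.XML, so inspect the real file in the Scrapy shell first.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

# Hypothetical stand-in for a scrapy.Item: one licensed restaurant.
@dataclass
class RestaurantLicense:
    license_no: str
    name: str
    address: str

# Hypothetical sample; the real file's element names may differ.
SAMPLE_XML = """<RECORDS>
  <RECORD><LICENSE_NO>1234567</LICENSE_NO><NAME>Example Cafe</NAME>
    <ADDRESS>1 Example Street</ADDRESS></RECORD>
  <RECORD><LICENSE_NO>7654321</LICENSE_NO><NAME>Sample Kitchen</NAME>
    <ADDRESS>2 Sample Road</ADDRESS></RECORD>
</RECORDS>"""

def parse(xml_text):
    """Yield one RestaurantLicense per <RECORD>, like a spider's parse()."""
    root = ET.fromstring(xml_text)
    for record in root.iter("RECORD"):
        yield RestaurantLicense(
            # findtext returns the default "" if a child is missing.
            license_no=record.findtext("LICENSE_NO", "").strip(),
            name=record.findtext("NAME", "").strip(),
            address=record.findtext("ADDRESS", "").strip(),
        )

if __name__ == "__main__":
    items = [asdict(item) for item in parse(SAMPLE_XML)]
    # Roughly the shape that `-t json -o ...` writes: a JSON list of items.
    print(json.dumps(items, indent=2))
```

Keeping `parse()` as a generator mirrors how a Scrapy spider yields items one at a time. Scrapy's feed exporter then collects those items into `restaurant_licenses.json`.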
Open Data
Open Source
Creating 
value for us 
and the community