Networked Programs
Chapter 12
Python for Everybody
www.py4e.com
A Free Book on
Network
Architecture
• If you find this topic area interesting
and/or need more detail
• www.net-intro.com
Transmission Control Protocol (TCP)
• Built on top of IP (Internet Protocol)
• Assumes IP might lose some data
- stores and retransmits data if it
seems to be lost
• Handles “flow control” using a
transmit window
• Provides a nice reliable pipe
Source: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
http://www.flickr.com/photos/kitcowan/2103850699/
http://en.wikipedia.org/wiki/Tin_can_telephone
TCP Connections / Sockets
http://en.wikipedia.org/wiki/Internet_socket
“In computer networking, an Internet socket or network socket is
an endpoint of a bidirectional inter-process communication flow
across an Internet Protocol-based computer network, such as the
Internet.”
Internet
Process Process
TCP Port Numbers
• A port is an application-specific or process-specific
software communications endpoint
• It allows multiple networked applications to coexist on the
same server
• There is a list of well-known TCP port numbers
http://en.wikipedia.org/wiki/TCP_and_UDP_port
[Diagram: a server (www.umich.edu, 74.208.28.177) offering several services, each on its own TCP port: Incoming E-Mail (25), Login (23), Web Server (80 and 443), Personal Mail Box (109 and 110)]
Clipart: http://www.clker.com/search/networksym/1
Common TCP Ports
http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
Sometimes we see the
port number in the URL if
the web server is running
on a “non-standard” port.
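As a small illustration of the note above, the standard library's urllib.parse.urlsplit can report whether a URL names a port. This is only a sketch: the helper name effective_port, the DEFAULT_PORTS table, and the www.example.com URL are assumptions made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for this sketch: the usual default port per scheme
DEFAULT_PORTS = {'http': 80, 'https': 443}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port('http://www.dr-chuck.com/page1.htm'))       # no port in the URL
print(effective_port('http://www.example.com:8080/login.html'))  # "non-standard" port
```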
Sockets in Python
Python has built-in support for TCP Sockets
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect( ('data.pr4e.org', 80) )
http://docs.python.org/library/socket.html
Host Port
http://xkcd.com/353/
Application Protocols
Application Protocol
• Since TCP (and Python) gives us a
reliable socket, what do we want to
do with the socket? What problem
do we want to solve?
• Application Protocols
- Mail
- World Wide Web
Source: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
HTTP - Hypertext Transfer Protocol
• The dominant Application Layer Protocol on the Internet
• Invented for the Web - to Retrieve HTML, Images, Documents,
etc.
• Extended to retrieve data in addition to documents - RSS, Web
Services, etc.
• Basic Concept - Make a Connection - Request a document - Retrieve the Document - Close the Connection
http://en.wikipedia.org/wiki/Http
HTTP
The HyperText Transfer Protocol is the set of rules
to allow browsers to retrieve web documents from
servers over the Internet
What is a Protocol?
• A set of rules that all parties follow so we can
predict each other’s behavior
• And not bump into each other
- On two-way roads in USA, drive on the right-
hand side of the road
- On two-way roads in the UK, drive on the
left-hand side of the road
http://www.dr-chuck.com/page1.htm
protocol host document
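The three labelled parts of the URL can be pulled apart in Python. A minimal sketch using urllib.parse.urlsplit; the variable names protocol, host, and document are chosen here to match the labels and are not from the slides' code.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://www.dr-chuck.com/page1.htm')

protocol = parts.scheme    # the part before '://'
host = parts.hostname      # the server to connect to
document = parts.path      # what to ask the server for

print(protocol, host, document)
```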
Robert Cailliau
CERN
http://www.youtube.com/watch?v=x2GylLq59rI
1:17 - 2:19
Getting Data From The Server
• Each time the user clicks on an anchor tag with an href= value to
switch to a new page, the browser makes a connection to the web
server and issues a “GET” request - to GET the content of the page
at the specified URL
• The server returns the HTML document to the browser, which
formats and displays the document to the user
[Diagram sequence: the user clicks a link in the Browser; the Browser connects to the Web Server on port 80 and sends the Request; the Web Server sends back the Response; the Browser parses and renders it]
Request:
GET http://www.dr-chuck.com/page2.htm
Response:
<h1>The Second Page</h1><p>If you like, you can switch back to the <a href="page1.htm">First Page</a>.</p>
Internet Standards
• The standards for all of the
Internet protocols (inner
workings) are developed by an
organization
• Internet Engineering Task Force
(IETF)
• www.ietf.org
• Standards are called “RFCs” -
“Request for Comments”
Source: http://tools.ietf.org/html/rfc791
http://www.w3.org/Protocols/rfc2616/rfc2616.txt
Making an HTTP request
• Connect to a server such as www.dr-chuck.com
• Request a document (or the default document)
• GET http://www.dr-chuck.com/page1.htm HTTP/1.0
• GET http://www.mlive.com/ann-arbor/ HTTP/1.0
• GET http://www.facebook.com HTTP/1.0
Browser
Web Server
Note: Many
servers do not
support HTTP
1.0
$ telnet data.pr4e.org 80
Trying 74.208.28.177...
Connected to data.pr4e.org.
Escape character is '^]'.
GET http://data.pr4e.org/page1.htm HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 30 Jan 2024 15:30:13 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 15 May 2017 11:11:47 GMT
Content-Length: 128
Content-Type: text/html

<h1>The First Page</h1>
<p>If you like, you can switch to
the <a href="http://data.pr4e.org/page2.htm">Second
Page</a>.</p>
Connection closed by foreign host.
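The request typed into telnet is just lines of text followed by a blank line. Below is a rough sketch of building such a request as bytes in Python. The helper name build_get_request is hypothetical, and it uses the HTTP/1.1 form with a Host header (an assumption made here because many servers expect it) rather than the HTTP/1.0 line typed in the session.

```python
def build_get_request(host, path='/'):
    """Build the bytes of a simple HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f'GET {path} HTTP/1.1',
        f'Host: {host}',        # HTTP/1.1 requires a Host header
        'Connection: close',    # ask the server to close the connection when done
        '',                     # the empty entries produce the blank line
        '',                     # that marks the end of the request headers
    ]
    # Each line of an HTTP request ends with carriage return + line feed
    return '\r\n'.join(lines).encode()

request = build_get_request('data.pr4e.org', '/page1.htm')
print(request)
```

The resulting bytes could be passed to the send() method of a connected socket.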
Accurate Hacking in
the Movies
• Matrix Reloaded
• Bourne Ultimatum
• Die Hard 4
• ...
http://nmap.org/movies.html
Let’s Write a Web Browser!
An HTTP Request in Python
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# The request line must end with a blank line: \r\n\r\n
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    # An empty result means the server has closed the connection
    if (len(data) < 1):
        break
    print(data.decode(),end='')
mysock.close()
HTTP/1.1 200 OK
Date: Sun, 14 Mar 2010 23:52:41 GMT
Server: Apache
Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT
ETag: "143c1b33-a7-4b395bea"
Accept-Ranges: bytes
Content-Length: 167
Connection: close
Content-Type: text/plain
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    print(data.decode())
HTTP Header
HTTP Body
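The header and the body are separated by a blank line, so the two can be split apart in code. A minimal sketch, assuming the whole response has already been collected into one bytes object; the function name split_response and the shortened sample response are made up for illustration.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the header from the body
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# Made-up sample, shortened from the kind of response the server sends
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Length: 49\r\n'
          b'Content-Type: text/plain\r\n'
          b'\r\n'
          b'But soft what light through yonder window breaks\n')

status, headers, body = split_response(sample)
print(status)
print(headers)
print(body.decode(), end='')
```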
About Characters and Strings…
https://en.wikipedia.org/wiki/ASCII
http://www.catonmat.net/download/ascii-cheat-sheet.png
ASCII
American
Standard Code
for Information
Interchange
Representing Simple Strings
• Each character is represented by a
number between 0 and 255 stored in
8 bits of memory
• We refer to 8 bits of memory as a
"byte" of memory (e.g. my disk
drive contains 3 Terabytes of
memory)
• The ord() function tells us the
numeric value of a simple ASCII
character
>>> print(ord('H'))
72
>>> print(ord('e'))
101
>>> print(ord('\n'))
10
>>>
In the 1960s and 1970s,
we just assumed that
one byte was one
character
http://unicode.org/charts/
Multi-Byte Characters
To represent the wide range of characters computers must handle, we represent
characters with more than one byte
• UTF-16 – Variable length - Two or four bytes
• UTF-32 – Fixed length - Four bytes
• UTF-8 – 1-4 bytes
- Upwards compatible with ASCII
- Automatic detection between ASCII and UTF-8
- UTF-8 is recommended practice for encoding
data to be exchanged between systems
https://en.wikipedia.org/wiki/UTF-8
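The byte counts can be checked directly with str.encode(). This is a small sketch; the sample characters are chosen here for illustration, and the '-le' codec variants are used so that Python does not prepend a byte-order mark to the output.

```python
samples = ['A', 'é', '이', '😀']

for ch in samples:
    utf8 = ch.encode('utf-8')
    utf16 = ch.encode('utf-16-le')   # '-le' variant: no byte-order mark added
    utf32 = ch.encode('utf-32-le')
    print(repr(ch), len(utf8), len(utf16), len(utf32))

# For plain ASCII text, the UTF-8 bytes are identical to the ASCII bytes
print('Hello'.encode('utf-8') == 'Hello'.encode('ascii'))
```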
Two Kinds of Strings in Python
Python 3.5.1
>>> x = '이광춘'
>>> type(x)
<class 'str'>
>>> x = u'이광춘'
>>> type(x)
<class 'str'>
>>>
Python 2.7.10
>>> x = '이광춘'
>>> type(x)
<type 'str'>
>>> x = u'이광춘'
>>> type(x)
<type 'unicode'>
>>>
In Python 3, all strings are Unicode
Python 2 versus Python 3
Python 3.5.1
>>> x = b'abc'
>>> type(x)
<class 'bytes'>
>>> x = '이광춘'
>>> type(x)
<class 'str'>
>>> x = u'이광춘'
>>> type(x)
<class 'str'>
Python 2.7.10
>>> x = b'abc'
>>> type(x)
<type 'str'>
>>> x = '이광춘'
>>> type(x)
<type 'str'>
>>> x = u'이광춘'
>>> type(x)
<type 'unicode'>
Python 3 and Unicode
• In Python 3, all strings internally
are UNICODE
• Working with string variables in
Python programs and reading data
from files usually "just works"
• When we talk to a network
resource using sockets or talk to a
database we have to encode and
decode data (usually to UTF-8)
Python Strings to Bytes
• When we talk to an external resource like a network socket we send bytes,
so we need to encode Python 3 strings into a given character encoding
• When we read data from an external resource, we must decode it based on
the character set so it is properly represented in Python 3 as a string
while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    mystring = data.decode()
    print(mystring)
socket1.py
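One detail worth knowing when decoding data that arrives in fixed-size pieces: a multi-byte UTF-8 character can be cut in half at a chunk boundary, and calling decode() on that piece alone raises an error. A possible way to handle it, sketched here with the standard codecs module, is an incremental decoder. The function name decode_chunks is hypothetical, and the chunks are simulated in memory rather than read from a socket.

```python
import codecs

def decode_chunks(chunks, encoding='utf-8'):
    """Decode an iterable of byte chunks into one string."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in chunks:
        # The decoder holds back an incomplete character until the rest arrives
        pieces.append(decoder.decode(chunk))
    # final=True reports an error if the data ended in the middle of a character
    pieces.append(decoder.decode(b'', final=True))
    return ''.join(pieces)

data = '이광춘'.encode()         # 9 bytes in UTF-8, 3 bytes per character
chunks = [data[:4], data[4:]]    # split in the middle of the second character
print(decode_chunks(chunks))
```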
An HTTP Request in Python
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# Encode the string request into bytes before sending
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    # Decode the received bytes back into a string
    print(data.decode())
mysock.close()
socket1.py
https://docs.python.org/3/library/stdtypes.html#bytes.decode
https://docs.python.org/3/library/stdtypes.html#str.encode
[Diagram: bytes (UTF-8) arrive from the network socket through recv() and become a Unicode string with decode(); a string becomes bytes (UTF-8) with encode() and goes out to the socket through send()]
Making HTTP Easier With urllib
Since HTTP is so common, we have a library that does all the
socket work for us and makes web pages look like a file
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    print(line.decode().strip())
Using urllib in Python
urllib1.py
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
Like a File...
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')

counts = dict()
for line in fhand:
    words = line.decode().split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
print(counts)
urlwords.py
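Because the handle behaves like a file, the counting logic does not depend on the network at all. A minimal sketch that moves it into a function accepting any iterable of byte lines, so it can be tried without a connection; the name count_words and the in-memory sample lines are assumptions for the example.

```python
def count_words(lines):
    """Count words in an iterable of byte lines (a file handle or urlopen result)."""
    counts = dict()
    for line in lines:
        for word in line.decode().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# In-memory stand-in for the lines that urlopen() would yield
sample = [b'But soft what light through yonder window breaks\n',
          b'It is the east and Juliet is the sun\n']

counts = count_words(sample)
# Sort by count, largest first, and keep the two most frequent words
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```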
Reading Web Pages
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print(line.decode().strip())
<h1>The First Page</h1>
<p>If you like, you can switch to the <a
href="http://www.dr-chuck.com/page2.htm">Second
Page</a>.
</p>
urllib2.py
Following Links
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print(line.decode().strip())
<h1>The First Page</h1>
<p>If you like, you can switch to the <a
href="http://www.dr-chuck.com/page2.htm">Second
Page</a>.
</p>
urllib2.py
The First Lines of Code @ Google?
import urllib.request, urllib.parse, urllib.error

fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print(line.decode().strip())
urllib2.py
Parsing HTML
(a.k.a. Web Scraping)
What is Web Scraping?
• When a program or script pretends to be a browser and retrieves
web pages, looks at those web pages, extracts information, and
then looks at more web pages
• Search engines scrape web pages - we call this “spidering the
web” or “web crawling”
http://en.wikipedia.org/wiki/Web_scraping
http://en.wikipedia.org/wiki/Web_crawler
Why Scrape?
• Pull data - particularly social data - who links to who?
• Get your own data back out of some system that has no “export
capability”
• Monitor a site for new information
• Spider the web to make a database for a search engine
Scraping Web Pages
• There is some controversy about web page scraping and some
sites are a bit snippy about it.
• Republishing copyrighted information is not allowed
• Violating terms of service is not allowed
The Easy Way - Beautiful Soup
• You could do string searches the hard way
• Or use the free software library called BeautifulSoup from
www.crummy.com
https://www.crummy.com/software/BeautifulSoup/
# To run this, you can install BeautifulSoup
# https://pypi.python.org/pypi/beautifulsoup4
# Or download the file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
...
urllinks.py
BeautifulSoup Installation
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = input('Enter - ')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
python urllinks.py
Enter - http://www.dr-chuck.com/page1.htm
http://www.dr-chuck.com/page2.htm
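For comparison, the same idea can be sketched with only the standard library, which may help when BeautifulSoup is not installed. This is not the urllinks.py program: the class name LinkCollector is hypothetical, the HTML is an in-memory string rather than a fetched page, and urljoin is used to turn a relative href into a full URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Resolve relative links such as "page1.htm" against the page URL
                self.links.append(urljoin(self.base_url, value))

html = ('<h1>The Second Page</h1><p>If you like, you can switch back to the '
        '<a href="page1.htm">First Page</a>.</p>')

collector = LinkCollector('http://www.dr-chuck.com/page2.htm')
collector.feed(html)
print(collector.links)
```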
Summary
• TCP/IP gives us pipes / sockets between applications
• We designed application protocols to make use of these pipes
• HyperText Transfer Protocol (HTTP) is a simple yet powerful
protocol
• Python has good support for sockets, HTTP, and HTML
parsing
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance (www.dr-chuck.com)
of the University of Michigan School of Information
and open.umich.edu and made available under a Creative
Commons Attribution 4.0 License. Please maintain this last slide
in all copies of the document to comply with the attribution
requirements of the license. If you make a change, feel free to
add your name and organization to the list of contributors on this
page as you republish the materials.
Initial Development: Charles Severance, University of Michigan
School of Information
… Insert new Contributors here
...

More Related Content

Similar to Pythonlearn-12-HTTP- Network Programming (20)

Socket programming-in-python
Socket programming-in-pythonSocket programming-in-python
Socket programming-in-python
Yuvaraja Ravi
 
Module 5.pptx HTTP protocol on optical and wireless communication
Module 5.pptx HTTP protocol on optical and wireless communicationModule 5.pptx HTTP protocol on optical and wireless communication
Module 5.pptx HTTP protocol on optical and wireless communication
chandushivamurthy4
 
socket programming
socket programming socket programming
socket programming
prashantzagade
 
socket programming
 socket programming  socket programming
socket programming
prashantzagade
 
Python networking
Python networkingPython networking
Python networking
Smt. Indira Gandhi College of Engineering, Navi Mumbai, Mumbai
 
Socket programming
Socket programming Socket programming
Socket programming
Rajivarnan (Rajiv)
 
network programming lab manuaal in this file
network programming lab manuaal in this filenetwork programming lab manuaal in this file
network programming lab manuaal in this file
shivani158351
 
Network Programming-Python-13-8-2023.pptx
Network Programming-Python-13-8-2023.pptxNetwork Programming-Python-13-8-2023.pptx
Network Programming-Python-13-8-2023.pptx
ssuser23035c
 
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdfOf the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
anuradhasilks
 
session6-Network Programming.pptx
session6-Network Programming.pptxsession6-Network Programming.pptx
session6-Network Programming.pptx
SrinivasanG52
 
HTTP hyper text transfer protocol all .pptx
HTTP hyper text transfer protocol all .pptxHTTP hyper text transfer protocol all .pptx
HTTP hyper text transfer protocol all .pptx
Abdulahad481035
 
Hacking (with) WebSockets
Hacking (with) WebSocketsHacking (with) WebSockets
Hacking (with) WebSockets
Sergey Shekyan
 
Final networks lab manual
Final networks lab manualFinal networks lab manual
Final networks lab manual
Jaya Prasanna
 
What is Socket Programming in Python | Edureka
What is Socket Programming in Python | EdurekaWhat is Socket Programming in Python | Edureka
What is Socket Programming in Python | Edureka
Edureka!
 
Sockets
Sockets Sockets
Sockets
Gopaiah Sanaka
 
Webbasics
WebbasicsWebbasics
Webbasics
patinijava
 
Presentation 3
Presentation 3Presentation 3
Presentation 3
Krishna Chanduri
 
Tcp and udp ports
Tcp and udp portsTcp and udp ports
Tcp and udp ports
sujanakumari1
 
Socket Programming
Socket ProgrammingSocket Programming
Socket Programming
CEC Landran
 
Sockets in unix
Sockets in unixSockets in unix
Sockets in unix
swtjerin4u
 
Socket programming-in-python
Socket programming-in-pythonSocket programming-in-python
Socket programming-in-python
Yuvaraja Ravi
 
Module 5.pptx HTTP protocol on optical and wireless communication
Module 5.pptx HTTP protocol on optical and wireless communicationModule 5.pptx HTTP protocol on optical and wireless communication
Module 5.pptx HTTP protocol on optical and wireless communication
chandushivamurthy4
 
network programming lab manuaal in this file
network programming lab manuaal in this filenetwork programming lab manuaal in this file
network programming lab manuaal in this file
shivani158351
 
Network Programming-Python-13-8-2023.pptx
Network Programming-Python-13-8-2023.pptxNetwork Programming-Python-13-8-2023.pptx
Network Programming-Python-13-8-2023.pptx
ssuser23035c
 
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdfOf the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
Of the variedtypes of IPC, sockets arout and awaythe foremostcommon..pdf
anuradhasilks
 
session6-Network Programming.pptx
session6-Network Programming.pptxsession6-Network Programming.pptx
session6-Network Programming.pptx
SrinivasanG52
 
HTTP hyper text transfer protocol all .pptx
HTTP hyper text transfer protocol all .pptxHTTP hyper text transfer protocol all .pptx
HTTP hyper text transfer protocol all .pptx
Abdulahad481035
 
Hacking (with) WebSockets
Hacking (with) WebSocketsHacking (with) WebSockets
Hacking (with) WebSockets
Sergey Shekyan
 
Final networks lab manual
Final networks lab manualFinal networks lab manual
Final networks lab manual
Jaya Prasanna
 
What is Socket Programming in Python | Edureka
What is Socket Programming in Python | EdurekaWhat is Socket Programming in Python | Edureka
What is Socket Programming in Python | Edureka
Edureka!
 
Socket Programming
Socket ProgrammingSocket Programming
Socket Programming
CEC Landran
 
Sockets in unix
Sockets in unixSockets in unix
Sockets in unix
swtjerin4u
 

More from ssusere5ddd6 (9)

Adobe premiere pro: Creating_Project_Premiere_Pro.pptx
Adobe premiere pro: Creating_Project_Premiere_Pro.pptxAdobe premiere pro: Creating_Project_Premiere_Pro.pptx
Adobe premiere pro: Creating_Project_Premiere_Pro.pptx
ssusere5ddd6
 
Network programming: Banner-Grabbing-Explained.pptx
Network programming: Banner-Grabbing-Explained.pptxNetwork programming: Banner-Grabbing-Explained.pptx
Network programming: Banner-Grabbing-Explained.pptx
ssusere5ddd6
 
Networking Question Overview of lesson 1
Networking Question Overview of lesson 1Networking Question Overview of lesson 1
Networking Question Overview of lesson 1
ssusere5ddd6
 
Module 3 - Basics of Data Manipulation in Time Series
Module 3 - Basics of Data Manipulation in Time SeriesModule 3 - Basics of Data Manipulation in Time Series
Module 3 - Basics of Data Manipulation in Time Series
ssusere5ddd6
 
A practical Approach to Timeseries Forecasting using Python
A practical Approach to Timeseries Forecasting using PythonA practical Approach to Timeseries Forecasting using Python
A practical Approach to Timeseries Forecasting using Python
ssusere5ddd6
 
Day 00 - Introduction to machine learning with big data
Day 00 - Introduction to machine learning with big dataDay 00 - Introduction to machine learning with big data
Day 00 - Introduction to machine learning with big data
ssusere5ddd6
 
Chapter 1 Introduction to Data Science.pptx
Chapter 1 Introduction to Data Science.pptxChapter 1 Introduction to Data Science.pptx
Chapter 1 Introduction to Data Science.pptx
ssusere5ddd6
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
ssusere5ddd6
 
Beginner Control Structures - For Loop.pdf
Beginner Control Structures - For Loop.pdfBeginner Control Structures - For Loop.pdf
Beginner Control Structures - For Loop.pdf
ssusere5ddd6
 
Adobe premiere pro: Creating_Project_Premiere_Pro.pptx
Adobe premiere pro: Creating_Project_Premiere_Pro.pptxAdobe premiere pro: Creating_Project_Premiere_Pro.pptx
Adobe premiere pro: Creating_Project_Premiere_Pro.pptx
ssusere5ddd6
 
Network programming: Banner-Grabbing-Explained.pptx
Network programming: Banner-Grabbing-Explained.pptxNetwork programming: Banner-Grabbing-Explained.pptx
Network programming: Banner-Grabbing-Explained.pptx
ssusere5ddd6
 
Networking Question Overview of lesson 1
Networking Question Overview of lesson 1Networking Question Overview of lesson 1
Networking Question Overview of lesson 1
ssusere5ddd6
 
Module 3 - Basics of Data Manipulation in Time Series
Module 3 - Basics of Data Manipulation in Time SeriesModule 3 - Basics of Data Manipulation in Time Series
Module 3 - Basics of Data Manipulation in Time Series
ssusere5ddd6
 
A practical Approach to Timeseries Forecasting using Python
A practical Approach to Timeseries Forecasting using PythonA practical Approach to Timeseries Forecasting using Python
A practical Approach to Timeseries Forecasting using Python
ssusere5ddd6
 
Day 00 - Introduction to machine learning with big data
Day 00 - Introduction to machine learning with big dataDay 00 - Introduction to machine learning with big data
Day 00 - Introduction to machine learning with big data
ssusere5ddd6
 
Chapter 1 Introduction to Data Science.pptx
Chapter 1 Introduction to Data Science.pptxChapter 1 Introduction to Data Science.pptx
Chapter 1 Introduction to Data Science.pptx
ssusere5ddd6
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
ssusere5ddd6
 
Beginner Control Structures - For Loop.pdf
Beginner Control Structures - For Loop.pdfBeginner Control Structures - For Loop.pdf
Beginner Control Structures - For Loop.pdf
ssusere5ddd6
 
Ad

Recently uploaded (20)

Final Sketch Designs for poster production.pptx
Final Sketch Designs for poster production.pptxFinal Sketch Designs for poster production.pptx
Final Sketch Designs for poster production.pptx
bobby205207
 
Gibson "Secrets to Changing Behaviour in Scholarly Communication: A 2025 NISO...
Gibson "Secrets to Changing Behaviour in Scholarly Communication: A 2025 NISO...Gibson "Secrets to Changing Behaviour in Scholarly Communication: A 2025 NISO...
Gibson "Secrets to Changing Behaviour in Scholarly Communication: A 2025 NISO...
National Information Standards Organization (NISO)
 
How to Manage Maintenance Request in Odoo 18
How to Manage Maintenance Request in Odoo 18How to Manage Maintenance Request in Odoo 18
How to Manage Maintenance Request in Odoo 18
Celine George
 
Artificial intelligence Presented by JM.
Artificial intelligence Presented by JM.Artificial intelligence Presented by JM.
Artificial intelligence Presented by JM.
jmansha170
 
Unit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptxUnit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptx
bobby205207
 
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptxDiptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Arshad Shaikh
 
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
Quiz Club of PSG College of Arts & Science
 
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
Quiz Club of PSG College of Arts & Science
 
How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18
Celine George
 
Parenting Teens: Supporting Trust, resilience and independence
Parenting Teens: Supporting Trust, resilience and independenceParenting Teens: Supporting Trust, resilience and independence
Parenting Teens: Supporting Trust, resilience and independence
Pooky Knightsmith
 
LDMMIA Reiki Yoga Next Week Grad Updates
LDMMIA Reiki Yoga Next Week Grad UpdatesLDMMIA Reiki Yoga Next Week Grad Updates
LDMMIA Reiki Yoga Next Week Grad Updates
LDM & Mia eStudios
 
How to Create Quotation Templates Sequence in Odoo 18 Sales
How to Create Quotation Templates Sequence in Odoo 18 SalesHow to Create Quotation Templates Sequence in Odoo 18 Sales
How to Create Quotation Templates Sequence in Odoo 18 Sales
Celine George
 
IDF 30min presentation - December 2, 2024.pptx
IDF 30min presentation - December 2, 2024.pptxIDF 30min presentation - December 2, 2024.pptx
IDF 30min presentation - December 2, 2024.pptx
ArneeAgligar
 
Rai dyansty Chach or Brahamn dynasty, History of Dahir History of Sindh NEP.pptx
Rai dyansty Chach or Brahamn dynasty, History of Dahir History of Sindh NEP.pptxRai dyansty Chach or Brahamn dynasty, History of Dahir History of Sindh NEP.pptx
Rai dyansty Chach or Brahamn dynasty, History of Dahir History of Sindh NEP.pptx
Dr. Ravi Shankar Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
How to Configure Vendor Management in Lunch App of Odoo 18
How to Configure Vendor Management in Lunch App of Odoo 18How to Configure Vendor Management in Lunch App of Odoo 18
How to Configure Vendor Management in Lunch App of Odoo 18
Celine George
 
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKANMATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
aditya23173
 
Allomorps and word formation.pptx - Google Slides.pdf
Allomorps and word formation.pptx - Google Slides.pdfAllomorps and word formation.pptx - Google Slides.pdf
Allomorps and word formation.pptx - Google Slides.pdf
Abha Pandey
 
Trends Spotting Strategic foresight for tomorrow’s education systems - Debora...
Trends Spotting Strategic foresight for tomorrow’s education systems - Debora...Trends Spotting Strategic foresight for tomorrow’s education systems - Debora...
Trends Spotting Strategic foresight for tomorrow’s education systems - Debora...
EduSkills OECD
 
Analysis of Quantitative Data Parametric and non-parametric tests.pptx
Analysis of Quantitative Data Parametric and non-parametric tests.pptxAnalysis of Quantitative Data Parametric and non-parametric tests.pptx
Analysis of Quantitative Data Parametric and non-parametric tests.pptx
Shrutidhara2
 
Rose Cultivation Practices by Kushal Lamichhane.pdf
Rose Cultivation Practices by Kushal Lamichhane.pdfRose Cultivation Practices by Kushal Lamichhane.pdf
Rose Cultivation Practices by Kushal Lamichhane.pdf
kushallamichhame
 
Final Sketch Designs for poster production.pptx
Final Sketch Designs for poster production.pptxFinal Sketch Designs for poster production.pptx
Final Sketch Designs for poster production.pptx
bobby205207
 
How to Manage Maintenance Request in Odoo 18
How to Manage Maintenance Request in Odoo 18How to Manage Maintenance Request in Odoo 18
How to Manage Maintenance Request in Odoo 18
Celine George
 
Artificial intelligence Presented by JM.
Artificial intelligence Presented by JM.Artificial intelligence Presented by JM.
Artificial intelligence Presented by JM.
jmansha170
 
Unit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptxUnit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptx
bobby205207
 
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptxDiptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Arshad Shaikh
 
How to Manage Upselling of Subscriptions in Odoo 18

Pythonlearn-12-HTTP- Network Programming

  • 1. Networked Programs Chapter 12 Python for Everybody www.py4e.com
  • 2. A Free Book on Network Architecture • If you find this topic area interesting and/or need more detail • www.net-intro.com
  • 3. Transmission Control Protocol (TCP) • Built on top of IP (Internet Protocol) • Assumes IP might lose some data - stores and retransmits data if it seems to be lost • Handles “flow control” using a transmit window • Provides a nice reliable pipe Source: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
  • 5. TCP Connections / Sockets http://en.wikipedia.org/wiki/Internet_socket “In computer networking, an Internet socket or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the Internet.” Internet Process Process
  • 6. TCP Port Numbers • A port is an application-specific or process-specific software communications endpoint • It allows multiple networked applications to coexist on the same server • There is a list of well-known TCP port numbers http://en.wikipedia.org/wiki/TCP_and_UDP_port
  • 7. [Figure: the server www.umich.edu (74.208.28.177) with services on separate TCP ports: Incoming E-Mail 25, Login 23, Web Server 80 and 443, Personal Mail Box 109 and 110] Clipart: http://www.clker.com/search/networksym/1
  • 9. Sometimes we see the port number in the URL if the web server is running on a “non-standard” port.
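As a small sketch of where the port number sits in a URL, Python's standard `urllib.parse.urlsplit` can extract it. The example URLs, the `DEFAULT_PORTS` table, and the `port_of` helper are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: the standard port assumed when a URL names none.
DEFAULT_PORTS = {'http': 80, 'https': 443}

def port_of(url):
    """Return the explicit port in url, or the standard port for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(port_of('http://www.example.com:8080/index.html'))  # explicit, non-standard port
print(port_of('http://www.dr-chuck.com/page1.htm'))       # no port given, standard one applies
```

When the URL omits the port, the browser falls back to the standard port for the protocol, which is why most URLs never show one.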
  • 10. Sockets in Python Python has built-in support for TCP Sockets import socket mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) mysock.connect( ('data.pr4e.org', 80) ) http://docs.python.org/library/socket.html Host Port
  • 13. Application Protocol • Since TCP (and Python) gives us a reliable socket, what do we want to do with the socket? What problem do we want to solve? • Application Protocols - Mail - World Wide Web Source: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
  • 14. HTTP - Hypertext Transfer Protocol • The dominant Application Layer Protocol on the Internet • Invented for the Web - to Retrieve HTML, Images, Documents, etc. • Extended to retrieve data in addition to documents - RSS, Web Services, etc. Basic Concept - Make a Connection - Request a document - Retrieve the Document - Close the Connection http://en.wikipedia.org/wiki/Http
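The four steps of the basic concept (connect, request, retrieve, close) can be sketched end to end without touching the outside network. This example assumes a throwaway server on the loopback address; the `HelloHandler` class, the `/hello.txt` path, and the response text are all made up for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with one line of text."""
    def do_GET(self):
        body = b'Hello from a local test server\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), HelloHandler)
port = server.server_address[1]
worker = threading.Thread(target=server.handle_request)  # serve exactly one request
worker.start()

# 1. Make a connection
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('127.0.0.1', port))
# 2. Request a document
mysock.sendall(b'GET /hello.txt HTTP/1.0\r\n\r\n')
# 3. Retrieve the document
response = b''
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    response += data
# 4. Close the connection
mysock.close()

worker.join()
server.server_close()
print(response.decode())
```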
  • 15. HTTP The HyperText Transfer Protocol is the set of rules to allow browsers to retrieve web documents from servers over the Internet
  • 16. What is a Protocol? • A set of rules that all parties follow so we can predict each other’s behavior • And not bump into each other - On two-way roads in USA, drive on the right-hand side of the road - On two-way roads in the UK, drive on the left-hand side of the road
  • 17. http://www.dr-chuck.com/page1.htm protocol host document Robert Cailliau CERN http://www.youtube.com/watch?v=x2GylLq59rI 1:17 - 2:19
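To make the three labelled parts of the URL concrete, here is a minimal sketch that splits it with the standard library. Mapping `scheme`, `hostname`, and `path` onto the slide's "protocol", "host", and "document" labels is my own reading of the diagram.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://www.dr-chuck.com/page1.htm')

print(parts.scheme)    # the protocol
print(parts.hostname)  # the host
print(parts.path)      # the document
```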
  • 18. Getting Data From The Server • Each time the user clicks on an anchor tag with an href= value to switch to a new page, the browser makes a connection to the web server and issues a “GET” request - to GET the content of the page at the specified URL • The server returns the HTML document to the browser, which formats and displays the document to the user
  • 23. Browser Web Server <h1>The Second Page</h1><p>If you like, you can switch back to the <a href="page1.htm">First Page</a>.</p> 80 Request Response GET http://www.dr-chuck.com/page2.htm Click
  • 24. Browser Web Server <h1>The Second Page</h1><p>If you like, you can switch back to the <a href="page1.htm">First Page</a>.</p> 80 Request Response Parse/ Render GET http://www.dr-chuck.com/page2.htm Click
  • 25. Internet Standards • The standards for all of the Internet protocols (inner workings) are developed by an organization • Internet Engineering Task Force (IETF) • www.ietf.org • Standards are called “RFCs” - “Request for Comments” Source: http://tools.ietf.org/html/rfc791
  • 28. Making an HTTP request • Connect to a server such as www.dr-chuck.com • Request a document (or the default document) • GET http://www.dr-chuck.com/page1.htm HTTP/1.0 • GET http://www.mlive.com/ann-arbor/ HTTP/1.0 • GET http://www.facebook.com HTTP/1.0
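The request lines above become bytes on the wire once a blank line is appended. The following is a minimal sketch of that step; `build_get_request` is a hypothetical helper name, and it covers only the bare HTTP/1.0 form shown on the slide, with no extra headers.

```python
def build_get_request(url):
    """Build the bytes of a minimal HTTP/1.0 GET request for url."""
    # HTTP ends each line with CRLF; the empty line marks the end of the request.
    request = 'GET ' + url + ' HTTP/1.0\r\n\r\n'
    return request.encode()

cmd = build_get_request('http://www.dr-chuck.com/page1.htm')
print(cmd)
```

The result is what would be passed to a socket's `send()` after connecting to the host on port 80.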
  • 29. Browser Web Server Note: Many servers do not support HTTP 1.0 $ telnet data.pr4e.org 80 Trying 74.208.28.177... Connected to data.pr4e.org. Escape character is '^]'. GET http://data.pr4e.org/page1.htm HTTP/1.0 HTTP/1.1 200 OK Date: Tue, 30 Jan 2024 15:30:13 GMT Server: Apache/2.4.18 (Ubuntu) Last-Modified: Mon, 15 May 2017 11:11:47 GMT Content-Length: 128 Content-Type: text/html <h1>The First Page</h1> <p>If you like, you can switch to the <a href="http://data.pr4e.org/page2.htm">Second Page</a>.</p> Connection closed by foreign host.
  • 30. Accurate Hacking in the Movies • Matrix Reloaded • Bourne Ultimatum • Die Hard 4 • ... http://nmap.org/movies.html
  • 31. Let’s Write a Web Browser!
  • 32. An HTTP Request in Python import socket mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) mysock.connect(('data.pr4e.org', 80)) cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode() mysock.send(cmd) while True: data = mysock.recv(512) if (len(data) < 1): break print(data.decode(),end='') mysock.close()
  • 33. HTTP/1.1 200 OK Date: Sun, 14 Mar 2010 23:52:41 GMT Server: Apache Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT ETag: "143c1b33-a7-4b395bea" Accept-Ranges: bytes Content-Length: 167 Connection: close Content-Type: text/plain But soft what light through yonder window breaks It is the east and Juliet is the sun Arise fair sun and kill the envious moon Who is already sick and pale with grief while True: data = mysock.recv(512) if ( len(data) < 1 ) : break print(data.decode()) HTTP Header HTTP Body
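The header and the body are separated by the first blank line in the response. A minimal sketch of pulling them apart follows; `split_response` is a hypothetical helper, and the sample bytes are a shortened, hand-written stand-in for the response shown above rather than real server output.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status_line, headers, body)."""
    # The first blank line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode().split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# Hand-written sample, not captured from a server.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/plain\r\n'
          b'Content-Length: 49\r\n'
          b'\r\n'
          b'But soft what light through yonder window breaks\n')

status_line, headers, body = split_response(sample)
print(status_line)
print(headers)
print(body.decode(), end='')
```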
  • 34. About Characters and Strings…
  • 36. Representing Simple Strings • Each character is represented by a number between 0 and 255 stored in 8 bits of memory • We refer to 8 bits of memory as a "byte" of memory – (e.g. my disk drive contains 3 Terabytes of memory) • The ord() function tells us the numeric value of a simple ASCII character >>> print(ord('H')) 72 >>> print(ord('e')) 101 >>> print(ord('\n')) 10 >>>
  • 37. ASCII >>> print(ord('H')) 72 >>> print(ord('e')) 101 >>> print(ord('\n')) 10 >>> In the 1960s and 1970s, we just assumed that one byte was one character
  • 39. Multi-Byte Characters To represent the wide range of characters computers must handle we represent characters with more than one byte • UTF-16 – Variable length - Two or four bytes • UTF-32 – Fixed Length - Four Bytes • UTF-8 – 1-4 bytes - Upwards compatible with ASCII - Automatic detection between ASCII and UTF-8 - UTF-8 is recommended practice for encoding data to be exchanged between systems https://en.wikipedia.org/wiki/UTF-8
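A quick way to see the size differences is to encode single characters and count the bytes. This is only a sketch: `encoded_sizes` is a hypothetical helper, the three sample characters are my own choice, and the `-le` codec variants are used so that no byte-order mark is added to the counts.

```python
def encoded_sizes(ch):
    """Return the number of bytes ch occupies in each encoding."""
    encodings = ('utf-8', 'utf-16-le', 'utf-32-le')
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in ('H', '이', '😀'):
    print(repr(ch), encoded_sizes(ch))

# For plain ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
print('Hello'.encode('utf-8') == 'Hello'.encode('ascii'))
```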
  • 40. Two Kinds of Strings in Python Python 3.5.1 >>> x = '이광춘' >>> type(x) <class 'str'> >>> x = u'이광춘' >>> type(x) <class 'str'> >>> Python 2.7.10 >>> x = '이광춘' >>> type(x) <type 'str'> >>> x = u'이광춘' >>> type(x) <type 'unicode'> >>> In Python 3, all strings are Unicode
  • 41. Python 2 versus Python 3 Python 3.5.1 >>> x = b'abc' >>> type(x) <class 'bytes'> >>> x = '이광춘' >>> type(x) <class 'str'> >>> x = u'이광춘' >>> type(x) <class 'str'> Python 2.7.10 >>> x = b'abc' >>> type(x) <type 'str'> >>> x = '이광춘' >>> type(x) <type 'str'> >>> x = u'이광춘' >>> type(x) <type 'unicode'>
  • 42. Python 3 and Unicode • In Python 3, all strings internally are UNICODE • Working with string variables in Python programs and reading data from files usually "just works" • When we talk to a network resource using sockets or talk to a database we have to encode and decode data (usually to UTF-8) Python 3.5.1 >>> x = b'abc' >>> type(x) <class 'bytes'> >>> x = '이광춘' >>> type(x) <class 'str'> >>> x = u'이광춘' >>> type(x) <class 'str'>
  • 43. Python Strings to Bytes • When we talk to an external resource like a network socket we send bytes, so we need to encode Python 3 strings into a given character encoding • When we read data from an external resource, we must decode it based on the character set so it is properly represented in Python 3 as a string while True: data = mysock.recv(512) if ( len(data) < 1 ) : break mystring = data.decode() print(mystring) socket1.py
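The encode-then-send and receive-then-decode round trip can be tried without a remote server. This sketch assumes `socket.socketpair()`, which returns two already-connected sockets inside one process; the variable names and the message text are illustrative.

```python
import socket

# Two connected sockets in the same process, so no network is needed.
left, right = socket.socketpair()

message = 'Hello 이광춘'
left.sendall(message.encode())   # str -> UTF-8 bytes before sending
left.close()                     # closing tells the reader there is no more data

chunks = []
while True:
    data = right.recv(512)       # recv() always returns bytes
    if len(data) < 1:
        break
    chunks.append(data)
right.close()

# Join first, then decode once, so a multi-byte character that happens to be
# split across two recv() calls is still decoded correctly.
received = b''.join(chunks).decode()
print(received)
```

Decoding each chunk separately, as the short loop on the slide does, works for small ASCII responses but can fail when a chunk boundary falls inside a multi-byte character.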
  • 44. An HTTP Request in Python import socket mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) mysock.connect(('data.pr4e.org', 80)) cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode() mysock.send(cmd) while True: data = mysock.recv(512) if (len(data) < 1): break print(data.decode()) mysock.close() socket1.py
  • 46. Network Socket Bytes UTF-8 String Unicode Bytes UTF-8 recv() decode() encode() send() import socket mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) mysock.connect(('data.pr4e.org', 80)) cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode() mysock.send(cmd) while True: data = mysock.recv(512) if (len(data) < 1): break print(data.decode()) mysock.close() socket1.py
  • 47. Making HTTP Easier With urllib
  • 48. Since HTTP is so common, we have a library that does all the socket work for us and makes web pages look like a file import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt') for line in fhand: print(line.decode().strip()) Using urllib in Python urllib1.py
  • 49. But soft what light through yonder window breaks It is the east and Juliet is the sun Arise fair sun and kill the envious moon Who is already sick and pale with grief urllib1.py import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt') for line in fhand: print(line.decode().strip())
  • 50. Like a File... import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt') counts = dict() for line in fhand: words = line.decode().split() for word in words: counts[word] = counts.get(word, 0) + 1 print(counts) urlwords.py
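Because the handle behaves like a file of bytes, the counting loop can be tried offline by substituting any file-like object. In this sketch `io.BytesIO` stands in for the object returned by `urlopen`; the `count_words` function name and the two sample lines are assumptions for illustration.

```python
import io

def count_words(fhand):
    """Count words in a file-like object that yields lines of bytes."""
    counts = dict()
    for line in fhand:
        # Each line arrives as bytes and must be decoded before splitting.
        for word in line.decode().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# Stand-in for urllib.request.urlopen(...): same iteration behavior, no network.
fake_handle = io.BytesIO(b'But soft what light\n'
                         b'It is the east and Juliet is the sun\n')
counts = count_words(fake_handle)
print(counts)
```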
  • 51. Reading Web Pages import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm') for line in fhand: print(line.decode().strip()) <h1>The First Page</h1> <p>If you like, you can switch to the <a href="http://www.dr-chuck.com/page2.htm">Second Page</a>. </p> urllib2.py
  • 52. Following Links import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm') for line in fhand: print(line.decode().strip()) <h1>The First Page</h1> <p>If you like, you can switch to the <a href="http://www.dr-chuck.com/page2.htm">Second Page</a>. </p> urllib2.py
  • 53. The First Lines of Code @ Google? import urllib.request, urllib.parse, urllib.error fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm') for line in fhand: print(line.decode().strip()) urllib2.py
  • 55. What is Web Scraping? • When a program or script pretends to be a browser and retrieves web pages, looks at those web pages, extracts information, and then looks at more web pages • Search engines scrape web pages - we call this “spidering the web” or “web crawling” http://en.wikipedia.org/wiki/Web_scraping http://en.wikipedia.org/wiki/Web_crawler
  • 56. Why Scrape? • Pull data - particularly social data - who links to who? • Get your own data back out of some system that has no “export capability” • Monitor a site for new information • Spider the web to make a database for a search engine
  • 57. Scraping Web Pages • There is some controversy about web page scraping and some sites are a bit snippy about it. • Republishing copyrighted information is not allowed • Violating terms of service is not allowed
  • 58. The Easy Way - Beautiful Soup • You could do string searches the hard way • Or use the free software library called BeautifulSoup from www.crummy.com https://www.crummy.com/software/BeautifulSoup/
  • 59. # To run this, you can install BeautifulSoup # https://pypi.python.org/pypi/beautifulsoup4 # Or download the file # http://www.py4e.com/code3/bs4.zip # and unzip it in the same directory as this file import urllib.request, urllib.parse, urllib.error from bs4 import BeautifulSoup ... urllinks.py BeautifulSoup Installation
  • 60. import urllib.request, urllib.parse, urllib.error from bs4 import BeautifulSoup url = input('Enter - ') html = urllib.request.urlopen(url).read() soup = BeautifulSoup(html, 'html.parser') # Retrieve all of the anchor tags tags = soup('a') for tag in tags: print(tag.get('href', None)) python urllinks.py Enter - http://www.dr-chuck.com/page1.htm http://www.dr-chuck.com/page2.htm
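If BeautifulSoup is not installed, the same anchor-tag extraction can be approximated with the standard library's `html.parser` module. This is a swapped-in technique, not the BeautifulSoup approach from the slide; the `LinkCollector` class is hypothetical, and the HTML string is the first page's markup pasted inline so the sketch runs offline.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == 'a':
            self.links.append(dict(attrs).get('href'))

html = ('<h1>The First Page</h1><p>If you like, you can switch to the '
        '<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>')

collector = LinkCollector()
collector.feed(html)
for link in collector.links:
    print(link)
```

BeautifulSoup is more forgiving of badly formed HTML and offers a richer search API, so this sketch is best seen as a fallback for simple pages.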
  • 61. Summary • The TCP/IP gives us pipes / sockets between applications • We designed application protocols to make use of these pipes • HyperText Transfer Protocol (HTTP) is a simple yet powerful protocol • Python has good support for sockets, HTTP, and HTML parsing
  • 62. Acknowledgements / Contributions These slides are Copyright 2010- Charles R. Severance (www.dr-chuck.com) of the University of Michigan School of Information and open.umich.edu and made available under a Creative Commons Attribution 4.0 License. Please maintain this last slide in all copies of the document to comply with the attribution requirements of the license. If you make a change, feel free to add your name and organization to the list of contributors on this page as you republish the materials. Initial Development: Charles Severance, University of Michigan School of Information … Insert new Contributors here ...