XML namespaces and XPath with Python
March 2, 2016
1 XML namespaces and XPath with Python
Thomas Aglassinger http://www.roskakori.at
2 XML
• eXtensible Markup Language
• a blueprint for other file formats
• can represent sequences and hierarchies
• text based (binary somewhat possible using e.g. UUEncode)
• human readable
• somewhat verbose
• supports a Document Object Model (DOM)
2.1 XML with Python
• xml: part of the standard library
  • xml.etree.ElementTree - XML as pythonic trees
  • xml.dom.minidom - DOM, warts and all
  • xml.sax - sequential parsing of large documents
  • works, but has limited support for namespaces, XPath etc.
• lxml: available from http://lxml.de/
  • Python wrapper for the C-based XML libraries libxml2 and libxslt
  • full support for namespaces, XPath, schemas etc.
  • widely used for “serious” XML processing
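To make the "limited support" of the standard library concrete, here is a minimal sketch using xml.etree.ElementTree. The inline XML is a shortened copy of the example file, embedded only to keep the snippet self-contained. Simple attribute predicates work, while XPath functions such as not() are expected to be rejected.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"
NAMESPACES = {"people": PEOPLE_NS}

# Shortened inline copy of the example file (assumption: kept inline
# so the sketch runs without an examples/people.xml on disk).
PEOPLE_XML = f"""
<people:list xmlns:people="{PEOPLE_NS}">
  <people:person name="Alice" phone="0650/12345678" size="172" />
  <people:person name="Bob" phone="0654/23456789" size="167" />
  <people:person name="Günther" size="172" />
</people:list>
"""

root = ET.fromstring(PEOPLE_XML)

# ElementTree paths are relative to the element they are called on,
# so the path starts at the children of the root element.
names_with_phone = [
    person.get("name")
    for person in root.findall("people:person[@phone]", NAMESPACES)
]
print(names_with_phone)

# XPath functions are outside the subset ElementTree understands.
try:
    root.findall("people:person[not(@phone)]", NAMESPACES)
    not_is_supported = True
except SyntaxError:
    not_is_supported = False
print("not() supported by ElementTree:", not_is_supported)
```

For queries beyond this subset, the lxml calls shown in the rest of the talk are the more convenient choice.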
2.2 Example XML file
<?xml version="1.0" encoding="utf-8"?>
<people:list xmlns:people="https://www.example.org/xml/people">
  <people:updated date="2016-02-16" />
  <people:person name="Alice" phone="0650/12345678" size="172" />
  <people:person name="Bob" phone="0654/23456789" size="167" />
  <people:person name="Bärbel" phone="0699/34567890" size="182" />
  <people:person name="Günther" size="172">
    <people:note>Ask for phone number.</people:note>
  </people:person>
</people:list>
2.3 XML namespaces
In our example
xmlns:people="https://www.example.org/xml/people"
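assigns the prefix people to the namespace identified by the URI https://www.example.org/xml/people. The prefix is only a shorthand within the document; the URI is what actually identifies the namespace.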
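A small sketch of this using the standard library: three one-element documents (made up for illustration) bind the same namespace URI to different prefixes, or to no prefix at all, and the parser reports the same qualified tag for each of them.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"

# Same namespace, three different ways to refer to it.
with_people_prefix = ET.fromstring(
    f'<people:list xmlns:people="{PEOPLE_NS}" />')
with_other_prefix = ET.fromstring(
    f'<p:list xmlns:p="{PEOPLE_NS}" />')
with_default_namespace = ET.fromstring(
    f'<list xmlns="{PEOPLE_NS}" />')

# The parser resolves the prefix and keeps only the URI and local name.
tags = {
    with_people_prefix.tag,
    with_other_prefix.tag,
    with_default_namespace.tag,
}
print(tags)
```

The set contains a single entry, which is why the prefix used in an XPath expression does not have to match the prefix used in the file, as long as both map to the same URI.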
2.4 XPath
XPath is a query language to find nodes in XML documents. Examples:
• /people:list/people:person - all person elements directly below the root list element
• /people:list/people:person[@phone] - all person elements in the document with a phone attribute
Tutorial: http://www.w3schools.com/xsl/xpath_intro.asp
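The two example expressions can be tried out with lxml as sketched below. The inline XML is a shortened copy of the example file so the snippet runs on its own; the count() query is an extra illustration that XPath can also return plain values instead of nodes.

```python
from lxml import etree

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"
NAMESPACES = {"people": PEOPLE_NS}

# Shortened inline copy of the example file (assumption: inline for brevity).
PEOPLE_XML = f"""<people:list xmlns:people="{PEOPLE_NS}">
  <people:updated date="2016-02-16" />
  <people:person name="Alice" phone="0650/12345678" size="172" />
  <people:person name="Bob" phone="0654/23456789" size="167" />
  <people:person name="Günther" size="172" />
</people:list>"""

root = etree.fromstring(PEOPLE_XML)

# All person elements below the root list element.
all_persons = root.xpath(
    "/people:list/people:person", namespaces=NAMESPACES)

# Only those person elements that have a phone attribute.
persons_with_phone = root.xpath(
    "/people:list/people:person[@phone]", namespaces=NAMESPACES)

# XPath functions return plain values; count() yields a float in lxml.
person_count = root.xpath(
    "count(/people:list/people:person)", namespaces=NAMESPACES)

print(len(all_persons), len(persons_with_phone), person_count)
```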
3 Extract information from XML
3.1 Read the document root
Compute the path to our example XML file:
In [1]: import os.path
        people_xml_path = os.path.join('examples', 'people.xml')
Parse the file into an element tree (despite its name, people_root is the tree object, which also offers xpath()):
In [2]: from lxml import etree
        people_root = etree.parse(people_xml_path)
3.2 Set up the namespace
In [3]: NAMESPACES = {
            'people': 'https://www.example.org/xml/people',
        }
3.3 Find persons and print details
In [4]: # Find persons matching XPath.
        person_elements = people_root.xpath(
            '/people:list/people:person[@phone]',
            namespaces=NAMESPACES)
        # Print name and phone of persons found.
        for person_element in person_elements:
            print(
                person_element.attrib['name'] + ': ' +
                person_element.attrib['phone'])
Alice: 0650/12345678
Bob: 0654/23456789
Bärbel: 0699/34567890
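As a variation on the loop above, an XPath expression can also select attribute values directly, in which case lxml returns strings rather than elements. This is a sketch on an inline copy of the example persons, not part of the original notebook.

```python
from lxml import etree

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"
NAMESPACES = {"people": PEOPLE_NS}

# Inline copy of the person entries (assumption: inline so it runs stand-alone).
PEOPLE_XML = f"""<people:list xmlns:people="{PEOPLE_NS}">
  <people:person name="Alice" phone="0650/12345678" size="172" />
  <people:person name="Bob" phone="0654/23456789" size="167" />
  <people:person name="Bärbel" phone="0699/34567890" size="182" />
  <people:person name="Günther" size="172" />
</people:list>"""

root = etree.fromstring(PEOPLE_XML)

# "/@name" at the end selects the attribute value instead of the element.
names = root.xpath(
    "/people:list/people:person[@phone]/@name",
    namespaces=NAMESPACES)

for name in names:
    print(name)
```

This is handy when only one value per element is needed; to combine several attributes of the same person, iterating over the elements as in the notebook cell stays simpler.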
3.4 Examining XML elements
Elements have a tag, where namespaces are represented using the Clark notation {namespace}tag:
In [5]: person_element.tag
Out[5]: '{https://www.example.org/xml/people}person'
XML attributes are available as a simple dictionary-like mapping:
In [6]: person_element.attrib
Out[6]: {'phone': '0699/34567890', 'name': 'Bärbel', 'size': '182'}
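A short sketch of working with these two properties: etree.QName splits a Clark-notation tag into namespace and local name, and attrib.get() avoids a KeyError for optional attributes such as phone. The single person element and the "(no phone)" placeholder are made up for illustration.

```python
from lxml import etree

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"

# A single person without a phone attribute, as in the example file.
person = etree.fromstring(
    f'<people:person xmlns:people="{PEOPLE_NS}" name="Günther" size="172" />')

# Split '{namespace}tag' into its two parts.
qualified_name = etree.QName(person.tag)
print(qualified_name.namespace)
print(qualified_name.localname)

# Attribute values are always strings; convert them where needed.
size = int(person.attrib["size"])

# Use get() with a default for attributes that may be missing.
phone = person.attrib.get("phone", "(no phone)")
print(person.attrib["name"], size, phone)
```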
3.5 Text nodes
Print notes about persons without a phone:
In [7]: note_elements_for_persons_without_phone = people_root.xpath(
            '/people:list/people:person[not(@phone)]/people:note',
            namespaces=NAMESPACES)
        for note_element in note_elements_for_persons_without_phone:
            person_element = note_element.getparent()
            person_name = person_element.attrib['name']
            note_text = note_element.text
            print(person_name + ': ' + note_text)
Günther: Ask for phone number.
Use getparent() to access the enclosing XML element (as seen above).
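The same report can be sketched the other way round: select the persons first and read the text of their note child with findtext(), which also covers persons that have no note at all. The person "Dora" is a hypothetical addition to show the default value; she is not part of the example file.

```python
from lxml import etree

# Namespace URI as written in the example document of this talk.
PEOPLE_NS = "https://www.example.org/xml/people"
NAMESPACES = {"people": PEOPLE_NS}

# Inline excerpt of the example file plus one hypothetical person ("Dora")
# who has neither a phone attribute nor a note.
PEOPLE_XML = f"""<people:list xmlns:people="{PEOPLE_NS}">
  <people:person name="Alice" phone="0650/12345678" size="172" />
  <people:person name="Günther" size="172">
    <people:note>Ask for phone number.</people:note>
  </people:person>
  <people:person name="Dora" size="165" />
</people:list>"""

root = etree.fromstring(PEOPLE_XML)

lines = []
for person in root.xpath(
        "/people:list/people:person[not(@phone)]", namespaces=NAMESPACES):
    # findtext() returns the text of the first matching child,
    # or the default if there is no such child.
    note_text = person.findtext(
        "people:note", default="(no note)", namespaces=NAMESPACES)
    lines.append(person.attrib["name"] + ": " + note_text)

for line in lines:
    print(line)
```

Starting from the person avoids the getparent() call, at the cost of an extra lookup per person.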
4 Summary
• XML will be around for the foreseeable future so learn to deal with it.
• Use lxml for any serious XML work in Python.
• Namespaces and XPath can be tamed.