XML namespaces and XPath with Python

XML namespaces and XPath with Python
March 2, 2016
1 XML namespaces and XPath with Python
Thomas Aglassinger http://www.roskakori.at
2 XML
• eXtensible Markup Language
• a blueprint for other file formats
• can represent sequences and hierarchies
• text based (binary somewhat possible using e.g. UUEncode)
• human readable
• somewhat verbose
• supports a Document Object Model (DOM)
2.1 XML with Python
• xml: part if the standard library
• xml.etree.ElementTree - XML as pythonic Trees
• xml.dom.mindom - DOM, warts and all
• xml.sax - sequential parsing of large documents
• works, but has limited support for namespaces, XPath etc.
• lxml: available from http://lxml.de/
• Python wrapper to C based XML libraries
• full support for namespaces, XPath, schemas etc
• universally used for “serious” XML processing
2.2 Example XML file
<?xml version="1.0" encoding="utf-8"?>
<people:list xmlns:people="https://www.example.org/xml/people">
<people:updated date="2016-02-16" />
<people:person name="Alice" phone="0650/12345678" size="172" />
<people:person name="Bob" phone="0654/23456789" size="167" />
<people:person name="Bärbel" phone="0699/34567890" size="182" />
<people:person name="Günther" size="172">
<people:note>Ask for phone number.</people:note>
</people:person>
</people:list>
1

2.3 XML namespaces
In our example
xmlns:people="https://www.example.org/xml/people"
assigns the shortcut people to the namespace identified by https://www.example.org/xml/people.
2.4 XPath
XPath is a query language to find nodes in XML documents. Examples:
• /people:list/people:person - all person elements in the document
• /people:list/people:person[@phone] - all person elements in the document with a phone attribute
Tutorial: http://www.w3schools.com/xsl/xpath intro.asp
3 Extract information from XML
3.1 Read the document root
Compute the path to our example XML file:
In [1]: import os.path
people_xml_path = os.path.join(’examples’, ’people.xml’)
Build the document root from the file:
In [2]: from lxml import etree
people_root = etree.parse(people_xml_path)
3.2 Setup the namespace
In [3]: NAMESPACES = {
’people’: ’https://www.example.org/xml/people’,
}
3.3 Find persons and print details
In [4]: # Find persons matching XPath.
person_elements = people_root.xpath(
’/people:list/people:person[@phone]’,
namespaces=NAMESPACES)
# Print name and phone of persons found.
for person_element in person_elements:
print(
person_element.attrib[’name’] + ’: ’ +
person_element.attrib[’phone’])
Alice: 0650/12345678
Bob: 0654/23456789
Bärbel: 0699/34567890
2

3.4 Examining XML elements
Elements have a tag, where namespaces are represente using the Clark notation {namespace}tag:
In [5]: person_element.tag
Out[5]: ’{https://www.example.org/xml/people}person’
XML attributes are a simlpe dictionary:
In [6]: person_element.attrib
Out[6]: {’phone’: ’0699/34567890’, ’name’: ’B¨arbel’, ’size’: ’182’}
3.5 Text nodes
Print notes about persons without a phone:
In [7]: note_elements_for_persons_without_phone =
people_root.xpath(
’/people:list/people:person[not(@phone)]/people:note’,
namespaces=NAMESPACES)
for note_element in note_elements_for_persons_without_phone:
person_element = note_element.getparent()
person_name = person_element.attrib[’name’]
note_text = note_element.text
print(person_name + ’: ’ + note_text)
G¨unther: Ask for phone number.
Use getparent() to access the enclosing XML element (as seen above).
4 Summary
• XML will be around for the foreseeable future so learn to deal with it.
• Use lxmlfor any serious XML work in Python.
• Namespaces and XPath can be taimed.
3

XML namespaces and XPath with Python

More Related Content

What's hot (20)

Similar to XML namespaces and XPath with Python (20)

More from roskakori (18)

Recently uploaded (20)

XML namespaces and XPath with Python