XML processing with Perl
For the 2nd YPPUG session, by Joe Jiang

XML is a data format, not a language
We use it in finance and search.
DMP can also support it, but not as well as text/HTML.
Many people use it for configuration files.
I have used it for a Perl book translation. For example:

    ...
    $book/> count $book//sect1
    117
    $book/> count $book//sect2
    149
    $book/> count $book//para
    4691
    # Wah, it's a big book :)

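The same kind of counting can be sketched directly in Perl with XML::LibXML, one of the modules listed at the end of this talk. This is only a minimal sketch: the tiny document below is a hypothetical stand-in for the translated book, so the numbers are not the ones shown on the slide.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature DocBook-like document (not the real book).
my $xml = <<'XML';
<book>
  <chapter>
    <title>Introduction</title>
    <sect1><para>First.</para><sect2><para>Nested.</para></sect2></sect1>
    <sect1><para>Second.</para></sect1>
  </chapter>
</book>
XML

my $book = XML::LibXML->load_xml(string => $xml);

# findvalue() evaluates an XPath expression and returns its string value,
# so count(...) comes back as a number-like string.
my %count;
for my $name (qw(sect1 sect2 para)) {
    $count{$name} = $book->findvalue("count(//$name)");
}

printf "%-6s %d\n", $_, $count{$_} for qw(sect1 sect2 para);
```

Pointing `load_xml` at a real file with `location => 'english-tidyup.xml'` instead of `string => ...` should give the counts for the whole book.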
The tool to work with XML
It's named XML::XSH2, by Petr Pajas.
It comes with a useful utility named xsh,
which is based on XML::LibXSLT and XML::SAX::Writer, and ...
which are based on XML::LibXML and a lot of ...
So you should not expect a simple installation :)
But it can still be built with the cpanm utility,
so I suggest installing cpanm first:

    $ curl -kL http://cpanmin.us | perl - --sudo App::cpanminus
    $ cpanm -S XML::XSH2
    # already installed on dev, so you can just run: xsh
    # ! Finding XML::XSH2 on cpanmetadb failed.
    # This kind of message is common

How is it used? XPath plus verbs

    $scratch/> $book := open english-tidyup.xml
    parsing english-tidyup.xml done.
    $book/> cd //book/chapter[1]
    $book/book/chapter[1]> ls title
    <title>Introduction</title>
    $book/book/chapter[1]> cd /
    $book/> ls //chapter/title
    <title>Introduction</title>
    <title>Filesystems</title>
    <title>User Accounts</title>
    ...

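For comparison, here is a rough Perl equivalent of the session above using XML::LibXML. It is a sketch under assumptions: the three-chapter document is a hypothetical cut-down stand-in for english-tidyup.xml, and the mapping of "cd" to a context node and "ls" to `findnodes` is an analogy, not how xsh is implemented.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for english-tidyup.xml, holding only chapter titles.
my $book = XML::LibXML->load_xml(string => <<'XML');
<book>
  <chapter><title>Introduction</title></chapter>
  <chapter><title>Filesystems</title></chapter>
  <chapter><title>User Accounts</title></chapter>
</book>
XML

# Roughly "cd //book/chapter[1]": pick a context node.
my ($chapter) = $book->findnodes('//book/chapter[1]');

# Roughly "ls title": a relative XPath evaluated from that node.
my @first = map { $_->toString } $chapter->findnodes('title');

# Roughly "cd /" then "ls //chapter/title": an XPath over the whole document.
my @all = map { $_->toString } $book->findnodes('//chapter/title');

print "$_\n" for @first, '--', @all;
```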
Good at pipeline processing

    $book/> ls //sect1//para/text() | wc -w
    Found 12398 node(s).
    150879

Use "wc -m" to count Chinese characters.
Or have fun with frequency statistics, such as the 100 most-used words:

    $book/> ls //sect1//para/text() | perl -MList::MoreUtils=natatime -lane 'END{ $it = natatime 100, sort {$cnt{$b} <=> $cnt{$a}} keys %cnt; print for map {join qq(\t), $_, $cnt{$_}} $it->() } $cnt{$_}++ for @F'
    ...
    data    483
    ...
    Perl    437
    ...
    file    426
    ...

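The one-liner above is dense, so here is the same idea unrolled into a small script that uses only core Perl. It is a sketch, not the author's code: the helper name `top_words`, the sample sentences, and the alphabetical tie-break are all assumptions added for illustration.

```perl
use strict;
use warnings;

# Return the $n most frequent whitespace-separated words as [word, count] pairs.
sub top_words {
    my ($n, @lines) = @_;
    my %cnt;
    for my $line (@lines) {
        $cnt{$_}++ for split ' ', $line;
    }
    # Most frequent first; ties are broken alphabetically (an added assumption).
    my @sorted = sort { $cnt{$b} <=> $cnt{$a} or $a cmp $b } keys %cnt;
    splice @sorted, $n if @sorted > $n;    # keep only the first $n words
    return map { [ $_, $cnt{$_} ] } @sorted;
}

# Hypothetical sample text standing in for the book's paragraphs.
my @top = top_words(2, "Perl data file", "data Perl data");
print join("\t", @$_), "\n" for @top;
```

To use it in a pipeline like the one on the slide, the sample strings could be replaced by `<STDIN>` and the limit by 100.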
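It can be used for conversion #1

    $scratch/> $x:=open ArticleInfo_9.xml;
    parsing ArticleInfo_9.xml done.
    $x/> ls $x
    <?xml version="1.0" encoding="utf-16"?>
    <小样>
      <标题><![CDATA[第一推荐]]></标题>
      <作者><![CDATA[]]></作者>
      <内容><![CDATA[
    华为美国拓展求解
    华为对美国市场的执着显示出中国企业走出去的急切需要,但这样高调注定要经受更多挫折。
    ]]></内容>
      <附图>
        <简图>
          <文件名>../cnmlfiles/A01/A01Ab25C005_b.jpg</文件名>
          <高>260</高>
          <宽>245</宽>
        </简图>
      </附图>
    </小样>

(Element names: 小样 = proof, 标题 = title, 作者 = author, 内容 = content, 附图 = attached image, 简图 = thumbnail, 文件名 = file name, 高 = height, 宽 = width. The title reads "Top recommendation"; the content reads "Huawei seeks answers for its US expansion. Huawei's persistence in the US market shows how urgently Chinese companies need to go global, but such a high profile is bound to meet more setbacks.")
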
Now building an empty xHTML #2

    $x/> $y:=new html;
    $y/> ls $y
    <?xml version="1.0" encoding="utf-8"?>
    <html/>
    $y/> xadd element "<head/>" into $y/html;  # xadd is just an alias of insert
    $y/> ls $y
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <head/>
    </html>
    $y/> xadd element "<title/>" into $y/html/head;
    $y/> xadd element "<body/>" into $y/html;
    $y/> ls $y
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <head>
        <title/>
      </head>
      <body/>
    </html>

Copy contents into xHTML #3

    $y/> xadd text $x//小样/标题/text() into $y/html/head/title;
    $y/> ls $y
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <head>
        <title>第一推荐</title>
      </head>
      <body/>
    </html>
    $y/> xadd text $x//小样/内容/text() into $y/html/body;
    $y/> save --file x.html $y;
    Document saved into file 'x.html'.
    $y/>Good bye!
    $ cat x.html
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <head>
        <title>第一推荐</title>
      </head>
      <body>
    华为美国拓展求解
    华为对美国市场的执着显示出中国企业走出去的急切需要,但这样高调注定要经受更多挫折。
    </body>
    </html>

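The three conversion steps (open the source, build an empty document, copy the text across) can also be sketched in plain Perl with XML::LibXML. This is an illustration only: the source document below uses hypothetical ASCII element names (`article`, `title`, `content`) and made-up text in place of ArticleInfo_9.xml, to keep the example independent of encoding settings.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical ASCII stand-in for the article file used in the talk.
my $src = XML::LibXML->load_xml(string => <<'XML');
<article>
  <title><![CDATA[Top pick]]></title>
  <content><![CDATA[Body text of the article.]]></content>
</article>
XML

# Build <html><head><title/></head><body/></html> from scratch.
my $doc  = XML::LibXML::Document->new('1.0', 'utf-8');
my $html = $doc->createElement('html');
$doc->setDocumentElement($html);
my $head  = $html->appendChild($doc->createElement('head'));
my $title = $head->appendChild($doc->createElement('title'));
my $body  = $html->appendChild($doc->createElement('body'));

# Copy the text over; findvalue() returns the string value of the matched node.
$title->appendText($src->findvalue('/article/title'));
$body->appendText($src->findvalue('/article/content'));

# Serialize with indentation, similar to what "ls $y" shows in xsh.
print $doc->toString(1);
```

Calling `$doc->toFile('x.html', 1)` instead of printing would write the result to disk, much like the `save --file` step.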
XSLT is a focused XML conversion language, based on XPath

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/perldata/hashref">
      <table border="1">
        <tr>
          <th>Key</th>
          <th>Value</th>
        </tr>
        <xsl:for-each select="item">
        <tr>
          <td><xsl:value-of select="@key"/></td>
          <td><xsl:value-of select="."/></td>
        </tr>
        </xsl:for-each>
      </table>
    </xsl:template>
    </xsl:stylesheet>

This works well with XML::Dumper

    $ perl -MXML::Dumper -e 'print pl2xml(\%INC)' | xsltproc hashref.xsl - | w3m -T text/html

We can use xsltproc to convert the DocBook book to HTML,
and to PDF with another utility named fop,
or generate an MS Word doc file from OpenOffice
with the help of the OpenOffice DocBook XSLT filter.

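The xsltproc pipeline can also be run from inside Perl with XML::LibXSLT, which the talk mentions as a dependency of xsh. The sketch below reuses the hashref stylesheet from the previous slide. The input document is hand-written to match the stylesheet's patterns (`/perldata/hashref` with `item` elements carrying a `key` attribute); it is an assumption, not captured pl2xml output, and the file paths in it are made up.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hand-written input shaped to match the stylesheet (hypothetical paths).
my $input = XML::LibXML->load_xml(string => <<'XML');
<perldata>
  <hashref>
    <item key="strict.pm">/usr/lib/perl5/strict.pm</item>
    <item key="warnings.pm">/usr/lib/perl5/warnings.pm</item>
  </hashref>
</perldata>
XML

# The stylesheet from the slide, embedded as a string.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/perldata/hashref">
  <table border="1">
    <tr><th>Key</th><th>Value</th></tr>
    <xsl:for-each select="item">
    <tr>
      <td><xsl:value-of select="@key"/></td>
      <td><xsl:value-of select="."/></td>
    </tr>
    </xsl:for-each>
  </table>
</xsl:template>
</xsl:stylesheet>
XSL

# Compile the stylesheet, apply it, and serialize the result.
my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($input);
my $html       = $stylesheet->output_string($result);

print $html;
```

Keeping the stylesheet in a separate hashref.xsl file and loading it with `location => 'hashref.xsl'` would mirror the command-line version more closely.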
Now you have been equipped with another tool named XML
Thanks all for the magic!

    Module Name   Author      Version
    XML::Dumper   MIKEWONG    0.81
    XML::Simple   GRANTM      2.18
    XML::LibXML   PAJAS       1.87
    XML::XPath    MSERGEANT   1.13
    XML::XSH2     PAJAS       2.1.3
    XML::Twig     MIROD       3.38
