Showing posts with label standards. Show all posts

Monday, May 26, 2008

Converting XML to JSON

Why would I want to convert XML to JSON? Mainly because JSON is a subset of JavaScript (the name stands for JavaScript Object Notation) and XML isn't. It is much easier to manipulate JavaScript Objects than it is to manipulate XML. This is because Objects are native to JavaScript, whereas XML requires an API, the DOM, which is harder to use. DOM implementations in browsers are not consistent, while you will find Objects and their methods more or less the same across browsers.

Since most of the content/data available on the web is in XML format and not JSON, converting XML to JSON is necessary.

The main problem is that there is no standard way of converting XML to JSON. So when converting, we have to develop our own rules, or base them on the most widely used conversion rules. Let's see how the big boys do it.

Rules Google GData Uses to convert XML to JSON

A GData service creates a JSON-format feed by converting the XML feed, using the following rules:

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to $t properties.

Namespace

  • If an element has a namespace alias, the alias and element are concatenated using "$". For example, ns:element becomes ns$element.

XML

  • XML version and encoding attributes are converted to attribute version and encoding of the root element, respectively.

Google GData XML to JSON example

This is a hypothetical example, since Google GData only deals with RSS and ATOM feeds.

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "version": "1.0",
 "encoding": "UTF-8",
 "example$user" : {
  "domain" : "example.com",
   "name" : { "$t" : "Joe" },
   "status" : {
    "online" : "true",
    "$t" : "Away"
   },
   "idle" : null
  }
}

How Google converts XML to JSON is well documented. The main points are that XML node attributes become string properties, the node data or text becomes $t properties, and namespaces are concatenated with $.
http://code.google.com/apis/gdata/json.html#Background
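
To make the rules concrete, here is a small sketch of how a script might read values out of an object converted this way. The data is the hypothetical example from this post, not a real GData response, and the variable names are only for illustration.

```javascript
// Hypothetical result of the GData-style conversion shown above.
var feed = {
  "version": "1.0",
  "encoding": "UTF-8",
  "example$user": {
    "domain": "example.com",
    "name": { "$t": "Joe" },
    "status": { "online": "true", "$t": "Away" },
    "idle": null
  }
};

// "$" is legal in JavaScript identifiers, so feed.example$user also works.
var user = feed["example$user"];
var name = user.name.$t;            // text content lives under $t
var online = user.status.online;    // attributes are plain string properties
var isOnline = online === "true";   // attribute values are always strings

console.log(name, isOnline);
```

Note that every attribute value arrives as a string, so flags such as online have to be compared against "true" rather than used as booleans.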

Rules Yahoo Uses to convert XML to JSON

I could not find any documentation on the rules Yahoo uses to convert its XML to JSON in Yahoo Pipes. However, by looking at the output of a pipe in RSS format and the corresponding JSON format, you can get an idea of the rules used.

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to string properties of the parent node, if the node has no attributes.
  • Text values of tags are converted to content properties, if the node has attributes.

Namespace

  • Unknown.

XML

  • XML version and encoding attributes are removed/ignored - at least in the RSS sample I looked at.

The only problem I see with the rules Yahoo Pipes uses is that if an XML node has an attribute named "content", it will conflict with the text value of the node/element, giving the programmer an unexpected result.
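
The collision can be shown with a small sketch of the rule as inferred above. This is not Yahoo's code. The function name and input shape are made up for illustration, and which value actually wins in Yahoo Pipes is an assumption (here the text is written last).

```javascript
// Sketch of the inferred Yahoo Pipes-style rule for a single element.
// attrs: plain object mapping attribute names to values; text: the element's text.
function pipesStyleElement(attrs, text) {
  var names = Object.keys(attrs);
  if (names.length === 0) {
    return text;                  // no attributes: the element collapses to a string
  }
  var out = {};
  for (var i = 0; i < names.length; i++) {
    out[names[i]] = attrs[names[i]];
  }
  out.content = text;             // assumption: text is written last and overwrites
  return out;
}

// <status online="true">Away</status>
var ok = pipesStyleElement({ online: "true" }, "Away");

// <meta content="text/html">some text</meta>
var clash = pipesStyleElement({ content: "text/html" }, "some text");

console.log(ok, clash);
```

In the second call only one "content" property survives, so either the attribute or the text is lost no matter which one the converter chooses to keep.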

Yahoo Pipes XML to JSON example

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "example??user" : {
  "domain" : "example.com",
   "name" : "Joe",
   "status" : {
    "online" : "true",
    "content" : "Away"
   },
   "idle" : ??
  }
}

XML.com on rules to convert XML to JSON

The article on XML.com by Stefan Goessner gives a list of possible XML element structures and the corresponding JSON Objects.
http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

Pattern | XML                              | JSON                                        | Access
1       | <e/>                             | "e": null                                   | o.e
2       | <e>text</e>                      | "e": "text"                                 | o.e
3       | <e name="value" />               | "e": {"@name": "value"}                     | o.e["@name"]
4       | <e name="value">text</e>         | "e": { "@name": "value", "#text": "text" }  | o.e["@name"] o.e["#text"]
5       | <e> <a>text</a> <b>text</b> </e> | "e": { "a": "text", "b": "text" }           | o.e.a o.e.b
6       | <e> <a>text</a> <a>text</a> </e> | "e": { "a": ["text", "text"] }              | o.e.a[0] o.e.a[1]
7       | <e> text <a>text</a> </e>        | "e": { "#text": "text", "a": "text" }       | o.e["#text"] o.e.a

If we translate this to the rules format given by Google it would look something like:

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to @attribute properties. (attribute name preceded by @)
  • Child elements are converted to Object properties, if the node has attributes or child nodes.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to string properties of the parent node, if the node has no attributes or child nodes.
  • Text values of tags are converted to #text properties, if the node has attributes or child nodes.

Namespace

  • If an element has a namespace alias, the alias and element are concatenated using ":". For example, ns:element becomes ns:element. (ie: namespaced elements are treated as any other element)

XML

  • XML version and encoding attributes are not converted.
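
The rules above can be sketched as a small recursive function. To keep it independent of any browser DOM, it works on a made-up plain-object description of an element (name, attrs, text, children). That input shape and the function name are assumptions for illustration, not part of the XML.com article.

```javascript
// Sketch of the XML.com-style rules over a hypothetical node shape:
// { name: String, attrs: Object, text: String, children: Array }
function convertNode(node) {
  var attrs = node.attrs || {};
  var attrNames = Object.keys(attrs);
  var children = node.children || [];
  var text = node.text || "";

  // Patterns 1 and 2: no attributes and no child elements
  if (attrNames.length === 0 && children.length === 0) {
    return text === "" ? null : text;
  }

  var out = {};
  attrNames.forEach(function (n) {
    out["@" + n] = attrs[n];                 // patterns 3 and 4
  });
  if (text !== "") {
    out["#text"] = text;                     // patterns 4 and 7
  }
  children.forEach(function (child) {
    var value = convertNode(child);
    if (!Object.prototype.hasOwnProperty.call(out, child.name)) {
      out[child.name] = value;               // pattern 5
    } else if (Array.isArray(out[child.name])) {
      out[child.name].push(value);           // third and later repeats
    } else {
      out[child.name] = [out[child.name], value];  // pattern 6
    }
  });
  return out;
}

var user = {
  name: "example:user",
  attrs: { domain: "example.com" },
  children: [
    { name: "name", text: "Joe" },
    { name: "status", attrs: { online: "true" }, text: "Away" },
    { name: "idle" }
  ]
};

var result = {};
result[user.name] = convertNode(user);
console.log(JSON.stringify(result, null, 1));
```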

XML.com XML to JSON example

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "example:user" : {
  "@domain" : "example.com",
  "name" : "Joe",
  "status" : {
   "@online" : "true",
   "#text" : "Away"
  },
  "idle" : null
 }
}

Other rules being used to convert XML to JSON

Here is a blog on the topic of an XML to JSON standard. http://khanderaotech.blogspot.com/2007/03/mapping-between-xml-json-need-standard.html.
A good discussion on the differences between XML and JSON. http://blog.jclark.com/2007/04/xml-and-json.html

We need a standard way of converting XML to JSON

I'm tired of hearing the "XML vs JSON" debate. Why not just make them compatible? Now that we see just how many different rules are being used, we can definitely see another reason why a standard would come in handy. But till then, I think I'll add to the confusion and come up with my own ruleset.

My rules of converting XML to JSON

My rules are simple and are based on the XML DOM. The DOM represents XML as DOM Objects and Methods. We will use the DOM objects only, since JSON does not use methods. So each Element becomes an Object, each text node becomes a #text property, and the attributes become an @attributes Object with a string property for each attribute name. The only difference from the DOM Objects representation in JavaScript is the @ sign in front of the attributes Object name, which is there to avoid conflicts with elements named "attributes". The DOM gets around this by having public methods to select child nodes, and not public properties (the actual properties are private, and thus not available in an object notation).

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties of the @attributes property.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to #text properties.

Namespace

  • Treat as any other element.

XML

  • XML version and encoding attributes are not converted.
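
As a rough illustration of these rules, here is what the sample document used in this post might look like, with a second status element added to show the Array rule. The two helper functions are hypothetical conveniences for reading the structure, not part of any library.

```javascript
// Hypothetical output under the rules above (a second <status> was added
// to the sample so that the Array rule is visible).
var doc = {
  "example:user": {
    "@attributes": { "domain": "example.com" },
    "name": { "#text": "Joe" },
    "status": [
      { "@attributes": { "online": "true" }, "#text": "Away" },
      { "@attributes": { "online": "false" }, "#text": "Offline" }
    ]
  }
};

// Read an attribute, or undefined when the element has none by that name.
function attr(element, name) {
  var attrs = element["@attributes"];
  return attrs && attrs.hasOwnProperty(name) ? attrs[name] : undefined;
}

// Read the text content of an element.
function text(element) {
  return element["#text"];
}

var user = doc["example:user"];
console.log(attr(user, "domain"), text(user.name), text(user.status[1]));
```

Because attributes always sit under @attributes and text always sits under #text, the access path does not change when an element later gains an attribute.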

In order to convert XML to JSON with JavaScript, you first have to convert the XML to a DOM Document (to make things simpler). Any major browser will do this automatically, either for the XML/XHTML Document you are viewing or for an XML document retrieved via XMLHttpRequest. But if all you have is an XML string, something like this will do:

function TextToXML(strXML) {
 var xmlDoc;
 // standards-based browsers: feature-detect DOMParser instead of sniffing document.all
 if (typeof DOMParser != "undefined") {
  try {
   xmlDoc = new DOMParser().parseFromString(strXML, "text/xml");
  } catch(e) { throw new Error("Error parsing XML string"); }
  return xmlDoc;
 }
 // older versions of Internet Explorer
 try {
  xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.async = false;
 } catch(e) { throw new Error("XML Parser could not be instantiated"); }
 // loadXML() returns false when the string could not be parsed
 return xmlDoc.loadXML(strXML) ? xmlDoc : false;
}

This will give you the XML represented as a DOM Document, which you can traverse using the DOM methods.

Now all you have to do to convert the DOM Document to JSON is traverse it: for every Element create an Object, for its attributes create an @attributes Object, for text nodes create a #text property, and repeat the process for any child elements.

/**
 * Convert XML to JSON Object
 * @param {Object} XML DOM Document
 */
xml2Json = function(xml) {
 var obj = {};
 
 if (xml.nodeType == 1) { // element
  // do attributes
  if (xml.attributes.length > 0) {
   obj['@attributes'] = {};
   for (var j = 0; j < xml.attributes.length; j++) {
    obj['@attributes'][xml.attributes[j].nodeName] = xml.attributes[j].nodeValue;
   }
  }
  
 } else if (xml.nodeType == 3) { // text
  obj = xml.nodeValue;
 }
 
 // do children
 if (xml.hasChildNodes()) {
  for (var i = 0; i < xml.childNodes.length; i++) {
   var child = xml.childNodes[i];
   var name = child.nodeName;
   if (typeof(obj[name]) == 'undefined') {
    // first node with this name: store it directly
    obj[name] = xml2Json(child);
   } else {
    // repeated name: promote the existing value to an Array, then append
    // (instanceof is used because strings from text nodes also have a length)
    if (!(obj[name] instanceof Array)) {
     obj[name] = [obj[name]];
    }
    obj[name].push(xml2Json(child));
   }
  }
 }

 return obj;
};

Converting XML to Lean JSON?

We could make the JSON encoding of the XML lean by using just "@" for attributes and "#" for text in place of "@attributes" and "#text":

{
 "example:user" : {
  "@" : { "domain" : "example.com" },
   "name" : { "#" : "Joe" },
   "status" : {
    "@" : {"online" : "true"},
    "#" : "Away"
   },
   "idle" : null
 }
}

You may notice that "@" and "#" are valid as JavaScript property names, but not as XML element or attribute names. This allows us to encompass the DOM representation in object notation, since we are swapping DOM functions for Object properties that are not allowed as XML names and thus will not get any collisions. We could go further and use "!" for comments, for example, and "%" for CDATA. I'm leaving these two out for simplicity.
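
Switching an existing structure from the verbose keys to the lean ones is a simple recursive rename. The following is a minimal sketch. The function name toLean is made up for this example.

```javascript
// Recursively rename "@attributes" to "@" and "#text" to "#".
function toLean(value) {
  if (Array.isArray(value)) {
    return value.map(function (item) { return toLean(item); });
  }
  if (value === null || typeof value !== "object") {
    return value;                      // strings, null, etc. pass through
  }
  var out = {};
  Object.keys(value).forEach(function (key) {
    if (key === "@attributes") {
      out["@"] = value[key];           // attribute map is reused as-is (not copied)
    } else if (key === "#text") {
      out["#"] = value[key];
    } else {
      out[key] = toLean(value[key]);   // child element or array of elements
    }
  });
  return out;
}

var verbose = {
  "example:user": {
    "@attributes": { "domain": "example.com" },
    "name": { "#text": "Joe" },
    "status": { "@attributes": { "online": "true" }, "#text": "Away" },
    "idle": null
  }
};

var lean = toLean(verbose);
console.log(JSON.stringify(lean));
```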

What about converting JSON to XML?

If we follow the rules used to convert XML to JSON, it should be easy to convert JSON back to XML. We'd just need to recurse through our JSON Object, and create the necessary XML objects using the DOM methods.

/**
 * JSON to XML
 * @param {Object} JSON
 */
json2Xml = function(json, node) {
 
 var root = false;
 if (!node) {
  node = document.createElement('root');
  root = true;
 }
 
 for (var x in json) {
  // ignore inherited properties
  if (json.hasOwnProperty(x)) {
  
   if (x == '#text') { // text
    node.appendChild(document.createTextNode(json[x]));
   } else  if (x == '@attributes') { // attributes
    for (var y in json[x]) {
     if (json[x].hasOwnProperty(y)) {
      node.setAttribute(y, json[x][y]);
     }
    }
   } else if (x == '#comment') { // comment
   // ignore
   
   } else { // elements
    if (json[x] instanceof Array) { // handle arrays
     for (var i = 0; i < json[x].length; i++) {
      node.appendChild(json2Xml(json[x][i], document.createElement(x)));
     }
    } else {
     node.appendChild(json2Xml(json[x], document.createElement(x)));
    }
   }
  }
 }
 
 if (root == true) {
  return TextToXML(node.innerHTML);
 } else {
  return node;
 }
 
};

This really isn't a good example, as I couldn't find out how to create Elements using the XML DOM with browser JavaScript. Instead I had to create Elements using document.createElement() and text nodes with document.createTextNode(), and use the non-standard innerHTML property in the end. The main point demonstrated is how straightforward the conversion is.
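
One way to sidestep the DOM entirely is to build the XML as a string. This is a different technique from the function above, shown only as a sketch. The names escapeXml and toXmlString are made up here. It follows the @attributes/#text rules from this post, skips #comment, and does not declare namespaces.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize one element (or an array of same-named elements) to an XML string.
function toXmlString(name, value) {
  if (Array.isArray(value)) {
    return value.map(function (item) { return toXmlString(name, item); }).join("");
  }
  var attrs = "";
  var body = "";
  if (value !== null && typeof value === "object") {
    Object.keys(value).forEach(function (key) {
      if (key === "@attributes") {
        Object.keys(value[key]).forEach(function (a) {
          attrs += " " + a + '="' + escapeXml(value[key][a]) + '"';
        });
      } else if (key === "#text") {
        body += escapeXml(value[key]);
      } else if (key !== "#comment") {
        body += toXmlString(key, value[key]);   // child element(s)
      }
    });
  } else if (value !== null) {
    body = escapeXml(value);                    // bare string value
  }
  return body === ""
    ? "<" + name + attrs + "/>"
    : "<" + name + attrs + ">" + body + "</" + name + ">";
}

var xml = toXmlString("example:user", {
  "@attributes": { "domain": "example.com" },
  "name": { "#text": "Joe" },
  "status": { "@attributes": { "online": "true" }, "#text": "Away" },
  "idle": null
});
console.log(xml);
```

The resulting string can then be handed to an XML parser if a DOM Document is needed.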

What is the use of converting JSON to XML?

If you are familiar with creating xHTML via the DOM methods, you'll know how verbose it can be. By using a simple data structure to represent XML, we can remove the repetitive code needed to create the xHTML. Here is a function that creates HTML Elements out of a JSON Object.

/**
 * JSON to HTML Elements
 * @param {String} Root Element TagName
 * @param {Object} JSON
 */
json2HTML = function(tag, json, node) {
 
 if (!node) {
  node = document.createElement(tag);
 }
 
 for (var x in json) {
  // ignore inherited properties
  if (json.hasOwnProperty(x)) {
  
   if (x == '#text') { // text
    node.appendChild(document.createTextNode(json[x]));
   } else  if (x == '@attributes') { // attributes
    for (var y in json[x]) {
     if (json[x].hasOwnProperty(y)) {
      node.setAttribute(y, json[x][y]);
     }
    }
   } else if (x == '#comment') { // comment
   // ignore
   
   } else { // elements
    if (json[x] instanceof Array) { // handle arrays
     for (var i = 0; i < json[x].length; i++) {
      // pass the tag name first to match json2HTML(tag, json, node)
      node.appendChild(json2HTML(x, json[x][i]));
     }
    } else {
     node.appendChild(json2HTML(x, json[x]));
    }
   }
  }
 }
 
 return node;
 
};

Let's say you wanted a link <a title="Example" href="http://example.com/">example.com</a>. With the regular browser DOM methods you'd do:

var a = document.createElement('a');
a.setAttribute('href', 'http://example.com/');
a.setAttribute('title', 'Example');
a.appendChild(document.createTextNode('example.com'));

This is procedural and thus not very pleasing to the eye (unstructured) as well as verbose. With JSON to XHTML you would just be dealing with the data in native JavaScript Object notation.

var a = json2HTML('a', {
 '@attributes': { href: 'http://example.com/', title: 'Example' },
 '#text': 'example.com'
});

That does look a lot better. This is because JSON separates the data into a single Object, which can be manipulated as we see fit, in this case with json2HTML().

If you want nested elements:

var div = json2HTML('div', {
 a : {
  '@attributes': { href: 'http://example.com/', title: 'Example' },
  '#text': 'example.com'
 }
});

Which gives you

<div><a title="Example" href="http://example.com/">example.com</a></div>

The uses of converting JSON to XML are many. As another example, let's say you want to syndicate an RSS feed. Just create the JSON Object with the rules given for conversion between XML and JSON, run it through your json2Xml() function and you should have a quick and easy RSS feed. Normally you'd be using a server side language other than JavaScript to generate your RSS (however, Server Side JavaScript is a good choice also), but since the rules are language independent, it doesn't make a difference which language is used, as long as it can support the DOM and JSON.

Monday, May 5, 2008

IE8 and the Activities Feature for Developers

The IE8 features for developers are pretty impressive. Here is a bit on the "Activities" feature.

Activities

This should probably have a better name for developers, something like "open service". (ed. It's actually called OpenService and there is a proposed extension on the MicroFormats Wiki.) The IE8 feature allows a developer to embed a web service into the HTML page. If you're familiar with Open Search, this is a very similar protocol for embedding any service into an HTML page and follows the same technique.

Open Search is an XML format for mapping out query URLs for Search Engines. EG:

<?xml version="1.0" encoding="UTF-8"?>
 <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
   <ShortName>Web Search</ShortName>
   <Description>Use Example.com to search the Web.</Description>
   <Tags>example web</Tags>
   <Contact>[email protected]</Contact>
   <Url type="application/rss+xml" 
        template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;format=rss"/>
 </OpenSearchDescription>
This then allows an Open Search Client such as the browser to make a Search Request based on the XML provided for a Search Engine Service. This is what is used each time you type a search into the little search bar on the top right in Firefox or IE.
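
To show what such a client does with the template, here is a minimal sketch of the substitution step. The function name expandTemplate and the example.com URL are illustrative only. Parameters written with a trailing "?" are treated as optional and replaced with an empty string when no value is supplied.

```javascript
// Fill in an Open Search style URL template such as
// "http://example.com/?q={searchTerms}&pw={startPage?}&format=rss".
function expandTemplate(template, params) {
  return template.replace(/\{([^}?]+)(\?)?\}/g, function (match, name, optional) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      return encodeURIComponent(params[name]);
    }
    if (optional) {
      return "";                       // optional parameter left empty
    }
    throw new Error("Missing required parameter: " + name);
  });
}

var searchUrl = expandTemplate(
  "http://example.com/?q={searchTerms}&pw={startPage?}&format=rss",
  { searchTerms: "xml to json" }
);
console.log(searchUrl);
```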

IE8 feeds off a similar XML schema for its Activities, which maps out queries for web services, I'd guess RESTful web services. EG:

<?xml version="1.0" encoding="UTF-8"?>
<openServiceDescription xmlns="http://www.microsoft.com/schemas/openservicedescription/1.0">
  <homepageUrl>http://maps.live.com</homepageUrl>
  <display>
    <name>Map with Windows Live</name>
    <icon>http://www.live.com/favicon.ico</icon>
  </display>
  <activity category="map">
    <activityAction context="selection">
      <preview action="http://maps.live.com/geotager.aspx">
        <parameter name="b" value="{selection}"/>
        <parameter name="clean" value="true"/>
        <parameter name="w" value="320"/>
        <parameter name="h" value="240"/>
      </preview>
      <execute action="http://maps.live.com/default.aspx">
        <parameter name="where1" value="{selection}" type="text" />
      </execute>
    </activityAction>
  </activity>
</openServiceDescription>

Unfortunately following the namespace for the openServiceDescription XML document yields an equivalent of a 404 page. Wow Microsoft, nice documentation. Guess you'll have to Google it.

You may or may not be aware that this is a standardization of a number of existing data formats and widgets used to display exactly the same thing, links that enable you to add a piece of HTML directly to an external web service.

Generally, when a developer develops a web service, they have to syndicate that service somehow. They have choices such as JSON, RSS, ATOM, or AJAX widgets etc. The problem with these is that they do not allow an external service to make a dynamic request or query to their service in a standard way. Open Search was developed to standardize this for Search Engines. Now it looks like Microsoft has come up with a similar standard for general web services.

The key difference between a standardized format for querying a web service and the web service description provided by the web service itself is simplicity. You can query any web service if you have the technical expertise to read the documentation, and implement a web service request and consume its response. However, you cannot implement the description of one web service directly to another unless you have a standard description for querying both - Open Service Description.

In other words, this is how IE8/Microsoft aims to have web services come to them. Instead of Microsoft having to go out and implement every web service description out there, developers of web services will be sending their Open Service Descriptions to IE8.

As a developer, this is good news. Now you can hitch a ride with IE8, consume any web service descriptions designed for IE8 and implement them into your own mashups, website, webapp or web service.
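
As a rough sketch of what consuming such a description in your own code could look like, the function below builds a request URL from an action and its parameters. It assumes the description has already been parsed into plain objects, and that the parameters are sent as a GET query string. Both are assumptions of this example rather than something taken from Microsoft's documentation. The names buildActionUrl and executeAction are made up.

```javascript
// Build the request URL for one <execute> or <preview> action.
// action: the action URL; parameters: array of { name, value } objects;
// selection: the text the user selected on the page.
function buildActionUrl(action, parameters, selection) {
  var pairs = parameters.map(function (p) {
    // a function replacement avoids "$" in the selection being treated specially
    var value = p.value.replace(/\{selection\}/g, function () { return selection; });
    return encodeURIComponent(p.name) + "=" + encodeURIComponent(value);
  });
  var separator = action.indexOf("?") === -1 ? "?" : "&";
  return action + separator + pairs.join("&");
}

// Hypothetical parsed form of the <execute> element from the description above.
var executeAction = {
  action: "http://maps.live.com/default.aspx",
  parameters: [{ name: "where1", value: "{selection}" }]
};

var mapUrl = buildActionUrl(executeAction.action, executeAction.parameters, "Suva, Fiji");
console.log(mapUrl);
```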

Sunday, January 20, 2008

Cleaning xHTML markup with PHP Tidy

Everyone makes mistakes. Even the best xHTML coders will sometimes write invalid xHTML. Not to worry, PHP can automatically clean up xHTML before display using the PHP Tidy Extension.

PHP Tidy uses the Tidy Parser. Tidy is ported to many programming languages, and allows the language to clean up XML documents. It works well for xHTML.

In PHP5, the tidy extension is a default extension, however, in PHP4 you will need to download the Tidy PHP4 extension and compile the PHP executable with Tidy support.

How to use Tidy in PHP is documented here. Here are some examples of what Tidy can do.

Example use of Tidy in PHP

For code portability/distribution it's necessary to first check if the tidy extension is available on your PHP version. You can do this by querying the existence of the tidy functions or classes (among other methods). So first you check if Tidy support is available:

if (function_exists('tidy_parse_string')) {
// do your tidy stuff
}
Then comes the tidying. For simplicity, I'll use the single PHP Tidy function, 'tidy_repair_string'.

// Specify configuration
$config = array(
 'indent'         => true,
 'output-xhtml'   => true,
 'wrap'           => 200);
// Specify encoding
$encoding = 'utf8';
// repair HTML
$html = tidy_repair_string($html, $config, $encoding);

This works for both PHP4 and PHP5. PHP5 also supports an OO syntax.

Example Implementation: PHP Tidy Plugin for Joomla

Here is how I implemented the PHP Tidy Plugin into Joomla.

Joomla is a Content Management System, thus you cannot directly control the xHTML that will go into your articles. Some of your users may not be very xHTML savvy. The main reason I implemented Tidy is to clean content inserted automatically from feeds - which you have absolutely no control over.

A Joomla Plugin implements a basic Observer Pattern into Joomla. Functions are registered as observers, which are triggered during certain events. One such event is the preparation of content for display. The tidy plugin thus registers as a handler of content preparation. It then passes all content through the tidy parser, and returns the clean xHTML to Joomla.

The Joomla Tidy Plugin Code


/**
* @copyright Copyright (C) 2007 Fiji Web Design. All rights reserved.
* @license http://www.gnu.org/copyleft/gpl.html GNU/GPL
* @author [email protected]
*/

// no direct access
defined( '_VALID_MOS' ) or die( 'Restricted access' );

// register content event handlers
$_MAMBOTS->registerFunction( 'onPrepareContent', 'bot_tidy' );

/**
*  Tidy up the xHTML of your content
*/
function bot_tidy( $published, &$row, &$params, $page=0 ) {
 
 if ($published) {
  // get the plugin parameters
  //$botParams = bot_tidy_getParams('bot_tidy');

  if (isset($row->text) && $row->text) {
   $row->text = bot_tidy_parse($row->text);
  }

 }
 return true;
}

/**
* Parses a string with tidy taking into consideration the Joomla encoding
* @param String xHTML
*/
function bot_tidy_parse($html) {
 if (function_exists('tidy_parse_string')) {
  
  // Specify configuration
  $config = array(
       'indent'         => true,
       'output-xhtml'   => true,
       'wrap'           => 200);
  // get Joomla content encoding
  // explode() is used because split() is deprecated in newer PHP versions
  $iso = explode( '=', _ISO );
  $encoding = '';
  $jos_enc = str_replace('-', '', $iso[1]);
  if (in_array($jos_enc, array('ascii', 'latin0', 'latin1', 'raw', 'utf8', 'iso2022', 'mac', 'win1252', 'ibm858', 'utf16', 'utf16le', 'utf16be', 'big5', 'shiftjis'))) {
   $encoding = $jos_enc;
  }
  
  // Tidy
  $html = tidy_repair_string($html, $config, $encoding);
  
  return $html
  ."\r\n"
  ;
 } else {
  return $html
  ."\r\n"
  ;
 }
}

Here is the tidy plugin for Joomla.

Tidy is great for Content Management Systems where content is contributed by users with differing levels of xHTML knowledge. It is also necessary if you want content from RSS feeds to pass W3C validation (if they contain xHTML like the Google News Feeds). I've noticed however, that PHP Tidy does not always create valid xHTML content. It does however create valid XML every time. This is yet to be explored further as I have just released Joomla Tidy Plugin for Alpha testing.

Use invalid CSS in a W3C standards compliant XHTML Document

This is a technique I've decided to offer in the Joomla Extensions I've developed for the Joomla community, in an effort to allow users of the extension to use invalid CSS on their pages, yet still have the page validate with the W3C CSS Validator.

Why use invalid CSS markup?

Now why would you want to use invalid CSS in the first place? The reason is that not all browsers support the W3C recommended CSS2 and CSS3 properties. (Ahem.. Microsoft.. Ahem...). Now with IE7 our CSS is diverging even more. So the use of invalid CSS is quite inevitable.

An Example Maybe?

Let's say you want to make an image transparent. Something that should be quite simple. So you use the W3C recommended CSS property, which is opacity:

img {
   opacity:0.50;
}

Amazingly, this piece of CSS does not validate. Try copying and pasting it into the CSS validator.

Not only does it not validate, it doesn't work for a couple of browsers. So, in order to have image transparency across the largest number of browsers, you need:

img {  
   filter:alpha(opacity=50); 
   -moz-opacity:0.50;  
   opacity:0.50; 
   -khtml-opacity:0.50; 
}

Now these 4 property definitions will create exactly 4 errors in the CSS validator. Hahaha.

How do I use invalid CSS markup that validates?

Now the fun part. In our server logs, around 99% of actual users have JavaScript enabled on their browser. So why not use JavaScript to serve your CSS. If you're familiar with JavaScript Remoting, this is exactly the technique applied to a different problem. In JavaScript remoting, JavaScript files are loaded dynamically after the page has fully loaded. With this technique, CSS is loaded dynamically after the page has loaded.

window.onload = function() {
 if (document && document.getElementsByTagName) {
  var head = document.getElementsByTagName('head')[0];
  var link = document.createElement('link');
  link.type = 'text/css';
  link.rel = 'stylesheet';
  link.href = 'invalid_styles.css';
  head.appendChild(link);
 }
};

What we have is a piece of JavaScript that will load the CSS file, invalid_styles.css, after the HTML Document has loaded. So all you need to do is place your invalid CSS in invalid_styles.css. The W3C validator does not render JavaScript generated Elements on the page, thus the page will pass validation, while 99% of your users will enjoy your funky non-W3C CSS styles.

Fallback to valid CSS

What about the other 1%? They will only view your valid CSS since they do not have JavaScript enabled. So they have everything except your fancy styles. The actual layout of your page should be made with valid CSS.

This is not a new concept; it is used a lot for dynamic switching of CSS styles. However, the proposed application is for loading of non-W3C CSS content for extra formatting in non-W3C compliant CSS browsers, while keeping all CSS layout styles in valid CSS for users without JavaScript.