In-browser data aggregation
Sunday, 07 August 2011 07:01

Often I find myself compiling data from web pages into a form more suitable for further processing, or simply into a more accessible layout that includes data from sub-pages (e.g. for printing). Such tasks are usually best done with an XSLT processor. Some of them, however, require only small changes to the document's DOM and are therefore easily accomplished with a line of JavaScript in the browser.

Here I want to present a small tool that covers both cases by putting the power of XSLT in the browser within reach of a single mouse click. This way we can take advantage of the browser's forgiving HTML parser, which even handles documents that are not valid XML. The solution consists of a set of XSLT files, a generic script which uses them to iteratively transform a set of pages, and a loader page which may sit on your local computer. The loader page glues all the other components into a single javascript: link, which can then be copied to the browser's toolbar.

A tiny example could be to get an overview of all articles in this section of my homepage. In order to save bandwidth the link is not generated automatically, but please click here to create it on-the-fly from the XSLT files. This link can be used right here, or copied to the browser's toolbar to be applied on other pages as well.

It all starts with a link of the form

  <a class="replacer" href="/code/xslt/index.xsl,/code/xslt/list.xsl" target="text/html">
    Generate ToC
  </a>

For all links with the class name replacer, the required XSLT files (given as a comma-separated list in the href attribute) and the JavaScript code of the parser are downloaded by calling generateReplacerLinks( ) from generate.js. Together, this data is then written into the link's href attribute.
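The gluing step can be sketched roughly as follows. This is a minimal sketch and not the actual code of generate.js: the helper name buildReplacerHref and the exact way the pieces are concatenated are assumptions made for illustration. It takes the already downloaded parser source and XSLT strings and packs them into one javascript: URL that ends with a call to the parser's entry point, runParser.

```javascript
// Hypothetical helper (not part of generate.js): packs everything into one URL.
//   parserSource - text of the transformation script, assumed to define runParser
//   sheets       - object mapping XSLT file names to their source text
//   firstSheet   - name of the XSLT file to apply first
//   mimeTypes    - array of MIME types for the result windows
function buildReplacerHref(parserSource, sheets, firstSheet, mimeTypes) {
  // JSON.stringify produces literals that current JavaScript engines accept as code.
  const call = 'runParser(' +
    JSON.stringify(sheets) + ',' +
    JSON.stringify(firstSheet) + ',' +
    JSON.stringify(mimeTypes) + ');';
  // The function wrapper keeps names out of the host page; void prevents the
  // browser from navigating to the value of the expression. The newline guards
  // against a trailing line comment in the parser source.
  const body = 'void (function(){' + parserSource + '\n;' + call + '})()';
  // javascript: URLs are percent-decoded before they are executed.
  return 'javascript:' + encodeURIComponent(body);
}
```

The returned string would then be assigned to the link's href property, replacing the comma-separated file list.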

Upon execution (by clicking on the final link), the function runParser from transform.js is called with four parameters:

  1. An object of the form
        {
          '/code/xslt/index.xsl': string1,
          '/code/xslt/list.xsl': string2
        }
    containing the XSLT documents as strings. They were filled in by generateReplacerLinks( ) on the basis of the file list in the href attribute.
  2. The name of the first XSLT-file. It is the one to be applied to the first document.
  3. (optional) An array of MIME-types. Each one will be used to open a new browser-window to display the resulting document. If none is given, text/html is used by default. You can use this to also display the source code by specifying text/plain.
  4. (optional) The root node at which to start the transformation. It defaults to the current window's document object.
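To make the defaults of the four parameters concrete, here is a rough sketch. It is not taken from transform.js: resolveParserArgs and applySheet are hypothetical names, and the real script may handle these cases differently. DOMParser and XSLTProcessor are the standard browser APIs, so applySheet only works inside a browser.

```javascript
// Hypothetical helper: applies the documented defaults for parameters 3 and 4.
function resolveParserArgs(sheets, firstSheet, mimeTypes, root) {
  if (!Object.prototype.hasOwnProperty.call(sheets, firstSheet)) {
    throw new Error('No XSLT source for ' + firstSheet);
  }
  return {
    sheets: sheets,
    firstSheet: firstSheet,
    // No MIME types given: show the result once, as text/html.
    mimeTypes: (mimeTypes && mimeTypes.length) ? mimeTypes : ['text/html'],
    // `document` only exists in a browser; elsewhere there is no default root.
    root: root || (typeof document !== 'undefined' ? document : null)
  };
}

// Browser-only: a single transformation step with the XSLTProcessor API.
function applySheet(xsltText, sourceNode) {
  const sheetDoc = new DOMParser().parseFromString(xsltText, 'application/xml');
  const processor = new XSLTProcessor();
  processor.importStylesheet(sheetDoc);
  return processor.transformToDocument(sourceNode);
}
```

A call such as resolveParserArgs(sheets, '/code/xslt/index.xsl') would thus yield a single text/html result for the current document, while passing ['text/html', 'text/plain'] as the third argument would additionally request the source view.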

The first XSLT-file is then applied to the current document in the current browser window and generates a new HTML document. Within those newly generated documents, any links of the form

  <a href="next.html" data-xsl="/code/xslt/list.xsl" data-request-type="xhr">Some link</a>

will be replaced by the result of applying the XSLT file specified in the data-xsl attribute to the document given by the href attribute.

If the data-request-type is set to xhr, the document is obtained and parsed using an XMLHttpRequest instead of a new browser window. This has the advantage that additional parameters (such as the HTTP header field X-Requested-With) can be specified and no new window is required. On the downside, the document has to be strictly well-formed XML: in contrast to the browser's internal HTML parser, the XML parser fails on unclosed tags, for example.
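The xhr variant might look roughly like the sketch below. It is an illustration under assumptions, not the code in transform.js: the function name loadXml, the callback shape and the optional third argument (which only exists so that the function can be tried outside a browser) are made up, and forcing the MIME type is my guess at how an HTML page would be pushed through the XML parser.

```javascript
// Hypothetical helper: fetch a document and hand it over as parsed XML.
// onDone(error, xmlDocument) is called exactly once.
function loadXml(url, onDone, XhrCtor) {
  const xhr = new (XhrCtor || XMLHttpRequest)();
  xhr.open('GET', url, true);
  // Extra header fields like this one are not possible with a plain window.
  xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
  // Assumption: treat the response as XML regardless of its Content-Type.
  xhr.overrideMimeType('application/xml');
  xhr.onload = function () {
    // responseXML is null when the body is not well-formed XML.
    if (xhr.responseXML) {
      onDone(null, xhr.responseXML);
    } else {
      onDone(new Error('Not well-formed XML: ' + url), null);
    }
  };
  xhr.onerror = function () {
    onDone(new Error('Request failed: ' + url), null);
  };
  xhr.send();
}
```

A page with an unclosed tag ends up in the error branch here, which is exactly the case where falling back to a new browser window and its HTML parser helps.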

Further, parameters can be passed to the XSLT by using other data-someparam-attributes. They can be accessed by $someparam in the XSL transform after declaring them with <xsl:param name="someparam"/>. One parameter that will always be available is url, containing the URL of the current source document.
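How the data attributes could end up as XSLT parameters is sketched below. Again this is only an illustration, not the code in transform.js: collectXsltParams and setParams are hypothetical names, and I assume here that url is set last so that it is always present. In the browser, element.dataset exposes data-xsl as xsl, data-request-type as requestType and data-someparam as someparam; XSLTProcessor.setParameter is the standard API for passing values to a stylesheet.

```javascript
// Hypothetical helper: turns a link's data attributes into XSLT parameters.
//   dataset   - the link's dataset (or any plain object with the same keys)
//   sourceUrl - URL of the document the stylesheet is applied to
function collectXsltParams(dataset, sourceUrl) {
  // These two attributes steer the script itself and are not passed on.
  const reserved = ['xsl', 'requestType'];
  const params = {};
  for (const key of Object.keys(dataset)) {
    if (!reserved.includes(key)) {
      params[key] = dataset[key];
    }
  }
  // Assumption: url is written last, so it is always available.
  params.url = sourceUrl;
  return params;
}

// Browser-only: hand the collected values to an XSLTProcessor instance.
function setParams(processor, params) {
  for (const [name, value] of Object.entries(params)) {
    processor.setParameter(null, name, value);
  }
}
```

Note that dataset converts hyphenated names to camel case (data-some-param becomes someParam), so single-word parameter names are the least surprising choice.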

The files used in this example are:

  • The link generator generate.js which is initiated by a call to generateReplacerLinks( ).
  • The source of the transformation script used behind the scenes, transform.js.
  • The XSLT files index.xsl and list.xsl.

I usually keep them on a local HTML page that calls generateReplacerLinks( ) in its onload handler, and copy the generated links into my toolbar for online use.