| In-browser data aggregation |
| Sunday, 07 August 2011 07:01 |
|
I often find myself compiling data from web pages into a form more suitable for further processing, or simply into a more accessible layout that includes data from sub-pages (e.g. for printing). Such tasks are a natural fit for an XSLT processor. However, some of them require only small changes to the document's DOM and are therefore easily accomplished with a line of JavaScript in the browser. Here I want to present a small tool that can do all this by bringing the power of XSLT into the browser, within reach of a single mouse click. This way we can also take advantage of the browser's forgiving HTML parser, which copes even with documents that are not valid XML.
The solution consists of a set of XSLT files, a generic script which uses them to iteratively transform a set of pages, and a loader page which may sit on your local computer. The loader page glues all the other components into a single javascript: link, which can then be copied to the browser's toolbar.
A tiny example is to get an overview of all articles in this section of my homepage. To save bandwidth the link is not generated automatically, but please click here to create it on-the-fly from the XSLT files. The link can be used right here, or copied to the browser's toolbar and applied to other pages as well.
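To make the idea of a self-contained javascript: link more concrete, here is a minimal sketch of how script code and stylesheet texts might be packed into one URL. It is not the code used on this page: the helper name buildJavascriptLink, its argument layout, and the choice of encodeURIComponent for escaping are all assumptions made for illustration.

```javascript
// Hypothetical helper (illustrative only): packs a script and a list of
// stylesheet texts into a single javascript: URL.
//   scriptSource - source code that defines the entry-point function
//   stylesheets  - array of XSLT documents as plain strings
//   entryPoint   - name of the function to call with the stylesheets
function buildJavascriptLink(scriptSource, stylesheets, entryPoint) {
  // JSON.stringify turns each stylesheet into a safely quoted string literal.
  const args = stylesheets.map((xsl) => JSON.stringify(xsl)).join(", ");
  // The newline before ";" keeps a trailing line comment in scriptSource
  // from swallowing the call that follows it.
  const body = `(function(){${scriptSource}\n;${entryPoint}(${args});})()`;
  // Percent-encode so quotes, spaces and angle brackets survive in an href.
  return "javascript:" + encodeURIComponent(body);
}

// Example with a stand-in script and a stand-in stylesheet.
const link = buildJavascriptLink(
  "function run(xsl) { console.log(xsl.length); }",
  ['<xsl:stylesheet version="1.0"/>'],
  "run"
);
```

The resulting string can be assigned to a link's href attribute. Everything the link needs travels inside the URL itself, which is what allows it to be dragged to the toolbar and used on other pages.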
It all starts with a link of the form
<a class="replacer" href="/code/xslt/index.xsl,/code/xslt/list.xsl" target="text/html"> Generate ToC </a>
For all links with the class name replacer, calling generateReplacerLinks() from generate.js downloads the required XSLT files (given as a comma-separated list in the href attribute) and the JavaScript code of the parser. Together this data is then written into the link's href attribute. Upon execution (by clicking on the final link), the function runParser from transform.js is called with four parameters:
The first XSLT file is then applied to the current document in the current browser window and generates a new HTML document. Within these newly generated documents, any link of the form
<a href="next.html" data-xsl="/code/xslt/list.xsl" data-request-type="xhr">Some link</a>
will be replaced by the result of applying the XSLT file specified in the data-xsl attribute to the document given by the href attribute. If data-request-type is set to xhr, the document is obtained and parsed using an XMLHttpRequest instead of a new browser window. This has the advantage that additional parameters (such as the HTTP header field X-Requested-With) can be specified and no new window is required. On the downside, the document has to be well-formed XML: in contrast to the browser's internal HTML parser, the XML parser will fail on, for example, unclosed tags.
Further parameters can be passed to the XSLT through other data-someparam attributes. They can be accessed as $someparam in the XSL transform after declaring them with <xsl:param name="someparam"/>. One parameter that is always available is url, containing the URL of the current source document. The files used in this example are:
I usually use them on a local HTML page: I call generateReplacerLinks() in the onload handler and then copy the links into my toolbar for online use. |
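The parameter passing described earlier (data-someparam attributes that show up as $someparam, plus the always-present url) could be sketched roughly as follows. This is an illustration, not the code from transform.js: the name collectXslParams is hypothetical, data-depth is an invented example attribute, and it is an assumption that data-xsl and data-request-type are treated as control attributes rather than forwarded to the stylesheet.

```javascript
// Hypothetical helper, not taken from transform.js.
//   attrs     - plain object mapping attribute names to values, e.g. built
//               from element.attributes in the browser
//   sourceUrl - URL of the document the stylesheet is applied to
// Assumption: these two attributes steer the script and are not forwarded.
const CONTROL_ATTRS = new Set(["data-xsl", "data-request-type"]);

function collectXslParams(attrs, sourceUrl) {
  const params = {};
  for (const [name, value] of Object.entries(attrs)) {
    // Only data-* attributes become parameters; "data-foo" maps to $foo.
    if (name.startsWith("data-") && !CONTROL_ATTRS.has(name)) {
      params[name.slice("data-".length)] = value;
    }
  }
  // The url parameter is always set to the source document's URL.
  params.url = sourceUrl;
  return params;
}

// Example: the link from above with one extra, made-up parameter.
const params = collectXslParams(
  {
    href: "next.html",
    "data-xsl": "/code/xslt/list.xsl",
    "data-request-type": "xhr",
    "data-depth": "2",
  },
  "http://example.org/next.html"
);
```

In the browser, each entry of such an object could then be handed to XSLTProcessor.setParameter(null, name, value) before the transformation is run, so that a stylesheet declaring <xsl:param name="depth"/> sees the value as $depth.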







