Extensibility

This page describes how to extend the capability of SAXON XSLT stylesheets.

Contents

Writing extension functions
Writing extension elements
Writing Java node handlers
Writing input filters
Writing output filters
Implementing a collating sequence
Implementing a numbering sequence

Writing extension functions

An extension function is invoked using a name such as prefix:localname().

The prefix must be the prefix associated with a namespace declaration that is in scope. The corresponding URI for the namespace identifies the class where the external function will be found. The namespace URI must either be the fully-qualified class name (for example xmlns:date="java.util.Date"), or a string containing a "/", in which the fully-qualified class name appears after the final "/" (for example xmlns:date="http://www.jclark.com/xt/java/java.util.Date"). The part of the URI before the final "/" is immaterial. The class must be on the classpath.

The SAXON namespace URI "http://icl.com/saxon" is recognised as a special case, and causes the function to be loaded from the class com.icl.saxon.functions.Extensions. The class name can be specified explicitly if you prefer.
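The mapping from namespace URI to class name described above can be sketched as a small helper. This is a minimal illustration of the rule, not SAXON's own code; the class name ExtensionClassName and the method fromNamespaceUri are hypothetical.

```java
/** Sketch: derive the extension class name from a namespace URI (hypothetical helper). */
final class ExtensionClassName {
    static final String SAXON_NAMESPACE = "http://icl.com/saxon";
    static final String SAXON_EXTENSIONS_CLASS = "com.icl.saxon.functions.Extensions";

    private ExtensionClassName() {}

    static String fromNamespaceUri(String uri) {
        // The SAXON namespace is a special case that maps to a fixed class
        if (SAXON_NAMESPACE.equals(uri)) {
            return SAXON_EXTENSIONS_CLASS;
        }
        // Otherwise the class name is whatever follows the final "/",
        // or the whole URI when it contains no "/" at all
        int lastSlash = uri.lastIndexOf('/');
        return lastSlash < 0 ? uri : uri.substring(lastSlash + 1);
    }
}
```

With this helper, both "java.util.Date" and "http://www.jclark.com/xt/java/java.util.Date" resolve to java.util.Date.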

There are three cases to consider: static methods, constructors, and instance-level methods.

Static methods can be called directly. The localname of the function must match the name of a public static method in this class. The names match if they contain the same characters, ignoring case and hyphens. For example "to-string" matches "toString". If there are several methods in the class that match the localname, the first one with the correct number of arguments is used. There is no attempt to match the type of the arguments.

For example:


<xsl:value-of select="math:sqrt($arg)"
   xmlns:math="java.lang.Math"/>

This will invoke the static method java.lang.Math.sqrt(), applying it to the value of the variable $arg, and copying the value of the square root of $arg to the result tree.
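The name-matching rule for static methods can be pictured with Java reflection. The sketch below illustrates the rule as described and is not SAXON's implementation; the names StaticMethodLookup, normalise, namesMatch, and find are hypothetical. Reflection returns methods in no guaranteed order, so "first" here means the first match that getMethods() happens to yield.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Locale;

/** Sketch: find a public static method by local name and argument count. */
final class StaticMethodLookup {
    private StaticMethodLookup() {}

    /** Drops hyphens and folds case, so "to-string" and "toString" compare equal. */
    static String normalise(String name) {
        return name.replace("-", "").toLowerCase(Locale.ROOT);
    }

    static boolean namesMatch(String localName, String methodName) {
        return normalise(localName).equals(normalise(methodName));
    }

    /** Returns the first matching public static method, or null if there is none. */
    static Method find(Class<?> cls, String localName, int argCount) {
        for (Method m : cls.getMethods()) {           // public methods only
            if (Modifier.isStatic(m.getModifiers())
                    && namesMatch(localName, m.getName())
                    && m.getParameterCount() == argCount) {
                return m;                             // argument types are not examined
            }
        }
        return null;
    }
}
```

For the example above, find(Math.class, "sqrt", 1) locates java.lang.Math.sqrt(double).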

Constructors are called by using the function named new(). If there are several constructors, the first one with the correct number of arguments is used. There is no attempt to match the type of the arguments. The result of calling new() is an XPath value of type Java Object; the only things that can be done with a Java Object are to assign it to a variable, to pass it to an extension function, and to convert it to a string, number, or boolean. Conversion to a string is done by calling the Java object's toString() method; conversion to a number or boolean is done by converting to a string and then converting that.
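The conversions of a Java Object can be sketched as follows. The sketch assumes that the string-to-number and string-to-boolean steps follow the XPath 1.0 rules (an unparsable string gives NaN; a non-empty string is true); that is an assumption about SAXON's behaviour, and the class JavaObjectConversion and its methods are hypothetical.

```java
/** Sketch: convert a wrapped Java object to the XPath string, number, and boolean types. */
final class JavaObjectConversion {
    private JavaObjectConversion() {}

    /** String conversion simply calls the object's toString() method. */
    static String toXPathString(Object o) {
        return o.toString();
    }

    /** Number conversion goes via the string form (assumed XPath 1.0 number syntax). */
    static double toXPathNumber(Object o) {
        // trim() removes slightly more than XML whitespace; close enough for a sketch
        String s = toXPathString(o).trim();
        // Optional minus sign, then digits with an optional fraction, or a bare fraction
        if (!s.matches("-?(\\d+(\\.\\d*)?|\\.\\d+)")) {
            return Double.NaN;
        }
        return Double.parseDouble(s);
    }

    /** Boolean conversion goes via the string form: true if the string is non-empty. */
    static boolean toXPathBoolean(Object o) {
        return !toXPathString(o).isEmpty();
    }
}
```

Under these assumed rules an object whose toString() returns "false" converts to the boolean true, because the string is not empty.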

Instance-level methods are called by supplying an extra first argument of type Java Object, which is the object on which the method is to be invoked. A Java Object is usually created by calling an extension function (e.g. a constructor) that returns an object; it may also be passed to the stylesheet as the value of a global parameter. Matching of method names is done as for static methods. If there are several methods in the class that match the localname, the first one with the correct number of arguments is used. There is no attempt to match the type of the arguments.

For example, the following stylesheet prints the date and time. This example is copied from the documentation of the xt product, and it works unchanged with SAXON, because SAXON does not care what the namespace URI for extension functions is, so long as it ends with the class name. (Extension functions are likely to be compatible between SAXON and xt provided they only use the data types string, number, and boolean).


<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:date="http://www.jclark.com/xt/java/java.util.Date">

<xsl:template match="/">
  <html>
    <xsl:if test="function-available('date:to-string') and function-available('date:new')">
      <p><xsl:value-of select="date:to-string(date:new())"/></p>
    </xsl:if>
  </html>
</xsl:template>

</xsl:stylesheet>
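A call such as date:to-string(date:new()) can be pictured as the reflection steps below: the first argument is the target object and any remaining arguments are passed to the method. This is a sketch of the calling convention only, not SAXON's code; InstanceCallSketch and its methods are hypothetical names, and the sketch assumes the target's class is public so that reflection may invoke it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Locale;

/** Sketch: invoke an instance method, treating the first argument as the target. */
final class InstanceCallSketch {
    private InstanceCallSketch() {}

    static String normalise(String name) {
        return name.replace("-", "").toLowerCase(Locale.ROOT);
    }

    static Object call(String localName, Object target, Object... rest) {
        for (Method m : target.getClass().getMethods()) {
            // Match on name (ignoring case and hyphens) and argument count only
            if (!Modifier.isStatic(m.getModifiers())
                    && normalise(m.getName()).equals(normalise(localName))
                    && m.getParameterCount() == rest.length) {
                try {
                    return m.invoke(target, rest);
                } catch (ReflectiveOperationException e) {
                    // A failure here corresponds to abandoning the transformation
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No matching method: " + localName);
    }
}
```

For instance, call("to-string", new java.util.Date()) returns the same string as calling toString() on the Date directly.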

The parameters of the method called must each be one of the following Java data types:

boolean (or Boolean)
double, float, short, int, or long (or their object wrapper equivalents)
String
com.icl.saxon.expr.NodeSetValue (node-set)
com.icl.saxon.expr.FragmentValue (result tree fragment)
com.icl.saxon.expr.ObjectValue (Java object)

These correspond to the five XSLT data types Boolean, Number, String, Node-set, and Result Tree Fragment, and the additional SAXON-defined data type Java Object. If necessary, and if the target data type is boolean, double, float, short, int, long, or String, the supplied value will be converted to the target data type. If there are several methods in the class whose arguments are of different data types, there is no attempt to match the right one.

In addition, there may be an extra first argument of class com.icl.saxon.Context. This argument is not supplied by the calling XSL code, but by SAXON itself. The Context object provides methods to access many internal SAXON resources, the most useful being getCurrent() which returns the current node in the source document. The Context object is not available with constructors.

If any exceptions are thrown by the method, or if a matching method cannot be found, processing of the stylesheet will be abandoned.

The system function function-available(String name) returns true if there appears to be a method available with the right name. It does not test whether this method has the appropriate number of arguments or whether the arguments are of appropriate types. If the function name is "new" it returns true so long as the class is not an abstract class or interface, and so long as it has at least one constructor.
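The test applied when the function name is "new" can be sketched with reflection. The sketch reads "at least one constructor" as "at least one public constructor", which is an assumption; ConstructorAvailability and newAvailable are hypothetical names rather than part of SAXON.

```java
import java.lang.reflect.Modifier;

/** Sketch: would function-available() report true for the constructor function new()? */
final class ConstructorAvailability {
    private ConstructorAvailability() {}

    static boolean newAvailable(Class<?> cls) {
        return !cls.isInterface()                           // interfaces cannot be instantiated
                && !Modifier.isAbstract(cls.getModifiers()) // nor can abstract classes
                && cls.getConstructors().length > 0;        // assumed: public constructors only
    }
}
```

Under that reading, java.util.Date qualifies, while java.lang.Runnable (an interface), java.lang.Number (abstract), and java.lang.Math (no public constructor) do not.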

There are a number of extension functions supplied with the SAXON product: for details, see extensions.html. The source code of these methods, which in most cases is extremely simple, can be used as an example for writing other user extension functions. It is found in class com.icl.saxon.functions.Extensions.

Writing extension elements

SAXON implements the element extensibility feature defined in section 14.1 of the standard. This feature allows you to define your own element types for use in the stylesheet.

If a namespace prefix is to be used to denote extension elements, it must be declared in the extension-element-prefixes attribute on the xsl:stylesheet element, or the xsl:extension-element-prefixes attribute on any enclosing literal result element or extension element.

Note that SAXON itself provides a number of stylesheet elements beyond those defined in the XSLT specification, including saxon:output, saxon:assign, saxon:entity-ref, saxon:while, saxon:group, saxon:item. To enable these, use the standard XSL extension mechanism: define extension-element-prefixes="saxon" on the xsl:stylesheet element, or xsl:extension-element-prefixes="saxon" on any enclosing literal result element.

To invoke a user-defined set of extension elements, include the prefix in this attribute as described, and associate it with a namespace URI that ends in "/" followed by the fully qualified class name of a Java class that implements the com.icl.saxon.style.ExtensionElementFactory interface. This interface defines a single method, getExtensionClass(), which takes the local name of the element (i.e., the name without its namespace prefix) as a parameter, and returns the Java class used to implement this extension element (for example, "return SQLConnect.class"). The class returned must be a subclass of com.icl.saxon.style.StyleElement.

The best way to see how to implement an extension element is by looking at the example, for SQL extension elements, provided in package com.icl.saxon.sql, and at the sample stylesheet books.sqlxsl which uses these extension elements. There are three main methods a StyleElement class must provide:

prepareAttributes()	This is called while the stylesheet tree is still being built, so it should not attempt to navigate the tree. Its task is to validate the attributes of the stylesheet element and perform any preprocessing necessary. For example, if the attribute is an attribute value template, this includes creating an Expression that can subsequently be evaluated to get the AVT's value.
validate()	This is called once the tree has been built, and its task is to check that the stylesheet element appears in the right context within the tree, e.g. that it is within a template.
process()	This is called to process a particular node in the source document, which can be accessed by reference to the Context supplied as a parameter.

The StyleElement class has access to many services supplied either via its superclasses or via the Context object. For details, see the API documentation of the individual classes.

Any element whose prefix matches a namespace listed in the extension-element-prefixes attribute of an enclosing element is treated as an extension element. If no class can be instantiated for the element (for example, because no ExtensionElementFactory can be loaded, or because the ExtensionElementFactory doesn't recognise the local name), then fallback action is taken as follows. If the element has one or more xsl:fallback children, they are processed. Otherwise, an error is reported. When xsl:fallback is used in any other context, it and its children are ignored.

It is also possible to test whether an extension element is implemented by using the system function element-available(). This returns true if the namespace of the element identifies it as an extension element (or indeed as a standard XSL element) and if a class can be instantiated to represent it. If the namespace is not that of an extension element, or if no class can be instantiated, it returns false.

Writing Java node handlers

A Java node handler can be used to process any node, in place of an XSL template. The handler is nominated by using a saxon:handler element with a handler attribute that names the node handler class. The handler itself is an implementation of com.icl.saxon.NodeHandler or one of its subclasses (the most usual being com.icl.saxon.ElementHandler). The saxon:handler element must be a top-level element, and must be empty. It takes the same attributes as xsl:template (match, mode, name, and priority) and is considered along with xsl:template elements to decide which template to execute when xsl:call-template or xsl:apply-templates is used.

Java node handlers have full access to the source document and the current processing context (for example, the values of parameters). They may also trigger processing of other nodes in the document by calling applyTemplates(): this works just like xsl:apply-templates, and the selected nodes may be processed either by XSL templates or by further Java node handlers.

A Java node handler may also be registered with a name, and may thus be invoked using xsl:call-template. There is no direct mechanism for a Java node handler to call a named XSLT template, but the effect can be achieved by using a mode that identifies the called template uniquely.

Writing input filters

SAXON takes its input from a SAX Parser reading from an InputSource. A very useful technique is to interpose a filter between the parser and SAXON. The filter will typically be an instance of John Cowan's ParserFilter class (see http://www.ccil.org/~cowan/XML/).

To use SAXON this way, you will need a main program that wraps the com.icl.saxon.StyleSheet class. For example, you could write:



    public static void main(String[] args) throws Exception {
        StyleSheet style = new StyleSheet();
        Parser p = new com.jclark.xml.sax.Driver();
        // The stylesheet is read directly by the SAX parser
        style.setStyleParser(p);
        // The source document passes through MyFilter before reaching SAXON
        style.setSourceParser(new MyFilter(p));
        style.prepareStyleSheet(new ExtendedInputSource(new File("c:/style/style1.xsl")));
        style.renderSource(new InputSource(args[0]));
    }

This runs the fixed stylesheet style1.xsl against the source XML document supplied as the first argument, using MyFilter as an input filter.

It is also possible to achieve the same effect without needing to write your application as an implementation of the SAX Parser interface. Instead of calling setSourceParser() to register your application as the supplier of data, you can call getSourceDocumentHandler() to obtain a SAX DocumentHandler, to which you can then feed the source document as a sequence of SAX events such as startElement() and endElement(). When the whole document has been supplied, the application should call renderSuppliedDocument() to apply the stylesheet to the document that has been built up.

For example, you could write:



    public static void main(String[] args) throws Exception {
        StyleSheet style = new StyleSheet();
        Parser p = new com.jclark.xml.sax.Driver();
        style.setStyleParser(p);
        style.prepareStyleSheet(new ExtendedInputSource(new File("c:/style/style1.xsl")));
        // Feed the source document to SAXON as SAX events instead of parsing a file
        DocumentHandler dh = style.getSourceDocumentHandler();
        dh.startDocument();
        dh.startElement("top", new AttributeListImpl());
        dh.endElement("top");
        dh.endDocument();
        // Apply the stylesheet to the document built up from those events
        style.renderSuppliedDocument();
    }

Note that SAXON relies on the application to supply a well-formed sequence of SAX events; if it doesn't, the consequences are unpredictable.

Writing output filters

The output of a SAXON stylesheet can be directed to a user-defined output filter. This filter can be defined either as a standard SAX DocumentHandler, or as an implementation of the SAXON class com.icl.saxon.output.Emitter, which is a subclass of DocumentHandler. The advantage of using an Emitter is that more information is available from the stylesheet, for example the attributes of the xsl:output element.

A DocumentHandler should only be used when the result tree is a well-formed document. (The XSLT specification also allows the output to be an external general parsed entity.) If the tree is not well-formed, it will only be notified of a subset of the tree that is well-formed. In particular, it will not be informed of any top-level text nodes before or after the first element node, or of any top-level element nodes after the first. If an Emitter is used, however, it will be informed of all events.

The Emitter or DocumentHandler to be used is specified in the method attribute of the xsl:output or saxon:output element, as a fully-qualified class name; for example method="com.acme.xml.SaxonOutputFilter"
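As an illustration of the DocumentHandler route, the sketch below is a filter that counts elements and collects the text of the result document. It uses only the SAX 1 classes (org.xml.sax.HandlerBase implements DocumentHandler); the class name TextCollectingFilter and its accessor methods are hypothetical. A class named in the method attribute would presumably need to be declared public, in its own source file, with a no-argument constructor so that it can be loaded by name; it is left package-private here only to keep the sketch compact.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

/** Sketch of an output filter written as a SAX 1 DocumentHandler. */
@SuppressWarnings("deprecation") // the SAX 1 classes are deprecated in current JDKs
class TextCollectingFilter extends HandlerBase {
    private final StringBuilder text = new StringBuilder();
    private int elementCount = 0;

    /** Called once for every element start tag in the result tree. */
    @Override
    public void startElement(String name, AttributeList attributes) {
        elementCount++;
    }

    /** Called for each run of character data in the result tree. */
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    int elementCount() {
        return elementCount;
    }

    String text() {
        return text.toString();
    }
}
```

Events that the filter does not care about, such as endElement(), fall through to the empty implementations inherited from HandlerBase.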

See the documentation of class com.icl.saxon.output.Emitter for details of the methods available, or implementations such as HTMLEmitter, XMLEmitter, and TEXTEmitter for the standard output formats supported by SAXON.

It can sometimes be useful to set up a chain of emitters working as a pipeline. To write a filter that participates in such a pipeline, the class ProxyEmitter is supplied. Use the class Indenter, which handles XML and HTML indentation, as an example of how to write a ProxyEmitter.

Rather than writing an output filter in Java, SAXON also allows you to process the output through another XSL stylesheet. To do this, simply name the next stylesheet in the next-in-chain attribute of saxon:output. This facility should only be used if the output of the first stylesheet is a well-formed XML document.

Implementing a collating sequence

It is possible to define a collating sequence for use by xsl:sort. This is controlled through the lang attribute of the xsl:sort element. The feature is primarily intended to provide language-dependent collating, but in fact it can be used to provide arbitrary collating sequences: for example, if you want to collate the names of the months (January, February, March, and so on) in the conventional sequence, you could do this by writing and providing a collating sequence for language "month".

To implement a collating sequence for language X, you need to define a class com.icl.saxon.sort.Compare_X, for example com.icl.saxon.sort.Compare_month. This must implement the interface TextComparer. In fact, this particular collating sequence is supplied as a specimen and you can use this as a prototype to write your own. A collating sequence is also supplied for lang="en".
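The comparison logic of a "month" collating sequence might look like the sketch below. The methods of the TextComparer interface are not described on this page, so the sketch uses java.util.Comparator as a stand-in; MonthOrder is a hypothetical name and this is not the specimen class shipped with SAXON.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: order month names by calendar position rather than alphabetically. */
final class MonthOrder implements Comparator<String> {
    private static final List<String> MONTHS = Arrays.asList(
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December");

    /** Position in the calendar; unrecognised names rank after December. */
    private static int rank(String name) {
        int index = MONTHS.indexOf(name);
        return index < 0 ? MONTHS.size() : index;
    }

    @Override
    public int compare(String a, String b) {
        return Integer.compare(rank(a), rank(b));
    }
}
```

Sorting the list March, January, February with this comparator yields January, February, March.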

Note that any hyphens in the language name are ignored in forming the class name, but case is significant. For example if you specify lang="en-GB", the TextComparer must be named "com.icl.saxon.sort.Compare_enGB".
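The naming rule can be expressed as a one-line transformation. The helper below is a hypothetical illustration of the rule (LanguageClassName and forLanguage are not SAXON names).

```java
/** Sketch: build the class name that is looked up for a given lang value. */
final class LanguageClassName {
    private LanguageClassName() {}

    static String forLanguage(String prefix, String lang) {
        // Hyphens are dropped; case is preserved because it is significant
        return prefix + lang.replace("-", "");
    }
}
```

For example, forLanguage("com.icl.saxon.sort.Compare_", "en-GB") gives "com.icl.saxon.sort.Compare_enGB", whereas lang="en-gb" would give the different name "com.icl.saxon.sort.Compare_engb".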

Implementing a numbering sequence

It is possible to define a numbering sequence for use by xsl:number. This is controlled through the lang attribute of the xsl:number element. The feature is primarily intended to provide language-dependent numbering, but in fact it can be used to provide arbitrary numbering sequences: for example, if you want to number items as "one", "two", "three", and so on, you could implement a numbering class to do this and invoke it with, say, lang="alpha".

To implement a numbering sequence for language X, you need to define a class com.icl.saxon.number.Numberer_X, for example com.icl.saxon.number.Numberer_alpha. This must implement the interface Numberer. A (not very useful) Numberer is supplied for lang="de" as a specimen, and you can use this as a prototype to write your own. A numbering sequence is also supplied for lang="en", and this is used by default if no other can be loaded.
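The formatting logic that a spelled-out numbering class would need can be sketched as below. The methods of the Numberer interface are not described on this page, so this is a standalone helper rather than an implementation of that interface; SpelledNumbers and format are hypothetical names, and the sketch only spells out 0 to 99.

```java
/** Sketch: spell out small numbers as English words ("one", "two", "three", ...). */
final class SpelledNumbers {
    private static final String[] UNITS = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    private SpelledNumbers() {}

    static String format(int n) {
        if (n < 0 || n > 99) {
            return Integer.toString(n);   // outside the sketch's range: fall back to digits
        }
        if (n < 20) {
            return UNITS[n];
        }
        String tens = TENS[n / 10];
        return n % 10 == 0 ? tens : tens + "-" + UNITS[n % 10];
    }
}
```

A real numbering class would wrap logic of this kind in whatever methods the Numberer interface requires.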

Note that any hyphens in the language name are ignored in forming the class name, but case is significant. For example if you specify lang="en-GB", the Numberer must be named "com.icl.saxon.number.Numberer_enGB".

Michael H. Kay
27 April 2000