XSLT Mayhem
I’ve been working on writing XSLTs. The big one is a transformation from the XML output by the Drupal REST server into a tab-separated value file (don’t ask). Every time I start to work with XSLT again I have to REMEMBER how to do the simple stuff. The biggest challenge is that none of my staff are super comfortable with XSL, which means I don’t have a second pair of eyes to look at things and say “hey Karen, it’s this stupid simple little thing right here”. On the bright side, I’m looking at this as a “teachable moment”. Once I get these written I’m going to use them to teach my staff. The reason? They are a case study in XSLT writing. They’ve got all the basics, like the evils of XPath (one messed-up XPath statement at the top of the document broke the whole thing), as well as some bells and whistles. A few of my favorite bells and whistles are below.
Predicates – Can be used to filter down to a particular set of nodes. So you might see something like //staff[location='Houston'], which selects only the staff nodes whose location is exactly Houston.
If you want to filter for things that aren’t an exact match then you’ll have to use one of the following functions:
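contains() – Contains is a way to find a particular string in a node. So if you want to see if the department node has the word Web in it you’d perform the following: contains(//staff/department, 'Web'). Note that when it is handed several nodes, XPath 1.0 only checks the first one, so to pick out every matching staff member put the test in a predicate: //staff[contains(department, 'Web')]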
starts-with() – This allows you to check to see if a node starts with a particular string. It is really good for finding things that start with the same letter or letters and showing them. So if you wanted to produce a staff page and limit it to a particular letter of the alphabet, say ‘C’, then you’d perform the following: starts-with(//staff/lastName, 'C')
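As a rough sketch of how these filters look inside a stylesheet, the fragment below puts each test in a predicate so it applies to every staff node rather than just the first. It assumes a hypothetical input document containing staff elements with lastName, department, and location children; the section labels in the output are made up for illustration and are not part of my actual transform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Predicate: exact match on the location child -->
    <xsl:text>Houston staff:&#10;</xsl:text>
    <xsl:for-each select="//staff[location='Houston']">
      <xsl:value-of select="lastName"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- contains(): department has 'Web' anywhere in its text -->
    <xsl:text>Web staff:&#10;</xsl:text>
    <xsl:for-each select="//staff[contains(department, 'Web')]">
      <xsl:value-of select="lastName"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- starts-with(): last names beginning with 'C' -->
    <xsl:text>Staff under C:&#10;</xsl:text>
    <xsl:for-each select="//staff[starts-with(lastName, 'C')]">
      <xsl:value-of select="lastName"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

All three comparisons are case-sensitive, so 'web' and 'c' would not match here.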
Besides functions to filter there are functions that do lots of other helpful things.
node() function – Doing <xsl:if test="xpath/statement/to/node/node()"> will see if a node actually has content in it rather than existing but being empty.
translate() function – Doing <xsl:value-of select="translate(xpath/statement/to/node, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/> will change everything lowercase to uppercase. Swap the last two arguments to go from uppercase to lowercase instead.
document() – There are lots of ways to use this, but the one that makes me dance the happy dance of joy is the fact you can pull content from another XML document. So if you want to merge several documents or grab a piece of info from another document, this function is the best.
position() – Tells you where a given node falls in the set of nodes currently being processed. Use <xsl:if test="position() = 1"> to see if the node is the first one in a set.
last() – Tells you how many nodes are in a particular node set. Use <xsl:if test="position() = last()"> to check to see if a given node is the last in a set.
normalize-space() – This function strips leading and trailing whitespace and collapses any run of whitespace inside the string into a single space. Typically you might want to do this if someone has entered data with things like tabs or line breaks.
substring() – Allows you to get a particular portion of a string based on start position and length. So let’s say you had a date formatted 1993-02-11 and you wanted to get the month, then you could do substring('1993-02-11', 6, 2), which returns 02. 6 is the position you want to start at (counting from 1) and 2 is the number of characters to retrieve.
substring-before() & substring-after() – substring-before will retrieve the part of the string before the first occurrence of the substring being searched for. substring-after will return the part of the string that comes after the first occurrence of the substring being searched for. So if you do substring-before('Smith, Jane', ', ') you’ll get Smith and if you do substring-after('Smith, Jane', ', ') you’ll get Jane. It’s a great way to divide up information which has been crammed into a single node. It only works on the first occurrence though, so if you have something like 'cats, dogs, birds' then substring-before('cats, dogs, birds', ', ') will get you cats but substring-after('cats, dogs, birds', ', ') will get you dogs, birds.
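To show how several of these functions fit together, here is a minimal sketch of a tab-separated export in XSLT 1.0. It is not my real Drupal transform: the name and hireDate elements, the "Last, First" name format, and the column headings are all assumptions made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Alphabets used by translate() to upper-case a string -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:for-each select="//staff">
      <!-- position(): write a header row before the first record only -->
      <xsl:if test="position() = 1">
        <xsl:text>LAST&#9;FIRST&#9;DEPARTMENT&#9;HIRE_MONTH&#10;</xsl:text>
      </xsl:if>

      <!-- substring-before/after: split a hypothetical "Smith, Jane" name;
           translate() upper-cases the last name -->
      <xsl:value-of select="translate(substring-before(name, ', '), $lower, $upper)"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="substring-after(name, ', ')"/>
      <xsl:text>&#9;</xsl:text>

      <!-- node(): only write the department when it has content;
           normalize-space() cleans up stray tabs and line breaks -->
      <xsl:if test="department/node()">
        <xsl:value-of select="normalize-space(department)"/>
      </xsl:if>
      <xsl:text>&#9;</xsl:text>

      <!-- substring(): characters 6-7 of a YYYY-MM-DD date are the month -->
      <xsl:value-of select="substring(hireDate, 6, 2)"/>

      <!-- last(): newline between rows, but not after the final one -->
      <xsl:if test="position() != last()">
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Running normalize-space() on each field matters for this kind of output in particular, since a stray tab or line break inside a value would otherwise break the columns.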
It’s amazing how powerful XSLT is and what it will let you do. I’m glad I have a reference guide and several web references/tutorials to help though, because remembering every last tag and function can be difficult. Using oXygen has helped tons too, because if you configure it properly not only will it help with the XPath, it also does other forms of code completion and tells you what different tags do. Overall, I’m looking forward to sharing some of this knowledge with my staff and others.
Good story, Karen! I consider XSLT a bit of a vice of mine. Despite the gnarly syntax and the weird way you have to do some really basic things, it is also incredibly, intoxicatingly powerful. I always find great pleasure and satisfaction in working out a solution to a problem in XSLT. And the *most* marvelous thing about it is that it’s a standard language that can be used or embedded in a whole host of different environments, from the Unix shell to Internet Explorer, and in just about any programming language. Because you can use it to express many of the kinds of transformations we routinely have to do to structured, bibliographic data, it could be an incredibly powerful way for library folks to share some of the grunt work of data manipulation — and in a system-independent way. I love the example set by the LoC stylesheets for transforming MODS, MARC, and DC. I’d love to see that expanded into a convention for building more modular, piecemeal stylesheets that could be combined to do much more fun stuff (like generating COINS or OpenURL from MARC, etc) — eventually an open repository of sorts — a shared toolbag of XSLT modules to do Useful Things to bibliographic data.
I completely agree, Sebastian. Honestly, XSLT is one of my favorite tools. As you say, it works in a variety of different environments, from client-side parsing in IE or Firefox and editors like oXygen to server-side technologies like PHP, Perl, ASP, and ColdFusion. Honestly, it is wicked modularized. Taking my TSV transform and making it into a MODS one was a snap because I wrote it in a “push” fashion. With all the recent discussion about creating linked data, it seems like XSLTs that help do this would be a good idea, particularly if the data currently is available in some other XML format.
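For anyone who hasn’t run into the term, a “push” stylesheet hands nodes to xsl:apply-templates and lets whichever template matches decide what to write, instead of pulling values out with one big for-each. The sketch below only illustrates the idea and is not my actual TSV transform; it assumes the same hypothetical staff elements with lastName, department, and location children.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Root: push every staff node out to whichever template matches it -->
  <xsl:template match="/">
    <xsl:apply-templates select="//staff"/>
  </xsl:template>

  <!-- One row per staff node; the child fields are pushed on in turn
       (they are processed in document order, not the order listed here) -->
  <xsl:template match="staff">
    <xsl:apply-templates select="lastName | department | location"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- One cell per field, with a tab between cells but not after the last -->
  <xsl:template match="lastName | department | location">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="position() != last()">
      <xsl:text>&#9;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With this layout, targeting a different output format mostly means rewriting the bodies of the staff and field templates, while the root template that drives the traversal can stay as it is.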