Sunday, September 7, 2014

Jackrabbit and XPath Queries: Escaping Paths

We have a service that manages an Apache Jackrabbit repository. The main client of the service builds lists of records of the form ///, where users are represented by a UUID. For example:

  /JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label/

We then started building a web endpoint for the service that can be used to view the contents of the repository at any time. This endpoint needs to query the repository along the lines of "show me all lists for institution JCP and user feeadeaf-1dae-427f-bf4e-842b07965a93". Whenever we create a list in the repository, we increment a property named sequence (more on this at a later date), so for the example above, the XPath query that worked for us was:

  /*/JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]

This worked well until we started executing queries in which the UUID began with a digit. We would then see an exception in our logs of the form:

  Encountered "-" at line 1, column 26.
  Was expecting one of:
   ...
   ...
   ...
   ...
  ...
     for statement: for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v
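The reported column is easier to read once you notice that the statement in the log is our XPath wrapped in "for $v in ... return $v", which appears to be how Jackrabbit hands the query to its parser. The small Java check below uses only the statement text copied from the log (the class name ErrorColumn is ours, for illustration). It confirms that column 26 is the first hyphen of the UUID, i.e. the character immediately after the digit-led token 2eeadeaf:

```java
// Reproduces the column arithmetic from the log message above.
// The statement text is copied from the log; nothing here calls Jackrabbit.
class ErrorColumn {
    static final String STATEMENT =
        "for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v";

    // 1-based column of the first occurrence of c, the way parser messages count columns
    static int column(String statement, char c) {
        return statement.indexOf(c) + 1;
    }

    public static void main(String[] args) {
        // "for $v in " (10) + "/*/JCP/" (7) + "2eeadeaf" (8) = 25 characters precede the hyphen
        System.out.println(column(STATEMENT, '-')); // prints 26
    }
}
```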

Since the exception message pointed at the hyphen, and since we only ran into this when the UUID began with a digit, we assumed that both conditions were needed to produce the invalid query. Our first attempt at a solution simply involved prefixing every node whose name matched the pattern of a UUID with a string such as uuid_.
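For reference, that first workaround can be sketched roughly as follows. This is a reconstruction for illustration, not our original code: the class and method names are hypothetical, and it assumes the same uuid_ prefix is also applied when nodes are created, so that stored node names and query steps agree.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the first workaround: prefix any name that looks like a UUID.
class UuidPrefixWorkaround {
    // Canonical 8-4-4-4-12 hexadecimal UUID form
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static String prefixIfUuid(String name) {
        // Only UUID-shaped names are rewritten; everything else is returned as-is
        return UUID_PATTERN.matcher(name).matches() ? "uuid_" + name : name;
    }

    public static void main(String[] args) {
        // prints uuid_2eeadeaf-1dae-427f-bf4e-842b07965a93
        System.out.println(prefixIfUuid("2eeadeaf-1dae-427f-bf4e-842b07965a93"));
        // prints 2014: a digit-led name that is not a UUID slips through unchanged
        System.out.println(prefixIfUuid("2014"));
    }
}
```

The second call shows the gap: any name that starts with a digit but is not shaped like a UUID still produces an invalid query.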

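In reality, the leading digit alone caused the problem, so the prefix approach would not have covered future uses of the service, such as any non-UUID node name that starts with a digit. Stated simply, the query was invalid because an XML name, and therefore a name step in an XPath query, cannot start with a digit. We could, however, still create paths into the repository in the same way, as long as we escaped the leading digit in the query. ISO 9075 escaping replaces such a character with _xHHHH_, where HHHH is its hexadecimal character code, so the digit 2 (U+0032) becomes _x0032_: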

  /*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
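The escape itself is mechanical. The sketch below handles only the case that bit us, a single leading ASCII digit, to make the _xHHHH_ form concrete. The method name escapeLeadingDigit is hypothetical; in our service we relied on Jackrabbit's ISO9075.encode, which also handles every other character that is not legal in an XML name.

```java
// Hypothetical helper: a deliberately narrow version of the ISO 9075 escape that
// only rewrites a leading ASCII digit. Jackrabbit's ISO9075.encode is more general.
class LeadingDigitEscape {
    static String escapeLeadingDigit(String step) {
        if (step.isEmpty()) {
            return step;
        }
        char first = step.charAt(0);
        if (first < '0' || first > '9') {
            return step; // no leading digit: leave the step alone
        }
        // _xHHHH_: the character code as four lowercase hex digits, e.g. '2' -> _x0032_
        return String.format("_x%04x_", (int) first) + step.substring(1);
    }

    public static void main(String[] args) {
        // prints _x0032_eeadeaf-1dae-427f-bf4e-842b07965a93
        System.out.println(escapeLeadingDigit("2eeadeaf-1dae-427f-bf4e-842b07965a93"));
        // prints feeadeaf-1dae-427f-bf4e-842b07965a93 (unchanged)
        System.out.println(escapeLeadingDigit("feeadeaf-1dae-427f-bf4e-842b07965a93"));
    }
}
```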

The code below shows how we model a path into the repository and how we perform the escaping. Before that, though, it is worth noting that we had to encode each step individually when building the path used in the query. If we ran the whole path through the escaping logic, the delimiting slashes would be escaped and the query would not work. If we ran the whole query through it, the asterisks would be escaped as well, and the query would not work either.

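    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.lang3.StringUtils;
    import org.apache.jackrabbit.util.ISO9075;

    class Path {
        List<String> steps; //...

        // Builds "/*/step1/step2/...//*" from the encoded steps, or "//*" when there are none.
        public String asQuery() {
            return steps.size() > 0 ? "/*" + asPathString(encodedSteps()) + "//*" : "//*";
        }

        // Joins already-encoded steps with '/' delimiters, e.g. "/JCP/_x0032_eeadeaf-..."
        private String asPathString(List<String> encoded) {
            return '/' + StringUtils.join(encoded, '/');
        }

        // Encodes each step on its own, so the '/' delimiters (and the '*' wildcards
        // added in asQuery) are never escaped.
        private List<String> encodedSteps() {
            List<String> encodedSteps = new ArrayList<>();
            for (String step : steps) {
                encodedSteps.add(ISO9075.encode(step));
            }
            return encodedSteps;
        }
    }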

The ISO9075 class can be found in org.apache.jackrabbit:jackrabbit-jcr-commons, and StringUtils in org.apache.commons:commons-lang3.
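To see why the steps have to be encoded one at a time, the sketch below compares the two approaches without depending on Jackrabbit. Here encodeName is a hypothetical, ASCII-only stand-in for ISO9075.encode: it escapes any character not allowed at its position in a (simplified) XML name using the same _xHHHH_ form. The class and method names are ours, for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for ISO9075.encode (ASCII name rules only).
class StepwiseEncoding {
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Escapes every character that is illegal at its position as _xHHHH_
    static String encodeName(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean legal = (i == 0) ? isNameStart(c) : isNameChar(c);
            if (legal) {
                sb.append(c);
            } else {
                sb.append(String.format("_x%04x_", (int) c));
            }
        }
        return sb.toString();
    }

    // Encode each step on its own, then join with the '/' delimiter
    static String encodeStepwise(List<String> steps) {
        return "/" + steps.stream().map(StepwiseEncoding::encodeName)
                .collect(Collectors.joining("/"));
    }

    // Join first, then encode: the delimiters are now treated as part of the name
    static String encodeWholePath(List<String> steps) {
        return encodeName("/" + String.join("/", steps));
    }

    public static void main(String[] args) {
        List<String> steps = Arrays.asList("JCP", "2eeadeaf-1dae-427f-bf4e-842b07965a93", "label");
        System.out.println(encodeStepwise(steps));
        System.out.println(encodeWholePath(steps));
        System.out.println(encodeName("*"));
    }
}
```

With per-step encoding the result is /JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93/label, ready to be wrapped as in asQuery. Encoding the joined path instead turns every / into _x002f_, and, in this sketch at least, the 2 is no longer the first character of the string, so it is not escaped at all. A lone * likewise becomes _x002a_.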

My initial search for answers started on Stack Overflow, and of course the good folks answering questions there didn't let me down!
