Wednesday, February 17, 2010

Groovy GPath for complex object traversal

GPath is a path expression language integrated into Groovy that lets you identify and process parts of nested structured data. Its name echoes XPath, the expression language for traversing XML nodes. GPath works on XML nodes as well as on composite objects.

class Person{
String firstName
String lastName
int age
}

def personList = [
new Person(firstName: "dhaval", lastName: "nagar", age: 25),
new Person(firstName: "nachiket", lastName: "patel", age: 24),
]

The easiest way to get every firstName from the above collection is the collect{} method with a closure:

println personList.collect{it.firstName}
// prints [dhaval, nachiket]

But Groovy believes in shortening the most obvious stuff, so the above statement can be rewritten as follows and still give the same result:

println personList.firstName
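
As a minimal sketch of what this shorthand expands to, the snippet below compares it with collect{} and with the spread-dot operator (*.). It repeats the Person class so it can run on its own; the variable names viaCollect, viaSpread and viaGPath are only illustrative.

```groovy
class Person {
    String firstName
    String lastName
    int age
}

def personList = [
    new Person(firstName: "dhaval", lastName: "nagar", age: 25),
    new Person(firstName: "nachiket", lastName: "patel", age: 24),
]

// All three expressions build the same list of first names.
def viaCollect = personList.collect { it.firstName }
def viaSpread  = personList*.firstName
def viaGPath   = personList.firstName

assert viaCollect == ["dhaval", "nachiket"]
assert viaSpread == viaCollect
assert viaGPath == viaCollect

// The result is an ordinary list, so further calls can be chained onto it.
println personList.age.sum()   // prints 49
```

The spread-dot form is the explicit spelling; plain property access on a list falls back to the same behaviour when the list itself has no property of that name.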

This feature becomes powerful when combined with regular expressions. The following statement prints every first name that starts with "d".

println personList.firstName.grep(~/d.*/)
// prints [dhaval]
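
Two details of grep are easy to miss: with a pattern it keeps only the elements whose whole string matches, and here it returns first names rather than Person objects. The sketch below illustrates both, and shows one way (findAll with the ==~ match operator) to keep the Person objects instead. The names `names` and `people` are only illustrative.

```groovy
class Person { String firstName; String lastName; int age }

def personList = [
    new Person(firstName: "dhaval", lastName: "nagar", age: 25),
    new Person(firstName: "nachiket", lastName: "patel", age: 24),
]

// grep with a Pattern keeps elements whose entire string matches it.
def names = personList.firstName.grep(~/d.*/)
assert names == ["dhaval"]

// A pattern that covers only the first letter matches nothing.
assert personList.firstName.grep(~/d/) == []

// To keep the Person objects, filter the original list instead.
// ==~ also requires the whole string to match.
def people = personList.findAll { it.firstName ==~ /d.*/ }
assert people.lastName == ["nagar"]
```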

However, GPath is more commonly used to parse XML documents. Here we take a small XML document, parse it with XmlSlurper, and access its nodes with GPath.

def mylinks = """
<links>
  <link>
    <id>1</id>
    <url>www.google.com</url>
  </link>
  <link>
    <id>2</id>
    <url>www.apple.co.in</url>
  </link>
</links>
"""

def links = new XmlSlurper().parseText(mylinks)

The following statement prints all URLs ending with "com":
println links.link.url.grep(~/.*com/)

The following statement prints all URLs ending with "co.in". One way to escape the . is to surround it with square brackets:
println links.link.url.grep(~/.*co[.]in/)
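
For reference, here is a self-contained sketch of the same traversal with the results turned into plain strings via text(). It assumes Groovy 3 or later, where XmlSlurper lives in the groovy.xml package; the variable names xml, urls, second and dotCom are only illustrative.

```groovy
// Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is imported by default)
import groovy.xml.XmlSlurper

def xml = '''
<links>
  <link><id>1</id><url>www.google.com</url></link>
  <link><id>2</id><url>www.apple.co.in</url></link>
</links>
'''
def links = new XmlSlurper().parseText(xml)

// links.link.url selects the url child of every link node;
// text() turns each node into its string content.
def urls = links.link.url.collect { it.text() }
assert urls == ["www.google.com", "www.apple.co.in"]

// find takes a closure, so nodes can be selected by a sibling's value.
def second = links.link.find { it.id.text() == "2" }
assert second.url.text() == "www.apple.co.in"

// grep returns nodes, not strings, so convert before comparing.
def dotCom = links.link.url.grep(~/.*com/).collect { it.text() }
assert dotCom == ["www.google.com"]
```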

Even if the XML document changes radically, the code changes only slightly. Here id and url become attributes instead of child elements:

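// mylinks and links were declared above, so they are reassigned here without def
mylinks = """
<links>
  <link id="1" url="www.google.com"/>
  <link id="2" url="www.apple.co.in"/>
</links>
"""

links = new XmlSlurper().parseText(mylinks)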

To find the URLs ending with "com" and "co.in":

println links.link.@url.grep(~/.*com/)
println links.link.@url.grep(~/.*co[.]in/)

@ is the operator for accessing an attribute of a node. Parsing XML documents in such an easy way is a dream come true for a Java programmer.
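
A short self-contained sketch of attribute access, again assuming Groovy 3 or later for the import. The variable names urls, ids and coIn are only illustrative.

```groovy
// Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is imported by default)
import groovy.xml.XmlSlurper

def links = new XmlSlurper().parseText('''
<links>
  <link id="1" url="www.google.com"/>
  <link id="2" url="www.apple.co.in"/>
</links>
''')

// .@url on a set of nodes yields the url attribute of each node.
def urls = links.link.@url.collect { it.text() }
assert urls == ["www.google.com", "www.apple.co.in"]

// Attribute values are text; toInteger() converts them when needed.
def ids = links.link.collect { it.@id.toInteger() }
assert ids == [1, 2]

// Attributes can be used inside a findAll closure to filter nodes.
def coIn = links.link.findAll { it.@url.text().endsWith(".co.in") }
assert coIn.collect { it.@id.text() } == ["2"]
```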

1 comment:

  1. I have a POJO with nested objects which I need to translate to a simple object without nesting. For example, I have a Person and an Address as below:

    public class Person {
    private String firstName;
    private String lastName;
    private Address address;
    }

    public class Address {
    private String lineOne;
    private String lineTwo;
    }
    I need to translate Person to PersonFlat, which looks like:

    public class PersonFlat {
    private String firstName;
    private String lastName;
    private String Address_lineOne;
    private String Address_lineTwo;
    }
    Is there any way to do XPath-like extraction on the Person instance to get Address.lineOne and Address.lineTwo using the Groovy metaClass?
