Xsoup is an XPath selector based on Jsoup.
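```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.Assert;
import org.junit.Test;
import us.codecraft.xsoup.Xsoup;

public class XsoupTest {
    @Test
    public void testSelect() {
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";
        Document document = Jsoup.parse(html);
        // Select the href attribute of the <a> element
        String result = Xsoup.compile("//a/@href").evaluate(document).get();
        Assert.assertEquals("https://github.com", result);
        // Select the text of each <td> cell, in document order
        List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        Assert.assertEquals("a", list.get(0));
        Assert.assertEquals("b", list.get(1));
    }
}
```

Xsoup uses Jsoup as its HTML parser.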
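Compared with HtmlCleaner, another widely used XPath selector for HTML, Xsoup is about 2.5 times faster at parsing and 4 times faster at selecting in the following benchmark:

- Input: a typical HTML page, 44 KB
- XPath: `//a`
- Iterations: 2000
- Environment: MacBook Air (MD231CH/A), 1.8 GHz Intel Core i5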
| Operation | Xsoup | HtmlCleaner |
| --- | --- | --- |
| parse | 3,207 ms | 7,999 ms |
| select | 95 ms | 380 ms |
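The benchmark does not say whether the timings are totals or per-run figures. Assuming they are totals over all 2000 runs, the sketch below (a hypothetical helper class, not part of Xsoup) shows the arithmetic behind the speedups and the per-run averages:

```java
// Illustrative arithmetic only: assumes each figure in the benchmark table is
// the total wall-clock time over all 2000 runs (the table does not state this).
final class BenchmarkMath {
    static final int RUNS = 2000;

    /** Average milliseconds per run, given a total over RUNS runs. */
    static double perRunMillis(long totalMillis) {
        return (double) totalMillis / RUNS;
    }

    /** How many times faster the fast timing is than the slow one. */
    static double speedup(long slowMillis, long fastMillis) {
        return (double) slowMillis / fastMillis;
    }

    public static void main(String[] args) {
        // parse: HtmlCleaner 7,999 ms vs Xsoup 3,207 ms
        System.out.printf("parse:  %.2fx faster (%.2f vs %.2f ms/run)%n",
                speedup(7999, 3207), perRunMillis(3207), perRunMillis(7999));
        // select: HtmlCleaner 380 ms vs Xsoup 95 ms
        System.out.printf("select: %.2fx faster (%.3f vs %.3f ms/run)%n",
                speedup(380, 95), perRunMillis(95), perRunMillis(380));
    }
}
```

Under that assumption, selection with Xsoup takes exactly a quarter of HtmlCleaner's time, and parsing takes roughly 40% of it.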
Supported XPath syntax:

| Name | Expression | Support |
| --- | --- | --- |
| node name | `nodename` | yes |
| child | `/` | yes |
| descendant | `//` | yes |
| attribute filter | `[@key=value]` | yes |
| nth child | `tag[n]` | yes |
| attribute value | `/@key` | yes |
| wildcard in tag name | `/*` | yes |
| wildcard in attribute | `/[@*]` | yes |
| function | `function()` | partial |
| or (union) | `a \| b` | yes, since 0.2.0 |
| current or parent node in path | `.` or `..` | no |
| predicates | `price>35` | no |
| logical operators in predicates | `@class=a or @class=b` | yes, since 0.2.0 |
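Logical operators inside a predicate combine attribute tests, so `[@class=a or @class=b]` keeps an element when either test holds. The sketch below models this with `java.util.function.Predicate`; the `PredicateLogic` class and the representation of an element as its attribute map are hypothetical, not Xsoup's internals:

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model, not Xsoup's implementation: an element is reduced to
// its attribute map, and each [@key=value] test becomes a Predicate.
final class PredicateLogic {
    /** Models the attribute test [@key=value] as an exact string comparison. */
    static Predicate<Map<String, String>> attrEquals(String key, String value) {
        // attrs.get(key) is null when the attribute is missing, so the test fails
        return attrs -> value.equals(attrs.get(key));
    }

    /** Models [@class=a or @class=b]: true if either attribute test holds. */
    static final Predicate<Map<String, String>> CLASS_A_OR_B =
            attrEquals("class", "a").or(attrEquals("class", "b"));
}
```

Note that the comparison is exact, so in this model an element with `class="a b"` matches neither test.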
Xsoup also supports some functions that may not exist in standard XPath 1.0:
| Expression | Description | Standard XPath |
| --- | --- | --- |
| `text(n)` | nth text node of the element (0 for all) | `text()` only |
| `allText()` | text of the element including its descendants | not supported |
| `tidyText()` | text of the element including its descendants, well formatted | not supported |
| `html()` | inner HTML of the element | not supported |
| `outerHtml()` | outer HTML of the element | not supported |
| `regex(@attr,expr,group)` | extract content with a regular expression | not supported |
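`regex(@attr,expr,group)` extracts part of an attribute value with a regular expression. As a hedged sketch of that idea for a single value, assuming the first match is used and `group` follows `java.util.regex` numbering (0 for the whole match), the hypothetical `RegexExtract.extract` helper below uses only the JDK:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of what regex(@attr, expr, group) computes for one
// attribute value; not Xsoup's own code. Assumes the first match is used.
final class RegexExtract {
    /**
     * Returns the requested group of the first match of expr in value, or null
     * if there is no match. Matcher.group throws IndexOutOfBoundsException if
     * group exceeds the number of capturing groups in expr.
     */
    static String extract(String value, String expr, int group) {
        Matcher m = Pattern.compile(expr).matcher(value);
        return m.find() ? m.group(group) : null;
    }
}
```

For example, `RegexExtract.extract("https://example.com/items/42", "items/(\\d+)", 1)` returns `"42"`, while group 0 returns the whole match, `"items/42"`.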
The following syntax extensions exist only in Xsoup. They make extracting HTML more convenient and follow Jsoup's CSS selector syntax:
| Name | Expression | Support |
| --- | --- | --- |
| attribute value not equals | `[@key!=value]` | yes |
| attribute value starts with | `[@key^=value]` | yes |
| attribute value ends with | `[@key$=value]` | yes |
| attribute value contains | `[@key*=value]` | yes |
| attribute value matches regex | `[@key~=value]` | yes |
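Each extended operator compares an attribute value in the same way as the Jsoup CSS selector operator it borrows from. The hypothetical `AttrOperators.matches` method below models those comparisons with plain `String` methods and `java.util.regex`; it assumes the element has the attribute and compares case-sensitively, which may differ from the exact matching rules in Jsoup and Xsoup:

```java
import java.util.regex.Pattern;

// Hypothetical model of the extended attribute operators; not Xsoup's code.
// Assumes `actual` (the attribute value) is non-null; comparisons are case-sensitive.
final class AttrOperators {
    /** Returns true if attribute value `actual` satisfies [@key<op>expected]. */
    static boolean matches(String op, String actual, String expected) {
        switch (op) {
            case "!=": return !actual.equals(expected);
            case "^=": return actual.startsWith(expected);
            case "$=": return actual.endsWith(expected);
            case "*=": return actual.contains(expected);
            // find() succeeds if the regex matches anywhere in the value
            case "~=": return Pattern.compile(expected).matcher(actual).find();
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```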
MIT License; see the LICENSE file.

