How To Get Orphaned Text With Jsoup?
I have the following HTML: This is the first text More text here Another line of text Text in the span Another text in span
Solution 1:
I would use a recursive method that takes your starting tag and iterates over its child nodes. For each TextNode, print its contents. For each Element, call the same method again so that its own child nodes are checked in turn.
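import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException
{
    // I put your HTML in the body tag in a local file
    Document doc = Jsoup.parse(new File("input/20160505.html"), "UTF-8");
    Elements elements = doc.getElementsByTag("body");
    Element rootTag = elements.get(0);
    printTextOfTag(rootTag);
}

// Prints the text of every TextNode under currentTag, in document order
public static void printTextOfTag(Element currentTag)
{
    List<Node> nodes = currentTag.childNodes();
    for (Node n : nodes)
    {
        if (n instanceof TextNode)
        {
            // Text sitting directly inside the current tag
            System.out.println(((TextNode) n).text());
        }
        else if (n instanceof Element)
        {
            // Descend into nested tags such as the span
            printTextOfTag((Element) n);
        }
    }
}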
Output
This is the first text
More text here Another line of text
Text in the span
Another text in span
This is another line
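If you want to try the same recursive walk without adding jsoup to the classpath, here is a minimal sketch that uses the JDK's built-in org.w3c.dom parser instead. This is not jsoup: the JDK parser only accepts well-formed XML, so it will reject the kind of sloppy HTML that jsoup handles. The class name TextWalkSketch, the helpers collectText and textsOf, and the sample markup are all made up for illustration. Unlike the method above, it collects the text into a list and skips whitespace-only text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class name; JDK-only stand-in for the jsoup version above
class TextWalkSketch {

    // Recursively collects the trimmed content of every non-blank text node under node
    static void collectText(Node node, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Same idea as printTextOfTag: descend into nested elements
                collectText(child, out);
            }
        }
    }

    // Parses a well-formed XML string and returns its text nodes in document order
    static List<String> textsOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            collectText(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample markup, not the markup from the question
        String xml = "<body>loose text<br/>more loose text"
                + "<p><span>in span</span></p>trailing</body>";
        for (String s : textsOf(xml)) {
            System.out.println(s);
        }
    }
}
```

Returning a list instead of printing makes it easier to filter or post-process the text afterwards, and the same change can be made to printTextOfTag in the jsoup version.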