I am trying to parse the XML output of Stanford NLP in Java:

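import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader("<a>" + tagged + "</a>"));
Document doc = builder.parse(is);
doc.getDocumentElement().normalize();
NodeList nl = doc.getElementsByTagName("sentence");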

The problem is that the XML output of Stanford NLP contains unescaped " characters inside attribute values, for example:

<word wid="9" pos="``" lemma=""">"</word>

Then, I get the error:

[Fatal Error] :11:34: Element type "word" must be followed by either attribute specifications, ">" or "/>".
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 34; Element type "word" must be followed by either attribute specifications, ">" or "/>".
    at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at y.main(y.java:46)

I thought of replacing or escaping the """ and >"< sequences before parsing, but that is a non-standard workaround and may break the rest of the XML.
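
To make the escaping idea concrete, here is a minimal sketch of what such a pre-pass might look like. It assumes the only malformed construct is an attribute whose whole value is a single double quote (the lemma=""" case in the sample); the >"< part is already legal, since " is allowed in XML element text. The class QuoteFix and the helper escapeLoneQuoteAttributes are hypothetical names for illustration, not part of Stanford NLP, and the regex is a narrow workaround rather than a general XML repair.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class QuoteFix {
    // Hypothetical helper: rewrites an attribute whose entire value is a lone
    // double quote (=""") to ="&quot;". The lookahead requires whitespace, '/'
    // or '>' after the closing quote, so ordinary empty attributes (="")
    // followed by another attribute are left untouched.
    static String escapeLoneQuoteAttributes(String xml) {
        return xml.replaceAll("=\"\"\"(?=[\\s/>])", "=\"&quot;\"");
    }

    public static void main(String[] args) throws Exception {
        // Sample input modelled on the <word> element shown in the question.
        String tagged =
            "<sentence><word wid=\"9\" pos=\"``\" lemma=\"\"\">\"</word></sentence>";

        String fixed = escapeLoneQuoteAttributes(tagged);

        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new InputSource(new StringReader("<a>" + fixed + "</a>")));

        // After the pre-pass, both the lemma attribute and the element text
        // should come back as a single double-quote character.
        Element word = (Element) doc.getElementsByTagName("word").item(0);
        System.out.println("lemma=" + word.getAttribute("lemma"));
        System.out.println("text=" + word.getTextContent());
    }
}
```

The pattern is deliberately narrow so that it touches nothing else in the document. It would still misfire if the literal sequence =""" ever appeared inside element text, so it is worth checking against real output before relying on it.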
