Parsing XML without JAXB

Last Update: 26.04.2018. By Jens in Developers Life | Learning | Newsletter

A few weeks ago I launched a side project of mine, a regional job portal for developers. As with any new app, I needed to prefill it with data, both at the start and on an ongoing basis, because it takes time until companies post their jobs directly. For that, the app polls a few APIs for job posts. Unfortunately, they still use XML. Yep, no JSON.

In Python, it is a piece of cake to parse XML with minidom or the like and then just work with objects. Not so in Java, where you always need some mapping config, XML schema, or annotations. With JAXB, I have had the best experience with the schema-first approach. But this time, it was not my XML and I just wanted to get it done.

Luckily, there are still some alternatives around, namely XStream and Jackson for XML.

Jackson XML is as simple as the JSON variant and follows the same logic. The equivalent of the ObjectMapper is the XmlMapper, and as far as I could tell it has the same method signatures. So, overall, if you know Jackson for JSON, you can work with Jackson for XML right away.

The only difference I noticed is that it uses additional annotations to support features that XML has but JSON does not, like attributes and CDATA blocks.
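To make those XML-specific annotations a bit more concrete, here is a minimal sketch using `@JacksonXmlProperty(isAttribute = true)` and `@JacksonXmlCData` from the Jackson XML module. The `Job` class, its `id` and `description` fields, and the sample XML are made up for illustration and are not taken from any of the APIs mentioned above.

```java
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlCData;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;

public class XmlAnnotationsExample {

    // Hypothetical model class, only for illustration
    public static class Job {
        // Mapped to an XML attribute instead of a child element
        @JacksonXmlProperty(isAttribute = true)
        public String id;

        // Written as a CDATA block when serializing
        @JacksonXmlCData
        public String description;
    }

    public static Job fromXml(String xml) throws Exception {
        return new XmlMapper().readValue(xml, Job.class);
    }

    public static String toXml(Job job) throws Exception {
        return new XmlMapper().writeValueAsString(job);
    }

    public static void main(String[] args) throws Exception {
        Job job = fromXml(
            "<job id=\"42\"><description><![CDATA[Full-time & remote]]></description></job>");
        System.out.println(job.id + ": " + job.description);
        System.out.println(toXml(job));
    }
}
```

Reading works the same way as with JSON. The annotations mostly matter when writing XML or when an attribute and an element need to be told apart.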

Tomorrow, we’ll look at an example.


The first thing is, of course, adding the dependency. For the sake of short emails, I leave that out here.

The next step is to actually use the XmlMapper, like so:

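```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

XmlMapper mapper = new XmlMapper();
// Ignore XML elements that have no matching property in our classes
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
// Do not expect a wrapper element around lists/repeating elements
mapper.setDefaultUseWrapper(false);
Jobs jobs = mapper.readValue(theInputStream, Jobs.class);
```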

Configuration generally works the same as with JSON, so, for example, setting FAIL_ON_UNKNOWN_PROPERTIES to false makes the mapper ignore properties we don’t care about. By default, the XmlMapper uses wrappers for lists and repeating elements, so if you have some XML like:

```xml

<![CDATA[Feste Anstellung]]>

<![CDATA[Mit Berufserfahrung]]>

<![CDATA[Vollzeit]]>

```

It will usually expect a wrapper around the Category elements, so Job -> CategoryWrapper -> Category elements. Setting the default wrapper option to false prevents that, and the repeating elements map directly to a list.
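As a rough sketch of how that could look end to end, the following maps repeating elements directly to lists with wrappers turned off. The element names (`jobs`, `job`, `title`, `category`), the `Jobs` and `Job` classes, and the sample XML are assumptions for illustration. The real feed's structure may differ.

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import java.util.List;

public class UnwrappedListExample {

    // Hypothetical model classes; field names match the assumed element names
    public static class Jobs {
        public List<Job> job;
    }

    public static class Job {
        public String title;
        // Repeating <category> elements sit directly inside <job>, no wrapper
        public List<String> category;
    }

    // Made-up sample document; <company> has no matching field on purpose
    static final String XML =
        "<jobs>"
      + "<job>"
      + "<title>Java Developer</title>"
      + "<company>Example GmbH</company>"
      + "<category><![CDATA[Feste Anstellung]]></category>"
      + "<category><![CDATA[Mit Berufserfahrung]]></category>"
      + "<category><![CDATA[Vollzeit]]></category>"
      + "</job>"
      + "</jobs>";

    public static Jobs parse(String xml) throws Exception {
        XmlMapper mapper = new XmlMapper();
        // Skip elements such as <company> that the model does not know
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        // Treat repeating elements as list items without a wrapper element
        mapper.setDefaultUseWrapper(false);
        return mapper.readValue(xml, Jobs.class);
    }

    public static void main(String[] args) throws Exception {
        Jobs jobs = parse(XML);
        for (Job job : jobs.job) {
            System.out.println(job.title + " " + job.category);
        }
    }
}
```

If only some lists in a document are wrapped, the per-property `@JacksonXmlElementWrapper(useWrapping = false)` annotation is an alternative to changing the mapper-wide default.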