ChatGPT解决这个技术问题 Extra ChatGPT

Dealing with "Xerces hell" in Java/Maven?

In my office, the mere mention of the word Xerces is enough to incite murderous rage from developers. A cursory glance at the other Xerces questions on SO seem to indicate that almost all Maven users are "touched" by this problem at some point. Unfortunately, understanding the problem requires a bit of knowledge about the history of Xerces...

History

Xerces is the most widely used XML parser in the Java ecosystem. Almost every library or framework written in Java uses Xerces in some capacity (transitively, if not directly).

The Xerces jars included in the official binaries are, to this day, not versioned. For example, the Xerces 2.11.0 implementation jar is named xercesImpl.jar and not xercesImpl-2.11.0.jar.

The Xerces team does not use Maven, which means they do not upload an official release to Maven Central.

Xerces used to be released as a single jar (xerces.jar), but was split into two jars, one containing the API (xml-apis.jar) and one containing the implementations of those APIs (xercesImpl.jar). Many older Maven POMs still declare a dependency on xerces.jar. At some point in the past, Xerces was also released as xmlParserAPIs.jar, which some older POMs also depend on.

The versions assigned to the xml-apis and xercesImpl jars by those who deploy their jars to Maven repositories are often different. For example, xml-apis might be given version 1.3.03 and xercesImpl might be given version 2.8.0, even though both are from Xerces 2.8.0. This is because people often tag the xml-apis jar with the version of the specifications that it implements. There is a very nice, but incomplete breakdown of this here.

To complicate matters, Xerces is the XML parser used in the reference implementation of the Java API for XML Processing (JAXP), included in the JRE. The implementation classes are repackaged under the com.sun.* namespace, which makes it dangerous to access them directly, as they may not be available in some JREs. However, not all of the Xerces functionality is exposed via the java.* and javax.* APIs; for example, there is no API that exposes Xerces serialization.

Adding to the confusing mess, almost all servlet containers (JBoss, Jetty, Glassfish, Tomcat, etc.), ship with Xerces in one or more of their /lib folders.

Problems

Conflict Resolution

For some -- or perhaps all -- of the reasons above, many organizations publish and consume custom builds of Xerces in their POMs. This is not really a problem if you have a small application and are only using Maven Central, but it quickly becomes an issue for enterprise software where Artifactory or Nexus is proxying multiple repositories (JBoss, Hibernate, etc.):

https://i.stack.imgur.com/Fe9Xa.png

For example, organization A might publish xml-apis as:

<groupId>org.apache.xerces</groupId>
<artifactId>xml-apis</artifactId>
<version>2.9.1</version>

Meanwhile, organization B might publish the same jar as:

<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.3.04</version>

Although B's jar is a lower version than A's jar, Maven does not know that they are the same artifact because they have different groupIds. Thus, it cannot perform conflict resolution and both jars will be included as resolved dependencies:

https://i.stack.imgur.com/4X1ts.png

Classloader Hell

As mentioned above, the JRE ships with Xerces in the JAXP RI. While it would be nice to mark all Xerces Maven dependencies as <exclusion>s or as <provided>, the third-party code you depend on may or may not work with the version provided in JAXP of the JDK you're using. In addition, you have the Xerces jars shipped in your servlet container to contend with. This leaves you with a number of choices: Do you delete the servlet version and hope that your container runs on the JAXP version? Is it better to leave the servlet version, and hope that your application frameworks run on the servlet version? If one or two of the unresolved conflicts outlined above manage to slip into your product (easy to happen in a large organization), you quickly find yourself in classloader hell, wondering which version of Xerces the classloader is picking at runtime and whether or not it will pick the same jar in Windows and Linux (probably not).

Solutions?

We've tried marking all Xerces Maven dependencies as <provided> or as an <exclusion>, but this is difficult to enforce (especially with a large team) given that the artifacts have so many aliases (xml-apis, xerces, xercesImpl, xmlParserAPIs, etc.). Additionally, our third party libs/frameworks may not run on the JAXP version or the version provided by a servlet container.

How can we best address this problem with Maven? Do we have to exercise such fine-grained control over our dependencies, and then rely on tiered classloading? Is there some way to globally exclude all Xerces dependencies, and force all of our frameworks/libs to use the JAXP version?

UPDATE: Joshua Spiewak has uploaded a patched version of the Xerces build scripts to XERCESJ-1454 that allows for upload to Maven Central. Vote/watch/contribute to this issue and let's fix this problem once and for all.

Thanks for this detailed question. I do not understand the motivation of the xerces team. I would imagine they are proud of there product and take pleasure in other using it but the current state of xerces and maven disgraceful. Even so, they can do what they want even if it makes no sense to me. I wonder if the sonatype guys have any suggestions.
This maybe off topic, but this is probably the better post I have ever seen. More related to the question, what you describe is one of the most painful issue that we can encounter. Great initiative !
@TravisSchneeberger Much of the complexity is because Sun chose to use Xerces in the JRE itself. You can hardly blame the Xerces folks for that.
Usually we try to find a version of Xerces that satisfies all dependent libraries by trial and error, if it's not possible then refactor to WARs to split the application into separate WARs (separate class loaders). This tool (I wrote it) helps understanding what is going on jhades.org by allowing to query the classpath for jars , and classes - it works also in the case when the server doesn't start yet
Just a quick comment if you're getting this error while starting servicemix from git bash in windows: start it from "normal" cmd instead.

L
Lii

There are 2.11.0 JARs (and source JARs!) of Xerces in Maven Central since 20th February 2013! See Xerces in Maven Central. I wonder why they haven't resolved https://issues.apache.org/jira/browse/XERCESJ-1454...

I've used:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

and all dependencies have resolved fine - even proper xml-apis-1.4.01!

And what's most important (and what wasn't obvious in the past) - the JAR in Maven Central is the same JAR as in the official Xerces-J-bin.2.11.0.zip distribution.

I couldn't however find xml-schema-1.1-beta version - it can't be a Maven classifier-ed version because of additional dependencies.


Although it's very confusing that xml-apis:xml-apis:1.4.01 is newer than xml-apis:xml-apis:2.0.2 ?? see search.maven.org/…
It is confusing, but it's due to the third party uploads of non-versioned Xerces jars, like justingarrik was saying in his post. xml-apis 2.9.1 is the same as 1.3.04, so in that sense, 1.4.01 is newer (and numerically larger) than 1.3.04.
If you have both xercesImpl and xml-apis in your pom.xml be sure to delete the xml-apis dependency! Otherwise 2.0.2 rears its ugly head.
S
SkyWalker

Frankly, pretty much everything that we've encountered works just fine w/ the JAXP version, so we always exclude xml-apis and xercesImpl.


Could you add a pom.xml snippet for that?
When I try this, I get JavaMelody and Spring throwing java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at runtime.
To add to David Moles response -- I have seen a half dozen transitive dependencies need ElementTraversal. Various things in Spring and Hadoop most commonly.
If you get java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal try adding xml-apis 1.4.01 to your pom (and exclude all other dependent versions)
ElementTraversal is a new class added in Xerces 11 and available in xml-apis:xml-apis:1.4.01 dependency. So you may need to copy the class manually to your project or use whole dependency which causes duplicated classes in classloader. But in JDK9 this class was included so in feature you may need to remove the dep.
T
Travis Schneeberger

You could use the maven enforcer plugin with the banned dependency rule. This would allow you to ban all the aliases that you don't want and allow only the one you do want. These rules will fail the maven build of your project when violated. Furthermore, if this rule applies to all projects in an enterprise you could put the plugin configuration in a corporate parent pom.

see:

http://maven.apache.org/plugins/maven-enforcer-plugin/

http://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html


n
netmikey

I know this doesn't answer the question exactly, but for ppl coming in from google that happen to use Gradle for their dependency management:

I managed to get rid of all xerces/Java8 issues with Gradle like this:

configurations {
    all*.exclude group: 'xml-apis'
    all*.exclude group: 'xerces'
}

nice, with maven you need about 4000 lines of XML to do that.
that didn't solve the problem. any other hints for Android-Gradle people?
@teknopaul XML is used purely for configuration. Groovy is a high level programming language. Sometimes you might want to use XML for its explicitness instead of groovy for its magic.
J
Jens Schauder

I guess there is one question you need to answer:

Does there exist a xerces*.jar that everything in your application can live with?

If not you are basically screwed and would have to use something like OSGI, which allows you to have different versions of a library loaded at the same time. Be warned that it basically replaces jar version issues with classloader issues ...

If there exists such a version you could make your repository return that version for all kinds of dependencies. It's an ugly hack and would end up with the same xerces implementation in your classpath multiple times but better than having multiple different versions of xerces.

You could exclude every dependency to xerces and add one to the version you want to use.

I wonder if you can write some kind of version resolution strategy as a plugin for maven. This would probably the nicest solution but if at all feasible needs some research and coding.

For the version contained in your runtime environment, you'll have to make sure it either gets removed from the application classpath or the application jars get considered first for classloading before the lib folder of the server get considered.

So to wrap it up: It's a mess and that won't change.


The same class from the same jar loaded by different ClassLoaders is still a ClassCastException (in all standard containers)
Exactly. That's why I wrote: Be warned that it basically replaces jar version issues with classloader issues
D
Derek Bennett

You should debug first, to help identify your level of XML hell. In my opinion, the first step is to add

-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

to the command line. If that works, then start excluding libraries. If not, then add

-Djaxp.debug=1

to the command-line.


D
Daniel

There is another option that hasn't been explored here: declaring Xerces dependencies in Maven as optional:

<dependency>
   <groupId>xerces</groupId>
   <artifactId>xercesImpl</artifactId>
   <version>...</version>
   <optional>true</optional>
</dependency>

Basically what this does is to force all dependents to declare their version of Xerces or their project won't compile. If they want to override this dependency, they are welcome to do so, but then they will own the potential problem.

This creates a strong incentive for downstream projects to:

Make an active decision. Do they go with the same version of Xerces or use something else?

Actually test their parsing (e.g. through unit testing) and classloading as well as not to clutter up their classpath.

Not all developers keep track of newly introduced dependencies (e.g. with mvn dependency:tree). This approach will immediately bring the matter to their attention.

It works quite well at our organization. Before its introduction, we used to live in the same hell the OP is describing.


Should I literally use dot-dot-dot within the version element, or do I need to use a real version like 2.6.2?
@chrisinmtown The real version.
t
teknopaul

Every maven project should stop depending on xerces, they probably don't really. XML APIs and an Impl has been part of Java since 1.4. There is no need to depend on xerces or XML APIs, its like saying you depend on Java or Swing. This is implicit.

If I was boss of a maven repo I'd write a script to recursively remove xerces dependencies and write a read me that says this repo requires Java 1.4.

Anything that actually breaks because it references Xerces directly via org.apache imports needs a code fix to bring it up to Java 1.4 level (and has done since 2002) or solution at JVM level via endorsed libs, not in maven.


When performing the refactor that you detailed, you also need to search for the package and class names in the text of your Java files and config. You will find that developers have put the FQN of the Impl classes in constant strings that get used by Class.forName and the similar constructs.
This assumes all SAX implementations do the same thing, which is not true. the xercesImpl library allows for configuration options that the java.xml.parser libraries lack.
O
Ondra Žižka

What would help, except for excluding, is modular dependencies.

With one flat classloading (standalone app), or semi-hierarchical (JBoss AS/EAP 5.x) this was a problem.

But with modular frameworks like OSGi and JBoss Modules, this is not so much pain anymore. The libraries may use whichever library they want, independently.

Of course, it's still most recommendable to stick with just a single implementation and version, but if there's no other way (using extra features from more libs), then modularizing might save you.

A good example of JBoss Modules in action is, naturally, JBoss AS 7 / EAP 6 / WildFly 8, for which it was primarily developed.

Example module definition:

<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.1" name="org.jboss.msc">
    <main-class name="org.jboss.msc.Version"/>
    <properties>
        <property name="my.property" value="foo"/>
    </properties>
    <resources>
        <resource-root path="jboss-msc-1.0.1.GA.jar"/>
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="org.jboss.logging"/>
        <module name="org.jboss.modules"/>
        <!-- Optional deps -->
        <module name="javax.inject.api" optional="true"/>
        <module name="org.jboss.threads" optional="true"/>
    </dependencies>
</module>

In comparison with OSGi, JBoss Modules is simpler and faster. While missing certain features, it's sufficient for most projects which are (mostly) under control of one vendor, and allow stunning fast boot (due to paralelized dependencies resolving).

Note that there's a modularization effort underway for Java 8, but AFAIK that's primarily to modularize the JRE itself, not sure whether it will be applicable to apps.


jboss modules is about static modularization. It has little to do with runtime modularization OSGi has to offer - I would say they compliment each other. It's a nice system though.
*complement instead of compliment
t
thrau

Apparently xerces:xml-apis:1.4.01 is no longer in maven central, which is however what xerces:xercesImpl:2.11.0 references.

This works for me:

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
  <exclusions>
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>xml-apis</groupId>
  <artifactId>xml-apis</artifactId>
  <version>1.4.01</version>
</dependency>

Looks like central to me: repo1.maven.org/maven2/xml-apis/xml-apis/1.4.01/… Last modified 2011-08-20?
sure, with the id xml-apis/xml-apis, but the transitive dependency is xerces/xml-apis, which is why my config exlcudes xerces/xml-apis explicitly, and instead uses the one which you correctly pointed out is in central.
M
Miss Chanandler Bong

My friend that's very simple, here an example:

<dependency>
    <groupId>xalan</groupId>
    <artifactId>xalan</artifactId>
    <version>2.7.2</version>
    <scope>${my-scope}</scope>
    <exclusions>
        <exclusion>
        <groupId>xml-apis</groupId>
        <artifactId>xml-apis</artifactId>
    </exclusion>
</dependency>

And if you want to check in the terminal(windows console for this example) that your maven tree has no problems:

mvn dependency:tree -Dverbose | grep --color=always '(.* conflict\|^' | less -r