
How to configure encoding in Maven?

When I run mvn install on my multi-module Maven project, I always get the following output:

[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!

So, I googled around a bit, but all I can find is that I have to add:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

...to my pom.xml. But it's already there (in the parent pom.xml).

Configuring <encoding> for the maven-resources-plugin or the maven-compiler-plugin also doesn't fix it.

So what's the problem?

Be careful that UTF-8 encoding is what you actually want to specify as the encoding. You may be better off using a simpler encoding such as ISO-8859-1 (aka Latin-1) or even US-ASCII.
"You may be better off using a simpler encoding such as..." Yeah, and annoy end users as well as other developers... Nowadays it's best to use UTF-8 as much as possible and to care about other encodings only when a multi-encoding requirement is thrown at you. Here we're talking mostly about the encoding of source and configuration files; the encoding of user input is managed differently (with 'java -Dfile.encoding ...' and a lot of painful programming effort).
I personally found the encoding issues so elusive that I went for ASCII encoding in pom.xml and dealt with the encoding issues up front. This was naturally prompted by having a non-ASCII character in my name, which gave me issues from day 1 :)
What encoding is set in the parent pom.xml?

Naman

OK, I have found the problem.

I use some reporting plugins. In the documentation of the failsafe-maven-plugin I found that the <encoding> configuration, of course, uses ${project.reporting.outputEncoding} by default.

So I added the property as a child element of the project element and everything is fine now:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>

See also http://maven.apache.org/general.html#encoding-warning


So I had this issue and I added the properties from above like this: <profiles> <profile> <activation> <activeByDefault>true</activeByDefault> </activation> <id>local</id> <properties> <url>earneventapi.intra1.e1.v2.epaas.aexp.com</url> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding> </properties> </profile>
No, the only global encoding setting is the one made via an environment variable: stackoverflow.com/a/9976788/715269
This works as expected when adding the 2 properties to the properties block of the pom.xml file. Thanks.
SET MAVEN_OPTS=-Dfile.encoding=utf-8 or, on Unix-like systems, export MAVEN_OPTS=-Dfile.encoding=utf-8 is the only correct answer ... ;-)
Ville Myrskyneva

This is an addition to the previous answer, in case someone runs into a problem with Scandinavian letters that isn't solved by the solution above.

If the Java source files contain Scandinavian letters (e.g. in constants), they need to be interpreted correctly by the Java used for compiling.

Even though the files are stored in UTF-8 and Maven is configured to use UTF-8, the system Java used by Maven will still use the system default (e.g. on Windows: cp1252).

This becomes visible only when running the tests via Maven, for example when the tests print the values of these constants: the printed Scandinavian letters show up as '< ?>'. If not tested properly, this would corrupt the class files produced by the compilation and go unnoticed.

To prevent this, you have to make the Java used for compiling use UTF-8 encoding. It is not enough to have the encoding settings in the Maven pom.xml; you also need to set the environment variable: JAVA_TOOL_OPTIONS = -Dfile.encoding=UTF8

Also, if you use Eclipse on Windows, you may need to set the encoding there as well (if you run individual tests via Eclipse).
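
To illustrate the kind of mangling described above, here is a minimal sketch in plain Java. It is not what Maven or javac do internally; it only shows that the same UTF-8 bytes turn into different characters when they are decoded with a single-byte charset. The class name DefaultCharsetDemo is made up for this example, and ISO-8859-1 is used as a stand-in for a platform default such as cp1252 because it is guaranteed to exist on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any Maven or JDK API.
class DefaultCharsetDemo {

    // Decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // a-umlaut and o-umlaut, written as escapes so this file is pure ASCII.
        byte[] utf8Bytes = "\u00e4\u00f6".getBytes(StandardCharsets.UTF_8);

        String correct = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Assumption: ISO-8859-1 stands in for a single-byte platform default.
        String mangled = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        // What the JVM falls back to when no charset is given explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 decoding keeps 2 chars: " + (correct.length() == 2));
        System.out.println("Single-byte decoding yields 4 chars: " + (mangled.length() == 4));
    }
}
```

Running it with and without -Dfile.encoding=UTF8 (or JAVA_TOOL_OPTIONS) should make the first printed line reflect whichever default the JVM actually picked up.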


Not sure if there's a maven way to do this, since this is a JVM setting, not Maven.
I think you are mixing things up. You only need to set -Dfile.encoding if you use I/O in Java without explicitly specifying an encoding (which is not recommended). I don't see what this has to do with scandic letters in Java source files. Non-ASCII in Java source files works with Maven when project.build.sourceEncoding is set correctly, as described in Ethan Leroy's answer.
@sleske I would have assumed that this is enough, but when I first ended up here and made the pom.xml changes, it did not fix my problem. After more searching and some trial and error, the solution described above worked. I think the reason is that Maven calls the javac of the installed/referenced JDK, which in turn uses the OS encoding as its default. If someone knows a way to specify the encoding for the javac call in pom.xml, that would solve this issue the "Maven way".
@VilleMyrskyneva: When Maven invokes javac, it will pass along the encoding set by project.build.sourceEncoding (you can check using mvn -X), so I don't see how what you describe is necessary. If you still get encoding problems in your project, consider asking that as a separate question - it seems you are running into a different problem. Ideally, post a reproducible test case.
@sleske I have project.build.sourceEncoding in pom.xml, but mvn test still has encoding problems, while -Dfile.encoding=UTF8 solves them. I don't understand why. stackoverflow.com/questions/42990644/…
bhdrk

If you combine the answers above, a pom.xml configured for UTF-8 should end up looking like this:

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>YOUR_COMPANY</groupId>
    <artifactId>YOUR_APP</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <properties>
        <project.java.version>1.8</project.java.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <dependencies>
        <!-- Your dependencies -->
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>${project.java.version}</source>
                    <target>${project.java.version}</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>3.0.2</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

The default seems to be ${project.build.sourceEncoding}, so you shouldn't need to define it explicitly for the maven-resources-plugin (see maven.apache.org/plugins/maven-resources-plugin/examples/…, maven.apache.org/plugins/maven-resources-plugin/…, maven.apache.org/general.html#encoding-warning)
No, the only global encoding setting is the one made via an environment variable: stackoverflow.com/a/9976788/715269
Alexandr

It seems people mix up the encoding of content with the encoding of built files/resources. Having only the Maven properties is not enough. Setting -Dfile.encoding=UTF8 is not effective either. To avoid encoding issues, you should follow these simple rules.

Set the Maven encoding, as described above:

    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

Always set the encoding explicitly when you work with files, strings, and I/O in your code. If you do not follow this rule, your application depends on the environment. -Dfile.encoding=UTF8 is precisely what configures the run-time environment, but we should not depend on it. If you have thousands of clients, it takes more effort to configure their systems and to track down issues caused by it. It is just an additional dependency, which you can avoid by setting the encoding explicitly. Some methods in Java that use a default encoding are marked as deprecated because of it.

Make sure the content you are working with is also in the encoding that you expect. If it is not, the previous steps do not matter! For instance, a file will not be processed correctly if you expect UTF-8 but its encoding is something else. To check a file's encoding on Linux:

$ file --mime F_PRDAUFT.dsv
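
A similar check can be approximated from Java itself. The following is only a sketch, and the class and method names (Utf8Validator, isValidUtf8) are made up for illustration: it uses a strict CharsetDecoder that reports malformed input instead of silently replacing it. Note that passing this check only means the bytes are well-formed UTF-8, not that the file was meant to be UTF-8 (pure ASCII content passes as well).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class Utf8Validator {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad input instead of substituting a replacement char.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same character (a-umlaut) encoded in two different ways.
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "\u00e4".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes valid:      " + isValidUtf8(utf8));
        System.out.println("ISO-8859-1 bytes valid: " + isValidUtf8(latin1));
    }
}
```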

Force clients and servers to set the encoding explicitly in requests and responses. Here are examples:

@Produces("application/json; charset=UTF-8")
@Consumes("application/json; charset=UTF-8")
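
For the rule about always setting the encoding explicitly in your own code, a minimal sketch could look like the following. It is Java 8 compatible and uses only the standard library; the class name ExplicitCharsetIo and its methods are hypothetical. The point is that every conversion between bytes and text names StandardCharsets.UTF_8, so the result should not depend on -Dfile.encoding or the platform default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names are illustrative only.
class ExplicitCharsetIo {

    // Writes text as UTF-8, regardless of the JVM default charset.
    static void writeUtf8(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads text as UTF-8, regardless of the JVM default charset.
    static String readUtf8(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and reads it back.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                writeUtf8(file, text);
                return readUtf8(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII letters written as escapes so this file is pure ASCII.
        String original = "\u00e4\u00f6\u00e5";
        System.out.println("Round trip intact: " + original.equals(roundTrip(original)));
    }
}
```

By contrast, calls such as new String(bytes) or text.getBytes() without a charset argument fall back to the JVM default, which is exactly the environment dependency this rule tries to avoid.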

Hope this will be useful to someone.


No, the only global encoding setting is the one made via an environment variable: stackoverflow.com/a/9976788/715269
fsimon

Try this:

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>2.7</version>
        <configuration>
          ...
          <encoding>UTF-8</encoding>
          ...
        </configuration>
      </plugin>
    </plugins>
    ...
  </build>
  ...
</project>

It is particularly important not to forget that not only the sources but also the resources need this encoding setting.
isapir

In my case I was using the maven-dependency-plugin so in order to resolve the issue I had to add the following property:

  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

See Apache Maven Resources Plugin / Specifying a character encoding scheme