How to write a UTF-8 file with Java?

My current code creates a file in the Windows-1252 code page, and I want to force it to create a UTF-8 file.

The code works as it is, but I need the file saved as UTF-8. Can I pass a parameter or something?

This is what I have. Any help is really appreciated.

var out = new java.io.FileWriter( new java.io.File( path )),
        text = new java.lang.String( src || "" );
    out.write( text, 0, text.length() );
    out.flush();
    out.close();
Please post code which passes the compiler, if possible.
it seems to be rhino (javascript)

Neuron

Instead of using FileWriter, create a FileOutputStream. You can then wrap it in an OutputStreamWriter, which lets you pass an encoding to the constructor. Then write your data to that writer inside a try-with-resources statement:

try (OutputStreamWriter writer =
             new OutputStreamWriter(new FileOutputStream(PROPERTIES_FILE), StandardCharsets.UTF_8)) {
    // do stuff
}

... and curse at Sun not putting in a constructor to FileWriter which takes a Charset.
It does seem like an odd oversight. And they still haven't fixed it.
@Jon Skeet: Given that FileWriter is a wrapper for FileOutputStream that assumes the default encoding and buffer size, wouldn't that defeat the point?
Sorry, I meant for OutputStreamWriter, not for FileOutputStream.
I recommend declaring each resource that implements the Closeable interface separately, such as "new FileOutputStream", especially when you use try-with-resources. It is good practice and avoids later errors like "IOException: Too many open files".
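For later readers: since Java 11, FileWriter does have constructors that take a Charset, so the wrapping is optional on newer runtimes. A minimal sketch, assuming Java 11 or later; the class name, method name, and "out.txt" are made up for illustration.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FileWriterUtf8 {
    // Hypothetical helper: writes text to the file as UTF-8.
    // Uses the FileWriter(File, Charset) constructor added in Java 11.
    public static void writeUtf8(File file, String text) throws IOException {
        try (FileWriter writer = new FileWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // "out.txt" is a placeholder file name.
        writeUtf8(new File("out.txt"), "h\u00e9llo");
    }
}
```

On Java 10 and earlier this does not compile, and the OutputStreamWriter approach above is still the way to go.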
nhahtdh

Try this

Writer out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream("outfilename"), "UTF-8"));
try {
    out.write(aString);
} finally {
    out.close();
}

I think there is a typo. Writer out = ... should be corrected to BufferedWriter out = ... .
Writer is the abstract class and BufferedWriter extends it; write() and close() are both declared on Writer, so the declaration is fine as it is.
This creates an actual UTF-8 without BOM, not just UTF-8. Is there a way to force that?
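If the goal is a file that starts with a BOM, one option is to write the character U+FEFF yourself before the text, since Java's UTF-8 encoder does not add one. A minimal sketch; the class name, method name, and "with-bom.txt" are made up for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8BomWriter {
    // Hypothetical helper: writes text as UTF-8 preceded by a byte order mark.
    public static void writeWithBom(Path file, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // U+FEFF is encoded as the bytes EF BB BF in UTF-8
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // "with-bom.txt" is a placeholder file name.
        writeWithBom(Paths.get("with-bom.txt"), "Hello World!");
    }
}
```

Whether you want the BOM depends on what reads the file: some Windows tools expect it, while many other parsers treat it as stray content.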
Neuron

Try using FileUtils.write from Apache Commons.

You should be able to do something like:

File f = new File("output.txt"); 
FileUtils.writeStringToFile(f, document.outerHtml(), "UTF-8");

This will create the file if it does not exist.


This also produces a UTF-8 file without a BOM ... I don't know if it's relevant or not.
@Smarty only if you are already using Apache Commons. Otherwise it seems an awful waste to include yet another jar just because you don't want to write a few more characters.
I couldn't see a 'write(..)' method in FileUtils class. I checked in the commons IO 1.4
If you read the Java docs on the link shown in the question, then it tells you the version of the Commons IO API where the write APIs were introduced. It looks like the write APIs were introduced from v2.0 onwards.
Just would like to mention that I used the method FileUtils.writeStringToFile(...) (with commons-io-1.3.1.jar) instead of FileUtils.write(...).
Neuron

Since Java 7 you can do the same with Files.newBufferedWriter a little more succinctly:

Path logFile = Paths.get("/tmp/example.txt");
try (BufferedWriter writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8)) {
    writer.write("Hello World!");
    // ...
}

Emperorlou

All of the answers given here won't work, since Java's UTF-8 writing is bugged.

http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html


As far as I can tell, the bug is this one (since the author of that article doesn't bother to mention it): bugs.sun.com/view_bug.do?bug_id=4508058
The only issue when writing is the missing BOM. No big deal. Reading a file with a BOM on the other hand requires stripping it manually.
UTF-8 doesn't need BOM, so technically the written file is still a valid UTF-8 encoded text file. The bug is with reading an UTF-8 with BOM.
@Chris the bugs.sun.com link is broken. Do you have one that works?
Still works for me; I'm not logged in or anything. Try googling for bug 4508058.
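To illustrate the reading side mentioned in the comments: Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF, so it has to be dropped by hand. A minimal sketch; the class name, method name, and "input.txt" are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8BomReader {
    // Hypothetical helper: reads a UTF-8 file and drops a leading BOM if present.
    public static String readWithoutBom(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // The decoder does not strip the BOM; it shows up as U+FEFF at index 0.
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder file name.
        System.out.println(readWithoutBom(Paths.get("input.txt")));
    }
}
```

Files without a BOM pass through unchanged, so the same helper can be used for both kinds of input.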
boxofrats
var out = new java.io.PrintWriter(new java.io.File(path), "UTF-8");
var text = new java.lang.String( src || "" );
out.print(text);
out.flush();
out.close();

McDowell

The Java 7 Files utility type is useful for working with files:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.io.IOException;
import java.util.*;

public class WriteReadUtf8 {
  public static void main(String[] args) throws IOException {
    List<String> lines = Arrays.asList("These", "are", "lines");

    Path textFile = Paths.get("foo.txt");
    Files.write(textFile, lines, StandardCharsets.UTF_8);

    List<String> read = Files.readAllLines(textFile, StandardCharsets.UTF_8);

    System.out.println(lines.equals(read));
  }
}

The Java 8 version allows you to omit the Charset argument - the methods default to UTF-8.
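A minimal sketch of the shorter Java 8 form, assuming the overloads Files.write(Path, Iterable) and Files.readAllLines(Path); the class name, method name, and "foo.txt" are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class DefaultUtf8RoundTrip {
    // Hypothetical helper: writes the lines and reads them back, both without a Charset.
    // On Java 8 and later these overloads use UTF-8.
    public static List<String> roundTrip(Path file, List<String> lines) throws IOException {
        Files.write(file, lines);
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        // "foo.txt" is a placeholder file name.
        List<String> lines = Arrays.asList("caf\u00e9", "na\u00efve");
        System.out.println(lines.equals(roundTrip(Paths.get("foo.txt"), lines)));
    }
}
```

Passing StandardCharsets.UTF_8 explicitly still works and makes the intent obvious to readers who do not know the default.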


Dharmesh Patel

We can write a UTF-8 encoded file in Java with PrintWriter, for example to write UTF-8 encoded XML:

PrintWriter out1 = new PrintWriter(new File("C:\\abc.xml"), "UTF-8");

Ammad

The sample code below reads a file line by line and writes a new file in UTF-8. I am explicitly specifying the Cp1252 encoding for the input.

    public static void main(String[] args) throws IOException {
        // Read the source as Cp1252 and write the copy as UTF-8.
        // try-with-resources closes both streams even if one close() throws.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream("c:\\filenonUTF.txt"), "Cp1252"));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("c:\\fileUTF.txt"), "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                out.write(line);
                out.write("\n");
            }
        }
    }