I'm trying to read from a text/plain file over the internet, line-by-line. The code I have right now is:
URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList();
String readLine;
while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}
for (String line : lines) {
    out.println("> " + line);
}
The file, test.txt, contains ¡Hélló!, which I am using in order to test the encoding.
When I review the OutputStream (out), I see it garbled as > Â¡HÃ©llÃ³!. I don't believe this is a problem with the OutputStream, since I can do out.println("é"); without problems.
Any ideas for reading from the InputStream as UTF-8? Thanks!
text/plain file, unfortunately, and it's not using a UTF-8 encoding. I wasn't aware of any good network libraries; any suggestions?
Solved my own problem. This line:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
needs to be:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
or since Java 7:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
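For anyone who wants to try the fix in isolation, here is a minimal sketch. The class name Utf8Lines and the helper readLines are made up for illustration, and an in-memory ByteArrayInputStream stands in for url.openStream() so the example runs without network access. The only essential part is passing StandardCharsets.UTF_8 to the InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Lines {

    // Hypothetical helper: reads every line of the stream, decoding the bytes as UTF-8.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): the UTF-8 bytes of the test file.
        // Unicode escapes keep the source file itself encoding-neutral.
        byte[] data = "\u00a1H\u00e9ll\u00f3!\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println("> " + line);
        }
    }
}
```

With a real URL, readLines(url.openStream()) should behave the same way, assuming the server really sends UTF-8 bytes.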
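String file = "";
// try-with-resources closes the reader (and the underlying stream) automatically
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8), 8192)) {
    StringBuilder sb = new StringBuilder();
    String str;
    while ((str = br.readLine()) != null) {
        sb.append(str); // note: readLine() strips the line terminator
    }
    file = sb.toString();
} catch (IOException e) {
    e.printStackTrace(); // don't swallow I/O errors silently
}
Try this.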
I ran into the same problem: every special character was displayed as ��. To solve this, I tried the ISO-8859-1 encoding, which works when the file is actually stored as ISO-8859-1 rather than UTF-8:
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"));
String line;
while ((line = br.readLine()) != null) {
    // process line
}
I hope this can help anyone who sees this post.
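To see why the charset has to match the file's real encoding, here is a small sketch that decodes the same bytes both ways. The class name CharsetMismatch and the helper decode are hypothetical, chosen just for this illustration. Decoding UTF-8 bytes as ISO-8859-1 gives the Â¡HÃ©llÃ³! style of garbage, while decoding ISO-8859-1 bytes as UTF-8 gives replacement characters (�).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {

    // Hypothetical helper: decodes raw bytes using the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00a1H\u00e9ll\u00f3!"; // the test string from the question

        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);

        // Matching charset: round-trips cleanly
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));
        // UTF-8 bytes read as ISO-8859-1: each multi-byte character becomes two characters
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
        // ISO-8859-1 bytes read as UTF-8: invalid sequences become U+FFFD
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));
    }
}
```

What actually appears on the console also depends on the terminal's own encoding, so treat the printed output as indicative only.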
If you use the constructor InputStreamReader(InputStream in, Charset cs), bad characters are silently replaced (with U+FFFD). To change this behaviour, use a CharsetDecoder:
public static Reader newReader(InputStream is) {
    return new InputStreamReader(is,
        StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
    );
}
Then catch java.nio.charset.CharacterCodingException.
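As a rough illustration of the strict decoder in use, the sketch below wraps it in a hypothetical isValidUtf8 helper (the class name StrictUtf8 and the helper are not from any library, just names chosen here). It reads the whole input and reports whether a CharacterCodingException was raised along the way.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {

    // Reader that throws on invalid UTF-8 instead of substituting U+FFFD
    public static Reader newReader(InputStream is) {
        return new InputStreamReader(is,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Hypothetical helper: true if the bytes decode cleanly as UTF-8
    public static boolean isValidUtf8(byte[] bytes) throws IOException {
        try (BufferedReader br = new BufferedReader(
                newReader(new ByteArrayInputStream(bytes)))) {
            while (br.read() != -1) {
                // consume every character; decoding errors surface as exceptions
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] good = "H\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "H\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

In real code you would more likely let the exception propagate (or log it) rather than reduce it to a boolean; the helper just makes the difference from the lenient constructors easy to see.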
CharsetDecoder dec argument. This is the same Java design bug that the OutputStreamWriter constructors have: only one of the four actually condescends to tell you when something goes wrong. You again have to use the fancy CharsetDecoder dec argument there, too. The only safe and sane thing to do is to consider all other constructors deprecated, because they cannot be trusted to behave.
StandardCharsets.UTF_8