
How can I read a large text file line by line using Java?

I need to read a large text file of around 5-6 GB line by line using Java.

How can I do this quickly?

@kamaci et al. This question should not be marked as a duplicate. "Quickly read the last line" is not an alternative, and it's debatable whether "Quickest way to read text-file line by line" is. The quickest way to do something is not necessarily the common way. Furthermore, the answers below include code; the most relevant alternative you list does not. This question is useful. It is currently the top Google search result for "java read file line by line". Finally, it's off-putting to arrive at Stack Overflow and find that 1 in every 2 questions is flagged for disposal.
Here is a comparison of speed for six possible implementations.
Even though I have been reading comments arguing that SO's close policy sucks, SO persists in it. It's such a narrow-minded developer perspective to want to avoid redundancy at all costs! Just let it be! The cream will rise to the top and the sh*t will sink to the bottom just fine all by itself. Even though a question may have been asked before (which question hasn't?), that does not mean that a new question may not phrase it better, get better answers, rank higher in search engines, etc. Interestingly, this question is now 'protected'...
It's incredible how questions get marked as duplicate by just reading the title.
After Shog's edit this is indeed a duplicate of stackoverflow.com/q/5800361/103167 but this one has gotten far more activity.

R
Rob Kielty

A common pattern is to use

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
       // process the line.
    }
}

You can read the data slightly faster if you assume there is no multi-byte character encoding (e.g. ASCII-7), but it won't make much difference. It is highly likely that what you do with the data will take much longer.
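For illustration, here is a minimal sketch of stating that single-byte-charset assumption explicitly by passing it to Files.newBufferedReader (the temp file is created only for the demonstration; note that newBufferedReader's decoder will reject bytes that are invalid in the stated charset):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiReadSketch {
    // Reads all lines assuming the file is plain ASCII. The charset is an
    // assumption about the input; Files.newBufferedReader will throw
    // MalformedInputException if a byte falls outside it.
    static long countLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.US_ASCII)) {
            while (br.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ascii-sketch", ".txt"); // demo input
        Files.write(tmp, java.util.List.of("first", "second", "third"));
        System.out.println(countLines(tmp)); // prints 3
        Files.delete(tmp);
    }
}
```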

EDIT: A less common pattern, which avoids letting the scope of line leak outside the loop.

try(BufferedReader br = new BufferedReader(new FileReader(file))) {
    for(String line; (line = br.readLine()) != null; ) {
        // process the line.
    }
    // line is not visible here.
}

UPDATE: In Java 8 you can do

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
}

NOTE: You have to place the Stream in a try-with-resource block to ensure the #close method is called on it, otherwise the underlying file handle is never closed until GC does it much later.


What does this pattern look like with proper exception handling? I note that br.close() throws IOException, which seems surprising -- what could happen when closing a file that is opened for read, anyway? FileReader's constructor might throw a FileNotFound exception.
If I have a 200MB file and it can read at 90MB/s then I expect it to take ~3s? Mine seem to take minutes, with this "slow" way of reading. I am on an SSD so read speeds should not be a problem?
@JiewMeng So I would suspect something else you are doing is taking time. Can you try just reading the lines of the file and nothing else?
Why not for(String line = br.readLine(); line != null; line = br.readLine()) Btw, in Java 8 you can do try( Stream<String> lines = Files.lines(...) ){ for( String line : (Iterable<String>) lines::iterator ) { ... } } Which is hard not to hate.
@AleksandrDubinsky The problem I have with closures in Java 8 is that they very easily make the code more complicated to read (as well as being slower). I can see lots of developers overusing them because they are "cool".
L
Leonardo Alves Machado

Look at this blog:

Java Read File Line by Line - Java Tutorial

The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));

String strLine;

//Read File Line By Line
while ((strLine = br.readLine()) != null)   {
  // Print the content on the console
  System.out.println (strLine);
}

//Close the reader (closing it also closes the underlying input stream)
br.close();
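The buffer-size remark above can be sketched as follows: BufferedReader accepts an explicit buffer size (in chars) as a second constructor argument, 8192 being the documented default. This helper is illustrative only:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferSizeSketch {
    // BufferedReader's second constructor argument sets its internal
    // buffer size in chars; 8192 is the documented default. A larger
    // buffer can help a little on big sequential files.
    static long countLines(Path file, int bufChars) throws IOException {
        long count = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8),
                bufChars)) {
            while (br.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffer-sketch", ".txt"); // demo input
        Files.write(tmp, java.util.List.of("one", "two"));
        System.out.println(countLines(tmp, 64 * 1024)); // prints 2
        Files.delete(tmp);
    }
}
```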

My file is 1.5 Gig and it's not possible to read the file using your answer!
@AboozarRajabi Of course it is possible. This code can read any text file.
Downvoted for poor quality link. There is a completely pointless DataInputStream, and the wrong stream is closed. Nothing wrong with the Java Tutorial, and no need to cite arbitrary third-party Internet rubbish like this.
I'd ditch the comments, you have 4 lines of 100% redundant comments for 6 lines of code.
P
Peter Mortensen

Once Java 8 is out (March 2014) you'll be able to use streams:

try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
  lines.forEachOrdered(line -> process(line));
}

Printing all the lines in the file:

try (Stream<String> lines = Files.lines(file, Charset.defaultCharset())) {
  lines.forEachOrdered(System.out::println);
}

Use StandardCharsets.UTF_8, use Stream<String> for conciseness, and avoid using forEach() and especially forEachOrdered() unless there's a reason.
Why avoid forEach()? Is it bad?
If I use forEach instead of forEachOrdered, the lines might be printed out of order, mightn't they?
@steventrouble Take a look at: stackoverflow.com/questions/16635398/… It's not bad if you pass a short function reference like forEach(this::process), but it gets ugly if you write blocks of code as lambdas inside forEach().
@msayag, You're right, you need forEachOrdered in order to execute in-order. Be aware that you won't be able to parallelize the stream in that case, although I've found that parallelization doesn't turn on unless the file has thousands of lines.
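To illustrate the trade-off discussed in these comments: order-insensitive work (counting, filtering) can safely run on a parallel stream, while ordered output needs forEachOrdered() on a sequential one. A minimal sketch, using a throwaway temp file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ParallelLinesSketch {
    // Counting is order-insensitive, so parallel() is safe here;
    // printing lines in file order would instead require a sequential
    // stream with forEachOrdered().
    static long countNonEmpty(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.parallel().filter(l -> !l.isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("par-sketch", ".txt"); // demo input
        Files.write(tmp, java.util.List.of("a", "", "b"));
        System.out.println(countNonEmpty(tmp)); // prints 2
        Files.delete(tmp);
    }
}
```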
O
OneCricketeer

Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.

If you just want the default charset you can skip the InputStream and use FileReader.

InputStream ins = null; // raw byte-stream
Reader r = null; // cooked reader
BufferedReader br = null; // buffered for readLine()
try {
    String s;
    ins = new FileInputStream("textfile.txt");
    r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
    br = new BufferedReader(r);
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
}
catch (Exception e)
{
    System.err.println(e.getMessage()); // handle exception
}
finally {
    if (br != null) { try { br.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (r != null) { try { r.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (ins != null) { try { ins.close(); } catch(Throwable t) { /* ensure close happens */ } }
}

Here is the Groovy version, with full error handling:

File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
    br.eachLine { line ->
        println line;
    }
}

What does a ByteArrayInputStream fed by a string literal have to do with reading a large text file?
Absolutely useless closes. There is zero reason to close every stream; if you close any of those streams, you automatically close all the others...
g
gomisha

I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.

Note that when running the performance tests I didn't output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.
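As a rough illustration of that methodology (not the author's actual benchmark harness), a single-pass timing without per-line console output might look like this; a serious benchmark would also warm up the JVM and average several runs, e.g. with JMH:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class ReadTimingSketch {
    // Times one pass over the file without printing every line, so
    // console I/O does not dominate the measurement.
    static long timeReadMillis(Path file) throws IOException {
        long start = System.nanoTime();
        long lines = 0;
        try (BufferedReader br = Files.newBufferedReader(file)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) throws IOException {
        // A small generated file stands in for the 1GB test file.
        Path tmp = Files.createTempFile("timing-sketch", ".txt");
        Files.write(tmp, Collections.nCopies(10_000, "some sample line"));
        timeReadMillis(tmp);
        Files.delete(tmp);
    }
}
```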

1) java.nio.file.Files.readAllBytes()

Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
  public static void main(String [] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    byte [] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for(byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}

2) java.nio.file.Files.lines()

This was tested successfully in Java 8 and 9 but it won't work in Java 7 because of the lack of support for lambda expressions. It took about 3.5 seconds to read in a 1GB file which put it in second place as far as reading larger files.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
  public static void main(String[] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}

3) BufferedReader

Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
  public static void main(String [] args) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}

You can find the complete rankings for all 10 file reading methods here.


Your guide is amazing :)
You are mostly timing System.out.print/println() here; you are also assuming the file will fit into memory in your first two cases.
Fair enough. Maybe I could've made those assumptions more explicit in my answer.
the question asked for reading line by line, only last method qualifies...
@eis Given that he tested 10 ways to read a file and the third fastest is line-by-line, it can be assumed reasonably that the third method shown here is also the fastest way to read a file line-by-line. I would argue then that he not only fully answered the question, but gave additional information as well which is quite useful to know.
C
Community

In Java 8, you could do:

try (Stream<String> lines = Files.lines (file, StandardCharsets.UTF_8))
{
    for (String line : (Iterable<String>) lines::iterator)
    {
        // process the line.
    }
}

Some notes: The stream returned by Files.lines (unlike most streams) needs to be closed. For the reasons mentioned here I avoid using forEach(). The strange code (Iterable<String>) lines::iterator casts a Stream to an Iterable.


Because Stream does not implement Iterable, this code is decidedly ugly, although useful. It needs a cast (i.e. (Iterable<String>)) to work.
How can I skip the first line with this method?
@qed for(String line : (Iterable<String>) lines.skip(1)::iterator)
If you’re not intending to actually use Stream features, using Files.newBufferedReader instead of Files.lines and repeatedly calling readLine() until null instead of using constructs like (Iterable<String>) lines::iterator seems to be much simpler…
@user207421 Why do you say it reads the file into memory? The javadoc says, Unlike readAllLines, [File.lines] does not read all lines into a List, but instead populates lazily as the stream is consumed... The returned stream encapsulates a Reader.
g
guido

What you can do is scan the entire text using Scanner and go through the text line by line. Of course you should import the following:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void readText() throws FileNotFoundException {
    Scanner scan = new Scanner(new File("samplefilename.txt"));
    while(scan.hasNextLine()){
        String line = scan.nextLine();
        //Here you can manipulate the string the way you want
    }
}

Scanner basically scans all the text. The while loop is used to traverse through the entire text.

The .hasNextLine() method returns true if there are still more lines in the text. The .nextLine() method gives you an entire line as a String, which you can then use the way you want. Try System.out.println(line) to print the text.

Side note: the .txt extension indicates a plain-text file.


Shouldn't the method declaration be public static void readText() throws FileNotFoundException { rather than public static void readText throws FileNotFoundException() {?
This is considerably slower than BufferedReader.readLine(), and he asked for the best-performing method.
י
ישו אוהב אותך

FileReader won't let you specify the encoding; use InputStreamReader instead if you need to specify it:

try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filePath), "Cp1252"))) {

    String line;
    while ((line = br.readLine()) != null) {
        // process the line.
    }

} catch (IOException e) {
    e.printStackTrace();
}

If you imported this file from Windows, it might have ANSI encoding (Cp1252), so you have to specify the encoding.


D
Diego Duarte

In Java 7:

String folderPath = "C:/folderOfMyFile";
Path path = Paths.get(folderPath, "myFileName.csv"); //or any text file eg.: txt, bat, etc
Charset charset = Charset.forName("UTF-8");

try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
  String line;
  while ((line = reader.readLine()) != null) {
    //separate all csv fields into string array
    String[] lineVariables = line.split(","); 
  }
} catch (IOException e) {
    System.err.println(e);
}

be aware! using line.split this way will NOT parse properly if a field contains a comma and it is surrounded by quotes. This split will ignore that and just separate the field in chunks using the internal comma. HTH, Marcelo.
CSV: Comma Separated Values file, thus you shouldn't use comma in a csv field, unless you mean to add another field. So, use split for comma token in java when parsing a CSV file is perfectly fine and right
Diego, this is not correct. The only CSV standard (RFC 4180) specifically says "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."
Use StandardCharsets.UTF_8 to avoid the checked exception in Charset.forName("UTF-8")
Thank you "Diego Duarte" for your comment; I must say I agree with what "serg.nechaev" replies. I see commas embedded in CSV files all the time; people expect that this will be accepted, with all due respect. Also a big thanks to "serg.nechaev". IMHO you are right. Cheers, everyone.
d
djna

In Java 8, there is also an alternative to using Files.lines(). If your input source isn't a file but something more abstract, like a Reader or an InputStream, you can stream the lines via BufferedReader's lines() method.

For example:

try (BufferedReader reader = new BufferedReader(...)) {
  reader.lines().forEach(line -> processLine(line));
}

will call processLine() for each input line read by the BufferedReader.
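Since lines() is defined on BufferedReader itself, any Reader can feed it; in this sketch a StringReader stands in for the more abstract source:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class ReaderLinesSketch {
    // lines() works on any Reader, not just files. Here the per-line
    // processing step (upper-casing) is only a placeholder.
    static List<String> upperLines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader.lines().map(String::toUpperCase).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperLines("one\ntwo")); // prints [ONE, TWO]
    }
}
```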


P
Peter Mortensen

For reading a file with Java 8

package com.java.java8;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

/**
 * The Class ReadLargeFile.
 *
 * @author Ankit Sood Apr 20, 2017
 */
public class ReadLargeFile {

    /**
     * The main method.
     *
     * @param args
     *            the arguments
     */
    public static void main(String[] args) {
        try (Stream<String> stream = Files.lines(Paths.get("C:\\Users\\System\\Desktop\\demoData.txt"))) {
            stream.forEach(System.out::println);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

A
Abhilash

You can use the Scanner class:

Scanner sc=new Scanner(file);
sc.nextLine();

@Tim 'Bomb horribly' is not a term I recognize in CS. What exactly do you mean?
Bog down, execute very slowly, most likely crash. I probably should avoid idioms on this site ;)
@Tim Why would it do so?
Using Scanner is fine, but this answer does not include the full code to use it properly.
@Tim This code will neither 'bomb horribly' nor 'bog down' nor 'execute very slowly' nor 'most likely crash'. As a matter of fact, as written it will only read one line, almost instantaneously. You can read megabytes per second this way, although BufferedReader.readLine() is certainly several times as fast. If you think otherwise, please provide your reasons.
C
Community

Java 9:

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
}

I think you have to System.getProperty("os.name").equals("Linux")
Don't compare strings with == !
This is the canonical Java 8 example, as already posted by others. Why do you claim that this is “Java-9”?
@Holger memory mapped files that he forgot to mention may be?
to process it line by line you can do try (Stream stream = Files.lines(Paths.get(inputFile))) { stream.forEach((line) -> { System.out.println(line); }); }
u
user207421

You need to use the readLine() method of class BufferedReader. Create a new object of that class, call this method on it, and save the result in a String.

BufferedReader Javadoc


Seems like link to BufferReaderAPI is broken
R
Rajamohan S

A clear way to achieve this:

For example:

If you have dataFile.txt in your current directory:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class readByLine
{
    public readByLine() throws FileNotFoundException
    {
        Scanner linReader = new Scanner(new File("dataFile.txt"));

        while (linReader.hasNextLine())
        {
            String line = linReader.nextLine();
            System.out.println(line);
        }
        linReader.close();

    }

    public static void main(String args[])  throws FileNotFoundException
    {
        new readByLine();
    }
}

https://i.stack.imgur.com/W3xLx.jpg


Why is it clearer? And don't post pictures of text here. Post the text.
You posted a picture. It is a picture of text. You could have cut and pasted the text directly into this page. Nobody said anything about posting programs. Posting pictures of text is a waste of your time, which I don't care about, and your bandwidth, which I do.
Y
YakovL
BufferedReader br;
FileInputStream fin;
try {
    fin = new FileInputStream(fileName);
    br = new BufferedReader(new InputStreamReader(fin));

    /*Path pathToFile = Paths.get(fileName);
    br = Files.newBufferedReader(pathToFile,StandardCharsets.US_ASCII);*/

    String line = br.readLine();
    while (line != null) {
        String[] attributes = line.split(",");
        Movie movie = createMovie(attributes);
        movies.add(movie);
        line = br.readLine();
    }
    br.close(); // closing the reader also closes the underlying stream
} catch (FileNotFoundException e) {
    System.out.println("Your Message");
} catch (IOException e) {
    System.out.println("Your Message");
}

It works for me. Hope It will help you too.


K
Kirill

You can use streams to do it more precisely:

Files.lines(Paths.get("input.txt")).forEach(s -> stringBuffer.append(s));

I agree that it is actually fine. I guess people dislike it because of the strange StringBuffer choice (StringBuilder is generally preferred, even though it might just be a bad name for the variable). Also because it is already mentioned above.
B
Binkan Salaryman

I usually keep the reading routine straightforward:

void readResource(InputStream source) throws IOException {
    BufferedReader stream = null;
    try {
        stream = new BufferedReader(new InputStreamReader(source));
        while (true) {
            String line = stream.readLine();
            if(line == null) {
                break;
            }
            //process line
            System.out.println(line);
        }
    } finally {
        closeQuiet(stream);
    }
}

static void closeQuiet(Closeable closeable) {
    if (closeable != null) {
        try {
            closeable.close();
        } catch (IOException ignore) {
        }
    }
}

P
Peter Mortensen

Using the org.apache.commons.io package gives better performance, especially in legacy code that uses Java 6 and below.

Java 7 has a better API with less exception handling and more useful methods:

LineIterator lineIterator = null;
try {
    lineIterator = FileUtils.lineIterator(new File("/home/username/m.log"), "windows-1256"); // The second parameter is optional
    while (lineIterator.hasNext()) {
        String currentLine = lineIterator.next();
        // Some operation
    }
}
finally {
    LineIterator.closeQuietly(lineIterator);
}

Maven

<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

U
Usman Yaqoob

You can use this code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class ReadTextFile {

    public static void main(String[] args) throws IOException {

        try {

            File f = new File("src/com/data.txt");

            BufferedReader b = new BufferedReader(new FileReader(f));

            String readLine = "";

            System.out.println("Reading file using Buffered Reader");

            while ((readLine = b.readLine()) != null) {
                System.out.println(readLine);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

An explanation would be in order.
P
Peter Mortensen

You can also use Apache Commons IO:

File file = new File("/home/user/file.txt");
try {
    List<String> lines = FileUtils.readLines(file);
} catch (IOException e) {
    e.printStackTrace();
}
}

FileUtils.readLines(file) is a deprecated method. Additionally, the method invokes IOUtils.readLines, which uses a BufferedReader and ArrayList. This is not a line-by-line method, and certainly not one that would be practical for reading several GB.
A
Arefe

You can read file data line by line as below:

String fileLoc = "fileLocationInTheDisk";

List<String> lines = Files.lines(Path.of(fileLoc), StandardCharsets.UTF_8).collect(Collectors.toList());

Do you realise you'd be storing the lines from a 5-6GB in memory ? This most probably will result in a memory overflow exception. Also, the OP asked for it to be done quickly, which this also doesn't answer because processing line by line would be much more efficient
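For contrast, here is a sketch of the line-by-line streaming alternative this comment describes, which keeps memory use flat regardless of file size (the per-line work, summing lengths, is just a placeholder):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class StreamingLinesSketch {
    // Processes each line as it is read instead of collecting all lines
    // into a List, so a 5-6 GB file never has to fit in memory at once.
    static long totalChars(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.mapToLong(String::length).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream-sketch", ".txt"); // demo input
        Files.write(tmp, java.util.List.of("abc", "de"));
        System.out.println(totalChars(tmp)); // prints 5
        Files.delete(tmp);
    }
}
```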