
How to get line count of a large file cheaply in Python?

How do I get a line count of a large file in the most memory- and time-efficient manner?

def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1
Do you need an exact line count, or will an approximation suffice?
I would add i = -1 before the for loop, since this code doesn't work for empty files.
@Legend: I bet pico is thinking, get the file size (with seek(0,2) or equiv), divide by approximate line length. You could read a few lines at the beginning to guess the average line length.
enumerate(f, 1) and ditch the i + 1?
@IanMackinnon Works for empty files, but you have to initialize i to 0 before the for-loop.
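Pulling the comment fixes together, a version that also returns 0 for empty files might look like this (a sketch, not from the original thread):

```python
def file_len(filename):
    # Start at -1 so an empty file (loop body never runs) yields 0.
    i = -1
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1
```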

Kyle

One line, probably pretty fast:

num_lines = sum(1 for line in open('myfile.txt'))

It's similar to sum(sequence of 1): every line counts as 1.

>>> [1 for line in range(10)]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> sum(1 for line in range(10))
10
num_lines = sum(1 for line in open('myfile.txt') if line.rstrip()) for filter empty lines
As we open a file, will it be closed automatically once we iterate over all the elements? Is it required to call close()? I think we cannot use with open() in this short statement, right?
A slight lint improvement: num_lines = sum(1 for _ in open('myfile.txt'))
It's not any faster than the other solutions, see stackoverflow.com/a/68385697/353337.
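Regarding the comment about closing the file: wrapping the same expression in a with block is straightforward (a minor variation, not part of Kyle's answer):

```python
def count_lines(path):
    # The with block guarantees the file is closed once the sum is computed.
    with open(path) as f:
        return sum(1 for _ in f)
```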
Yuval Adam

You can't get any better than that.

After all, any solution will have to read the entire file, figure out how many \n you have, and return that result.

Do you have a better way of doing that without reading the entire file? Not sure... The best solution will always be I/O-bound; the best you can do is make sure you don't use unnecessary memory, and it looks like you have that covered.


Exactly, even wc is reading through the file, but in C and it's probably pretty optimized.
As far as I understand the Python file IO is done through C as well. docs.python.org/library/stdtypes.html#file-objects
@Tomalak That's a red herring. While python and wc might be issuing the same syscalls, python has opcode dispatch overhead that wc doesn't have.
You can approximate a line count by sampling. It can be thousands of times faster. See: documentroot.com/2011/02/…
Other answers seem to indicate this categorical answer is wrong, and should therefore be deleted rather than kept as accepted.
Ryan Ginstrom

I believe that a memory-mapped file will be the fastest solution. I tried four functions: the function posted by the OP (opcount); a simple iteration over the lines in the file (simplecount); readline with a memory-mapped file (mapcount); and the buffer-read solution offered by Mykola Kharechko (bufcount).

I ran each function five times, and calculated the average run-time for a 1.2 million-line text file.

Windows XP, Python 2.5, 2GB RAM, 2 GHz AMD processor

Here are my results:

mapcount : 0.465599966049
simplecount : 0.756399965286
bufcount : 0.546800041199
opcount : 0.718600034714

Edit: numbers for Python 2.6:

mapcount : 0.471799945831
simplecount : 0.634400033951
bufcount : 0.468800067902
opcount : 0.602999973297

So the buffer-read strategy seems to be the fastest for Windows/Python 2.6.

Here is the code:

from __future__ import with_statement
import time
import mmap
import random
from collections import defaultdict

def mapcount(filename):
    f = open(filename, "r+")
    buf = mmap.mmap(f.fileno(), 0)
    lines = 0
    readline = buf.readline
    while readline():
        lines += 1
    return lines

def simplecount(filename):
    lines = 0
    for line in open(filename):
        lines += 1
    return lines

def bufcount(filename):
    f = open(filename)                  
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.read # loop optimization

    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')
        buf = read_f(buf_size)

    return lines

def opcount(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1


counts = defaultdict(list)

for i in range(5):
    for func in [mapcount, simplecount, bufcount, opcount]:
        start_time = time.time()
        assert func("big_file.txt") == 1209138
        counts[func].append(time.time() - start_time)

for key, vals in counts.items():
    print key.__name__, ":", sum(vals) / float(len(vals))

It seems that wccount() is the fastest gist.github.com/0ac760859e614cd03652
The buffered read is the fastest solution, not mmap or wccount. See stackoverflow.com/a/68385697/353337.
Quentin Pradet

I had to post this on a similar question until my reputation score jumped a bit (thanks to whoever bumped me!).

All of these solutions ignore one way to make this run considerably faster, namely by using the unbuffered (raw) interface, using bytearrays, and doing your own buffering. (This only applies in Python 3. In Python 2, the raw interface may or may not be used by default, but in Python 3, you'll default into Unicode.)

Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:

def rawcount(filename):
    f = open(filename, 'rb')
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.raw.read

    buf = read_f(buf_size)
    while buf:
        lines += buf.count(b'\n')
        buf = read_f(buf_size)

    return lines

Using a separate generator function, this runs a smidge faster:

def _make_gen(reader):
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024*1024)

def rawgencount(filename):
    f = open(filename, 'rb')
    f_gen = _make_gen(f.raw.read)
    return sum( buf.count(b'\n') for buf in f_gen )

This can be done completely with generator expressions in-line using itertools, but it gets pretty weird looking:

from itertools import (takewhile,repeat)

def rawincount(filename):
    f = open(filename, 'rb')
    bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024) for _ in repeat(None)))
    return sum( buf.count(b'\n') for buf in bufgen )

Here are my timings:

function      average, s  min, s   ratio
rawincount        0.0043  0.0041   1.00
rawgencount       0.0044  0.0042   1.01
rawcount          0.0048  0.0045   1.09
bufcount          0.008   0.0068   1.64
wccount           0.01    0.0097   2.35
itercount         0.014   0.014    3.41
opcount           0.02    0.02     4.83
kylecount         0.021   0.021    5.05
simplecount       0.022   0.022    5.25
mapcount          0.037   0.031    7.46

I am working with 100 GB+ files, and your rawgencount is the only feasible solution I have seen so far. Thanks!
is wccount in this table for the subprocess shell wc tool?
Thanks @michael-bacon, it's a really nice solution. You can make the rawincount solution less weird looking by using bufgen = iter(partial(f.raw.read, 1024*1024), b'') instead of combining takewhile and repeat.
Oh, partial function, yeah, that's a nice little tweak. Also, I assumed that the 1024*1024 would get merged by the interpreter and treated as a constant, but that was a hunch, not documentation.
@MichaelBacon, would it be faster to open the file with buffering=0 and then calling read instead of just opening the file as "rb" and calling raw.read, or will that be optimized to the same thing?
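For reference, the partial-based variant suggested in the comments might be sketched like this (assuming, as in the answer above, that the file is opened in binary mode so f.raw is available):

```python
from functools import partial

def rawincount_partial(filename):
    with open(filename, 'rb') as f:
        # iter(callable, sentinel): calls f.raw.read(1024*1024) until it returns b''
        bufgen = iter(partial(f.raw.read, 1024 * 1024), b'')
        return sum(buf.count(b'\n') for buf in bufgen)
```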
nosklo

You could execute a subprocess and run wc -l filename

import subprocess

def file_len(fname):
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE, 
                                              stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])

what would be the windows version of this?
You can refer to this SO question regarding that. stackoverflow.com/questions/247234/…
Indeed, in my case (Mac OS X) this takes 0.13s versus 0.5s for counting the number of lines "for x in file(...)" produces, versus 1.0s counting repeated calls to str.find or mmap.find. (The file I used to test this has 1.3 million lines.)
No need to involve the shell for that. Edited the answer and added example code.
It's not cross-platform.
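One way to address the portability concern is to fall back to a pure-Python count when wc is not available; a sketch (the fallback counts b'\n' bytes, which is what wc -l counts too):

```python
import shutil
import subprocess

def file_len_portable(fname):
    if shutil.which('wc'):
        out = subprocess.check_output(['wc', '-l', fname])
        return int(out.split()[0])
    # Fallback: count newline bytes in 1 MiB chunks
    with open(fname, 'rb') as f:
        return sum(buf.count(b'\n') for buf in iter(lambda: f.read(1 << 20), b''))
```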
namit

Here is a Python program that uses the multiprocessing library to distribute the line counting across machines/cores. My test improves counting a 20-million-line file from 26 seconds to 7 seconds using an 8-core Windows x64 server. Note: not using memory mapping makes things much slower.

import multiprocessing, sys, time, os, mmap
import logging, logging.handlers

def init_logger(pid):
    console_format = 'P{0} %(levelname)s %(message)s'.format(pid)
    logger = logging.getLogger()  # New logger at root level
    logger.setLevel( logging.INFO )
    logger.handlers.append( logging.StreamHandler() )
    logger.handlers[0].setFormatter( logging.Formatter( console_format, '%d/%m/%y %H:%M:%S' ) )

def getFileLineCount( queues, pid, processes, file1 ):
    init_logger(pid)
    logging.info( 'start' )

    physical_file = open(file1, "r")
    #  mmap.mmap(fileno, length[, tagname[, access[, offset]]]

    m1 = mmap.mmap( physical_file.fileno(), 0, access=mmap.ACCESS_READ )

    #work out file size to divide up line counting

    fSize = os.stat(file1).st_size
    chunk = (fSize / processes) + 1

    lines = 0

    #get where I start and stop
    _seedStart = chunk * (pid)
    _seekEnd = chunk * (pid+1)
    seekStart = int(_seedStart)
    seekEnd = int(_seekEnd)

    if seekEnd < int(_seekEnd + 1):
        seekEnd += 1

    if _seedStart < int(seekStart + 1):
        seekStart += 1

    if seekEnd > fSize:
        seekEnd = fSize

    #find where to start
    if pid > 0:
        m1.seek( seekStart )
        #read next line
        l1 = m1.readline()  # need to use readline with memory mapped files
        seekStart = m1.tell()

    #tell previous rank my seek start to make their seek end

    if pid > 0:
        queues[pid-1].put( seekStart )
    if pid < processes-1:
        seekEnd = queues[pid].get()

    m1.seek( seekStart )
    l1 = m1.readline()

    while len(l1) > 0:
        lines += 1
        l1 = m1.readline()
        if m1.tell() > seekEnd or len(l1) == 0:
            break

    logging.info( 'done' )
    # add up the results
    if pid == 0:
        for p in range(1,processes):
            lines += queues[0].get()
        queues[0].put(lines) # the total lines counted
    else:
        queues[0].put(lines)

    m1.close()
    physical_file.close()

if __name__ == '__main__':
    init_logger( 'main' )
    if len(sys.argv) > 1:
        file_name = sys.argv[1]
    else:
        logging.fatal( 'parameters required: file-name [processes]' )
        exit()

    t = time.time()
    processes = multiprocessing.cpu_count()
    if len(sys.argv) > 2:
        processes = int(sys.argv[2])
    queues=[] # a queue for each process
    for pid in range(processes):
        queues.append( multiprocessing.Queue() )
    jobs=[]
    prev_pipe = 0
    for pid in range(processes):
        p = multiprocessing.Process( target = getFileLineCount, args=(queues, pid, processes, file_name,) )
        p.start()
        jobs.append(p)

    jobs[0].join() #wait for counting to finish
    lines = queues[0].get()

    logging.info( 'finished {} Lines:{}'.format( time.time() - t, lines ) )

How does this work with files much bigger than main memory? for instance a 20GB file on a system with 4GB RAM and 2 cores
Hard to test now, but I presume it would page the file in and out.
This is pretty neat code. I was surprised to find that it is faster to use multiple processors. I figured that the IO would be the bottleneck. In older Python versions, line 21 needs int() like chunk = int((fSize / processes)) + 1
Does it load the whole file into memory? What about a bigger file whose size is larger than the RAM on the computer?
Would you mind if I formatted the answer with black? black.vercel.app
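A much shorter sketch of the same idea, splitting the file into byte ranges and summing newline counts with a Pool (a hypothetical simplification, not the author's code; newline counts over disjoint byte ranges can simply be added):

```python
import multiprocessing
import os

def _count_range(args):
    path, start, size = args
    with open(path, 'rb') as f:
        f.seek(start)
        return f.read(size).count(b'\n')

def parallel_line_count(path, workers=4):
    total = os.path.getsize(path)
    chunk = total // workers + 1  # ranges together cover the whole file
    tasks = [(path, i * chunk, chunk) for i in range(workers)]
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.map(_count_range, tasks))
```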
Nico Schlömer

After a perfplot analysis, one has to recommend the buffered read solution

def buf_count_newlines_gen(fname):
    def _make_gen(reader):
        while True:
            b = reader(2 ** 16)
            if not b: break
            yield b

    with open(fname, "rb") as f:
        count = sum(buf.count(b"\n") for buf in _make_gen(f.raw.read))
    return count

It's fast and memory-efficient. Most other solutions are about 20 times slower.

https://i.stack.imgur.com/sdUM3.png

Code to reproduce the plot:

import mmap
import subprocess
from functools import partial

import perfplot


def setup(n):
    fname = "t.txt"
    with open(fname, "w") as f:
        for i in range(n):
            f.write(str(i) + "\n")
    return fname


def for_enumerate(fname):
    i = 0
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1


def sum1(fname):
    return sum(1 for _ in open(fname))


def mmap_count(fname):
    with open(fname, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)

    lines = 0
    while buf.readline():
        lines += 1
    return lines


def for_open(fname):
    lines = 0
    for _ in open(fname):
        lines += 1
    return lines


def buf_count_newlines(fname):
    lines = 0
    buf_size = 2 ** 16
    with open(fname) as f:
        buf = f.read(buf_size)
        while buf:
            lines += buf.count("\n")
            buf = f.read(buf_size)
    return lines


def buf_count_newlines_gen(fname):
    def _make_gen(reader):
        b = reader(2 ** 16)
        while b:
            yield b
            b = reader(2 ** 16)

    with open(fname, "rb") as f:
        count = sum(buf.count(b"\n") for buf in _make_gen(f.raw.read))
    return count


def wc_l(fname):
    return int(subprocess.check_output(["wc", "-l", fname]).split()[0])


def sum_partial(fname):
    with open(fname) as f:
        count = sum(x.count("\n") for x in iter(partial(f.read, 2 ** 16), ""))
    return count


def read_count(fname):
    return open(fname).read().count("\n")


b = perfplot.bench(
    setup=setup,
    kernels=[
        for_enumerate,
        sum1,
        mmap_count,
        for_open,
        wc_l,
        buf_count_newlines,
        buf_count_newlines_gen,
        sum_partial,
        read_count,
    ],
    n_range=[2 ** k for k in range(27)],
    xlabel="num lines",
)
b.save("out.png")
b.show()

1''

A one-line bash solution similar to this answer, using the modern subprocess.check_output function:

def line_count(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])

This answer should be voted up to a higher spot in this thread for Linux/Unix users. Despite the majority preference for a cross-platform solution, this is a superb way on Linux/Unix. For a 184-million-line CSV file I have to sample data from, it provides the best runtime. Other pure-Python solutions take on average 100+ seconds, whereas a subprocess call of wc -l takes ~5 seconds.
shell=True is bad for security, it is better to avoid it.
Daniel Lee

I would use Python's file object method readlines, as follows:

with open(input_file) as foo:
    lines = len(foo.readlines())

This opens the file, creates a list of lines in the file, counts the length of the list, saves that to a variable and closes the file again.


While this is one of the first ways that comes to mind, it probably isn't very memory-efficient, especially when counting lines in files up to 10 GB (like I do), which is a noteworthy disadvantage.
@TimeSheep Is this an issue for files with many (say, billions) of small lines, or files which have extremely long lines (say, Gigabytes per line)?
The reason I ask is, it would seem that the compiler should be able to optimize this away by not creating an intermediate list.
@dmityugov Per Python docs, xreadlines has been deprecated since 2.3, as it just returns an iterator. for line in file is the stated replacement. See: docs.python.org/2/library/stdtypes.html#file.xreadlines
Community

This is the fastest thing I have found using pure python. You can use whatever amount of memory you want by setting buffer, though 2**16 appears to be a sweet spot on my computer.

from functools import partial

buffer = 2 ** 16
with open(myfile) as f:
    print sum(x.count('\n') for x in iter(partial(f.read, buffer), ''))

I found the answer here: Why is reading lines from stdin much slower in C++ than Python? and tweaked it just a tiny bit. It's a very good read to understand how to count lines quickly, though wc -l is still about 75% faster than anything else.


pkit
def file_len(full_path):
  """ Count number of lines in a file."""
  f = open(full_path)
  nr_of_lines = sum(1 for line in f)
  f.close()
  return nr_of_lines

The command "sum(1 for line in f)" seems to delete the content of the file. The command "f.readline()" returns null if I put it after that line.
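The effect described in the comment is iterator exhaustion, not deletion: the sum consumes the file object, so a following readline() returns '' (EOF); seek(0) rewinds it. A small demonstration sketch:

```python
def count_then_reread(path):
    f = open(path)
    n = sum(1 for _ in f)       # consumes the iterator; file position is now at EOF
    assert f.readline() == ''   # looks "deleted", but it's just EOF
    f.seek(0)                   # rewind to the start
    first = f.readline()        # the content is still there
    f.close()
    return n, first
```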
radtek

Here is what I use, seems pretty clean:

import subprocess

def count_file_lines(file_path):
    """
    Counts the number of lines in a file using wc utility.
    :param file_path: path to file
    :return: int, no of lines
    """
    num = subprocess.check_output(['wc', '-l', file_path])
    num = num.split(' ')
    return int(num[0])

UPDATE: This is marginally faster than using pure python but at the cost of memory usage. Subprocess will fork a new process with the same memory footprint as the parent process while it executes your command.


Just as a side note, this won't work on Windows of course.
core utils apparently provides "wc" for windows stackoverflow.com/questions/247234/…. You can also use a linux VM in your windows box if your code will end up running in linux in prod.
Or WSL, highly advised over any VM if stuff like this is the only thing you do. :-)
Yeah, that works. I'm not a Windows guy, but from googling I learned WSL = Windows Subsystem for Linux =)
Python 3.7: subprocess returns bytes, so the code looks like this: int(subprocess.check_output(['wc', '-l', file_path]).decode("utf-8").lstrip().split(" ")[0])
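As the last comment notes, check_output returns bytes on Python 3. Since bytes.split() handles arbitrary whitespace and int() accepts bytes digits, a Python 3-safe sketch can skip the decode entirely:

```python
import subprocess

def count_file_lines_py3(file_path):
    out = subprocess.check_output(['wc', '-l', file_path])
    # out is e.g. b'  42 file\n'; split() drops the whitespace, int() parses the bytes
    return int(out.split()[0])
```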
kalehmann

One line solution:

import os
os.system("wc -l  filename")  

My snippet:

>>> os.system('wc -l *.txt')

0 bar.txt
1000 command.txt
3 test_file.txt
1003 total

Good idea, unfortunately this does not work on Windows though.
If you want to be a surfer of Python, say goodbye to Windows. Believe me, you will thank me one day.
I just considered it noteworthy that this will not work on Windows. I prefer working on a Linux/Unix stack myself, but when writing software, IMHO one should consider the side effects a program could have when run under different OSes. As the OP did not mention his platform, and in case anyone pops onto this solution via Google and copies it (unaware of the limitations a Windows system might have), I wanted to add the note.
You can't save output of os.system() to variable and post-process it anyhow.
@AnSe, you are correct, but the question does not ask whether it saves or not. I guess you are understanding the context.
Community

Kyle's answer

num_lines = sum(1 for line in open('my_file.txt'))

is probably the best; an alternative is

num_lines =  len(open('my_file.txt').read().splitlines())

Here is the comparison of the performance of both:

In [20]: timeit sum(1 for line in open('Charts.ipynb'))
100000 loops, best of 3: 9.79 µs per loop

In [21]: timeit len(open('Charts.ipynb').read().splitlines())
100000 loops, best of 3: 12 µs per loop

Scott Persinger

I got a small (4-8%) improvement with this version which re-uses a constant buffer so it should avoid any memory or GC overhead:

lines = 0
buffer = bytearray(2048)
with open(filename, 'rb') as f:  # binary mode: readinto needs a bytes buffer
    while True:
        n = f.readinto(buffer)   # number of bytes actually read
        if n == 0:
            break
        lines += buffer[:n].count(b'\n')  # ignore stale bytes past n in the last chunk

You can play around with the buffer size and maybe see a little improvement.


Nice. To account for files that don't end in \n, add 1 outside of loop if buffer and buffer[-1]!='\n'
A bug: buffer in the last round might not be clean.
What if between buffers one portion ends with \ and the other portion starts with n? That will miss one newline in there. I would suggest two variables to store the end and the start of each chunk, but that might add more time to the script =(
BandGap

Just to complete the above methods, I tried a variant with the fileinput module:

import fileinput as fi   
def filecount(fname):
        for line in fi.input(fname):
            pass
        return fi.lineno()

And passed a 60-million-line file to all the above-stated methods:

mapcount : 6.1331050396
simplecount : 4.588793993
opcount : 4.42918205261
filecount : 43.2780818939
bufcount : 0.170812129974

It's a little surprising to me that fileinput is that bad and scales far worse than all the other methods...


SilentGhost

As for me this variant will be the fastest:

#!/usr/bin/env python

def main():
    f = open('filename')                  
    lines = 0
    buf_size = 1024 * 1024
    read_f = f.read # loop optimization

    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')
        buf = read_f(buf_size)

    print lines

if __name__ == '__main__':
    main()

Reasons: buffering is faster than reading line by line, and string.count is also very fast.


But is it? At least on OSX/python2.5 the OP's version is still about 10% faster according to timeit.py.
What if the last line does not end in '\n'?
I don't know how you tested it, dF, but on my machine it's ~2.5 times slower than any other option.
You state that it will be the fastest and then state that you haven't tested it. Not very scientific eh? :)
See solution and stats provided by Ryan Ginstrom answer below. Also check out JF Sebastian's comment and link on the same answer.
Texom512

This code is shorter and clearer. It's probably the best way:

num_lines = open('yourfile.ext').read().count('\n')

You should also close the file.
It will load the whole file into memory.
Dummy

I have modified the buffer case like this:

def CountLines(filename):
    f = open(filename)
    try:
        lines = 0
        buf_size = 1024 * 1024
        read_f = f.read # loop optimization
        buf = read_f(buf_size)

        # Empty file
        if not buf:
            return 0

        while buf:
            lines += buf.count('\n')
            last_buf = buf
            buf = read_f(buf_size)

        # A final line without a trailing '\n' still counts as a line
        if not last_buf.endswith('\n'):
            lines += 1

        return lines
    finally:
        f.close()

Now also empty files and the last line (without \n) are counted.


Maybe also explain (or add a comment in the code) what you changed and what for ;). It might give people some more insight into your code much more easily (rather than "parsing" the code in their brain).
The loop optimization I think allows Python to do a local variable lookup at read_f, python.org/doc/essays/list2str
gaborous

A lot of answers already, but unfortunately most of them are just tiny economies on a barely optimizable problem...

I worked on several projects where line count was the core function of the software, and working as fast as possible with a huge number of files was of paramount importance.

The main bottleneck with line count is I/O access: you need to read each line in order to detect the line-return character; there is simply no way around it. The second potential bottleneck is memory management: the more you load at once, the faster you can process, but this bottleneck is negligible compared to the first.

Hence, there are 3 major ways to reduce the processing time of a line count function, apart from tiny optimizations such as disabling gc collection and other micro-managing tricks:

1. Hardware solution: the major and most obvious way is non-programmatic: buy a very fast SSD/flash hard drive. By far, this is how you can get the biggest speed boosts.

2. Data preparation solution: if you generate or can modify how the files you process are generated, or if it's acceptable to pre-process them, first convert the line return to Unix style (\n), as this saves 1 character compared to Windows or MacOS styles (not a big save, but it's an easy gain); secondly and most importantly, you can potentially write lines of fixed length. If you need variable length, you can always pad smaller lines. This way, you can calculate the number of lines instantly from the total file size, which is much faster to access. Often, the best solution to a problem is to pre-process it so that it better fits your end purpose.

3. Parallelization + hardware solution: if you can buy multiple hard disks (and if possible SSD flash disks), then you can even go beyond the speed of one disk by leveraging parallelization: store your files in a balanced way among the disks (easiest is to balance by total size), and then read from all those disks in parallel. You can then expect a speed boost in proportion to the number of disks you have. If buying multiple disks is not an option, parallelization likely won't help (except if your disk has multiple read heads like some professional-grade disks, but even then the disk's internal cache memory and PCB circuitry will likely be a bottleneck and prevent you from fully using all heads in parallel; plus, you would have to devise specific code for the particular hard drive you'll use, because you need to know the exact cluster mapping to store your files on clusters under different heads, and to read them with different heads afterwards).
Indeed, it's commonly known that sequential reading is almost always faster than random reading, and parallelization on a single disk will have a performance more similar to random reading than sequential reading (you can test your hard drive speed in both aspects using CrystalDiskMark for example).

If none of those is an option, then you can only rely on micro-managing tricks to improve the speed of your line-counting function by a few percent, but don't expect anything really significant. Rather, you can expect the time you spend tweaking to be disproportionate compared to the speed improvements you'll see.
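As an illustration of the data-preparation idea above, with fixed-length records the line count falls out of the file size directly (RECORD_LEN and the padding scheme here are hypothetical, chosen just for the sketch):

```python
import os

RECORD_LEN = 80  # hypothetical layout: every line padded to exactly 80 bytes, '\n' included

def fixed_width_line_count(path):
    size = os.path.getsize(path)
    if size % RECORD_LEN:
        raise ValueError("file is not fixed-width")
    return size // RECORD_LEN  # no read needed: one stat call
```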


Andrew Jaffe

The result of opening a file is an iterator, which can be converted to a sequence, which has a length:

with open(filename) as f:
   return len(list(f))

This is more concise than your explicit loop, and avoids the enumerate.


Which means that a 100 MB file will need to be read into memory.
yep, good point, although I wonder about the speed (as opposed to memory) difference. It's probably possible to create an iterator that does this, but I think it would be equivalent to your solution.
-1, it's not just the memory, but having to construct the list in memory.
Andrés Torres
print open('file.txt', 'r').read().count("\n") + 1

Lerner Zhang

If one wants to get the line count cheaply in Python on Linux, I recommend this method:

import os
print os.popen("wc -l file_path").readline().split()[0]

file_path can be either an absolute or a relative path. Hope this may help.


jciloa
def count_text_file_lines(path):
    with open(path, 'rt') as file:
        line_count = sum(1 for _line in file)
    return line_count

Could you please explain what is wrong with it if you think it is wrong? It worked for me. Thanks!
I would be interested in why this answer was downvoted, too. It iterates over the file by lines and sums them up. I like it, it is short and to the point, what's wrong with it?
M.Innat

Simple method:

1)

>>> f = len(open("myfile.txt").readlines())
>>> f
430

2)

>>> f = open("myfile.txt").read().count('\n')
>>> f
430
>>>

3)

num_lines = len(list(open('myfile.txt')))

In this example the file is not closed.
why did you give 3 options? how are they different? what are the benefits and drawbacks to each of these?
odwl

What about this?

def file_len(fname):
  counts = itertools.count()
  with open(fname) as f: 
    for _ in f: counts.next()
  return counts.next()
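On Python 3, counts.next() is spelled next(counts); the same idea there might look like:

```python
import itertools

def file_len(fname):
    counts = itertools.count()
    with open(fname) as f:
        for _ in f:
            next(counts)        # advance the counter once per line
    return next(counts)         # count() yields 0, 1, 2, ... so this is the total
```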

pyanon

count = max(enumerate(open(filename)))[0]


This gives a count that is 1 less than the true value.
The optional second argument to enumerate() is the start count, according to docs.python.org/2/library/functions.html#enumerate
leba-lev

How about this?

import fileinput
import sys

counter=0
for line in fileinput.input([sys.argv[1]]):
    counter+=1

fileinput.close()
print counter

onetwopunch

How about this one-liner:

file_length = len(open('myfile.txt','r').read().split('\n'))

Takes 0.003 sec using this method to time it on a 3900-line file:

def c():
  import time
  s = time.time()
  file_length = len(open('myfile.txt','r').read().split('\n'))
  print time.time() - s

mdwhatcott
def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count