ChatGPT解决这个技术问题 Extra ChatGPT

Does reading an entire file leave the file handle open?

If you read an entire file with content = open('Path/to/file', 'r').read() is the file handle left open until the script exits? Is there a more concise method to read a whole file?


S
Selcuk

The answer to that question depends somewhat on the particular Python implementation.

To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.

This means that the file object is garbage. The only remaining question is "When will the garbage collector collect the file object?".

in CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other python implementations.

A better solution, to make sure that the file is closed, is this pattern:

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

which will always close the file immediately after the block ends; even if an exception occurs.

Edit: To put a finer point on it:

Other than file.__exit__(), which is "automatically" called in a with context manager setting, the only other way that file.close() is automatically called (that is, other than explicitly calling it yourself,) is via file.__del__(). This leads us to the question of when does __del__() get called?

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203

In particular:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable. [...] CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types

(Emphasis mine)

but as it suggests, other implementations may have other behavior. As an example, PyPy has 6 different garbage collection implementations!


For a while, there weren't really other Python implementations; but relying on implementation details is not really Pythonic.
Is it still implementation-specific, or was it standardized already? Not calling __exit__() in such cases sounds like a design flaw.
@jgmjgm It's precisely because of those 3 issues, GC being unpredictable, try/finally being fiddly and the highly common usefulless of cleanup handlers that with solves. The difference between "explicitly closing" and "managing with with" is that the exit handler is called even if an exception is thrown. You could put the close() in a finally clause, but that is not much different from using with instead, a bit messier (3 extra lines instead of 1), and a little harder to get just right.
What I don't get about that is why 'with' would be anymore reliable since it's not explicit either. Is it because the spec says it has to do that its always implemented like that?
@jgmjgm it's more reliable because with foo() as f: [...] is basically the same as f = foo(), f.__enter__(), [...] and f.__exit__() with exceptions handled, so that __exit__ is always called. So the file always gets closed.
E
Eyal Levin

You can use pathlib.

For Python 3.5 and above:

from pathlib import Path
contents = Path(file_path).read_text()

For older versions of Python use pathlib2:

$ pip install pathlib2

Then:

from pathlib2 import Path
contents = Path(file_path).read_text()

This is the actual read_text implementation:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()

I encountered issues with this solution, maybe someone has an answer to my question? Thanks in advance.
K
Kirill

Well, if you have to read file line by line to work with each line, you can use

with open('Path/to/file', 'r') as f:
    s = f.readline()
    while s:
        # do whatever you want to
        s = f.readline()

Or even better way:

with open('Path/to/file') as f:
    for line in f:
        # do whatever you want to

A
Andreas L.

Instead of retrieving the file content as a single string, it can be handy to store the content as a list of all lines the file comprises:

with open('Path/to/file', 'r') as content_file:
    content_list = content_file.read().strip().split("\n")

As can be seen, one needs to add the concatenated methods .strip().split("\n") to the main answer in this thread.

Here, .strip() just removes whitespace and newline characters at the endings of the entire file string, and .split("\n") produces the actual list via splitting the entire file string at every newline character \n.

Moreover, this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in this previous answer.