
Extracting extension from filename in Python

Is there a function to extract the extension from a filename?


Mateen Ulhaq

Use os.path.splitext:

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')

The use of basename is a little confusing here, since os.path.basename("/path/to/somefile.ext") would return "somefile.ext".
Wouldn't endswith() be more portable and Pythonic?
@klingt.net Well, in that case, .asd is really the extension!! If you think about it, foo.tar.gz is a gzip-compressed file (.gz) which happens to be a tar file (.tar). But it is a gzip file in first place. I wouldn't expect it to return the dual extension at all.
The standard Python function naming convention is really annoying - almost every time I re-look this up, I mistake it as being splittext. If they would just do anything to signify the break between parts of this name, it'd be much easier to recognize that it's splitExt or split_ext. Surely I can't be the only person who has made this mistake?
@Vingtoft You mentioned nothing about werkzeug's FileStorage in your comment and this question has nothing about that particular scenario. Something might be wrong with how you are passed the filename. os.path.splitext('somefile.ext') => ('somefile', '.ext'). Feel free to provide an actual counterexample without referencing some third-party library.
jeromej

New in version 3.4.

import pathlib

print(pathlib.Path('yourPath.example').suffix) # '.example'
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz']

I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!
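Beyond suffix and suffixes, the same path objects also expose the name without its last suffix (stem) and a way to swap the suffix (with_suffix). A minimal sketch, assuming POSIX-style paths; the describe helper and the file names are illustrative only and not part of pathlib:

```python
from pathlib import PurePosixPath


def describe(path_str):
    """Return (stem, suffix, suffixes) for the last component of path_str."""
    p = PurePosixPath(path_str)
    return p.stem, p.suffix, p.suffixes


# stem drops only the final suffix; suffixes lists every dot-separated part after the first.
stem, suffix, suffixes = describe("hello/foo.bar.tar.gz")

# with_suffix() replaces the final suffix and returns a new path object.
renamed = PurePosixPath("hello/report.txt").with_suffix(".md")
```

PurePosixPath is used here only so the sketch behaves the same on every OS; pathlib.Path offers the same attributes.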


example for getting .tar.gz: ''.join(pathlib.Path('somedir/file.tar.gz').suffixes)
Great answer. I found this tutorial more useful than the documentation: zetcode.com/python/pathlib
@user3780389 Wouldn't a "foo.bar.tar.gz" still be a valid ".tar.gz"? If so your snippet should be using .suffixes[-2:] to ensure only getting .tar.gz at most.
There are still cases where this does not work as expected, such as "filename with.a dot inside.tar". This is the solution I am currently using: "".join([s for s in pathlib.Path('somedir/file.tar.gz').suffixes if " " not in s])
@BenLindsay agreed. I find pathlib very convenient.
Brian Neal
import os.path
extension = os.path.splitext(filename)[1]

Out of curiosity, why import os.path instead of from os import path?
Oh, I was just wondering if there was a specific reason behind it (other than convention). I'm still learning Python and wanted to learn more!
It depends, really. If you use from os import path, then the name path is taken up in your local scope, and others looking at the code may not immediately know that path comes from the os module. Whereas if you use import os.path, it stays within the os namespace, and wherever you make the call people immediately know it's path from the os module.
I know it's not semantically any different, but I personally find the construction _, extension = os.path.splitext(filename) to be much nicer-looking.
If you want the extension as part of a more complex expression the [1] may be more useful: if check_for_gzip and os.path.splitext(filename)[1] == '.gz':
LarsH
import os.path
extension = os.path.splitext(filename)[1][1:]

To get only the text of the extension, without the dot.


This will return an empty string both for file names that end with . and for file names without an extension.
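If the two cases need to be told apart, one option is to look at the extension before stripping the dot. A small sketch; ext_or_none is a hypothetical helper, and returning None for "no extension" is an assumption about what a caller might want:

```python
import os.path


def ext_or_none(filename):
    """Return the extension without its dot, or None when splitext finds no extension."""
    ext = os.path.splitext(filename)[1]
    if not ext:          # no extension at all
        return None
    return ext[1:]       # '' when the name ends with a bare '.'
```

With this, a name such as "trailing." yields an empty string while "README" yields None.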
Murat Çorlu

For simple use cases one option may be splitting from dot:

>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'

No error when file doesn't have an extension:

>>> "filename".split(".")[-1]
'filename'

But you must be careful:

>>> "png".split(".")[-1]
'png'    # But file doesn't have an extension

Also will not work with hidden files in Unix systems:

>>> ".bashrc".split(".")[-1]
'bashrc'    # But this is not an extension

For general use, prefer os.path.splitext


This would get upset if you're uploading x.tar.gz
Not really. The extension of a file named "x.tar.gz" is "gz", not "tar.gz". os.path.splitext gives ".gz" as the extension too.
Can we use [1] rather than [-1]? I could not understand [-1] with split.
[-1] gets the last of the items produced by splitting on the dot. Example: "my.file.name.js".split('.') => ['my', 'file', 'name', 'js']
@BenjaminR ah ok, you are making an optimisation about result list. ['file', 'tar', 'gz'] with 'file.tar.gz'.split('.') vs ['file.tar', 'gz'] with 'file.tar.gz'.rsplit('.', 1). yeah, could be.
blented

Worth adding a lower() in there so you don't find yourself wondering why the JPGs aren't showing up in your list.

os.path.splitext(filename)[1][1:].strip().lower()
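As a rough illustration of why the normalization matters, the expression above can be wrapped in a helper and used to filter a list of names. Both function names here are made up for the example:

```python
import os.path


def normalized_ext(filename):
    """Lower-cased extension without the leading dot ('' if there is none)."""
    return os.path.splitext(filename)[1][1:].strip().lower()


def filter_by_ext(filenames, wanted):
    """Keep names whose normalized extension is in `wanted` (e.g. {'jpg', 'png'})."""
    return [name for name in filenames if normalized_ext(name) in wanted]
```

For example, filter_by_ext(["a.JPG", "b.png", "c.txt"], {"jpg", "png"}) keeps "a.JPG" even though its extension is upper-case.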

Christian Specht

Any of the solutions above work, but on linux I have found that there is a newline at the end of the extension string which will prevent matches from succeeding. Add the strip() method to the end. For example:

import os.path
extension = os.path.splitext(filename)[1][1:].strip() 

To aid my understanding, could you please explain what additional behaviour the second index/slice guards against (i.e. the [1:] in .splitext(filename)[1][1:])? Thank you in advance.
Figured it out for myself: splitext() (unlike splitting a string on '.') includes the '.' character in the extension. The additional [1:] gets rid of it.
r3t40

You can find some great stuff in the pathlib module (available in Python 3.4+).

import pathlib
x = pathlib.PurePosixPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)

# Output
.txt

Using PosixPath for a windows path is wrong.
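To see why the flavour matters, here is a short sketch comparing the two pure path classes on a Windows-style string. The path is invented for the example and deliberately has a dot in a directory name and none in the file name:

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"C:\Path.d\To\File\myfile"   # dot in a directory name, none in the file name

# POSIX rules: backslash is not a separator, so the whole string is one name
# and everything after the last dot is reported as the suffix.
posix_suffix = PurePosixPath(raw).suffix

# Windows rules: the path is split on backslashes first, so only "myfile" is inspected.
windows_suffix = PureWindowsPath(raw).suffix
```

The pure classes do no filesystem access, so PureWindowsPath can be used on any OS to parse Windows paths.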
Community

With splitext there are problems with files that have a double extension (e.g. file.tar.gz, file.tar.bz2, etc.):

>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension 
'.gz'

But it should be: .tar.gz

The possible solutions are here
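One possible approach, sketched here under the assumption that you know in advance which compound suffixes you care about, is to check an explicit list before falling back to os.path.splitext. The KNOWN_COMPOUND tuple and the function name are illustrative, not a standard API:

```python
import os.path

# Illustrative list only; extend it with whatever compound suffixes you care about.
KNOWN_COMPOUND = (".tar.gz", ".tar.bz2", ".tar.xz")


def split_compound_ext(path):
    """Like os.path.splitext, but keeps a known compound suffix in one piece."""
    lowered = path.lower()
    for suffix in KNOWN_COMPOUND:
        # Require at least one character of base name before the suffix.
        if lowered.endswith(suffix) and len(os.path.basename(path)) > len(suffix):
            return path[: -len(suffix)], path[-len(suffix):]
    return os.path.splitext(path)
```

An explicit list avoids guessing: a name like filename.csv.gz still splits into ("filename.csv", ".gz"), and version numbers with dots are left alone.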


do it twice to get the 2 extensions ?
@maazza yep. gunzip somefile.tar.gz what's the output filename?
This is why we have the extension 'tgz' which means: tar+gzip ! :D
@FlipMcF The filename should obviously be somefile.tar. For tar -xzvf somefile.tar.gz the filename should be somefile.
@peterhil I don't think you want your python script to be aware of the application used to create the filename. It's a bit out of scope of the question. Don't pick on the example, 'filename.csv.gz' is also quite valid.
weiyixie

Although this is an old topic, I wonder why no one has mentioned a very simple Python string method called rpartition:

To get the extension from a file's absolute path, you can simply type:

filepath.rpartition('.')[-1]

example:

path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])

will give you: 'csv'


For those not familiar with the API, rpartition returns a tuple: ("string before the right-most occurrence of the separator", "the separator itself", "the rest of the string"). If there's no separator found, the returned tuple will be: ("", "", "the original string").
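Because of that last case, indexing with [-1] returns the whole string when there is no dot at all. A minimal sketch of a guard, using the separator element of the tuple; the function name is made up for the example:

```python
def ext_via_rpartition(filepath):
    """Return the text after the last '.', or '' when the path contains no '.'."""
    head, sep, tail = filepath.rpartition(".")
    # With no separator, rpartition returns ('', '', filepath): sep is empty.
    return tail if sep else ""
```

Note that this still treats a dot in a directory name as the start of the extension (for example '/a/b.c/d' gives 'c/d'), which os.path.splitext avoids.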
Alex

Just join all pathlib suffixes.

>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'

PascalVKooten

Surprised this wasn't mentioned yet:

import os
fn = '/some/path/a.tar.gz'

basename = os.path.basename(fn)  # os independent
# basename == 'a.tar.gz'

base = basename.split('.')[0]
# base == 'a'

ext = '.'.join(basename.split('.')[1:])   # <-- main part

# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None
# ext == '.tar.gz'

Benefits:

Works as expected for anything I can think of

No modules

No regex

Cross-platform

Easily extendible (e.g. no leading dots for extension, only last part of extension)

As function:

def get_extension(filename):
    basename = os.path.basename(filename)  # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None

This results in an exception when the file doesn't have any extension.
This answer completely ignores the case where a filename contains several dots. Example: get_extension('cmocka-1.1.0.tar.xz') => '.1.0.tar.xz', which is wrong.
@PADYMKO, IMHO one should not create filenames with full stops as part of the filename. The code above is not supposed to result in 'tar.xz'
Just change to [-1] then.
Aleks Andreev

You can use a split on a filename:

f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))

This does not require any additional library.


staytime
filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]

This results in the last char of filename being returned if the filename has no . at all. This is because rfind returns -1 if the string is not found.
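A minimal sketch of a guarded version that checks the rfind result before slicing; the function name is illustrative:

```python
def ext_via_rfind(filename):
    """Return the suffix starting at the last '.', or '' if there is no '.'."""
    dot = filename.rfind(".")
    # rfind returns -1 when '.' is absent; slicing from -1 would yield the last character.
    return filename[dot:] if dot != -1 else ""
```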
DS_ShraShetty

Extracting extension from filename in Python

Python os module splitext()

splitext() function splits the file path into a tuple having two values – root and extension.

import os
# unpacking the tuple
file_name, file_extension = os.path.splitext("/Users/Username/abc.txt")
print(file_name)
print(file_extension)

Get File Extension using Pathlib Module

Use the pathlib module to get the file extension:

import pathlib
pathlib.Path("/Users/pankaj/abc.txt").suffix
#output:'.txt'

Kenstars

This is a direct string manipulation technique. I see a lot of solutions mentioned, but I think most are looking at split. split, however, splits at every occurrence of ".". What you would rather be looking for is partition.

string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]

rpartition was already suggested by @weiyixie.
Arnaldo P. Figueira Figueira

Another solution with right split:

# to get extension only

s = 'test.ext'

if '.' in s: ext = s.rsplit('.', 1)[1]

# or, to get file name and extension

def split_filepath(s):
    """
    get filename and extension from filepath 
    filepath -> (filename, extension)
    """
    if '.' not in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])

Execuday

Even though this question is already answered, I'd add a regex solution.

>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

Or \.[0-9a-z]+$ as in this post.
Jeremy Wiebe

You can use the following code to split the file name and extension.

    import os.path
    filenamewithext = os.path.basename(filepath)
    filename, ext = os.path.splitext(filenamewithext)
    #print file name
    print(filename)
    #print file extension
    print(ext)

Georgy

A true one-liner, if you like regex. It works even if you have additional "." characters in the middle:

import re

file_ext = re.search(r"\.([^.]+)$", filename).group(1)
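Note that re.search returns None when the pattern does not match (for instance, a name without any dot), so calling .group(1) directly would raise AttributeError. A hedged sketch of a guarded variant using the same pattern; the helper name and the choice to return None are assumptions:

```python
import re

# Same pattern as in the one-liner, compiled once.
_EXT_RE = re.compile(r"\.([^.]+)$")


def regex_ext(filename):
    """Return the text after the last '.', or None when the pattern does not match."""
    match = _EXT_RE.search(filename)
    return match.group(1) if match else None
```

Like the plain split approach, this reports 'bashrc' for '.bashrc', so it does not share the special-casing of hidden files that os.path.splitext has.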



Ibnul Husainan

try this:

files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']

for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext =  file.split(".")[-2]+"."+file.split(".")[-1]#3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5

Take each file name in the list, split it, and check whether the penultimate part is in the pen_ext list. If it is, join it with the last extension and use that as the file's extension. If it is not, use only the last extension as the file's extension. Then print the result.


This breaks for a bunch of special cases. See the accepted answer. It's reinventing the wheel, only in a buggy way.
Hello! While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
@Brian like that?
You're only making it worse, breaking it in new ways. foo.tar is a valid file name. What happens if I throw that at your code? What about .bashrc or foo? There is a library function for this for a reason...
Just create a list of the penultimate extensions you expect; if the penultimate part is not in the list, just use the last extension as the file's extension.
cng.buff

You can use endswith to identify the file extension in Python, as in the example below:

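import os
import pandas as pd

frames = []  # one DataFrame per CSV file found
for file in os.listdir():
    if file.endswith('.csv'):
        df1 = pd.read_csv(file)
        frames.append(df1)
result = pd.concat(frames)  # combine once, after the loop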
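str.endswith also accepts a tuple of suffixes, which helps when several extensions should match. A small sketch without pandas; the helper name, the default suffixes and the sample names are invented for illustration:

```python
def has_any_suffix(name, suffixes=(".csv", ".tsv")):
    """True if `name` ends with any of `suffixes`, ignoring case."""
    # str.endswith accepts a tuple of candidate suffixes.
    return name.lower().endswith(tuple(s.lower() for s in suffixes))


names = ["a.csv", "B.CSV", "notes.txt", "table.tsv", "csv"]
matching = [n for n in names if has_any_suffix(n)]
```

Because the suffixes include the dot, a file literally named "csv" is not matched.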

eatmeimadanish

For funsies... just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.

import os

search = {}

for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except KeyError:
        search[fe] = [f]

extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)
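The same grouping can be written without the try/except by using collections.defaultdict. A sketch that takes a list of names instead of reading the current directory, so it can be tried without touching the filesystem; the function name is made up:

```python
import os.path
from collections import defaultdict


def group_by_extension(filenames):
    """Map each extension (with its dot, '' if none) to the file names that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[os.path.splitext(name)[1]].append(name)
    return dict(groups)


# In the answer's setting the input would be os.listdir(os.getcwd()).
sample = group_by_extension(["a.png", "b.jpg", "c.png", "notes"])
```

Looking up an extension with sample.get('.png', []) then behaves like the search.get call above.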

That's a terrible idea. Your code breaks for any file extension you haven't previously added!
Import Error

This method requires a dictionary, list, or set. You can just use the built-in string method endswith. It checks whether the file name ends with one of the names in the list, and can be done with just str.endswith(fileName[index]). This is more for getting and comparing extensions.

https://docs.python.org/3/library/stdtypes.html#string-methods

Example 1:

dictonary = {0:".tar.gz", 1:".txt", 2:".exe", 3:".js", 4:".java", 5:".python", 6:".ruby",7:".c", 8:".bash", 9:".ps1", 10:".html", 11:".html5", 12:".css", 13:".json", 14:".abc"} 
for x in dictonary.values():
    str = "file" + x
    str.endswith(x, str.index("."), len(str))

Example 2:

set1 = {".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"}
for x in set1:
   str = "file" + x
   str.endswith(x, str.index("."), len(str))

Example 3:

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
for x in range(0, len(fileName)):
    str = "file" + fileName[x]
    str.endswith(fileName[x], str.index("."), len(str))

Example 4

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
str = "file.txt"
str.endswith(fileName[1], str.index("."), len(str))

https://i.stack.imgur.com/lmynI.png

Example 8

fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
exts = []
str = "file.txt"
for x in range(0, len(fileName)):
    # record the index of each extension that the file name ends with
    if str.endswith(fileName[x]):
        exts += [x]

Sk8erPeter
# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs

import os.path

class LinkChecker:

    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""

main--
import os

def NewFileName(fichier):
    cpt = 0
    fic , *ext =  fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier

Ripon Kumar Saha

This is the simplest method to get both the file name and the extension in a single line.

fName, ext = 'C:/folder name/Flower.jpeg'.split('/')[-1].split('.')

>>> print(fName)
Flower
>>> print(ext)
jpeg

Unlike other solutions, you don't need to import any package for this.


This doesn't work for all files or types, for example 'archive.tar.gz'.
lendoo
a = ".bashrc"
b = "text.txt"
extension_a = a.split(".")
extension_b = b.split(".")
print(extension_a[-1])  # bashrc
print(extension_b[-1])  # txt

Please add explanation of the code, rather than simply just the code snippets.
the Tin Man
name_only = file_name[:file_name.index(".")]

That will give you the file name up to the first ".", which would be the most common.


First, he needs the extension, not the name. Second, even if he needed the name, this would be wrong for files like file.name.ext.
As mentioned by @ya_dimon, this won't work for file names with dots. Plus, he needs the extension!