ChatGPT解决这个技术问题 Extra ChatGPT

Extract filename and extension in Bash

I want to get the filename (without extension) and the extension separately.

The best solution I found so far is:

NAME=`echo "$FILE" | cut -d'.' -f1`
EXTENSION=`echo "$FILE" | cut -d'.' -f2`

This is wrong because it doesn't work if the file name contains multiple . characters. If, let's say, I have a.b.js, it will consider a and b.js, instead of a.b and js.

It can be easily done in Python with

file, ext = os.path.splitext(path)

but I'd prefer not to fire up a Python interpreter just for this, if possible.

Any better ideas?

This question explains this bash technique and several other related ones.
When applying the great answers below, do not simply paste in your variable like I show here Wrong: extension="{$filename##*.}" like I did for a while! Move the $ outside the curlys: Right: extension="${filename##*.}"
This is clearly a non-trivial problem and for me it is hard to tell if the answers below are completely correct. It's amazing this is not a built in operation in (ba)sh (answers seem to implement the function using pattern matching). I decided to use Python's os.path.splitext as above instead...
As extension have to represent nature of a file, there is a magic command which check file to divine his nature and offert standard extension. see my answer
The question is problematic in the first place because.. From the perspective of the OS and unix file-systems in general, there is no such thing as a file extension. Using a "." to separate parts is a human convention, that only works as long as humans agree to follow it. For example, with the 'tar' program, it could have been decided to name output files with a "tar." prefix instead of a ".tar" suffix -- Giving "tar.somedir" instead of "somedir.tar". There is no "general, always works" solution because of this--you have to write code that matches your specific needs and expected filenames.

L
Ludovic Kuty

First, get file name without the path:

filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"

Alternatively, you can focus on the last '/' of the path instead of the '.' which should work even if you have unpredictable file extensions:

filename="${fullfile##*/}"

You may want to check the documentation :

On the web at section "3.5.3 Shell Parameter Expansion"

In the bash manpage at section called "Parameter Expansion"


Check out gnu.org/software/bash/manual/html_node/… for the full feature set.
Add some quotes to "$fullfile", or you'll risk breaking the filename.
Heck, you could even write filename="${fullfile##*/}" and avoid calling an extra basename
This "solution" does not work if the file does not have an extension -- instead, the whole file name is output, which is quite bad considering that files without extensions are omnipresent.
Fix for dealing with file names without extension: extension=$([[ "$filename" = *.* ]] && echo ".${filename##*.}" || echo ''). Note that if an extension is present, it will be returned including the initial ., e.g., .txt.
y
yelliver
~% FILE="example.tar.gz"

~% echo "${FILE%%.*}"
example

~% echo "${FILE%.*}"
example.tar

~% echo "${FILE#*.}"
tar.gz

~% echo "${FILE##*.}"
gz

For more details, see shell parameter expansion in the Bash manual.


You (perhaps unintentionally) bring up the excellent question of what to do if the "extension" part of the filename has 2 dots in it, as in .tar.gz... I've never considered that issue, and I suspect it's not solvable without knowing all the possible valid file extensions up front.
Why not solvable? In my example, it should be considered that the file contains two extensions, not an extension with two dots. You handle both extensions separately.
It is unsolvable on a lexical basis, you'll need to check the file type. Consider if you had a game called dinosaurs.in.tar and you gzipped it to dinosaurs.in.tar.gz :)
This gets more complicated if you are passing in full paths. One of mine had a '.' in a directory in the middle of the path, but none in the file name. Example "a/b.c/d/e/filename" would wind up ".c/d/e/filename"
clearly no x.tar.gz's extension is gz and the filename is x.tar that is it. There is no such thing as dual extensions. i'm pretty sure boost::filesystem handles it that way. (split path, change_extension...) and its behavior is based on python if I'm not mistaken.
T
Tomi Po

Usually you already know the extension, so you might wish to use:

basename filename .extension

for example:

basename /path/to/dir/filename.txt .txt

and we get

filename

That second argument to basename is quite the eye-opener, ty kind sir/madam :)
And how to extract the extension, using this technique? ;) Oh, wait! We actually don't know it upfront.
Say you have a zipped directory that either ends with .zip or .ZIP. Is there a way you could do something like basename $file {.zip,.ZIP}?
While this only answers part of the OPs question, it does answer the question I typed into google. :-) Very slick!
easy and POSIX compliant
t
tripleee

You can use the magic of POSIX parameter expansion:

bash-3.2$ FILENAME=somefile.tar.gz
bash-3.2$ echo "${FILENAME%%.*}"
somefile
bash-3.2$ echo "${FILENAME%.*}"
somefile.tar

There's a caveat in that if your filename was of the form ./somefile.tar.gz then echo ${FILENAME%%.*} would greedily remove the longest match to the . and you'd have the empty string.

(You can work around that with a temporary variable:

FULL_FILENAME=$FILENAME
FILENAME=${FULL_FILENAME##*/}
echo ${FILENAME%%.*}

)

This site explains more.

${variable%pattern}
  Trim the shortest match from the end
${variable##pattern}
  Trim the longest match from the beginning
${variable%%pattern}
  Trim the longest match from the end
${variable#pattern}
  Trim the shortest match from the beginning

Much simpler than Joachim's answer but I always have to look up POSIX variable substitution. Also, this runs on Max OSX where cut doesn't have --complement and sed doesn't have -r.
D
Doctor J

That doesn't seem to work if the file has no extension, or no filename. Here is what I'm using; it only uses builtins and handles more (but not all) pathological filenames.

#!/bin/bash
for fullpath in "$@"
do
    filename="${fullpath##*/}"                      # Strip longest match of */ from start
    dir="${fullpath:0:${#fullpath} - ${#filename}}" # Substring from 0 thru pos of filename
    base="${filename%.[^.]*}"                       # Strip shortest match of . plus at least one non-dot char from end
    ext="${filename:${#base} + 1}"                  # Substring from len of base thru end
    if [[ -z "$base" && -n "$ext" ]]; then          # If we have an extension and no base, it's really the base
        base=".$ext"
        ext=""
    fi

    echo -e "$fullpath:\n\tdir  = \"$dir\"\n\tbase = \"$base\"\n\text  = \"$ext\""
done

And here are some testcases:

$ basename-and-extension.sh / /home/me/ /home/me/file /home/me/file.tar /home/me/file.tar.gz /home/me/.hidden /home/me/.hidden.tar /home/me/.. .
/:
    dir  = "/"
    base = ""
    ext  = ""
/home/me/:
    dir  = "/home/me/"
    base = ""
    ext  = ""
/home/me/file:
    dir  = "/home/me/"
    base = "file"
    ext  = ""
/home/me/file.tar:
    dir  = "/home/me/"
    base = "file"
    ext  = "tar"
/home/me/file.tar.gz:
    dir  = "/home/me/"
    base = "file.tar"
    ext  = "gz"
/home/me/.hidden:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = ""
/home/me/.hidden.tar:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = "tar"
/home/me/..:
    dir  = "/home/me/"
    base = ".."
    ext  = ""
.:
    dir  = ""
    base = "."
    ext  = ""

Instead of dir="${fullpath:0:${#fullpath} - ${#filename}}" I've often seen dir="${fullpath%$filename}". It's simpler to write. Not sure if there is any real speed difference or gotchas.
This uses #!/bin/bash which is almost always wrong. Prefer #!/bin/sh if possible or #!/usr/bin/env bash if not.
@Good Person: I don't know how it's almost always wrong: which bash -> /bin/bash ; perhaps it's your distro?
@vol7ron - on many distros bash is in /usr/local/bin/bash. On OSX many people install a updated bash in /opt/local/bin/bash. As such /bin/bash is wrong and one should use env to find it. Even better is to use /bin/sh and POSIX constructs. Except on solaris this is a POSIX shell.
@GoodPerson but if you are more comfortable with bash, why use sh? Isn't that like saying, why use Perl when you can use sh?
p
paxdiablo
pax> echo a.b.js | sed 's/\.[^.]*$//'
a.b
pax> echo a.b.js | sed 's/^.*\.//'
js

works fine, so you can just use:

pax> FILE=a.b.js
pax> NAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
pax> EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
pax> echo $NAME
a.b
pax> echo $EXTENSION
js

The commands, by the way, work as follows.

The command for NAME substitutes a "." character followed by any number of non-"." characters up to the end of the line, with nothing (i.e., it removes everything from the final "." to the end of the line, inclusive). This is basically a non-greedy substitution using regex trickery.

The command for EXTENSION substitutes a any number of characters followed by a "." character at the start of the line, with nothing (i.e., it removes everything from the start of the line to the final dot, inclusive). This is a greedy substitution which is the default action.


This break for files without extension as it would print the same for name and extension. So I use sed 's,\.[^\.]*$,,' for name, and sed 's,.*\.,., ;t ;g' for extension (uses the atypical test and get commands, along with the typical substitute command).
You could test, after calculating NAME, if it and FILE are equal, and if so set EXTENSION to the empty string.
Fundamentally, using an external process for something the shell can do itself is an antipattern.
tripleee: there are a great many things that the shell can do in a hundred lines that an external process like awk could do in five :-)
B
Bjarke Freund-Hansen

You can use basename.

Example:

$ basename foo-bar.tar.gz .tar.gz
foo-bar

You do need to provide basename with the extension that shall be removed, however if you are always executing tar with -z then you know the extension will be .tar.gz.

This should do what you want:

tar -zxvf $1
cd $(basename $1 .tar.gz)

I suppose cd $(basename $1 .tar.gz) works for .gz files. But in question he mentioned Archive files have several extensions: tar.gz, tat.xz, tar.bz2
Tomi Po posted the same things 2 years before.
Hi Blauhirn, wauw this is an old questions. I think something have happened to the dates. I distinctively remember answering the question shortly after it was asked, and there where only a couple of other answers. Could it be that the question was merged with another one, does SO do that?
Yep I remember correctly. I originally answers this question stackoverflow.com/questions/14703318/… on the same day it was asked, 2 years later it was merged into this one. I can hardly be blamed for a duplicate answer when my answer was moved in this way.
P
Peter Mortensen

Mellen writes in a comment on a blog post:

Using Bash, there’s also ${file%.*} to get the filename without the extension and ${file##*.} to get the extension alone. That is,

file="thisfile.txt"
echo "filename: ${file%.*}"
echo "extension: ${file##*.}"

Outputs:

filename: thisfile
extension: txt

C
Cyker

No need to bother with awk or sed or even perl for this simple task. There is a pure-Bash, os.path.splitext()-compatible solution which only uses parameter expansions.

Reference Implementation

Documentation of os.path.splitext(path):

Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').

Python code:

root, ext = os.path.splitext(path)

Bash Implementation

Honoring leading periods

root="${path%.*}"
ext="${path#"$root"}"

Ignoring leading periods

root="${path#.}";root="${path%"$root"}${root%.*}"
ext="${path#"$root"}"

Tests

Here are test cases for the Ignoring leading periods implementation, which should match the Python reference implementation on every input.

|---------------|-----------|-------|
|path           |root       |ext    |
|---------------|-----------|-------|
|' .txt'        |' '        |'.txt' |
|' .txt.txt'    |' .txt'    |'.txt' |
|' txt'         |' txt'     |''     |
|'*.txt.txt'    |'*.txt'    |'.txt' |
|'.cshrc'       |'.cshrc'   |''     |
|'.txt'         |'.txt'     |''     |
|'?.txt.txt'    |'?.txt'    |'.txt' |
|'\n.txt.txt'   |'\n.txt'   |'.txt' |
|'\t.txt.txt'   |'\t.txt'   |'.txt' |
|'a b.txt.txt'  |'a b.txt'  |'.txt' |
|'a*b.txt.txt'  |'a*b.txt'  |'.txt' |
|'a?b.txt.txt'  |'a?b.txt'  |'.txt' |
|'a\nb.txt.txt' |'a\nb.txt' |'.txt' |
|'a\tb.txt.txt' |'a\tb.txt' |'.txt' |
|'txt'          |'txt'      |''     |
|'txt.pdf'      |'txt'      |'.pdf' |
|'txt.tar.gz'   |'txt.tar'  |'.gz'  |
|'txt.txt'      |'txt'      |'.txt' |
|---------------|-----------|-------|

Test Results

All tests passed.


no, the base file name for text.tar.gz should be text and extension be .tar.gz
@frederick99 As I said the solution here matches the implementation of os.path.splitext in Python. Whether that implementation is sane for possibly controversial inputs is another topic.
How do the quotes within the pattern ("$root") work? What could happen if they were omitted? (I couldn't find any documentation on the matter.) Also how does this handle filenames with * or ? in them?
Ok, testing shows me that the quotes make the pattern a literal, i.e. * and ? aren't special. So the two parts of my question answer each other. Am I correct that this isn't documented? Or is this supposed to be understood from the fact that quotes disable glob expansion in general?
Brilliant answer! I’ll just suggest a slightly simpler variant for computing the root: root="${path#?}";root="${path::1}${root%.*}" — then proceed the same to extract the extension.
S
Some programmer dude

You could use the cut command to remove the last two extensions (the ".tar.gz" part):

$ echo "foo.tar.gz" | cut -d'.' --complement -f2-
foo

As noted by Clayton Hughes in a comment, this will not work for the actual example in the question. So as an alternative I propose using sed with extended regular expressions, like this:

$ echo "mpc-1.0.1.tar.gz" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'
mpc-1.0.1

It works by removing the last two (alpha-numeric) extensions unconditionally.

[Updated again after comment from Anders Lindahl]


This only works in the case where the filename/path doesn't contain any other dots: echo "mpc-1.0.1.tar.gz" | cut -d'.' --complement -f2- produces "mpc-1" (just the first 2 fields after delimiting by .)
@ClaytonHughes You're correct, and I should have tested it better. Added another solution.
The sed expressions should use $ to check that the matched extension is at the end of the file name. Otherwise, a filename like i.like.tar.gz.files.tar.bz2 might produce unexpected result.
@AndersLindahl It still will, if the order of the extensions is the reverse of the sed chain order. Even with $ at the end a filename such as mpc-1.0.1.tar.bz2.tar.gz will remove both .tar.gz and then .tar.bz2.
$ echo "foo.tar.gz" | cut -d'.' -f2- WITHOUT --complement will get the 2nd split item to the end of the string $ echo "foo.tar.gz" | cut -d'.' -f2- tar.gz
m
mklement0

The accepted answer works well in typical cases, but fails in edge cases, namely:

For filenames without extension (called suffix in the remainder of this answer), extension=${filename##*.} returns the input filename rather than an empty string.

extension=${filename##*.} does not include the initial ., contrary to convention. Blindly prepending . would not work for filenames without suffix.

Blindly prepending . would not work for filenames without suffix.

filename="${filename%.*}" will be the empty string, if the input file name starts with . and contains no further . characters (e.g., .bash_profile) - contrary to convention.

---------

Thus, the complexity of a robust solution that covers all edge cases calls for a function - see its definition below; it can return all components of a path.

Example call:

splitPath '/etc/bash.bashrc' dir fname fnameroot suffix
# -> $dir == '/etc'
# -> $fname == 'bash.bashrc'
# -> $fnameroot == 'bash'
# -> $suffix == '.bashrc'

Note that the arguments after the input path are freely chosen, positional variable names.
To skip variables not of interest that come before those that are, specify _ (to use throw-away variable $_) or ''; e.g., to extract filename root and extension only, use splitPath '/etc/bash.bashrc' _ _ fnameroot extension.

# SYNOPSIS
#   splitPath path varDirname [varBasename [varBasenameRoot [varSuffix]]] 
# DESCRIPTION
#   Splits the specified input path into its components and returns them by assigning
#   them to variables with the specified *names*.
#   Specify '' or throw-away variable _ to skip earlier variables, if necessary.
#   The filename suffix, if any, always starts with '.' - only the *last*
#   '.'-prefixed token is reported as the suffix.
#   As with `dirname`, varDirname will report '.' (current dir) for input paths
#   that are mere filenames, and '/' for the root dir.
#   As with `dirname` and `basename`, a trailing '/' in the input path is ignored.
#   A '.' as the very first char. of a filename is NOT considered the beginning
#   of a filename suffix.
# EXAMPLE
#   splitPath '/home/jdoe/readme.txt' parentpath fname fnameroot suffix
#   echo "$parentpath" # -> '/home/jdoe'
#   echo "$fname" # -> 'readme.txt'
#   echo "$fnameroot" # -> 'readme'
#   echo "$suffix" # -> '.txt'
#   ---
#   splitPath '/home/jdoe/readme.txt' _ _ fnameroot
#   echo "$fnameroot" # -> 'readme'  
splitPath() {
  local _sp_dirname= _sp_basename= _sp_basename_root= _sp_suffix=
    # simple argument validation
  (( $# >= 2 )) || { echo "$FUNCNAME: ERROR: Specify an input path and at least 1 output variable name." >&2; exit 2; }
    # extract dirname (parent path) and basename (filename)
  _sp_dirname=$(dirname "$1")
  _sp_basename=$(basename "$1")
    # determine suffix, if any
  _sp_suffix=$([[ $_sp_basename = *.* ]] && printf %s ".${_sp_basename##*.}" || printf '')
    # determine basename root (filemane w/o suffix)
  if [[ "$_sp_basename" == "$_sp_suffix" ]]; then # does filename start with '.'?
      _sp_basename_root=$_sp_basename
      _sp_suffix=''
  else # strip suffix from filename
    _sp_basename_root=${_sp_basename%$_sp_suffix}
  fi
  # assign to output vars.
  [[ -n $2 ]] && printf -v "$2" "$_sp_dirname"
  [[ -n $3 ]] && printf -v "$3" "$_sp_basename"
  [[ -n $4 ]] && printf -v "$4" "$_sp_basename_root"
  [[ -n $5 ]] && printf -v "$5" "$_sp_suffix"
  return 0
}

test_paths=(
  '/etc/bash.bashrc'
  '/usr/bin/grep'
  '/Users/jdoe/.bash_profile'
  '/Library/Application Support/'
  'readme.new.txt'
)

for p in "${test_paths[@]}"; do
  echo ----- "$p"
  parentpath= fname= fnameroot= suffix=
  splitPath "$p" parentpath fname fnameroot suffix
  for n in parentpath fname fnameroot suffix; do
    echo "$n=${!n}"
  done
done

Test code that exercises the function:

test_paths=(
  '/etc/bash.bashrc'
  '/usr/bin/grep'
  '/Users/jdoe/.bash_profile'
  '/Library/Application Support/'
  'readme.new.txt'
)

for p in "${test_paths[@]}"; do
  echo ----- "$p"
  parentpath= fname= fnameroot= suffix=
  splitPath "$p" parentpath fname fnameroot suffix
  for n in parentpath fname fnameroot suffix; do
    echo "$n=${!n}"
  done
done

Expected output - note the edge cases:

a filename having no suffix

a filename starting with . (not considered the start of the suffix)

an input path ending in / (trailing / is ignored)

an input path that is a filename only (. is returned as the parent path)

a filename that has more than .-prefixed token (only the last is considered the suffix):

----- /etc/bash.bashrc
parentpath=/etc
fname=bash.bashrc
fnameroot=bash
suffix=.bashrc
----- /usr/bin/grep
parentpath=/usr/bin
fname=grep
fnameroot=grep
suffix=
----- /Users/jdoe/.bash_profile
parentpath=/Users/jdoe
fname=.bash_profile
fnameroot=.bash_profile
suffix=
----- /Library/Application Support/
parentpath=/Library
fname=Application Support
fnameroot=Application Support
suffix=
----- readme.new.txt
parentpath=.
fname=readme.new.txt
fnameroot=readme.new
suffix=.txt

h
henfiber

Here are some alternative suggestions (mostly in awk), including some advanced use cases, like extracting version numbers for software packages.

Just note that with a slightly different input some of these may fail, therefore anyone using these should validate on their expected input and adapt the regex expression if required.

f='/path/to/complex/file.1.0.1.tar.gz'

# Filename : 'file.1.0.x.tar.gz'
    echo "$f" | awk -F'/' '{print $NF}'

# Extension (last): 'gz'
    echo "$f" | awk -F'[.]' '{print $NF}'
    
# Extension (all) : '1.0.1.tar.gz'
    echo "$f" | awk '{sub(/[^.]*[.]/, "", $0)} 1'
    
# Extension (last-2): 'tar.gz'
    echo "$f" | awk -F'[.]' '{print $(NF-1)"."$NF}'

# Basename : 'file'
    echo "$f" | awk '{gsub(/.*[/]|[.].*/, "", $0)} 1'

# Basename-extended : 'file.1.0.1.tar'
    echo "$f" | awk '{gsub(/.*[/]|[.]{1}[^.]+$/, "", $0)} 1'

# Path : '/path/to/complex/'
    echo "$f" | awk '{match($0, /.*[/]/, a); print a[0]}'
    # or 
    echo "$f" | grep -Eo '.*[/]'
    
# Folder (containing the file) : 'complex'
    echo "$f" | awk -F'/' '{$1=""; print $(NF-1)}'
    
# Version : '1.0.1'
    # Defined as 'number.number' or 'number.number.number'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?'

    # Version - major : '1'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f1

    # Version - minor : '0'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f2

    # Version - patch : '1'
    echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f3

# All Components : "path to complex file 1 0 1 tar gz"
    echo "$f" | awk -F'[/.]' '{$1=""; print $0}'
    
# Is absolute : True (exit-code : 0)
    # Return true if it is an absolute path (starting with '/' or '~/'
    echo "$f" | grep -q '^[/]\|^~/'
 

All use cases are using the original full path as input, without depending on intermediate results.


L
Let Me Tink About It

Smallest and simplest solution (in single line) is:

$ file=/blaabla/bla/blah/foo.txt
echo $(basename ${file%.*}) # foo

That's a useless use of echo. In general, echo $(command) is better written simply command unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the output from command before displaying the result. Quiz: what's the output of echo $(echo '*') (and if that's what you really want, you really really want just echo *).
@triplee I didn't use echo command at all. I just used it to demonstrate the result foo which is appearing in the 3rd line as the result of the 2nd line.
But just basename "${file%.*}" would do the same; you are using a command substitution to capture its output, only to echo that same output immediately. (Without quoting, the result is nominally different; but that's hardly relevant, much less a feature, here.)
Also basename "$file" .txt avoids the complexity of the parameter substitution.
@Ron Read his first comment before accusing him of wasting our time.
P
Peter Mortensen

I think that if you just need the name of the file, you can try this:

FULLPATH=/usr/share/X11/xorg.conf.d/50-synaptics.conf

# Remove all the prefix until the "/" character
FILENAME=${FULLPATH##*/}

# Remove all the prefix until the "." character
FILEEXTENSION=${FILENAME##*.}

# Remove a suffix, in our case, the filename. This will return the name of the directory that contains this file.
BASEDIRECTORY=${FULLPATH%$FILENAME}

echo "path = $FULLPATH"
echo "file name = $FILENAME"
echo "file extension = $FILEEXTENSION"
echo "base directory = $BASEDIRECTORY"

And that is all =D.


Just wanted BASEDIRECTORY :) Thanks!
S
Sarfraaz Ahmed

You can force cut to display all fields and subsequent ones adding - to field number.

NAME=`basename "$FILE"`
EXTENSION=`echo "$NAME" | cut -d'.' -f2-`

So if FILE is eth0.pcap.gz, the EXTENSION will be pcap.gz

Using the same logic, you can also fetch the file name using '-' with cut as follows :

NAME=`basename "$FILE" | cut -d'.' -f-1`

This works even for filenames that do not have any extension.


P
Peter Mortensen

Magic file recognition

In addition to the lot of good answers on this Stack Overflow question I would like to add:

Under Linux and other unixen, there is a magic command named file, that do filetype detection by analysing some first bytes of file. This is a very old tool, initialy used for print servers (if not created for... I'm not sure about that).

file myfile.txt
myfile.txt: UTF-8 Unicode text

file -b --mime-type myfile.txt
text/plain

Standards extensions could be found in /etc/mime.types (on my Debian GNU/Linux desktop. See man file and man mime.types. Perhaps you have to install the file utility and mime-support packages):

grep $( file -b --mime-type myfile.txt ) </etc/mime.types
text/plain      asc txt text pot brf srt

You could create a function for determining right extension. There is a little (not perfect) sample:

file2ext() {
    local _mimetype=$(file -Lb --mime-type "$1") _line _basemimetype
    case ${_mimetype##*[/.-]} in
        gzip | bzip2 | xz | z )
            _mimetype=${_mimetype##*[/.-]}
            _mimetype=${_mimetype//ip}
            _basemimetype=$(file -zLb --mime-type "$1")
            ;;
        stream )
            _mimetype=($(file -Lb "$1"))
            [ "${_mimetype[1]}" = "compressed" ] &&
                _basemimetype=$(file -b --mime-type - < <(
                        ${_mimetype,,} -d <"$1")) ||
                _basemimetype=${_mimetype,,}
            _mimetype=${_mimetype,,}
            ;;
        executable )  _mimetype='' _basemimetype='' ;;
        dosexec )     _mimetype='' _basemimetype='exe' ;;
        shellscript ) _mimetype='' _basemimetype='sh' ;;
        * )
            _basemimetype=$_mimetype
            _mimetype=''
            ;;
    esac
    while read -a _line ;do
        if [ "$_line" == "$_basemimetype" ] ;then
            [ "$_line[1]" ] &&
                _basemimetype=${_line[1]} ||
                _basemimetype=${_basemimetype##*[/.-]}
            break
        fi
        done </etc/mime.types
    case ${_basemimetype##*[/.-]} in
        executable ) _basemimetype='' ;;
        shellscript ) _basemimetype='sh' ;;
        dosexec ) _basemimetype='exe' ;;
        * ) ;;
    esac
    [ "$_mimetype" ] && [ "$_basemimetype" != "$_mimetype" ] &&
      printf ${2+-v} $2 "%s.%s" ${_basemimetype##*[/.-]} ${_mimetype##*[/.-]} ||
      printf ${2+-v} $2 "%s" ${_basemimetype##*[/.-]}
}

This function could set a Bash variable that can be used later:

(This is inspired from @Petesh right answer):

filename=$(basename "$fullfile")
filename="${filename%.*}"
file2ext "$fullfile" extension

echo "$fullfile -> $filename . $extension"

Although not a direct answer to the original post, this is by far the most sensible response. Thanks for providing it.
I really appreciate this thorough answer highlighting common built-ins. Though I ended up just doing this in python using the -c flag, if I were constrained to using just shell scripting I'd use the concepts outlined herein. Thank you!
@JasonRStevensCFA under python, you will use python-magic library!
@F.Hauri Cool lib, thanks for sharing. I just use standard stuff as the string builtins for scripting are beyond simple. For example, $(python -c "'$1'.split('/')[-1]") will get you the filename with extension from a path string variable $1 using a subshell (I use it like this in some local scripts). I don't use this kind of "magic" in prod but these features of the Python language are fantastic for simple task-based things.
@JasonRStevensCFA Using forks to python, like any other language (perl, awk, etc...) for so tiny requirment is counter-productive! Try to run same fork 1000 time and compare with parameter expansion...
m
miriam
$ F = "text file.test.txt"  
$ echo ${F/*./}  
txt  

This caters for multiple dots and spaces in a filename, however if there is no extension it returns the filename itself. Easy to check for though; just test for the filename and extension being the same.

Naturally this method doesn't work for .tar.gz files. However that could be handled in a two step process. If the extension is gz then check again to see if there is also a tar extension.


very clean and straightforward answer, thanks a lot.
Good solution for filenames without path. Breaks for dotfiles without extension, which you shouldn't get in common cases like for file in *.*; do ... ; done +1
C
Community

Ok so if I understand correctly, the problem here is how to get the name and the full extension of a file that has multiple extensions, e.g., stuff.tar.gz.

This works for me:

fullfile="stuff.tar.gz"
fileExt=${fullfile#*.}
fileName=${fullfile%*.$fileExt}

This will give you stuff as filename and .tar.gz as extension. It works for any number of extensions, including 0. Hope this helps for anyone having the same problem =)


The correct result (according to os.path.splitext, which is what the OP wants) is ('stuff.tar', '.gz').
S
SilverWolf

Simply use ${parameter%word}

In your case:

${FILE%.*}

If you want to test it, all following work, and just remove the extension:

FILE=abc.xyz; echo ${FILE%.*};
FILE=123.abc.xyz; echo ${FILE%.*};
FILE=abc; echo ${FILE%.*};

Why the downvote? It's still useful, although there shouldn't be spaces around the = signs.
This works fine. Thank you! (now it doesn't have the spaces around the equal signs, if that was why it was downvoted)
Won't work for dotfiles and needs quotes.
K
Ken Mueller

This is the only one that worked for me:

path='folder/other_folder/file.js'

base=${path##*/}
echo ${base%.*}

>> file

This can also be used in string interpolation as well, but unfortunately you have to set base beforehand.


J
Joydip Datta

I use the following script

$ echo "foo.tar.gz"|rev|cut -d"." -f3-|rev
foo

This is not efficient at all. To forks too many times which is quite unnecessary since this operation can be performed in pure Bash without the need for any external commands and forking.
D
Dennis

How to extract the filename and extension in fish:

function split-filename-extension --description "Prints the filename and extension"
  for file in $argv
    if test -f $file
      set --local extension (echo $file | awk -F. '{print $NF}')
      set --local filename (basename $file .$extension)
      echo "$filename $extension"
    else
      echo "$file is not a valid file"
    end
  end
end

Caveats: Splits on the last dot, which works well for filenames with dots in them, but not well for extensions with dots in them. See example below.

Usage:

$ split-filename-extension foo-0.4.2.zip bar.tar.gz
foo-0.4.2 zip  # Looks good!
bar.tar gz  # Careful, you probably want .tar.gz as the extension.

There's probably better ways to do this. Feel free to edit my answer to improve it.

If there's a limited set of extensions you'll be dealing with and you know all of them, try this:

switch $file
  case *.tar
    echo (basename $file .tar) tar
  case *.tar.bz2
    echo (basename $file .tar.bz2) tar.bz2
  case *.tar.gz
    echo (basename $file .tar.gz) tar.gz
  # and so on
end

This does not have the caveat as the first example, but you do have to handle every case so it could be more tedious depending on how many extensions you can expect.


P
Peter Mortensen

Here is code with AWK. It can be done more simply. But I am not good in AWK.

filename$ ls
abc.a.txt  a.b.c.txt  pp-kk.txt
filename$ find . -type f | awk -F/ '{print $2}' | rev | awk -F"." '{$1="";print}' | rev | awk 'gsub(" ",".") ,sub(".$", "")'
abc.a
a.b.c
pp-kk
filename$ find . -type f | awk -F/ '{print $2}' | awk -F"." '{print $NF}'
txt
txt
txt

You shouldn't need the first awk statement in the last example, right?
You can avoid piping Awk to Awk by doing another split(). awk -F / '{ n=split($2, a, "."); print a[n] }' uses /` as the top-level delimiter but then splits the second fields on . and prints the last element from the new array.
F
Fravadona

No previous answer used a bash regex Here's a pure bash solution that splits a path into:

The directory path, with its trailing / when present The regex that discards the trailing / is so much longer that I didn't post it

The filename, excluding the (last) dot extension

The (last) dot extension, with its leading .

The code is meant to handle every possible case, you're welcome to try it.

#!/bin/bash

for path; do

####### the relevant part ######

[[ $path =~ ^(\.{1,2}|.*/\.{0,2})$|^(.*/)([^/]+)(\.[^/]*)$|^(.*/)(.+)$|^(.+)(\..*)$|^(.+)$ ]]

dirpath=${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[5]}
filename=${BASH_REMATCH[3]}${BASH_REMATCH[6]}${BASH_REMATCH[7]}${BASH_REMATCH[9]}
filext=${BASH_REMATCH[4]}${BASH_REMATCH[8]}

# dirpath should be non-null
[[ $dirpath ]] || dirpath='.'

################################

printf '%s=%q\n' \
    path     "$path" \
    dirpath  "$dirpath" \
    filename "$filename" \
    filext   "$filext"

done

How does it work?

Basically, it ensures that only one sub-expression (delimited with | in the regex) is able to capture the input. Thanks to that, you can concatenate all the capture groups of the same type (for example, the ones related to the directory path) stored in BASH_REMATCH, because at most one will be non-null.

Here are the results of an extended but not exhaustive set of examples:

+--------------------------------------------------------+
| input             dirpath        filename       filext |
+--------------------------------------------------------+
''                  .              ''             ''
.                   .              ''             ''
..                  ..             ''             ''
...                 .              ..             .
.file               .              .file          ''
.file.              .              .file          .
.file..             .              .file.         .
.file.Z             .              .file          .Z
.file.sh.Z          .              .file.sh       .Z
file                .              file           ''
file.               .              file           .
file..              .              file.          .
file.Z              .              file           .Z
file.sh.Z           .              file.sh        .Z
dir/                dir/           ''             ''
dir/.               dir/.          ''             ''
dir/...             dir/           ..             .
dir/.file           dir/           .file          ''
dir/.file.          dir/           .file          .
dir/.file..         dir/           .file.         .
dir/.file.Z         dir/           .file          .Z
dir/.file.x.Z       dir/           .file.x        .Z
dir/file            dir/           file           ''
dir/file.           dir/           file           .
dir/file..          dir/           file.          .
dir/file.Z          dir/           file           .Z
dir/file.x.Z        dir/           file.x         .Z
dir./.              dir./.         ''             ''
dir./...            dir./          ..             .
dir./.file          dir./          .file          ''
dir./.file.         dir./          .file          .
dir./.file..        dir./          .file.         .
dir./.file.Z        dir./          .file          .Z
dir./.file.sh.Z     dir./          .file.sh       .Z
dir./file           dir./          file           ''
dir./file.          dir./          file           .
dir./file..         dir./          file.          .
dir./file.Z         dir./          file           .Z
dir./file.x.Z       dir./          file.x         .Z
dir//               dir//          ''             ''
dir//.              dir//.         ''             ''
dir//...            dir//          ..             .
dir//.file          dir//          .file          ''
dir//.file.         dir//          .file          .
dir//.file..        dir//          .file.         .
dir//.file.Z        dir//          .file          .Z
dir//.file.x.Z      dir//          .file.x        .Z
dir//file           dir//          file           ''
dir//file.          dir//          file           .
dir//file..         dir//          file.          .
dir//file.Z         dir//          file           .Z
dir//file.x.Z       dir//          file.x         .Z
dir.//.             dir.//.        ''             ''
dir.//...           dir.//         ..             .
dir.//.file         dir.//         .file          ''
dir.//.file.        dir.//         .file          .
dir.//.file..       dir.//         .file.         .
dir.//.file.Z       dir.//         .file          .Z
dir.//.file.x.Z     dir.//         .file.x        .Z
dir.//file          dir.//         file           ''
dir.//file.         dir.//         file           .
dir.//file..        dir.//         file.          .
dir.//file.Z        dir.//         file           .Z
dir.//file.x.Z      dir.//         file.x         .Z
/                   /              ''             ''
/.                  /.             ''             ''
/..                 /..            ''             ''
/...                /              ..             .
/.file              /              .file          ''
/.file.             /              .file          .
/.file..            /              .file.         .
/.file.Z            /              .file          .Z
/.file.sh.Z         /              .file.sh       .Z
/file               /              file           ''
/file.              /              file           .
/file..             /              file.          .
/file.Z             /              file           .Z
/file.sh.Z          /              file.sh        .Z
/dir/               /dir/          ''             ''
/dir/.              /dir/.         ''             ''
/dir/...            /dir/          ..             .
/dir/.file          /dir/          .file          ''
/dir/.file.         /dir/          .file          .
/dir/.file..        /dir/          .file.         .
/dir/.file.Z        /dir/          .file          .Z
/dir/.file.x.Z      /dir/          .file.x        .Z
/dir/file           /dir/          file           ''
/dir/file.          /dir/          file           .
/dir/file..         /dir/          file.          .
/dir/file.Z         /dir/          file           .Z
/dir/file.x.Z       /dir/          file.x         .Z
/dir./.             /dir./.        ''             ''
/dir./...           /dir./         ..             .
/dir./.file         /dir./         .file          ''
/dir./.file.        /dir./         .file          .
/dir./.file..       /dir./         .file.         .
/dir./.file.Z       /dir./         .file          .Z
/dir./.file.sh.Z    /dir./         .file.sh       .Z
/dir./file          /dir./         file           ''
/dir./file.         /dir./         file           .
/dir./file..        /dir./         file.          .
/dir./file.Z        /dir./         file           .Z
/dir./file.x.Z      /dir./         file.x         .Z
/dir//              /dir//         ''             ''
/dir//.             /dir//.        ''             ''
/dir//...           /dir//         ..             .
/dir//.file         /dir//         .file          ''
/dir//.file.        /dir//         .file          .
/dir//.file..       /dir//         .file.         .
/dir//.file.Z       /dir//         .file          .Z
/dir//.file.x.Z     /dir//         .file.x        .Z
/dir//file          /dir//         file           ''
/dir//file.         /dir//         file           .
/dir//file..        /dir//         file.          .
/dir//file.Z        /dir//         file           .Z
/dir//file.x.Z      /dir//         file.x         .Z
/dir.//.            /dir.//.       ''             ''
/dir.//...          /dir.//        ..             .
/dir.//.file        /dir.//        .file          ''
/dir.//.file.       /dir.//        .file          .
/dir.//.file..      /dir.//        .file.         .
/dir.//.file.Z      /dir.//        .file          .Z
/dir.//.file.x.Z    /dir.//        .file.x        .Z
/dir.//file         /dir.//        file           ''
/dir.//file.        /dir.//        file           .
/dir.//file..       /dir.//        file.          .
/dir.//file.Z       /dir.//        file           .Z
/dir.//file.x.Z     /dir.//        file.x         .Z
//                  //             ''             ''
//.                 //.            ''             ''
//..                //..           ''             ''
//...               //             ..             .
//.file             //             .file          ''
//.file.            //             .file          .
//.file..           //             .file.         .
//.file.Z           //             .file          .Z
//.file.sh.Z        //             .file.sh       .Z
//file              //             file           ''
//file.             //             file           .
//file..            //             file.          .
//file.Z            //             file           .Z
//file.sh.Z         //             file.sh        .Z
//dir/              //dir/         ''             ''
//dir/.             //dir/.        ''             ''
//dir/...           //dir/         ..             .
//dir/.file         //dir/         .file          ''
//dir/.file.        //dir/         .file          .
//dir/.file..       //dir/         .file.         .
//dir/.file.Z       //dir/         .file          .Z
//dir/.file.x.Z     //dir/         .file.x        .Z
//dir/file          //dir/         file           ''
//dir/file.         //dir/         file           .
//dir/file..        //dir/         file.          .
//dir/file.Z        //dir/         file           .Z
//dir/file.x.Z      //dir/         file.x         .Z
//dir./.            //dir./.       ''             ''
//dir./...          //dir./        ..             .
//dir./.file        //dir./        .file          ''
//dir./.file.       //dir./        .file          .
//dir./.file..      //dir./        .file.         .
//dir./.file.Z      //dir./        .file          .Z
//dir./.file.sh.Z   //dir./        .file.sh       .Z
//dir./file         //dir./        file           ''
//dir./file.        //dir./        file           .
//dir./file..       //dir./        file.          .
//dir./file.Z       //dir./        file           .Z
//dir./file.x.Z     //dir./        file.x         .Z
//dir//             //dir//        ''             ''
//dir//.            //dir//.       ''             ''
//dir//...          //dir//        ..             .
//dir//.file        //dir//        .file          ''
//dir//.file.       //dir//        .file          .
//dir//.file..      //dir//        .file.         .
//dir//.file.Z      //dir//        .file          .Z
//dir//.file.x.Z    //dir//        .file.x        .Z
//dir//file         //dir//        file           ''
//dir//file.        //dir//        file           .
//dir//file..       //dir//        file.          .
//dir//file.Z       //dir//        file           .Z
//dir//file.x.Z     //dir//        file.x         .Z
//dir.//.           //dir.//.      ''             ''
//dir.//...         //dir.//       ..             .
//dir.//.file       //dir.//       .file          ''
//dir.//.file.      //dir.//       .file          .
//dir.//.file..     //dir.//       .file.         .
//dir.//.file.Z     //dir.//       .file          .Z
//dir.//.file.x.Z   //dir.//       .file.x        .Z
//dir.//file        //dir.//       file           ''
//dir.//file.       //dir.//       file           .
//dir.//file..      //dir.//       file.          .
//dir.//file.Z      //dir.//       file           .Z
//dir.//file.x.Z    //dir.//       file.x         .Z

As you can see, the behaviour is different from basename and dirname. For example basename dir/ outputs dir while the regex will give you an empty filename for it. Same for . and .., those are considered directories, not filenames.

I timed it with 10000 paths of 256 characters and it took about 1 second, while the equivalent POSIX shell solution is 2x slower and solutions based on wild forking (external calls inside the for loop) are at least 60x slower.

remark: It's not necessary to test paths that contain \n or other notorious characters because all characters are handled the same way by the regex engine of bash. The only characters that would be able to break the current logic are / and ., intermixed or multiplied in a currently unexpected way. When I first posted my answer I found a few border cases that I had to fix; I can't say that the regex is 100% bullet proof but it should be quite robust now.

As an aside, here's the POSIX shell solution that yields the same output:

#!/bin/sh

for path; do

####### the relevant part ######

fullname=${path##*/}

case $fullname in
. | ..)
    dirpath="$path"
    filename=''
    filext=''
    ;;
*)
    dirpath=${path%"$fullname"}
    dirpath=${dirpath:-.}       # dirpath should be non-null
    filename=${fullname#.}
    filename="${fullname%"$filename"}${filename%.*}"
    filext=${fullname#"$filename"}
    ;;
esac

################################

printf '%s=%s\n' \
    path     "$path" \
    dirpath  "$dirpath" \
    filename "$filename" \
    filext   "$filext"

done

postscript: There are a few points for which some people might disagree with the results given by the above codes:

The special case of dotfiles: The reason is that dotfiles are a UNIX concept.

The special case of . and ..: IMHO it seems obvious to treat them as directories, but most libraries don't do that and force the user to post-process the result.

No support for double-extensions: That's because you'd need a whole database for storing all the valid double-extensions, and above all, because file extensions don't mean anything in UNIX; for example you can call a tar archive my_tarred_files and that's completely fine, you'll be able to tar xf my_tarred_files without any problem.


C
Community

Building from Petesh answer, if only the filename is needed, both path and extension can be stripped in a single line,

filename=$(basename ${fullname%.*})

Did not work for me: "basename: missing operand Try 'basename --help' for more information."
Strange, are you certain you're using Bash? In my case, with both versions 3.2.25 (old CentOS) and 4.3.30 (Debian Jessie) it works flawlessly.
Maybe there is a space in the filename? Try using filename="$(basename "${fullname%.*}")"
The second argument to basename is optional, but specifies the extension to strip off. The substitution might still be useful but perhaps basename actually isn't, since you can actually perform all of these substitutions with shell builtins.
a
agc

Based largely off of @mklement0's excellent, and chock-full of random, useful bashisms - as well as other answers to this / other questions / "that darn internet"... I wrapped it all up in a little, slightly more comprehensible, reusable function for my (or your) .bash_profile that takes care of what (I consider) should be a more robust version of dirname/basename / what have you..

function path { SAVEIFS=$IFS; IFS=""   # stash IFS for safe-keeping, etc.
    [[ $# != 2 ]] && echo "usage: path <path> <dir|name|fullname|ext>" && return    # demand 2 arguments
    [[ $1 =~ ^(.*/)?(.+)?$ ]] && {     # regex parse the path
        dir=${BASH_REMATCH[1]}
        file=${BASH_REMATCH[2]}
        ext=$([[ $file = *.* ]] && printf %s ${file##*.} || printf '')
        # edge cases for extensionless files and files like ".nesh_profile.coffee"
        [[ $file == $ext ]] && fnr=$file && ext='' || fnr=${file:0:$((${#file}-${#ext}))}
        case "$2" in
             dir) echo      "${dir%/*}"; ;;
            name) echo      "${fnr%.*}"; ;;
        fullname) echo "${fnr%.*}.$ext"; ;;
             ext) echo           "$ext"; ;;
        esac
    }
    IFS=$SAVEIFS
}     

Usage examples...

SOMEPATH=/path/to.some/.random\ file.gzip
path $SOMEPATH dir        # /path/to.some
path $SOMEPATH name       # .random file
path $SOMEPATH ext        # gzip
path $SOMEPATH fullname   # .random file.gzip                     
path gobbledygook         # usage: -bash <path> <dir|name|fullname|ext>

Nicely done; a few suggestions: - You don't seem to be relying on $IFS at all (and if you were, you could use local to localize the effect of setting it). - Better to use local variables. - Your error message should be output to stderr, not stdout (use 1>&2), and you should return a non-zero exit code. - Better to rename fullname to basename (the former suggests a path with dir components). - name unconditionally appends a . (period), even if the original had none. You could simply use the basename utility, but note that it ignores a terminating /.
P
Peter Mortensen

A simple answer:

To expand on the POSIX variables answer, note that you can do more interesting patterns. So for the case detailed here, you could simply do this:

tar -zxvf $1
cd ${1%.tar.*}

That will cut off the last occurrence of .tar..

More generally, if you wanted to remove the last occurrence of .. then

${1.*.*}

should work fine.

The link the above answer appears to be dead. Here's a great explanation of a bunch of the string manipulation you can do directly in Bash, from TLDP.


Is there a way to make the match case-insensitive?
p
phil294

If you also want to allow empty extensions, this is the shortest I could come up with:

echo 'hello.txt' | sed -r 's/.+\.(.+)|.*/\1/' # EXTENSION
echo 'hello.txt' | sed -r 's/(.+)\..+|(.*)/\1\2/' # FILENAME

1st line explained: It matches PATH.EXT or ANYTHING and replaces it with EXT. If ANYTHING was matched, the ext group is not captured.


B
Bruno BEAUFILS

IMHO the best solution has already been given (using shell parameter expansion) and are the best rated one at this time.

I however add this one which just use dumbs commands, which is not efficient and which noone serious should use ever :

FILENAME=$(echo $FILE | cut -d . -f 1-$(printf $FILE | tr . '\n' | wc -l))
EXTENSION=$(echo $FILE | tr . '\n' | tail -1)

Added just for fun :-)


h
historystamp

Here is the algorithm I used for finding the name and extension of a file when I wrote a Bash script to make names unique when names conflicted with respect to casing.

#! /bin/bash 

#
# Finds 
# -- name and extension pairs
# -- null extension when there isn't an extension.
# -- Finds name of a hidden file without an extension
# 

declare -a fileNames=(
  '.Montreal' 
  '.Rome.txt' 
  'Loundon.txt' 
  'Paris' 
  'San Diego.txt'
  'San Francisco' 
  )

echo "Script ${0} finding name and extension pairs."
echo 

for theFileName in "${fileNames[@]}"
do
     echo "theFileName=${theFileName}"  

     # Get the proposed name by chopping off the extension
     name="${theFileName%.*}"

     # get extension.  Set to null when there isn't an extension
     # Thanks to mklement0 in a comment above.
     extension=$([[ "$theFileName" == *.* ]] && echo ".${theFileName##*.}" || echo '')

     # a hidden file without extenson?
     if [ "${theFileName}" = "${extension}" ] ; then
         # hidden file without extension.  Fixup.
         name=${theFileName}
         extension=""
     fi

     echo "  name=${name}"
     echo "  extension=${extension}"
done 

The test run.

$ config/Name\&Extension.bash 
Script config/Name&Extension.bash finding name and extension pairs.

theFileName=.Montreal
  name=.Montreal
  extension=
theFileName=.Rome.txt
  name=.Rome
  extension=.txt
theFileName=Loundon.txt
  name=Loundon
  extension=.txt
theFileName=Paris
  name=Paris
  extension=
theFileName=San Diego.txt
  name=San Diego
  extension=.txt
theFileName=San Francisco
  name=San Francisco
  extension=
$ 

FYI: The complete transliteration program and more test cases can be found here: https://www.dropbox.com/s/4c6m0f2e28a1vxf/avoid-clashes-code.zip?dl=0


From all the solutions this is the only one that returns an empty string when the file has no extension with: extension=$([[ "$theFileName" == *.* ]] && echo ".${theFileName##*.}" || echo '')