
Use grep --exclude/--include syntax to not grep through certain files

I'm looking for the string foo= in text files in a directory tree. It's on a common Linux machine, I have bash shell:

grep -ircl "foo=" *

In the directories are also many binary files which match "foo=". As these results are not relevant and slow down the search, I want grep to skip searching these files (mostly JPEG and PNG images). How would I do that?

I know there are the --exclude=PATTERN and --include=PATTERN options, but what is the pattern format? The man page of grep says:

--include=PATTERN     Recurse in directories only searching file matching PATTERN.
--exclude=PATTERN     Recurse in directories skip file matching PATTERN.

Searching on grep include, grep include exclude, grep exclude and variants did not find anything relevant.

If there's a better way of grepping only in certain files, I'm all for it; moving the offending files is not an option. I can't search only certain directories (the directory structure is a big mess, with everything everywhere). Also, I can't install anything, so I have to make do with common tools (like grep or the suggested find).

Just FYI, the arguments used: -c counts the matches in each file; -i is case-insensitive; -l only shows matching files; -r is recursive.
A quicker way to exclude svn dirs is --exclude-dir=.svn, so grep doesn't go into them at all.
A couple of pedantic points people may need to know: 1. Note the lack of quotes around the glob here: --exclude='*.{png,jpg}' doesn't work (at least with my GNU grep version) because grep doesn't support {} in its globs. The above is shell-expanded to '--exclude=*.png --exclude=*.jpg' (assuming no files match in the cwd - highly unlikely since you don't normally start filenames with '--exclude=') which grep likes just fine. 2. --exclude is a GNU extension and not part of POSIX's definition of grep, so if you write scripts using this be aware they won't necessarily run on non-GNU systems.
Full example of exclude-dir usage: grep -r --exclude-dir=var "pattern" .

Adam Rosenfield

Use the shell globbing syntax:

grep pattern -r --include=\*.cpp --include=\*.h rootdir

The syntax for --exclude is identical.

Note that the star is escaped with a backslash to prevent it from being expanded by the shell (quoting it, such as --include="*.cpp", would work just as well). Otherwise, if you had any files in the current working directory that matched the pattern, the command line would expand to something like grep pattern -r --include=foo.cpp --include=bar.cpp rootdir, which would only search files named foo.cpp and bar.cpp, which is quite likely not what you wanted.
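
Applied to the question's case (skipping PNG and JPEG files), a minimal sketch could look like the following. It assumes GNU grep; the scratch directory and the file names notes.txt, logo.png and photo.jpg are made up for the demonstration.

```shell
# Build a throwaway tree (all file names here are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'FOO=bar\n' > "$tmp/sub/notes.txt"
printf 'foo=png-payload\n' > "$tmp/sub/logo.png"
printf 'foo=jpg-payload\n' > "$tmp/photo.jpg"

# Quoting each glob keeps the shell from expanding it, so grep receives
# the pattern itself and applies it to every file name it visits.
matches=$(cd "$tmp" && grep -ril --exclude='*.png' --exclude='*.jpg' 'foo=' . | sort)
printf '%s\n' "$matches"

rm -rf "$tmp"
```

With GNU grep this should list only ./sub/notes.txt, since both image files are skipped by name before their contents are read.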

Update 2021-03-04

I've edited the original answer to remove the use of brace expansion, which is a feature provided by several shells such as Bash and zsh to simplify patterns like this; but note that brace expansion is not POSIX shell-compliant.

The original example was:

grep pattern -r --include=\*.{cpp,h} rootdir

to search through all .cpp and .h files rooted in the directory rootdir.


I don't know why, but I had to quote the include pattern like this: grep pattern -r --include="*.{cpp,h}" rootdir
@topek: Good point -- if you have any .cpp/.h files in your current directory, then the shell will expand the glob before invoking grep, so you'll end up with a command line like grep pattern -r --include=foo.cpp --include=bar.h rootdir, which will only search files named foo.cpp or bar.h. If you don't have any files that match the glob in the current directory, then the shell passes on the glob to grep, which interprets it correctly.
I just realized that the glob is matched only against the file name. To exclude a whole directory, one needs the --exclude-dir option. The same rules apply, though: only the directory's name is matched, not its path.
--include doesn't seem to work after --exclude. I suppose it doesn't make sense to even try, except that I have an alias to grep with a long list of --exclude and --exclude-dir, which I use for searching code, ignoring libraries and swap files and things. I would've hoped that grep -r --exclude='*.foo' --include='*.bar' would work, so I could limit my alias to --include='*.bar' only, but it seems to ignore the --include and include everything that's not a .foo file. Swapping the order of the --include and --exclude works, but alas, that's not helpful with my alias.
How are we supposed to read someone's mind to get the rules for this PATTERN? After half an hour I still can't find any description of what is expected there.
KeithWM

If you just want to skip binary files, I suggest you look at the -I (upper case i) option. It ignores binary files. I regularly use the following command:

grep -rI --exclude-dir="\.svn" "pattern" *

It searches recursively, ignores binary files, and doesn't look inside Subversion hidden folders, for whatever pattern I want. I have it aliased as "grepsvn" on my box at work.
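
To see what -I changes, here is a small sketch that compares a run with and without it. It assumes GNU grep (which treats a file containing a NUL byte as binary); the file names settings.conf and blob.bin are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'foo=text\n' > "$tmp/settings.conf"
# A NUL byte is enough for grep to classify this file as binary.
printf 'foo=\0\001\002\n' > "$tmp/blob.bin"

# Without -I the binary file is searched and reported like any other.
with_binary=$(cd "$tmp" && grep -rl 'foo=' . | sort)
# With -I binary files are treated as if they contained no match.
text_only=$(cd "$tmp" && grep -rIl 'foo=' . | sort)

printf 'without -I:\n%s\n' "$with_binary"
printf 'with -I:\n%s\n' "$text_only"

rm -rf "$tmp"
```

The second listing should contain only the text file, with no need to name any extension.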


--exclude-dir is not available everywhere. My RH box at work with GNU grep 2.5.1 does not have it.
Any suggestions for what to use when --exclude-dir is unavailable? In all my attempts, --exclude does not appear to fit the bill.
You can always download the latest grep source from GNU, and do a 'configure; make; sudo make install'. This is one of the first things I do on a Mac or older Linux distribution.
Exactly what I needed. Actually, I use git. So, --exclude-dir="\.git". :-)
@IonicăBizău git has a grep wrapper which searches only files that are indexed in your repository: git-scm.com/docs/git-grep
Andy Lester

Please take a look at ack, which is designed for exactly these situations. Your example of

grep -ircl --exclude=*.{png,jpg} "foo=" *

is done with ack as

ack -icl "foo="

because ack never looks in binary files by default, and -r is on by default. And if you want only CPP and H files, then just do

ack -icl --cpp "foo="

Looks nice, will try the standalone Perl version next time, thanks.
Good call, I can no longer live without ack.
stackoverflow.com/questions/667471/… - This will allow you to get ack on windows, if that is where you are running grep from.
@Chance Maybe you want silversearcher-ag, just apt-get in Ubuntu :)
Ripgrep can also do this - ignores binary and git ignored files by default. To exclude a filetype, you use rg --type-not cpp, to search only for a filetype you use rg --type cpp. You can download just a single executable and run it.
Lii

grep 2.5.3 introduced the --exclude-dir parameter which will work the way you want.

grep -rI --exclude-dir=\.svn PATTERN .

You can also set an environment variable: GREP_OPTIONS="--exclude-dir=\.svn"

I'll second Andy's vote for ack though, it's the best.


+1 for mentioning the exact version number; I have grep 2.5.1 and exclude-dir option is not available
Rushabh Mehta

I found this after a long time: you can add multiple includes and excludes like this:

grep "z-index" . --include=*.js --exclude=*js/lib/* --exclude=*.min.js

It is better to combine them in a list like: --exclude={pattern1,pattern2,pattern3}
Make sure you add the --include options before any --exclude options.
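
A hedged sketch of combining several filters, with the include listed before the excludes as the comment above advises. It assumes GNU grep, which compares --exclude globs with the base name of each file during recursion, so a path-like glob may never match; the library directory is therefore skipped with --exclude-dir instead. The directory layout and file names are hypothetical.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/js/lib" "$tmp/css"
printf 'z-index: 1\n' > "$tmp/js/app.js"
printf 'z-index: 1\n' > "$tmp/js/app.min.js"
printf 'z-index: 1\n' > "$tmp/js/lib/vendor.js"
printf 'z-index: 1\n' > "$tmp/css/site.css"

# Only *.js files are considered, minified files are dropped by name,
# and any directory called "lib" is not entered at all.
js_hits=$(cd "$tmp" && grep -rl --include='*.js' --exclude='*.min.js' \
    --exclude-dir=lib 'z-index' . | sort)
printf '%s\n' "$js_hits"

rm -rf "$tmp"
```

Quoting the globs also avoids surprises when a file in the current directory happens to match one of them.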
Anonymous

The suggested command:

grep -Ir --exclude="*\.svn*" "pattern" *

is conceptually wrong, because --exclude works on the basename. In other words, it will skip only the .svn in the current directory.
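
Because the glob is compared with a base name, a directory is more reliably skipped by name with --exclude-dir. A small sketch, assuming a GNU grep recent enough to have that option (2.5.3 or later according to another answer); the tree below is invented for the demonstration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn" "$tmp/src/.svn"
printf 'foo=1\n' > "$tmp/src/main.c"
printf 'foo=2\n' > "$tmp/.svn/entries"
printf 'foo=3\n' > "$tmp/src/.svn/entries"

# --exclude-dir is matched against each directory's base name, so a
# single option covers .svn at every depth of the tree.
kept=$(cd "$tmp" && grep -rIl --exclude-dir=.svn 'foo=' . | sort)
printf '%s\n' "$kept"

rm -rf "$tmp"
```

Only ./src/main.c should remain, even though a .svn directory exists at two different levels.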


Yep, it doesn't work at all for me. The one that worked for me was: exclude-dir=.svn
@Nicola thank you! I've been tearing my hair out about why this won't work. Tell me, is there a way to discover this from the manpage? All it says is it matches "PATTERN". EDIT manpage says "file", as explained here fixunix.com/unix/…
deric

In grep 2.5.1 you have to add this line to ~/.bashrc or ~/.bash_profile:

export GREP_OPTIONS="--exclude=\*.svn\*"

Aaron Maenpaa

I find grepping grep's output to be very helpful sometimes:

grep -rn "foo=" . | grep -v "Binary file"

Though, that doesn't actually stop it from searching the binary files.


You can use grep -I to skip binary files.
I also did that when I was young... now I know better, and when confronted with a problem, the first thing I do is RTFM.
Grepping grep's output will remove the color highlights.
OnlineCop

If you are not averse to using find, I like its -prune feature:

find [directory] \
        -name "pattern_to_exclude" -prune \
     -o -name "another_pattern_to_exclude" -prune \
     -o -name "pattern_to_INCLUDE" -print0 \
| xargs -0 -I FILENAME grep -IR "pattern" FILENAME

On the first line, you specify the directory you want to search. . (current directory) is a valid path, for example.

On the 2nd and 3rd lines, use "*.png", "*.gif", "*.jpg", and so forth. Use as many of these -o -name "..." -prune constructs as you have patterns.

On the 4th line, you need another -o (it specifies "or" to find), the patterns you DO want, and you need either a -print or -print0 at the end of it. If you just want "everything else" that remains after pruning the *.gif, *.png, etc. images, then use -o -print0 and you're done with the 4th line.

Finally, on the 5th line is the pipe to xargs which takes each of those resulting files and stores them in a variable FILENAME. It then passes grep the -IR flags, the "pattern", and then FILENAME is expanded by xargs to become that list of filenames found by find.

For your particular question, the statement may look something like:

find . \
     -name "*.png" -prune \
     -o -name "*.gif" -prune \
     -o -name "*.svn" -prune \
     -o -print0 | xargs -0 -I FILES grep -IR "foo=" FILES
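
One caveat with the command above: a bare -print0 also emits directories, including the starting directory itself, and grep -R then descends into it again and revisits the pruned files. A variant sketch that passes only regular files to a non-recursive grep is shown below; it assumes a find and xargs that support -print0 and -0 (GNU and BSD do), and the file names are hypothetical.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn" "$tmp/docs"
printf 'foo=1\n' > "$tmp/docs/readme.txt"
printf 'foo=2\n' > "$tmp/docs/logo.png"
printf 'foo=3\n' > "$tmp/.svn/entries"

# Prune unwanted names first; whatever survives must be a regular file
# (-type f) before it is printed, so no directory ever reaches grep.
pruned=$(cd "$tmp" && find . \
        -name '*.png' -prune \
     -o -name '.svn' -prune \
     -o -type f -print0 \
  | xargs -0 grep -Il 'foo=' | sort)
printf '%s\n' "$pruned"

rm -rf "$tmp"
```

Dropping -I FILES also lets xargs batch many file names into one grep invocation instead of starting grep once per file.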


One amendment I'd suggest: include -false immediately after each -prune so forgetting to use -print0 or some kind of exec command won't actually print the files you wanted to exclude: -name "*.png" -prune -false -o -name "*.gif" -prune -false ...
aesede

On CentOS 6.6/Grep 2.6.3, I have to use it like this:

grep "term" -Hnir --include \*.php --exclude-dir "*excluded_dir*"

Notice the lack of equal signs "=" (otherwise --include, --exclude, include-dir and --exclude-dir are ignored)


kenorb

git grep

Use git grep which is optimized for performance and aims to search through certain files.

By default it ignores binary files and honors your .gitignore. If you're not working in a Git repository, you can still use it by passing --no-index.

Example syntax:

git grep --no-index "some_pattern"

For more examples, see:

How to exclude certain directories/files from git grep search.

Check if all of multiple strings or regexes exist in a file


4D4M

I'm a dilettante, granted, but here's how my ~/.bash_profile looks:

export GREP_OPTIONS="-orl --exclude-dir=.svn --exclude-dir=.cache --color=auto" GREP_COLOR='1;32'

Note that to exclude two directories, I had to use --exclude-dir twice.


Necro comment from the distant dead .... GREP_OPTIONS is now deprecated, so I don't think these answers using that are valid anymore. Hey, I know it's late, but this is news to me. :)
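
Given that deprecation, one alternative is a small shell function in ~/.bashrc or ~/.bash_profile that carries the default options instead of the environment variable. This is only a sketch: the function name grepsrc is hypothetical, and the excluded directories simply mirror the GREP_OPTIONS value shown above.

```shell
# grepsrc is a made-up name; pick whatever suits your setup.
grepsrc() {
    grep -r --exclude-dir=.svn --exclude-dir=.cache "$@"
}

# Quick check against a throwaway tree (file names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn" "$tmp/.cache" "$tmp/src"
printf 'foo=1\n' > "$tmp/src/app.conf"
printf 'foo=2\n' > "$tmp/.svn/entries"
printf 'foo=3\n' > "$tmp/.cache/index"

found=$(cd "$tmp" && grepsrc -l 'foo=' . | sort)
printf '%s\n' "$found"

rm -rf "$tmp"
```

Unlike GREP_OPTIONS, the function only affects commands that call it by name, so scripts that invoke plain grep keep their usual behavior.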
Stéphane Laurent

If you search non-recursively you can use glob patterns to match the filenames.

grep "foo" *.{html,txt}

includes html and txt. It searches in the current directory only.

To search in the subdirectories:

   grep "foo" */*.{html,txt}

In the subsubdirectories:

   grep "foo" */*/*.{html,txt}

kenorb

In the directories are also many binary files. I can't search only certain directories (the directory structure is a big mess). Is there a better way of grepping only in certain files?

ripgrep

This is one of the quickest tools designed to recursively search your current directory. It is written in Rust, built on top of Rust's regex engine for maximum efficiency. Check the detailed analysis here.

So you can just run:

rg "some_pattern"

It respects your .gitignore and automatically skips hidden files/directories and binary files.

You can still customize include or exclude files and directories using -g/--glob. Globbing rules match .gitignore globs. Check man rg for help.

For more examples, see: How to exclude some files not matching certain extensions with grep?

On macOS, you can install via brew install ripgrep.


Gravstar

Try this one:

$ find . -name "*.txt" -type f -print | xargs file | grep "foo=" | cut -d: -f1

Found here: http://www.unix.com/shell-programming-scripting/42573-search-files-excluding-binary-files.html


This doesn't work on filenames with spaces, but that problem is easily solved by using print0 instead of print and adding the -0 option to xargs.
Andrew Stein

find and xargs are your friends. Use them to filter the file list rather than grep's --exclude.

Try something like

find . -not -name '*.png' -type f -print | xargs grep -icl "foo="

The advantage of getting used to this is that it extends to other use cases, for example counting the lines in all non-PNG files:

find . -not -name '*.png' -type f -print | xargs wc -l

To remove all non-PNG files:

find . -not -name '*.png' -type f -print | xargs rm

etc.

As pointed out in the comments, if some files may have spaces in their names, use -print0 and xargs -0 instead.
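
A null-delimited variant might look like the sketch below. It assumes a find and xargs that support -print0 and -0 (GNU and BSD do), and it spells -not as "!", the POSIX form; the file names with spaces are made up for the demonstration.

```shell
tmp=$(mktemp -d)
printf 'foo=1\n' > "$tmp/my notes.txt"
printf 'foo=2\n' > "$tmp/holiday photo.png"

# -print0 and -0 use NUL as the separator, so a space inside a file
# name is never mistaken for a boundary between two names.
hits=$(cd "$tmp" && find . -type f ! -name '*.png' -print0 \
  | xargs -0 grep -il 'foo=' | sort)
printf '%s\n' "$hits"

rm -rf "$tmp"
```

Here the -type f and name tests are joined by find's implicit "and", so a file is printed only when it is a regular file and does not end in .png.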


This doesn't work on filenames with spaces, but that problem is easily solved by using print0 instead of print and adding the -0 option to xargs.
Anonymous

Those scripts don't solve the whole problem. Try this instead:

du -ha | grep -i -o "\./.*" | grep -v "\.svn\|another_file\|another_folder" | xargs grep -i -n "$1"

This script works better because it uses "real" regular expressions to keep directories out of the search. Just separate the folder or file names with "\|" in the grep -v pattern.

Enjoy it! Found on my Linux shell! XD


animuson

Look @ this one.

grep --exclude="*\.svn*" -rn "foo=" * | grep -v Binary | grep -v tags

Things that achieve approximately this have been covered in other posts; what's more, this is wrong, in that with various layout options set it will mess up line numbers and things like that or exclude lines of context which were desired.
mjs

The --binary-files=without-match option to GNU grep gets it to skip binary files. (Equivalent to the -I switch mentioned elsewhere.)

(This might require a recent version of grep; 2.5.3 has it, at least.)


Keith Knauber

Suitable for a tcsh .alias file:

alias gisrc 'grep -I -r -i --exclude="*\.svn*" --include="*\."{mm,m,h,cc,c} \!* *'

Took me a while to figure out that the {mm,m,h,cc,c} portion should NOT be inside quotes. ~Keith


Nakilon

To ignore all binary results from grep

grep -Ri "pattern" * | awk '{if($1 != "Binary") print $0}'

The awk part will filter out all the "Binary file foo matches" lines.


jacoz

Try this:

Create a folder named "--F" under the current directory (or link another folder there and rename it to "--F", i.e. double-minus-F), then run:

grep -i --exclude-dir="\-\-F" "pattern" *