
How to grep (search) committed code in the Git history

I have deleted a file or some code in a file sometime in the past. Can I grep in the content (not in the commit messages)?

A very poor solution is to grep the log:

git log -p | grep <pattern>

However, this doesn't return the commit hash straight away. I played around with git grep to no avail.

These blog posts by Junio C Hamano (git maintainer) might be interesting for you:
* Linus's ultimate content tracking tool (about pickaxe search, i.e. git log -S, and blame)
* Fun with "git log --grep" (searching commit messages): gitster.livejournal.com/30195.html
* Fun with "git grep": gitster.livejournal.com/27674.html
answer from possible duplicate actually works: stackoverflow.com/a/1340245/492
issue with this is that it doesn't give any context to the change.. i.e. who / when
I believe as of 2021, VonC's answer is the only entirely correct one, and well deserves a green checkmark.

Nickolay

To search for commit content (i.e., actual lines of source, as opposed to commit messages and the like), you need to do:

git grep <regexp> $(git rev-list --all)

git rev-list --all | xargs git grep <expression> will work if you run into an "Argument list too long" error.

If you want to limit the search to some subtree (for instance, "lib/util"), you will need to pass that to the rev-list subcommand and grep as well:

git grep <regexp> $(git rev-list --all -- lib/util) -- lib/util

This will grep through all your commit text for regexp.

The reason for passing the path to both commands is that rev-list returns the list of revisions in which lib/util changed, but you also need to pass the path to grep so that it searches only in lib/util.

Just imagine the following scenario: grep might find the same <regexp> on other files which are contained in the same revision returned by rev-list (even if there was no change to that file on that revision).
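A minimal sketch of that scenario in a throwaway repository. The file names (lib/util/a.txt, docs/b.txt) and the pattern needle are made up for illustration:

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p lib/util docs
echo "needle in util" > lib/util/a.txt
echo "needle in docs" > docs/b.txt
git add . && git commit -q -m "add both files"

# Path given to rev-list only: grep still searches the whole tree of each
# returned revision, so docs/b.txt matches as well.
unrestricted=$(git grep needle $(git rev-list --all -- lib/util))

# Path given to both commands: only lib/util is searched.
restricted=$(git grep needle $(git rev-list --all -- lib/util) -- lib/util)

printf 'unrestricted:\n%s\n\nrestricted:\n%s\n' "$unrestricted" "$restricted"
```

Each hit is printed as <commit>:<path>:<matching line>, so the first list contains both files and the second only lib/util/a.txt.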

Here are some other useful ways of searching your source:

Search working tree for text matching regular expression regexp:

git grep <regexp>

Search working tree for lines of text matching regular expression regexp1 or regexp2:

git grep -e <regexp1> [--or] -e <regexp2>

Search working tree for lines of text matching regular expression regexp1 and regexp2, reporting file paths only:

git grep -l -e <regexp1> --and -e <regexp2>

Search working tree for files that have lines of text matching regular expression regexp1 and lines of text matching regular expression regexp2:

git grep -l --all-match -e <regexp1> -e <regexp2>

Search working tree for changed lines of text matching pattern:

git diff --unified=0 | grep <pattern>

Search all revisions for text matching regular expression regexp:

git grep <regexp> $(git rev-list --all)

Search all revisions between rev1 and rev2 for text matching regular expression regexp:

git grep <regexp> $(git rev-list <rev1>..<rev2>)

Thanks, works great! It's sad though that "$(git rev-list --all)" is needed and no convenient switch to specify searching in the whole history of a branch.
Excellent. +1. The GitBook adds some details (book.git-scm.com/4_finding_with_git_grep.html), and Junio C Hamano illustrates some of your points: gitster.livejournal.com/27674.html
Unfortunately, I cannot get this going with msysgit-1.7.4. It tells me sh.exe": /bin/git: Bad file number. VonC's answer also works with msysgit.
If you get an "unable to read tree" error when you invoke git grep history with rev-list, you might need to clean things up. Try git gc or check out: stackoverflow.com/questions/1507463/…
Yeah, this seems to fail on Windows as well, alas.
Peter Mortensen

You should use the pickaxe (-S) option of git log.

To search for Foo:

git log -SFoo -- path_containing_change
git log -SFoo --since=2009.1.1 --until=2010.1.1 -- path_containing_change

See Git history - find lost line by keyword for more.

As Jakub Narębski commented:

this looks for differences that introduce or remove an instance of <string>. It usually means "revisions where you added or removed line with 'Foo'".

the --pickaxe-regex option allows you to use extended POSIX regex instead of searching for a string. Example (from git log): git log -S"frotz\(nitfol" --pickaxe-regex

As Rob commented, this search is case-sensitive - he opened a follow-up question on how to search case-insensitive.
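As a rough illustration of what the pickaxe reports, here is a sketch in a throwaway repository. The file name code.c and the commit messages are invented for the example:

```shell
# Throwaway repository; file name and messages are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "int Foo = 1;" > code.c
git add code.c && git commit -q -m "add Foo"
echo "// unrelated" >> code.c
git commit -q -am "add a comment"
echo "// unrelated" > code.c
git commit -q -am "remove Foo"

# -SFoo keeps only the commits that change how many times "Foo" occurs,
# so the middle commit is skipped.
picked=$(git log -SFoo --format=%s -- code.c)
printf '%s\n' "$picked"
```

This prints "remove Foo" followed by "add Foo"; the commit that only added a comment is not listed.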


Thanks, I wasn't aware of this option. Looks like this is the best solution if you're interested in the commit messages and Jeet's solution is most appropriate if you need the traditional UNIX grep behavior of pure line matching.
Combine it with the -p flag to also output the diff.
Is there any way to exclude all directories matching a specific pattern using git log -S?
@BakaKuna why yes, there sure is (it uses the --format option though, not the -S option): stackoverflow.com/a/21079437/6309
@Anentropic you would need the --branches --all options to search the whole repo.
Tyler Holien

My favorite way to do it is with git log's -G option (added in version 1.7.4).

-G<regex>
       Look for differences whose added or removed line matches the given <regex>.

There is a subtle difference between the way the -G and -S options determine if a commit matches:

The -S option essentially counts the number of times your search matches in a file before and after a commit. The commit is shown in the log if the before and after counts are different. This will not, for example, show commits where a line matching your search was moved.

With the -G option, the commit is shown in the log if your search matches any line that was added, removed, or changed.

Take this commit as an example:

diff --git a/test b/test
index dddc242..60a8ba6 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-hello hello
+hello goodbye hello

Because the number of times "hello" appears in the file is the same before and after this commit, it will not match using -Shello. However, since there was a change to a line matching hello, the commit will be shown using -Ghello.
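The example commit can be reproduced in a throwaway repository to compare the two options. This is only a sketch; the commit messages "first" and "second" are placeholders:

```shell
# Throwaway repository reproducing the diff shown above.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "hello hello" > test
git add test && git commit -q -m "first"
echo "hello goodbye hello" > test
git commit -q -am "second"

# -S compares occurrence counts: "second" keeps two hellos, so it is skipped.
with_S=$(git log -Shello --format=%s)
# -G matches any added or removed line containing hello, so both commits appear.
with_G=$(git log -Ghello --format=%s)

printf -- '-S: %s\n' "$with_S"
printf -- '-G: %s\n' "$(printf '%s' "$with_G" | tr '\n' ' ')"
```

Here -Shello reports only "first" (the commit that introduced the word), while -Ghello reports both commits.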


Is there a way to show the matching change context in the git log output?
@Thilo-AlexanderGinkel - I usually just add the -p option to show a diff for each commit. Then when the log is opened in my pager, I search for whatever it is I'm looking for. If your pager is less and you git log -Ghello -p, you can type /hello, press Enter, and use n and N to find the next/previous occurrences of "hello".
I found an interesting issue with -G and Regex: If command line uses UTF-8 and the file you are looking at uses some ISO-Latin (8 bit) encoding, .* fails. For example, I have a change Vierter Entwurf -> Fünfter Entwurf, and while 'V.*ter Entwurf' produces a match, 'F.*ter Entwurf' does not.
Peter Mortensen

git log can be a more effective way of searching for text across all branches, especially if there are many matches, and you want to see more recent (relevant) changes first.

git log -p --all -S 'search string'
git log -p --all -G 'match regular expression'

These log commands list commits that add or remove the given search string/regex, (generally) more recent first. The -p option causes the relevant diff to be shown where the pattern was added or removed, so you can see it in context.

Having found a relevant commit that adds the text you were looking for (for example, 8beeff00d), find the branches that contain the commit:

git branch -a --contains 8beeff00d
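Put together, the two steps might look like this sketch in a throwaway repository. The branch name feature and the file f.txt are assumptions, and the --format option of git branch needs a reasonably recent Git:

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b feature
echo "search string" >> f.txt
git commit -q -am "add search string"

# Step 1: find the commit that added the text, on any branch.
commit=$(git log --all -S 'search string' --format=%H)
# Step 2: list the branches that contain that commit.
branches=$(git branch -a --contains "$commit" --format='%(refname:short)')

printf '%s is contained in: %s\n' "$commit" "$branches"
```

Only feature is listed, because the initial branch never received the commit.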

Hi, these lines don't seem to work at all. My command is > git log -p --all -S 'public string DOB { get; set; } = string.Empty;' and every time I try to run it I get > fatal: ambiguous argument 'string': unknown revision or path not in the working tree. > Use '--' to separate paths from revisions, like this: > 'git [...] -- [...]'
@user216652 For some reason the ' quotes aren't grouping your search string together as a single argument. Instead, 'public is the argument to -S, and it's treating the rest as separate arguments. I'm not sure what environment you're running in, but that context would be necessary to help troubleshoot. I'd suggest opening a separate StackOverflow question if needed to help you troubleshoot, with all the context of how your git command is being sent to the shell. It seems to me that it's getting sent through some other command? Comments here aren't the right place to figure this out.
Peter Mortensen

If you want to browse code changes (see what actually has been changed with the given word in the whole history) go for patch mode - I found a very useful combination of doing:

git log -p
# Hit '/' for search mode.
# Type in the word you are searching.
# If the first search is not relevant, hit 'n' for next (like in Vim ;) )

The accepted solution didn't work for me, and neither did git log -S. This one did!
I think this interactive mode is the most efficient. But how can you get the commit id after you found an occurrence?
@CristianTraìna scroll up and you should see "commit SHA1"
Peter Mortensen

Search in any revision, any file (Unix/Linux):

git rev-list --all | xargs git grep <regexp>

Search only in some given files, for example XML files:

git rev-list --all | xargs -I{} git grep <regexp> {} -- "*.xml"

The result lines should look like this: 6988bec26b1503d45eb0b2e8a4364afb87dde7af:bla.xml: text of the line it found...

You can then get more information like author, date, and diff using git show:

git show 6988bec26b1503d45eb0b2e8a4364afb87dde7af
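A sketch of the whole round trip in a throwaway repository, including pulling the commit hash out of a result line. The files bla.xml and notes.txt and the pattern secret are made up:

```shell
# Throwaway repository; file names and pattern are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "<tag>secret</tag>" > bla.xml
echo "secret" > notes.txt
git add . && git commit -q -m "add files"
git rm -q bla.xml notes.txt && git commit -q -m "delete files"

# Search every revision, XML files only. Revisions without a match make
# git grep exit non-zero, which xargs reports but does not stop on.
hits=$(git rev-list --all | xargs -I{} git grep secret {} -- "*.xml")

# Each hit is <commit>:<path>:<line>; the hash is the first field.
first_commit=$(printf '%s\n' "$hits" | head -n 1 | cut -d: -f1)
git show --no-patch --format='%h %an %ad %s' "$first_commit"
```

The deleted file is still found in the older revision, and notes.txt is ignored because of the "*.xml" pathspec.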

Peter Mortensen

I took Jeet's answer and adapted it to Windows (thanks to this answer):

FOR /F %x IN ('"git rev-list --all"') DO @git grep <regex> %x >> out.txt

Note that for me, for some reason, the actual commit that deleted this regex did not appear in the output of the command, but rather one commit prior to it.


+1 -- and if you want to avoid hitting "q" after each find, add --no-pager to the git command at the end
Also, I would note that appending to a text file has the added advantage of actually displaying the matching text. (append to a text file using >>results.txt for those not versed in Windows piping...
And I thought bash's syntax is ugly :)
Peter Mortensen

For simplicity, I'd suggest using a GUI: gitk - The Git repository browser. It's pretty flexible.

You can search code or search files, and of course it also supports regular expressions.

And you can navigate through the results using the up/down arrows.


Peter Mortensen

Whenever I find myself at your place, I use the following command line:

git log -S "<words/phrases i am trying to find>" --all --oneline  --graph

Explanation:

git log - Need I write more here? It shows the logs in reverse chronological order.
-S "<words/phrases i am trying to find>" - It shows all those Git commits where any file (added/modified/deleted) has the words/phrases I am trying to find, without the '<>' symbols.
--all - To search across all the branches.
--oneline - It compresses each Git log entry into one line.
--graph - It creates the graph of chronologically ordered commits.


"Whenever I find myself at your place, I feel the need to use git!"
Peter Mortensen

For anyone else trying to do this in Sourcetree, there is no direct command in the UI for it (as of version 1.6.21.0). However, you can use the commands specified in the accepted answer by opening a Terminal window (button available in the main toolbar) and copy/pasting them there.

Note: Sourcetree's Search view can partially do text searching for you. Press Ctrl + 3 to go to Search view (or click Search tab available at the bottom). From far right, set Search type to File Changes and then type the string you want to search. This method has the following limitations compared to the above command:

Sourcetree only shows the commits that contain the search word in one of the changed files.
Finding the exact file that contains the search text is again a manual task.
RegEx is not supported.


Dichen

Inspired by the answer https://stackoverflow.com/a/2929502/6041515, I found that git grep seems to search the full code base at each commit, not just the diffs, so the results tend to be repetitive and long. The script below searches only the diffs of each git commit instead:

for commit in $(git rev-list --all); do
    # search only lines starting with + or -
    if git show "$commit" | grep "^[+-].*search-string"; then
        git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short "$commit"
    fi
done

Example output, the bottom git commit is the one that first introduced the change I'm searching for:

csshx$ for commit in $(git rev-list --all); do 
>     if  git show "$commit" | grep "^[+|-].*As csshX is a command line tool"; then 
>         git show --no-patch --pretty=format:'%C(yellow)%h %Cred%ad %Cblue%an%Cgreen%d %Creset%s' --date=short $commit
>     fi  
> done

+As csshX is a command line tool, no special installation is needed. It may
987eb89 2009-03-04 Gavin Brock Added code from initial release

Peter Mortensen

I was kind of surprised here and maybe I missed the answer I was looking for, but I came here looking for a search on the heads of all the branches. Not for every revision in the repository, so for me, using git rev-list --all is too much information.

In other words, for me the variation most useful would be

git grep -i searchString $(git branch -r)

or

git branch -r | xargs git grep -i searchString

or

git branch -r | xargs -I{} git grep -i searchString {}

And, of course, you can try the regular expression approach here. What's cool about the approach here is that it worked against the remote branches directly. I did not have to do a check out on any of these branches.
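The output of git branch is meant for people (with -r it can include a line such as origin/HEAD -> origin/master), so in a script it may be safer to list the branch tips with git for-each-ref. A sketch on local branches of a throwaway repository; the branch name topic is an assumption, and refs/remotes would be the prefix to use for remote-tracking branches:

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "nothing here" > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b topic
echo "SearchString lives here" > f.txt
git commit -q -am "topic change"

# One name per branch tip, without decoration.
heads=$(git for-each-ref --format='%(refname:short)' refs/heads)

# $heads is left unquoted on purpose so each branch becomes its own argument.
found=$(git grep -i searchstring $heads)
printf '%s\n' "$found"
```

Only the tip of topic matches, and nothing had to be checked out to search it.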


Peter Mortensen

Jeet's answer works in PowerShell.

git grep -n <regex> $(git rev-list --all)

The following displays all files, in any commit, that contain a password.

# Store intermediate result
$result = git grep -n "password" $(git rev-list --all)

# Display unique file names
$result | select -unique { $_ -replace "(^.*?:)|(:.*)", "" }

I like your answer, and can see where it is going, but it's not working on MacOS zsh: parse error near `-unique'`
Okay! I got it working stackoverflow.com/a/69714869/10830091 GOD I HATE BASH
Vinita Maloo

Adding more to the answers already present. If you know the file in which you might have made the change, do this:

git log --follow -p -S 'search-string' <file-path>

--follow: continues listing the history of a file beyond renames


jthill

Okay, twice just today I've seen people wanting a closer equivalent for hg grep, which is like git log -pS but confines its output to just the (annotated) changed lines.

Which I suppose would be handier than /pattern/ in the pager if you're after a quick overview.

So here's a diff-hunk scanner that takes git log --pretty=%h -p output and spits annotated change lines. Put it in diffmarkup.l, say e.g. make ~/bin/diffmarkup, and use it like

git log --pretty=%h -pS pattern | diffmarkup | grep pattern

%option main 8bit nodefault
        // vim: tw=0
%top{
        #define _GNU_SOURCE 1
}
%x commitheader
%x diffheader
%x hunk
%%
        char *afile=0, *bfile=0, *commit=0;
        int aline,aremain,bline,bremain;
        int iline=1;

<hunk>\n        ++iline; if ((aremain+bremain)==0) BEGIN diffheader;
<*>\n   ++iline;

<INITIAL,commitheader,diffheader>^diff.*        BEGIN diffheader;
<INITIAL>.*     BEGIN commitheader; if(commit)free(commit); commit=strdup(yytext);
<commitheader>.*

<diffheader>^(deleted|new|index)" ".*   {}
<diffheader>^"---".*            if (afile)free(afile); afile=strdup(strchrnul(yytext,'/'));
<diffheader>^"+++".*            if (bfile)free(bfile); bfile=strdup(strchrnul(yytext,'/'));
<diffheader,hunk>^"@@ ".*       {
        BEGIN hunk; char *next=yytext+3;
        #define checkread(format,number) { int span; if ( !sscanf(next,format"%n",&number,&span) ) goto lostinhunkheader; next+=span; }
        checkread(" -%d",aline); if ( *next == ',' ) checkread(",%d",aremain) else aremain=1;
        checkread(" +%d",bline); if ( *next == ',' ) checkread(",%d",bremain) else bremain=1;
        break;
        lostinhunkheader: fprintf(stderr,"Lost at line %d, can't parse hunk header '%s'.\n",iline,yytext), exit(1);
        }
<diffheader>. yyless(0); BEGIN INITIAL;

<hunk>^"+".*    printf("%s:%s:%d:%c:%s\n",commit,bfile+1,bline++,*yytext,yytext+1); --bremain;
<hunk>^"-".*    printf("%s:%s:%d:%c:%s\n",commit,afile+1,aline++,*yytext,yytext+1); --aremain;
<hunk>^" ".*    ++aline, ++bline; --aremain; --bremain;
<hunk>. fprintf(stderr,"Lost at line %d, Can't parse hunk.\n",iline), exit(1);

Rob Di Marco

So are you trying to grep through older versions of the code looking to see where something last exists?

If I were doing this, I would probably use git bisect. Using bisect, you can specify a known good version, a known bad version, and a simple script that does a check to see if the version is good or bad (in this case a grep to see if the code you are looking for is present). Running this will find when the code was removed.
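A minimal sketch of that idea using git bisect run, assuming the code was present in every commit before it was removed and absent in every commit after. The file f.txt, the text wanted_code and the commit messages are hypothetical:

```shell
# Throwaway repository; file name, pattern and messages are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'keep\nwanted_code\n' > f.txt
git add f.txt && git commit -q -m "c1: has wanted_code"
good=$(git rev-parse HEAD)
echo "more" >> f.txt && git commit -q -am "c2: unrelated"
echo "keep" > f.txt && git commit -q -am "c3: removes wanted_code"
echo "later" >> f.txt && git commit -q -am "c4: unrelated"

# "bad" is HEAD (code is gone), "good" is a commit known to contain it.
git bisect start HEAD "$good" >/dev/null
# grep exits 0 (good) while the code exists and 1 (bad) once it is gone.
git bisect run grep -q wanted_code f.txt >/dev/null

# After the run, refs/bisect/bad points at the first commit without the code.
removed_in=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
printf 'removed in: %s\n' "$removed_in"
```

If the code came and went several times, the good/bad ordering no longer holds and the result is not reliable.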


Yes, but your "test" can be a script that greps for the code and returns "true" if the code exists and "false" if it does not.
Well, what if code was bad in revision 10, became good in revision 11 and became bad again in revision 15...
I agree with Paolo. Binary search is only appropriate for "ordered" values. In the case of git bisect, this means all "good" revisions come before all "bad" revisions, starting from the reference point, but that assumption can't be made when looking for transitory code. This solution might work in some cases, but it isn't a good general purpose solution.
I think this is highly inefficient as the whole tree is checked out multiple times for bisect.
Peter Mortensen
git rev-list --all | xargs -n 5 git grep EXPRESSION

is a tweak to Jeet's solution, so it shows results while it searches and not just at the end (which can take a long time in a large repository).


It gives "real-time" results by running git grep on 5 revisions at a time, for anyone who was curious.
Peter Mortensen

Scenario: You did a big clean up of your code by using your IDE. Problem: The IDE cleaned up more than it should and now your code does not compile (missing resources, etc.)

Solution:

git grep --cached "text_to_find"

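With --cached, git grep searches the files as recorded in the index instead of the working tree, so it will find the file where "text_to_find" still exists in its staged version.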

You can now undo this change and compile your code.
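A small sketch of the scenario in a throwaway repository, assuming the clean-up has not been staged yet. The file name res.txt is made up:

```shell
# Throwaway repository; the file name is hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "text_to_find" > res.txt
git add res.txt && git commit -q -m "add resource"

# Simulate an over-eager clean-up that only touched the working tree.
echo "cleaned" > res.txt

in_worktree=$(git grep -l "text_to_find")          # no match any more
in_index=$(git grep -l --cached "text_to_find")    # index still has the text

printf 'working tree: [%s]\nindex: [%s]\n' "$in_worktree" "$in_index"

# Undo the unstaged change by restoring the file from the index.
git checkout -- res.txt
```

The plain search comes back empty while the --cached search names res.txt, which is the file to restore.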


om-ha

A. Full, unique, sorted, paths:

# Get all unique filepaths of files matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
    while read revision; do
        git grep -F --files-with-matches 'password' $revision | cat | sed "s/[^:]*://"
    done
) | sort | uniq

B. Unique, sorted, filenames (not paths):

# Get all unique filenames matching 'password'
# Source: https://stackoverflow.com/a/69714869/10830091
git rev-list --all | (
    while read revision; do
        git grep -F --files-with-matches 'password' $revision | cat | sed "s/[^:]*://"
    done
) | xargs -n 1 basename | sort | uniq

This second command is useful for BFG, because it only accepts file names and not repo-relative/system-absolute paths.

Check out my full answer here for more explanation.