Remove sensitive files and their commits from Git history

I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for capistrano).

I know I can add these filenames to .gitignore, but this would not remove their history within Git.

I also don't want to start over again by deleting the /.git directory.

Is there a way to remove all traces of a particular file in your Git history?


Black

For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.

With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:

Note for Windows users: use double quotes (") instead of singles in this command

git filter-branch --index-filter \
'git update-index --remove PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' <introduction-revision-sha1>..HEAD
git push --force --verbose --dry-run
git push --force

Update 2019:

This is the current code from the FAQ:

  git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \
  --prune-empty --tag-name-filter cat -- --all
  git push --force --verbose --dry-run
  git push --force
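To see what the FAQ command does before running it on a real project, here is a minimal sketch on a throwaway repository. The file names, the commit messages and the "hunter2" password are made up for the demo; only the filter-branch invocation itself comes from the answer above. Note that the check looks at HEAD rather than --all, because filter-branch keeps a backup of the old history under refs/original/.

```shell
#!/bin/sh
# Sketch only: builds a scratch repo in a temp dir, nothing here touches a real project.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Three commits; the middle one introduces the (hypothetical) sensitive file.
echo 'puts "hello"' > app.rb
git add app.rb
git commit -q -m 'add app'
mkdir config
echo 'set :password, "hunter2"' > config/deploy.rb
git add config/deploy.rb
git commit -q -m 'add deploy config'
echo '# more code' >> app.rb
git commit -q -am 'update app'

# The command from the answer, with the placeholder path filled in.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch config/deploy.rb' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Count commits reachable from HEAD that still touch the file.
remaining=$(git log --oneline HEAD -- config/deploy.rb | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits still touching config/deploy.rb: $remaining (history length: $commits)"
```

Because of --prune-empty, the commit that only added the sensitive file disappears entirely instead of being left behind as an empty commit.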

Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.

To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.

Tip: Execute git rebase --interactive

In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:

git commit -a --amend

That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:

git rebase -i origin/master

That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:

$EDITOR file-to-fix
git commit -a --amend
git rebase --continue

For each change with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
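As a rough illustration of the simplest case (the secret is in the last, unpushed commit), the sketch below amends that commit in a scratch repository. The file name, its contents and the password are invented for the demo.

```shell
#!/bin/sh
# Sketch only: demonstrates "fix the file, then git commit -a --amend" in a temp repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'name = app' > settings.ini
git add settings.ini
git commit -q -m 'initial settings'

# Oops: the latest commit carries a password along with a legitimate change.
echo 'host = db.example.com' >> settings.ini
echo 'password = hunter2' >> settings.ini
git commit -q -am 'configure database'

# Remove the sensitive line, then fold the fix into that same commit.
grep -v '^password' settings.ini > settings.tmp
mv settings.tmp settings.ini
git commit -q -a --amend --no-edit

# The password should no longer appear in any diff reachable from HEAD.
leaks=$(git log -p HEAD | grep -c 'hunter2' || true)
commits=$(git rev-list --count HEAD)
echo "lines mentioning the password in HEAD's history: $leaks"
```

The old commit is still sitting in the local reflog after this, which is harmless as long as it was never pushed.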


Running [git filter-branch --index-filter 'git update-index --remove filename' ..HEAD] didn't rewrite the commit history; 'git log' still shows the old history. Is there anything special to check?
Got this to work. I was lost in translations. I used the link instead of the command here. Also, Windows command ended up requiring double-quotes as ripper234 mentions, full path as MigDus suggests, and not including the "\" characters that the link pasted as new line wrapping indicators. Final command looked something like: git filter-branch --force --index-filter "git rm --cached --ignore-unmatch src[Project][File].[ext]" --prune-empty --tag-name-filter cat -- --all
There seem to be some substantive differences between your filter-branch code and that in the github page you linked to. E.g their 3rd line --prune-empty --tag-name-filter cat -- --all. Has the solution changed or am I missing something?
This solution looks quite good, but if I've introduced the file to remove in the initial commit, <introduction-revision-sha1>..HEAD doesn't work. It only removes the file from the second commit onward. (How do I include the initial commit in the range of commits?) The safe way is pointed out here: help.github.com/articles/… git filter-branch --force --index-filter \ 'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' \ --prune-empty --tag-name-filter cat -- --all
I get fatal: refusing to merge unrelated histories
Community

Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.

Create a private.txt file listing the passwords, etc, that you want to remove (one entry per line) and then run this command:

$ java -jar bfg.jar  --replace-text private.txt  my-repo.git

All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive
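For reference, a sketch of what a private.txt could look like and how the steps fit together. The two secrets and the repository URL are placeholders, and the BFG commands are only printed, not run, since bfg.jar and a real repository are assumed rather than provided. The "==>" form, which picks a replacement other than the default ***REMOVED***, is taken from the BFG documentation as I understand it.

```shell
#!/bin/sh
# Sketch only: writes an example private.txt and prints the workflow around it.
set -eu
work=$(mktemp -d)
cd "$work"

# One entry per line. The second entry uses "==>" to choose its own replacement text.
cat > private.txt <<'EOF'
hunter2
s3cr3t-api-key==>REDACTED
EOF

entries=$(wc -l < private.txt | tr -d ' ')
echo "private.txt lists $entries strings to scrub"

# Printed rather than executed: bfg.jar and the clone URL are placeholders.
cat <<'EOF'
git clone --mirror git://example.com/my-repo.git
java -jar bfg.jar --replace-text private.txt my-repo.git
cd my-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push
EOF
```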

The BFG is typically 10-50x faster than running git-filter-branch and the options are simplified and tailored around these two common use-cases:

Removing Crazy Big Files

Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.


@Henridv I'm not sure how the accepted answer by natacado differs in that respect from my own answer? Both of our answers specifically address the key sentence of the question: "Is there a way to remove all traces of a particular file in your Git history?" - ie they talk about Git history-rewriting. The issue of /how/ NixNinja /should/ supply passwords to his app isn't mentioned either in his question, or in any of the current answers. As it happens, the BFG specifically addresses the issue of unintended consequences, see rtyley.github.com/bfg-repo-cleaner/#protected-commits
This is a big win right here. After a couple tries, I was able to use this to strip commits containing sensitive information from a private repo very thoroughly and forcefully update the remote repo with the revised history. One side note is that you do have to ensure the tip of your repo (HEAD) is itself clean with no sensitive data as this commit is considered "protected" and won't be revised by this tool. If it isn't, just clean/replace manually and git commit. Otherwise, +1 for new tool in developer's toolbox :)
@Henridv Per my recent comment, it should not break your application as you might anticipate, assuming your application is currently situated at the tip or head of your branch (i.e. latest commit). This tool will explicitly report for your last commit These are your protected commits, and so their contents will NOT be altered while traversing and revising the rest of your commit history. If you needed to rollback, however, then yes you would need to just do a search for ***REMOVED*** in the commit you just rolled back to.
+1 for BFG (if you have Java installed or don't mind installing it). One catch is that BFG refuses to delete a file if it is contained in HEAD. So it's better to first do a commit where the desired files will be deleted and only then run BFG. After that you can revert that last commit, now it doesn't change a thing.
This should actually be accepted as the correct answer. Does what it says on the box!
Ciro Santilli Путлер Капут 六四事

If you pushed to GitHub, force pushing is not enough, delete the repository or contact support

Even if you force push one second afterwards, it is not enough as explained below.

The only valid courses of action are:

- Is what leaked a changeable credential like a password?
  - yes: modify your passwords immediately, and consider using more OAuth and API keys!
  - no (naked pics): do you care if all issues in the repository get nuked?
    - no: delete the repository
    - yes:
      - contact support
      - if the leak is very critical to you, to the point that you are willing to get some repository downtime to make it less likely to leak, make it private while you wait for GitHub support to reply to you

Force pushing a second later is not enough because:

GitHub keeps dangling commits for a long time. GitHub staff does have the power to delete such dangling commits if you contact them, however. I experienced this first hand when I uploaded all GitHub commit emails to a repo: they asked me to take it down, so I did, and they did a gc. Pull requests that contain the data have to be deleted however: that repo data remained accessible up to one year after initial takedown due to this.

Dangling commits can be seen either through:

- the commit web UI: https://github.com/cirosantilli/test-dangling/commit/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)
- the API: https://api.github.com/repos/cirosantilli/test-dangling/commits/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)

One convenient way to get the source at that commit then is to use the download zip method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip

It is possible to fetch the missing SHAs either by:

- listing API events with "type": "PushEvent". E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)
- more conveniently sometimes, by looking at the SHAs of pull requests that attempted to remove the content

There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly poll GitHub data and store it elsewhere. I could not find out whether they scrape the actual commit diff, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only stuff linked to people or commits of interest.

If you delete the repository instead of just force pushing however, commits do disappear even from the API immediately and give 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824 This works even if you recreate another repository with the same name.

To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and did:

git init
git remote add origin git@github.com:cirosantilli/test-dangling.git

touch a
git add .
git commit -m 0
git push

touch b
git add .
git commit -m 1
git push

touch c
git rm b
git add .
git commit --amend --no-edit
git push -f

See also: How to remove a dangling commit from GitHub?
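The same effect can be reproduced locally without GitHub. This is only a sketch of the idea, not of GitHub's internals: after amending, the old commit is no longer on the branch, but anyone who knows its SHA can still read the removed file from it. The file contents are invented for the demo.

```shell
#!/bin/sh
# Sketch only: shows that an amended-away commit stays readable by SHA in a scratch repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a
git add a
git commit -q -m 0

# Commit 1 leaks a (hypothetical) password in file b.
echo 'password = hunter2' > b
git add b
git commit -q -m 1
leaked=$(git rev-parse HEAD)

# Replace b with c and amend, mirroring the steps of the test above.
git rm -q b
echo c > c
git add c
git commit -q --amend --no-edit

# The leaked commit is no longer part of the branch history...
on_branch=$(git rev-list HEAD | grep -c "$leaked" || true)
# ...but it is now a dangling commit whose contents are still retrievable.
recovered=$(git show "$leaked:b")
echo "on branch: $on_branch, recovered from dangling commit: $recovered"
```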

git filter-repo is now officially recommended over git filter-branch

This is mentioned in the manpage of git filter-branch itself (the warning was added in Git 2.24).

With git filter-repo, you could either remove certain files (see: Remove folder and its contents from git/GitHub's history):

pip install git-filter-repo
git filter-repo --path path/to/remove1 --path path/to/remove2 --invert-paths

This automatically removes empty commits.

Or you can replace certain strings (see: How to replace a string in a whole Git history?):

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')
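The <(echo ...) form above relies on process substitution, which plain sh does not have. A sketch of the same replacement using an ordinary expressions file follows. The file name expressions.txt, the scratch repository and its contents are made up; the rewrite step is skipped when git-filter-repo is not installed, and --force is passed only because this demo repository is not a fresh clone.

```shell
#!/bin/sh
# Sketch only: expressions file for --replace-text, tried on a scratch repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# One expression per line: literal text, then "==>" and the replacement.
printf '%s\n' 'my_password==>xxxxxxxx' > "$work/expressions.txt"

git init -q "$work/repo"
cd "$work/repo"
echo 'db_pass = my_password' > settings.ini
git add settings.ini
git commit -q -m 'add settings'

if command -v git-filter-repo >/dev/null 2>&1; then
    # filter-repo normally refuses to run outside a fresh clone, hence --force here.
    git filter-repo --force --replace-text "$work/expressions.txt"
    git log -p --all
else
    echo "git-filter-repo not installed; skipping the rewrite"
fi
```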

If the repository is part of a fork network, making the repository private or deleting it may not help and may make the problem worse. Fork networks on GitHub seem to share an internal bare repository, so that commits in one fork are also retrievable through other forks. Making a repository private or deleting it causes a split from the fork network, with the sensitive commits now duplicated in each remaining bare repository. The commits will continue to be accessible through forks until GC has been run on both bare repositories.
Jason Goemaat

I recommend this script by David Underhill, worked like a charm for me.

It adds these commands in addition to natacado's filter-branch to clean up the mess it leaves behind:

rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune

Full script (all credit to David Underhill)

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune

The last two commands may work better if changed to the following:

git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
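To check that the cleanup really removes the data and not just the references to it, here is a small sketch on a scratch repository. It looks up the blob ID of a hypothetical secret.txt, rewrites history, and tests whether that blob still exists before and after the cleanup. The backup refs are deleted with git update-ref instead of rm -rf, which has the same effect here and also works when the refs are packed.

```shell
#!/bin/sh
# Sketch only: verifies that the sensitive blob is physically gone after cleanup.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'hello' > readme.txt
git add readme.txt
git commit -q -m 'add readme'
echo 'password = hunter2' > secret.txt
git add secret.txt
git commit -q -m 'add secret'
blob=$(git rev-parse HEAD:secret.txt)   # object ID of the sensitive content

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty -- --all >/dev/null 2>&1

# Right after the rewrite the blob is still in the object store.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the refs/original backups, expire the reflogs, then prune.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "blob before cleanup: $before, after cleanup: $after"
```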

Note that your usage of expire and prune are incorrect, if you don't specify the date then it defaults to all commits older than 2 weeks for prune. What you want is all commits so do: git gc --aggressive --prune=now
@Adam Parkin I'm going to leave the code in the answer the same because it is from the script on David Underhill's site, you could comment there and if he changes it I would change this answer since I really don't know git that well. The expire command prior to the prune doesn't affect that does it?
@MarkusUnterwaditzer: That one won't work for pushed commits.
Maybe you should just put all the commands in your answer; it would be much more consistent and wouldn't require the mental combining of separate posts :)
nachoparker

You can use git forget-blob.

The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

It will disappear from all the commits in your history, reflog, tags and so on

I run into the same problem every now and then, and every time I have to come back to this post and others. That's why I automated the process.

Credits to contributors from Stack Overflow that allowed me to put this together


vertigo71

Here is my solution on Windows:

git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD
git push --force

Make sure that the path is correct, otherwise it won't work.

I hope it helps


Stephen Rauch

Use filter-branch:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch *file_path_relative_to_git_repo*' --prune-empty --tag-name-filter cat -- --all

git push origin *branch_name* -f

lostphilosopher

To be clear: The accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.

An alternative would be:

1. cd to project's base branch
2. Remove the sensitive code / file
3. rm -rf .git/ # Remove all git info from your code
4. Go to github and delete your repository
5. Follow this guide to push your code to a new repository as you normally would - https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/

This will of course remove all commit history, branches, and issues from both your github repo and your local git repo. If this is unacceptable you will have to use an alternate approach.

Call this the nuclear option.


Ercan

In my Android project I had admob_keys.xml as a separate XML file in the app/src/main/res/values/ folder. To remove this sensitive file I used the script below, and it worked perfectly.

git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch  app/src/main/res/values/admob_keys.xml' \
--prune-empty --tag-name-filter cat -- --all

b01

I've had to do this a few times to date. Note that this only works on one file at a time.

1. Get a list of all commits that modified the file. The one at the bottom will be the first commit: git log --pretty=oneline --branches -- pathToFile
2. To remove the file from history, take the first commit's sha1 and the path to the file from the previous command, and fill them into this command: git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <first-commit-sha1>..
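A sketch of the lookup step on a scratch repository, assuming a hypothetical secret.txt that is added in the second commit and edited in the third. It also checks something a comment further up points out: a range that starts at the first commit does not contain that commit itself, so the file would survive there unless you rewrite with -- --all instead.

```shell
#!/bin/sh
# Sketch only: finds the first commit that touched a file, in a temp repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'hello' > readme.txt
git add readme.txt
git commit -q -m 'add readme'
echo 'password = hunter2' > secret.txt
git add secret.txt
git commit -q -m 'add secret'
expected=$(git rev-parse HEAD)          # the commit that introduced the file
echo 'password = hunter3' > secret.txt
git commit -q -am 'rotate secret'

# Newest first, so the last line is the commit that first touched the file.
first=$(git log --pretty=oneline --branches -- secret.txt | tail -n 1 | cut -d' ' -f1)
echo "first commit touching secret.txt: $first"

# "$first.." lists commits after $first; $first itself is not in the range.
in_range=$(git rev-list "$first..HEAD" | grep -c "$first" || true)
echo "is the first commit inside the range? $in_range"
```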


przbadu

So, It looks something like this:

git rm --cached /config/deploy.rb
echo /config/deploy.rb >> .gitignore

This removes the tracked file from Git's index (the file itself stays on disk) and adds it to the .gitignore list.