
How can I inspect the file system of a failed `docker build`?

I'm trying to build a new Docker image for our development process, using cpanm to install a bunch of Perl modules as a base image for various projects.

While developing the Dockerfile, cpanm returns a failure code because some of the modules did not install cleanly.

I'm fairly sure I need to get apt to install some more things.

My question is, where can I find the /.cpanm/work directory quoted in the output, in order to inspect the logs? In the general case, how can I inspect the file system of a failed docker build command?

Morning edit: After biting the bullet and running a find, I discovered

/var/lib/docker/aufs/diff/3afa404e[...]/.cpanm

Is this reliable, or am I better off building a "bare" container and running stuff manually until I have all the things I need?

About /var/lib/docker/aufs/diff/3afa404e[...]/.cpanm: those are Docker internals and I would not mess with them

Alexis Wilke

Every time Docker successfully executes a RUN command from a Dockerfile, a new layer in the image filesystem is committed. Conveniently, you can use those layer IDs as images to start a new container.

Take the following Dockerfile:

FROM busybox
RUN echo 'foo' > /tmp/foo.txt
RUN echo 'bar' >> /tmp/foo.txt

and build it:

$ docker build -t so-26220957 .
Sending build context to Docker daemon 47.62 kB
Step 1/3 : FROM busybox
 ---> 00f017a8c2a6
Step 2/3 : RUN echo 'foo' > /tmp/foo.txt
 ---> Running in 4dbd01ebf27f
 ---> 044e1532c690
Removing intermediate container 4dbd01ebf27f
Step 3/3 : RUN echo 'bar' >> /tmp/foo.txt
 ---> Running in 74d81cb9d2b1
 ---> 5bd8172529c1
Removing intermediate container 74d81cb9d2b1
Successfully built 5bd8172529c1

You can now start a new container from 00f017a8c2a6, 044e1532c690 and 5bd8172529c1:

$ docker run --rm 00f017a8c2a6 cat /tmp/foo.txt
cat: /tmp/foo.txt: No such file or directory

$ docker run --rm 044e1532c690 cat /tmp/foo.txt
foo

$ docker run --rm 5bd8172529c1 cat /tmp/foo.txt
foo
bar

Of course, you might want to start a shell to explore the filesystem and try out commands:

$ docker run --rm -it 044e1532c690 sh      
/ # ls -l /tmp
total 4
-rw-r--r--    1 root     root             4 Mar  9 19:09 foo.txt
/ # cat /tmp/foo.txt 
foo

When one of the Dockerfile commands fails, what you need to do is look for the ID of the preceding layer and run a shell in a container created from that ID:

docker run --rm -it <id_last_working_layer> bash -il

Once in the container:

try the command that failed, and reproduce the issue

then fix the command and test it

finally update your Dockerfile with the fixed command
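
For instance, if the echo 'bar' step in the example above had failed, a debugging session might look roughly like this (044e1532c690 is the layer committed by the previous step; the session below is only illustrative):

$ docker run --rm -it 044e1532c690 sh
/ # echo 'bar' >> /tmp/foo.txt   # try the failing command by hand
/ # cat /tmp/foo.txt             # inspect the result
foo
bar
/ # exit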

If you really need to experiment in the actual layer that failed instead of working from the last working layer, see Drew's answer.


I thought this wasn't working because it said Unable to find image 'd5219f1ffda9:latest' locally. However, I was confused by the multiple kinds of IDs. It turns out that you have to use the IDs that are directly after the arrows, not the ones that say "Running in ...".
When I run docker build it doesn't give me a hash ID of each layer. I don't see any command options to enable this.
@ADJenks incredibly annoying isn't it! Found the answer here: stackoverflow.com/questions/65614378/… basically you need to change the buildkit to false in the super secret options settings. Perhaps they should put a "Beware of the Leopard" sign over it just to be sure.
Need to add DOCKER_BUILDKIT=0 docker build ...... if you can't see the hash IDs
Drew

The top answer works in the case that you want to examine the state immediately prior to the failed command.

However, the question asks how to examine the state of the failed container itself. In my situation, the failed command is a build that takes several hours, so rewinding prior to the failed command and running it again takes a long time and is not very helpful.

The solution here is to find the container that failed:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                          PORTS               NAMES
6934ada98de6        42e0228751b3        "/bin/sh -c './utils/"   24 minutes ago      Exited (1) About a minute ago                       sleepy_bell

Commit it to an image:

$ docker commit 6934ada98de6
sha256:7015687976a478e0e94b60fa496d319cdf4ec847bcd612aecf869a72336e6b83

And then run the image [if necessary, running bash]:

$ docker run -it 7015687976a4 [bash -il]

Now you are actually looking at the state of the build at the time that it failed, instead of at the time before running the command that caused the failure.
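
If you prefer a readable name over the raw SHA, docker commit also accepts an optional repository:tag argument; the name used below is purely illustrative:

$ docker commit 6934ada98de6 failed-build:debug
$ docker run -it failed-build:debug bash -il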


Out of interest, why would you need to create a new image from the container? Why not just start the container? If an image created from the failed container is able to run, then surely the stopped/failed container is also able to run? Or am I missing something?
@nmh Because it allows you to capture and inspect a container in the failed state without having to run the failing command again. Sometimes the failing command takes minutes or longer to execute so this is a convenient way to tag the failed state. For example, I am currently using this approach to inspect the logs of a failed C++ library build that takes several minutes. Edit - Just noticed that Drew said that in [his] situation, the failed command is a build that takes several hours, so rewinding prior to the failed command and running it again takes a long time and is not very helpful.
@nmh I think the issue with trying to start the failed container is that the container's start command normally needs to be changed to be useful. If you tried to start the failed container again it would run the command that failed again, and you'd be back where you started. By creating an image you can start a container with a different start command.
This doesn't work if you're using DOCKER_BUILDKIT=1 to build your Dockerfile
To @nmh's point - you don't need to commit the image if you're just after the build output. You can use docker container cp to extract the file results from the failed build container.
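
For instance, a hypothetical docker container cp invocation that copies a log directory out of the failed build container from above (the /logs path is only an assumption for illustration):

$ docker container cp 6934ada98de6:/logs ./failed-build-logs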
Jannis Schönleber

Update for newer Docker versions (20.10 onwards)

Linux or macOS

DOCKER_BUILDKIT=0 docker build ...

Windows

# Command line (cmd.exe)
set DOCKER_BUILDKIT=0
docker build ...

# PowerShell
$env:DOCKER_BUILDKIT=0
docker build ...

Use DOCKER_BUILDKIT=0 docker build ... to get the intermediate container hashes as known from older versions.

On newer versions, BuildKit is enabled by default. Disabling it is recommended only for debugging purposes, since BuildKit can make your builds faster.

For reference: Buildkit doesn't support intermediate container hashes: https://github.com/moby/buildkit/issues/1053

Thanks to @David Callanan and @MegaCookie for their inputs.


I was suspecting this for a LONG time, your answer nailed it! It also removes intermediate containers during multi-stage.
Or on Windows, run the command set DOCKER_BUILDKIT=0 followed by the docker build ... command.
Or when using PowerShell on Windows: $env:DOCKER_BUILDKIT=0
It now needs to be DOCKER_BUILDKIT=plain, I think.
DomQ

Docker caches the entire filesystem state after each successful RUN line.

Knowing that:

to examine the latest state before your failing RUN command, comment it out in the Dockerfile (as well as any and all subsequent RUN commands), then run docker build and docker run again.

to examine the state after the failing RUN command, simply add || true to it to force it to succeed; then proceed like above (keep any and all subsequent RUN commands commented out, run docker build and docker run)

Tada, no need to mess with Docker internals or layer IDs, and as a bonus Docker automatically minimizes the amount of work that needs to be re-done.
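
A minimal sketch of the second variant, with hypothetical command names standing in for the real ones:

FROM busybox
RUN step-that-works
# force the failing step to "succeed" so that its layer still gets committed
RUN step-that-fails || true
# RUN later-step   <- keep subsequent RUN commands commented out while debugging

After docker build finishes, docker run -it <image-id> sh drops you into the filesystem state left behind by the failing command.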


This is an especially helpful answer when using DOCKER_BUILDKIT, as buildkit doesn't seem to support the same solutions as those listed above.
mikaraento

Debugging build step failures is indeed very annoying.

The best solution I have found is to make sure that each step that does real work succeeds (even if only nominally), and to add a check afterwards that fails if the work didn't actually complete. That way you get a committed layer containing the outputs of the failed step, which you can inspect.

A Dockerfile, with an example after the # Run DB2 silent installer line:

#
# DB2 10.5 Client Dockerfile (Part 1)
#
# Requires
#   - DB2 10.5 Client for 64bit Linux ibm_data_server_runtime_client_linuxx64_v10.5.tar.gz
#   - Response file for DB2 10.5 Client for 64bit Linux db2rtcl_nr.rsp 
#
#
# Using Ubuntu 14.04 base image as the starting point.
FROM ubuntu:14.04

MAINTAINER David Carew <carew@us.ibm.com>

# DB2 prereqs (also installing sharutils package as we use the utility uuencode to generate password - all others are required for the DB2 Client) 
RUN dpkg --add-architecture i386 && apt-get update && apt-get install -y sharutils binutils libstdc++6:i386 libpam0g:i386 && ln -s /lib/i386-linux-gnu/libpam.so.0 /lib/libpam.so.0
RUN apt-get install -y libxml2


# Create user db2clnt
# Generate strong random password and allow sudo to root w/o password
#
RUN  \
   adduser --quiet --disabled-password -shell /bin/bash -home /home/db2clnt --gecos "DB2 Client" db2clnt && \
   echo db2clnt:`dd if=/dev/urandom bs=16 count=1 2>/dev/null | uuencode -| head -n 2 | grep -v begin | cut -b 2-10` | chgpasswd && \
   adduser db2clnt sudo && \
   echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# Install DB2
RUN mkdir /install
# Copy DB2 tarball - ADD command will expand it automatically
ADD v10.5fp9_linuxx64_rtcl.tar.gz /install/
# Copy response file
COPY  db2rtcl_nr.rsp /install/
# Run  DB2 silent installer
RUN mkdir /logs
RUN (/install/rtcl/db2setup -t /logs/trace -l /logs/log -u /install/db2rtcl_nr.rsp && touch /install/done) || /bin/true
RUN test -f /install/done || (echo ERROR-------; echo install failed, see files in container /logs directory of the last container layer; echo run docker run '<last image id>' /bin/cat /logs/trace; echo ----------)
RUN test -f /install/done

# Clean up unwanted files
RUN rm -fr /install/rtcl

# Login as db2clnt user
CMD su - db2clnt

MegaCookie

Currently, with the latest Docker Desktop, there isn't a way to opt out of the new BuildKit, which doesn't support debugging yet (follow the latest updates in this GitHub thread: https://github.com/moby/buildkit/issues/1472).

First let docker try to build, and find out at which line in your Dockerfile it is failing.

Next, in your Dockerfile you can add a build target at the top: FROM xxx as debug

Then, in your Dockerfile add an additional target FROM xxx as next just one line before the failing command (as you don't want to build that part). Example:

FROM xxx as debug
# Working command
RUN echo "working command"

FROM xxx as next
# Example of failing command
RUN echoo "failing command"

Then run docker build -f Dockerfile --target debug --tag debug .

Next you can run docker run -it debug /bin/sh

You can quit the shell by pressing CTRL P + CTRL Q

If you want to use docker compose build instead of docker build, you can do so by adding target: debug under build in your docker-compose.yml (see the sketch after this list).
Then start the container with docker compose run xxxYourServiceNamexxx and use either:

The second top answer to find out how to run a shell inside the container.

Or add ENTRYPOINT /bin/sh before the FROM xxx as next line in your Dockerfile.
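
A minimal docker-compose.yml sketch for this setup, as referenced above; the service name myservice is hypothetical:

services:
  myservice:
    build:
      context: .
      target: debug   # build only up to the "debug" stage

You can then run docker compose build myservice followed by docker compose run myservice /bin/sh.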


This was the only answer here that helped me reproduce my error within my Docker image. Thanks!
Alexis Wilke

In my case, I have to have:

DOCKER_BUILDKIT=1 docker build ...

and as mentioned by Jannis Schönleber in his answer, there is currently no debug available in this case (i.e. no intermediate images/containers get created).

What I've found I could do is use the following option:

... --progress=plain ...

and then add various RUN ... lines, or extend existing RUN ... lines, to debug specific commands. This gives you what feels to me like full access (at least if your build is relatively fast).
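
Put together, the full command might look like this (the image name is illustrative):

DOCKER_BUILDKIT=1 docker build --progress=plain -t myimage .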

For example, you could check a variable like so:

RUN echo "Variable NAME = [$NAME]"

If you're wondering whether a file is installed properly, you do:

RUN find /

etc.

In my situation, I had to debug a docker build of a Go application with a private repository and it was quite difficult to do that debugging. I've other details on that here.


This is the proper solution. It even works with docker-compose build --progress=plain servicename for me!
Man, this is super useful
seanmcl

What I would do is comment out the Dockerfile from the offending line onwards, including that line. Then you can run the container, run the commands by hand, and look at the logs in the usual way. E.g. if the Dockerfile is

RUN foo
RUN bar
RUN baz

and it's dying at bar I would do

RUN foo
# RUN bar
# RUN baz

Then

$ docker build -t foo .
$ docker run -it foo bash
container# bar
...grep logs...

That's what I'd have done too before finding this thread. There are better ways though that don't require re-running the build.
@Aaron. Thanks for reminding me of this answer. I haven't looked at it for a long time. Could you please explain why the accepted answer is better than this one from a practical point of view. I definitely get why Drew's answer is better. It seems the accepted answer still requires re-running.
I actually voted for Drew's answer and not the accepted. They both work without re-running the build. In the accepted answer you can jump into a shell just before the failed command (You might run it again to see the error if it's quick). Or with Drew's answer you can get a shell after the failed command has run (In his case the failed command was long running and left state behind that could be inspected).
Don Giulio

My solution would be to see which step failed in the Dockerfile (RUN bundle install in my case),

and change it to

RUN bundle install || cat <path to the file containing the error>

This has the double effect of printing out the reason for the failure, AND ensuring that this intermediate step is not marked as failed by docker build. Its layer is therefore not deleted, and can be inspected via:

docker run --rm -it <id_last_working_layer> bash -il

In there you can even re-run your failed command and test it live.


VonC

Still using BuildKit, as in Alexis Wilke's answer, you can use ktock/buildg.

See "Interactive debugger for Dockerfile" from Kohei Tokunaga

buildg is a tool to interactively debug Dockerfile based on BuildKit.

- Source-level inspection
- Breakpoints and step execution
- Interactive shell on a step with your own debugging tools
- Based on BuildKit (needs unmerged patches)
- Supports rootless

Example:

$ buildg.sh debug --image=ubuntu:22.04 /tmp/ctx
WARN[2022-05-09T01:40:21Z] using host network as the default            
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.1s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 195B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.io/library/busybox:latest
#3 DONE 3.0s

#4 [build1 1/2] FROM docker.io/library/busybox@sha256:d2b53584f580310186df7a2055ce3ff83cc0df6caacf1e3489bff8cf5d0af5d8
#4 resolve docker.io/library/busybox@sha256:d2b53584f580310186df7a2055ce3ff83cc0df6caacf1e3489bff8cf5d0af5d8 0.0s done
#4 sha256:50e8d59317eb665383b2ef4d9434aeaa394dcd6f54b96bb7810fdde583e9c2d1 772.81kB / 772.81kB 0.2s done
Filename: "Dockerfile"
      2| RUN echo hello > /hello
      3| 
      4| FROM busybox AS build2
 =>   5| RUN echo hi > /hi
      6| 
      7| FROM scratch
      8| COPY --from=build1 /hello /
>>> break 2
>>> breakpoints
[0]: line 2
>>> continue
#4 extracting sha256:50e8d59317eb665383b2ef4d9434aeaa394dcd6f54b96bb7810fdde583e9c2d1 0.0s done
#4 DONE 0.3s
...