
How to check if PyTorch is using the GPU?

How do I check if PyTorch is using the GPU? It's possible to detect with nvidia-smi if there is any activity from the GPU during the process, but I want something written in a Python script.

Is there a way to get a list of all currently available GPUs? Something like devices = torch.get_all_devices() # [0, 1, 2] or whatever their name is.
See stackoverflow.com/questions/64776822/…: [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
I was told this works list(range(torch.cuda.device_count())). Thanks though!
@CharlieParker, you'd want (assuming you've done import torch): devices = list(range(torch.cuda.device_count())). And if you want the names: device_names = [torch.cuda.get_device_name(d) for d in devices]. You may, like me, like to map these as a dict for cross-machine management: device_to_name = dict(zip(devices, device_names))

Mateen Ulhaq

These functions should help:

>>> import torch

>>> torch.cuda.is_available()
True

>>> torch.cuda.device_count()
1

>>> torch.cuda.current_device()
0

>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>

>>> torch.cuda.get_device_name(0)
'GeForce GTX 950M'

This tells us:

CUDA is available and can be used by one device.

Device 0 refers to the GPU GeForce GTX 950M, and it is currently chosen by PyTorch.
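
If you also want the indices and names of all visible devices in one go (as asked in the comments above), a minimal sketch:

import torch

# Print the index and name of every CUDA device PyTorch can see.
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))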


I think this just shows that these devices are available on the machine, but I'm not sure whether you can get how much memory is being used by each GPU this way.
Running torch.cuda.current_device() was helpful for me. It showed that my GPU is unfortunately too old: "Found GPU0 GeForce GTX 760 which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old."
@kmario23 Thanks for pointing this out. Is there a function call that gives us that information (how much memory is being used by each GPU) ? :)
@frank Yep, simply this command: $ watch -n 2 nvidia-smi does the job. For more details, please see my answer below.
Christoph Rackwitz

As it hasn't been proposed here yet, I'm adding a method using torch.device, which is also handy when initializing tensors on the correct device.

import torch

# Setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3, 1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3, 1), 'GB')

Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.

Output:

Using device: cuda

Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached:    0.6 GB

As mentioned above, using device it is possible to:

Move tensors to the respective device: torch.rand(10).to(device)

Create a tensor directly on the device: torch.rand(10, device=device)

This makes switching between CPU and GPU comfortable without changing the actual code.

Edit:

As there have been some questions and confusion about the cached and allocated memory, I'm adding some additional information about it:

torch.cuda.max_memory_cached(device=None) Returns the maximum GPU memory managed by the caching allocator in bytes for a given device.

torch.cuda.memory_allocated(device=None) Returns the current GPU memory usage by tensors in bytes for a given device.


You can either directly hand over a device as specified further above in the post or you can leave it None and it will use the current_device().
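
A minimal sketch of querying these counters (assuming a recent PyTorch, where the memory_cached counters are exposed under the reserved names):

import torch

if torch.cuda.is_available():
    dev = torch.cuda.current_device()       # what device=None falls back to
    x = torch.rand(1000, 1000, device=dev)  # allocate something so the counters are non-zero
    print('Allocated:   ', torch.cuda.memory_allocated(dev), 'bytes')
    print('Max reserved:', torch.cuda.max_memory_reserved(dev), 'bytes')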

Additional note: Old graphics cards with CUDA compute capability 3.0 or lower may be visible but cannot be used by PyTorch!
Thanks to hekimgil for pointing this out! - "Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5."


I tried your code; it recognizes the graphics card, but the allocated and cached memory are both 0 GB. Is this normal or do I need to configure something?
@KubiK888 If you haven't done any computation before this is perfectly normal. It's also rather unlikely that you can detect the GPU model within PyTorch but not access it. Try doing some computations on GPU and you should see that the values change.
@KubiK888 You have to be consistent, you cannot perform operations across devices. Any operation like my_tensor_on_gpu * my_tensor_on_cpu will fail.
Your answer is great but for the first device assignment line, I would like to point out that just because there is a cuda device available, does not mean that we can use it. For example, I have this in my trusty old computer: Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5.
@CharlieParker I haven't tested this, but I believe you can use torch.cuda.device_count() where list(range(torch.cuda.device_count())) should give you a list over all device indices.
kmario23

After you start running the training loop, if you want to manually watch from the terminal whether your program is utilizing the GPU resources, and to what extent, you can simply use watch, as in:

$ watch -n 2 nvidia-smi

This will continuously update the usage stats every 2 seconds until you press Ctrl+C.

If you need more control over which GPU stats are shown, you can use a more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration of this:

$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv

which outputs stats something like:

https://i.stack.imgur.com/AxUa6.png (screenshot of the nvidia-smi query output)

Note: There should not be any space between the comma separated query names in --query-gpu=.... Else those values will be ignored and no stats are returned.

Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:

In [13]: import torch

In [14]: torch.cuda.is_available()
Out[14]: True

A True status means that PyTorch is configured correctly and can use the GPU, although you still have to move/place the tensors on it with the necessary statements in your code.
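
For example, a minimal sketch of placing a tensor on the GPU:

import torch

# is_available() alone doesn't put any work on the GPU;
# tensors must be moved there explicitly.
x = torch.rand(3, 3)
if torch.cuda.is_available():
    x = x.cuda()  # or x.to('cuda')
print(x.device)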

If you want to do this inside Python code, then look into this module:

https://github.com/jonsafari/nvidia-ml-py or in pypi here: https://pypi.python.org/pypi/nvidia-ml-py/
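
For instance, a hedged sketch using the pynvml module provided by nvidia-ml-py (assuming the package is installed and an NVIDIA driver is present):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)        # total/free/used, in bytes
util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # gpu/memory utilization, in percent
print('Memory used:', info.used, '/', info.total, 'bytes')
print('GPU utilization:', util.gpu, '%')
pynvml.nvmlShutdown()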


Just remember that PyTorch uses a caching GPU memory allocator. You might see low GPU-Util in nvidia-smi even if it's fully used.
@JakubBielan thanks! could you please provide a reference for more reading on this?
That watch command is useful.
Is this only for Linux?
nvidia-smi has a flag -l for loop seconds, so you don't have to use watch: nvidia-smi -l 2 Or in milliseconds: nvidia-smi -lms 2000
prosti

From a practical standpoint, just one minor digression:

import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

This dev now knows whether it is cuda or cpu.

And there is a difference in how you deal with models and with tensors when moving them to cuda. It is a bit strange at first.

import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1, 2)
t2 = torch.randn(1, 2).to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t2)  # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev)         # to() returns a copy; t1 itself is NOT moved
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t1.is_cuda)  # False
t1 = t1.to(dev)    # reassign to actually move the tensor
print(t1)  # tensor([[-0.2678,  1.9252]], device='cuda:0')
print(t1.is_cuda)  # True

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(1, 2)

    def forward(self, x):
        x = self.l1(x)
        return x

model = M()    # not on cuda
model.to(dev)  # nn.Module.to() moves the module in place (all parameters)
print(next(model.parameters()).is_cuda)  # True

All of this is a bit tricky, but understanding it once helps you work fast with less debugging.


Also, at the beginning you need import torch.nn as nn
iacob

From the official site's get started page, you can check if the GPU is available for PyTorch like so:

import torch
torch.cuda.is_available()

Reference: PyTorch | Get Started


iacob

Query                                   Command
Does PyTorch see any GPUs?              torch.cuda.is_available()
Are tensors stored on GPU by default?   torch.rand(10).device
Set default tensor type to CUDA:        torch.set_default_tensor_type(torch.cuda.FloatTensor)
Is this tensor a GPU tensor?            my_tensor.is_cuda
Is this model stored on the GPU?        all(p.is_cuda for p in my_model.parameters())
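
A quick sketch applying the last two checks (my_tensor and my_model here are illustrative names):

import torch
import torch.nn as nn

my_tensor = torch.rand(10)
my_model = nn.Linear(4, 2)
print(my_tensor.is_cuda)                              # False: created on CPU by default
print(all(p.is_cuda for p in my_model.parameters()))  # False
if torch.cuda.is_available():
    my_model.cuda()  # moves all parameters to the GPU in place
    print(all(p.is_cuda for p in my_model.parameters()))  # True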


Jadiel de Armas

To check if there is a GPU available:

torch.cuda.is_available()

If the above function returns False,

you either have no GPU, or the Nvidia drivers have not been installed (so the OS does not see the GPU), or the GPU is being hidden by the environment variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, all of your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']
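
For example, a minimal sketch of that check (os.environ.get avoids a KeyError when the variable is unset):

import os

# CUDA_VISIBLE_DEVICES is often not set at all, so use .get() instead of indexing
print(os.environ.get('CUDA_VISIBLE_DEVICES', '<not set>'))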

If the above function returns True, that does not necessarily mean that you are using the GPU. In PyTorch you can allocate tensors to devices when you create them. By default, tensors get allocated to the CPU. To check where your tensor is allocated, do:

# assuming that 'a' is a tensor created somewhere else
a.device  # returns the device where the tensor is allocated

Note that you cannot operate on tensors allocated on different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html


vinzee

Almost all answers here reference torch.cuda.is_available(). However, that's only one side of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it's actually being used. In a typical setup, you would set your device with something like this:

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such cases, whether or not the GPU is used is not only based on whether it is available. After the device has been set to a torch device, you can check its type property to verify whether it's CUDA or not.

if device.type == 'cuda':
    # do something
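
A hedged sketch of that pattern (the flag names --disable-cuda and --gpu-id are illustrative, not a standard API):

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--disable-cuda', action='store_true')
parser.add_argument('--gpu-id', type=int, default=0)
args = parser.parse_args()

# The device now depends on user input, not only on availability.
if not args.disable_cuda and torch.cuda.is_available():
    device = torch.device(f'cuda:{args.gpu_id}')
else:
    device = torch.device('cpu')

if device.type == 'cuda':
    print('Using GPU:', torch.cuda.get_device_name(device))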

vinzee

Simply run the following command from the command prompt or a Linux shell:

python -c 'import torch; print(torch.cuda.is_available())'

The above should print True

python -c 'import torch; print(torch.rand(2,3).cuda())'

This one should print the following:

tensor([[0.7997, 0.6170, 0.7042],
        [0.4174, 0.1494, 0.0516]], device='cuda:0')

vinzee

If you are here because your PyTorch always gives False for torch.cuda.is_available(), that's probably because you installed your PyTorch version without GPU support (e.g. you coded on a laptop, then tested on a server).

The solution is to uninstall PyTorch and install it again with the right command from the PyTorch downloads page. Also refer to this PyTorch issue.


Even though what you have written is related to the question, the question is "How to check if pytorch is using the GPU?", not "What can I do if PyTorch doesn't detect my GPU?". So I would say that this answer does not really belong to this question. But you may find another question about this specific issue where you can share your knowledge. If not, you could even write a question and answer it yourself to help others with the same issue!
David G.

It is possible for

torch.cuda.is_available()

to return True but to get the following error when running

>>> torch.rand(10).to(device)

as suggested by MBT:

RuntimeError: CUDA error: no kernel image is available for execution on the device

This link explains that

... torch.cuda.is_available only checks whether your driver is compatible with the version of cuda we used in the binary. So it means that CUDA 10.1 is compatible with your driver. But when you do computation with CUDA, it couldn't find the code for your arch.
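
To diagnose this mismatch from Python, one possible sketch (torch.cuda.get_device_capability is standard; torch.cuda.get_arch_list is available in recent PyTorch versions):

import torch

if torch.cuda.is_available():
    # Compute capability of the card, e.g. (3, 5)
    major, minor = torch.cuda.get_device_capability(0)
    print(f'Compute capability: {major}.{minor}')
    # Architectures this PyTorch binary was compiled for, e.g. ['sm_37', 'sm_50', ...]
    print('Binary supports:', torch.cuda.get_arch_list())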


Matteo Pennisi

If you are using Linux, I suggest installing nvtop: https://github.com/Syllo/nvtop

https://i.stack.imgur.com/7DjOh.png (screenshot of nvtop)


iacob

Create a tensor on the GPU as follows:

$ python
>>> import torch
>>> print(torch.rand(3,3).cuda()) 

Do not quit; open another terminal and check whether the Python process is using the GPU with:

$ nvidia-smi

I specifically asked for a solution that does not involve nvidia-smi from the command line
Well, technically you can always parse the output of any command-line tool, including nvidia-smi.
r_k_y

Using the code below

import torch
torch.cuda.is_available()

will only display whether the GPU is present and detected by PyTorch or not.

But in Task Manager -> Performance, the GPU utilization may show only a few percent.

This means you are actually running on the CPU.

To solve the above issue, check and change:

1. Graphics settings --> turn on hardware-accelerated GPU settings, restart.
2. Open the NVIDIA Control Panel --> Desktop --> Display GPU in the notification area.

[Note: If you have newly installed Windows, you also have to agree to the terms and conditions in the NVIDIA Control Panel.]

This should work!


The Task Manager is actually a very bad way of determining GPU usage; see here: stackoverflow.com/questions/69791848/…