RuntimeError: CUDA error: out of memory, even though idle GPUs are available

Posted: 2021-09-21 09:59:06

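Background:

While running my code recently I hit a CUDA out of memory error. I opened a Linux terminal and checked GPU usage with nvidia-smi, which showed the following:

[Screenshot: nvidia-smi output]

I had been using GPU 0, but someone else was already occupying it, so I could use the remaining GPUs, No. 3 and No. 4.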
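The same check can be scripted instead of read off the screen. The following is only a minimal sketch, assuming `nvidia-smi` is on the PATH and accepts the `--query-gpu` CSV options; the helper names `parse_free_memory` and `query_free_memory` and the sample numbers are made up for illustration.

```python
import subprocess

def parse_free_memory(csv_text):
    """Parse 'index, free_MiB' lines into a {gpu_index: free_MiB} dict."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mib = (field.strip() for field in line.split(","))
        free[int(index)] = int(mib)
    return free

def query_free_memory():
    """Ask nvidia-smi for the free memory of every GPU (needs an NVIDIA driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_memory(out)

# Hypothetical sample output: GPU 0 is nearly full, the others are mostly free.
sample = "0, 120\n1, 8000\n2, 11000\n3, 10500\n"
free = parse_free_memory(sample)
best_gpu = max(free, key=free.get)  # index of the GPU with the most free memory
print(best_gpu)
```

On a real machine, `query_free_memory()` would replace the hard-coded `sample` string.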

Solution:

Change the GPU index used in the code (modify or add the following lines):

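import os
import torch

# Set this before the first CUDA call; it lists the GPUs visible to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
# args is the script's existing argument namespace; cuda:2 is the third visible GPU.
args.device = torch.device('cuda:{}'.format(2) if torch.cuda.is_available() else 'cpu')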

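The os.environ line lists the GPU indices that are visible to the process. args.device then selects the entry at index 2 of that list; because the list is "0,1,2,3", that entry is also GPU 2.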
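The index in `cuda:N` is a position in the CUDA_VISIBLE_DEVICES list, not necessarily the physical GPU number. The sketch below illustrates that mapping with a hypothetical helper, `physical_gpu_id`. It assumes the variable holds plain integer ids (not UUIDs) and that CUDA enumerates devices in the same order as nvidia-smi, which may require setting `CUDA_DEVICE_ORDER=PCI_BUS_ID`.

```python
def physical_gpu_id(logical_index, visible_devices):
    """Map a logical cuda:N index to the physical id listed in CUDA_VISIBLE_DEVICES."""
    ids = [part.strip() for part in visible_devices.split(",") if part.strip()]
    return int(ids[logical_index])

# With all four GPUs visible, logical and physical indices coincide.
same = physical_gpu_id(2, "0,1,2,3")
# With only GPUs 2 and 3 visible, cuda:0 refers to physical GPU 2.
remapped = physical_gpu_id(0, "2,3")
print(same, remapped)
```

With a setting such as "2,3", the code would therefore address the free GPUs as `cuda:0` and `cuda:1`.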

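A possible pitfall: some places in the code may not use args.device and instead call model.cuda() directly. model.cuda() targets the default GPU, index 0, which is occupied, so the same error appears: CUDA out of memory.

Fix: change those calls to model.to(args.device).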
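In a larger code base, the remaining hard-coded calls can be located with a quick text search. This is a rough sketch using only the standard library; `find_hardcoded_cuda` is a hypothetical helper, the snippet it scans is invented, and the regex only catches `.cuda()` calls written without arguments.

```python
import re

# Matches ".cuda()" with an empty argument list, e.g. model.cuda()
CUDA_CALL = re.compile(r"\.cuda\(\s*\)")

def find_hardcoded_cuda(source):
    """Return (line_number, line) pairs that call .cuda() with no arguments."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if CUDA_CALL.search(line)
    ]

# Invented example source; lines 2 and 4 still bypass args.device.
snippet = """model = Net()
model.cuda()
batch = batch.to(args.device)
target = target.cuda()
"""
hits = find_hardcoded_cuda(snippet)
for number, line in hits:
    print(number, line)
```

Each reported line is a candidate for replacement with `.to(args.device)`.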


Original: https://www.cnblogs.com/mlblog27/p/15303292.html
