CUDA unknown error with large data volume.

It can run under a small data volume; this problem occurs with a large data volume. Can someone provide an answer? Thank you!

Would you mind offering more error info? A CUDA unknown error can be caused by a lot of problems.

Thank you for your attention; the error message is as follows. The GPU memory viewed with `nvidia-smi -l 1` is sufficient.

        Weights = torch.FloatTensor(train_dataset.weights).cuda()
      File "Anaconda3/envs/pytorch17/lib/python3.7/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
    RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.

It seems that you want to put your model onto the GPU, but you didn't set your GPU visible to CUDA (or did so by mistake). I am not quite sure where this comes from, because it greatly depends on your code and your configuration. I have the following two suggestions and hope they can be helpful:

1. Maybe you should check your setup, especially the line `export CUDA_VISIBLE_DEVICES=0` in your ~/.bashrc, which specifies the GPU device id to be seen by CUDA in your python environment.
2. If you set ~/.bashrc correctly, another fault could be that you set your device_ids without the value 0 in your PyTorch code, e.g. `model = torch.nn.DataParallel(model, device_ids=(1,)).cuda()`. In PyTorch, you should always set the visible GPU device id=0 to be used, because the other visible GPU devices are only used for distributed computing.

This can easily mislead you into thinking that the value of device_ids should be the same as CUDA_VISIBLE_DEVICES. However, CUDA_VISIBLE_DEVICES specifies which GPU devices are visible in your python environment, while device_ids specifies the GPU usage in your PyTorch code. For example, I have 4 GPU devices, but I want only GPU device id=0 and id=2 (as numbered by nvidia-smi) to be seen by CUDA in this python environment, so I add this to my ~/.bashrc:

    export CUDA_VISIBLE_DEVICES=0,2

First of all, I only have one GPU, and it can run normally with a small amount of data. But when the amount of data is very large, an unknown error occurs when the parameters of the training data are placed on CUDA. I think my CUDA settings should be no problem.
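To illustrate the distinction drawn above between `CUDA_VISIBLE_DEVICES` and `device_ids`, here is a small stdlib-only sketch; the helper `torch_index_of` is hypothetical (not a PyTorch API) and simply mimics the renumbering CUDA applies to masked devices:

```python
import os

def torch_index_of(physical_id: int) -> int:
    """Map a physical GPU id (nvidia-smi numbering) to the index
    PyTorch sees after CUDA_VISIBLE_DEVICES masking is applied.
    Raises ValueError if the GPU is not visible."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    ids = [int(x) for x in visible.split(",") if x.strip()]
    return ids.index(physical_id)

# With the ~/.bashrc setting from the answer, physical GPUs 0 and 2 are visible:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
print(torch_index_of(0))  # -> 0
print(torch_index_of(2))  # -> 1, so device_ids=(1,) would mean physical GPU 2
```

This is why `device_ids` should not simply copy the values in `CUDA_VISIBLE_DEVICES`: after masking, PyTorch always numbers the visible devices from 0.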
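The traceback above comes from CUDA's lazy initialization: as the error message says, `CUDA_VISIBLE_DEVICES` must be in place before the first `.cuda()` call, never changed after program start. A minimal sketch of the reported call site, with a plain list standing in for `train_dataset.weights` and a CPU guard so it also runs on hosts without a GPU:

```python
import os
# Set (or inherit) the mask BEFORE torch initializes CUDA, never after.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch

weights_list = [0.7, 0.3]  # stand-in for train_dataset.weights
weights = torch.FloatTensor(weights_list)
if torch.cuda.is_available():
    # This first .cuda() call triggers torch.cuda lazy init,
    # which is where the OP's RuntimeError was raised.
    weights = weights.cuda()
print(weights.shape)
```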