OpenGL without X.org in Linux

Update (Sep. 17, 2017): NVIDIA recently published an article detailing how to use OpenGL on headless systems, which is a very similar use case to the one the question describes. In summary: Link to libOpenGL.so and libEGL.so instead of libGL.so. (Your linker options should therefore be -lOpenGL -lEGL.) Call eglGetDisplay, then eglInitialize to initialize EGL. Call eglChooseConfig …

Forcing NVIDIA GPU programmatically in Optimus laptops

According to http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf, starting from the 302 drivers it is enough to link statically with one of the following libraries: vcamp110.dll, vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll, cudart*.*, or to export an NvOptimusEnablement variable from your program: extern "C" { __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001; }

CUDA determining threads per block, blocks per grid

In general you want to size your blocks/grid to match your data and simultaneously maximize occupancy, that is, how many threads are active at one time. The major factors influencing occupancy are shared memory usage, register usage, and thread block size. A CUDA-enabled GPU has its processing capability split up into SMs (streaming multiprocessors), …
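The "match your data" part of the sizing advice above usually comes down to a ceiling division. The following is a minimal Python sketch of that arithmetic only; the function name launch_config and the default of 256 threads per block are illustrative assumptions, not values taken from the answer, and the sketch does not model occupancy (shared memory or register usage).

```python
from typing import Tuple

# Hardware facts for current CUDA GPUs: warps are 32 threads wide and a
# block may hold at most 1024 threads (compute capability 2.0 and later).
WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024


def launch_config(n_elements: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """Return (blocks_per_grid, threads_per_block) covering n_elements.

    The default of 256 threads per block is an assumed starting point,
    not a tuned value.
    """
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in 1..1024")
    if threads_per_block % WARP_SIZE != 0:
        raise ValueError("threads_per_block should be a multiple of the warp size")
    # Ceiling division: enough blocks so that every element gets a thread.
    blocks_per_grid = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks_per_grid, threads_per_block


if __name__ == "__main__":
    blocks, threads = launch_config(1000)
    idle = blocks * threads - 1000  # threads in the last block with no element
    print(blocks, threads, idle)
```

Because the grid is rounded up, the last block can contain threads with no element to process, so a kernel launched with this configuration would still need a bounds check on its global index.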

How do I check if PyTorch is using the GPU?

These functions should help:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>
>>> torch.cuda.get_device_name(0)
'GeForce GTX 950M'

This tells us: CUDA is available and can be used by one device. Device 0 refers to the GPU GeForce GTX 950M, and it is currently chosen by PyTorch.

How do I select which GPU to run a job on?

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly. To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using

export CUDA_VISIBLE_DEVICES=1

or

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell; the latter sets it only for the lifespan of that particular executable invocation. …
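The per-invocation form can also be reproduced from a launcher script by passing a modified environment to the child process. This is a minimal Python sketch under that assumption; the helper names env_with_visible_devices and run_on_devices are hypothetical, and the child command here only echoes the variable rather than running a real CUDA program.

```python
import os
import subprocess
import sys


def env_with_visible_devices(devices, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    devices is an iterable of device indices, e.g. [1] or [0, 2].
    The caller's own environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env


def run_on_devices(cmd, devices):
    """Run cmd with CUDA_VISIBLE_DEVICES set for that child process only."""
    return subprocess.run(
        cmd,
        env=env_with_visible_devices(devices),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for ./cuda_executable: a child that prints what it sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_on_devices(child, [1]).stdout.strip())
```

As with the shell one-liner, the variable applies only to the launched process, so the parent script and any later commands keep their original device visibility.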

Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?

Under CUDA 8 with Pascal GPUs, managed memory data migration under a unified memory (UM) regime will generally occur differently than on previous architectures, and you are experiencing the effects of this. (Also see the note at the end about CUDA 9 updated behavior for Windows.) With previous architectures (e.g. Maxwell), managed allocations used by a …