nvidia - w3toppers.com

OpenGL without X.org in linux

Update (Sep. 17, 2017): NVIDIA recently published an article detailing how to use OpenGL on headless systems, which is a very similar use case as the question describes. In summary: Link to libOpenGL.so and libEGL.so instead of libGL.so. (Your linker options should therefore be -lOpenGL -lEGL Call eglGetDisplay, then eglInitialize to initialize EGL. Call eglChooseConfig … Read more

Forcing NVIDIA GPU programmatically in Optimus laptops

According to http://developer.download.nvidia.com/devzone/devcenter/gamegraphics/files/OptimusRenderingPolicies.pdf starting from 302 drivers it is enough to link statically with one of the following libraries: vcamp110.dll, vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll, cudart*.*, or to export a NvOptimusEnablement variable in your program: extern “C” { _declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001; }

Are cuda kernel calls synchronous or asynchronous

Kernel calls are asynchronous from the point of view of the CPU so if you call 2 kernels in succession the second one will be called without waiting for the first one to finish. It only means that the control returns to the CPU immediately. On the GPU side, if you haven’t specified different streams … Read more

How does CUDA assign device IDs to GPUs?

Set the environment variable CUDA_DEVICE_ORDER as: export CUDA_DEVICE_ORDER=PCI_BUS_ID Then the GPU IDs will be ordered by pci bus IDs.

CUDA determining threads per block, blocks per grid

In general you want to size your blocks/grid to match your data and simultaneously maximize occupancy, that is, how many threads are active at one time. The major factors influencing occupancy are shared memory usage, register usage, and thread block size. A CUDA enabled GPU has its processing capability split up into SMs (streaming multiprocessors), … Read more

why do we need cudaDeviceSynchronize(); in kernels with device-printf?

A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after starting up the GPU process, before the kernel has finished executing. So what is the next thing in the CPU thread here? Application exit. At application exit, it’s ability to send output to the standard output is terminated by … Read more

How can I make tensorflow run on a GPU with capability 2.x?

Recent GPU versions of tensorflow require compute capability 3.5 or higher (and use cuDNN to access the GPU. cuDNN also requires a GPU of cc3.0 or higher: cuDNN is supported on Windows, Linux and MacOS systems with Pascal, Kepler, Maxwell, Tegra K1 or Tegra X1 GPUs. Kepler = cc3.x Maxwell = cc5.x Pascal = cc6.x … Read more

How do I check if PyTorch is using the GPU?

These functions should help: >>> import torch >>> torch.cuda.is_available() True >>> torch.cuda.device_count() 1 >>> torch.cuda.current_device() 0 >>> torch.cuda.device(0) <torch.cuda.device at 0x7efce0b03be0> >>> torch.cuda.get_device_name(0) ‘GeForce GTX 950M’ This tells us: CUDA is available and can be used by one device. Device 0 refers to the GPU GeForce GTX 950M, and it is currently chosen by PyTorch.

How do I select which GPU to run a job on?

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly. To specify CUDA device 1 for example, you would set the CUDA_VISIBLE_DEVICES using export CUDA_VISIBLE_DEVICES=1 or CUDA_VISIBLE_DEVICES=1 ./cuda_executable The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation. … Read more

Why is NVIDIA Pascal GPUs slow on running CUDA Kernels when using cudaMallocManaged

Under CUDA 8 with Pascal GPUs, managed memory data migration under a unified memory (UM) regime will generally occur differently than on previous architectures, and you are experiencing the effects of this. (Also see note at the end about CUDA 9 updated behavior for windows.) With previous architectures (e.g. Maxwell), managed allocations used by a … Read more