cuda - w3toppers.com

“invalid configuration argument” error when calling a CUDA kernel?

This type of error message frequently refers to the launch configuration parameters (grid/threadblock dimensions in this case, could also be shared memory, etc. in other cases). When you see a message like this it’s a good idea just to print out your actual config parameters before launching the kernel, to see if you’ve made any … Read more
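The advice to inspect the configuration before launching can be sketched as a small host-side check. This is a minimal sketch in plain C++, not part of the original answer: `LaunchLimits`, `Dim3` and `check_launch` are hypothetical names, and the limits are assumed to be copied by the caller from the `cudaDeviceProp` fields named in the comments (as returned by `cudaGetDeviceProperties`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the cudaDeviceProp fields that bound a kernel launch.
struct LaunchLimits {
    int max_threads_per_block;         // cudaDeviceProp::maxThreadsPerBlock
    int max_block_dim[3];              // cudaDeviceProp::maxThreadsDim
    int max_grid_dim[3];               // cudaDeviceProp::maxGridSize
    std::size_t max_shared_per_block;  // cudaDeviceProp::sharedMemPerBlock
};

// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA headers.
struct Dim3 { unsigned x, y, z; };

// Returns an empty string if the configuration is within the given limits,
// otherwise a short description of the first problem found.
std::string check_launch(const Dim3& grid, const Dim3& block,
                         std::size_t shared_bytes, const LaunchLimits& lim) {
    const unsigned g[3] = {grid.x, grid.y, grid.z};
    const unsigned b[3] = {block.x, block.y, block.z};
    const char axis[3] = {'x', 'y', 'z'};
    for (int i = 0; i < 3; ++i) {
        assert(lim.max_grid_dim[i] > 0 && lim.max_block_dim[i] > 0);
        if (g[i] == 0) return std::string("grid.") + axis[i] + " is zero";
        if (b[i] == 0) return std::string("block.") + axis[i] + " is zero";
        if (g[i] > static_cast<unsigned>(lim.max_grid_dim[i]))
            return std::string("grid.") + axis[i] + " exceeds the device limit";
        if (b[i] > static_cast<unsigned>(lim.max_block_dim[i]))
            return std::string("block.") + axis[i] + " exceeds the device limit";
    }
    // The product of the block dimensions has its own, separate limit.
    const unsigned long long threads = 1ULL * b[0] * b[1] * b[2];
    if (threads > static_cast<unsigned long long>(lim.max_threads_per_block))
        return "too many threads per block";
    if (shared_bytes > lim.max_shared_per_block)
        return "too much dynamic shared memory requested";
    return "";
}
```

Calling `check_launch` with the same grid, block and shared-memory values that are about to be passed inside `<<<...>>>`, and printing any non-empty result, is one way to spot a zero or oversized dimension before the launch fails.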

Using an array of device function pointers

Function pointers are allowed on Fermi. This is how you could do it: typedef double (*func)(double x); __device__ double func1(double x) { return x+1.0f; } __device__ double func2(double x) { return x+2.0f; } __device__ double func3(double x) { return x+3.0f; } __device__ func pfunc1 = func1; __device__ func pfunc2 = func2; __device__ func pfunc3 = … Read more

Which Compute Capability is supported by which CUDA versions?

CUDA Version      Min CC   Deprecated CC   Default CC   Max CC
5.5 (and prior)   1.0      N/A             1.0          ?
6.0               1.0      1.0             1.0          ?
6.5               1.1      1.x             2.0          ?
7.x               2.0      N/A             2.0          ?
8.0               2.0      2.x             2.0          6.2
9.x               3.0      N/A             3.0          7.0
10.x              3.0 *    N/A             3.0          7.5
11.x              3.5 †    3.x             5.2          11.0:8.0, 11.1:8.6, … Read more

Multiply Rectangular Matrices in CUDA

After the help of Ira, Ahmad, ram, and Oli Fly, I got the correct answer as follows: #include <wb.h> #define wbCheck(stmt) do { \ cudaError_t err = stmt; \ if (err != cudaSuccess) { \ wbLog(ERROR, "Failed to run stmt ", #stmt); \ return -1; \ } \ } while(0) // Compute C = A … Read more

CUDA function pointers

To get rid of your compile error, you’ll have to use -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But then you’ll likely have some runtime problems: Taken from the CUDA Programming Guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ … Read more

printf() in my CUDA kernel doesn’t produce any output

printf() output is only displayed if the kernel finishes successfully, so check the return codes of all CUDA function calls and make sure no errors are reported. Furthermore, printf() output is only displayed at certain points in the program. Appendix B.32.2 of the Programming Guide lists these as Kernel launch via <<<>>> or cuLaunchKernel() (at … Read more

CUDA allocation alignment is 256 bytes – seriously?

Pointers allocated by any of the CUDA Runtime’s device memory allocation functions, e.g. cudaMalloc or cudaMallocPitch, are guaranteed to be 256-byte aligned, i.e. the address is a multiple of 256. Consider the following example: char *ptr1, *ptr2; int bytes = 1; cudaMalloc((void**)&ptr1,bytes); cudaMalloc((void**)&ptr2,bytes); Suppose the address returned in ptr1 is … Read more
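The statement that an aligned address is a multiple of 256 can be checked with a few lines of host code. This is a minimal sketch in plain C++; `is_aligned` is a hypothetical helper name, not a CUDA Runtime function, and it only inspects the numeric value of the pointer without dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the address held in `ptr` is a multiple of `alignment` bytes.
// The pointer is never dereferenced, so a device pointer can be passed from host code.
bool is_aligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0);  // a zero alignment is meaningless
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Given the guarantee described above, `is_aligned(ptr1, 256)` and `is_aligned(ptr2, 256)` would both be expected to return true for pointers obtained from cudaMalloc.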

Inconsistency of IDs between ‘nvidia-smi -L’ and cuDeviceGetName()

You can set the CUDA device order in your shell so that it follows the PCI bus ID instead of the default fastest-first order. This requires CUDA 7 or later. export CUDA_DEVICE_ORDER=PCI_BUS_ID

How to disable specific nvcc compiler warnings

It is actually possible to disable specific warnings on the device with NVCC. It took me ages to figure out how to do it. You need to use the -Xcudafe flag combined with a token listed on this page. For example, to disable the “controlling expression is constant” warning, pass the following to NVCC: -Xcudafe … Read more

How does CUDA assign device IDs to GPUs?

Set the environment variable CUDA_DEVICE_ORDER as follows: export CUDA_DEVICE_ORDER=PCI_BUS_ID Then the GPU IDs will be ordered by PCI bus ID.