Using an array of device function pointers

Function pointers are allowed on Fermi. This is how you could do it:
    typedef double (*func)(double x);
    __device__ double func1(double x) { return x+1.0f; }
    __device__ double func2(double x) { return x+2.0f; }
    __device__ double func3(double x) { return x+3.0f; }
    __device__ func pfunc1 = func1;
    __device__ func pfunc2 = func2;
    __device__ func pfunc3 = …
Read more
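The excerpt stops before the pointers are used, so what follows is only a minimal sketch of one way such a table could be built, not necessarily the original answer's approach. It assumes the device addresses are read back from the `pfunc` variables with `cudaMemcpyFromSymbol` and then copied into a device array. The names `apply_table` and `run_table_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>

typedef double (*func)(double x);

__device__ double func1(double x) { return x + 1.0; }
__device__ double func2(double x) { return x + 2.0; }
__device__ double func3(double x) { return x + 3.0; }

// Device-side variables that hold the device addresses of the functions.
__device__ func pfunc1 = func1;
__device__ func pfunc2 = func2;
__device__ func pfunc3 = func3;

// Thread i calls the i-th entry of the table.
__global__ void apply_table(const func* table, double x, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = table[i](x);
}

// Hypothetical host helper: fills out[0..2]; returns 0 on success, -1 on a CUDA error.
int run_table_demo(double x, double out[3])
{
    func h_table[3];
    func* d_table = nullptr;
    double* d_out = nullptr;

    // Host code cannot take the address of a __device__ function directly,
    // so the addresses are read back from the device-side variables.
    bool ok = cudaMemcpyFromSymbol(&h_table[0], pfunc1, sizeof(func)) == cudaSuccess
           && cudaMemcpyFromSymbol(&h_table[1], pfunc2, sizeof(func)) == cudaSuccess
           && cudaMemcpyFromSymbol(&h_table[2], pfunc3, sizeof(func)) == cudaSuccess
           && cudaMalloc((void**)&d_table, sizeof(h_table)) == cudaSuccess
           && cudaMalloc((void**)&d_out, 3 * sizeof(double)) == cudaSuccess
           && cudaMemcpy(d_table, h_table, sizeof(h_table),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        apply_table<<<1, 3>>>(d_table, x, d_out, 3);
        // The blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(out, d_out, 3 * sizeof(double),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_table);
    cudaFree(d_out);
    return ok ? 0 : -1;
}
```

Calling `run_table_demo(1.0, out)` from `main` should leave the three results of `func1`, `func2` and `func3` in `out`.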

Multiply Rectangular Matrices in CUDA

With help from Ira, Ahmad, ram, and Oli Fly, I got the correct answer as follows:
    #include <wb.h>
    #define wbCheck(stmt) do { \
        cudaError_t err = stmt; \
        if (err != cudaSuccess) { \
            wbLog(ERROR, "Failed to run stmt ", #stmt); \
            return -1; \
        } \
    } while(0)
    // Compute C = A …
Read more
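The kernel itself is cut off in the excerpt. As an illustration only, a naive rectangular multiply could look like the sketch below. It assumes row-major storage and one thread per element of C, and it does not use the `wb.h` helpers. The names `matmul_rect` and `matmul_rect_host` and the 16x16 block size are assumptions, not taken from the original answer.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// C (M x N) = A (M x K) * B (K x N), all row-major. One thread per element of C.
__global__ void matmul_rect(const float* A, const float* B, float* C,
                            int M, int K, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    // The bounds check matters because the grid is rounded up in both dimensions.
    if (row < M && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < K; ++k)
            sum += A[row * K + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// Hypothetical host wrapper: returns 0 on success, -1 on a CUDA error.
int matmul_rect_host(const float* A, const float* B, float* C,
                     int M, int K, int N)
{
    std::size_t bytesA = static_cast<std::size_t>(M) * K * sizeof(float);
    std::size_t bytesB = static_cast<std::size_t>(K) * N * sizeof(float);
    std::size_t bytesC = static_cast<std::size_t>(M) * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    bool ok = cudaMalloc((void**)&dA, bytesA) == cudaSuccess
           && cudaMalloc((void**)&dB, bytesB) == cudaSuccess
           && cudaMalloc((void**)&dC, bytesC) == cudaSuccess
           && cudaMemcpy(dA, A, bytesA, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, B, bytesB, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        // x covers the columns of C, y covers the rows of C.
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        matmul_rect<<<grid, block>>>(dA, dB, dC, M, K, N);
        ok = cudaMemcpy(C, dC, bytesC, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok ? 0 : -1;
}
```

A common mistake with rectangular shapes is sizing the grid from the dimensions of A or B instead of C; here the grid is derived from N (columns of C) and M (rows of C).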

Compiling code containing dynamic parallelism fails

You can do something like this:
    nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
Or, if you have two files, simple1.cu and test.c, you can do the following. This is called separate compilation.
    nvcc -arch=sm_35 -dc simple1.cu
    nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
    g++ -c test.c
    g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ …
Read more
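The contents of simple1.cu are not shown in the excerpt. For context, a file that needs these flags could be as small as the sketch below, in which a parent kernel launches a child kernel from device code. The kernel names and the `run_dynamic_demo` helper are hypothetical. It has to be built with relocatable device code and linked against cudadevrt, as in the commands above, for a device of compute capability 3.5 or higher.

```cuda
#include <cuda_runtime.h>

// Child grid: each thread writes twice its own index.
__global__ void child(int* data, int n)
{
    int i = threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

// Parent grid: a single thread launches the child grid from device code.
__global__ void parent(int* data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        child<<<1, n>>>(data, n);
}

// Hypothetical host helper: returns 0 on success, -1 on a CUDA error.
// n must not exceed the maximum number of threads per block.
int run_dynamic_demo(int* out, int n)
{
    int* d_data = nullptr;
    bool ok = cudaMalloc((void**)&d_data, n * sizeof(int)) == cudaSuccess;
    if (ok) {
        parent<<<1, 1>>>(d_data, n);
        // The parent grid is not complete until its child grids are,
        // so a host-side synchronize is enough to see the child's writes.
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(out, d_data, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok ? 0 : -1;
}
```

Without `-rdc=true` (or `-dc` in the separate compilation flow), nvcc rejects the device-side kernel launch in `parent`.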

CUDA function pointers

To get rid of your compile error, you’ll have to pass -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But then you’ll likely have some runtime problems. From the CUDA Programming Guide (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions): Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ … Read more
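The first quoted rule, that host code may hold pointers to `__global__` functions, can be illustrated with a short sketch. The kernels `add_one` and `add_ten` and the helper `launch_via_pointer` are made up for this example and are not part of the original question.

```cuda
#include <cuda_runtime.h>

__global__ void add_one(int* v) { *v += 1; }
__global__ void add_ten(int* v) { *v += 10; }

// Pointer-to-kernel type, usable in host code only.
typedef void (*kernel_t)(int*);

// Hypothetical host helper: runs kernel k on a single int that starts at `start`.
// Returns 0 on success, -1 on a CUDA error.
int launch_via_pointer(kernel_t k, int start, int* result)
{
    int* d_v = nullptr;
    bool ok = cudaMalloc((void**)&d_v, sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_v, &start, sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // The kernel is chosen at run time through the host-side pointer.
        k<<<1, 1>>>(d_v);
        ok = cudaMemcpy(result, d_v, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_v);
    return ok ? 0 : -1;
}
```

For example, `launch_via_pointer(add_ten, 5, &r)` selects the kernel on the host and launches it through the pointer; the same pointer could not be dereferenced or launched from device code.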

CUDA allocation alignment is 256 bytes – seriously?

Pointers allocated by any of the CUDA Runtime’s device memory allocation functions, e.g. cudaMalloc or cudaMallocPitch, are guaranteed to be 256-byte aligned, i.e. the address is a multiple of 256. Consider the following example:
    char *ptr1, *ptr2;
    int bytes = 1;
    cudaMalloc((void**)&ptr1,bytes);
    cudaMalloc((void**)&ptr2,bytes);
Suppose the address returned in ptr1 is … Read more
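The alignment claim can be checked directly. The sketch below repeats the two 1-byte allocations from the example and tests whether each returned address is a multiple of 256. The helper names `is_aligned` and `count_misaligned_allocations` are hypothetical.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// True if the address held in p is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates two 1-byte device buffers and returns how many of the
// returned addresses are NOT 256-byte aligned, or -1 on a CUDA error.
int count_misaligned_allocations()
{
    char *ptr1 = nullptr, *ptr2 = nullptr;
    int bytes = 1;
    if (cudaMalloc((void**)&ptr1, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&ptr2, bytes) != cudaSuccess) {
        cudaFree(ptr1);
        return -1;
    }
    int misaligned = (is_aligned(ptr1, 256) ? 0 : 1)
                   + (is_aligned(ptr2, 256) ? 0 : 1);
    cudaFree(ptr1);
    cudaFree(ptr2);
    return misaligned;
}
```

If the guarantee holds, `count_misaligned_allocations()` returns 0 even though each request was for a single byte, which also means the two 1-byte buffers cannot occupy adjacent addresses.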