Thrust inside user written kernels

As it was originally written, Thrust is purely a host-side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

    thrust::device_vector< Foo > fooVector;
    // Do something thrust-y with fooVector
    Foo* fooArray = thrust::raw_pointer_cast( fooVector.data() );
    // Pass raw array and …
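A minimal sketch of the raw-pointer approach described above, assuming a CUDA toolkit with its bundled Thrust headers. The kernel `scaleKernel` and the helper `doubledSequence` are hypothetical names made up for this illustration, and `int` stands in for the excerpt's `Foo` type.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>

// Hypothetical kernel: doubles every element of a plain device array.
__global__ void scaleKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2;
    }
}

// Hypothetical helper: fills a device_vector with 0..n-1 using Thrust on the
// host side, then hands the underlying storage to a hand-written kernel.
// Assumes n > 0 (a launch with zero blocks is an invalid configuration).
thrust::host_vector<int> doubledSequence(int n)
{
    thrust::device_vector<int> d(n);
    thrust::sequence(d.begin(), d.end());           // 0, 1, 2, ...

    // Raw device pointer to the memory owned by the device_vector.
    int* raw = thrust::raw_pointer_cast(d.data());

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(raw, n);
    cudaDeviceSynchronize();

    thrust::host_vector<int> h = d;                 // copy results back to the host
    return h;
}

int main()
{
    thrust::host_vector<int> h = doubledSequence(8);
    for (size_t i = 0; i < h.size(); ++i) {
        printf("%d ", h[i]);
    }
    printf("\n");
    return 0;
}
```

The device_vector keeps ownership of the memory, so the raw pointer is only valid while the vector is alive and has not been resized.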

Cuda atomics change flag

It looks to me like what you want is a “critical section” in your code. A critical section allows one thread to execute a sequence of instructions while preventing any other thread or threadblock from executing those instructions. A critical section can be used to control access to a memory area, for example, so as …
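One common way to build such a critical section is a spin lock on a global integer, acquired with atomicCAS and released with atomicExch. The sketch below is an assumption about how this could look, not necessarily the code from the full answer. The names `criticalSectionKernel` and `countWithLock` are hypothetical, and only one thread per block contends for the lock so that threads of the same warp never spin against each other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of every block enters the critical section
// once and performs a deliberately non-atomic increment of *counter.
__global__ void criticalSectionKernel(int* lock, volatile int* counter)
{
    if (threadIdx.x == 0) {
        // Acquire: spin until this thread changes the lock from 0 to 1.
        while (atomicCAS(lock, 0, 1) != 0) { }

        // Critical section: only the lock holder runs these lines.
        int v = *counter;
        *counter = v + 1;

        // Make the update visible to other blocks before releasing the lock.
        __threadfence();
        atomicExch(lock, 0);
    }
}

// Hypothetical host helper: launches the kernel and returns the final counter.
int countWithLock(int blocks, int threadsPerBlock)
{
    int* dLock = nullptr;
    int* dCounter = nullptr;
    cudaMalloc(&dLock, sizeof(int));
    cudaMalloc(&dCounter, sizeof(int));
    cudaMemset(dLock, 0, sizeof(int));
    cudaMemset(dCounter, 0, sizeof(int));

    criticalSectionKernel<<<blocks, threadsPerBlock>>>(dLock, dCounter);

    int result = -1;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, dCounter, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dLock);
    cudaFree(dCounter);
    return result;
}

int main()
{
    printf("counter = %d\n", countWithLock(64, 32));
    return 0;
}
```

If the lock works, the counter ends up equal to the number of blocks, because each block passes through the critical section exactly once.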

CUDA compute capability requirements

CUDA VERSION      Min CC   Deprecated CC   Default CC   Max CC
5.5 (and prior)   1.0      N/A             1.0
6.0               1.0      1.0             1.0
6.5               1.1      1.x             2.0
7.x               2.0      N/A             2.0
8.0               2.0      2.x             2.0          6.2
9.x               3.0      N/A             3.0          7.0
10.x              3.0      N/A             3.0          7.5 (3.0 deprecated in 10.2)
11.x              3.5      3.x,5.0         5.2          8.6 (11.0:8.0, 11.1:8.6)
(CUDA …
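To place a particular GPU against a table like this, the device's compute capability can be read at run time with cudaGetDeviceProperties. This is a small sketch; the helper name `computeCapability` and its packed return value are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the compute capability of a device packed as
// major * 10 + minor (for example 75 for CC 7.5), or -1 if the query fails.
int computeCapability(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.major * 10 + prop.minor;
}

int main()
{
    int cc = computeCapability(0);
    if (cc < 0) {
        printf("could not query device 0\n");
        return 1;
    }
    printf("device 0: compute capability %d.%d\n", cc / 10, cc % 10);
    return 0;
}
```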

Different CUDA versions shown by nvcc and NVIDIA-smi

CUDA has two primary APIs, the runtime API and the driver API. Both have a corresponding version (e.g. 8.0, 9.0, etc.). The necessary support for the driver API (e.g. libcuda.so on Linux) is installed by the GPU driver installer. The necessary support for the runtime API (e.g. libcudart.so on Linux, and also nvcc) is installed by …
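Both versions can also be queried from inside a program, which makes the distinction concrete. The sketch below uses the runtime API calls cudaDriverGetVersion and cudaRuntimeGetVersion. Both report an integer encoded as 1000 * major + 10 * minor, and the decoding helpers `versionMajor` and `versionMinor` are hypothetical names for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA reports versions as 1000 * major + 10 * minor (e.g. 11020 for 11.2).
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int driver = 0;
    int runtime = 0;

    // Latest CUDA version supported by the installed GPU driver
    // (reported as 0 if no driver is installed).
    cudaDriverGetVersion(&driver);

    // Version of the CUDA runtime library this program was built against.
    cudaRuntimeGetVersion(&runtime);

    printf("driver API version:  %d.%d\n", versionMajor(driver), versionMinor(driver));
    printf("runtime API version: %d.%d\n", versionMajor(runtime), versionMinor(runtime));
    return 0;
}
```

The two numbers come from separately installed components, so they can legitimately differ on the same machine.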

Unspecified launch failure on Memcpy

When I compile and run your code, I get “an illegal memory access was encountered-3” printed out. You may indeed be getting “unspecified launch failure” instead. The exact error reporting will depend on CUDA version, GPU, and platform, but we can proceed regardless. Either message indicates that the kernel launched but encountered an error, …

What is the canonical way to check for errors using the CUDA runtime API?

Probably the best way to check for errors in runtime API code is to define an assert-style handler function and wrapper macro like this:

    #define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
    inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
    {
       if (code != cudaSuccess)
       {
          fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, …
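Since the excerpt is cut off, here is a separate, self-contained sketch in the same spirit. The names `CUDA_CHECK`, `cudaCheck`, and `fillKernel` are hypothetical, and the handler body is an assumption rather than the original answer's code. It also shows one common way to check a kernel launch: cudaGetLastError right after the launch, then cudaDeviceSynchronize to surface errors raised while the kernel ran.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical wrapper macro: records the call site of each runtime API call.
#define CUDA_CHECK(call) cudaCheck((call), __FILE__, __LINE__)

// Hypothetical handler: returns true on success; on failure prints the error
// and either exits or returns false, depending on abortOnError.
inline bool cudaCheck(cudaError_t code, const char* file, int line,
                      bool abortOnError = true)
{
    if (code == cudaSuccess) {
        return true;
    }
    fprintf(stderr, "CUDA error: %s at %s:%d\n",
            cudaGetErrorString(code), file, line);
    if (abortOnError) {
        exit(EXIT_FAILURE);
    }
    return false;
}

// Hypothetical kernel used only to have something to launch.
__global__ void fillKernel(int* out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = value;
    }
}

int main()
{
    const int n = 1024;
    int* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));

    fillKernel<<<(n + 255) / 256, 256>>>(d, n, 7);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised during execution

    int first = 0;
    CUDA_CHECK(cudaMemcpy(&first, d, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    printf("first element: %d\n", first);
    return 0;
}
```

Wrapping every runtime call this way means a failure is reported at the line where it happened rather than at some later, unrelated call.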