CUDA atomics change flag

It looks to me like what you want is a “critical section” in your code. A critical section allows one thread to execute a sequence of instructions while preventing any other thread or threadblock from executing those instructions. A critical section can be used to control access to a memory area, for example, so as … Read more
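
To make the idea concrete, here is a minimal sketch of one common way to build such a critical section in CUDA: a global lock taken with `atomicCAS` and released with `atomicExch`. The names `lock`, `increment_in_critical_section` and `run_critical_section_demo` are made up for this illustration, and the sketch assumes that only one thread per block competes for the lock (letting all threads of a warp spin on the same lock can hang on some architectures). It is not necessarily the code from the full answer.

```cuda
#include <cuda_runtime.h>

// Hypothetical global lock: 0 = free, 1 = held.
__device__ int lock = 0;

__global__ void increment_in_critical_section(int *counter)
{
    // Only thread 0 of each block competes for the lock.
    if (threadIdx.x == 0) {
        // Acquire: spin until we are the thread that swaps 0 -> 1.
        while (atomicCAS(&lock, 0, 1) != 0) { /* spin */ }
        __threadfence();                 // observe the previous holder's writes

        // Critical section: a deliberately non-atomic read-modify-write.
        volatile int *c = counter;       // volatile so the access goes to memory
        *c = *c + 1;

        __threadfence();                 // publish our write before releasing
        atomicExch(&lock, 0);            // release
    }
    __syncthreads();                     // rest of the block waits for thread 0
}

// Host helper: launches num_blocks blocks and returns the final counter,
// or -1 if the allocation or the kernel launch failed.
int run_critical_section_demo(int num_blocks)
{
    int *d_counter = nullptr;
    int h_counter = 0;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
    increment_in_critical_section<<<num_blocks, 128>>>(d_counter);
    cudaError_t err = cudaGetLastError();
    // This copy runs on the default stream, so it waits for the kernel.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return err == cudaSuccess ? h_counter : -1;
}
```

Calling `run_critical_section_demo(64)` from `main` should return 64 if every block entered the critical section exactly once. For a plain counter a single `atomicAdd` would be simpler and faster; the lock only pays off when the protected sequence is more than one atomic operation.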

Reduce matrix rows with CUDA

Since you mentioned that you need a general reduction algorithm rather than sum only, I will give three approaches here. The kernel approach may have the highest performance. The Thrust approach is the easiest to implement. The cuBLAS approach works only with sum and has good performance. Kernel Approach Here’s a very good doc introducing how to optimize standard … Read more
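
As a rough illustration of the kernel approach, the sketch below reduces each row with one thread block and a shared-memory tree reduction, with the operator passed in as a functor so it is not tied to sum. It assumes a row-major `float` matrix and a power-of-two block size; `MaxOp`, `reduce_rows` and `row_max` are names invented for this example, and the kernel is unoptimized (no warp shuffles, no tuning), so it should not be read as the code from the full answer.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

// Hypothetical functor: any associative op with an identity element works.
struct MaxOp {
    __device__ float operator()(float a, float b) const { return a > b ? a : b; }
    __device__ float identity() const { return -FLT_MAX; }
};

// One block per row. BLOCK must be a power of two.
template <int BLOCK, typename Op>
__global__ void reduce_rows(const float *in, float *out, int cols, Op op)
{
    __shared__ float partial[BLOCK];
    const float *row = in + (size_t)blockIdx.x * cols;

    // Each thread folds a strided slice of the row into a private value.
    float acc = op.identity();
    for (int c = threadIdx.x; c < cols; c += BLOCK)
        acc = op(acc, row[c]);
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] = op(partial[threadIdx.x], partial[threadIdx.x + s]);
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = partial[0];
}

// Host helper: returns the maximum of every row of a row-major matrix.
std::vector<float> row_max(const std::vector<float> &m, int rows, int cols)
{
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, m.size() * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));
    cudaMemcpy(d_in, m.data(), m.size() * sizeof(float), cudaMemcpyHostToDevice);

    reduce_rows<256><<<rows, 256>>>(d_in, d_out, cols, MaxOp());

    std::vector<float> out(rows);
    cudaMemcpy(out.data(), d_out, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Swapping `MaxOp` for a functor with a different `operator()` and `identity()` gives min, product, or sum without touching the kernel. Error checking is omitted to keep the sketch short.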

CUDA compute capability requirements

CUDA VERSION      Min CC   Deprecated CC   Default CC   Max CC
5.5 (and prior)   1.0      N/A             1.0
6.0               1.0      1.0             1.0
6.5               1.1      1.x             2.0
7.x               2.0      N/A             2.0
8.0               2.0      2.x             2.0          6.2
9.x               3.0      N/A             3.0          7.0
10.x              3.0      N/A             3.0          7.5 (3.0 deprecated in 10.2)
11.x              3.5      3.x,5.0         5.2          8.6 (11.0:8.0, 11.1:8.6)

(CUDA … Read more
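
To see where a given machine falls in a table like this, you can ask the runtime for the device's compute capability and for the runtime version. The sketch below uses `cudaGetDeviceProperties` and `cudaRuntimeGetVersion`; the helper names `compute_capability` and `runtime_version` are invented for this example, and encoding the result as `major * 10 + minor` is just a convenience chosen here.

```cuda
#include <cuda_runtime.h>

// Returns the compute capability of `device` as major * 10 + minor
// (e.g. 75 for CC 7.5), or -1 if the device cannot be queried.
int compute_capability(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.major * 10 + prop.minor;
}

// Returns the CUDA runtime version as reported by the runtime
// (encoded as 1000 * major + 10 * minor), or -1 on failure.
int runtime_version()
{
    int version = 0;
    if (cudaRuntimeGetVersion(&version) != cudaSuccess) return -1;
    return version;
}
```

Printing both values from `main` tells you which row applies and whether the device's CC lies between that row's minimum and maximum. If the device is older than the toolkit's default CC, you would typically have to name the target explicitly when compiling, for example with nvcc's `-arch` option.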

CUDA incompatible with my gcc version

As already pointed out, nvcc depends on gcc 4.4. It is possible to configure nvcc to use the correct version of gcc without passing any compiler parameters by adding softlinks to the bin directory created with the nvcc install. The default CUDA binary directory is /usr/local/cuda/bin; adding a softlink to the correct … Read more

Using Java with Nvidia GPUs (CUDA)

First of all, you should be aware of the fact that CUDA will not automagically make computations faster. On the one hand, because GPU programming is an art, and it can be very, very challenging to get it right. On the other hand, because GPUs are well-suited only for certain kinds of computations. This may … Read more