cuda - w3toppers.com

CUDA atomics change flag

It looks to me like what you want is a “critical section” in your code. A critical section allows one thread to execute a sequence of instructions while preventing any other thread or threadblock from executing those instructions. A critical section can be used to control access to a memory area, for example, so as …

Reduce matrix rows with CUDA

Since you mentioned that you need a general reduction algorithm, not just a sum, I will give three approaches here. The kernel approach may have the highest performance. The Thrust approach is the easiest to implement. The cuBLAS approach works only for sums and has good performance. Kernel approach: here’s a very good doc introducing how to optimize standard …

How to use Thrust to sort the rows of a matrix?

I can think of two possibilities, one of which has already been suggested by @JaredHoberock. I don’t know of a general methodology to fuse for-loop iterations in Thrust, but the second method is the more general approach. My guess is that the first method would be the faster of the two in this case. Use …

CUDA compute capability requirements

CUDA VERSION     Min CC  Deprecated CC  Default CC  Max CC
5.5 (and prior)  1.0     N/A            1.0
6.0              1.0     1.0            1.0
6.5              1.1     1.x            2.0
7.x              2.0     N/A            2.0
8.0              2.0     2.x            2.0         6.2
9.x              3.0     N/A            3.0         7.0
10.x             3.0     N/A            3.0         7.5 (3.0 deprecated in 10.2)
11.x             3.5     3.x,5.0        5.2         8.6 (11.0:8.0, 11.1:8.6)

(CUDA …

How do I start a new CUDA project in Visual Studio 2008?

NOTE: With the release of version 3.2 of the CUDA Toolkit, NVIDIA now includes the rules file with the Toolkit as opposed to the SDK. I’ve therefore split this answer into two halves; use the instructions that match your version of the Toolkit. NOTE: These instructions are valid for Visual Studio 2005 and 2008. For …

What is the correct version of CUDA for my nvidia driver?

304.xx is a driver that supports CUDA 5 and earlier; it does not support newer CUDA versions. If you want to reinstall Ubuntu to create a clean setup, the Linux getting started guide has all the instructions needed to set up CUDA, if that is your intent. I believe you are picking up a 304.xx …

CUDA incompatible with my gcc version

As already pointed out, nvcc depends on gcc 4.4. It is possible to configure nvcc to use the correct version of gcc without passing any compiler parameters by adding softlinks to the bin directory created by the nvcc install. The default CUDA binary directory is /usr/local/cuda/bin; adding a softlink to the correct …

Using Java with Nvidia GPUs (CUDA)

First of all, you should be aware that CUDA will not automagically make computations faster. One reason is that GPU programming is an art, and it can be very, very challenging to get it right. Another is that GPUs are well suited only for certain kinds of computations. This may …

cudaMemcpy segmentation fault

I believe I know what the problem is, but to confirm it, it would be useful to see the code that you are using to set up the Grid_dev classes on the device. When a class or other data structure is to be used on the device, and that class has pointers in it which …

How can I add up two 2d (pitched) arrays using nested for loops?

The short answer is, you can’t. The cudaMallocPitch() function does exactly what its name implies: it allocates pitched linear memory, where the pitch is chosen to be optimal for the GPU memory controller and texture hardware. If you wanted to use arrays of pointers in the kernel, the kernel code would have to look like this: …
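The excerpt stops before its own code, and the block below is not that code. It is only a rough host-side C++ sketch of what "pitched linear memory" implies for indexing: each row starts `pitch` bytes after the previous one, so a row pointer is computed with byte arithmetic rather than by following an array of pointers. The names `emulated_pitch`, `row_ptr`, and `add_pitched` are hypothetical, and the 256-byte alignment is an assumption for illustration; in real CUDA code the pitch is whatever cudaMallocPitch() returns, and the same arithmetic would run inside a kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the pitch that cudaMallocPitch() would report:
// rounds a row's width in bytes up to an assumed 256-byte boundary.
inline std::size_t emulated_pitch(std::size_t row_bytes) {
    const std::size_t align = 256;  // assumption, not a documented value
    return (row_bytes + align - 1) / align * align;
}

// Start of row `row` in a pitched buffer. `pitch` is the row stride in BYTES,
// so the offset is applied to a char pointer before casting back to float.
inline float* row_ptr(float* base, std::size_t pitch, std::size_t row) {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * pitch);
}

inline const float* row_ptr(const float* base, std::size_t pitch, std::size_t row) {
    return reinterpret_cast<const float*>(
        reinterpret_cast<const char*>(base) + row * pitch);
}

// Element-wise c = a + b using two nested loops. All three buffers are
// assumed to share the same pitch; padding bytes are never touched.
inline void add_pitched(const float* a, const float* b, float* c,
                        std::size_t pitch, std::size_t width, std::size_t height) {
    assert(pitch >= width * sizeof(float));
    for (std::size_t row = 0; row < height; ++row) {
        const float* a_row = row_ptr(a, pitch, row);
        const float* b_row = row_ptr(b, pitch, row);
        float* c_row = row_ptr(c, pitch, row);
        for (std::size_t col = 0; col < width; ++col) {
            c_row[col] = a_row[col] + b_row[col];
        }
    }
}
```

With a width of 3 floats, `emulated_pitch` gives a 256-byte stride, so each row occupies 64 float slots of which only the first 3 hold data.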