In a CUDA kernel, how do I store an array in “local thread memory”?

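Arrays, local memory and registers

There is a misconception here regarding the definition of “local memory”. “Local memory” in CUDA is not a fast on-chip store: it resides in the same off-chip device memory as global memory (and should really be called “thread-local global memory”). What sets it apart is interleaved addressing: when the threads of a warp each access the same index of their own private array, the accesses fall on consecutive addresses and can be coalesced, which makes iterating over an array in parallel a bit faster than having each thread’s data blocked together. If you …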
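To make the interleaved-addressing point concrete, here is a minimal host-side C++ sketch (plain C++, not CUDA device code) that compares the two layouts. It is a simplified model under stated assumptions: the function names `blockedIndex`, `interleavedIndex` and `checkWarpAccess` are hypothetical, slots stand in for 32-bit words, and the group size of 32 is assumed for illustration. The real address mapping is chosen by the hardware and compiler and is not something you compute yourself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model (not a real CUDA API): numThreads threads each
// own a private array of arrayLen slots, all packed into one flat buffer.

// Blocked layout: one thread's whole array is contiguous.
constexpr std::size_t blockedIndex(std::size_t thread, std::size_t elem,
                                   std::size_t arrayLen) {
    return thread * arrayLen + elem;
}

// Interleaved layout: slot `elem` of every thread is contiguous.
constexpr std::size_t interleavedIndex(std::size_t thread, std::size_t elem,
                                       std::size_t numThreads) {
    return elem * numThreads + thread;
}

// Checks that, when every thread in a 32-thread group reads the same element
// of its own array, neighbouring threads are 1 slot apart in the interleaved
// layout and arrayLen slots apart in the blocked layout.
inline void checkWarpAccess(std::size_t elem, std::size_t arrayLen) {
    const std::size_t numThreads = 32;  // assumed group size for this sketch
    for (std::size_t t = 0; t + 1 < numThreads; ++t) {
        assert(interleavedIndex(t + 1, elem, numThreads) -
               interleavedIndex(t, elem, numThreads) == 1);
        assert(blockedIndex(t + 1, elem, arrayLen) -
               blockedIndex(t, elem, arrayLen) == arrayLen);
    }
}
```

In this model, a call such as `checkWarpAccess(3, 8)` passes: with interleaving, the 32 threads touch 32 adjacent slots, which is the pattern that a single wide memory transaction can serve, whereas in the blocked layout neighbouring threads are 8 slots apart.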