How to measure the inner kernel time in NVIDIA CUDA?
You can do something like this: __global__ void kernelSample(int *runtime) { // …. clock_t start_time = clock(); //some code here clock_t stop_time = clock(); // …. runtime[tidx] = (int)(stop_time – start_time); } Which gives the number of clock cycles between the two calls. Be a little careful though, the timer will overflow after a couple … Read more