printf() output is only displayed if the kernel finishes successfully, so check the return codes of all CUDA function calls and make sure no errors are reported.
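A common way to check every return code without cluttering the code is a small wrapper macro. The sketch below assumes the CUDA runtime API; CUDA_CHECK and roundtrip are hypothetical names chosen for illustration and are not part of CUDA.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): evaluate a CUDA
// runtime call and, if it did not return cudaSuccess, print the failing
// call, its source location and the error string, then exit.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,  \
                         __LINE__, #call, cudaGetErrorString(err_));  \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Copy a value to the device and back, checking every runtime call.
int roundtrip(int value) {
    int *d_buf = nullptr;
    int h_out = 0;
    CUDA_CHECK(cudaMalloc(&d_buf, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_buf, &value, sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(&h_out, d_buf, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_buf));
    return h_out;
}
```

Exiting on the first error suits small test programs; library code would more likely return the error code to its caller instead.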
Furthermore, printf() output is only displayed at certain points in the program. Appendix B.32.2 of the Programming Guide lists these as:
- Kernel launch via <<<>>> or cuLaunchKernel() (at the start of the launch, and, if the CUDA_LAUNCH_BLOCKING environment variable is set to 1, at the end of the launch as well),
- Synchronization via cudaDeviceSynchronize(), cuCtxSynchronize(), cudaStreamSynchronize(), cuStreamSynchronize(), cudaEventSynchronize(), or cuEventSynchronize(),
- Memory copies via any blocking version of cudaMemcpy*() or cuMemcpy*(),
- Module loading/unloading via cuModuleLoad() or cuModuleUnload(),
- Context destruction via cudaDeviceReset() or cuCtxDestroy(),
- Prior to executing a stream callback added by cudaStreamAddCallback() or cuStreamAddCallback().
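To illustrate the memory-copy flush point from the list, the sketch below launches a kernel that calls printf() and then copies its results back with a blocking cudaMemcpy(), with no explicit cudaDeviceSynchronize(). The names print_and_store and run_print_demo are hypothetical, and the 32-thread limit is an arbitrary choice that keeps the host buffer fixed-size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its index and records it in out[].
__global__ void print_and_store(int *out) {
    int i = threadIdx.x;
    printf("device: thread %d\n", i);
    out[i] = i;
}

// Returns the sum of the indices written by the kernel, or -1 on error.
// The blocking cudaMemcpy() back to the host waits for the kernel and is
// one of the listed flush points, so the device-side printf() output
// should have been emitted by the time it returns.
int run_print_demo(int threads) {
    int *d_out = nullptr;
    int h_out[32] = {0};
    if (threads < 1 || threads > 32) return -1;
    if (cudaMalloc(&d_out, threads * sizeof(int)) != cudaSuccess) return -1;

    print_and_store<<<1, threads>>>(d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int sum = 0;
    for (int i = 0; i < threads; ++i) sum += h_out[i];
    return sum;
}
```

Calling run_print_demo(4) should print one line per thread (in no guaranteed order) and return 0 + 1 + 2 + 3 = 6.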
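To check whether this is your problem, put the following code after your kernel invocation:
{
    // Wait for the kernel to finish; this also flushes its printf() output
    // and returns any error that occurred while it was running.
    cudaError_t cudaerr = cudaDeviceSynchronize();
    if (cudaerr != cudaSuccess)
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(cudaerr));
}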
You should then see either the output of your kernel or an error message.
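cudaDeviceSynchronize() reports errors that occur while the kernel runs. Errors in the launch configuration itself, such as requesting more threads per block than the device allows, are recorded as the runtime's last error and retrieved with cudaGetLastError(). A sketch that checks both, using the hypothetical names noisy_kernel and launch_checked:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noisy_kernel() {
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Launches noisy_kernel with the given block size and reports the two kinds
// of failure separately. Returns cudaSuccess only if the launch was accepted
// and the kernel ran to completion.
cudaError_t launch_checked(int threads_per_block) {
    noisy_kernel<<<1, threads_per_block>>>();

    // Errors in the launch itself (e.g. an invalid block size) are reported
    // by cudaGetLastError(), which also resets the last-error state.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(err));
        return err;
    }

    // Errors during execution surface at the next synchronizing call, which
    // is also where the kernel's printf() output gets flushed.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel execution failed with error \"%s\".\n",
               cudaGetErrorString(err));
    return err;
}
```

For example, launch_checked(2048) should fail with an invalid-configuration error on current GPUs, whose limit is 1024 threads per block, while launch_checked(4) should print four lines and return cudaSuccess.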
More conveniently, cuda-memcheck will automatically check all return codes for you if you run your executable under it (in CUDA 12 and later, where cuda-memcheck has been removed, compute-sanitizer fills the same role). While you should always check for errors anyway, this comes in handy when resolving concrete issues.