CUDA limit seems to be reached, but what limit is that?

The resource being exhausted is time. On all current CUDA platforms, the display driver includes a watchdog timer which will kill any kernel that takes more than a few seconds to execute. Any code running on a card that is also driving a display is subject to this limit.

On the WDDM Windows platforms you are using, there are three possible solutions/work-arounds:

  1. Get a Tesla card and use the TCC driver, which eliminates the problem completely
  2. Try modifying registry settings to increase the timer limit (google for the TdrDelay registry key for more information, but I am not a Windows user and can’t be more specific than that)
  3. Modify your kernel code to be “re-entrant” and process the data parallel work load in several kernel launches rather than one. Kernel launch overhead isn’t all that large and processing the workload over several kernel runs is often pretty easy to achieve, depending on the algorithm you are using.
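As a rough illustration of option 3, here is a minimal sketch of splitting one long-running launch into several short ones. Everything named here is hypothetical and not from the question: `process_chunk` stands in for your real kernel (it just doubles each element), and `process_in_chunks`, the `offset`/`count` parameters and the 256-thread block size are assumptions chosen for the example. It also assumes the work per element is independent, so chunks can be processed in any order.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: processes `count` elements
// starting at `offset`, instead of the whole array in one launch.
__global__ void process_chunk(float *data, size_t offset, size_t count)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) {
        data[offset + i] = 2.0f * data[offset + i];
    }
}

// Runs the kernel once per chunk of at most `chunk` elements.
// Returns the number of launches performed, or -1 on any CUDA error.
int process_in_chunks(float *d_data, size_t n, size_t chunk)
{
    if (chunk == 0) return -1;          // avoid an endless loop
    const unsigned int threads = 256;   // assumed block size
    int launches = 0;

    for (size_t offset = 0; offset < n; offset += chunk) {
        // The last chunk may be shorter than the others.
        size_t count = (n - offset < chunk) ? (n - offset) : chunk;
        unsigned int blocks = (unsigned int)((count + threads - 1) / threads);

        process_chunk<<<blocks, threads>>>(d_data, offset, count);

        // Catch launch-configuration errors.
        if (cudaGetLastError() != cudaSuccess) return -1;
        // Conservatively wait for this chunk before launching the next,
        // which also reports any error raised while the kernel ran.
        if (cudaDeviceSynchronize() != cudaSuccess) return -1;

        ++launches;
    }
    return launches;
}
```

The chunk size is something you would tune by hand: small enough that each launch finishes well inside the watchdog limit, large enough that launch overhead stays negligible. For example, `process_in_chunks(d_data, n, 1 << 20)` would process about a million elements per launch.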
