CUDA apps time out & fail after several seconds – how to work around this?

I’m not a CUDA expert; I’ve been developing with the AMD Stream SDK, which AFAIK is roughly comparable. You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, open HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck in regedit, create it as a REG_DWORD, and set it to 1. You …

Why is Tensorflow not recognizing my GPU after conda install?

August 2021: conda install may be working now. According to @ComputerScientist in the comments below, conda install tensorflow-gpu==2.4.1 will give cudatoolkit-10.1.243 and cudnn-7.6.5. The following was written in Jan 2021 and is out of date: currently, conda install tensorflow-gpu installs tensorflow v2.3.0 and does NOT install the conda cudnn or cudatoolkit packages. Installing them …

Clearing Tensorflow GPU memory after model execution

A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) indicates that there is the following problem: currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down. Thus the only workaround would be to use processes and …
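A minimal sketch of the process-based workaround, assuming the GPU work can be packaged so that it runs in its own interpreter: since the allocator lives as long as the process does, the memory is handed back when the child exits. The `CHILD_CODE` string and the `run_in_child` helper are hypothetical stand-ins that contain no TensorFlow calls; a real child would build and run the model where the comment indicates.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the GPU workload. It only sums its inputs so the
# sketch runs anywhere; a real child would import tensorflow and run the model.
CHILD_CODE = """
import json, sys
data = json.loads(sys.argv[1])
# (model construction and execution would go here)
print(json.dumps({"result": sum(data)}))
"""

def run_in_child(inputs):
    """Run the workload in a separate process and return its result.

    Everything the child allocated is released when it exits, so the
    parent process never holds the memory itself.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, json.dumps(inputs)],
        capture_output=True,
        text=True,
        check=True,  # raise if the child fails instead of returning garbage
    )
    return json.loads(completed.stdout)["result"]
```

Passing inputs and results as JSON keeps the parent free of any state owned by the child; larger results would more likely be written to a file.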

Is there a way of determining how much GPU memory is in use by TensorFlow?

(1) There is some limited support with Timeline for logging memory allocations. Here is an example for its usage:

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    summary, _ = sess.run([merged, train_step],
                          feed_dict=feed_dict(True),
                          options=run_options,
                          run_metadata=run_metadata)
    train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
    train_writer.add_summary(summary, i)
    print('Adding run metadata for', i)
    tl = timeline.Timeline(run_metadata.step_stats)
    print(tl.generate_chrome_trace_format(show_memory=True))
    trace_file = tf.gfile.Open(name="timeline", mode="w")
    trace_file.write(tl.generate_chrome_trace_format(show_memory=True))

You can …

CUDA: how to get grid, block, thread size and parallelize non-square matrix calculation

As you have written it, that kernel is completely serial. Every thread launched to execute it is going to perform the same work. The main idea behind CUDA (and OpenCL and other similar “single program, multiple data” type programming models) is that you take a “data parallel” operation – so one where the same, largely …
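To make the data-parallel idea concrete, here is a small pure-Python emulation (not CUDA itself) of how a 2D launch could cover a non-square matrix, assuming each thread handles exactly one element. The 16x16 default block size, the `grid_dims` and `emulate_kernel` helpers, and the element-wise add are illustrative assumptions, not taken from the original question.

```python
def grid_dims(rows, cols, block=(16, 16)):
    """Blocks needed per dimension, rounding up (x spans columns, y spans rows)."""
    bx, by = block
    return ((cols + bx - 1) // bx, (rows + by - 1) // by)

def emulate_kernel(a, b, block=(16, 16)):
    """Emulate a 2D launch in which every thread adds one matrix element."""
    rows, cols = len(a), len(a[0])
    bx, by = block
    gx, gy = grid_dims(rows, cols, block)
    out = [[0] * cols for _ in range(rows)]
    # On a GPU these four loops do not exist: every (block, thread) pair
    # runs the loop body concurrently with its own indices.
    for block_y in range(gy):
        for block_x in range(gx):
            for thread_y in range(by):
                for thread_x in range(bx):
                    col = block_x * bx + thread_x
                    row = block_y * by + thread_y
                    # Bounds check: the grid is rounded up, so threads in
                    # edge blocks can fall outside a non-square matrix.
                    if row < rows and col < cols:
                        out[row][col] = a[row][col] + b[row][col]
    return out
```

The point of the sketch is that each thread derives a distinct (row, col) from its block and thread indices, so no two threads repeat the same work, and the bounds check is what lets a rounded-up grid cover dimensions that are not multiples of the block size.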

How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

The necessary instructions are contained in the documentation for the MPS service. You’ll note that those instructions don’t really depend on or call out MPI, so there really isn’t anything MPI-specific about them. Here’s a walkthrough/example. Read section 2.3 of the above-linked documentation for various requirements and restrictions. I recommend using CUDA 7, 7.5, or …