gpu - w3toppers.com

CUDA apps time out & fail after several seconds – how to work around this?

I’m not a CUDA expert; I’ve been developing with the AMD Stream SDK, which AFAIK is roughly comparable. You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, use regedit to edit HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck: create it as a REG_DWORD and set it to 1. You …

Why is Tensorflow not recognizing my GPU after conda install?

August 2021: conda install may be working now. According to @ComputerScientist in the comments below, conda install tensorflow-gpu==2.4.1 will give cudatoolkit-10.1.243 and cudnn-7.6.5. The following was written in Jan 2021 and is out of date. Currently, conda install tensorflow-gpu installs TensorFlow v2.3.0 and does NOT install the conda cudnn or cudatoolkit packages. Installing them …
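Whichever install route is used, it helps to check what TensorFlow itself reports after installation. The following is a minimal sketch, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available); the helper name visible_gpus is hypothetical and not part of the original answer.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable in this environment
        return None
    # An empty list usually means the CUDA/cuDNN libraries were not found or do not match
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

An empty list after a conda install would be consistent with the missing cudnn or cudatoolkit packages described above, though other causes (such as a driver mismatch) are possible.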

Clearing Tensorflow GPU memory after model execution

A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) describes the following problem: “currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down.” Thus the only workaround would be to use processes and …
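Since the allocator is only released when its process shuts down, one way to apply the process-based workaround is to run each GPU job in a short-lived child interpreter. This is a minimal sketch under that assumption, not the original answer's code; run_in_fresh_process and JOB are hypothetical names, and the job shown is a stand-in that does not use TensorFlow.

```python
import subprocess
import sys


def run_in_fresh_process(code: str, timeout: float = 60.0) -> str:
    """Run `code` in a new Python interpreter and return what it printed to stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout


# Hypothetical stand-in for a model run; a real job would import TensorFlow,
# execute the model, and print or save its results before exiting.
JOB = "print(sum(range(10)))"

if __name__ == "__main__":
    print(run_in_fresh_process(JOB).strip())
```

Any memory the child allocated is returned to the system when it exits, so the parent process never holds the GPU allocator itself.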

How to generalize fast matrix multiplication on GPU using numba

There are arguably at least two errors in that posted code. This can’t possibly be a correct range check:

    if x >= C.shape[0] and y >= C.shape[1]:

In order for us to decide that a particular thread in the grid should not do any loading activity, we require either that x is out of range or …
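The difference between the two conditions can be checked on the host without a GPU. The sketch below is plain Python, not numba code; the shape and launch-grid sizes are hypothetical values chosen for illustration. It counts the threads that fall outside the array but are not caught by the and-based check.

```python
def skip_with_and(x, y, shape):
    # Condition from the posted code: true only when BOTH indices are out of range
    return x >= shape[0] and y >= shape[1]


def skip_with_or(x, y, shape):
    # Corrected condition: true when EITHER index is out of range
    return x >= shape[0] or y >= shape[1]


shape = (3, 5)  # hypothetical C.shape
grid = (4, 8)   # hypothetical launch grid, larger than the array in both dimensions

# Threads that lie outside the array yet pass the and-based check
leaked = [
    (x, y)
    for x in range(grid[0])
    for y in range(grid[1])
    if skip_with_or(x, y, shape) and not skip_with_and(x, y, shape)
]

print(len(leaked), "out-of-range threads are not stopped by the and-based check")
```

For example, the thread at x = 4, y = 2 is out of range in x only, so the and-based check lets it through while the or-based check stops it.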

Is there a way of determining how much GPU memory is in use by TensorFlow?

(1) There is some limited support with Timeline for logging memory allocations. Here is an example of its usage:

    import tensorflow as tf
    from tensorflow.python.client import timeline

    # sess, merged, train_step, feed_dict, train_writer and i come from the surrounding training script
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    summary, _ = sess.run([merged, train_step],
                          feed_dict=feed_dict(True),
                          options=run_options,
                          run_metadata=run_metadata)
    train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
    train_writer.add_summary(summary, i)
    print('Adding run metadata for', i)
    tl = timeline.Timeline(run_metadata.step_stats)
    print(tl.generate_chrome_trace_format(show_memory=True))
    trace_file = tf.gfile.Open(name="timeline", mode="w")
    trace_file.write(tl.generate_chrome_trace_format(show_memory=True))

You can …

Using CUDA with Visual Studio 2017

If you want to install CUDA 8.0 with Visual Studio 2017, you need to install additional components for Visual Studio 2017. Click on the Start Menu and type Visual Studio Installer. Open Visual Studio Installer. Open the Individual components tab and select VC++ 2015.3 v140 toolset under Compilers, build tools and runtimes. You also need to …

Why does OpenGL not support multiple index buffering?

OpenGL (and D3D. And Metal. And Mantle. And Vulkan) doesn’t support this because hardware doesn’t support this. Hardware doesn’t support this because, for the vast majority of mesh data, this would not help. This is primarily useful for meshes that are predominantly not smooth (vertices sharing positions but not normals and so forth). And most …

Any particular function to initialize GPU other than the first cudaMalloc call?

A call to cudaFree(0); is the canonical way to force lazy context establishment in the CUDA runtime. You can’t reduce the overhead; it is a function of driver, runtime and operating system latencies. But the call above will let you control how and when those overheads occur during program execution. EDIT in 2015 to add that the …

CUDA how to get grid, block, thread size and parallelize non square matrix calculation

As you have written it, that kernel is completely serial. Every thread launched to execute it is going to perform the same work. The main idea behind CUDA (and OpenCL and other similar “single program, multiple data” type programming models) is that you take a “data parallel” operation – so one where the same, largely …
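The idea can be illustrated on the host with a plain Python emulation. This is a sketch only, not CUDA code and not the kernel from the question; kernel and launch are hypothetical names, and the loop stands in for threads that a GPU would run concurrently. Each emulated thread uses its own index to pick exactly one element, instead of every thread repeating the whole computation.

```python
def kernel(thread_id, a, b, out):
    # Each emulated thread handles one element, selected by its own index
    if thread_id < len(out):  # guard for threads launched past the end of the data
        out[thread_id] = a[thread_id] + b[thread_id]


def launch(n_threads, a, b):
    # Serial stand-in for a kernel launch; on a GPU these iterations run concurrently
    out = [0] * len(a)
    for thread_id in range(n_threads):
        kernel(thread_id, a, b, out)
    return out


if __name__ == "__main__":
    # More threads than elements, as happens when the data size is not a multiple of the block size
    print(launch(8, [1, 2, 3], [10, 20, 30]))
```

The bounds guard matters whenever the launch is rounded up to a whole number of blocks, which is the usual case for non-square or oddly sized matrices.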

How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

The necessary instructions are contained in the documentation for the MPS service. You’ll note that those instructions don’t really depend on or call out MPI, so there really isn’t anything MPI-specific about them. Here’s a walkthrough/example. Read section 2.3 of the above-linked documentation for various requirements and restrictions. I recommend using CUDA 7, 7.5, or …