Compiling code containing dynamic parallelism fails

You can do something like this nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt or If you have 2 files simple1.cu and test.c then you can do something as below. This is called seperate compilation. nvcc -arch=sm_35 -dc simple1.cu nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt g++ -c test.c g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ … Read more

Out-of-order instruction execution: is commit order preserved?

TL:DR: memory ordering is not the same thing as out of order execution. It happens even on in-order pipelined CPUs. In-order commit is necessary1 for precise exceptions that can roll-back to exactly the instruction that faulted, without any instructions after that having already retired. The cardinal rule of out-of-order execution is don’t break single-threaded code. … Read more