GPU Computing

Revisiting "Why Do I Need To Use A Compute Context?"

I see references in the documentation for "LVCUDA – Why Do I Need To Use A Compute Context", but I can't find it. 

 

Basically, I have the same question now, and maybe the rules have changed with the latest CUDA versions.  In the document What's a LabVIEW Developer Got To Do To Get Some GPU Computing Around Here, Anyway?, Mathguy lays out this argument:

 

Why do the resources play such an important role in deciding what to do? The answer lies in the CUDA runtime engine. For proper GPU execution, we have the following two constraints:

I. Functions that run on a GPU device and allocate resources on that device must run from the same host thread if they want to reuse (a) the device and (b) resources preallocated on that device.

II. Calling a GPU function repeatedly from a LabVIEW application must be executed by the same host thread if shared GPU resources are needed.

 

With CUDA streams and cudaSetDevice, can't an arbitrary thread call cudaSetDevice, access the resources already allocated on that device, and then use a given (possibly pre-existing) CUDA stream?
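Something like the sketch below is what I have in mind (my own rough code, not from any NI example): the main thread allocates memory and creates a stream, and a completely separate host thread later calls cudaSetDevice and enqueues a kernel into that pre-existing stream.

#include <cuda_runtime.h>
#include <thread>

__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

int main(void)
{
    const int n = 4096;
    float *devData = NULL;
    cudaStream_t stream;

    /* Main thread: pick the device, allocate, and create the stream. */
    cudaSetDevice(0);
    cudaMalloc((void **)&devData, n * sizeof(float));
    cudaMemset(devData, 0, n * sizeof(float));
    cudaStreamCreate(&stream);

    /* Arbitrary worker thread: re-select the device, then reuse both the
       allocation and the pre-existing stream. */
    std::thread worker([&] {
        cudaSetDevice(0);
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(devData, n);
        cudaStreamSynchronize(stream);
    });
    worker.join();

    cudaStreamDestroy(stream);
    cudaFree(devData);
    return 0;
}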

 

Why can't I just build a normal .dll where every function I export takes a device ID as an input, and every function starts by calling cudaSetDevice?
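For example, something like this (again just a sketch; the kernel and the exported function name are placeholders): every function exported from the .dll takes the device ID as its first input and calls cudaSetDevice before touching anything on that device.

#include <cuda_runtime.h>

__global__ void scale_kernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

/* Exported entry point: whichever host thread LabVIEW happens to use,
   cudaSetDevice() binds it to the requested device first. */
extern "C" __declspec(dllexport)
int ScaleArrayOnDevice(int deviceId, float *hostData, float factor, int n)
{
    cudaError_t err = cudaSetDevice(deviceId);
    if (err != cudaSuccess)
        return (int)err;

    float *devData = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    err = cudaMalloc((void **)&devData, bytes);
    if (err != cudaSuccess)
        return (int)err;

    err = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale_kernel<<<(n + 255) / 256, 256>>>(devData, factor, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devData);
    return (int)err;
}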

 -D

Message 1 of 2

Ok.  Pre-CUDA 4.0, this context thing was probably a sticky problem.  Since then (and using the runtime API, which I think nearly everyone uses instead of the driver API), it is a smaller problem.  From CUDA_4.0_Readiness_Tech_Brief.pdf:

CUDA Runtime API
Prior to version 4.0 of the CUDA Runtime, each host thread accessing a particular device would get its own context (its own view) of that device. As a consequence, it was not possible to share memory objects, events, and so forth across host threads, even when they were referring to the same device.
In version 4.0, host threads within a given process that access a particular device automatically share a single context to that device, rather than each having its own context. In other words, the new model for runtime applications is one context per device per process.
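Here's a quick sketch (mine, not from the tech brief) of what that buys you in practice: a device allocation and an event created by one host thread can be used directly by a second host thread, because both threads now map to the single per-process context for that device.

#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

int main(void)
{
    const int n = 1024;
    float *devData = NULL;
    cudaEvent_t done;

    /* Thread A: allocate on device 0, launch work, record an event. */
    std::thread producer([&] {
        cudaSetDevice(0);
        cudaMalloc((void **)&devData, n * sizeof(float));
        cudaMemset(devData, 0, n * sizeof(float));
        cudaEventCreate(&done);
        square<<<(n + 255) / 256, 256>>>(devData, n);
        cudaEventRecord(done, 0);
    });
    producer.join();

    /* Thread B: same process, same device, so the allocation and the event
       from thread A are usable here with no explicit context handoff. */
    std::thread consumer([&] {
        cudaSetDevice(0);
        cudaEventSynchronize(done);

        float first = 0.0f;
        cudaMemcpy(&first, devData, sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("first element after kernel: %f\n", first);

        cudaEventDestroy(done);
        cudaFree(devData);
    });
    consumer.join();

    return 0;
}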

 

From the CUDA C Programming Guide:

The runtime is introduced in Compilation Workflow. It provides C functions that execute on the host to allocate and deallocate device memory, transfer data between host memory and device memory, manage systems with multiple devices, etc. A complete description of the runtime can be found in the CUDA reference manual.

The runtime is built on top of a lower-level C API, the CUDA driver API, which is also accessible by the application. The driver API provides an additional level of control by exposing lower-level concepts such as CUDA contexts - the analogue of host processes for the device - and CUDA modules - the analogue of dynamically loaded libraries for the device. Most applications do not use the driver API as they do not need this additional level of control and when using the runtime, context and module management are implicit, resulting in more concise code.
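To see the difference, here is the same trivial allocation done both ways (my own sketch, not from the guide): with the runtime API the context is created and selected implicitly on first use, while the driver API makes you create and manage the CUcontext yourself.

#include <cuda.h>          /* driver API  */
#include <cuda_runtime.h>  /* runtime API */

static void runtime_api_version(void)
{
    float *devPtr = NULL;
    /* The first runtime call on a thread implicitly initializes and binds
       the per-process primary context for the current device. */
    cudaMalloc((void **)&devPtr, 1024 * sizeof(float));
    cudaFree(devPtr);
}

static void driver_api_version(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr devPtr;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);   /* explicit context management */
    cuMemAlloc(&devPtr, 1024 * sizeof(float));
    cuMemFree(devPtr);
    cuCtxDestroy(ctx);
}

int main(void)
{
    runtime_api_version();
    driver_api_version();
    return 0;
}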

Message 2 of 2