GPU Computing


Is the LabVIEW Runtime Environment sufficient to support using a GPU?

We have multiple users - not me - who develop LabVIEW applications back in their offices and labs. These applications are then run by the developers and their coworkers in the field on shared workstations (quantity 80) equipped with multiple versions (...2012, 2013...) of the LabVIEW Runtime Environment.

I'm in the process of adding a graphics card to the workstations so that they can be upgraded from a single monitor to multiple monitors. As I investigate which graphics card to use, I'm becoming more aware of the possibilities of using GPU resources to improve the performance of some applications. I've found myself wondering whether I should choose a graphics card that not only enables the upgrade to multiple monitors, but also opens the door to significant GPU acceleration of suitable applications.

For example, I'm looking at the NVIDIA GTX 750 and GTX 750 Ti, based on the recently released Maxwell processor, which appear to provide a lot of processing capability with relatively low power dissipation and minimal cost. While there are many more powerful (and more power-hungry, and more expensive) solutions out there, I'm wondering whether adding this inexpensive card would enable significant GPU acceleration of some of our applications while also handling the required upgrade to multiple monitors.

I've become aware of the NI LabVIEW GPU Analysis Toolkit. Is this a toolkit that we would purchase only for the developers who want to add GPU acceleration to their LabVIEW applications, but that would not be needed on the 80 workstations on which the GPU-accelerated applications might be run? Would the standard LabVIEW Runtime Environments we currently install on the workstations be sufficient to support the GPU-accelerated applications?

Message 1 of 18

If your developers are already compiling and distributing LabVIEW executables and run-time environments, then they will have no issues running GPU-accelerated code within a runtime environment. Just be aware of this issue here.

I don't know what algorithms you are thinking of accelerating, but remember that GPUs are only good at accelerating a specific class of algorithms - mainly algorithms where the same operation is applied many times to different datasets. If your algorithms use a lot of matrix multiplications or FFTs, then you could get a lot of benefit (these are mainly what has been implemented in LabVIEW GPU). If you aren't doing matrix multiplications or FFTs, it's not impossible to get GPUs to work for you, but you would have to look into CUDA to do that, and that is not nearly as simple to drop right in.
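To give a rough idea of what that class of algorithm looks like, here's a minimal CUDA sketch (purely illustrative, not toolkit code - the kernel name, array size, and scale factor are all made up) where the same scaling operation is applied independently to every element of an array:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Each thread does the identical operation on its own element - the data-parallel
// pattern that maps well onto the GPU's many cores.
__global__ void scaleArray(const float *in, float *out, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * factor;
}

int main()
{
    const int n = 1 << 20;                    // ~1M elements, illustrative size
    size_t bytes = n * sizeof(float);
    float *h_in = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads; // enough blocks to cover all n elements
    scaleArray<<<blocks, threads>>>(d_in, d_out, 2.0f, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("out[42] = %f\n", h_out[42]);      // expect 84.0
    cudaFree(d_in); cudaFree(d_out); free(h_in); free(h_out);
    return 0;
}

If your computation doesn't decompose into many independent element-wise (or block-wise) operations like this, the GPU won't help much no matter which card you buy.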

If you do use many matrix multiplications or FFTs, get one and try it; as you say, the cost of the hardware is so low that it's worth it if you get significant speed benefits. I would talk with your developers first, though, and make sure that those operations are the bottlenecks in your applications.

Message 2 of 18

Thanks, Cole.

I'm glad to hear we will need to buy the LabVIEW GPU Analysis Toolkit only for the developers, while the users of the applications will need only the LabVIEW RTE.

My vision is that we will use GPU acceleration only for parallel-computing applications such as FFTs or linear algebra - and I suspect we have some applications of that type, or will have once our workstations are capable of decent performance with applications of that type.

I already have a query out to the users. However, even if they don't see an immediate need for a GPU-accelerated application, since I have to add a graphics card to upgrade the workstations from a single monitor to at least dual monitors, it looks like I should choose a graphics card that will also provide significant GPU-acceleration capability. So far I think I can do so without incurring any significant disadvantages - cost or otherwise.

Message 3 of 18

I'm compelled to add a bit more information regarding the algorithms suitable for GPUs. There are far more than just matrix multiplication and FFT that can benefit from the GPU's many-core architecture. However, the toolkit only offers a relatively small set of wrapper functions that can be used out of the box.

Whoever is developing your GPU-accelerated LV apps can leverage a very large set of functions from NVIDIA's CUDA libraries (NPP, CUBLAS, CUFFT, CUSPARSE), but they will have to build their own custom wrappers. The difficulty of doing so really depends on the function(s) you want to call from w/in LabVIEW.

There is a knowledgebase entry that describes the process (http://digital.ni.com/public.nsf/allkb/82D5EDC19AF79F0786257A4C007417B1). It is possible to implement purely custom functions based on CUDA and call them from LabVIEW as you would any other external C function.
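As a rough sketch of what such an external function can look like (this is illustrative CUDA C, not code from the knowledgebase entry - the exported name AddOffset and its signature are made up), a DLL-exported C function launches a kernel and hands an error code back to LabVIEW through a Call Library Function Node:

#include <cuda_runtime.h>

// Trivial placeholder kernel: add a constant to every element of the array.
__global__ void offsetKernel(float *data, float offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += offset;
}

// extern "C" prevents C++ name mangling so the Call Library Function Node can
// locate the symbol by name; __declspec(dllexport) exposes it from the DLL.
extern "C" __declspec(dllexport) int AddOffset(float *hostData, float offset, int n)
{
    float *d_data = nullptr;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc(&d_data, bytes) != cudaSuccess)
        return -1;                                   // reported to LabVIEW as an error code
    cudaMemcpy(d_data, hostData, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    offsetKernel<<<blocks, threads>>>(d_data, offset, n);

    cudaMemcpy(hostData, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return (int)cudaGetLastError();                  // 0 == cudaSuccess
}

The same pattern applies if the exported function calls into CUBLAS, CUFFT, or another NVIDIA library instead of launching a custom kernel; only the body changes.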

These too would deploy in a similar fashion. The only exception is that dependencies on specific NVIDIA libraries (e.g. CUSPARSE) would require that those libraries be installed on each target machine.

When it comes to using the additional graphics card for both computation and display, you should be aware that it is not recommended to do any 'heavy lifting' on display devices under Windows. This is because the driver assumes a device is 'hung' when it does not respond after 5 seconds.

The workaround is simple. If you need to do extensive computing on the extra graphics card, turn off the display on the card (i.e. don't extend the desktop to it). This devotes all of the device's resources to the computation and avoids the timeout issue. It also improves performance in many cases. Of course, the time a function takes to execute varies with the GPU processor.
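If you want to check which CUDA device the display watchdog applies to, a small sketch using the standard CUDA runtime API (the wording of the output is just illustrative) lists each device along with its kernelExecTimeoutEnabled flag:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // kernelExecTimeoutEnabled is set when the runtime reports a watchdog
        // limit on kernel execution for this device (typically a display GPU).
        printf("Device %d: %s - watchdog %s\n",
               dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled (likely driving a display)"
                                             : "disabled (better suited to long kernels)");
    }
    return 0;
}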

I would also not feel shy about using both GPUs in the systems. I've run many dual GPU benchmarks on systems with low- and mid-level GPUs and have been quite happy w/ the performance.

Message 4 of 18

Thanks for the wealth of information on the different ways of pulling general processing on a GPU into a LabVIEW application.

Perhaps you can help me further with the issue of a shared vs. dedicated GPU. It's currently our chief concern in deciding whether we can work general processing on a GPU into our existing test workstations.

The workstations are based on an old, no-longer-supported motherboard, the Intel DG965WH, with no add-in graphics card, and the scheduled major workstation upgrade is still a couple or more years in the future. As far as I've been able to tell while working the immediate project of upgrading the workstations from single to dual monitors, at least for purposes of driving displays the BIOS gives me three options - run the motherboard graphics, run the add-in-card graphics, or run the add-in card if it's plugged in and otherwise run the motherboard graphics - but not the option of running both the motherboard graphics and the add-in-card graphics. So, even if I were to decide that I could handle the upgrade to dual monitors by using a "USB video adapter" for the second monitor while running the motherboard graphics for the first monitor, I still couldn't have the add-in card running to provide general processing on a GPU. Or is there something here that I'm missing - other than a more modern motherboard?

If there isn't a possible configuration that would leave the add-in card dedicated to general processing, is there a means of still getting a reduced amount of general processing from the shared add-in card without undue risk of the driver concluding the card is hung?

Thanks for your help. 

Message 5 of 18

My experience is that the BIOS setting for a system is tied to designating the primary display (i.e. so that the pre-OS boot processes can display information). This is the three-option setting you are referring to.

In the case of Windows, any display device recognized by the BIOS (not just the one chosen to be the primary) will be accessible and can be used for display. Depending on the video cards (embedded or add-in), the options for how more than one display is used are tied to the driver support (plus any limitations imposed by the OS). This is what I was referring to when I said the user has the option to not extend the desktop to the additional display.

Using two video chips from different vendors used to be a problem, but it's much more common now (e.g. Intel's built-in graphics + an NVIDIA PCIe card), and you should be able to install drivers for both so that you could display on one and 'turn off' the other for compute purposes - again, if the driver supports this.

I have limited experience with USB video adapters. My instincts tell me that they cannot be used as the primary display for system boot, which means you can't turn off display to the add-in card - it will be the only card seen by the BIOS at boot time.

That said, in this situation it can still be used for compute despite having resources reserved for display. The downsides are that fewer video resources are available for the computation, and the ~5 s execution limit applies.

Regarding the 5 s issue, it is *very* rare to deploy a computation to a GPU that consumes it for that long. It is really only an issue for the developer who is investigating implementations; there, it's common to accidentally create an infinite loop such that the device stops responding after a while. I would not expect you to deploy a LabVIEW application where a GPU function took that long. In fact, you can work around it by splitting the function up into smaller calls which run sequentially but each take less than 5 s.
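A rough sketch of that splitting idea in CUDA C (the kernel and the chunk size are placeholders, not toolkit code): process the data in slices and synchronize after each launch so no single submission comes anywhere near the watchdog limit:

#include <cuda_runtime.h>

// Placeholder per-slice work: square each element of the slice.
__global__ void processSlice(float *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] *= data[offset + i];
}

// Launch the work in chunks; each launch is short and the synchronize flushes
// it to the device, so no batch of work runs long enough to trip the timeout.
void processInChunks(float *d_data, int totalElements, int chunkElements)
{
    const int threads = 256;
    for (int offset = 0; offset < totalElements; offset += chunkElements) {
        int count = totalElements - offset;
        if (count > chunkElements) count = chunkElements;

        int blocks = (count + threads - 1) / threads;
        processSlice<<<blocks, threads>>>(d_data, offset, count);
        cudaDeviceSynchronize();
    }
}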

A more likely failure condition is developing a GPU function that involves a computation too large to fit into a low-end GPU's resources. This would generate a runtime error rather than crashing the device (or OS). If the app is developed and tested on a machine w/ the right hardware, you can avoid this.
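One way to guard against that at run time - again just a sketch, with an illustrative buffer size - is to check the card's free memory with cudaMemGetInfo before allocating, and bail out cleanly if the problem won't fit:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("GPU memory: %zu MB free of %zu MB total\n",
           freeBytes >> 20, totalBytes >> 20);

    size_t required = (size_t)512 * 1024 * 1024;   // hypothetical 512 MB working set
    if (required > freeBytes) {
        fprintf(stderr, "Problem too large for this GPU - use a smaller batch or fall back to CPU.\n");
        return 1;
    }

    float *d_buf = nullptr;
    if (cudaMalloc(&d_buf, required) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    // ... run the computation here ...
    cudaFree(d_buf);
    return 0;
}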

Message 6 of 18

Thanks for the help.

I'll make an attempt to have both the motherboard graphics and the add-in graphics card functioning, with the desktop extended only to the monitor connected to the motherboard graphics. As an indication of whether the add-in graphics card is functioning, I'll attempt to alternately extend and not extend the desktop to a monitor connected to the add-in card.

I found the following that might be helpful for GPGPU using a graphics card that is not TCC-capable, i.e. a graphics card susceptible to the ~5s execution limit.

https://www.pgroup.com/userforum/viewtopic.php?t=3845&sid=ed2643db7e2a79ca6d6eb5791d24a719

Here's the response I received back from my contacts at NVIDIA:

---------------------------------------------------------------------------------------
On Windows Vista and later, the watchdog timer applies to all WDDM devices, regardless of whether there is a display attached. For someone hitting the timeouts, they have three choices:

(1) Use a TCC-capable board (e.g., a Tesla) and enable TCC mode with nvidia-smi.
(2) Increase the watchdog timeout in the registry (I prefer this over disabling the timeout completely). A timeout of, say, 30-60 seconds is enough to let most valid cases complete but still reset without rebooting in cases of a true hang.
(3) Change the kernels -- or rather the batches of kernels, which are a little hard to predict under WDDM -- so they always finish inside the default two seconds maximum.

If one of these solutions is implemented and the app still hangs/TDR's, then it could be a legitimate deadlock condition in the application code, the compiler-generated code, or the NVIDIA driver, in that order of likelihood.

----------------------------------------------------------------------------------

...

If you are using a non-Tesla card (such as a GTX or Quadro), then your best option would be to increase the watchdog timeout.

I don't yet fully understand the kernels or batches of kernels subject to timeout. This CUDA tutorial http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf helped me. Perhaps identifying the elements subject to timeout can be a bit nebulous, as suggested by "batches of kernels, which are a little hard to predict under WDDM".

Message 7 of 18

If you venture down the path of registry updates, just beware of the likelihood that today's registry key (e.g. Win7's 'TdrDelay') may not be used in future OS updates. You'll have to track this information as your system updates happen.

I think you'll find that the apps most susceptible to the timeout are in the HPC area. If you are not implementing functions that consume GBs of data and do massive amounts of computation (as might happen for weather modeling, large system simulations, etc.), it is unlikely your kernel will execute that long unless your GPU is 'too small' to handle the problem.

Message 8 of 18

Thanks for putting the timeout issue in perspective. Since we wouldn't be doing massive amounts of computation such as weather modeling or large system simulation, it sounds like we should be fine with the default TdrDelay.

The final question I'm trying to sort out: if the GPU must be shared between GPGPU and other tasks such as driving multiple monitors with multiple windows of electronic strip charts and decoded MPEG2 video streams, will GPGPU provide a worthwhile performance boost for applicable LabVIEW applications without adverse effects (e.g. momentary screen freezes) on those existing windows?

Message 9 of 18

I'll first state that most of what I've seen or done focuses on accelerating code on a GPU that is *not* used for display. That said, I've tested almost all behavior in situations where the GPU has to do both, just to make sure there were no surprises. Intuition comes at a premium these days!

Besides a delay at start-up in front panel refreshes (e.g. a chart filling up its buffer with plot data), I can't say that I've seen degradation in display behavior w/ a computation running too. This is because many common visualization options don't task the GPU hardware (e.g. multiple 1D plots, histograms). However, complicated 3D graphics/plots that are updated at high frequency might be different. That will require some trial and error to investigate.

Regarding things like MPEG2 decoding, that is often built into HW and is not likely to cause excessive jitter unless it has a major software component. I know there's been a shift in HW support toward AVCHD/MPEG4 vs MPEG2, so it may depend heavily on whether you are leveraging HW or SW implementations.

Message 10 of 18