The interface to the sound cards I'm using is a library called WaveIO (http://www.zeitnitz.eu/waveio). Buffers are written and read through Call Library Function Nodes (i.e. calls into a C DLL), and these calls run multi-threaded, not in the UI thread.
The calls to WaveIO aren't made from within the GUI. Briefly, the application consists of a GUI VI built as a state machine with an event structure. Each time I start a playback/recording session, I dynamically launch a daemon VI dedicated to that specific sound card. This VI contains two subVIs, one with a recording loop and one with a playback loop, and these run in parallel. I have set the execution properties of all these I/O VIs to time-critical priority, with data acquisition as their preferred execution system. They are also set to shared clone reentrant execution, because they are shared between multiple sessions (sound cards) that run in parallel. The data from these loops is continuously enqueued to queues and dequeued at a slower pace in the GUI for analysis. The I/O VIs are free of local variables, property nodes and anything else that, as far as I know, uses the UI thread. CPU usage for the application while it is running is fairly low (about 1-4 %) according to the Windows Task Manager.
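Since a block diagram can't be pasted as text, here is a rough Python analogy of the structure described above. It is only a sketch under stated assumptions: `read_buffer`, `write_buffer` and `next_buffer` are hypothetical stand-ins for the DLL calls (they are not WaveIO's actual API), and Python threads do not model LabVIEW priorities, execution systems or reentrant clones.

```python
import queue
import threading


def record_loop(card_id, out_queue, stop_event, read_buffer):
    """Stand-in for the recording subVI: read a buffer, enqueue it, repeat."""
    while not stop_event.is_set():
        # read_buffer is a hypothetical placeholder for the blocking DLL read.
        out_queue.put((card_id, read_buffer()))


def playback_loop(stop_event, write_buffer, next_buffer):
    """Stand-in for the playback subVI: fetch the next buffer and write it."""
    while not stop_event.is_set():
        # write_buffer is a hypothetical placeholder for the blocking DLL write.
        write_buffer(next_buffer())


def start_session(card_id, out_queue, read_buffer, write_buffer, next_buffer):
    """Stand-in for the per-sound-card daemon VI: two loops running in parallel."""
    stop_event = threading.Event()
    threads = [
        threading.Thread(
            target=record_loop,
            args=(card_id, out_queue, stop_event, read_buffer),
            daemon=True,
        ),
        threading.Thread(
            target=playback_loop,
            args=(stop_event, write_buffer, next_buffer),
            daemon=True,
        ),
    ]
    for thread in threads:
        thread.start()
    return stop_event, threads


def drain(in_queue):
    """GUI side: take whatever has accumulated, without blocking."""
    items = []
    while True:
        try:
            items.append(in_queue.get_nowait())
        except queue.Empty:
            return items
```

The GUI would call `drain` at its own, slower rate, so the I/O loops never wait on the analysis; one `start_session` call per sound card corresponds to one daemon instance.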
JonS