LabVIEW

FPGA: What's the best way to multiplex access to different FIFOs?

Hi, In our application, we use the FPGA to send a sequence of digital signals through 16 channels in parallel, in real time and at an 80 MHz rate. It works like this: On the host VI, the user sets up the sequence, choosing which channels output which values at which times in the sequence. These values are inserted into a big host-to-target FIFO, where each item contains channel + hi/lo + time. When the sequence is activated, the FPGA takes control. It reads the big FIFO and divides the information among 16 host-only FIFOs, one for each digital channel. Then there are 16 parallel loops reading the 16 FIFOs and outputting to the desired channels. This works OK, but when we try to add more channels, or when we try to increase the size of the FIFOs, we get timing violations. My question concerns only the stage where the data is divided from the big FIFO into the 16 FIFOs. We suspect it is highly inefficient. We read the channel number from the big FIFO, then we use a case structure with 16 cases, in each case writing a value to a different FIFO. This goes on until we exhaust the big FIFO. Thus a lot of branching is needlessly involved, and it gets worse each time a channel is added. A much better way would be to use a LUT where each item points to a FIFO, and then access the desired FIFO directly, in a deterministic way (i.e. without branching, ifs, cases, etc.). Or some other, much more efficient way may be possible. Is there a way to optimize such a process on an FPGA? Thanks, Itay.
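For readers following along, the distribution stage described above can be modelled in software roughly as follows. This is only a sketch of the data flow, not LabVIEW FPGA code: the 32-bit item layout (4-bit channel, 1-bit level, 27-bit time) and the names `pack_item`, `unpack_item` and `distribute` are assumptions made for illustration, since the post does not give the actual field widths.

```python
from collections import deque

# Hypothetical item layout (not from the original post):
# bits 31..28 = channel, bit 27 = level (hi/lo), bits 26..0 = time in ticks.
CHANNEL_BITS = 4
TIME_BITS = 27
NUM_CHANNELS = 1 << CHANNEL_BITS


def pack_item(channel, level, time_ticks):
    """Pack one sequence entry into a single integer item."""
    assert 0 <= channel < NUM_CHANNELS
    assert level in (0, 1)
    assert 0 <= time_ticks < (1 << TIME_BITS)
    return (channel << (TIME_BITS + 1)) | (level << TIME_BITS) | time_ticks


def unpack_item(item):
    """Split an item back into (channel, level, time_ticks)."""
    channel = item >> (TIME_BITS + 1)
    level = (item >> TIME_BITS) & 1
    time_ticks = item & ((1 << TIME_BITS) - 1)
    return channel, level, time_ticks


def distribute(big_fifo):
    """Drain the big FIFO into one FIFO per channel, selected by index."""
    channel_fifos = [deque() for _ in range(NUM_CHANNELS)]
    while big_fifo:
        channel, level, time_ticks = unpack_item(big_fifo.popleft())
        # Table-style selection: the channel number indexes the FIFO directly.
        channel_fifos[channel].append((level, time_ticks))
    return channel_fifos
```

In software the indexed write replaces the 16-way case. On an FPGA, however, selecting one of 16 write ports is still decoding logic however it is drawn, so this model describes the behaviour rather than a guaranteed timing fix.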
Message 1 of 6
(3,461 Views)

I don't fully understand what you're saying, but surely a single FIFO of 16 bit values would be the most efficient way to send the data.
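One possible reading of this suggestion, sketched in Python rather than LabVIEW: merge the per-channel events on the host into a single time-ordered stream in which each entry carries a 16-bit word, one bit per channel. The function name `merge_to_words` and the `(time, channel, level)` event format are assumptions for illustration only.

```python
def merge_to_words(events, initial_state=0):
    """Merge (time_ticks, channel, level) events into (time_ticks, word) pairs.

    Each word is a 16-bit snapshot of all channels; bit n is channel n.
    """
    state = initial_state
    by_time = {}
    for time_ticks, channel, level in sorted(events):
        if level:
            state |= 1 << channel
        else:
            state &= ~(1 << channel) & 0xFFFF
        # Later events at the same time overwrite the snapshot for that time.
        by_time[time_ticks] = state
    return sorted(by_time.items())
```

With this format, channels that change on the same tick share one word, and a single loop can write the whole word to the 16 outputs without looking at a channel number.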

Message 2 of 6
(3,451 Views)

First of all, it is unfortunate that my post was collapsed into a single paragraph.

I want 16 timed loops (80 MHz) running in parallel, each with its own data stream (FIFO).

I don't want one big loop that has to read the channel number each time. This would slow things down. Also, I wouldn't be able to send two signals at the same time.

So I distribute from one big FIFO into 16 smaller FIFOs (this is done on the board) before starting the sequence.

The distribution process is cumbersome and consumes lots of resources, and in principle it can be done without branching. I just don't know how to do it.

Right now I'm using a case structure with 16 cases, so it's equivalent to a long list of if ... else if branches.

Itay.

Message 3 of 6
(3,441 Views)

Someone will correct me if I'm wrong, but I don't think there's a way to optimise this FIFO selection. There may be another method that doesn't involve the use of 16 internal FIFOs, though. Do you really need 16 parallel timed loops, or can you process the 16 channels all in a single loop?

Message 4 of 6
(3,432 Views)

As always, if you post your code it will be easier to help you.

 

From your description, I assume that you empty the DMA FIFO into the internal FIFOs before the test starts, since there is no way that a single DMA FIFO can keep up with feeding 16 loops running at 80 MHz. If this is true, have you considered other means of filling the internal FIFOs? You could have a separate loop for each one that reads a value and uses a boolean for handshaking. The data transfer will be slower than over DMA, but it is easily expandable: just add another loop. This method may use more FPGA space, though.

Message 5 of 6
(3,411 Views)

Another option is to split the big 16-sel-1 mux into several layers of smaller muxes and insert pipeline registers between the layers.

For example, since 0-15 is coded with 4 bits, you can route the data from the big FIFO into 2 branches based on the MSB, insert a pipeline stage, then branch again into 2 branches based on the second bit, insert a pipeline stage, and so on. You end up with 4 layers of branches in a well-pipelined structure, which will improve your timing but add to your resource usage.
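To make the layered routing concrete, here is a small cycle-by-cycle behavioural model in Python. It is a sketch under assumptions, not LabVIEW FPGA code: it assumes one item enters per clock, one register after each of the 4 layers, and the name `simulate_demux_tree` is invented for the example.

```python
NUM_BITS = 4  # channels 0-15 are coded with 4 bits


def simulate_demux_tree(items):
    """Route (channel, payload) items through a 4-layer pipelined 1-to-2 tree.

    One item enters per clock cycle. Returns (cycle, channel, payload) tuples
    in the order the items reach the per-channel outputs.
    """
    # stages[k] maps a branch index to the item held in that branch's
    # pipeline register after k + 1 routing decisions.
    stages = [dict() for _ in range(NUM_BITS)]
    outputs = []
    pending = list(items)
    cycle = 0
    while pending or any(stages):
        # Registers of the last layer drive the 16 output ports.
        for branch, (channel, payload) in stages[-1].items():
            outputs.append((cycle, branch, payload))
        # Advance back to front so every item moves exactly one layer per clock.
        for k in range(NUM_BITS - 1, 0, -1):
            stages[k] = {}
            for branch, (channel, payload) in stages[k - 1].items():
                bit = (channel >> (NUM_BITS - 1 - k)) & 1
                stages[k][(branch << 1) | bit] = (channel, payload)
        # First layer: route on the MSB of the channel number.
        stages[0] = {}
        if pending:
            channel, payload = pending.pop(0)
            stages[0][channel >> (NUM_BITS - 1)] = (channel, payload)
        cycle += 1
    return outputs
```

In this model each layer only makes a 1-of-2 decision per clock, and every item arrives at its channel 4 cycles after it enters. Throughput stays at one item per clock; only the latency grows.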

 

Besides, if the 16 processing loops are not too demanding, you can replace the 16 local FIFOs with Handshake modules. These work similarly to Register modules but use a 4-wire handshake to make sure that no data is lost and no data is fetched twice. This saves resources. The disadvantage of the Handshake module is latency: data is transferred point by point, and it takes about 6 cycles to finish the transfer of one point.
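As a rough feel for what that latency means, the arithmetic can be written out as below. It assumes the handshake runs in the same 80 MHz clock domain as the loops in the original question and takes the "about 6 cycles" figure at face value; `handshake_rate` is a name made up for the example.

```python
CLOCK_HZ = 80_000_000    # loop rate stated in the original question
CYCLES_PER_POINT = 6     # approximate handshake cost quoted above


def handshake_rate(clock_hz=CLOCK_HZ, cycles_per_point=CYCLES_PER_POINT):
    """Return (points per second, seconds per point) for one handshake link."""
    points_per_second = clock_hz / cycles_per_point
    seconds_per_point = cycles_per_point / clock_hz
    return points_per_second, seconds_per_point
```

Under those assumptions each link moves roughly 13.3 million points per second, or one point every 75 ns, compared with one point every 12.5 ns for a FIFO read on each clock.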

Message 6 of 6
(3,376 Views)