12-17-2013 03:29 AM
12-17-2013 04:05 AM
I don't fully understand what you're saying, but surely a single FIFO of 16-bit values would be the most efficient way to send the data.
12-17-2013 06:25 AM
First of all, it is unfortunate that my post was collapsed into a single paragraph.
I want 16 timed loops (80 MHz) running in parallel, each with its own data stream (FIFO).
I don't want one big loop that has to read the channel number on every iteration. That would slow things down, and I also wouldn't be able to send two signals at the same time.
So I distribute from one big FIFO into 16 smaller FIFOs (this is done on the board) before starting the sequence.
The distribution process is cumbersome and consumes a lot of resources, even though in principle it could be done without branching. I just don't know how to do it.
Right now I'm using a case structure with 16 cases, which is equivalent to a long if...else if chain.
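To make the comparison concrete, here is a small Python model (not LabVIEW FPGA code) of the two ways to distribute one stream into 16 FIFOs. It assumes each word arrives as a (channel, value) pair; that format is hypothetical, since the thread does not say how the channel is encoded. The first function tests each channel in turn like the 16-case structure; the second uses the channel number directly as an index.

```python
from collections import deque

NUM_CHANNELS = 16

def distribute_if_chain(stream):
    """Route (channel, value) pairs by testing each channel in turn,
    like a 16-case case structure or an if...else if chain."""
    fifos = [deque() for _ in range(NUM_CHANNELS)]
    for channel, value in stream:
        for candidate in range(NUM_CHANNELS):
            if channel == candidate:  # one comparison per case
                fifos[candidate].append(value)
                break
    return fifos

def distribute_indexed(stream):
    """Route the same pairs by using the channel number as an index."""
    fifos = [deque() for _ in range(NUM_CHANNELS)]
    for channel, value in stream:
        fifos[channel].append(value)  # no explicit branch on the channel
    return fifos

stream = [(0, 10), (3, 7), (0, 11), (15, 2)]
assert distribute_if_chain(stream) == distribute_indexed(stream)
print(list(distribute_indexed(stream)[0]))  # [10, 11]
```

Keep in mind that on the FPGA an indexed write still needs decode logic to enable one of 16 destinations, so the indexed form removes the branching from the description rather than guaranteeing less hardware.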
Itay.
12-17-2013 06:57 AM
Someone will correct me if I'm wrong, but I don't think there's a way to optimise this FIFO selection. There may be another method that doesn't involve the use of 16 internal FIFOs, though. Do you really need 16 parallel timed loops, or can you process the 16 channels all in a single loop?
12-17-2013 10:43 AM
As always, if you post your code it will be easier to help you.
From your description, I assume that you empty the DMA FIFO into the internal FIFOs before the test starts, since there is no way a single DMA FIFO could keep up with feeding 16 loops running at 80 MHz. If this is true, have you considered other ways of filling the internal FIFOs? You could have a separate loop for each one that reads a value and uses a boolean for handshaking. The data transfer will be slower than over DMA, but it is easily expandable: just add another loop. This method may use more FPGA space, though.
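The boolean handshake suggested above can be modelled in a few lines of Python. This is a hypothetical software sketch: `HandshakeRegister` and `fill_local_fifo` are illustrative names, and the writer and reader, which would be separate loops on the FPGA, are simply stepped one after the other inside each simulated cycle. The flag is what stops the writer from overwriting a value the reader has not taken yet, and stops the reader from taking the same value twice.

```python
class HandshakeRegister:
    """One data register guarded by a boolean 'full' flag (hypothetical model)."""

    def __init__(self):
        self.data = 0
        self.full = False

    def try_write(self, value):
        if self.full:
            return False  # reader has not consumed the previous value yet
        self.data = value
        self.full = True
        return True

    def try_read(self):
        if not self.full:
            return None  # nothing new to read
        self.full = False
        return self.data


def fill_local_fifo(values, max_cycles=1000):
    """Move values through the register into one channel's local FIFO."""
    reg = HandshakeRegister()
    local_fifo = []
    pending = list(values)
    for _ in range(max_cycles):
        # Writer side: offer the next value if the register is free
        if pending and reg.try_write(pending[0]):
            pending.pop(0)
        # Reader side: take the value if the flag says it is new
        value = reg.try_read()
        if value is not None:
            local_fifo.append(value)
        if not pending and not reg.full:
            break
    return local_fifo


print(fill_local_fifo([5, 6, 7]))  # [5, 6, 7]
```

Expanding to 16 channels in this scheme means 16 register/flag pairs and 16 filling loops, which is where the extra FPGA space goes.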
12-18-2013 02:14 AM
Another option is to split the big 16-way mux into a multi-layer tree of smaller muxes and insert pipeline registers between the layers.
For example, since channel numbers 0-15 are coded in 4 bits, you can route the data from the big FIFO into 2 branches based on the MSB, insert a pipeline stage, then split each branch into 2 again based on the second bit, insert another pipeline stage, and so on. With 4 layers of branching in a well-pipelined structure, your timing will improve, at the cost of more resources.
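A cycle-level Python model of this tree, assuming each word arrives tagged with its 4-bit channel number (a hypothetical format). Each layer looks at one bit, MSB first, and there is one pipeline register after each layer, so a word takes four clocks to reach its FIFO while each layer only makes a 2-way decision per word. `layer_step` and `pipelined_tree` are illustrative names, not LabVIEW functions.

```python
from collections import deque

NUM_BITS = 4                    # 16 channels -> 4-bit channel number
NUM_CHANNELS = 1 << NUM_BITS

def layer_step(node, channel, layer):
    # Each layer is a 1-to-2 split driven by one bit of the channel, MSB first
    bit = (channel >> (NUM_BITS - 1 - layer)) & 1
    return node * 2 + bit

def pipelined_tree(stream):
    """Return the 16 FIFOs and the number of clock cycles used."""
    fifos = [deque() for _ in range(NUM_CHANNELS)]
    stages = [None] * NUM_BITS  # pipeline register after each layer
    inputs = deque(stream)
    cycles = 0
    while inputs or any(s is not None for s in stages):
        # A word leaving the last layer has a node index equal to its channel
        if stages[-1] is not None:
            node, _channel, value = stages[-1]
            fifos[node].append(value)
        # Shift from the back so each word advances exactly one layer per clock
        for layer in range(NUM_BITS - 1, 0, -1):
            prev = stages[layer - 1]
            stages[layer] = None if prev is None else (
                layer_step(prev[0], prev[1], layer), prev[1], prev[2])
        # One new word per clock enters the first layer
        if inputs:
            channel, value = inputs.popleft()
            stages[0] = (layer_step(0, channel, 0), channel, value)
        else:
            stages[0] = None
        cycles += 1
    return fifos, cycles

fifos, cycles = pipelined_tree([(0, 10), (3, 7), (0, 11), (15, 2)])
print(list(fifos[0]), cycles)  # [10, 11] 8
```

With four words in, the model reports 8 cycles: one word enters per clock, and the last one needs four more clocks to drain through the layers. The throughput stays at one word per clock; only the latency grows with the number of layers.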
Also, if the 16 processing loops are not too demanding on throughput, you can replace the 16 local FIFOs with Handshake modules, which work similarly to Register modules but use a 4-wire handshake to make sure no data is lost and no data is fetched twice. This saves resources. The disadvantage of the Handshake module is latency: data is transferred point by point, and it takes about 6 cycles to transfer one point.
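The per-point latency of a handshake comes from the request/acknowledge round trip. A minimal cycle-level sketch, assuming a generic four-phase request/acknowledge protocol; whether NI's Handshake module works exactly this way is an assumption, and `four_phase_transfer` is an illustrative name. Each side only reacts to the other's signal as it was at the start of the clock, so every phase costs one cycle.

```python
def four_phase_transfer(values):
    """Return (received values, clock cycles used) for a four-phase handshake."""
    req = ack = False
    data = None
    received = []
    idx = 0
    cycles = 0
    while idx < len(values) or req or ack:
        # Both sides see the other's signal as it was at the start of the cycle
        seen_req, seen_ack = req, ack
        # Writer side
        if not seen_req and not seen_ack and idx < len(values):
            data = values[idx]       # phase 1: present data, raise request
            req = True
        elif seen_req and seen_ack:
            req = False              # phase 3: ack seen, drop request
            idx += 1
        # Reader side
        if seen_req and not seen_ack:
            received.append(data)    # phase 2: capture data, raise ack
            ack = True
        elif not seen_req and seen_ack:
            ack = False              # phase 4: request gone, drop ack
        cycles += 1
    return received, cycles

print(four_phase_transfer([5, 6]))  # ([5, 6], 8)
```

This idealized model needs 4 clocks per point. It ignores any extra registering a real implementation adds, so it illustrates why the handshake costs several cycles per point rather than predicting the actual count.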