06-03-2016 05:30 AM
We have FPGA code which does different operations at several different clock frequencies.
We have 8MHz, 10MHz, 50MHz, 80MHz, 128MHz and 160MHz. Each of these processes requires several parameters for operation. One example would be the harmonic of a modulator running at 160MHz. Let's say we have 32 parameters of 14 bits each (448 bits of data).
We pass these parameters to the FPGA code via a DMA which is 64 bits wide: the uppermost 8 bits signify the module number for the parameter being transferred, the second 8 bits the command, and the remaining 48 bits are the payload. This way we have a scalable parameter queue for our FPGA code. This DMA is currently read in the 8MHz loop. Many of the parameters being written are, however, otherwise used exclusively in a different clock domain; let's stick with the 160MHz example. As I understand things at the moment (our code originates from earlier LV versions), globals written to and read from in different clock domains are synchronised by a handshake in the background (meaning each transfer takes multiple clock cycles).
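To make the word layout concrete, here is a minimal Python sketch of how such a 64-bit word could be packed on the host and unpacked again. Only the 8/8/48 bit split comes from the description above; the function names and the example values are made up for illustration and are not our actual code.

```python
# Bit layout of one 64-bit DMA word, as described in the post:
#   bits 63..56  module number (8 bits)
#   bits 55..48  command       (8 bits)
#   bits 47..0   payload       (48 bits)
MODULE_BITS = 8
COMMAND_BITS = 8
PAYLOAD_BITS = 64 - MODULE_BITS - COMMAND_BITS  # 48

MODULE_SHIFT = COMMAND_BITS + PAYLOAD_BITS      # 56
COMMAND_SHIFT = PAYLOAD_BITS                    # 48
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_word(module, command, payload):
    """Pack the three fields into one 64-bit word (hypothetical helper)."""
    if not 0 <= module < (1 << MODULE_BITS):
        raise ValueError("module number does not fit in 8 bits")
    if not 0 <= command < (1 << COMMAND_BITS):
        raise ValueError("command does not fit in 8 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 48 bits")
    return (module << MODULE_SHIFT) | (command << COMMAND_SHIFT) | payload


def unpack_word(word):
    """Split a 64-bit word back into (module, command, payload)."""
    module = (word >> MODULE_SHIFT) & 0xFF
    command = (word >> COMMAND_SHIFT) & 0xFF
    payload = word & PAYLOAD_MASK
    return module, command, payload
```

A single 14-bit parameter fits comfortably in the 48-bit payload, so sending one parameter per word would need 32 words for the 32 parameters in the example.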
What is the most efficient method of passing these parameters on to the processes which require them? It is very important to note that I am not speaking of purely resource-optimised transfer (fewer LUTs, fewer registers and so on) but rather of which version prevents other issues from emerging and becoming a major PITA. We currently just write to registers (instantiated by the individual modules) and let LV do the routing. But we are getting more and more timing violations with large routing delays, in which many seemingly unrelated signals of the FPGA fabric are causing problems (VHDL signals from different clock domains within a single timing path!). Could it be that our clock-domain crossing is causing logically independent parameters to be unnecessarily coupled due to routing limitations?
Would it be better, for example, to simply re-route ALL received parameters which are destined for the 160MHz loop to a single 160MHz FIFO (preferably BRAM with built-in control logic) and let the write / read of the parameters be performed in the module itself, instead of writing to 32 parameters in the 8MHz loop and just reading them in the 160MHz loop? The current approach causes a lot of handshaking, right, because each parameter is handled independently?
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 1
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 2
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 3
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 4
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 5
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 6
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 7
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write Global Nr 8.... etc
160MHz: Read Global 1 : Read Global 2 : Read Global 3 : Read Global 4 : Read Global 5 : Read Global 6 : Read Global 7 : Read Global 8 :...etc : do stuff
as opposed to
8MHz: DRAM Read : Interpret Module Nr. : Interpret Command Nr : Write 160MHz FIFO (write to same FIFO for all signals going to the 160MHz domain)
160MHz : Read FIFO : Interpret Command : Write one of 32 Globals
160MHz : Read Global : do stuff
I have marked the actions which inherently lead to clock-crossing in red. I would assume that the FIFO method would lead to the 160MHz domain being much less tightly coupled to the 8MHz domain (in a routing sense, due to the first method having a LOT more red), but I'm not sure if this is actually the case or not. Does anyone have enough behind-the-scenes information to be able to comment?
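For anyone who prefers to see the FIFO variant written down, here is a rough behavioural model in Python. It is only a software sketch of the data flow, not a statement about how LV actually implements the crossing: the `deque` stands in for the single 8MHz-to-160MHz FIFO, and the module number `0x05` and the rule "command number selects the parameter index" are assumptions made purely for illustration.

```python
from collections import deque

MODULE_160MHZ = 0x05            # hypothetical module number of the 160MHz modulator
NUM_PARAMS = 32                 # 32 parameters, as in the example
PARAM_MASK = (1 << 14) - 1      # each parameter is 14 bits wide

fifo_160 = deque()              # stands in for the single cross-domain FIFO
params_160 = [0] * NUM_PARAMS   # registers that live entirely in the 160MHz domain


def loop_8mhz(word):
    """8MHz side: inspect the module number and forward matching words.

    The FIFO write is the only point where data leaves the 8MHz domain.
    Returns True if the word was forwarded.
    """
    module = (word >> 56) & 0xFF
    if module == MODULE_160MHZ:
        fifo_160.append(word & ((1 << 56) - 1))  # keep command + payload
        return True
    return False


def loop_160mhz():
    """160MHz side: pop at most one entry and update a local register.

    Returns True if an entry was consumed.
    """
    if not fifo_160:
        return False
    entry = fifo_160.popleft()
    command = (entry >> 48) & 0xFF
    payload = entry & ((1 << 48) - 1)
    if command < NUM_PARAMS:    # assumption: command number = parameter index
        params_160[command] = payload & PARAM_MASK
    return True
```

In this model the 32 parameter registers are both written and read in the 160MHz loop, so the FIFO is the only element shared between the two domains, which is exactly the property I am asking about.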