03-08-2024 11:24 AM
My FPGA code won’t compile because we’re using 135% of the logic lookup tables (and 0% of the memory), and the code isn’t finished yet.
I’ve inherited this code and am new to FPGA so am not clear how best to optimize it, but I suspect given the number of front panel elements there are a lot of ways.
Most of the front panel items are for settings. Many of these could be written at startup and not changed again. Is there a low memory way to do that? Is this what a ‘Memory item’ is for?
This would be a large settings cluster of many data types (or an ini file), but it wouldn’t need editing again during runtime, so the values could be considered constants. (Some option such as an ini file is still needed so the user can edit them before a run.)
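As a rough illustration of the write-once-at-startup flow described above, here is a minimal host-side sketch in Python (the real host code would be LabVIEW). The section name, the key names, the `SETTING_TYPES` table and the `write_control` callback are all hypothetical; the callback stands in for whatever mechanism actually writes a value to the FPGA. This only shows the flow and does not by itself say anything about FPGA resource usage.

```python
import configparser

# Hypothetical ini contents; section and key names are made up for illustration.
EXAMPLE_INI = """
[fpga_settings]
sample_period_ticks = 400
filter_enabled = true
threshold_volts = 1.25
"""

# Hypothetical mapping from ini key to the data type of the matching setting.
SETTING_TYPES = {
    "sample_period_ticks": int,
    "filter_enabled": bool,
    "threshold_volts": float,
}


def load_settings(ini_text):
    """Parse the ini text into a dict of typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["fpga_settings"]
    settings = {}
    for key, kind in SETTING_TYPES.items():
        if kind is bool:
            settings[key] = section.getboolean(key)
        elif kind is int:
            settings[key] = section.getint(key)
        else:
            settings[key] = section.getfloat(key)
    return settings


def push_settings_once(settings, write_control):
    """Write each setting exactly once at startup via a caller-supplied function."""
    for name, value in settings.items():
        write_control(name, value)


# Stand-in for the real FPGA write: just record what would be written.
written = {}
push_settings_once(load_settings(EXAMPLE_INI), written.__setitem__)
print(written)
```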
In terms of communication methods, I’m unclear as it seems sometimes methods are touted as ‘better’ because they are faster, but it’s not clear to me what is least resource intensive. Speed is not essential in this program.
For communication between the FPGA and the host, is writing front panel elements the least memory-intensive way of occasionally updating values?
If I want some FPGA ‘debug’ info, should I display this on the FPGA front panel and read these to the host UI, or use local variables to grab this info from across the program and send it as a cluster using a DMA?
For data transfer between loops on the FPGA:
Is it true local variables are less resource intensive than queues if I already need the front panel indicator anyway?
What if I don’t need the front panel item, should I use a queue to transfer the same info?
Many indicators are for debugging only, I’ll cut them out.
Otherwise I’ll cut down things like the 8 timeout Booleans, and can make some of the arrays a bit smaller. It’s really this data communication piece that I’m not clear on.
Any advice very welcome.
My FPGA is a PXI-7853R
03-08-2024 02:31 PM
Your FP items don't actually look too big. It's hard to tell exactly because not all of the arrays are fixed size, but nothing looks crazy.
The first thing I would try minimizing is the size of your "Data to PC" DMA FIFO. Right now it's at close to 33k requested elements which is a ton. You can probably drop that down to closer to 2k. You can make the host side buffer of the DMA be much larger but the FPGA side of the buffer should typically be very small.
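To put rough numbers on that, here is a small Python sketch of the raw storage the FPGA-side buffer asks for at each depth. The 32-bit element width is an assumption (the actual element type of "Data to PC" isn't stated here), and this counts storage bits only, not the control logic around the FIFO.

```python
def fifo_storage_bits(depth_elements, element_width_bits):
    """Raw storage needed on the FPGA side for a FIFO of the given depth."""
    return depth_elements * element_width_bits


# ASSUMPTION: 32-bit elements; the real element type is not stated in the thread.
ELEMENT_WIDTH_BITS = 32

current = fifo_storage_bits(33_000, ELEMENT_WIDTH_BITS)   # "close to 33k" elements
proposed = fifo_storage_bits(2_000, ELEMENT_WIDTH_BITS)   # "closer to 2k" elements

print(f"current : {current} bits ({current / 8 / 1024:.1f} KiB)")
print(f"proposed: {proposed} bits ({proposed / 8 / 1024:.1f} KiB)")
print(f"reduction factor: {current / proposed:.1f}x")
```

Whatever the real element width is, the ratio between the two depths stays the same, so the saving scales directly with the requested number of elements.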
03-08-2024 02:48 PM
I tried to open your Project file, but it appeared to contain only the Host part of the Project, i.e. it was not structured as a LabVIEW-RT Project, with separate sections for "My Computer" (Host) and the RT Target (Target). Here's an example from one of my RT Projects that uses a myRIO as the Target.
Your Project is missing the RT Target section (the bottom 6 entries in the list above).
Bob Schor
03-08-2024 03:34 PM
Oh sorry for that. It does normally look like what you’ve attached, with the FPGA and associated files at the bottom, although it’s a USB FPGA I’ve been developing on so I don’t know if that’s removed itself somehow.
The FPGA program should be on there as ‘FPGA main.vi’
When I’m back at my laptop tomorrow I will reattach
03-11-2024 05:24 AM
I downloaded and extracted the folder I linked, and the FPGA target does appear in the project for me, so I'm not sure how to make it appear the same for you.
03-11-2024 07:56 AM - edited 03-11-2024 08:34 AM
Which LV version? I'd like to have a look
I had a look. You would save a lot of resources if you used SCTLs instead of standard while loops. You appear to be trying to implement something similar to pipelining by splitting up your loops with FIFOs between. This feels to me to be a half-way point between standard while-loop code on FPGA and SCTL code.
Did you split up the loops in order to increase overall throughput?
03-11-2024 08:46 AM
Thanks for having a look.
That was the previous developer's decision, so I can't say for sure, but yes: about 5 of the loops are doing consecutive steps in the same processing sequence, so I believe it was to increase throughput.
However, speed is not very important in our program, so if there is something I can do with these that will decrease the resource usage, that would be great, even if it's at the expense of throughput.
03-11-2024 09:16 AM - edited 03-11-2024 09:17 AM
Well, SCTL code is inherently more efficient than standard while loops. You could then replace all of the FIFOs with simply shift registers in a single loop.
Code in a non-SCTL loop executes its nodes one at a time, clock cycle by clock cycle, until everything has been processed in sequence. The intermediate values are ALL stored in registers.
In a SCTL (i.e. Timed loop on FPGA), everything runs at full speed all the time and balancing delays between different nodes is the responsibility of the programmer. It's a bit harder to get into, but offers both a better resource utilisation AND a higher overall throughput. The standard While loop approach is to provide ease of access to traditional LV programmers, but it leaves a LOT of potential on the table. Having said that, some IO nodes for NI modules require standard while loops to write to them. The rest of the code, however, can easily be placed in SCTLs.
Things like autoindexing for loops are less trivial to implement in SCTLs; you generally have to implement the parallelisation yourself. More work, but really good payoff when it's running. And again, much better resource utilisation.
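As a software model of "one loop with shift registers instead of FIFOs", here is a minimal Python sketch. It is not LabVIEW and not how the compiler builds anything; the three stage functions are hypothetical placeholders for the consecutive processing steps, and each pass through the `for` loop stands in for one clock tick of the SCTL.

```python
def stage_a(x):
    """Hypothetical first processing step."""
    return x + 1


def stage_b(x):
    """Hypothetical second processing step."""
    return x * 2


def stage_c(x):
    """Hypothetical third processing step."""
    return x - 3


def run_pipeline(samples):
    """Model one loop whose stages are separated by shift registers.

    Every stage reads the register value written on the previous tick,
    so all three stages do work on every tick.
    """
    reg_ab = None  # shift register between stage A and stage B
    reg_bc = None  # shift register between stage B and stage C
    outputs = []
    # Two extra ticks flush the values still held in the two registers.
    for x in list(samples) + [None, None]:
        # Compute every stage from the OLD register values...
        out_c = stage_c(reg_bc) if reg_bc is not None else None
        next_bc = stage_b(reg_ab) if reg_ab is not None else None
        next_ab = stage_a(x) if x is not None else None
        # ...then update both registers together, as a clock edge would.
        reg_ab, reg_bc = next_ab, next_bc
        if out_c is not None:
            outputs.append(out_c)
    return outputs


print(run_pipeline([1, 2, 3]))
```

Here `None` plays the role of "no valid data yet"; on the FPGA you would typically carry an explicit valid Boolean alongside each register instead. The results come out two ticks late but match running the three steps back to back on each sample.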
03-11-2024 09:19 AM
I don't have an FPGA setup right now, so I can only look at your code.
I'd suspect:
1) Trigonometric functions (sin, cos, tan) that might use a LUT per instance (using gates?);
2) The FIFOs that might use gates;
3) Those filters.
It would be quite easy to pinpoint which (if any) of them (0..n) is the problem, by simply disabling the code.
Then you can look for a solution...
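A minimal Python sketch of the bookkeeping for that disable-and-compare approach: compile once as a baseline, then once with each suspect disabled, and rank how many LUTs each one frees. The block names and all numbers below are placeholders for illustration only, not measurements from this project.

```python
def rank_lut_savings(baseline_luts, luts_with_block_disabled):
    """Return (block, LUTs saved) pairs, largest saving first."""
    savings = {
        block: baseline_luts - luts
        for block, luts in luts_with_block_disabled.items()
    }
    return sorted(savings.items(), key=lambda item: item[1], reverse=True)


# PLACEHOLDER numbers only; replace with the LUT counts from your own compile reports.
baseline = 1000
disabled_runs = {"trig": 700, "fifos": 950, "filters": 800}

for block, saved in rank_lut_savings(baseline, disabled_runs):
    print(f"{block}: {saved} LUTs freed ({100 * saved / baseline:.0f}% of baseline)")
```

Treat the savings as upper bounds: when a block is disabled, the compiler may also strip logic that only existed to feed it or consume its output.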
03-11-2024 09:53 AM
So to add some detail:
LUTs are used for branching (i.e. logic) in FPGA code, not for storing values. FP elements tend to end up being really register-heavy, but LUT-light.
If your LUT count is too high, there's too much decision-making going on. As Wiebe points out, Sine, Cosine and so on, depending on how they are implemented, can very quickly take up a LOT of LUTs.
Also, case structure output terminals, adders and selectors in general all take up LUTs. Double-check that none of your FIFOs are set up to use LUTs; use BRAM with built-in FIFO logic instead.