cRIO-9057 FPGA DSP48 usage too high - optimization?

joshdoe420 · ‎01-23-2024

Alright, I'll have to admit that I was a bit uninformed. The Multiply and High Throughput Multiply functions do seem to use DSP modules for the calculation. The problem is that your fixed-point data type is too large.

Per this post:

https://forums.ni.com/t5/LabVIEW/Reducing-DSP48s-usage-in-Labview-FPGA/m-p/4150266/highlight/true#M1...

If you use fixed-point numbers with a very high resolution (a long word length), each multiplication requires more DSP modules.

Terry_ALE · ‎01-23-2024

If you look up UG479 (v1.10) March 27, 2018 (Xilinx AMD doc) you can see the specs of the DSP used in the 9057's FPGA.

I recommend a range analysis of your code. That is, go through each stage of your code and review the actual range (and significant digits) needed. This tells you how many bits are actually required. With fixed-point math it is easy to get 'bit creep', where each stage adds bits even though the result does not really need all of them.
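As a rough illustration of what a range analysis looks like on paper, here is a minimal Python sketch (not LabVIEW code). It assumes signed fixed-point types described as (word length, integer word length) with the sign bit counted in the integer word length, and it assumes that a full-precision multiply adds the word lengths and integer word lengths of its inputs. The function names and the example signal (+/-10 units at 0.001 resolution) are hypothetical and not taken from the posted VIs.

```python
import math

def fxp_format_for_range(min_val, max_val, resolution):
    """Return (word_length, integer_length) of the smallest signed FXP type
    that covers [min_val, max_val] with a step no coarser than `resolution`."""
    frac_bits = max(0, math.ceil(-math.log2(resolution)))
    step = 2.0 ** -frac_bits
    int_bits = 1  # integer word length, counting the sign bit
    # Grow the integer part until both ends of the range fit.
    while not (-2.0 ** (int_bits - 1) <= min_val
               and max_val <= 2.0 ** (int_bits - 1) - step):
        int_bits += 1
    return int_bits + frac_bits, int_bits

def multiply_growth(a, b):
    """Assumed full-precision product format: word and integer lengths add."""
    return a[0] + b[0], a[1] + b[1]

# Hypothetical signal: +/-10 units at 0.001 resolution.
needed = fxp_format_for_range(-10.0, 10.0, 0.001)

# Bit creep: three chained full-precision multiplies by 16-bit values.
fmt = (16, 5)
for _ in range(3):
    fmt = multiply_growth(fmt, (16, 5))

print("needed:", needed)            # format that the range analysis asks for
print("after 3 multiplies:", fmt)   # format that unchecked growth produces
```

Comparing the two printed formats for each stage shows where a result can be narrowed before it feeds the next multiplier.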

Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications

ilh262 · ‎01-24-2024

Josh,

I do not use any of the built-in PID controllers, but I do use lead-lag form PID controllers in my code (shown in fpga3 image). Unfortunately, these are necessary for the function of this program. This is where I use two z^-1 functions... does this affect DSP48 usage?

Thank you for your help!

ilh262 · ‎01-24-2024

Terry, thanks again for your response. I am using a NI9264 AO module and a NI9401 DIO module with a sample rate of 9 MHz (using 8 input channels). I am hoping to run the controller loop of my FPGA program (the slowest loop) at 20 kHz max. Will this allow me to do anything with clock timing to decrease DSP usage?

I went through my program and have tried to give a thorough overview of the signal process: In this FPGA VI I have three while loops:

- The first while loop is purely logic gates and integer math to interpolate the six quadrature encoders I am using for position feedback.

- The second while loop (shown in fpga1) takes the position outputs of the first while loop and computes the x, y, z, theta x, theta y, theta z positions of the controlled object. I downgraded all of the math from high throughput math (HT) to regular math operators. There are 11 multiply functions in this loop.

- The third loop is the controller and output loop. For one degree of freedom (DOF), I am using a VI to generate a trapezoidal velocity trajectory profile using nested case structures inside of a while loop. The output represents a position setpoint. I have again downgraded all of this math from HT to regular math operators. There are 13 multiply functions and 9 exponent functions. I attached a snip of this code in (fpga4). The other DOFs receive a static position setpoint.

  These position setpoints are fed into a PID controller (fpga3). The P controller is just a single HT multiplier. The I controller utilizes a single cycle timed loop (default timing) and HT math. The D controller utilizes a Z delay function, a single cycle timed loop (default timing) and HT math. There are a total of 5 multiply functions per controller and 6 controllers.

  The control efforts are then sent to a VI to map the six control efforts to the 8 actuators I am using. This VI uses all HT math, a total of 25 multiply functions, 4 divide functions, and I have included a snippet in (fpga5). The resulting actuator commands are then each multiplied by a slide integer (fpga2).

In total, there are 75 multiply functions, 9 exponent functions, and 4 divide functions, among many add/subtracts. Please let me know if you have any advice to reduce DSP48 usage! Thanks.

Intaris · ‎01-25-2024

That is 75 multipliers, but depending on the bit widths of each multiplication you may need more than one DSP per multiply.

I am not familiar with your target, but the maximum for a single multiplier is mostly 25x18 or something like that. I see you're using quite a few 32-bit numbers, and multiplying two 32-bit numbers will definitely require more than a single DSP.

You will need to either decrease your bit widths or significantly re-structure your code to allow for multiplexing multiple calculations over a single set of DSPs.
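To put rough numbers on the bit-width point, here is a small Python sketch of a DSP count estimate. It assumes the 25x18 signed multiplier of the DSP48E1 (per UG479) and a simple tiling model in which wide operands are split into pieces, where only the top piece carries the sign and the lower pieces lose one usable bit. The function names are hypothetical, and the actual count reported by the Xilinx compile tools may differ from this model.

```python
import math

def chunks(width, port_width):
    """Pieces needed to feed a signed operand through one multiplier port.
    The top piece keeps the sign; lower pieces are unsigned and lose one bit."""
    if width <= port_width:
        return 1
    return 1 + math.ceil((width - port_width) / (port_width - 1))

def estimate_dsp48e1(width_a, width_b):
    """Rough DSP48E1 count for a signed width_a x width_b multiply,
    trying both assignments of the operands to the 25-bit and 18-bit ports."""
    return min(chunks(width_a, 25) * chunks(width_b, 18),
               chunks(width_a, 18) * chunks(width_b, 25))

for wa, wb in [(16, 16), (25, 18), (32, 16), (32, 32)]:
    print(f"{wa}x{wb}: about {estimate_dsp48e1(wa, wb)} DSP48E1")
```

Under this model a 32x32 multiply costs about four times as many DSPs as one that fits in 25x18, which is why trimming word lengths before each multiplier pays off so quickly.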

ilh262 · ‎01-25-2024

Intaris, this is very helpful. It sounds like the bit widths are the main culprit in taking up all of these DSPs.

I am curious to know what you mean about "multiplexing multiple calculations over a single set of DSPs." I have been trying to understand the DSP48E1 function built into LabVIEW and whether this might reduce some DSP usage... can you speak to this at all? Thanks again.

ilh262 · ‎01-25-2024

Terry, I am absolutely dealing with bit creep: the math is yielding FXP values with many more bits than I need, but I am unsure how to reduce the effect of this. Do you have tips or resources for dealing with this? Is there a way to restrict the output of math functions to a certain number of bits? I assume the FXP conversion function won't impact this. Please let me know!

Intaris · ‎01-26-2024

If you structure your code very differently, you can iterate over a single DSP instance and feed it with different data each cycle, collecting the results and sending them on their way. This way the DSP can be executed at a faster speed, with it alternating between different actual calculations. It's important to take note of any latency issues so that your data doesn't become desynced.
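The idea can be sketched as a behavioral model in Python (this is not LabVIEW code). It assumes a 40 MHz FPGA clock and a 3-cycle multiplier pipeline, neither of which is stated in this thread, and uses the 20 kHz controller loop rate mentioned above. All names are hypothetical. Each operand pair carries a tag through the pipeline so that results are matched to the right calculation despite the latency.

```python
from collections import deque

def cycles_available(clock_hz, loop_rate_hz):
    """Clock cycles available to a shared multiplier per control-loop iteration."""
    return clock_hz // loop_rate_hz

def run_shared_multiplier(operand_pairs, latency=3):
    """Model one pipelined multiplier that serves many calculations in turn.
    Returns (products in input order, clock cycles used)."""
    pipeline = deque([None] * latency)  # each slot: (tag, product) or None
    results = {}
    cycles = 0
    next_in = 0
    while next_in < len(operand_pairs) or any(s is not None for s in pipeline):
        if next_in < len(operand_pairs):
            a, b = operand_pairs[next_in]
            entering = (next_in, a * b)  # tag travels with the data
            next_in += 1
        else:
            entering = None              # nothing left to feed: insert a bubble
        pipeline.appendleft(entering)
        leaving = pipeline.pop()
        if leaving is not None:
            tag, product = leaving
            results[tag] = product       # the tag keeps results in sync
        cycles += 1
    return [results[i] for i in range(len(operand_pairs))], cycles

budget = cycles_available(40_000_000, 20_000)  # assumed 40 MHz clock, 20 kHz loop
products, used = run_shared_multiplier([(2, 3), (4, 5), (6, 7)], latency=3)
print(budget, products, used)
```

In this model the cost of sharing is one cycle per multiply plus the pipeline latency, so the number of multiplies that one multiplier can serve is bounded by the cycle budget per loop iteration.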

Terry_ALE · ‎01-30-2024

@Intaris wrote:

If you structure your code very differently, you can iterate over a single DSP instance and feed it with different data each cycle, collecting the results and sending them on their way. This way the DSP can be executed at a faster speed, with it alternating between different actual calculations. It's important to take note of any latency issues so that your data doesn't become desynced.

This is what I am also suggesting.

