LabVIEW

Refactoring FPGA causes increase of SLICEs when converting code blocks to subVIs?

Just wondering if this is to be expected. I noticed that the SLICE count increased by about 500 (12700 -> 13200) when converting some code to subVIs (using Create SubVI). I created about 5 subVIs with some basic logic in them. Theoretically there should be no difference..? I'm using a 3M cRIO-9104 backplane.
Message Edited by robdevyogi on 08-15-2008 03:20 PM
Rob
Message 1 of 6
Hi,

Actually, there is a difference.

Using subVIs does enable the FPGA to reuse the code. Notice that if you use
a non-reentrant subVI, only one call to it executes at a time, just like in
normal LabVIEW. I don't know why you got an increase in slices, but measuring
the number of slices isn't really reliable. The only reliable thing to do is
to make code that doesn't fit, and optimize it until it fits. This is because
if the code fits, the code isn't optimized for size, and there is no telling
how small it can be.

So theoretically, 5 pieces of code in one subVI should produce 5 times less
code. But there will be synchronization code, and probably lots of it... So
whether it ends up smaller will depend on what is in the subVI (whether the
code reduction is worth the added sync. code). See what is bigger: 50
copies of the code, or 50 calls to a subVI. What about 200? You should be
able to calculate the slope and intercept, meaning the size of the code and
the subVI overhead... It will be inaccurate, since the slices are just a rough
measurement.
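
A minimal sketch of that slope/intercept estimate, in Python. The slice counts below are made up for illustration; they are not measurements from any real compile, and you would replace them with the numbers from your own compile reports. The helper name `fit_line` is also just an assumption for this example.

```python
# Least-squares line through (number of copies, slices used).
# slope     ~ slices per copy of the code
# intercept ~ fixed overhead that does not depend on the number of copies

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL data: slices reported for 50, 100 and 200 inline copies.
copies = [50, 100, 200]
slices_inline = [2600, 5100, 10100]

slope, intercept = fit_line(copies, slices_inline)
print(f"about {slope:.1f} slices per copy, {intercept:.1f} slices fixed overhead")
```

Doing the same fit for the subVI version and comparing the two slopes gives a rough idea of where the break-even point is. Since slice counts are only a rough measurement, treat the result as an indication, not a hard number.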

If you want to simply hide the code, but make it execute in parallel, you
need to make the subVI reentrant. Reentrant subVIs also decrease the code's
size, but a bit less than normal subVIs. My guess is that parts of the
code get reused, and critical parts are copied and synchronized on round 1
ticks.

I did some experiments where I copied some code, and saved it in two
separate subVIs. That variant will be the biggest, then reentrant subVIs,
then normal subVIs. Where code that is not in subVIs fits in probably
depends on the code...

Some additional optimization tips:

Do not use case structures! Unless (if it can't be avoided) around memory
access or a FIFO send function.
Put calculations in a single-cycle Timed Loop (compilation will probably
take longer).
If you need to send 8 u8's to the FPGA, use a u64 instead of 8 u8's. The
unbundling will be smaller than 8 times the synchronization mechanism that
LabVIEW adds for the communication.
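
To illustrate the last tip, here is a small Python sketch of packing eight u8 values into one u64 and splitting them again. In LabVIEW you would do this with the join/split number functions or a type cast; the function names and the byte order (element 0 in the least significant byte) are assumptions made for this example only.

```python
# Pack eight 8-bit values into one 64-bit word, and unpack them again.
# ASSUMPTION: element 0 goes into the least significant byte.

def pack_u8s(values):
    """Combine exactly eight values in 0..255 into a single 64-bit integer."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFF:
            raise ValueError(f"value {v} does not fit in a u8")
        word |= v << (8 * i)
    return word

def unpack_u8s(word):
    """Split a 64-bit integer back into eight 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]

packed = pack_u8s([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(packed))
print(unpack_u8s(packed))
```

The host then writes one control instead of eight, so the FPGA only needs the communication logic once and a cheap split on its side.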

Search NI.com. There is an excellent article on FPGA optimization
techniques.

Regards,

Wiebe.



Message 2 of 6

Thanks Wiebe for the useful hints.  You're mentioning some things that I did not find in the NI docs on optimization.

 

The sub VIs I created were code blocks from the parent VI.  They are all reentrant and only instantiated once each.

 

It is also worth noting that the gate usage increased from 88% to 92% between the two compiles.  I assume the optimization would have kicked in at 90%.

 

Just curious, you mentioned that you found the SLICE count to not be an accurate indicator when comparing compiles to each other.  But would you say it is accurate once you exceed the 90% threshold, since then all code is optimized and doing something causing an increase is a definite increase in SLICE usage?

 

Also interesting you mentioned that one U64 takes less space than 8x U8. I do this, so this could help if I get close to running out of gates. 

Rob
Message 3 of 6
For all the optimisation ins and outs, I think you need to study Xilinx
compiler manuals. LV simply uses a (slightly adjusted, so we can't see the
VHDL) version of this compiler.

The compiler doesn't optimise for size when it has enough space. So if you
have 9 equal parts, and that takes 90%, you might still be able to fit 2
extra blocks. There is really no way to tell, except trying it out.

Another thing I used to avoid large arrays (the arrays I needed were so
big, array constants crashed the compiler after 3 days), is to use memory
blocks. To read and write this memory from the host, make a parallel loop
on the FPGA that does nothing else but read or write one address. This
address is a control, and the result is an indicator. Now you can simply set
the control, and read the indicator on the host. So you can transfer all the
data to your host by looping through all the elements.
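
The host side of that readback can be sketched like this in Python. The class `SimulatedFpga` is purely a stand-in written for this example so the loop can run without hardware; it is not an NI API, and all names in it are hypothetical.

```python
# Host-side readback of an FPGA memory block through one address control
# and one data indicator. SimulatedFpga is a HYPOTHETICAL stand-in for the
# FPGA VI, not a real driver interface.

class SimulatedFpga:
    def __init__(self, memory):
        self._memory = list(memory)  # contents of the FPGA memory block
        self._address = 0            # value of the "address" control

    def write_address(self, address):
        """Host sets the address control."""
        self._address = address

    def read_data(self):
        """Host reads the data indicator (memory at the current address)."""
        return self._memory[self._address]

def read_all(fpga, length):
    """Loop through every address and collect the indicator values."""
    data = []
    for address in range(length):
        fpga.write_address(address)
        data.append(fpga.read_data())
    return data

fpga = SimulatedFpga([10, 20, 30, 40])
print(read_all(fpga, 4))
```

On real hardware the FPGA loop needs some time to update the indicator after the address changes, so the host may have to wait or use some kind of handshake between the two accesses. That part is not shown here.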

Regards,

Wiebe.


Message 4 of 6
We also upgraded to LabVIEW 2009 and now our FPGA VI won't compile. It either stops with a timing violation (Error -61499 Internal Error?) or, most likely, it loops forever without converging in Phase 6 Intermediate with 537 to 172 unrouted. We get this warning:

       “The design might fail to fit on the FPGA because the estimated device utilization exceeds 100 percent for one or more types of FPGA resources.  Refer to the 'Estimated device utilization (synthesis)' report for more details.”

Under the Estimated device utilization (synthesis):

       Total Slices           Used 21756    Total 20480           Percent 106.2

 

It looks like we're just over the number of slices. Somehow we have to reduce that number down. We have no subvis, no single-cycle timed loops - just while loops.
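
For reference, the size of the overshoot follows directly from the two numbers in the report above. A quick check in Python (variable names chosen for this example):

```python
# Numbers taken from the "Estimated device utilization (synthesis)" report.
used_slices = 21756
total_slices = 20480

over = used_slices - total_slices           # slices that have to go
percent = 100 * used_slices / total_slices  # utilization in percent

print(f"{over} slices over, {percent:.1f}% utilization")
```

So the design has to shrink by at least 1276 slices just to reach 100 percent, and probably by more than that before routing succeeds.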

 

Found these resources that I'll be trying:

Optimizing FPGA VIs

Timing Violation Analysis

Xilinx Options

 

If anyone has any other resources, please post them.

-Paul

 

Message 5 of 6

Hi Paul,

 

SCTLs (single-cycle Timed Loops) actually reduce the number of flip-flops and therefore the number of logic slices used. The resources that you have found are excellent. Try to pipeline your code as much as you can and use simple operations.

 

Ipshita C.

National Instruments
Applications Engineer
Message 6 of 6