

Matrix Multiplication in LabVIEW FPGA space


We have a cRIO-9025 here. Currently all matrix calculations are handled by the real-time target of the cRIO-9025, but I want to move those calculations over to the FPGA. The RT can only (from what I understand) recalculate every 25 ms or so, so moving the work to the FPGA should mean I can calculate and update much faster (around 1 ms, I think).

 

Yep, it will always be a 4x4 (for this case at least).

 

EDIT: We've also got a 9038 and a 9023, but I am focusing on the 9025 at the moment.

Message 11 of 32

The image you showed makes no sense. Why are you bundling portions of an array into clusters if the arrays are all the same size? It's not clear what you're trying to do, and it would help if you showed your entire code rather than small sections of it.

 

I think you are making this too complicated. On the host, you take your 4x4 arrays, reshape them to 1D arrays, and pass them to the FPGA through either the front panel or a FIFO. You then signal the FPGA that data is ready (the DMA FIFO essentially does this for you). The FPGA reads the two 16-element arrays, indexes out the elements, does the multiplication and addition, builds a new 16-element 1D array, and sends it back to the host. The host then reshapes that 1D array back to a 4x4 matrix.
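
The data flow described above can be sketched in text form. This is only an illustration in Python, not LabVIEW or FPGA code: the function names are made up for the example, and a row-major layout for the reshaped array is assumed.

```python
def flatten(m):
    """Reshape a 4x4 nested list into a 16-element list (row-major, assumed)."""
    return [x for row in m for x in row]


def unflatten(flat, n=4):
    """Reshape a row-major 1D list back into an n x n nested list."""
    return [flat[i * n:(i + 1) * n] for i in range(n)]


def matmul_flat(a, b, n=4):
    """Multiply two n x n matrices that are stored as row-major 1D lists.

    Each output element is built by indexing out the needed inputs and
    doing n multiplies and n - 1 additions, as in the steps above.
    """
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i * n + k] * b[k * n + j]
            out[i * n + j] = acc
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
result = unflatten(matmul_flat(flatten(a), flatten(a)))
print(result[0])  # first row of a times a: [90, 100, 110, 120]
```

On real hardware the three nested loops would be unrolled or pipelined on the FPGA; the indexing arithmetic is the part that carries over.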

 

However, it's unlikely this will be an improvement over doing the calculation on the host. I'm sure your RT target can multiply two 4x4 matrices in well under a millisecond, so your 25ms limit is coming from somewhere else in your code. Given the odd snippets you've shown so far, there's probably a lot of room to improve your existing code without adding in the FPGA.

Message 12 of 32

Sounds like inputting your matrices as two clusters of four arrays of four elements can do the job. Then serialize four (or even sixteen) parallel sets of Mult-Add operations to return likewise a cluster of four arrays of four elements.

 

By the way, what is your data type (FPX?) and what type of accuracy do you expect for your results?

Message 13 of 32

... and I totally agree with Nathand's comment:

 

However, it's unlikely this will be an improvement over doing the calculation on the host....

Message 14 of 32

Sorry if it comes across as messy. I am really new to this (1 week). I'm doing some kinematics on the RT and eventually want to move that entire kinematics system to the FPGA, but I wanted to see if I could do it with a simple matrix first. I agree that I am probably overcomplicating it, but not on purpose, haha.

 

Once again, super sorry for all of these seemingly simple questions, but I am still learning!

 

Originally I thought I would pretend it was a 4x4 matrix by sending each element as a single down to the FPGA. But this (if I had many matrices to calculate) would use a lot of resources on the FPGA, so I found this 2D-to-1D code online and was going to send the entire 1D array to the FPGA and then do the calculation down there. The first pic is what I had done and the second is what I'm up to now.

 

 

The RT can do a simple matrix calc super quick, but the problem comes, I think, when we need to calculate hundreds of them super fast.

 

Message 15 of 32

The second image doesn't help you at all. There is no need to bundle parts of your arrays into clusters. Use Reshape Array, as I suggested.

Message 16 of 32

Hey Muri777,

 

   Don't be sorry, at least you are trying to learn, and please keep learning!

   Your questions are all valid ones. The problem here is to understand how RT-FPGA communication works and what trade-offs make sense. An FPGA is very powerful, especially when it comes to parallel computations, but in your case it sounds like 4x4 matrix multiplications may not be worth the trouble; your RT engine is quite capable. The transfer to and from the FPGA may overshadow your actual processing needs.

 

 So really, repeating earlier questions ... what are your requirements for:

1 - speed (how many matrix operations do you need to achieve per second - 'as many as possible' is not really an answer, please give at least a minimum requirement)?

2 - accuracy (would fixed-point accuracy be OK, or do you need floating-point (single or double precision) accuracy)?

 

   then we can start discussing the best approach considering

- your RT processing capability

- your RT-FPGA communication speed (how many DMA channels you have, etc.)

- your result accuracy requirements (floating-point operations are very expensive on an FPGA)

- ... your project schedule. We assume you have a deadline ... and RT programming is, after all, a lot easier

Message 17 of 32

Thanks for all of the replies!

 

We are running robotic kinematics on the RT controller and using the FPGA to gather data. Currently we're maxing out our matrix calculation speed at 25 ms, and the ultimate goal is to move that to the FPGA so we can calculate faster. There is no real deadline; this is more of a "can I do it" project. Fixed-point accuracy should also be fine, as most of the numbers won't differ much from 0 and 1. Some will be a little different.
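
To make the fixed-point question concrete, here is a minimal Python sketch of one row-times-column multiply-accumulate in fixed point. The 16 fractional bits and the helper names are assumptions chosen for illustration; they are not taken from this thread or from any LabVIEW FXP configuration.

```python
FRAC_BITS = 16  # assumed number of fractional bits, for illustration only


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a scaled integer (round to nearest)."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a scaled integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(a, b, frac_bits=FRAC_BITS):
    """Dot product of two fixed-point vectors.

    Products carry 2 * frac_bits fractional bits, so they are summed at
    full width and rescaled once at the end (the shift rounds down).
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc >> frac_bits


row = [0.5, -0.25, 0.75, 1.0]
col = [0.5, 0.5, -1.0, 0.25]
q = fixed_dot([to_fixed(v) for v in row], [to_fixed(v) for v in col])
print(to_float(q))  # -0.375, exact here because every input is a power-of-two fraction
```

Values that are not exactly representable (0.1, for example) pick up a rounding error of at most half of one least-significant bit when quantized, which is the accuracy trade-off being asked about.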

 

In terms of DMA, I'm not too sure. I think the 9025 has 2 and the 9038 has more. But from what I know, this is the best way to send the data, since it acts like memory shared by both the FPGA and the RT (for lack of a better explanation)?

 

I understand that the RT can easily do 4x4 matrix calcs and such, but I want to get the FPGA to do the calcs and send them back.

 

So far I've got my 4x4 input matrix, did what nathand said, which was to use Reshape Array, and I've got it into a 1D string. I'm working on sending that info down to the FPGA now, which I think I can unbundle in the FPGA and use there.

 

Sorry to make this sound confusing, but I'll work on some code in a bit and post what I get.

 

sorry for the troubles!

 

thanks!

 

Message 18 of 32

Muri777 wrote:

We are running robotic kinematics on the RT controller and using the FPGA to gather data. Currently we're maxing out our matrix calculation speed at 25 ms, and the ultimate goal is to move that to the FPGA so we can calculate faster. There is no real deadline; this is more of a "can I do it" project.

Please share your code. You are doing something wrong if you can't multiply two 4x4 matrices in less than 25 ms. The time it takes to copy the 4x4 matrices to the FPGA and copy the result back to the RT will almost certainly be greater than any time you might save by doing the multiplies on the FPGA, and even there I doubt you'll get any savings. The 9025 has an 800 MHz processor (with a floating-point unit, I assume), whereas your FPGA probably runs at only 40 MHz, meaning that even if you can complete the entire multiply in a single FPGA clock cycle, your main processor can execute 20 less-complex operations in the same time period.
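
The clock comparison above works out as follows. This is back-of-the-envelope arithmetic in Python, using the 800 MHz and 40 MHz figures from this post; the assumption of one arithmetic operation per CPU cycle is a deliberate simplification and not a measured property of the 9025.

```python
CPU_HZ = 800_000_000   # RT processor clock quoted above
FPGA_HZ = 40_000_000   # FPGA clock guessed above ("probably")

# CPU cycles that elapse during one FPGA clock cycle
cpu_cycles_per_fpga_cycle = CPU_HZ // FPGA_HZ
print(cpu_cycles_per_fpga_cycle)  # 20

# A naive 4x4 by 4x4 multiply: 16 outputs, each needing 4 multiplies and 3 additions
n = 4
multiplies = n * n * n        # 64
additions = n * n * (n - 1)   # 48
total_ops = multiplies + additions
print(total_ops)  # 112

# Assumed: one operation per CPU cycle, ignoring memory access and loop overhead
time_us = total_ops / CPU_HZ * 1e6
print(round(time_us, 2))  # 0.14
```

Even if the real cost were a hundred times higher than this idealized estimate, one multiply would still take far less than a millisecond, which is why the 25 ms figure points at something else in the code.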

 


@Muri777 wrote:
 I understand that the RT can easily do 4x4 matrix calcs and such, but I want to get the FPGA to do the calcs and send them back.

 

So far I've got my 4x4 input matrix, did what nathand said, which was to use Reshape Array, and I've got it into a 1D string. I'm working on sending that info down to the FPGA now, which I think I can unbundle in the FPGA and use there.

There is no need for either strings or unbundling here. I can't tell if you're using the wrong LabVIEW functions or the wrong terminology. Take your two 4x4 input matrices, reshape them to 1D arrays, write them to DMA FIFOs. On the FPGA, read 16 elements from the FIFO (you can use 1 FIFO or two, depending on what's available and how fast you need to go) to build the array. Index out the desired elements and do the multiplies and additions. Write the results to a target-to-host DMA. On the host, read 16 elements at a time from that DMA FIFO, reshape back to a 4x4 matrix.

 

For maximum speed, use 2 host-to-target DMA FIFOs, one for each input matrix. If you transpose one of the matrices you can probably do something clever where you read from each FIFO and immediately multiply the results, although this may not gain you anything versus loading both matrices and then attempting to do all the multiplies and additions in a single cycle before sending the results back.
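
One possible reading of the transpose idea, sketched in Python rather than LabVIEW: if the second matrix is transposed before it is reshaped, then both operands of every dot product sit in contiguous 4-element runs of their 1D arrays, so they can be consumed in the order they arrive. The function names and the row-major layout are assumptions for the example, and the sketch does not model the FIFOs themselves.

```python
def transpose_flat(b, n=4):
    """Transpose an n x n matrix stored as a row-major 1D list."""
    return [b[k * n + j] for j in range(n) for k in range(n)]


def matmul_transposed(a, bt, n=4):
    """Compute A x B from row-major A and the row-major transpose of B.

    Row i of A is a[i*n : i*n + n] and column j of B is bt[j*n : j*n + n],
    so each output element pairs two contiguous runs.
    """
    out = []
    for i in range(n):
        row = a[i * n:(i + 1) * n]
        for j in range(n):
            col = bt[j * n:(j + 1) * n]
            out.append(sum(x * y for x, y in zip(row, col)))
    return out


a = list(range(1, 17))  # the 4x4 matrix 1..16, row-major
product = matmul_transposed(a, transpose_flat(a))
print(product[:4])  # first row of the product: [90, 100, 110, 120]
```

Note that every row and column is reused four times, so the FPGA would still have to buffer at least one of the two matrices; reading each FIFO element once and multiplying immediately only yields one dot product per row/column pair.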

Message 19 of 32

asd

Message 20 of 32