[FPGA] Tick Wait vs Discrete Delay not adding up

Foreshadow20 · ‎05-29-2015

The delays are inside the WHILE loop, how is the while loop executed

The delays are inside the loop; I don't see why it is considered a shift register. The way I see discrete delays is as a basic buffer or useless logic gate used to keep the inputs in sync.

This is what I expect the Discrete delays of LabVIEW to be:

The 3 "NOTs" were added to keep sync. It can be any logic gate that people use in FPGA programming to keep sync, usually buffers or useless flip-flops for longer delays (I'm using NOT since that is what was available in the software I found online).

But from what you are telling me, this is what is happening (because the Z^-1s are in the while loop):

This destroys any possibility of synchronizing signals.

The other way I see the code would be the "stack" version in assembly:

You get the 7 count of the loop added to all the delays. So 512*9+7.

I don't see how it would be a complete loop for each nop.

 mov eax, $x
 cmp eax, 0x0A
 jg end
 beginning:
 inc eax
    ; 9*512 nop operations
        nop
        nop
        nop
        nop
        [...]
        nop
 cmp eax, 0x0A
 jle beginning
 end:

Foreshadow20 · ‎05-29-2015

In this example we still get 9*512 + 7. Still nowhere near ~410us.

The calculations still don't work.

((9*512)+7) * 1/40MHz = 115.375us.

(9*512) + 7 + latency of all the other components (pulling these clock cycles out of my arse, but I doubt they are so high: 4 for subtraction, 8 for inverse and multiply, 1 for bit shift, 1 for conversions) would give:

7+1+(longest: 8 or 4+8)+1+1+(9*512)+8 = 4638 clock cycles, or 115.95us. Still nowhere close to the ~410us.

Foreshadow20 · ‎05-29-2015

edited previous post*

Intaris · ‎05-29-2015

Right. One last try.

We all agree that 9*512 = 4608 right?

This is 4608 clock cycles of an 80MHz loop. This equals an absolute time delay of 4608/80M = 57.6us if executed each cycle. Everybody on board?

So how does this relate to our output signal?

The code for our output signal is running at 80MHz but how often do we output a value? We have seen that a non-timed loop will take a full 7 cycles to execute this code (This is the way LV does this, there are alternatives as have been mentioned before and I mention again later). Our execution will look like:

0 : Step 1 - Read Input

1 : Step 2

2 : Step 3

3 : Step 4

4 : Step 5 - Discrete Delay

5 : Step 6

6 : Step 7 - Write Output

7 : Step 1 - Read Input

8 : Step 2

9 : Step 3

10 : Step 4

11 : Step 5 - Discrete Delay

and so on. So our output values are actually being written once every 7 cycles, which means that our discrete delays are also executing once every 7 cycles. Because of this we need to either MULTIPLY the 4608 delay by 7, which gives us (4608*7)/80M = 403.2us, OR divide the effective clock rate by 7, which gives us 4608/(80M/7). The two are mathematically identical, and both yield a 403.2us delay.
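The arithmetic in this post can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the 7-cycle iteration and 80MHz clock are simply the figures quoted above.

```python
import math

DELAY_TAPS = 9 * 512          # total discrete delay length (4608)
CYCLES_PER_ITERATION = 7      # clock cycles per pass of the non-timed While loop
CLOCK_MHZ = 80                # clock rate in MHz, i.e. clock cycles per microsecond


def delay_us(taps, cycles_per_iteration, clock_mhz):
    """Delay in microseconds when the delay line advances once per loop iteration."""
    return taps * cycles_per_iteration / clock_mhz


# Hypothetical case: the delay line advances on every clock cycle.
every_cycle = delay_us(DELAY_TAPS, 1, CLOCK_MHZ)

# Actual case, written both ways: multiply the taps by 7, or divide the clock by 7.
multiply_form = delay_us(DELAY_TAPS, CYCLES_PER_ITERATION, CLOCK_MHZ)
divide_form = DELAY_TAPS / (CLOCK_MHZ / CYCLES_PER_ITERATION)

print(f"{every_cycle:.1f} us if the delay advanced every cycle")
print(f"{multiply_form:.1f} us with taps multiplied by 7")
print(f"{divide_form:.1f} us with the clock divided by 7")
```

Both forms agree up to floating-point rounding, which is why the comparison is best done with `math.isclose` rather than `==`.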

IF you don't want this "one execution step per cycle" and would rather have ALL steps executing per cycle, then you need an SCTL (Single-Cycle Timed Loop), which does what's written on the box: it executes the entirety of the code once per cycle. Bear in mind you'll need shift registers to pipeline the code. But then your delay will be wrong. To get the same absolute delay you'll need 7x as many discrete delays, but your data output rate will be much higher.

Intaris · ‎05-29-2015

@Foreshadow20 wrote:

In this example we still get 9*512 + 7. Still nowhere near ~410us.

The calculations still don't work.

((9*512)+7) * 1/40MHz = 115.375us.

(9*512) + 7 + latency of all the other components (pulling these clock cycles out of my arse, but I doubt they are so high: 4 for subtraction, 8 for inverse and multiply, 1 for bit shift, 1 for conversions) would give:

7+1+(longest: 8 or 4+8)+1+1+(9*512)+8 = 4638 clock cycles, or 115.95us. Still nowhere close to the ~410us.

Your calculations reflect their source.

The Divide / Invert / Bit shift and so on all take only ONE or at most a couple of clock cycles to execute. But the catch is that the normal While loop on FPGA does NOT execute them all on the same cycle. Instead it executes ONE stage of the chain each clock cycle. So every seventh cycle we read the AI, the next cycle we negate and subtract, the next cycle we multiply, and so on.

Because the normal While loop allows each operation to take as long as it needs, the operations cannot all be executed at the same time. They are not bounded by timing as they would be in an SCTL. This is why it is possible to use a "Wait" function in a normal While loop but not in an SCTL. Each and every operation stage in your While loop is a process which may take several clock cycles; it is in essence unbounded. This means that only one part of your chain is ever executed at a time, and this is where the x7 comes from instead of your +7. And due to the nature of the large discrete delay, even if each operation takes 5 cycles, it makes almost no difference in the overall delay, as this is hugely dominated by the discrete delays.
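To make the "x7 instead of +7" point concrete, here is a toy tick-by-tick model in Python. It is only a sketch under stated assumptions: each of the seven steps is taken to last exactly one clock cycle (as in the step listing earlier in the thread), the Discrete Delay is modelled as a plain queue, and the names `STAGES`, `DELAY_STAGE`, `WRITE_STAGE` and `simulate_latency` are invented for illustration. It says nothing about how LabVIEW actually synthesizes the loop.

```python
from collections import deque

STAGES = 7        # clock cycles per loop iteration (one step per cycle)
DELAY_STAGE = 4   # zero-based tick within an iteration where the delay runs
WRITE_STAGE = 6   # zero-based tick within an iteration where the output is written


def simulate_latency(depth, iterations):
    """Return the input-to-output latency, in clock cycles, of every written sample."""
    delay_line = deque([None] * depth)   # None marks "no real sample yet"
    value = None
    latencies = []
    for tick in range(iterations * STAGES):
        stage = tick % STAGES
        if stage == 0:
            value = tick                 # read input; tag the sample with its read tick
        elif stage == DELAY_STAGE:
            delay_line.append(value)     # one sample goes in ...
            value = delay_line.popleft() # ... and the oldest one comes out
        elif stage == WRITE_STAGE and value is not None:
            latencies.append(tick - value)
    return latencies


latencies = simulate_latency(depth=10, iterations=15)
print(latencies)
```

In this model a delay of depth N produces a latency of 7*N + 6 cycles: the delay line only advances once per iteration, so its contribution is multiplied by 7, while the fixed cost of stepping through the chain is added once.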

nathand · ‎05-29-2015

@Foreshadow20 wrote:

The delays are inside the loop; I don't see why it is considered a shift register. The way I see discrete delays is as a basic buffer or useless logic gate used to keep the inputs in sync.

Neither of your images, nor your assembly code (which is kind of meaningless in the context of an FPGA since there's no processor executing instructions), represents what is actually happening here (I think - I can't quite interpret your second image).

The delay is a circular buffer. Each time you call a specific instance of the delay function, it returns the oldest element, replaces it with a new element, and moves up one index. (That might not be the actual implementation, but it is the effect).

The while loop takes 7 clock cycles (of an 80MHz clock) to execute a single iteration. So, every 7 clock cycles, you get the element that was put into the delay buffer 9*512 iterations earlier, or 9*512*7/80 = 403.2us ago.

The delay is not logic. The amount of time it takes to execute is not related to the length of the delay. If it were actually implemented the way you propose (with a logic operation or a no-op), it wouldn't be very useful, because the while loop would have to wait for it to run before it could move on to the next iteration, and if you wanted that behavior, you'd just put a wait inside the while loop.
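The circular-buffer behaviour described in this post can be sketched in Python as follows. As the post itself says, this is the effect rather than necessarily the actual implementation; the class name `DiscreteDelay`, its `step` method and the `initial` fill value are assumptions made for illustration.

```python
class DiscreteDelay:
    """Toy circular buffer: each call returns the sample written `depth` calls ago."""

    def __init__(self, depth, initial=0):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self._buf = [initial] * depth
        self._idx = 0

    def step(self, new_value):
        oldest = self._buf[self._idx]        # return the oldest element
        self._buf[self._idx] = new_value     # replace it with the new element
        self._idx = (self._idx + 1) % len(self._buf)  # move up one index, wrapping
        return oldest


delay = DiscreteDelay(depth=3)
outputs = [delay.step(v) for v in [1, 2, 3, 4, 5, 6]]
print(outputs)
```

Each call to `step` does the same fixed amount of work (one read, one write, one index update) regardless of `depth`, which matches the point that the time the delay takes to execute is not related to its length. The absolute delay comes from how often `step` is called: once per loop iteration.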

crossrulz · ‎05-29-2015

Just for fun, try this: use 1 delay of 34 iterations. Why? You just want a phase shift of 6us. Since it takes 7 clock cycles to perform 1 iteration of the loop, each iteration takes 7/40MHz = 175ns. 6us/175ns = 34.29 iterations. So setting a delay of 34 would give you a phase shift of (34 iterations)*(7 clock cycles/iteration)*(1us/40 clock cycles) = 5.95us. If using an 80MHz clock, just double your number of iterations to 68. This would also allow you to react to the current sinusoid period instead of a previous period.
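The sizing calculation above generalizes to any target phase shift. The helper below is a sketch, with the function name `delay_iterations` and its parameters chosen for illustration; it assumes the loop really does take a fixed number of clock cycles per iteration.

```python
def delay_iterations(target_us, cycles_per_iteration, clock_mhz):
    """Return (iterations, achieved_us) for the whole-iteration delay nearest the target."""
    exact = target_us * clock_mhz / cycles_per_iteration  # fractional iterations needed
    iterations = round(exact)                             # delay length must be an integer
    achieved_us = iterations * cycles_per_iteration / clock_mhz
    return iterations, achieved_us


n40, shift40 = delay_iterations(6, 7, 40)
n80, shift80 = delay_iterations(6, 7, 80)
print(n40, shift40)
print(n80, shift80)
```

At 40MHz this reproduces the 34 iterations and roughly 5.95us worked out above. At 80MHz, rounding to the nearest iteration gives 69 (about 6.04us) rather than the doubled value of 68, which keeps the same 5.95us; either is within one loop iteration of the 6us target.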


nathand · ‎05-29-2015

@nathand wrote:

Again, as ToeCutter wrote, the correct math is 9*512 delays * 7 clock cycles/delay (not +7) = 32556 clock cycles. Each clock cycle is 1/80Mhz, 32556/80000000 = 0.00040695 seconds, or 406.95us

Eventually someone will probably notice that I miscopied some numbers here, and 9*512*7 is actually 32256, not 32556, resulting in a delay of 403us (as others have mentioned) and not 407.

Foreshadow20 · ‎06-01-2015

@nathand wrote:

@nathand wrote:

Again, as ToeCutter wrote, the correct math is 9*512 delays * 7 clock cycles/delay (not +7) = 32556 clock cycles. Each clock cycle is 1/80Mhz, 32556/80000000 = 0.00040695 seconds, or 406.95us

Eventually someone will probably notice that I miscopied some numbers here, and 9*512*7 is actually 32256, not 32556, resulting in a delay of 403us (as others have mentioned) and not 407.

That's fine, I considered it as a typo when I saw it.

I think I now see what you mean by the Discrete Delay being a shift register.

The data is propagated through each delay. So in fact you have 9*512 input values queued up, being released one at a time on each loop? That could explain the extra 7 clock cycles on each discrete delay.

So it would look something like:

c0/AI7 -> Math -> [Input 1 loop ago][Input 2 loops ago]...[Input 9*512-2 loops ago][Input 9*512-1 loops ago][Input 9*512 loops ago] -> c0/AO7

nathand · ‎06-01-2015

@Foreshadow20 wrote:

I think I now see what you mean by the Discrete Delay being a shift register.

The data is propagated through each delay. So in fact you have 9*512 input values queued up, being released one at a time on each loop? That could explain the extra 7 clock cycles on each discrete delay.

So it would look something like:

c0/AI7 -> Math -> [Input 1 loop ago][Input 2 loops ago]...[Input 9*512-2 loops ago][Input 9*512-1 loops ago][Input 9*512 loops ago] -> c0/AO7

Yes! That's it exactly. The delay functions as a FIFO queue, in which one element is enqueued and one is dequeued each time the delay function is called.
