Good code rule for casting datatypes

FrasseKatt · 10-07-2020

I've created a code rule document with one rule saying:

Prevent unintentional impact on data integrity, e.g. by "downcasting" the datatype (e.g. I16 → U16, path → string), rounding off, or eliminating decimals.

However, when my code went through code review, the reviewer found a lot of coercion dots, and most of them were hard or impossible to avoid. So obviously, this rule is too strict.

We've just discussed this rule and found it difficult to strike a good balance: preventing truncation and wrap-around, but not flagging "safe" changes of representation.
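To make "truncation and wrap-around" concrete, here is a small Python sketch (Python standing in for a block diagram, since LabVIEW code can't be pasted as text). It assumes that integer narrowing keeps the low-order bits and that float-to-integer conversion rounds half to even and saturates at the target range; check your LabVIEW version's documentation before relying on either. The helper names are hypothetical.

```python
"""Emulate narrowing conversions that silently change a value.

Assumptions (illustration only): integer-to-integer narrowing keeps the
low-order bits (wrap-around); float-to-integer conversion rounds half to
even and clamps to the target range (saturation).
"""


def wrap_to_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a wrap-around conversion would."""
    return value & ((1 << bits) - 1)


def wrap_to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits as a two's-complement signed integer."""
    u = wrap_to_unsigned(value, bits)
    return u - (1 << bits) if u >= (1 << (bits - 1)) else u


def float_to_int_saturating(x: float, lo: int, hi: int) -> int:
    """Round half to even (Python's round), then clamp into [lo, hi]."""
    return max(lo, min(hi, round(x)))


if __name__ == "__main__":
    print(wrap_to_unsigned(-1, 16))                    # 65535: I16 -1 -> U16
    print(wrap_to_signed(40000, 16))                   # -25536: U16 40000 -> I16
    print(float_to_int_saturating(2.5, 0, 65535))      # 2: decimals eliminated
    print(float_to_int_saturating(70000.0, 0, 65535))  # 65535: saturated
```

None of these raise an error, which is exactly why they are worth a rule.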

Within this brilliant community, I'm sure that there are some good ways to solve this. What is a good, balanced rule to avoid impact on data integrity?

RavensFan · 10-07-2020

Some you won't be able to avoid. Some LabVIEW functions put out or take I32's, while others do U32's. You can always add a type conversion bullet which will eliminate the coercion dot, but that isn't really necessary.

In general, you can stick with 32-bit integers and double-precision floating-point numbers; there generally isn't much to gain by limiting yourself to U16's and U8's. Modern PCs have wide enough memory paths to easily handle the larger integers. However, if you are working with a tremendous amount of floating-point data, you can probably use single precision, since it takes half the memory, as long as the values you are working with can tolerate the lower precision.
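A quick way to see what "tolerate the lower precision" means: the sketch below round-trips a double through 4-byte IEEE 754 single precision (the format SGL uses) with Python's `struct` module. `to_single` is a hypothetical helper name.

```python
import struct


def to_single(x: float) -> float:
    """Round-trip a Python float (a double) through 4-byte IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8: half the memory
    print(to_single(16777216.0))  # 16777216.0: 2**24 is exact in single
    print(to_single(16777217.0))  # 16777216.0: 2**24 + 1 is not, so it rounds
    print(to_single(0.1) - 0.1)   # small nonzero error, about 1.5e-09
```

Integers above 2**24 and most decimal fractions lose digits, so single precision is fine for, say, scaled sensor readings but risky for counters or timestamps.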

Kevin_Price · 10-07-2020

Honestly, I don't expect that there'll be a simple, clear rule that'll be appropriate for every situation.

What I actually often do is place explicit conversions where there would otherwise be coercion dots, to help clarify intent. I usually even do it between a block diagram constant and the function terminal I'm wiring it to, because a 0 on the diagram looks the same for integers of every bit width, signed or unsigned. (My general habit is not to bother for I32 or DBL, but to add this no-op explicit conversion for other types. I just like to draw attention to places where less common datatypes are being used.)

But I also wouldn't advocate making this a code rule that *must* be followed. It's probably only turned out to be truly helpful a pretty small percentage of the times I've done it.

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.

johntrich1971 · 10-07-2020

@Kevin_Price wrote:

What I actually often do is place explicit conversions in places where there would otherwise be coercion dots to help clarify intentionality.

I also do this. If there are coercion dots, the data is being converted anyway, and I find the explicit conversion more readable. The only other option I see would be to write new functions for the datatype you're using, a task I would not attempt unless I absolutely knew it was necessary.

billko · 10-07-2020

@RavensFan wrote:

In general, you can stick with 32 bit integers and double floating point numbers, there generally isn't much to gain by limiting yourself to U16's and U8's.

The only time I worry about the width of an integer is if it needs to "fit" into a message I am going to send somewhere. At the "human" level of the message, I will even do this for the enums.
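A sketch of checking widths only where data goes into a message, with everything upstream staying I32/DBL. The message layout (big-endian U16 command code, U8 enum, I32 payload) and `build_message` are hypothetical; Python's `struct.pack` raises `struct.error` when a value does not fit its field instead of wrapping.

```python
import struct

# Hypothetical wire format: big-endian U16 command code, U8 enum value,
# I32 payload. Upstream code can keep using plain ints (think I32).
MESSAGE_FORMAT = ">HBi"


def build_message(command: int, mode: int, payload: int) -> bytes:
    """Pack each field into its wire width.

    struct.pack raises struct.error when a value does not fit its field,
    so an out-of-range value is reported instead of wrapping silently.
    """
    return struct.pack(MESSAGE_FORMAT, command, mode, payload)


if __name__ == "__main__":
    print(build_message(0x0102, 3, -1).hex())  # 010203ffffffff
    try:
        build_message(70000, 3, 0)  # 70000 does not fit in a U16
    except struct.error as exc:
        print("rejected:", exc)
```

The narrowing happens in exactly one place, which is also the one place a reviewer needs to look.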

Bill

(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.

altenbach · 10-07-2020

First, let's get the terminology right. Casting (as in typecast) is very different from a datatype conversion. Coercions don't do "casting" (though for integers with the same number of bytes, it's almost the same).

Simplified, conversions and coercions try to retain the value while typecasting tries to retain the underlying bit pattern. Big difference! (Let's keep extra complications such as byte order out of the discussion).
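The value-versus-bit-pattern distinction in a few lines of Python. `convert_to_dbl` and `typecast` are hypothetical names; `struct` packs a value into bytes and unpacks the same bytes as another type, which is the idea behind a typecast. Big-endian is used on both sides so byte order stays out of the picture.

```python
import struct


def convert_to_dbl(value: int) -> float:
    """Value-preserving conversion: -1 stays -1, only the representation changes."""
    return float(value)


def typecast(value, src_fmt: str, dst_fmt: str):
    """Bit-pattern-preserving reinterpretation.

    src_fmt and dst_fmt are struct format codes of equal size
    ("h" = I16, "H" = U16, "f" = SGL, "I" = U32).
    """
    if struct.calcsize(">" + src_fmt) != struct.calcsize(">" + dst_fmt):
        raise ValueError("typecast needs source and target of the same size")
    raw = struct.pack(">" + src_fmt, value)
    (result,) = struct.unpack(">" + dst_fmt, raw)
    return result


if __name__ == "__main__":
    print(convert_to_dbl(-1))      # -1.0: same value
    print(typecast(-1, "h", "H"))  # 65535: same 16 bits, read as unsigned
    print(typecast(1.0, "f", "I")) # 1065353216 (0x3F800000): bits of SGL 1.0
```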

A coercion dot in itself is nothing bad. The compiler is just telling you that "Hey, I am doing something for you here that's needed". Sometimes a coercion is actually more efficient than an explicit conversion because it might avoid extra memory allocations.

For example, when wiring a constant to a subVI, the inexperienced programmer will just grab an integer constant from the palette and wire it (always I32), while the seasoned programmer will right-click the terminal, select Create > Constant, and automatically get an integer constant of the correct type.

The right way is to be more aware of datatypes during programming, maybe carry the correct datatype from the beginning, and use functions that retain the datatype (e.g. "Quotient & Remainder" may be more appropriate than "Divide" when dealing with integers).
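Why the integer-preserving function can be the better choice: dividing produces a floating-point result that has to be converted back, and that conversion silently decides the rounding. Python's `divmod` stands in for Quotient & Remainder here (both floor the quotient for the positive operands used below; check the LabVIEW help for negative operands), and `round` stands in for a float-to-integer conversion that rounds half to even, which is an assumption. The function names are hypothetical.

```python
from typing import Tuple


def divide(x: int, y: int) -> float:
    """Like Divide on two integers: the result is floating point."""
    return x / y


def quotient_remainder(x: int, y: int) -> Tuple[int, int]:
    """Like Quotient & Remainder: integers in, integers out, nothing to round."""
    return divmod(x, y)


def divide_then_convert(x: int, y: int) -> int:
    """Divide, then convert back to an integer (assumed round half to even)."""
    return round(divide(x, y))


if __name__ == "__main__":
    print(quotient_remainder(7, 2))   # (3, 1)
    print(divide_then_convert(7, 2))  # 4: 3.5 rounds to the even neighbour
    print(divide_then_convert(5, 2))  # 2: 2.5 also rounds to even
```

Whether 7 / 2 should give 3 or 4 is exactly the kind of hidden decision the original rule is trying to surface.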

Your path->string example is a bigger can of worms. Paths are OS-independent and know about platform-specific path delimiters. Once you convert to a string, the code becomes OS-specific, so if you, e.g., manipulate paths as strings, it could break on a different platform. Typically, Path To String should only be used to convert a path for something that requires a string, such as a DLL call or command-line input.
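The same trap outside LabVIEW, sketched with Python's `pathlib`: path objects know their platform's delimiter, string concatenation does not. `join_as_string` and `some_tool` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def join_as_string(folder: str, name: str) -> str:
    """Fragile: builds a path by string concatenation with a Windows separator."""
    return folder + "\\" + name


if __name__ == "__main__":
    # Path objects insert the right delimiter for their platform.
    print(PureWindowsPath("C:\\data") / "run1.csv")  # C:\data\run1.csv
    print(PurePosixPath("/data") / "run1.csv")        # /data/run1.csv
    # The string version silently points at the wrong file on POSIX systems.
    print(join_as_string("/data", "run1.csv"))        # /data\run1.csv
    # Convert to a string only at the boundary that needs one.
    command_line = ["some_tool", str(PurePosixPath("/data") / "run1.csv")]
    print(command_line)
```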

The code reviewer can't just say "I see red coercion dots and that's bad". However an abundance of coercion dots typically indicates that the programmer might not be fully aware of datatypes.

If you want to share a small section of code with coercion dots, let us have a look to decide.

LabVIEW Champion.

billko · 10-07-2020

@altenbach wrote:

A coercion dot in itself is nothing bad. The compiler is just telling you that "Hey, I am doing something for you here that's needed". Sometimes a coercion is actually more efficient than an explicit conversion because it might avoid extra memory allocations.

I have to say that when you are first formally indoctrinated into the World of LabVIEW, NI, through their official tests, makes you pathologically afraid of coercion dots. I used to be that way, but I've since been able to (mostly) control my obsession with eliminating every single coercion dot, mostly by making sure not to have them in the first place. But I will still explicitly coerce if I feel it adds to the "self-documentation" aspect of my code.

Bill

Kyle97330 · 10-07-2020

@FrasseKatt wrote:

Within this brilliant community, I'm sure that there are some good ways to solve this. What is a good, balanced rule to avoid impact on data integrity?

One alternative that can work in some specific situations is to use Malleable VIs (VIMs) instead of normal VIs. You can then use a combination of the Type Specialization Structure with the Assert functions to do any needed checks for safe conversion inside the subVI (subVIM?) that is consuming the incoming data type.

For instance, if a subVI needs a U16 value and nothing else (because it's writing to a Modbus register, say), it can check that any incoming numeric is between 0 and 2^16-1 before the conversion to U16 is done internally, and return an error if it's out of range instead of silently coercing it to the min or max value. It also conveniently never shows a coercion dot on its connector pane, and you can even set it to break the calling VI outright if a completely incompatible type is passed in (like a complex number).
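The range-check-then-convert idea as a Python sketch. `to_u16_checked` is a hypothetical helper; raising `TypeError` plays the role of the broken calling VI and `ValueError` the role of the returned error. Rounding floats half to even is an assumption, not something the post specifies.

```python
U16_MIN, U16_MAX = 0, 2**16 - 1


def to_u16_checked(value) -> int:
    """Convert a real number to a U16 value, refusing anything that would not fit."""
    # Reject types with no sensible U16 meaning (the "broken VI" case).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"cannot convert {type(value).__name__} to U16")
    # NaN fails both comparisons, so it is rejected here as well.
    if not (U16_MIN <= value <= U16_MAX):
        raise ValueError(f"{value!r} is outside the U16 range {U16_MIN}..{U16_MAX}")
    # Assumption: floats are rounded half to even, as Python's round() does.
    return round(value)


if __name__ == "__main__":
    print(to_u16_checked(1234))     # 1234
    print(to_u16_checked(65535.0))  # 65535
    try:
        to_u16_checked(70000)
    except ValueError as exc:
        print("error:", exc)
```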

johntrich1971 · 10-07-2020

@altenbach wrote:

First let's get the terminology right. casting (such as in typecast) is very different to a datatype conversion. Coercions don't do "casting" (but for integers of the same number of bytes it's almost the same).

As usual, a very insightful post. I was a little loose with terminology, meaning an explicit coercion and not a typecast. I wholeheartedly agree that the best way is to be aware of types from the beginning. I do, however, sometimes have an I32, for instance from an iteration terminal, that I need to connect to a function that takes a U32. In those cases, when I need to coerce, I find the explicit coercion more self-documenting. It probably also stems somewhat from the old mantra that coercion dots are bad, lol.

FrasseKatt · 10-07-2020

Wow, I knew there would be a lot of knowledge and competence in here. Thanks for all your contributions so far.

When it comes to the actual issue, let me just clarify that it's a good, balanced code rule that I'm looking for. Something catching the risky cases, but not more.

In the code itself, there are probably lots of ways to avoid the risky cases. But right now, I need something for the reviewer to follow.
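One possible balanced rule, pieced together from the replies above: flag a conversion (coercion dot or explicit) only when the target representation cannot hold every value of the source, and require a range check or a comment at those spots. The Python sketch below encodes that for LabVIEW's integer types plus SGL and DBL. The table and `is_lossless` are assumptions for illustration, not an NI-defined check; EXT, fixed-point, complex, and enums are left out.

```python
from typing import NamedTuple


class Rep(NamedTuple):
    kind: str     # "int" or "float"
    bits: int     # ints: total width; floats: significand precision incl. implicit bit
    signed: bool


REPS = {
    "I8": Rep("int", 8, True), "I16": Rep("int", 16, True),
    "I32": Rep("int", 32, True), "I64": Rep("int", 64, True),
    "U8": Rep("int", 8, False), "U16": Rep("int", 16, False),
    "U32": Rep("int", 32, False), "U64": Rep("int", 64, False),
    "SGL": Rep("float", 24, True), "DBL": Rep("float", 53, True),
}


def magnitude_bits(r: Rep) -> int:
    """Bits needed for the largest magnitude an integer type can hold."""
    return r.bits - 1 if r.signed else r.bits


def is_lossless(src: str, dst: str) -> bool:
    """True if every value of `src` is exactly representable in `dst`."""
    s, d = REPS[src], REPS[dst]
    if src == dst:
        return True
    if s.kind == "float":
        # Float -> float is safe only when widening (SGL -> DBL);
        # float -> int can drop decimals, overflow, or meet NaN.
        return d.kind == "float" and d.bits >= s.bits
    if d.kind == "float":
        # Every integer with |x| <= 2**p fits a float with a p-bit significand.
        return magnitude_bits(s) <= d.bits
    if s.signed and not d.signed:
        return False  # negative values would wrap
    if not s.signed and d.signed:
        return d.bits > s.bits  # one extra bit is needed for the sign
    return d.bits >= s.bits


if __name__ == "__main__":
    for pair in [("I16", "I32"), ("I16", "U16"), ("I32", "DBL"), ("I32", "SGL")]:
        print(pair, "ok" if is_lossless(*pair) else "flag for review")
```

Under this rule the reviewer ignores dots like I32 → DBL or U16 → I32 and concentrates on the flagged ones, such as the I32 → U32 from an iteration terminal, where a short comment ("index is never negative") is enough.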