Errors vs Faults? How to handle them?

1984 · 06-15-2023

Hello there,

I'd like to get input from the community on developing better error/fault handling routines. I'm struggling a bit with the definitions, so feel free to add your own.

Error: much like exceptions in text-based programming languages, indicating that something really unexpected happened.

Unhandled exception: the error cluster is not terminated, so automatic error handling takes over and either throws an error popup or simply ignores the error.
Handled exception: we either let the error travel all the way through the error wire and let it stop our application (gracefully), or we consciously ignore it or take action. The key is that it does not crash our app unexpectedly.

Fault: a fault (for me) is something we expected to be a certain way, but it's not. So it's a deviation from expectations. A typical example: we measure a signal that should be within 0-5 V, but it is 5.1 V. That is not a problem with our code but with the UUT.
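A minimal Python sketch of the distinction as defined above, assuming a hypothetical `check_voltage` helper and the 0-5 V limit from the example. An out-of-limits reading comes back as a fault record, while a missing reading (the measurement itself failed) is raised as an exception, the closest text-language analogue of an error on the error wire. This is an illustration, not LabVIEW code.

```python
from dataclasses import dataclass
from typing import Optional


class MeasurementError(Exception):
    """Error (hypothetical type): the code or instrument itself misbehaved."""


@dataclass
class Fault:
    """Fault: everything worked, but the value deviates from expectations."""
    step: str
    message: str
    value: float


def check_voltage(step: str, reading: Optional[float],
                  low: float = 0.0, high: float = 5.0) -> Optional[Fault]:
    # No reading at all means the measurement itself failed: that is an error.
    if reading is None:
        raise MeasurementError(f"{step}: instrument returned no reading")
    # A reading outside the limits is a UUT problem, not a code problem: a fault.
    if not low <= reading <= high:
        return Fault(step, f"{reading} V outside {low}-{high} V", reading)
    return None  # within limits: nothing to report


print(check_voltage("supply_rail", 5.1))
print(check_voltage("supply_rail", 3.3))
```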

(There is a gray area between the two. For example, the test station has a cylinder that should move from one position to another in 500 ms. What if it can't? Is that an error or a fault? I'd say it's a fault, but many people actually consider it an error with the station.)

If my definitions are more or less agreeable, then the question is: how should I store/propagate faults? For an error it's not a problem (just use the error cluster), but what should I do with faults? I don't need to build a very complex system to end up with tons of different possible faults that affect the rest of the code. Trivial example: the process most likely should not continue any further if the cylinder couldn't get to position in time.
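One way to keep faults off the error wire, sketched in Python under the assumption of a hypothetical `FaultRegister` shared by the whole sequence: each fault is recorded with a flag saying whether it blocks further processing, several faults can be active at once, and the sequence asks the register before moving on (as in the cylinder example).

```python
from dataclasses import dataclass, field


@dataclass
class FaultRegister:
    """Hypothetical fault store, kept separate from the error path."""
    # fault name -> True if this fault should stop the process from continuing
    active: dict = field(default_factory=dict)

    def set_fault(self, name: str, blocking: bool) -> None:
        self.active[name] = blocking

    def clear(self, name: str) -> None:
        self.active.pop(name, None)

    def may_continue(self) -> bool:
        # Continue only if none of the active faults is blocking.
        return not any(self.active.values())


faults = FaultRegister()
faults.set_fault("label_misaligned", blocking=False)
print(faults.may_continue())  # True: only a non-blocking fault is active
faults.set_fault("cylinder_timeout", blocking=True)
print(faults.may_continue())  # False: the cylinder did not reach position in time
```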

Some (actually many) people propagate faults on the error cluster. While I understand why they do it, I've burnt myself so many times with this that whenever I see it I immediately start having bad feelings about the project.

I've finished lots of projects successfully over the past 10 years, but I don't think I've ever seen, or ever come up with, a solution that is truly good.

Thanks.

Defaphe · 06-15-2023

Hi,

Here is how I do it, though in most cases I write state machine software.

Error cluster => errors that require operator action: continue or stop.

The error cluster is read after each step and reset for the next one, since the operator action has already taken care of it.

Boolean indicator in a register or a class or FGV => Status of the test

The boolean is read before each step

If status = ok then continue

If status = NOK then do some action or do nothing at all

I record each step in a multicolumn array, which lets me know whether the result is Pass / Fail / Skipped.
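The approach above can be sketched as a Python step runner. The names (`run_sequence`, `ask_operator`) are hypothetical, and the "Error" result value is an assumption: the post only mentions Pass / Fail / Skipped. An exception from a step goes to the operator, who chooses continue or stop, and is not carried into the next step; a failed step clears the status flag, so the remaining steps are recorded as skipped.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_sequence(steps: List[Step],
                 ask_operator: Callable[[str, Exception], str]) -> List[List[str]]:
    status_ok = True   # the status boolean, read before each step
    stop = False       # set when the operator chooses "stop" after an error
    results = []       # the multicolumn table: [step name, result]
    for name, step in steps:
        if stop or not status_ok:
            results.append([name, "Skipped"])
            continue
        try:
            passed = step()
        except Exception as exc:
            # Error: the operator decides. After that the error is considered
            # handled and is not passed on to the next step (the "reset").
            results.append([name, "Error"])
            stop = ask_operator(name, exc) == "stop"
            continue
        results.append([name, "Pass" if passed else "Fail"])
        status_ok = passed  # a failed step sets the status to NOK
    return results


def read_dmm() -> bool:
    raise RuntimeError("DMM timeout")  # simulated equipment error


steps = [("power_on", lambda: True), ("read_dmm", read_dmm),
         ("check_rail", lambda: False), ("final_check", lambda: True)]
for row in run_sequence(steps, lambda name, exc: "continue"):
    print(row)
```

With the operator answering "continue", the error on `read_dmm` is logged and the sequence moves on; the later failure of `check_rail` then skips `final_check`.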

AeroSoul · 06-15-2023

@1984 wrote:

(There is a gray area between the two. For example, the test station has a cylinder that should move from one position to another in 500 ms. What if it can't? Is that an error or a fault? I'd say it's a fault, but many people actually consider it an error with the station.)

I'd say if it can't reach or doesn't reach in time, then it's an error - in this case I'd abort the procedure for safety reasons.

If it misses the position, it's a fault - in this case try to mitigate the poor positioning.
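This split can be written as a small Python classifier. The 500 ms limit comes from the thread; the position tolerance, the function name `classify_move` and the `CylinderError` type are hypothetical. Not reaching the position (or reaching it too late) raises, which aborts the procedure; reaching it off target returns the deviation so the caller can try to compensate.

```python
class CylinderError(Exception):
    """Raised when the cylinder cannot reach its position in time (abort)."""


def classify_move(reached: bool, elapsed_ms: float, position_mm: float,
                  target_mm: float, timeout_ms: float = 500.0,
                  tolerance_mm: float = 0.5) -> float:
    """Return the position deviation in mm (0.0 if within tolerance).

    Raises CylinderError if the move did not complete within timeout_ms.
    """
    if not reached or elapsed_ms > timeout_ms:
        # Error: treated as unsafe, so the procedure is aborted.
        raise CylinderError(f"move not completed within {timeout_ms} ms")
    deviation = position_mm - target_mm
    # Fault: the cylinder moved but missed the target; the caller may mitigate.
    return deviation if abs(deviation) > tolerance_mm else 0.0


print(classify_move(True, 320, 51.0, 50.0))  # 1.0: reached in time, 1 mm off target
```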

1984 · 06-15-2023

@AeroSoul wrote:

I'd say if it can't reach or doesn't reach in time, then it's an error - in this case I'd abort the procedure for safety reasons.

If it misses the position, it's a fault - in this case try to mitigate the poor positioning.

Yes, I know many would consider that an error. For me it's rather a fault of the test system itself, which my code should handle, and I don't see why "can't reach" and "not in the right position" should fall into different categories. If this is an error and is propagated on the error wire, then VIs along that line might not work as expected. Propagating it on the error wire might also suppress other errors. Another issue is that while there can be simultaneous faults, the error cluster can only propagate one (unless you do some pretty weird magic on it).

Don't get me wrong, your approach could be 110% correct; I'm just having difficulty with this topic, and at the same time I really don't see a right solution. We've solved this many times, but none of the solutions looks great.
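The suppression problem with a single error can be shown with a Python analogy, assuming a single slot that behaves like the error wire as described in the post: the first problem is kept and later ones are dropped. The function names are hypothetical and this is not LabVIEW code; a list (or a separate fault store) keeps every simultaneous fault.

```python
from typing import List, Optional


def merge_single(current: Optional[str], new: str) -> Optional[str]:
    # Single slot: once something is there, later problems are silently dropped.
    return current if current is not None else new


def merge_all(current: List[str], new: str) -> List[str]:
    # Collection: every simultaneous fault is preserved.
    return current + [new]


detected = ["cylinder_timeout", "air_pressure_low", "door_open"]

single: Optional[str] = None
collected: List[str] = []
for fault in detected:
    single = merge_single(single, fault)
    collected = merge_all(collected, fault)

print(single)     # cylinder_timeout
print(collected)  # ['cylinder_timeout', 'air_pressure_low', 'door_open']
```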

paul_a_cardinale · 06-15-2023

I consider an error to be anything that prevents a test from executing properly.

Thus, if something goes wrong with any test equipment, an error should be thrown and the test aborted.

A UUT fault is when the UUT fails a test. The failure should be reported/logged. Whether or not the test should continue or be aborted depends on the particular failure. In your test specs, in addition to having test limits for each step, there should be a flag that indicates whether the test should continue or be aborted if the UUT fails that particular step.
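A Python sketch of the test-spec idea above: each step carries its limits plus a flag saying whether a UUT failure on that step aborts the test. `StepSpec`, `run_spec` and the example limits are hypothetical. Equipment problems are left to raise as exceptions, which aborts the test; UUT failures are logged, and only steps flagged `abort_on_fail` stop the run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepSpec:
    name: str
    low: float
    high: float
    abort_on_fail: bool  # stop the test if the UUT fails this step?


def run_spec(specs: List[StepSpec],
             measure: Callable[[str], float]) -> List[Tuple[str, float, str]]:
    log = []
    for spec in specs:
        # An exception from measure() means test equipment trouble: it is not
        # caught here, so it propagates and aborts the whole test.
        value = measure(spec.name)
        passed = spec.low <= value <= spec.high
        log.append((spec.name, value, "PASS" if passed else "FAIL"))  # always logged
        if not passed and spec.abort_on_fail:
            break  # this UUT failure is serious enough to end the test
    return log


specs = [StepSpec("rail_5v", 4.75, 5.25, abort_on_fail=False),
         StepSpec("rail_3v3", 3.2, 3.4, abort_on_fail=True),
         StepSpec("idle_current_a", 0.0, 0.1, abort_on_fail=False)]
readings = {"rail_5v": 5.3, "rail_3v3": 3.0, "idle_current_a": 0.05}
for entry in run_spec(specs, readings.__getitem__):
    print(entry)
```

Here the 5 V rail failure is only logged, while the 3.3 V rail failure ends the run before the idle-current step.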

RTSLVU · 06-15-2023

@paul_a_cardinale wrote:

I consider an error to be anything that prevents a test from executing properly.

Thus, if something goes wrong with any test equipment, an error should be thrown and the test aborted.

A UUT fault is when the UUT fails a test. The failure should be reported/logged. Whether or not the test should continue or be aborted depends on the particular failure. In your test specs, in addition to having test limits for each step, there should be a flag that indicates whether the test should continue or be aborted if the UUT fails that particular step.

⬆️ THIS ⬆️

I might also mention that I use the "Error subsystem" with custom error codes to handle UUT test faults, rather than redundant code to halt a test when a UUT fault is serious enough to abort.

I use the same method for things like user aborts too. When the user presses the "Stop" button, the GUI loop sends an error with a custom error code to my main control loop to shut down the test.
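The user-abort mechanism can be sketched in Python, with a `queue.Queue` standing in for the channel between the GUI loop and the main control loop. The codes and message format are hypothetical (picked from 5000-9999, one of the ranges LabVIEW sets aside for user-defined error codes). The point is that a stop request travels the same path as any other error, so the main loop needs only one shutdown branch.

```python
import queue
from typing import List, Tuple

# Hypothetical custom error codes, for illustration only.
USER_ABORT = 5001
UUT_CRITICAL_FAULT = 5002


def main_control_loop(inbox: "queue.Queue[dict]") -> Tuple[List[str], int]:
    """Run commands until a message carrying an error code arrives."""
    done: List[str] = []
    while True:
        msg = inbox.get()
        if "error_code" in msg:
            # One shutdown path for user aborts, serious UUT faults and real errors.
            return done, msg["error_code"]
        done.append(msg["cmd"])


inbox: "queue.Queue[dict]" = queue.Queue()
inbox.put({"cmd": "measure_rail"})
inbox.put({"error_code": USER_ABORT})  # what the GUI loop sends on "Stop"
inbox.put({"cmd": "never_runs"})
print(main_control_loop(inbox))         # (['measure_rail'], 5001)
```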


wiebe@CARYA · 06-21-2023

When it comes to error handling, the multiple error feature changed my life.

Error Handling 2.0 - Wiebe Walstra (Carya) - GDevCon#3 - YouTube

If you attribute an error (at any point in your code) as a fault or not, that might even directly help with your question.
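To make the attribute idea concrete outside LabVIEW, here is a Python analogy: errors travel as a list instead of a single slot, each carrying a dictionary of attributes, and a hypothetical `"fault"` attribute lets the handler separate faults from genuine errors. The names and codes are illustrative only and are not the API of the error-handling VIs mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ErrorRecord:
    code: int    # arbitrary example codes below
    source: str
    attributes: Dict[str, object] = field(default_factory=dict)


def mark_fault(err: ErrorRecord) -> ErrorRecord:
    # Attribute the error as a fault at the point where it is created.
    err.attributes["fault"] = True
    return err


def split_faults(errors: List[ErrorRecord]) -> Tuple[List[ErrorRecord], List[ErrorRecord]]:
    faults = [e for e in errors if e.attributes.get("fault")]
    others = [e for e in errors if not e.attributes.get("fault")]
    return faults, others


errors = [mark_fault(ErrorRecord(5002, "cylinder: position not reached")),
          ErrorRecord(42, "instrument read"),
          mark_fault(ErrorRecord(5003, "rail_5v out of limits"))]
faults, others = split_faults(errors)
print([e.source for e in faults])  # ['cylinder: position not reached', 'rail_5v out of limits']
print([e.source for e in others])  # ['instrument read']
```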

Search LabVIEW like a graph!

1984 · 06-21-2023

Wow. This one is actually pretty interesting; I had not noticed this palette before.

There is something odd I have noticed, though: this code does nothing with the error on my PC. It does not convert the error to JSON, and the Created? output remains false. No surprise, then, that if I ask for the attributes with that other VI it returns an empty array:

This one works as described; it assigns an attribute to the first error in the array:

Really weird; I'm not sure whether others see the same behaviour (LV2022Q3 on Win11).

wiebe@CARYA · 06-22-2023

Seems to work for me:
