Actor Framework Discussions


"Acceptable" or extremely "low risk" situations for the use of synchronous messaging?

Over the years I have worked to train myself to think (in terms of coding) asynchronously. There are still almost daily circumstances where I have to expend a little extra mental energy (and coding effort) to make some inter-module communication safe for asynchronous style, when it is apparent right away how easily the same could be accomplished in synchronous fashion. That said, I do hate to "pollute" my asynchronous project with a synchronous call, after spending so much time getting it to where it is. So, let the record show that I am indeed a devout asynchronous believer. 

That said, I have a nagging feeling that it might be a-ok to be synchronous in certain circumstances and that all I need is for some of the esteemed members of this community to put their stamp of approval on it.

 

So, let's say you're working on an AF project (or DQMH for that matter): when do you forego making some intermodule/actor call asynchronous and just make it synchronous? I feel like my asynchronophilia might be overboard for many (most?) circumstances. It seems particularly safe to go synchronous for stuff that is local (no network involved) and completely under your own control (not waiting on something from a user, a database, third-party software APIs, calls to .NET/ActiveX/DLLs, or hardware I/O). Which basically covers a whole lot of circumstances! Does this thinking call into question the whole logic/suitability of even using AF (or DQMH asynchronously) where the synchronous-friendly situations dominate? I hope not; I rather enjoy using these powerful tools even for quite small/simple applications. But seriously, it seems there's no risk at all of hanging an application on a synchronous call to some LabVIEW method that you've coded yourself and are 99.999% sure will return in a matter of microseconds.

Message 1 of 25

Just to clarify, you're talking specifically about between actors/modules in this case, right? I think this is the case based on your post title, but just wanted to make sure. 

CLA CLED AF Guild
Message 2 of 25

@DoctorAutomatic wrote:

...you are 99.999% sure will return in a matter of microseconds.


IMO, this perspective is what leads to deadlocks. It doesn't matter if it's blocked for microseconds or full seconds, and it doesn't matter if it's talking to hardware or .NET or just internal code. The issue is circular dependencies.

 

For example, if Actor A synchronously polls Actor B to read an internal value (microseconds), but Actor B happened to throw an internal error and shut down at the same moment, then Actor A will deadlock.
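A minimal sketch of that failure, using Python threads and queues as stand-ins for actors (the names and message shapes here are illustrative, not any AF API). The timeout is added only so the sketch can demonstrate the hang and exit; a true synchronous call would wait forever:

```python
import queue
import threading

def actor_b(inbox):
    """Actor B: would reply to ('query', reply_q) messages -- but an
    'internal error' makes it shut down before servicing anything."""
    return  # B stops, leaving its inbox unserviced forever

b_inbox = queue.Queue()
b = threading.Thread(target=actor_b, args=(b_inbox,))
b.start()
b.join()  # B has already shut down

# Actor A performs a synchronous query. With no timeout this would
# block forever, since the reply can never come.
reply_q = queue.Queue()
b_inbox.put(("query", reply_q))
try:
    value = reply_q.get(timeout=0.1)
except queue.Empty:
    value = None  # in production code, this branch IS the deadlock
```

The microsecond speed of B's reply in the happy path is irrelevant; the hang comes entirely from B no longer being there to reply.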

 

This means synchronous is OK anywhere you can guarantee the message will be handled and returned. In AF that's generally only calling down to a Nested, never up. But per the example above, it also requires intentional consideration of the Nested's possible Stop causes.

 

I believe asynchronous has been oversold and overused, in the same way that OOP has been. They both solve a specific medium+ complexity problem in a novel way, but then we all (self included!) end up over-complicating our code just to use these software concepts.

Message 3 of 25

For a non-AF comparison, with my Messenger Library I use Synchronous Queries extensively, without ever having a lock-up.  That is because the Messenger Library API is asymmetric between Caller and Nested "actors": Callers can synchronously query their Nested, but Nested can't query their Callers.  And Callers usually control the lifetime of their Nested; Nested do not shut themselves down on error.

Message 4 of 25

To understand when synchronous is safe, you have to know clearly when it is unsafe. 

 

In bilateral communication, synchronous implies risk of deadlock. No synchronous? No possibility of deadlock. You literally never have to worry, "Is this possibly going to cause a deadlock in some rare situation?" My goal with AF was to make a system where the issues that were hardest to explain, hardest to replicate in debugging, and hardest to fix when they did crop up were ruled out by design rather than by convention. So once you admit synchronous into the communications, you have to think about it. 

 

I started with "bilateral communication". Alice sends Bob messages; Bob sends Alice messages. As long as neither one waits for a response, no chance for deadlock. Deadlock occurs when Alice is waiting for Bob and Bob is waiting for Alice and neither one is checking their inbox to respond to the other. JDPowell notes that in a non-AF library, caller can synchronously query nested, but nested cannot query caller. That single-sided communication means no deadlock. Great. 
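The mutual-wait scenario can be sketched in Python with two threads, each issuing a synchronous query to the other before ever checking its own inbox (illustrative stand-ins for actors, not AF API; the timeout exists only so the demonstration can terminate):

```python
import queue
import threading

alice_inbox, bob_inbox = queue.Queue(), queue.Queue()
results = {}

def actor(name, my_inbox, peer_inbox):
    # Each actor queries its peer and waits for the reply BEFORE it
    # ever services my_inbox -- so neither peer's query is answered.
    reply_q = queue.Queue()
    peer_inbox.put(("query", reply_q))
    try:
        # A true synchronous call has no timeout; it is added here
        # only so the sketch can finish instead of hanging.
        results[name] = reply_q.get(timeout=0.2)
    except queue.Empty:
        results[name] = "deadlocked"

threads = [
    threading.Thread(target=actor, args=("Alice", alice_inbox, bob_inbox)),
    threading.Thread(target=actor, args=("Bob", bob_inbox, alice_inbox)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both actors end up waiting on each other; neither drains its inbox.
```

Remove either one of the two synchronous waits (single-sided communication) and the cycle, and therefore the deadlock, disappears.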

 

But there's a second trap in synchronous communication that you have to worry about: latency. 

  1. Alice sends to Bob and waits for a reply
  2. Bob sends to Alice. Sends to Alice. Sends more to Alice. Finally gets around to checking inbox and replies to Alice.
  3. Repeat. 

In this scenario, Alice's inbox is potentially filling up. Now you take a definite hit of buffering or lossy messages, and you may have a bigger problem if Alice cannot clear her inbox quickly between requests to Bob. And it gets worse if Alice is receiving messages from lots of senders. 
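That backlog is easy to make visible in a sketch (Python threads and queues as illustrative actor stand-ins): while Alice is blocked on her synchronous reply, everything Bob sends her piles up unread.

```python
import queue
import threading

alice_inbox, bob_inbox = queue.Queue(), queue.Queue()

def bob():
    # Bob fires off a burst of updates before ever checking his inbox.
    for i in range(100):
        alice_inbox.put(("update", i))
    msg, reply_q = bob_inbox.get()  # finally services Alice's query
    reply_q.put("reply")

t = threading.Thread(target=bob)
t.start()

# Alice: synchronous query. While she blocks here she is NOT draining
# alice_inbox, so Bob's updates accumulate behind her.
reply_q = queue.Queue()
bob_inbox.put(("query", reply_q))
reply = reply_q.get()
t.join()

backlog = alice_inbox.qsize()  # 100 unread messages queued up while she waited
```

Each round trip repeats this, so with bounded queues or lossy transports the latency problem eventually becomes a data-loss problem.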

 

The more I researched massively parallel processing, the more I found that the number one thing keeping the software healthy was for every actor to check and clear its inbox as fast as possible. As long as we can generally assume "message sent is message received", our programs are much healthier. 

 

So where are synchronous messages safe? On one side of a low-bandwidth connection. Or one side of a high-bandwidth connection where backlog of messages can be cleared out.

 

People sometimes point out to me that it is safe to have sync messages on both sides of a connection where there's a timeout on the synchronous message. That breaks the deadlock at the cost of introducing a new failure mode. My problem with that idea is that handling the timeout case usually looks identical to async messaging -- if you make the timeout "0", it's pretty obvious that this is the same as asynch, so "synch with timeout" is generally just a performance enhancement on asynch, not a replacement for asynch. At least, that's my experience.
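The degeneration is easy to see in a sketch. Here `query` is a hypothetical helper (not an AF API): with any nonzero timeout the caller must still write a "no reply yet" branch, and with timeout zero the call doesn't wait at all -- it has become an asynchronous send plus an immediate check.

```python
import queue

def query(inbox, payload, timeout):
    """Hypothetical synchronous query with a timeout (illustrative only)."""
    reply_q = queue.Queue()
    inbox.put((payload, reply_q))
    try:
        return reply_q.get(timeout=timeout)
    except queue.Empty:
        # The timeout branch: the caller must now cope with "no reply
        # yet" -- exactly the situation that plain async messaging
        # makes explicit from the start.
        return None

# Nobody is servicing this inbox. With timeout=0 the call returns
# immediately, behaving like an async send followed by a poll.
unserviced = queue.Queue()
result = query(unserviced, "read value", timeout=0)
```

So the timeout buys a fast path when the reply happens to be ready, but the slow path is async handling under another name.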

 

I've also seen people try to get clever with synchronous comms using a DVR so that Alice can just query some value from Bob and get the latest that Bob has without waiting for an update message to come through the usual inbox from Bob to Alice. This way lies the madness of ill-defined state transmissions. It works at small scales, but it doesn't scale up at all. Messages arriving out of order and out of context is something code can deal with, but it's not trivial code for humans to write. There's a reason the world prefers TCP over UDP. 
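A single-threaded Python sketch shows the "state without context" hazard (a plain shared dict stands in for the DVR; names are illustrative): the direct read outruns the messages that explain it.

```python
import queue

# Bob's state shared by reference (the DVR analogue in this sketch),
# alongside his normal update messages to Alice.
shared = {"count": 0}
alice_inbox = queue.Queue()

# Bob: update state, then announce each change through the inbox.
shared["count"] = 1
alice_inbox.put(("count-changed", 1))
shared["count"] = 2
alice_inbox.put(("count-changed", 2))

# Alice reads the shared value directly, bypassing her inbox...
direct_read = shared["count"]            # sees 2
# ...but her inbox still holds the messages that explain how the state
# got there. The value she read has arrived ahead of its own context.
first_announcement = alice_inbox.get()   # ('count-changed', 1) -- lags behind
```

Reconciling a directly-read value with update messages that describe older states is exactly the out-of-order, out-of-context bookkeeping the post warns about.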

 

There's lots of times you can safely introduce synchronous comms. But having done so, your second introduction carries increased risk to the system as a whole, and that risk increases over time. A well-defined rule like "callers to nested only" constrains that risk considerably. 

Message 5 of 25

BTW -- if you want to learn a lot about latency effects on a broader system, I recommend playing "Shapez 2", available on Steam. It's a factory production game, very low stress, but it's really fascinating* to visualize massive parallel messaging systems and see how bottlenecks propagate. 

 

* to me. I'm weird like that. 🙂

Message 6 of 25

I believe asynchronous has been oversold and overused,

 

To say that async is oversold, we would have to define completely the value that async provides. You said it was useful for dodging deadlocks. As I said in my post above, there's two dangers with synchronous -- the other is latency. 

 

But beyond the transmission layer bugs, async adds one more major benefit. And it's this other benefit that accounts, in my observation, for the majority of the value. 

 

The goal of AF was to make it so that each actor could be understood on its own without reference to the rest of the system. Obviously that's impossible to do perfectly, but that remains the goal. Minimizing how much a programmer has to remember while fixing any single part dramatically improves the programmer's chances of getting it right and allows novice programmers to work at far greater scales than they could otherwise. 

 

Async messaging is critical to that. With async, you don't have to worry what messaging is done on the other side of the pipe. You don't have to worry about propagating bottlenecks. An actor's job is just to respond to messages as fast as possible, with no side effects on the rest of the system to the maximum degree possible. The system naturally will hit equilibrium based on the caller's rate of input messages. And it becomes the caller who can balance all the latency issues in one place. 

 

in the same way that OOP has been

 

A consistent way of handling data is itself a value. Boeing can have eight valve types on the Starliner, but each of those valve types requires independent exhaustive testing, and figuring out which one to use at a given connection is hard. SpaceX can use two valve types in Dragon and has a much lower testing and analysis burden, even if the efficiency of any single valve is not quite as optimal as it can be. Once the system is fully up and running, if something needs optimization, go in and sub in a special case valve. But you do that sparingly. 

 

The claim of overselling any programming technique is usually made, in my experience, by skilled programmers who can see that obviously there's a better solution to the programming problem in a given scenario. But that is often not the better solution to the programmer problem.

 

Ultimately, you choose the tools that work for your team on your project. Whether it's making pluggable systems for flexibility or hardcoding values for performance and safety, there's 10,000 tradeoffs in every application. But when we start talking about general programming and what's right in most cases, I'm going to tend to start a new dev off with OOP everywhere and asynch messaging always, and then let them find the cases to back away from those mandates over time. To me, that isn't overselling -- it's good teaching. 

Message 7 of 25

@CaseyM wrote:

Just to clarify, you're talking specifically about between actors/modules in this case, right? I think this is the case based on your post title, but just wanted to make sure. 


Yes, that's what I'm talking about 

Message 8 of 25

@drjdpowell wrote:

For a non-AF comparison, with my Messenger Library I use Synchronous Queries extensively, without ever having a lock up.  That is because the Messenger Library API is asymmetric between Caller and Nested "actors"; Callers can synchronously query their Nested, but Nested can't query their callers.  And Callers usually control the lifetime of their Nested; Nested do not shut themselves down on error.


Interesting point, I never considered the difference between the direction (up or down) of a synchronous call. Generally, I feel like most of the time I have a higher level module/actor that needs something from a nested/lower level module/actor, which sounds like it aligns with what you're saying. I try to assume nested actors don't "know" their caller, so usually no need to query them for anything. I've been meaning to look into your messenger library for a long time, I hope to have some spare time soon to finally do it.

Message 9 of 25

@AristosQueue wrote:

To understand when synchronous is safe, you have to know clearly when it is unsafe. 

 

In bilateral communication, synchronous implies risk of deadlock. No synchronous? No possibility of deadlock. You literally never have to worry, "Is this possibly going to cause a deadlock in some rare situation?" My goal with AF was to make a system where the issues that were hardest to explain, hardest to replicate in debugging, and hardest to fix when they did crop up were ruled out by design rather than by convention. So once you admit synchronous into the communications, you have to think about it. 

Agreed, and by the way, I blame you, personally, for being the most influential person in convincing (or browbeating) me into "thinking" asynchronously. Thanks! But seriously, it has made me a better developer by a mile.

 

I started with "bilateral communication". Alice sends Bob messages; Bob sends Alice messages. As long as neither one waits for a response, no chance for deadlock. Deadlock occurs when Alice is waiting for Bob and Bob is waiting for Alice and neither one is checking their inbox to respond to the other. JDPowell notes that in a non-AF library, caller can synchronously query nested, but nested cannot query caller. That single-sided communication means no deadlock. Great. 

 

By pure luck/intuition, I have always implemented one-way synchronous calls since joining the asynchronous church.

But there's a second trap in synchronous communication that you have to worry about: latency. 

  1. Alice sends to Bob and waits for a reply
  2. Bob sends to Alice. Sends to Alice. Sends more to Alice. Finally gets around to checking inbox and replies to Alice.
  3. Repeat. 

In this scenario, Alice's inbox is potentially filling up. Now you take a definite hit of buffering or lossy messages, and you may have a bigger problem if Alice cannot clear her inbox quickly between requests to Bob. And it gets worse if Alice is receiving messages from lots of senders. 


I am usually way overcautious in situations where message traffic/throughput will be high, so thankfully I haven't had to learn the hard way.

 

The more I researched massively parallel processing, the more I found that the number one thing keeping the software healthy was for every actor to check and clear its inbox as fast as possible. As long as we can generally assume "message sent is message received", our programs are much healthier. 

 

So where are synchronous messages safe? On one side of a low-bandwidth connection. Or one side of a high-bandwidth connection where backlog of messages can be cleared out.

 

Exactly what I was asking about. I mean, when I quick-drop a math operator onto the BD, I am not worried at all about my program having to wait for it to return the result. I don't try to make some absurd non-blocking mechanism to perform addition, because I know it will return in a fraction of a blink of an eye, and certainly faster than sending a message to another actor whose sole job is to perform addition.

People sometimes point out to me that it is safe to have sync messages on both sides of a connection where there's a timeout on the synchronous message. That breaks the deadlock at the cost of introducing a new failure mode. My problem with that idea is that handling the timeout case usually looks identical to async messaging -- if you make the timeout "0", it's pretty obvious that this is the same as asynch, so "synch with timeout" is generally just a performance enhancement on asynch, not a replacement for asynch. At least, that's my experience.

 

I've also seen people try to get clever with synchronous comms using a DVR so that Alice can just query some value from Bob and get the latest that Bob has without waiting for an update message to come through the usual inbox from Bob to Alice. This way lies the madness of ill-defined state transmissions. It works at small scales, but it doesn't scale up at all. Messages arriving out of order and out of context is something code can deal with, but it's not trivial code for humans to write. There's a reason the world prefers TCP over UDP. 


That's funny; I am pretty sure the DVR idea has crossed my mind in the past once or twice (and was quickly recognized as a cheap way out and likely a bad idea).

 

There's lots of times you can safely introduce synchronous comms. But having done so, your second introduction carries increased risk to the system as a whole, and that risk increases over time. A well-defined rule like "callers to nested only" constrains that risk considerably. 


 

Message 10 of 25