Hacker News

4 years ago by javajosh

>Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build.

I do so wish there was a reference or link around "tried in the past".

4 years ago by ramchip

There are a few papers about it, such as:

- “Heap architectures for concurrent languages using message passing”

- “Exploring Alternative Memory Architectures for Erlang: Implementation and Performance Evaluation”

- “A Case for the Unified Heap Approach to Erlang Memory Management”

You can find those and more discussion by googling shared heap erlang.

4 years ago by dnautics

So this isn't entirely true; binaries above a certain size have their own refcounted, process-independent GC going on in the BEAM. I wonder why the VM doesn't use a similar system for highly mobile data structures (copy to a common refcounted "scratchpad" on first send).

4 years ago by knome

binaries never contain references to anything else, and therefore can never be a part of a reference loop. so it's safe to toss them out into a shared space and use reference counting to track them. you don't want to do it with small binaries, as the overhead of shared-allocating/ locking/ incrementing/ locking/ decrementing would probably be worse than just copying the small binary in the first place.

if you put complex reference bearing data out in the void, now whatever memory arena that allocated it may contain data that it references, and now you have to keep track of those references and use all of the shared items as new roots into your data. and if some other memory arena receives it and plucks out a value, do you only then copy it or do you reference back to the original. if you send a message back and form a loop, the gc process for either now has to jump through both, and you've destroyed your cheap gc cycles. if you shunt all of the data into shared memory to avoid multi-gc reference loops, you've now just created a big shared gc that will have all the problems of big shared gcs.

the per-process small gcs with full copying between them mean you can do things like just dropping the arena when the process dies without even checking liveness for anything in it ( you'll need to run any destructors if anything needs cleanup ( like references to shared large binaries needing dereferencing ), but you can keep track of items that need that in a bitmap or something, and avoiding the trace is a win on its own )
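The scheme described above — copy-on-send for ordinary terms, refcounting for large binaries, and dropping a dead process's heap without tracing — can be sketched in miniature. This is a hedged toy model in Python, not the BEAM's actual implementation: the 64-byte threshold, the data structures, and the function names are all illustrative.

```python
import copy

SMALL_BINARY_LIMIT = 64  # BEAM promotes binaries above ~64 bytes to a shared, refcounted heap

# id -> [refcount, bytes]: a stand-in for the shared refc-binary heap
shared_binaries = {}

class Process:
    """Toy model of a BEAM process: a private heap plus a mailbox."""
    def __init__(self):
        self.heap = []              # private data; dropped wholesale on death
        self.mailbox = []
        self.refc_binaries = set()  # shared binaries this process references

def send(sender, receiver, term):
    if isinstance(term, bytes) and len(term) > SMALL_BINARY_LIMIT:
        # large binary: bump the refcount and share the pointer, no copy
        key = id(term)
        entry = shared_binaries.setdefault(key, [0, term])
        entry[0] += 1
        receiver.refc_binaries.add(key)
        receiver.mailbox.append(("binref", key))
    else:
        # everything else is deep-copied into the receiver's heap
        receiver.mailbox.append(copy.deepcopy(term))

def kill(process):
    # the private heap is dropped without any liveness check;
    # only the shared-binary refcounts need decrementing
    for key in process.refc_binaries:
        shared_binaries[key][0] -= 1
        if shared_binaries[key][0] == 0:
            del shared_binaries[key]
    process.heap = process.mailbox = None
```

The point of the sketch: killing a process touches only its refcount bookkeeping (the "bitmap or something" above), never the contents of its heap.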

4 years ago by KMag

> shared-allocating/ locking/ incrementing/ locking/ decrementing

Minor nit: shared allocation, counter incrementing, and counter decrementing can all be done lock-free. They'd still need memory fence operations (and retries in case of contention), and the associated performance hits, but not actual locking.
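The lock-free shape being described can be sketched in Python. One caveat up front: CPython exposes no hardware compare-and-swap, so the CAS below is emulated with a lock; only the surrounding pattern — read, compute, CAS, retry on contention — reflects the real lock-free structure, and all names are illustrative.

```python
import threading

class AtomicCounter:
    """Sketch of a lock-free refcount. The CAS primitive is emulated here;
    on real hardware it is a single instruction plus a memory fence."""
    def __init__(self, value=0):
        self._value = value
        self._cas_guard = threading.Lock()  # stands in for one CPU instruction

    def compare_and_swap(self, expected, new):
        with self._cas_guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def increment(self):
        # retry loop: the cost under contention is retries and fences,
        # never blocking while holding a lock
        while True:
            seen = self._value
            if self.compare_and_swap(seen, seen + 1):
                return seen + 1
```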

4 years ago by toast0

It would seem off heap messages could be a base for this. I've not gone all the way into the weeds on those, but it seems like it could be possible to start with sending a whole message to another process by bumping a refcount and adding a reference to that process's mailbox without needing to make a new copy.

If that works, and is useful, you could extend by bumping the refcount, and adding a new type of subreference to a process's mailbox.

I don't think you'd want to send something referencing multiple off-heap messages or adding some terms and a reference (although maybe, who knows), because that would be a lot of complexity. And you would also want to be sensitive to subreference memory leaks, like can happen with binaries where a binary is large, but only small subreferences stick around, leading to lots of wasted space (this is extra fun when you put a small subbinary into ets/mnesia and it lives forever (well, until you restart beam, but that can be a long time)).

4 years ago by strmpnk

This is sort of what Pony did with its ORCA-based GC. It has some tricky edge cases to optimize in practice, but it can be made to work especially well if data is immutable.

4 years ago by strmpnk

The analog for that would be shared literals, which are not copied (super important for small-map overhead reduction if the constructor initializes a fixed set of keys, like Elixir structs, as the key index is shared). The persistent_term module allows any value to be registered as such at runtime and avoids a lot of caveats that came with mochi_global and the like. While they aren't ref-counted, it can definitely help cut down on copying in my experience.

4 years ago by moonchild

Linear types (as an implementation detail) seem helpful here; if escape analysis shows a message is never touched after being sent, you can move it into the queue instead of copying it.

4 years ago by joppy

Would this help? My understanding is that the copy was so that the new object exists in the heap of the receiving process, and therefore the GC can have a strong guarantee that all objects within a process belong to the GC for that process.

Whether moving an object between two GC’d heaps is even possible (let alone simple or complicated) surely depends a lot on the GC implementation. The implementation of concurrent GCs is not easy...

4 years ago by dominicl

A definitive reference would be great; for me the "tried in the past" quote has always been a pointer to the Java VM, which has great benefits of its own but has always had the problem of global GC-induced freezes. Working with Cassandra instances, this has been a problem from 2013 until today. It just seems to be a very hard problem to solve when object references flow everywhere. But it's really unpleasant when your whole DB instance with all its parallel requests locks up for GC.

[0] https://stackoverflow.com/questions/21992943/persistent-gc-i...

4 years ago by secondcoming

Try out Scylla instead. We moved to it from Cassandra and have had no stability issues at all. Our use case is several hundred thousand requests per second with occasional batch writes. Request latency reduced, as did the number of nodes we required.

4 years ago by alexhutcheson

DataStax had good results testing the new Shenandoah garbage collector with Cassandra 4.0: https://www.datastax.com/blog/apache-cassandra-benchmarking-...

4 years ago by jorangreef

Just an aside, but message passing also makes a huge difference if you want to implement a distributed consensus protocol such as Viewstamped Replication or Raft.

Compared to typical request/response connection-oriented protocols (e.g. TCP or HTTP), message passing maps 1-to-1 to the distributed consensus domain, where the client can send a request message to the leader and then fully expect to get the response message back from another subsequent leader (after a view change or leader election).

I ran into this while implementing Viewstamped Replication in Zig for https://github.com/coilhq/tigerbeetle, where we had a typical TCP request/response protocol to begin with, and then realized that a message bus would make things much easier. It's just really nice to be able to say, send this message to process A or process B (if you can) and not worry about much else beyond timeouts and retries, which are needed for distributed consensus anyway, and straight away it opens up the door for alternative underlying transport protocols such as QUIC or UDT, multicast etc. Message passing is just a really good abstraction to have in place above all this in general.
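The abstraction being described might look like the following minimal Python sketch. The names (`MessageBus`, `send_to`, and so on) are invented for illustration and are not TigerBeetle's actual API; threads and queues stand in for processes and a real transport.

```python
import queue

class MessageBus:
    """Sketch of 'send this message to process A and don't worry about
    the transport': mailboxes are addressed by name, delivery is
    fire-and-forget, and the caller owns timeouts and retries."""
    def __init__(self):
        self.mailboxes = {}

    def register(self, name):
        self.mailboxes[name] = queue.Queue()
        return self.mailboxes[name]

    def send_to(self, name, msg):
        # failure here is handled by the caller's timeout-and-retry
        # logic, which distributed consensus needs anyway
        box = self.mailboxes.get(name)
        if box is not None:
            box.put(msg)

    def recv(self, name, timeout=None):
        try:
            return self.mailboxes[name].get(timeout=timeout)
        except queue.Empty:
            return None
```

Because senders address a name rather than a connection, swapping the underlying transport (TCP, QUIC, UDP multicast) changes nothing above this layer.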

4 years ago by choeger

Out of curiosity: What happens with messages that are never received? Do they just pile up until the (not-so) receiving process OOMs? How does one debug such a program?

4 years ago by srijan4

Yes, messages keep piling up and memory usage keeps increasing until the Erlang VM goes out of memory.

To debug, there's a max_heap_size parameter which can be configured to log an error message when heap size of any process goes beyond a threshold, or it can even kill the process[0]. Although I don't know if this takes the "off-heap" messages into account.

Also, there are lots of libraries which can provide a "top" like view of the running system.

[0] - http://erlang.2086793.n4.nabble.com/Max-heap-size-td4716974....

4 years ago by bullen

I think Erlang's strength is also its weakness: structuring everything around messages instead of having a complex memory model and concurrency.

I always post these quotes until someone comes along who can explain them; two years later, nobody has been able to:

"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html

"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...

4 years ago by yetihehe

And erlang outperforms multi-threaded java single-handedly. I've made a system in java, which caps at about 6k threads, server starts to slow to a crawl. Similar system in erlang (newer version of server doing essentially the same tasks) is currently also servicing about 6k connections, but cpu utilisation is about 10%. Yeah, we could rewrite that old system to use some async library, but that's not multi threaded, just scheduled (like in erlang).

4 years ago by bullen

You need non-blocking IO, if you have more threads than cores you are doing it wrong.

You cannot build a MMO server with erlang.

4 years ago by toast0

People were doing it in 2009 and posting it here [1] (project link is dead though).

The massively multiplayer online part would be clearly in favor of Erlang, although the RPG part maybe not so much. It would depend in my mind on how you model the environment and the game objects in it and their interactions. I think there would be too much communication to model game objects as processes; you'd need to make a simulation zone a process, with the game objects as data inside the simulation process, and immutability makes that not a whole lot of fun, probably.

[1] https://news.ycombinator.com/item?id=981597

4 years ago by yetihehe

Why not? I have such server in plans, but that won't be soon (due to lack of time). If you have only as many threads as cores, that's not very MULTI threading, that's avoiding threads because of performance reasons.

4 years ago by JetAlone

"Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build."

So the alternative to copying the message would be to use a cross-process reference in memory, so that the GC has to take the extra step of looking at two processes per message... I write a lot of higher-level code than this and would be prone to overlook these kinds of details, but I want to get better at noticing performance gotchas. The architecture of the compiler and target assembly would have to be key to that.

4 years ago by bcrosby95

> So the alternative to copying the message would be to use a cross-process reference in memory that GC has to take the extra step of looking at two processes per message

At what point do you decide to make a copy? What if you pass a subset of the data to process C, which stores a subset of that data, then later passes it to process D? If you don't copy the data at any point in this series of steps, you need full blown cross-process GC which is counter to their goal.
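Erlang's answer to that question is: copy at every hop. A toy Python sketch (with `deepcopy` standing in for the BEAM's copy-on-send; the data and helper names are made up) shows why unconditional copying keeps every heap independent, no matter how many processes the data travels through:

```python
import copy

def send(receiver_heap, term):
    # each hop deep-copies, so no heap ever holds a pointer into another
    receiver_heap.append(copy.deepcopy(term))

# process A owns some nested data
a_data = {"user": {"name": "alice", "prefs": {"theme": "dark"}}}
b_heap, c_heap, d_heap = [], [], []

send(b_heap, a_data["user"])          # A -> B: a subset, copied
send(c_heap, b_heap[0]["prefs"])      # B -> C: a sub-subset, copied again
send(d_heap, c_heap[0])               # C -> D: copied once more

a_data["user"]["prefs"]["theme"] = "light"  # A mutates its own copy
# no other heap observes the change: there are no cross-process references,
# so each process's GC never needs to look beyond its own heap
```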

4 years ago by dnautics

> So the alternative to copying the message would be to use a cross-process reference in memory that GC has to take the extra step of looking at two processes per message

To be frank, erlang already does this with binaries over some size limit (I think 64B)

4 years ago by amelius

> Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process.

This means there is no structural sharing, which can be troublesome with huge data structures. (Edit: not entirely true, see comments below)

4 years ago by srijan4

The only sharing is for large binaries (>64 bytes)[0].

Like all messaging mechanisms, this can be handled by passing a reference or id, and the receiver fetching the actual large data from db/file/source of truth.

[0] - http://erlang.2086793.n4.nabble.com/how-message-size-effects...
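That reference-passing pattern — the mailbox carries only an id, and the receiver fetches the payload from a source of truth — can be sketched in Python. The dict here is a stand-in for ETS, a db, or any other shared store; all names are illustrative.

```python
# source of truth keyed by id; messages carry only the id
store = {}

def put(key, value):
    store[key] = value

def send_ref(mailbox, key):
    mailbox.append(key)       # tiny message: no copy of the payload

def receive(mailbox):
    key = mailbox.pop(0)
    return store[key]         # receiver fetches on demand

put("blob:1", b"x" * 1_000_000)   # a megabyte that never crosses a mailbox
mbox = []
send_ref(mbox, "blob:1")
fetched = receive(mbox)
```

The trade-off: the send is cheap regardless of payload size, but the store becomes shared state whose lifetime must now be managed explicitly.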

4 years ago by amelius

> The only sharing is for large binaries

Can you also share small objects by reference? (I.e. prevent serialization of a large tree composed of small objects)

And refer to those objects through pointers (avoiding an expensive db layer)?

4 years ago by srijan4

> Can you also share small objects by reference?

No.

But you can use a cheap cache layer instead of an expensive db layer: ETS.

4 years ago by chrisseaton

The big problem nobody talks about with actors and message passing is non-determinism and state.

If two otherwise independent processes A and B are sending messages to a single process C, it's non-deterministic which arrives at C first. This can create a race condition. C may respond differently depending on which message arrives first - A's or B's. This is also effectively an example of shared mutable state - the state of C, which can be mutated whenever a message is received - is shared with A and B because they're communicating with it and their messages work differently based on the current state.

Non-determinism, race-conditions, shared mutable state: the absolute opposite of what we want when dealing with concurrency and parallelism.
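The race being described is easy to reproduce with any mailbox-like structure. A minimal Python sketch, with threads and a queue standing in for processes A, B, and C's mailbox:

```python
import threading
import queue

# C's state depends on which of A's or B's message lands in its mailbox first
inbox = queue.Queue()

def sender(tag):
    inbox.put(tag)

a = threading.Thread(target=sender, args=("A",))
b = threading.Thread(target=sender, args=("B",))
a.start(); b.start()
a.join(); b.join()

first = inbox.get()  # may be "A" or "B" -- the scheduler decides
```

Both messages always arrive, but the order is decided by the schedule, not the program text; that ordering is exactly the non-determinism the comment points at.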

4 years ago by dnautics

Yeah but the way you're framing the problem is a bit crazy.

If such a race condition is critical, then making A and B "otherwise independent" is a poorly designed architecture (and, yes, novices will do this, which is why there is a big admonition in the Elixir docs: https://hexdocs.pm/elixir/master/GenServer.html#module-when-...). Maybe they should all be branching logic or a state machine in the same actor, and you shouldn't be coordinating with message passing?

If for some reason A and B TRULY need to be asynchronous and independent (let's say they are tracking real-world things), AND order matters, then no amount of language or VM architecture will get you out of building a transactional lock, consensus system, or CRDT to guarantee sane results. That is just how the universe works.

4 years ago by toast0

Often, the order of messages doesn't matter that much. When it does matter, there are a couple patterns to consider.

a) be sure to give C whole requests; don't ask C for current data, modify it in A, and then send it back to C (which is begging for B to send a modification in the meantime). C's mailbox enforces an ordering on requests (all calls will be handled in the order they are received, not in the order they are sent)

b) rather than A sending to B and C, with B also sending to C, send only from A to B and from B to C. B sends along A's information. The downside is that if B crashes before sending to C, A's information is lost. You could also have A send to C, and C reply to B, which then does work and sends back to C.

c) when multiple processes are working on a single thing, try to combine them. A single process will find it hard to race with itself, and will have a clear order of events. I'm in the middle of cleaning up some code that had three processes around one thing (a state process, a reader and a writer), and there were so many weird and racy things because of the separation; putting it into one process means things can't change unexpectedly.
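Pattern (a) can be sketched in Python, with a thread and a queue standing in for process C and its mailbox (the deposit/withdraw requests are made-up examples). Because C applies whole requests serialized through one mailbox, no interleaving of senders can lose an update:

```python
import threading
import queue

mailbox = queue.Queue()
state = {"balance": 100}

def c_loop():
    """Process C: applies complete requests in mailbox order."""
    while True:
        request = mailbox.get()
        if request is None:          # shutdown sentinel
            break
        op, amount = request         # a whole request, not a precomputed new state
        if op == "deposit":
            state["balance"] += amount
        elif op == "withdraw":
            state["balance"] -= amount

c = threading.Thread(target=c_loop)
c.start()
# A and B both send whole requests; whichever order they land in,
# both updates are applied and neither overwrites the other
mailbox.put(("deposit", 50))
mailbox.put(("withdraw", 30))
mailbox.put(None)
c.join()
```

Contrast with the anti-pattern in (a): if A had read the balance, computed a new value, and sent that back, B's concurrent update could be silently overwritten.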

4 years ago by yellowapple

> be sure to give C whole requests, don't ask C for current data, modify it in A and then send it back to C (which is begging for B to send a modification in the mean time)

A simple mitigation against these sorts of race conditions would be for C to maintain a version number alongside its data, then have any update requests send back that version number. If the message's version matches the current, then do the update and change the version number; else, reject the message with a reply directing the sender to reobtain the current data and try again. This way, if both A and B grab data from C and then simultaneously try to update that data in C, you won't lose any data.

This is (relevantly to Erlang) exactly how CouchDB works.
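A minimal Python sketch of that version-check scheme (the class and method names are illustrative, not CouchDB's API):

```python
class VersionedStore:
    """Optimistic locking: updates must cite the version they read."""
    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        return self.data, self.version

    def update(self, new_data, expected_version):
        if expected_version != self.version:
            return False             # stale: sender must re-read and retry
        self.data = new_data
        self.version += 1
        return True

store = VersionedStore({"count": 0})
data_a, v_a = store.read()           # A and B both read version 0
data_b, v_b = store.read()
ok_a = store.update({"count": 1}, v_a)    # A's write lands first
ok_b = store.update({"count": 99}, v_b)   # B's write is stale and rejected
```

B's rejected update is not lost silently: B is told to re-read and retry, which is exactly the "no lost data" property described above.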

4 years ago by catlifeonmars

I believe this is called “optimistic locking”

4 years ago by ProfHewitt

Indeterminacy (using arbiters) is crucial to the performance of digital computation. (Arbiters are used in the interconnect of many-core chips.)

See the following for an explanation:

"Physical Indeterminacy in Digital Computation" https://papers.ssrn.com/abstract=3459566

4 years ago by napsterbr

I'm sorry to go completely off-topic here, but I just wanted to say thank you! Your discussion[0] with Meijer and Szyperski was mind-blowing and taught me a lot about the actor model. This was, hands down, one of the best classes I've ever had. Thanks for sharing this knowledge.

[0] - https://www.youtube.com/watch?v=7erJ1DV_Tlo

4 years ago by ProfHewitt

Thanks for your interest!

Cheers, Carl

4 years ago by chrisseaton

You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?

4 years ago by j-pb

The guy you just responded to invented the Actor model.

I'd guess he knows...

Besides, such a fork-join model might be theoretically free of timing and coordination in a system with infinite resources (bus widths, no cache coherence, an infinitely fast coordinator, etc.), but that's not what silicon looks like.

The nice thing about the Actor model is exactly that it immediately exposes you to non-determinism, distributed state, and, in Erlang's case, failure.

It's what allows Erlang systems to "effortlessly" (you had to put the effort upfront) span many node clusters with arbitrary reliability in the face of failures.

Somewhere else you write:

> The problem is this corner case opens up all the fundamental issues I mention. When we include it we've got a big non-deterministic, racey, stateful, problem.

That's not caused by the Actor model, but by the fundamental laws governing distributed systems. Distributed systems are inherently non-deterministic, racey, and stateful.

The Two Generals problem dictates that you can't solve these issues unless you're on a single-writer system (which your scatter-gather algorithm is an example of, and in which you will always be limited by the throughput of the process that organises your scatter-gather operations / your control plane). In which case, you might as well ignore all forms of concurrency and do one thing at a time, which will be faster anyway.

4 years ago by dnautics

> You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?

How simple is "simple"? What happens if process 1 crashes? What if it suffers a network disconnection and looks like it crashed, but then tries to reconnect later? What is the correct logic for completing the combined task?

4 years ago by africanboy

probably you haven't realized it, but the person you are replying to is Carl Hewitt.

https://en.m.wikipedia.org/wiki/Carl_Hewitt

4 years ago by dragonwriter

> The big problem nobody talks about with actors and message passing is non-determinism and state.

People talk about that all the time with actors and message passing, since the actor model is largely a tool for structuring the management of those things (not usually for eliminating them, because for many problem domains those are inherent to the domain.)
