Hacker News

4 years ago by javajosh

>Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build.

I do so wish there was a reference or link around "tried in the past".

4 years ago by ramchip

There are a few papers about it, such as:

- “Heap architectures for concurrent languages using message passing”

- “Exploring Alternative Memory Architectures for Erlang: Implementation and Performance Evaluation”

- “A Case for the Unified Heap Approach to Erlang Memory Management”

You can find those and more discussion by googling shared heap erlang.

4 years ago by dnautics

So this isn't entirely true; binaries above a certain size have their own refcounted, process-independent GC going on in the BEAM. I wonder why the VM doesn't use a similar system for highly mobile data structures (copy to a common refcounted "scratchpad" on first send).

4 years ago by knome

binaries never contain references to anything else, and therefore can never be a part of a reference loop. so it's safe to toss them out into a shared space and use reference counting to track them. you don't want to do it with small binaries, as the overhead of shared-allocating/ locking/ incrementing/ locking/ decrementing would probably be worse than just copying the small binary in the first place.

if you put complex reference bearing data out in the void, now whatever memory arena that allocated it may contain data that it references, and now you have to keep track of those references and use all of the shared items as new roots into your data. and if some other memory arena receives it and plucks out a value, do you only then copy it or do you reference back to the original. if you send a message back and form a loop, the gc process for either now has to jump through both, and you've destroyed your cheap gc cycles. if you shunt all of the data into shared memory to avoid multi-gc reference loops, you've now just created a big shared gc that will have all the problems of big shared gcs.

the per-process small gcs with full copying between them mean you can do things like just dropping the arena when the process dies without even checking liveness for anything in it ( you'll need to run any destructors if anything needs cleanup ( like references to shared large binaries needing dereferencing ), but you can keep track of items that need that in a bitmap or something, and avoiding the trace is a win on its own )
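The scheme described above — copy-on-send for ordinary terms, refcounting for large binaries, and dropping a dead process's heap without tracing — can be sketched in miniature. This is a hedged toy model in Python, not the BEAM's actual implementation: the 64-byte threshold, the data structures, and the function names are all illustrative.

```python
import copy

SMALL_BINARY_LIMIT = 64  # BEAM promotes binaries above ~64 bytes to a shared, refcounted heap

# id -> [refcount, bytes]: a stand-in for the shared refc-binary heap
shared_binaries = {}

class Process:
    """Toy model of a BEAM process: a private heap plus a mailbox."""
    def __init__(self):
        self.heap = []              # private data; dropped wholesale on death
        self.mailbox = []
        self.refc_binaries = set()  # shared binaries this process references

def send(sender, receiver, term):
    if isinstance(term, bytes) and len(term) > SMALL_BINARY_LIMIT:
        # large binary: bump the refcount and share the pointer, no copy
        key = id(term)
        entry = shared_binaries.setdefault(key, [0, term])
        entry[0] += 1
        receiver.refc_binaries.add(key)
        receiver.mailbox.append(("binref", key))
    else:
        # everything else is deep-copied into the receiver's heap
        receiver.mailbox.append(copy.deepcopy(term))

def kill(process):
    # the private heap is dropped without any liveness check;
    # only the shared-binary refcounts need decrementing
    for key in process.refc_binaries:
        shared_binaries[key][0] -= 1
        if shared_binaries[key][0] == 0:
            del shared_binaries[key]
    process.heap = process.mailbox = None
```

The point of the sketch: killing a process touches only its refcount bookkeeping (the "bitmap or something" above), never the contents of its heap.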

4 years ago by KMag

> shared-allocating/ locking/ incrementing/ locking/ decrementing

Minor nit: shared allocation, counter incrementing, and counter decrementing can all be done lock-free. They'd still need memory fence operations (and retries in case of contention), and the associated performance hits, but not actual locking.
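The lock-free shape being described can be sketched in Python. One caveat up front: CPython exposes no hardware compare-and-swap, so the CAS below is emulated with a lock; only the surrounding pattern — read, compute, CAS, retry on contention — reflects the real lock-free structure, and all names are illustrative.

```python
import threading

class AtomicCounter:
    """Sketch of a lock-free refcount. The CAS primitive is emulated here;
    on real hardware it is a single instruction plus a memory fence."""
    def __init__(self, value=0):
        self._value = value
        self._cas_guard = threading.Lock()  # stands in for one CPU instruction

    def compare_and_swap(self, expected, new):
        with self._cas_guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def increment(self):
        # retry loop: the cost under contention is retries and fences,
        # never blocking while holding a lock
        while True:
            seen = self._value
            if self.compare_and_swap(seen, seen + 1):
                return seen + 1
```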

4 years ago by toast0

It would seem off heap messages could be a base for this. I've not gone all the way into the weeds on those, but it seems like it could be possible to start with sending a whole message to another process by bumping a refcount and adding a reference to that process's mailbox without needing to make a new copy.

If that works, and is useful, you could extend by bumping the refcount, and adding a new type of subreference to a process's mailbox.

I don't think you'd want to send something referencing multiple off-heap messages or adding some terms and a reference (although maybe, who knows), because that would be a lot of complexity. And you would also want to be sensitive to subreference memory leaks, like can happen with binaries where a binary is large, but only small subreferences stick around, leading to lots of wasted space (this is extra fun when you put a small subbinary into ets/mnesia and it lives forever (well, until you restart beam, but that can be a long time)).

4 years ago by strmpnk

This is sort of what Pony did with its ORCA-based GC. It has some tricky edge cases to optimize in practice, but it can be made to work especially well if data is immutable.

4 years ago by strmpnk

The analog for that would be shared literals, which are not copied (super important for small-map overhead reduction if the constructor initializes a fixed set of keys, like Elixir structs, as the key index is shared). The persistent_term module allows any value to be registered as such at runtime and avoids a lot of caveats that came with mochi_global and the like. While they aren't ref-counted, it can definitely help cut down on copying in my experience.

4 years ago by moonchild

Linear types (as an implementation detail) seem helpful here; if escape analysis shows a message is never touched after being sent, you can move it into the queue instead of copying it.

4 years ago by joppy

Would this help? My understanding is that the copy was so that the new object exists in the heap of the receiving process, and therefore the GC can have a strong guarantee that all objects within a process belong to the GC for that process.

Whether moving an object between two GC’d heaps is even possible (let alone simple or complicated) surely depends a lot on the GC implementation. The implementation of concurrent GCs is not easy...

4 years ago by dominicl

A definitive reference would be great; for me the "tried in the past" quote has always been a pointer to the Java VM, which has great benefits of its own but has always had the problem of global GC-induced freezes. Working with Cassandra instances, this has been a problem from 2013 until today. It just seems to be a very hard problem to solve when object references flow everywhere. But it's really unpleasant when your whole DB instance with all its parallel requests locks up for GC.

[0] https://stackoverflow.com/questions/21992943/persistent-gc-i...

4 years ago by secondcoming

Try out Scylla instead. We moved to it from Cassandra and have had no stability issues at all. Our use case is several hundred thousand requests per second with occasional batch writes. Request latency reduced, as did the number of nodes we required.

4 years ago by alexhutcheson

DataStax had good results testing the new Shenandoah garbage collector with Cassandra 4.0: https://www.datastax.com/blog/apache-cassandra-benchmarking-...

4 years ago by jorangreef

Just an aside, but message passing also makes a huge difference if you want to implement a distributed consensus protocol such as Viewstamped Replication or Raft.

Compared to typical request/response connection-oriented protocols (e.g. TCP or HTTP), message passing maps 1-to-1 to the distributed consensus domain, where the client can send a request message to the leader and then fully expect to get the response message back from another subsequent leader (after a view change or leader election).

I ran into this while implementing Viewstamped Replication in Zig for https://github.com/coilhq/tigerbeetle, where we had a typical TCP request/response protocol to begin with, and then realized that a message bus would make things much easier. It's just really nice to be able to say, send this message to process A or process B (if you can) and not worry about much else beyond timeouts and retries, which are needed for distributed consensus anyway, and straight away it opens up the door for alternative underlying transport protocols such as QUIC or UDT, multicast etc. Message passing is just a really good abstraction to have in place above all this in general.
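The abstraction being described might look like the following minimal Python sketch. The names (`MessageBus`, `send_to`, and so on) are invented for illustration and are not TigerBeetle's actual API; threads and queues stand in for processes and a real transport.

```python
import queue

class MessageBus:
    """Sketch of 'send this message to process A and don't worry about
    the transport': mailboxes are addressed by name, delivery is
    fire-and-forget, and the caller owns timeouts and retries."""
    def __init__(self):
        self.mailboxes = {}

    def register(self, name):
        self.mailboxes[name] = queue.Queue()
        return self.mailboxes[name]

    def send_to(self, name, msg):
        # failure here is handled by the caller's timeout-and-retry
        # logic, which distributed consensus needs anyway
        box = self.mailboxes.get(name)
        if box is not None:
            box.put(msg)

    def recv(self, name, timeout=None):
        try:
            return self.mailboxes[name].get(timeout=timeout)
        except queue.Empty:
            return None
```

Because senders address a name rather than a connection, swapping the underlying transport (TCP, QUIC, UDP multicast) changes nothing above this layer.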

4 years ago by choeger

Out of curiosity: What happens with messages that are never received? Do they just pile up until the (not-so) receiving process OOMs? How does one debug such a program?

4 years ago by srijan4

Yes, messages keep piling up and memory usage keeps increasing until the Erlang VM goes out of memory.

To debug, there's a max_heap_size parameter which can be configured to log an error message when heap size of any process goes beyond a threshold, or it can even kill the process[0]. Although I don't know if this takes the "off-heap" messages into account.

Also, there are lots of libraries which can provide a "top" like view of the running system.

[0] - http://erlang.2086793.n4.nabble.com/Max-heap-size-td4716974....

4 years ago by bullen

I think Erlang's strength is also its weakness: structuring everything around messages instead of having a complex memory model and concurrency.

I always post these quotes until someone comes along who can explain them; two years later, nobody has been able to:

"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html

"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...

4 years ago by yetihehe

And erlang outperforms multi-threaded java single-handedly. I've made a system in java, which caps at about 6k threads, server starts to slow to a crawl. Similar system in erlang (newer version of server doing essentially the same tasks) is currently also servicing about 6k connections, but cpu utilisation is about 10%. Yeah, we could rewrite that old system to use some async library, but that's not multi threaded, just scheduled (like in erlang).

4 years ago by bullen

You need non-blocking IO, if you have more threads than cores you are doing it wrong.

You cannot build a MMO server with erlang.

4 years ago by toast0

People were doing it in 2009 and posting it here [1] (project link is dead though).

The massively multiplayer online part would be clearly in favor of Erlang, although the RPG part maybe not so much. It would depend in my mind on how you model the environment and the game objects in it and their interactions. I think there would be too much communication to model game objects as processes; you'd need to make a simulation zone a process, with the game objects as data inside the simulation process, and immutability makes that not a whole lot of fun, probably.

[1] https://news.ycombinator.com/item?id=981597

4 years ago by yetihehe

Why not? I have such server in plans, but that won't be soon (due to lack of time). If you have only as many threads as cores, that's not very MULTI threading, that's avoiding threads because of performance reasons.

4 years ago by JetAlone

"Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build."

So the alternative to copying the message would be to use a cross-process reference in memory, so that the GC has to take the extra step of looking at two processes per message... I write a lot of higher-level code than this and would be prone to overlook these kinds of details, but I want to get better at noticing performance gotchas. The architecture of the compiler and target assembly would have to be key to that.

4 years ago by bcrosby95

> So the alternative to copying the message would be to use a cross-process reference in memory that GC has to take the extra step of looking at two processes per message

At what point do you decide to make a copy? What if you pass a subset of the data to process C, which stores a subset of that data, then later passes it to process D? If you don't copy the data at any point in this series of steps, you need full blown cross-process GC which is counter to their goal.
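Erlang's answer to that question is: copy at every hop. A toy Python sketch (with `deepcopy` standing in for the BEAM's copy-on-send; the data and helper names are made up) shows why unconditional copying keeps every heap independent, no matter how many processes the data travels through:

```python
import copy

def send(receiver_heap, term):
    # each hop deep-copies, so no heap ever holds a pointer into another
    receiver_heap.append(copy.deepcopy(term))

# process A owns some nested data
a_data = {"user": {"name": "alice", "prefs": {"theme": "dark"}}}
b_heap, c_heap, d_heap = [], [], []

send(b_heap, a_data["user"])          # A -> B: a subset, copied
send(c_heap, b_heap[0]["prefs"])      # B -> C: a sub-subset, copied again
send(d_heap, c_heap[0])               # C -> D: copied once more

a_data["user"]["prefs"]["theme"] = "light"  # A mutates its own copy
# no other heap observes the change: there are no cross-process references,
# so each process's GC never needs to look beyond its own heap
```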

4 years ago by dnautics

> So the alternative to copying the message would be to use a cross-process reference in memory that GC has to take the extra step of looking at two processes per message

To be frank, erlang already does this with binaries over some size limit (I think 64B)

4 years ago by amelius

> Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process.

This means there is no structural sharing, which can be troublesome with huge data structures. (Edit: not entirely true, see comments below)

4 years ago by srijan4

The only sharing is for large binaries (>64 bytes)[0].

Like all messaging mechanisms, this can be handled by passing a reference or id, and the receiver fetching the actual large data from db/file/source of truth.

[0] - http://erlang.2086793.n4.nabble.com/how-message-size-effects...
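That reference-passing pattern — the mailbox carries only an id, and the receiver fetches the payload from a source of truth — can be sketched in Python. The dict here is a stand-in for ETS, a db, or any other shared store; all names are illustrative.

```python
# source of truth keyed by id; messages carry only the id
store = {}

def put(key, value):
    store[key] = value

def send_ref(mailbox, key):
    mailbox.append(key)       # tiny message: no copy of the payload

def receive(mailbox):
    key = mailbox.pop(0)
    return store[key]         # receiver fetches on demand

put("blob:1", b"x" * 1_000_000)   # a megabyte that never crosses a mailbox
mbox = []
send_ref(mbox, "blob:1")
fetched = receive(mbox)
```

The trade-off: the send is cheap regardless of payload size, but the store becomes shared state whose lifetime must now be managed explicitly.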

4 years ago by amelius

> The only sharing is for large binaries

Can you also share small objects by reference? (I.e. prevent serialization of a large tree composed of small objects)

And refer to those objects through pointers (avoiding an expensive db layer)?

4 years ago by srijan4

> Can you also share small objects by reference?

No.

But you can use a cheap cache layer instead of an expensive db layer: ETS.

4 years ago by chrisseaton

The big problem nobody talks about with actors and message passing is non-determinism and state.

If two otherwise independent processes A and B are sending messages to a single process C, it's non-deterministic which arrives at C first. This can create a race condition. C may respond differently depending on which message arrives first - A's or B's. This is also effectively an example of shared mutable state - the state of C, which can be mutated whenever a message is received - is shared with A and B because they're communicating with it and their messages work differently based on the current state.

Non-determinism, race-conditions, shared mutable state: the absolute opposite of what we want when dealing with concurrency and parallelism.
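The race being described is easy to reproduce with any mailbox-like structure. A minimal Python sketch, with threads and a queue standing in for processes A, B, and C's mailbox:

```python
import threading
import queue

# C's state depends on which of A's or B's message lands in its mailbox first
inbox = queue.Queue()

def sender(tag):
    inbox.put(tag)

a = threading.Thread(target=sender, args=("A",))
b = threading.Thread(target=sender, args=("B",))
a.start(); b.start()
a.join(); b.join()

first = inbox.get()  # may be "A" or "B" -- the scheduler decides
```

Both messages always arrive, but the order is decided by the schedule, not the program text; that ordering is exactly the non-determinism the comment points at.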

4 years ago by dnautics

Yeah but the way you're framing the problem is a bit crazy.

If such a race condition is critical, then making A and B "otherwise independent" is a poorly designed architecture (and, yes, novices will do this, which is why there is a big admonition in the Elixir docs: https://hexdocs.pm/elixir/master/GenServer.html#module-when-...). Maybe they should all be branching logic or a state machine in the same actor, and you shouldn't be coordinating with message passing?

If for some reason A and B TRULY need to be asynchronous and independent (let's say they are tracking real-world things), AND order matters, then no amount of language or VM architecture will get you out of building a transactional lock, consensus system, or CRDT to guarantee sane results. That is just how the universe works.

4 years ago by toast0

Often, the order of messages doesn't matter that much. When it does matter, there are a couple patterns to consider.

a) be sure to give C whole requests; don't ask C for current data, modify it in A, and then send it back to C (which is begging for B to send a modification in the meantime). C's mailbox enforces an ordering on requests (all calls will be handled in the order they are received, not in the order they are sent)

b) rather than A sending to B and C, with B also sending to C, send only from A to B and from B to C. B sends along A's information. The downside is that if B crashes before sending to C, A's information is lost. You could also have A send to C, and C reply to B, which then does work and sends back to C.

c) when multiple processes are working on a single thing, try to combine them. A single process will find it hard to race with itself, and will have a clear order of events. I'm in the middle of cleaning up some code that had three processes around one thing (a state process, a reader and a writer), and there were so many weird and racy things because of the separation; putting it into one process means things can't change unexpectedly.
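Pattern (a) can be sketched in Python, with a thread and a queue standing in for process C and its mailbox (the deposit/withdraw requests are made-up examples). Because C applies whole requests serialized through one mailbox, no interleaving of senders can lose an update:

```python
import threading
import queue

mailbox = queue.Queue()
state = {"balance": 100}

def c_loop():
    """Process C: applies complete requests in mailbox order."""
    while True:
        request = mailbox.get()
        if request is None:          # shutdown sentinel
            break
        op, amount = request         # a whole request, not a precomputed new state
        if op == "deposit":
            state["balance"] += amount
        elif op == "withdraw":
            state["balance"] -= amount

c = threading.Thread(target=c_loop)
c.start()
# A and B both send whole requests; whichever order they land in,
# both updates are applied and neither overwrites the other
mailbox.put(("deposit", 50))
mailbox.put(("withdraw", 30))
mailbox.put(None)
c.join()
```

Contrast with the anti-pattern in (a): if A had read the balance, computed a new value, and sent that back, B's concurrent update could be silently overwritten.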

4 years ago by yellowapple

> be sure to give C whole requests, don't ask C for current data, modify it in A and then send it back to C (which is begging for B to send a modification in the mean time)

A simple mitigation against these sorts of race conditions would be for C to maintain a version number alongside its data, then have any update requests send back that version number. If the message's version matches the current, then do the update and change the version number; else, reject the message with a reply directing the sender to reobtain the current data and try again. This way, if both A and B grab data from C and then simultaneously try to update that data in C, you won't lose any data.

This is (relevantly to Erlang) exactly how CouchDB works.
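A minimal Python sketch of that version-check scheme (the class and method names are illustrative, not CouchDB's API):

```python
class VersionedStore:
    """Optimistic locking: updates must cite the version they read."""
    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        return self.data, self.version

    def update(self, new_data, expected_version):
        if expected_version != self.version:
            return False             # stale: sender must re-read and retry
        self.data = new_data
        self.version += 1
        return True

store = VersionedStore({"count": 0})
data_a, v_a = store.read()           # A and B both read version 0
data_b, v_b = store.read()
ok_a = store.update({"count": 1}, v_a)    # A's write lands first
ok_b = store.update({"count": 99}, v_b)   # B's write is stale and rejected
```

B's rejected update is not lost silently: B is told to re-read and retry, which is exactly the "no lost data" property described above.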

4 years ago by catlifeonmars

I believe this is called “optimistic locking”

4 years ago by ProfHewitt

Indeterminacy (using arbiters) is crucial to the performance of digital computation. (Arbiters are used in the interconnect of many-core chips.)

See the following for an explanation:

"Physical Indeterminacy in Digital Computation" https://papers.ssrn.com/abstract=3459566

4 years ago by napsterbr

I'm sorry to go completely off-topic here, but I just wanted to say thank you! Your discussion[0] with Meijer and Szyperski was mind-blowing and taught me a lot about the actor model. This was, hands down, one of the best classes I've ever had. Thanks for sharing this knowledge.

[0] - https://www.youtube.com/watch?v=7erJ1DV_Tlo

4 years ago by ProfHewitt

Thanks for your interest!

Cheers, Carl

4 years ago by chrisseaton

You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?

4 years ago by j-pb

The guy you just responded to invented the Actor model.

I'd guess he knows...

Besides, such a fork-join model might be theoretically free of timing and coordination in a system with infinite resources (bus widths, no cache coherence, an infinitely fast coordinator, etc.), but that's not what silicon looks like.

The nice thing about the Actor model is exactly that it immediately exposes you to non-determinism, distributed state, and, in Erlang's case, failure.

It's what allows Erlang systems to "effortlessly" (you had to put the effort upfront) span many node clusters with arbitrary reliability in the face of failures.

Somewhere else you write:

> The problem is this corner case opens up all the fundamental issues I mention. When we include it we've got a big non-deterministic, racey, stateful, problem.

That's not caused by the Actor model, but by the fundamental laws governing distributed systems. Distributed systems are inherently non-deterministic, racey, and stateful.

The Two Generals problem dictates that you can't solve these issues unless you're on a single-writer system (which your scatter-gather algorithm is an example of, and in which you will always be limited by the throughput of the process that organises your scatter-gather operations / your control plane). In which case, you might as well ignore all forms of concurrency and do one thing at a time, which will be faster anyway.

4 years ago by dnautics

> You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?

How simple is "simple"? What happens if process 1 crashes? What if it suffers a network disconnection and looks like it crashed, but then tries to reconnect later? What is the correct logic for completing the combined task?

4 years ago by africanboy

probably you haven't realized it, but the person you are replying to is Carl Hewitt.

https://en.m.wikipedia.org/wiki/Carl_Hewitt

4 years ago by dragonwriter

> The big problem nobody talks about with actors and message passing is non-determinism and state.

People talk about that all the time with actors and message passing, since the actor model is largely a tool for structuring the management of those things (not usually for eliminating them, because for many problem domains those are inherent to the domain.)
