Erlang Processes vs Java Threads

Repeat after me: “These are different paradigms”

Say that aloud 20 times or so — it is our mantra for the moment.

If we really must compare apples and oranges, let’s at least consider where the common aspects of “being fruit” intersect.

Java “objects” are a Java programmer’s basic unit of computation. That is, an object (basically a struct with arms and legs that has encapsulation somewhat more strictly enforced than in C++) is the primary tool with which you model the world. You think “This object knows/has Data {X,Y,Z} and performs Functions {A(),B(),C()} over it, carries the Data everywhere it goes, and can communicate with other objects by calling functions/methods defined as part of their public interface. It is a noun, and that noun does stuff.” That is to say, you orient your thought process around these units of computation. The default case is that things that happen amongst the objects occur in sequence, and a crash interrupts that sequence. They are called “objects” and hence (if we disregard Alan Kay’s original meaning) we get “object orientation”.

Erlang “processes” are an Erlang programmer’s basic unit of computation. A process (basically a self-contained sequential program running in its own time and space) is the primary tool with which an Erlanger models the world(1). Just as Java objects define a level of encapsulation, Erlang processes define the level of encapsulation, but in Erlang the units of computation are completely cut off from one another. You cannot call a method or function on another process, nor can you access any data that lives within it. One process does not even run within the same timing context as any other, and there is no guarantee about the order in which a process receives messages from different senders (only messages from one particular sender to one particular receiver are guaranteed to arrive in the order they were sent). They may as well be on different planets entirely (and, come to think of it, this is actually plausible). They can crash independently of one another, and other processes are only impacted if they have deliberately elected to be impacted. Even this involves messaging: essentially registering to receive a suicide note from the dead process, which itself is not guaranteed to arrive in any particular order relative to the rest of the system, and to which you may or may not choose to react.

Java deals with complexity directly in compound algorithms: how objects work together to solve a problem. It is designed to do this within a single execution context, and the default case in Java is sequential execution. Multiple threads in Java mean multiple running contexts, and this is a very complex topic because of the impact that activity in different timing contexts has on one another (and on the system as a whole: hence defensive programming, exception schemes, etc.). Saying “multi-threaded” in Java means something different than it would in Erlang; in fact, this is never even said in Erlang because it is always the base case. Note here that Java threads imply segregation as pertains to time, not memory or visible references. Visibility in Java is controlled manually by choosing what is private and what is public; universally accessible elements of a system must be either designed to be “threadsafe” and reentrant, sequentialized via queueing mechanisms, or protected by locking mechanisms. In short: scheduling is a manually managed issue in threaded/concurrent Java programs.
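To make the “segregation in time, not memory” point concrete, here is a minimal sketch of several Java threads poking at one object. The class and method names (SharedCounter, incrementUnsafe, incrementLocked, run) are hypothetical and exist only for illustration. Every thread holds a reference to the same counter, so the programmer has to pick a locking discipline by hand.

```java
// Hypothetical example; none of these names come from a real library.
class SharedCounter {
    private int count = 0;

    // Unsynchronized read-modify-write: two threads can interleave here
    // and lose updates.
    void incrementUnsafe() { count++; }

    // Same operation, but only one thread at a time may run it on this object.
    synchronized void incrementLocked() { count++; }

    synchronized int get() { return count; }

    // Starts `threads` threads that each increment one shared counter
    // `perThread` times, then returns the final count.
    static int run(int threads, int perThread, boolean locked) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    if (locked) counter.incrementLocked();
                    else counter.incrementUnsafe();
                }
            });
            workers[i].start();
        }
        try {
            // Wait for every worker to finish before reading the result.
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Calling SharedCounter.run(4, 100000, true) returns 400000 every time. With false as the last argument the result is allowed to come out smaller, and by how much varies from run to run. Nothing in the language stops you from writing the unsafe version; that is the “manually managed” part.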

Erlang separates each process’s running context in terms of execution timing (scheduling), memory access and reference visibility, and in doing so simplifies each component of an algorithm by isolating it completely. This is not just the default case, this is the only case available under this model of computation. This comes at the cost of never knowing the exact sequence of operations once part of your processing crosses a message barrier, because messages are all essentially network protocols and there are no method calls that can be guaranteed to execute within a given context. This would be analogous to creating a JVM instance per object and only permitting them to communicate across sockets. That would be ridiculously cumbersome in Java, but it is the way Erlang is designed to work (incidentally, this is also the basis of the concept of writing “Java microservices” if one ditches the web-oriented baggage the buzzword tends to entail; Erlang programs are, by default, swarms of microservices). It’s all about tradeoffs.
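For a rough feel of what “only permitted to communicate by message” looks like from the Java side, here is a sketch that fakes one such unit with a thread and a queue instead of a whole JVM and a socket. Everything here (CounterProcess, the Add/Get/Stop messages, the reply queue) is a hypothetical protocol invented for this example, and it is only an approximation: the JVM still shares one heap, so the isolation below is a convention the code follows, not something the runtime enforces the way Erlang does.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a "process" is a thread that owns its state and is
// reachable only through its mailbox.
class CounterProcess {
    // Made-up message protocol for this example.
    interface Msg {}
    static final class Add implements Msg {
        final int amount;
        Add(int amount) { this.amount = amount; }
    }
    static final class Get implements Msg {
        final BlockingQueue<Integer> replyTo; // where the answer should be sent
        Get(BlockingQueue<Integer> replyTo) { this.replyTo = replyTo; }
    }
    static final class Stop implements Msg {}

    private final BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();

    CounterProcess() {
        Thread thread = new Thread(this::loop);
        thread.setDaemon(true);
        thread.start();
    }

    // The only way in: an asynchronous send. There is no getter for the total.
    void send(Msg m) { mailbox.add(m); }

    private void loop() {
        int total = 0; // lives on this thread's stack; nobody else can see it
        try {
            while (true) {
                Msg m = mailbox.take(); // block until a message arrives
                if (m instanceof Add) total += ((Add) m).amount;
                else if (m instanceof Get) ((Get) m).replyTo.add(total);
                else if (m instanceof Stop) return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Usage: two sends, then a request whose answer comes back as a message.
    static int demo() {
        CounterProcess p = new CounterProcess();
        p.send(new Add(2));
        p.send(new Add(3));
        BlockingQueue<Integer> reply = new LinkedBlockingQueue<>();
        p.send(new Get(reply));
        try {
            int value = reply.take();
            p.send(new Stop());
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

CounterProcess.demo() returns 5, because a single sender’s messages are taken from the queue in the order they were added. Notice how much ceremony it takes to say “what is your total?” once method calls are off the table. That ceremony is the tradeoff, and in Erlang it is simply the normal way of doing things.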

These are different paradigms. The closest commonality we can find is to say that from the programmer’s perspective, Erlang processes are analogous to Java objects. If we must find something to compare Java threads to… well, we’re simply not going to find something like that in Erlang, because there is no such comparable concept in Erlang. To beat a dead horse: these are different paradigms. If you write a few non-trivial programs in Erlang this will become readily apparent.

Note that I’m saying “these are different paradigms” but have not even touched the topic of OOP vs FP. The difference between “thinking in Java” and “thinking in Erlang” is more fundamental than OOP vs FP. (In fact, one could write an OOP language for the Erlang VM that works like Java; for example, see “An implementation of OOP objects in Erlang”.)

While it is true that Erlang’s “concurrency oriented” or “process oriented” foundation is closer to what Alan Kay had in mind when he coined the term “object oriented”(2), that is not really the point here. What Kay was getting at was that one can reduce the cognitive complexity of a system by cutting your computrons into discrete chunks, and isolation is necessary for that. Java accomplishes this in a way that leaves it still fundamentally procedural in nature, but structures code around a special syntax over higher-order dispatching closures called “class definitions”. Erlang does this by splitting the running context up per object. This means Erlang thingies can’t call methods on one another, but Java thingies can. This means Erlang thingies can crash in isolation but Java thingies can’t. A vast number of implications flow from this basic difference — hence “different paradigms”. Tradeoffs.
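The “can call methods” versus “can crash in isolation” split can be sketched in Java too, with the caveat that this is an imitation and not how Erlang does it. In the hypothetical CrashDemo below, the same failing operation is run twice: once as a plain method call, and once in a separate thread that reports its death by message. The “DOWN” and “EXIT” strings are a made-up format, and the worker posts its own death notice here, whereas in Erlang the runtime delivers that notice for you.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example; the names and the "DOWN" message format are made up.
class CrashDemo {
    // Throws ArithmeticException when divisor is 0.
    static int risky(int divisor) { return 10 / divisor; }

    // Method call: the failure travels up the caller's own stack, so the
    // caller's sequence is interrupted unless it defends itself.
    static String callDirectly() {
        try {
            return "result " + risky(0);
        } catch (ArithmeticException e) {
            return "caller interrupted: " + e.getClass().getSimpleName();
        }
    }

    // Isolated worker: the failure ends the worker only. The observer finds
    // out by reading a message, and could just as well ignore it.
    static String runIsolated() {
        BlockingQueue<String> monitorMailbox = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                risky(0);
                monitorMailbox.add("EXIT normal");
            } catch (RuntimeException e) {
                // Simplification: the worker writes its own death notice.
                monitorMailbox.add("DOWN " + e.getClass().getSimpleName());
            }
        });
        worker.start();
        try {
            return monitorMailbox.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In the first case the caller has to wrap the call defensively or go down with it. In the second the observer’s own sequence is never interrupted; it just finds a note in its mailbox and decides what, if anything, to do about it.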

Footnotes:

(1) Incidentally, Erlang implements a version of “the actor model”, but we don’t use this terminology as Erlang predates the popularization of this model. Joe Armstrong was unaware of it when he designed Erlang and wrote his thesis.
(2) Alan Kay has said quite a bit about what he meant when he coined the term “object oriented”, the most interesting being his take on messaging (one-way notification from one independent process with its own timing and memory to another) vs calls (function or method calls within a sequential execution context with shared memory), and how the lines blur a bit between the programming interface as presented by the programming language and the implementation underneath.
