Dear Mozart wizards,

The Mozart platform that we all know is becoming unmaintainable. Recent big changes introduced more bugs than they solved. We dread each release of GCC, as it means hunting for which optimizations must be disabled for the platform to build in an apparently correct way. Core issues remain unaddressed, although several people have given them a try.

The cause of these problems is an implementation that was too advanced for its time. The radical constraints on memory and the lack of even basic optimization in the compilers of the era forced the developers to optimize by hand, relying on undefined compiler behaviour and on hypotheses that were shared by all developers but generally not documented. The problem was compounded by a few PhD theses that were rushed to prove ideas, but whose implementations were immature, not designed for maintainability, and left to rot once the thesis was finished and its author left the community. The current platform is thus a mix of extremely bright ideas, creative hacks and terrible code.

In order to save the platform, the ideas and the community, we have decided to rewrite the emulator. This will be a rewrite from scratch: keeping the ideas, avoiding the hacks, cleanly coded and, most importantly, documented.

The resources that we have are tight. For the next year, we have a bit less than one man-year, plus your contributions, plus an unclear amount of contributions from new PhD students. It might not be possible to bring the whole platform back in a year, but enough can be done to deprecate the current platform.

Of course, an implementation is always a three-way compromise between space efficiency, time efficiency and readability. The sweet spot is for the whole community to define, and I will ask for your opinion whenever I think the situation isn't clear cut. Don't hesitate to bring up your opinion even when not asked, as it is your platform as well.
Cordially,
Yves Jaradin

%%%%%%%%%%%%%%%%%%%%%

Technical question: Target machines and node representation

These look like two separate questions, but they are linked. The core issue is that of node representation.

A node in the emulator is basically anything that can be accessed through an identifier at the Oz level. This includes values (integers, atoms, cells, records, procedures, ...), variables (dataflow, constrained, ...) and other entities that are only of use in certain contexts (code blocks, ...). The nodes form a big graph which changes over time as the program executes. Here is some background on how this can be represented.

The natural way to represent this graph in an object-oriented language like C++ is with objects of classes sharing a common ancestor, using virtual method dispatch. However, this isn't an adequate solution, as nodes have to change their representation and even merge together.

Representation changes are quite common: a variable can get bound, an optimized representation can need to be expanded, the domain of a constrained variable can collapse to a ground value, a node can become distributed, etc. Overwriting an object with a different one is undefined behaviour, and is extremely unsafe if the new object is bigger than the old one. Changing all the pointers is impossible, as pointers can only be traversed forward. The solution is to use a level of indirection: a node is then represented by a small object of known type, containing a reference to its current, dynamic representation.

This approach (which could be somewhat optimized) is unsuitable for the merging of nodes. Merging is caused by unification. To implement it without keeping track of all references to a node, one needs chains of references.
Combining this with the previous solution, the small objects from the indirection layer can now themselves be pointed to, and they need to know whether they point to another object similar to themselves (if they were merged away) or to a representation (if they are a node proper). This implies that the intermediate objects and the representations have a common ancestor which they both override.

From this point, several transformations can be applied, and it isn't easy to define where we leave the domain of bright ideas for that of dark hacks.

* The first is to notice that many pointers know the concrete type of what they point to, and that truly dynamically typed objects generally have only one pointer pointing to them. We can therefore move the type from the pointee to the pointer. The intermediate object then contains both a reference (to the representation, or to a further node) and a tag giving the kind of representation, or marking the reference as a link to a further such structure. The representations no longer need to be dynamically typed. This results in a net gain of memory, but is more complex to implement, since it doesn't allow the use of the normal virtual/override mechanism of C++.

* The second is to consider that the number of common representations is small, so that we can store the type in the alignment bits of the reference to the representation. For uncommon types, the representation has to have virtual methods. In C++, it is currently very difficult to portably guarantee the alignment of a type above what the hardware demands.

* The third is to remark that it is not very smart to have a unique pointer to a single memory word. So, if the representation is smaller than a pointer, it can be stored directly in the intermediate structure. This interacts with the previous point, because the bits used to represent the type need to be unused by the representation. It also implies that only a few chosen types can benefit from this optimization.
From an implementation point of view, the combination of the second and the third forces us to make non-portable assumptions about the representation of pointers and integers.

* The fourth is to replace the pointer to the intermediate structure (which has a known, small size) by the structure itself. This lets coreferences and merged nodes share the same mechanism, reduces the number of indirections, and saves memory if the number of coreferences is small enough (or always, if the second transformation is applied as well).

Of these transformations, the most controversial is the second. In the best case it can save almost a factor of two in memory usage. But it is also highly machine dependent, relies on undefined behaviour and is generally non-portable. It has far-reaching consequences visible in the Oz language, such as the strange value (2^27 - 1) above which integers become inefficient, or the absence of a proper character type.

If we decide to keep all these optimizations in the new emulator, then it becomes important to consider which physical machines to support. Supporting only 32-bit machines isn't a viable option for the future. Supporting both 32-bit and 64-bit platforms forces us to use small tags and to optimize the same types as in the current implementation, with the same limitations. Supporting only 64-bit platforms gains one more tag bit, which allows for more optimized types and for optimized integers of at least 32 bits. It is of course a bet on the future, as we won't be able to support 32-bit platforms at all, including current mobile platforms.

My personal opinion is that the second optimization pushes us too far into the dark hacks, and that even a factor of 2 in memory usage isn't worth it. Peter Van Roy's opinion is that without it we take an unacceptable hit of a factor of 4 (2 from this and 2 from the move to 64 bits) on modern computers, and that the problems can be mitigated by dropping 32-bit support altogether.
What is your opinion?

%%%%%%%%%%%%%%%%%%%%