Skip to content
Marshall Lochbaum edited this page May 7, 2016 · 2 revisions

J's new memory model

Reference counting

J uses reference counting for memory management. By keeping track of precisely how many references each object (noun, verb, anything that can be stored in a variable) has, it can free the object as soon as it is no longer needed, with relatively small overhead on each copy and free of an object.

J stores a reference count with every object (AC(w)) indicating the number of times that object is referenced. The count is incremented when a copy is made and decremented when one is deleted. Once the reference count drops to zero, J knows the object can be freed, and does so.

Because J objects are immutable, and cannot be defined recursively, they cannot contain themselves. Even using J's in-place modification, with

   a =: i.&.> 2 3
   a =: (<a) 1} a

will just create a noun that contains its former value, not one that contains itself. It is possible by calling unsafe C functions to change pointers so that a noun contains itself. The result will be a noun that tends to break things when used, and also which can never be freed, as it always has a nonzero reference count from its own reference.

The old model

Currently, J uses an unusual recursive procedure for reference counts. If a boxed array (or verb with arguments, or other object with children) is copied, then the reference counts of all its children (items in boxes) are incremented, and so on recursively. This makes copies of composite objects take a long time for no benefit.

In contrast, the typical procedure is to treat the parent/child relationship as a single reference. Thus the reference counts of children are incremented when they are added to the parent, and decremented when the parent is deleted, and otherwise not changed. Copying a composite object simply increments its reference count, and deleting decrements it.

J maintains a stack of local nouns, whose function (but not implementation) is equivalent to that of C's stack. Any time an object is created, a pointer to it is pushed onto the stack. Thus the writer of a J function can ignore all the reference count arithmetic while using objects. Instead, he or she calls PROLOG, which saves the location of the beginning of the stack, before doing anything, and EPILOG(z), where z is the value to be returned, after everything. EPILOG copies z, deletes everything added to the stack since PROLOG was called, and returns z.

In the old system, when an object is added to the stack, its children are added below it, and so on recursively. Thus the stack provides a direct reference to every object created by J.

The new model

We would very much like for copies of J objects to take constant time, and thus we are in the process of switching J to the more standard reference counting procedure. Copying an object just increments its reference count, and adding it to the stack pushes a single pointer.

Problems

The obvious way to make this happen is to modify J's copy and delete functions to have the desired effect. Unfortunately, this is not compatible with the way current J functions are written.

Suppose we call EPILOG(z) for a composite z. We want to avoid deleting any of z's children, despite the fact that they may be on the stack. In the old paradigm this is automatic: on copying z, all of its children are also copied, and deleting the copies on the stack leaves those copies. But if incrementing the reference count is not recursive, this doesn't happen, and the children are deleted, leaving z with dangling pointers.

The problem arises because when verbs connect z to its children, they don't bother to inform the children. The reference from z to its children cannot be reflected in their reference counts. So the correct fix might be to increment the reference counts as they are added to z. This is not feasible: the places where children are added to objects just look like assignments, and are impossible to detect automatically.

Solution

J functions might do crazy things to objects which are on the stack, but generally treat objects returned by other functions (that is, those that have passed through EPILOG) as immutable. Thus, we use a flag AFREC to mark whether an object (or its parent) has been passed through EPILOG.

If AFREC is set, we use the old reference counting scheme. Counts are updated recursively. If not, we use the new scheme. Copies and deletions other than the last take constant time.

EPILOG converts an object and all of its descendants to the new form. It subtracts from the reference count of each descendant so that there is only one originating from its parent, and turns off the AFREC flag.

Specification and proof

Definition:

A referrer is either a J object or the stack. An object references another object if that object would be visited by a call to traverse (with a non-recursive function as its argument). The stack is a list of pointers; it references each object it points to. A referrer may reference an object multiple times if it is visited multiple times by traverse or appears multiple times in the stack. The list of objects it refers to are its referents.

Code specification:

Every object is created with AFREC set and reference count zero. Calling EPILOG(z) copies z, then pops from the stack, then calls derec(z,0). Calling derec(w,n), where w's reference count is initially m, decrements w's reference count by n. Then, if w has AFREC set, it calls derec(z,m-1) on each object referred to by z and removes the flag. There is no other way to remove the AFREC flag, and no way to add the AFREC flag at all.

derec(z,0) is equivalent to derec1(z), which is not used in the actual code because it requires more traversals: derec1(z) does nothing if z does not have AFREC set, and otherwise decrements the counts of each of z's descendants by one less than z's reference count, then recurses.

Invariant 1

At all times except during calls to derec, an object z that refers to an object marked AFREC is also marked AFREC. Equivalently, descendants of non-recursive objects are never recursive.

Assume: Descendants are added only to objects not marked AFREC.

Proof: When z is created, it has no descendants, so the property holds. Three events can potentially falsify the invariant:

  • A new descendant is added to z. If z does not have AFREC set, then descendants cannot be added to it, by the assumption. If it does have AFREC set, the invariant holds regardless.
  • The AFREC flag is added to a descendant of z. This is impossible.
  • The AFREC flag is removed from z--that is, derec is called on z, and AFREC is set for z. In this case, derec removes the flag and recurses. But then the flag will be removed for each descendant as well, and the invariant still holds.
Invariant 2

The reference count of an object is equal to the sum over all referrers (with multiplicity) of either 1, if the referrer is the stack or an object not marked AFREC, or that object's reference count, if the referrer is an object marked AFREC.

Except during the execution of memory functions, this invariant holds

  • At all times for objects not marked AFREC, and
  • When EPILOG(z) is called for objects z marked AFREC and their descendants.

We assume the invariant in the second case. If not, the old system was broken.

We now show that derec1(z), equivalent to the main work of EPILOG derec(z,0), does not falsify the invariant. We omit the proof that other memory operations (ra, fa, tpop) satisfy it, as it is simple in each case.

  • If AFREC is not set, then derec1(z) does nothing by definition.
  • If AFREC is set, then derec1(z) removes the AFREC flag from z, which decreases the corresponding terms in the reference formula from m to 1, where m is the reference count of z. It also decrements the counts of z's descendants, to the same effect. Note that the formula counts referrers with multiplicity, and traverse does the same. Then derec1 recurses, which maintains our invariant proveded the rest of the function does as well.
Conclusion

Our assumptions about J's code base were:

  • Descendants are only added to objects marked AFREC.

This could be violated by some things like in-place amend. If it is, it will break invariant 1, which is easy to test for in derec by doing a full recurse. The fix is to make sure each addition calls derec on the added objects.

  • When EPILOG(z) is called, z and its descendants obey the reference count formula.

Mostly, this depends on J working already. The only way the changes can break it is if a function does strange memory stuff (not fa or ra) to an object not marked AFREC. But that's not really out of the question.

Clone this wiki locally