-
Notifications
You must be signed in to change notification settings - Fork 7
Memory
J uses reference counting for memory management. By keeping track of precisely how many references each object (noun, verb, anything that can be stored in a variable) has, it can free the object as soon as it is no longer needed, with relatively small overhead on each copy and free of an object.
J stores a reference count with every object (AC(w)) indicating the
number of times that object is referenced. The count is incremented when a
copy is made and decremented when one is deleted. Once the reference count
drops to zero, J knows the object can be freed, and does so.
Because J objects are immutable, and cannot be defined recursively, they cannot contain themselves. Even using J's in-place modification, with
a =: i.&.> 2 3
a =: (<a) 1} a
will just create a noun that contains its former value, not one that contains itself. It is possible by calling unsafe C functions to change pointers so that a noun contains itself. The result will be a noun that tends to break things when used, and also which can never be freed, as it always has a nonzero reference count from its own reference.
Currently, J uses an unusual recursive procedure for reference counts. If a boxed array (or verb with arguments, or other object with children) is copied, then the reference counts of all its children (items in boxes) are incremented, and so on recursively. This makes copies of composite objects take a long time for no benefit.
In contrast, the typical procedure is to treat the parent/child relationship as a single reference. Thus the reference counts of children are incremented when they are added to the parent, and decremented when the parent is deleted, and otherwise not changed. Copying a composite object simply increments its reference count, and deleting decrements it.
J maintains a stack of local nouns, whose function (but not
implementation) is equivalent to that of C's stack. Any time an object is
created, a pointer to it is pushed onto the stack. Thus the writer of a J
function can ignore all the reference count arithmetic while using
objects. Instead, he or she calls PROLOG, which saves the location of
the beginning of the stack, before doing anything, and EPILOG(z), where
z is the value to be returned, after everything. EPILOG copies z,
deletes everything added to the stack since PROLOG was called, and
returns z.
In the old system, when an object is added to the stack, its children are added below it, and so on recursively. Thus the stack provides a direct reference to every object created by J.
We would very much like for copies of J objects to take constant time, and thus we are in the process of switching J to the more standard reference counting procedure. Copying an object just increments its reference count, and adding it to the stack pushes a single pointer.
The obvious way to make this happen is to modify J's copy and delete functions to have the desired effect. Unfortunately, this is not compatible with the way current J functions are written.
Suppose we call EPILOG(z) for a composite z. We want to avoid deleting
any of z's children, despite the fact that they may be on the stack. In
the old paradigm this is automatic: on copying z, all of its children
are also copied, and deleting the copies on the stack leaves those copies.
But if incrementing the reference count is not recursive, this doesn't
happen, and the children are deleted, leaving z with dangling pointers.
The problem arises because when verbs connect z to its children, they
don't bother to inform the children. The reference from z to its
children cannot be reflected in their reference counts. So the correct fix
might be to increment the reference counts as they are added to z. This
is not feasible: the places where children are added to objects just look
like assignments, and are impossible to detect automatically.
J functions might do crazy things to objects which are on the stack, but
generally treat objects returned by other functions (that is, those that
have passed through EPILOG) as immutable. Thus, we use a flag AFREC
to mark whether an object (or its parent) has been passed through
EPILOG.
If AFREC is set, we use the old reference counting scheme. Counts are
updated recursively. If not, we use the new scheme. Copies and deletions
other than the last take constant time.
EPILOG converts an object and all of its descendants to the new form.
It subtracts from the reference count of each descendant so that there
is only one originating from its parent, and turns off the AFREC flag.
A referrer is either a J object or the stack. An object references
another object if that object would be visited by a call to traverse
(with a non-recursive function as its argument). The stack is a list of
pointers; it references each object it points to. A referrer may
reference an object multiple times if it is visited multiple times by
traverse or appears multiple times in the stack. The list of objects it
refers to are its referents.
Every object is created with AFREC set and reference count zero. Calling
EPILOG(z) copies z, then pops from the stack, then calls derec(z,0).
Calling derec(w,n), where w's reference count is initially m,
decrements w's reference count by n. Then, if w has AFREC set, it
calls derec(z,m-1) on each object referred to by z and removes the
flag. There is no other way to remove the AFREC flag, and no way to add
the AFREC flag at all.
derec(z,0) is equivalent to derec1(z), which is not used in the actual
code because it requires more traversals: derec1(z) does nothing if z
does not have AFREC set, and otherwise decrements the counts of each of
z's descendants by one less than z's reference count, then recurses.
At all times except during calls to derec, an object z that refers to
an object marked AFREC is also marked AFREC. Equivalently, descendants
of non-recursive objects are never recursive.
Assume: Descendants are added only to objects not marked AFREC.
Proof: When z is created, it has no descendants, so the property
holds. Three events can potentially falsify the invariant:
- A new descendant is added to
z. Ifzdoes not haveAFRECset, then descendants cannot be added to it, by the assumption. If it does haveAFRECset, the invariant holds regardless. - The
AFRECflag is added to a descendant ofz. This is impossible. - The
AFRECflag is removed fromz--that is, derec is called onz, andAFRECis set forz. In this case,derecremoves the flag and recurses. But then the flag will be removed for each descendant as well, and the invariant still holds.
The reference count of an object is equal to the sum over all referrers
(with multiplicity) of either 1, if the referrer is the stack or an object
not marked AFREC, or that object's reference count, if the referrer is
an object marked AFREC.
Except during the execution of memory functions, this invariant holds
- At all times for objects not marked
AFREC, and - When
EPILOG(z)is called for objectszmarkedAFRECand their descendants.
We assume the invariant in the second case. If not, the old system was broken.
We now show that derec1(z), equivalent to the main work of EPILOG
derec(z,0), does not falsify the invariant. We omit the proof that other
memory operations (ra, fa, tpop) satisfy it, as it is simple in each
case.
- If
AFRECis not set, thenderec1(z)does nothing by definition. - If
AFRECis set, thenderec1(z)removes theAFRECflag fromz, which decreases the corresponding terms in the reference formula from m to 1, where m is the reference count ofz. It also decrements the counts ofz's descendants, to the same effect. Note that the formula counts referrers with multiplicity, and traverse does the same. Thenderec1recurses, which maintains our invariant proveded the rest of the function does as well.
Our assumptions about J's code base were:
- Descendants are only added to objects marked
AFREC.
This could be violated by some things like in-place amend. If it is, it
will break invariant 1, which is easy to test for in derec by doing a
full recurse. The fix is to make sure each addition calls derec on the
added objects.
- When
EPILOG(z)is called,zand its descendants obey the reference count formula.
Mostly, this depends on J working already. The only way the changes can
break it is if a function does strange memory stuff (not fa or ra) to
an object not marked AFREC. But that's not really out of the question.