Þ   briarpig  » archive  » refcounting


Old blog posts from treedragon touching refcounting are reposted on this page, arranged by age but otherwise little edited. This page is just a raw dump.

thorn

     For current treatment on this topic, see refcounting under the tech section.

26apr99 Ontology Mechanics Guild

strong vs. weak refs

     The following material appears in strong vs. weak refs: (cf 3724E6AD.5F7482C5-netscape.txt)

scc@netscape.com (Scott Collins) wrote: > Here is a starter set of simple and obvious guidelines. > Our code violates these guidelines everywhere. > > (1a1) Parents own their children; > > (1a2) Children do not own their parents; > > (1a3) You don't need to own an object whose lifetime > is guaranteed to be longer than yours; > > (1a4) You don't own an object because you need > it; you own it because it needs you > > (1a5) You don't want to own an object that (even > transitively) owns you

     As a tool for implementing a robust ownership model, simple refcounting is not sufficient in real (i.e. complex) applications unless one can distinguish between strong and weak references. I can explain these rather well in terms of the 1a* bullets in Scott Collins' message.

     A weak ref is for memory management purposes, so a pointer to an object does not go away and leave a dangling pointer. But a weak ref does not force an object to stay open for business and usable.

     A strong ref is a weak ref plus a use ref, so the weak ref keeps an object from being collected, and the use ref keeps the object from being closed. The idea of 'use' is a generalization of the idea of owership, where counting owners permits more than one owner to keep an object open.

     To support weak and strong refs, one possible implementation model keeps a use count seperate from a ref count, where uses must never become any fewer than refs. When strong refs are released, uses are always dropped first before refs, so an object will always finish closing with a nonzero ref count. This prevents the possibility of self-deletion while closing.

     Strong ref graphs must be acyclic, so that no cycles exist that stop objects from closing when owners are finished using them. Closing an object causes it to release ALL references to other objects, both weak and strong. But closing an object does not cause it to be deleted.

     Weak ref graphs are allowed to be cyclic, and the cycles do not prevent garbage collection in practice because the closing of objects when strong refs hit zero will cause all refs including weak ones to be released, and this causes weak graphs to come undone at exactly the rate needed when owners of objects finish using them.

     Parents typically own children with a strong ref. When a child must know about it's parent (or any ancestor at all), such refs must be weak. A weak ref is typically a backpointer to some driving object, and they often create reference cycles, which is okay as long as the strong refs alone create no cycles, since weak refs never stop objects from closing.

     Currently all the refcounting in our code amounts to strong references alone, and this is a problem for more than one reason. First, any cycles will create serious leaks. Second, it is hard to remove all cycles from the strong ref graph, because complex apps usually need some backpointers if only for performance reasons. Third, releasing refs from destructors causes a complex dependency between closing and collection.

     I expect our code to keep having leaks until we start using weak refs in addition to strong refs. And unless we drop the use counts before ref counts, we will have self-deletion problems when closing objects.

     The most annoying thing about the weak and strong ref discipline is that one must create a polymorphic close method that obviates the need for C++ destructors, since by the time an object is destroyed it had better be already closed, in which case there's nothing to do in the destructor. So a destructor can only useful assert that close has already happened.

     The weak and strong ref discipline is not just theory, because I've used it before, and it is currently being used by Mork in the mail/news tree.

08apr01 Sunday

biscuit box formats

     Here are a couple newish header files for gc box formats. They average about fifty percent explanatory comments. I'll add remarks that might lead you to read the comments.

     ybRbBox.h explains some size and refcounting issues. The 16-bit size slot is related to gc chapter block limits. More complex refcounting conventions are also addressed. Gc boxes are obviously collected and not refcounted. But the relation to other refcounted objects is defined.

     At least three different interesting refcount issues occur. One general and two specific issues. First the general one. I explain how refcounting is a quality of refcounted slots. For a refcounted object R, we don't all refs to object R. We only count refs to R when they occur in specific slots.

     Presumably, uncounted refs are covered by counted ones. That is, uncounted refs occur under safety of counted refs. We identify a set of slots that always count object refs. Then we meticulously count all ref changes in those slots. But we don't pay any attention to extra refs elsewhere.

     Now let's consider two specific issues of ybNode refcounts. First, a box can ref a node. Second, a box can be a node. Some refs from boxes are counted (as either soft or hard). In this case, the ref is released when such a box is collected. Another kind of box ref is not counted, so gc is immaterial.

     The second case, when a box is a node, has consequences. Box pointers are freely copied between slots without counting. So if these boxes are nodes, it means they're not counted. But any given box might survive collection indefinitely. So a node box must live forever, and never hit zero refs.

14jul01 passport boycott

refcount bookkeeping

     Interned backtraces can be used to find refcount leaks. Each backtrace (aka stackcrawl) needs two more slots. These slots count all rising and falling usage events. Each ref increment must be matched by a decrement. So we want each increment to generate a backtrace.

     Ideally, a refcounted slot is really stored as a slot pair. When a counted slot is filled, it pairs with a backtrace. So when released, the backtrace can record decrements. It's best to have a special class for the (ref, trace) pairs. So nothing alters when backtrace watching is disabled.

     In order to find ref leaks, you scan the backtrace table. When adds and releases don't balance, it's a red flag. This kind of instrumentation supports decisive analysis. Any leak is revealed in terms of backtraces responsible. It's absolutely cut and dried, yielding perfect refcounts.

     Standard memory tools don't help with refcounts at all. Bookkeeping analysis must audit allocation interfaces. With refcounting, it's the addRef() and release() API. This is what I whipped together for Pivia one week. Sooner or later I also need it in all personal projects.

acyclic refcounting

     Netscape engineers seemed disinclined to get this idea. Although it was their job to grasp this, they didn't. Instead of stupidity, it might have been simple arrogance. Arrogance can incapacitate almost as well as stupidity. Maybe I should have used sentences of ten words only.

     This piece is motivated by a conversation I had at Pivia. I was telling the engineering VP about refcount practice. I claimed no one now did it correctly as I had figured out. However, my way only applies to pervasive refcounts. When you have hairy data graphs, mine is the only way.

     It's unneeded for simple graphs that are naturally acyclic. With simple graphs at Pivia, my way is not necessary. But for complex document app systems like a browser? With no garbage collection, my way is required there. Otherwise you get refcounting that doesn't quite work.

     The VP's interesting question was whether I published. I said kinda, sorta. You mean, in a professional journal? Yeah, he wanted to know if I'd published it in a journal. Nope, there's no dead tree edition. Just web and usenet. However, I'd never really tried hard to convince anyone.

     Why not? Well, the right way just seems obvious to me. And I'm not an academic sort. But I should write a paper. Of course, writing about it now interferes with that goal. Because refereed journals prefer to review new materials. But communication is my goal, and not fancy references.

     Mithril does the kind of refcounting I describe below. And at the end, I include an old Netscape usenet posting. An introductory excerpt is also reachable from runtime. But right now I'll give a short summary in clear terms. A main idea is that two counts are necessary, not one.

     Most of what you know about refcounts is still true. But one standard piece of refcount folklore is wrong. This lie, or mistake if you will, blinkers further ideas. Folks say cycles in refcount graphs cannot be collected. This is wrong when refcounting right with two counts.

     The background context is traditional garbage collection. Two different styles are refcounting and stop-and-copy. Counting reclaims old space faster, but is very tedious. And in typical systems with one count, a cycle "leaks." A cycle causes a self-referencing chain of nonzero refs.

     It doesn't happen with two counts and clear definitions. The problem with one count is confusion of two issues. Coders normally conflate two things best kept separate. Using space and referring to space are independent. If you confuse them, you get unresolvable dependency.

     But if you count them separately, all problems go away. Say you have cycled objects r => A => B => C -> A. Notice => and -> are different. This is very important. => means uses and refers to both at the same time. An object being used is not allowed to stop working.

     In contrast, -> only means the lesser refers to alone. An object being referred to cannot be space reclaimed. We call => a strong or a hard reference (use and ref). But we call -> a weak or a soft reference (ref only). Two bars in = means both, and one in - means ref only.

     This logical model is done physically with two counts. Counted items have a count of uses and a count of refs. Adding a -> weak ref increments only the refs count. Adding a => strong ref increments both uses and refs. Furthermore, a strong graph must be without any cycles.

     The last statement is the key factor. It might seem a trick. Graph cycles were the problem, and now they're illegal. Can we do that? Duh, why the heck not? Yes, it works. If two objects need a relation, a weak ref does just fine. Strong refs are only needed to show ownership relations.

     Note this means use and reference lifetimes are different. When uses go to zero, an object halts (stops working). One effect of halting is that all uses and refs are released. Notice a halted object can no longer be part of any cycle. Use graphs are acyclic, so cycles can't hang object halting.

     The good part is that halting does not cause reclamation. That's where we finally got away from evil dependency. When no longer used, an object halts even with more refs. No weak referrers will crash since memory is still good. It allows cyclic graphs to unwind without dangling refs.

     Using strong refs, multiple owners share owned objects. With weak refs, objects can connect without ownership. Concepts of ownership are largely an application view. It requires high level semantic knowledge from coders. Low level runtimes have trouble automating ownership.

     Strongness or weakness is an attribute of variable slots. Some variables are always strong, some always weak. (If you don't use this convention, things get messy.) Old refcounting systems would have needed this idea. This clean model did not evolve in the idea's absence.

     Let's go back to the example, r => A => B => C -> A. What happens when root r releases the strong ref to A? If these are the only refs, this graph totally unwinds. The one weak ref, C -> A, does not keep them around. A full description of events requires one more rule.

     When a strong ref releases, uses go down before refs. Inside the strong release, a strong ref becomes weak. This is another key idea that prevents more problems. When an object is halting, the refs can never hits zero. Because refs is not decremented until after the halt.

     In old systems, some objects free themselves too soon. This is deucedly awkward, but avoidable with hacks. But hacks are unnecessary with a two counter system. When r's ref goes weak we get r -> A => B => C -> A. Then when A's uses hit zero, we halt the object A.

     During A's halt, it releases the strong ref to object B. When A's ref to B goes weak: r -> A -> B => C -> A. Then B's uses hit zero, releasing the strong ref to C. When B's ref to C goes weak: r -> A -> B -> C -> A. Finally during C's halt, it releases the weak ref to A.

     As we start to unwind, we now get: r -> A -> B -> C. Notice lots of recursion is going on in all this halting. In the graph show, we're inside halt() for A, B, and C. But now we are unwinding, and will decrement refs. r -> A -> B -> C becomes r -> A -> B, then r -> A.

     Each time our refs goes to zero, we free the object. C, B, and A all go to zero refs in turn and get freed. In the end, the entire graph r owned is halted and freed. And all it had to do was release the one strong ref to A. The same unwinding works for big and complex graphs.

21dec01 support for interpreting

gc and resourcing

     A couple days ago, Jean-Francois Brouillet sent me this link. I'm no longer quite awake enough to write about it tonight. But this is right, garbage collection is not enough all alone. A conscientious programmer must allocate memory anyway. So refcounting practice also applies in collected languages.

Jean-Francois Brouillet wrote: One of the best (if a bit verbose, and at times too digressing) expositions of the issues a ``real'' programmer has to face, GC or not GC [and the explanation of why, oh why I hated the java.awt.Color class so much: no possible ``resourcing'' as Duane puts it]

     A good code interface does not set usage policy for callers. The most important usage policy is for allocation of objects. So object allocation with no choices at all is very bad form. A mechanism layer must let callers decide where objects live. When this rule is not obeyed, engineering is lousy at best.

     Soon I'll quote something from that link and say more on this. It's possible to write code that does not generate much garbage. This leads to better performance when gc is thereby avoided. It's considered standard expert practice in dynamic languages. Libraries preventing the practice are second rate junior efforts.

24aug02 separated at birth

runtime infrastructure

     I wrote some basic C++ wrappers for pthread mutexes and threads. These are vaguely similar to C++ wrappers we use in my day job. I don't need them right now, but I was trying to warm up to coding. Then I started working on the new Mithril refcounting infrastructure. I picked up where I left off a while back, and instrumented class ybNd.

     That's the new simpler node class I use for base reference counting. I added hooks for automated auditing with interned stack backtraces. I wrote a C++ wrapper for backtrace code I wrote a few weeks ago. This is the same kind of auditing I normally do for memory auditing. It involves annotating either blocks or references with the backtraces.

     That way any block or reference leaks can be pinned on a guilty party. Right now I'm much more interested in auditing the node refcounting. I'm doing a lot of shoot-from-the-hip coding today. Going fast helps. It's boring to do this stuff again. I just need to be careful so it works. I'm reworking green memory management for an Ag virtual machine.

25aug02 daydream believer

     I've been coding a lot, and now I'm going to bore you with results. I only just finished some timing tests, and now it's too late for more. My goal today was to instrument refcounts with backtrace capture. But using interned objects that can capture the population statistics. I still have a lot of existing Mithril code to rework using class ybNd.

mutex debugging

     I wrote my mutex wrapper slightly wrong, but I debugged it today. I decided to make my backtrace interning utility fully threadsafe. Later that will be useful when coping with multithreaded contexts. I added compiler switches to turn off the mutex usage to save time. I wanted to see the refcount instrumentation overhead in isolation.

static destructors

     I wanted to print a table of all interned backtraces on my app's exit. So an object which fires at static destructor time is useful for this. But I didn't have a good way to make my standard sinks available. So I wrote new global (static) accessors to create then on demand. When dynamically allocated on the heap, they won't ever go away.

     And I went ahead and made creation threadsafe with new mutexes. That will be useful later. They only lock a mutex when it's necessary. (I'll let you consider how you can check values without locking them.) While I was at it, I added a mutex to file sinks for cooperative locks. This is useful for reserving a shared sink under multithreaded usage.

node migration

     I started the node migration from old class ybNode to new class ybNd. To start, I made one of the old subclasses use the new superclass. This was the tracer class I use for formatting call tree diagnostics. To accomodate this, I had to alter the creating context a fair amount. As a side effect, I reworked heap interfaces universally in the code.

     This caused most uses of old class ybHeap to become new ybHp instead. The old heap class is still used, but only in few places knowing more. Then I spent a long time getting the new codebase to just run again. So I got my basic testing done with one node class fully switched over. Then I decided to instrument and test the refcounting used by ybNd.

pins and arcs

     A pointer to a node is basically an arc. Some arcs are refcounted. Not every pointer to a node is refcounted. Only some of them are. I already had a special pin class to handle all ybNd arc refcounting. (Actually, there's two flavors for both hard and soft refcounted arcs.) The use of pins in Mithril will be pervasive for all node refcounting.

     This is because it's feasible to turn on automated refcount auditing. The pin class can associate an interned backtrace with a node pointer. (But to save space, this slot is not there when auditing is switched off.) I wanted to see this actually work, and I wanted to see how fast it ran. So I wired up all the auditing and wrote tests to measure the overhead.

refcount instrumentation

     By the way, I did all this today. I don't care how that sounds to you. The interned backtraces show in the output below, at the very end. This is the backtrace method I wrote but in a cacheable object form. I ran output below through my ytr tool to escape the HTML markup. I use a lot of pseudo XML format for output purposes, so this helps.

pinTimeTest(10000): msElapsed='27.512' us='27512' usPer='2.751' pinTimeTest(): mutex:on/off='49997:49997' path:count/sum='10000:100000'

     The two lines above turn into lines in blue below with mutex disabled. You can undefine switches to make backpath capture non-threadsafe. Since I'm testing single threaded now, I wanted to simplify the times. So the numbers below remove overhead associdated with mutexing. The difference is quite small, though. I wish numbers were consistent.

     What did those fifty thousand on/off call pairs for mutex come from? I had a loop which wrote one of two node refs into a pin, 10,000 times. I took turns writing one node then another, to force cut/add ref pairs. With instrumentation enabled, each time I interned a stack backtrace. So five mutex locking events were caused by these specific operations:

  1. mutex the factory to allocate a backpath object from the free list.
  2. mutex the hashmap to add the new backpath with a backtrace.
  3. mutex the factory to deallocate the duplicate backpath just used.
  4. mutex the newly found backpath to add usage for the ref added.
  5. mutex the old pin backpath to cut usage for the old ref removed.

     Here's what some of the timing numbers mean in the output below. Let's use microseconds as our standard, using 'us' as units notation. (Keep in mind this is all on a 800Mhz TiBook running osx 10.1.5.) Lock and unlock a single mutex: just over a third of a microsecond. Using a C++ object's destructor for safe auto reap is slightly slower.

     Add or cut a reference a node: just over a twentieth of a microsecond. But doing them both together in a tight loop is, um, four times as long. You'd expect twice as long. Maybe this is a caching effect for the code. The five mutexes for interning a backtrace is less than 5 times 0.33us. Only 1.1us of time drops off the time for setting a new node in a pin.

     Setting a different node in a pin takes about 1.7us when instrumented. This is an addref, a cutref, a backtrace capture, and a backpath interning. That includes a CRC-32 of the 40 bytes worth of backtrace array slots. And it includes all the factory and hash table work for finding a match. This sounds fast for instrumentation to me, and it won't happen often.

13sep02 calculus for dummies

mithril landing

     Today I was finally able to link Ytterbium code with many changes. I'm not out of the woods yet, since the runtime cleanup isn't finished. I completely moved aside old Biscuit code and old node class code. (This old code is just in an "attic" directory now for future reference.) This broke a few remaining uses that I finally cleaned up completely.

     Now I'm switched over to my new refcounting (it's similar to the old). I can see why it's almost impossible for folks to alter their runtimes. I did way too much work in this without even redesigning anything. Just for laughs, I ran the recent atom unit test to see what happens. The soft failure generated a backtrace using my backtrace generation.

     (My code fails with assertions much more often than it ever crashes.) I had neglected to allocate space for the box flap in boxed class stubs. So a dynamic type check failed when booting the class object system. I might be able to straighten it out tomorrow, after going on holiday. We're going to a Renaissance Faire, so I might not code tomorrow.

22sep02 silent labor

events

     However, I will be coding events since they're a basic part of Mithril. I worked on it indirectly tonight, starting with memory managment. Well, it's in common Ytterbium stuff that IronDoc and Mithril share. Everything grounds in memory management, of course. Events too. Right now I've defined a bunch of page block allocation interfaces.

     Mithril arenas allocate large chapter blocks for garbage collection. A smaller granularity for re-usable, uniform sized block is needed. My new page heap interface can be implemented on top of arenas. Chapter blocks can be divvied up into smaller aligned page blocks. But that's not required, because I try to avoid unnecessary coupling.

     So I have a couple more primitive versions that do the simplest thing. The base allocates and frees page blocks outright without alignment. Another subclass allocates hunks (aka slabs) for page suballocation. You can choose the right page heap to use depending on the context. I allow (dynamically) for both threadsafe and non-threadsafe usage.

     Elsewhere I say gc space is "gold" memory, and C++ space is "green." Pages are for re-usable green memory allocation with uniform blocks. Various types of both btrees and blobs will use pages for internal state. I try to keep state out of collected gold memory when I want speed. Messaging is inter-city, so events must live mainly in green space.

     They'll need to be refcounted, and have a wrapper box in gold space. Or they could be converted on demand to gold box format if desired. In any case, since events are variable length, they'll use page blocks. So I'm whipping up a worse-is-better blob class using a page substrate. But it's an inflexible shadow of the better IronDoc btree blob format.

     That's okay because it's simpler and can help test the IronDoc blobs. In principle, later versions of events might prefer a blob btree format. It would certainly simplify editing the events, with copy-on-write too. But right now I don't need editing finesse. I just need simple results. I'm just being pragmatic, since it tends to get things done much faster.

26oct02 crocodile hopping

refcounting

     An old coworker, Seth Spitzer of Netscape, has been investigating the current practice of weak references in the Mozilla codebase, so we can see whether I'm full of, uh, shinola based on my criticism of using only one count and not two (use and ref counts both) for refcounted objects. Today he emailed a description suggesting both counts are present, but distributed over multiple objects.

     Instead of putting both counts in one object (the one being refcounted), something complex happens instead. An associated "weak ref" object counts weak refs separately, and this weakRef object and the main object point at each other and play proxy games. If you consider them together as one object (merge them into one node in the graph by considering their arcs to each other as zero length), then you end up with both counts in one object.

     If his description is right, then the Mozilla model might be equivalent to mine, but using the ugly and inefficient tactic of modeling a simple integer as an entirely separate object which gets allocated separately and adds interface glue letting it pretend to be the first object. The smell test suggests a little bit of architecture makework is involved, but it doesn't sound broken.

     Argument seems unproductive. Tell me what you want me to plead so we can let it go. :-) My original thesis was that Netscape engineers did not grasp the two count concept, as in two simple integers. Now we see one of them is an XPCOM object with proxying. Is that close enough for horseshoes? Hmm, I still don't think folks realized the point was two counts, but it looks like the job is getting done despite complex design style.

     I won't concede my attribution of arrogance is wrong. However, this makes me arrogant too, and probably this incapacitates me in some way resembling stupidity, especially along some social dimension. I don't need to be right, and I don't want to piss off other folks. But I like correctness and simplicity in memory management. I had thought Mozilla's refcounting was crippled, but it appears to be merely complex instead. I was half wrong.