Þ   briarpig  » thorn  » demos  » pile


demos are explained here; a menu at top column right indexes actual topic demos. Here we demo pile.

problem «

     To manage memory for garbage collecting dynamic languages, Wil wants a new kind of heap dividing memory into re-usable pieces he can use to create lightweight processes with disjoint address spaces. You can see a little of this described in the book demo Several years ago Wil came up with a plausible first draft design described here, which hasn't yet been used in production. Nor has any code like this been used in a day job, though large uniform-sized memory blocks for buffering distantly resemble parts of i/o page caches Wil wrote in the 90's.

     (And in fact, Wil will probably write another i/o page cache using either chapters or books in some later demo for implementing btree indexes or file systems is a file. But that's not relevant here, even if this demo makes it easy.)

     Instead of a virtual memory composed of (typically 4K) pages, Wil wants a larger size of pooled memory buffer containing a power of two number of bytes an aligned address. The book demo describes a typical format for a 64K super-page aligned to a 64K address — or variations in size for larger books.

     Future toy languages will allocate garbage collected memory in books allocated from the yPBH class shown on this page, whose name means thorn Page Book Heap; it subclasses the basic yH heap api shown in the heap demo, and adds new methods to allocate yP pages and yB books, and other similar objects of other sizes (eg yK kilos and yC chapters).

     But pile is a short nickname for yPBH.

pile «

     Wil uses the term pile as a terse one-syllable synonym for the yPBH class shown on this page, because pile is a synonym for heap, and works describing a pile of pages and books, which are the primary entities managed by a pile.

mechanism «

     The pile api focuses on mechanism for allocating pages and books and similar objects of varying size, but has almost nothing to say about policy, such as what you might use pages and books for (whatever you like), or how one might impose resource use constraints like: don't allocate more than some limit from a source heap. (yPBH allocates raw memory from another arbitrary yH instance passed to the constructor.)

     So although Wil plans to use books for garbage collected language address spaces, among other things, nothing below exlains how that might be done, or how memory usage in a language might later be controlled by user driven policies about desired footprint and tolerances for special situations. All those issues are higher level concerns, and none of yPBH's business, as a simple matter of good architectural separation of concerns, let alone incremental coding and Wil's peace of mind.

     Policy is something you add later by composition. So a pile manages pages and books; some other code decides how much to allocate, and what purpose is served by each allocation.

     Some messy questions are simply not addressed. For example, how long should a pile live? How can you ensure a pile lives as long as the objects it manages? It wouldn't do to have a page or book free itself by using the allocating pile if that pile has already been destroyed (which would actually free every page and book inside, so trouble might start earlier).

     The pile api provides no answers. That's all "policy" and must be answered by applications and/or frameworks using pile instances for some purpose. Presumably more api will need to be added so apps can handle some situations better. (But you might simply use the node api if you want a refcounted heap — just put yPBH inside a node and ensure a handle is kept safe as long as the pile is still in use.)

     In the meantime, if something bad happens when you use a pile in some pathological manner, Wil has the following useful advice applicable here and elsewhere: don't do that.

threading «

     The original pile implementation was not threadsafe, because Wil didn't need it to be threadsafe yet, and Wil imagined uses of a pile that don't need thread safety. (Maybe later Wil will write a demo showing how to turn any piece of code you fancy into a thread-safe version just by following a few rules and conventions. But that won't happen here.)

     However, yPBH is a heap subclass, and most folks are used to thinking a heap is automatically and necessarily thread-safe, since typically "the heap" is a global shared resource that must service concurrent requests in multiple threads. Don't let past associations confuse you: not all heaps are the same.

     But at the last minute, Wil asked himself "Why not?" and added mutex support so yPBH can be used with multiple threads. This was almost added with a compiler switch so thread-safety was only present when preprocessor symbol yPBH_MUTEX was defined. But there seemed little reason to ever remove the mutex once it was added, so Wil removed the conditional compilation. However, to remove it again, all you need do is comment out every bit of code where member h_mutex is used.

pile mutex «

     When this demo was nearly complete, Wil decided to make yPBH thread-safe by adding use of member h_mutex for mutual exclusion via the yx mutex object shown in the mutex demo, which is basically a thin C++ wrapper around pthread_mutex_t from the Posix pthreads library.

     This addition was quite easy. Let's describe it here as an example for you to imitate in the future when making objects threadsafe. First Wil added h_mutex to the class declaration, and made sure each constructor mentioned it by name. Wil declared this member variable using a mutable keyword so const methods — like print — can lock the mutex without casting away const.

     Then Wil needed to find public method entry points needing mutex before mutable yPBH state is read or written. In these Wil added a line that looks like this:

yxg guard(h_mutex); // thread safety

     This usage locks member h_mutex where guard is constructed, then unlocks it again when guard is destroyed at end of the containing block scope (usually end of method).

     Note this particular mutex implementation does not allow re-entrant locks by the same thread, so once h_mutex is locked it can't be locked again by the same thread without deadlocking.

     So Wil had to make sure the mutex would be locked once and only once when public entry points are called. This happened to be easy because methods needing mutex did not call each other. But if they had, then Wil would have needed to break those common pieces of code into inside and outside parts, where inside parts are only called from code already holding the mutex.

     This is basically a boundary logic problem, which is rarely discussed in programming circles. In fact, Wil doesn't remember talking about boundary logic with other programmers in the last fifteen years. Hmm. Wil first encountered the idea when discussing virtual world representations with folks looking at Spencer Brown's somewhat strange Laws of Form for clues in managing spatial integrity. Wil was told folks in biology used this form of logic to discuss cell integrity. Anyway, one of the basic ideas is distinguishing between inside and outside, carefully tracking when you pass through the membrane of a boundary. (Parts of Brown's odd notation are elegant representations of crossing boundaries of circles in Venn diagrams. If you like, think of it as viewing logic diagrams from the perspective of inhabiting a moving point traversing divisions between spaces, instead of having a global viewpoint as when reading a Venn diagram of circles denoting sets.)

     Was that too big a bite for you to chew? Okay, let's simplify. Your object api presents one or more paths to reach the inside of your object where state must be protected by mutex. Each of these paths must be guarded by a lock which acquires the mutex. But you must only do it once when passing the membrane to the inside. If you lock again when inside, you'll just deadlock because you're already holding the mutex.

     Using a yxg guard to acquire the mutex is so simple because you're just declaring the point at which the membrane is entered where you'll need mutex once inside. Think about it spatially and you won't have any problem.

     The boundary between inside and outside is not identical to the division of public and private methods, but it's close and somewhat related. You must think and not mutex blindly.

mutex optimization «

     Because yPBH wasn't designed and written with thread safety in mind, current places where mutex is acquired might not be most efficient despite being safe and correct (Wil thinks).

     A potential latency problem exists when yPBH asks the source heap for more memory. But this effect is likely to be small and not worth attention unless the source heap can be very slow compared to normal heap latencies.

     Wil expects users of yPBH to avoid mutex costs by allocating books and pages and then hoarding them locally for use by a single thread. This is how some lightweight processes for dynamic languages can use yPBH: as a source of books stored in yBqm address spaces (cf «) used by a single thread, because lightweight processes can communicate by messages instead of sharing mutable state. But both are also easily arranged.

arena «

     A yPBH pile is actually a kind of arena-style heap, holding some memory allocated until the pile instance is destroyed, at which time all memory still allocated is finally freed all at once. Pages and books, for example, are held in pools inside yPBH until the destructor is executed. The source yH heap passed to yPBH's constructor won't get large blocks allocated back until yPBH is destroyed. But keep the following exceptions in mind:

     Currently all inherited virtual yH methods — hmalloc(), hcalloc(), and hfree() shown in the heap demo — all simply delegate to the source yH heap, so each hfree() is freed immediately: that memory won't be pooled. Note it could be pooled; that would be easy. And maybe that would be useful in some versions of yPBH. But that doesn't happen now. Think of that behavior as an optional contract with your local yPBH implementation; but not this one.

foreign function interface «

     The pile api shown with associated models to manage pages and books is part of the foreign function interface for toy dynamic languages written in þ, as is the refcounting api shown in the node demo. These interfaces explain where memory comes from and how long it lives in dynamic languages written under þ. Extensions written in C++ (and C if you must, but Wil thinks you're crazy) must take these into account.

     Garbage collected memory in particular is tricky, especially when it can move in memory when collected by copying. According to Wil's current plan, only memory allocated inside a book can move, and only when that book is in a garbage collected address space. This will be the subject of other demos, like the near future stack demo showing a C++ object for holding gc roots into garbage collected books.

     Another simpler object (TBD) will support holding a single gc root inexpensively, so code written in C++ can safely refer to a boxed gc node in collected memory that can move.

     References in the other direction, from gc memory to non-collected memory, are simpler to support — actually trivial in many cases — when referenced memory doesn't move. When a non-gc object x lives indefinitely, like a static global C string, a boxed object in gc memory can simply point directly at x without interfering with gc. If (non-gc object) x masquerades as a gc format boxed object, which Wil expects to do often, x need only have a suitable header with expected alignment; it won't be copy collected because the book of x won't be in the source address space described by yBqm from the book demo enumerating all books in a given gc address space (cf «).

     When a gc boxed object refers to C++ object refcounted using a node api handle, this will require a specific type of box format with a ynh node handle inside, which can be bitwise copied safely from one gc location to another while maintaining integrity.

alphabet soup «

     The pile api is riddled with letter-coded names that might seem unfriendly compared to long explanatory names. But yPBH's public api of interest is very small. Other than hmalloc(), hcalloc(), and hfree() inherited from the base yH heap api (shown in the heap demo, cf «) the remaining methods are mainly about allocating and finding yBn, yPn, and yKn: nodes for yB books, yP pages, and yK kilos.

     So the public api (near the end) is little more than:

  • hBn() and hfindB() allocate and find book nodes
  • hPn() and hfindP() allocate and find page nodes
  • hKn() and hfindK() allocate and find kilo nodes

     Of course, as common to most þ objects, you can also ask an instance if it looks good, and you can print each to an out-stream. But the public api is tiny and highly parallel since it just repeats itself for different object types allocated by the heap. When a 16K chapter object named yC is added, then yPBH will sprout new methods hCn() and hfindC() to allocate and find chapter nodes. Confusing? Not at all.

     You might feel names allocBookNode(), allocPageNode(), and allocKiloNode() are vastly more clear than hBn(), hPn(), and hKn(). But Wil disagrees since the purpose of yPBH in the first place is to allocate these, so alloc is implied. In this matter of taste you're welcome to rename methods in your clone where no one gets to argue with you.

     Names used internally are more terse than they need to be however. But there aren't many types used internally either. Most of the internal state of yPBH is just lists of the yl32 variety shown in the list demo (cf ») which derives from primitive yq: a simple double ended queue (cf ») to which yl32 merely adds a length member.

     Most yPBH list members have names of form h_lxY where Y is format — P for page, B for page, and K for Kilo — and x is the state of nodes in that list: f for free, u for used, h for heap, and r for raw (ie unconstructed). So member h_lfB means list of free Books and would instead be named m_freeBookList in most of Wil's day jobs. But when Wil writes code for himself, he cuts corners when a shorter name style suits a desire to see more code at once in a smaller space, with fewer distracting words hiding the forest with verbal trees.

     Let's face it: member names aren't going to give you real trouble. Instead, some of the methods might bend your mind just a little, and different names for lists won't help you much. Wil knows exactly what the names mean, and subtle inter-method dependencies still make him pause.

comments «

     Code comments are just for you: Wil doesn't need them — until many months go by and Wil tries to reconstruct exactly what he was thinking. Okay, so Wil does need them. Wil's philosopy of code comments revolves around a belief comments can re-express what code does — especially why it is done — in sufficiently redundant detail without repeating exactly what you see in code, so you can verify what you think code does matches what the author meant the code to do.

     If comments are not redundant enough to be potentially inconsistent, then they aren't useful because they don't predict anything testable. A developer reading code should be able to form a theory based on what comments say which can be born out by testing later when more code is read, studied, correlated and analyzed. When you see something odd in code that might be a bug, how can you be sure the author didn't mean that odd thing to occur? If a code author says something entirely different in a comment, that's a sign the code might have a bug. Or the comment might be stale, but that's the price of redundancy.

     Wil would rather have comments that are sometimes wrong than lack more clues to help show when code is wrong. A clue in comments that helps you find and fix one nasty bug is worth ten incorrect comments.

     When Wil reads code written by others, he usually finds bugs almost immediately. But it's really hard to be sure without comments explaining the author's intent. Reading large bodies of code with no comments, Wil often feels like he's solving what should be N equations with N unknowns, if the code just barely manages to work correctly. But what's more common is N equations with N+2 unknowns or even (shudder) N+10 unknowns due to obvious collective bugs hard to localize to specific lines without some incriminating intent in the form of comments to narrow a search for meaning in symbols.

     When Wil learns a codebase written by someone else, Wil asks questions when the original author is still around, to explore what happens when certain edge conditions occur — because no comments suggest what might happen, and nothing in code addresses the issue. More than half the time (sometimes much more) the question is news to the code author, and the absence of an answer is a bug: the code does the wrong thing under that condition. Folks tell Wil, "You should fix that."

     When Wil sees code without comments, he usually assumes an implicit message from the author is a message like this: "You can't prove I don't know what I'm doing if I never say what I think I'm doing." No, the proof just takes longer.

     By adding lots of comments to code, Wil is kinda hoping you'll notice an inconsistency and report it so Wil can fix a bug, because his self esteem isn't threatened by making an error now and then. Whereas bugs in code are very irritating.

helper classes «

     The following helper classes are used internally by private yPBH pile methods, and might later be used in some public api. So they're defined globally where you can see them, instead of inside yPBH as private nested classes. However, you can safely ignore them unless you want grasp all of yPBH's code. Each represents a hashmap subset corresponding to the bucket to which a key hashes.

     These helper classes end with z for slice even though they do not subclass the slice api; but they do have the same z_p and z_n member names as yz. However, yPBH only uses the z_p member so far — putting the rest of the helper class clearly into overkill territory, except Wil has a vague idea these might be useful in iterators later.

struct yBHz «

     Class name yBHz means vaguely book heap slice and refers to a hashmap bucket for a linked list of yBn book nodes, where one might search for a target or push a new member.

     The z_n member is included since a general class based on a pointer ought to always have an associated length, in case its useful. (Not too expensive as yagni goes.)

/// yBHz is a slice (not subclass of yz: maps are unordered) struct yBHz { // slice of yPBH is ONE yPBH bucket: elem list « typedef size_t sz_t; // size type « typedef yBn e_t; // elem type «

yBn** z_p; // buckets vector (point in buckets vector) « sz_t z_n; // vector length (meaning depends on context) « // 1) on new bucket vector alloc, z_n is vector length. // 2) other uses of yBHz use z_n as convenient

yBHz() : z_p(0), z_n(0) { } « explicit yBHz(yBn** p) : z_p(p), z_n(0) { } yBHz(yBn** p, sz_t n) : z_p(p), z_n(n) { } }; // struct yBHz

     Note yPBH just uses yBn** z_p, and only internally for book map operations.

struct yPHz «

     Class name yPHz means vaguely page heap slice and refers to a hashmap bucket for a linked list of yPn page nodes, where one might search for a target or push a new member.

     The z_n member is included since a general class based on a pointer ought to always have an associated length, in case its useful. (Not too expensive as yagni goes.)

/// yPHz is a slice (not subclass of yz: maps are unordered) struct yPHz { // slice of yPBH is ONE yPBH bucket: elem list « typedef size_t sz_t; // size type « typedef yPn e_t; // elem type «

yPn** z_p; // buckets vector (point in buckets vector) « sz_t z_n; // vector length (meaning depends on context) « // 1) on new bucket vector alloc, z_n is vector length. // 2) other uses of yPHz use z_n as convenient

yPHz() : z_p(0), z_n(0) { } « explicit yPHz(yPn** p) : z_p(p), z_n(0) { } yPHz(yPn** p, sz_t n) : z_p(p), z_n(n) { } }; // struct yPHz

     Note yPBH just uses yPn** z_p, and only internally for page map operations.

struct yKHz «

     Class name yKHz means vaguely kilo heap slice and refers to a hashmap bucket for a linked list of yKn kilo nodes, where one might search for a target or push a new member. (Yes, that's right: yKn doesn't appear in a demo yet. It's for yK objects of 1K size and 1K alignment.)

     The z_n member is included since a general class based on a pointer ought to always have an associated length, in case its useful. (Not too expensive as yagni goes.)

/// yKHz is a slice (not subclass of yz: maps are unordered) struct yKHz { // slice of yPBH is ONE yPBH bucket: elem list « typedef size_t sz_t; // size type « typedef yKn e_t; // elem type «

yKn** z_p; // buckets vector (point in buckets vector) « sz_t z_n; // vector length (meaning depends on context) « // 1) on new bucket vector alloc, z_n is vector length. // 2) other uses of yKHz use z_n as convenient

yKHz() : z_p(0), z_n(0) { } « explicit yKHz(yKn** p) : z_p(p), z_n(0) { } yKHz(yKn** p, sz_t n) : z_p(p), z_n(n) { } }; // struct yKHz

     Note yPBH just uses yKn** z_p, and only internally for kilo map operations.

pile class «

     Class yPBH is a subclass of yH from the heap demo whose main purpose is allocating yB 64K books from the book demo and yP 4K pages from the page demo. So the name means Page Book Heap. The short nickname of yPBH is pile — a synonym for heap — to suggest an image: a pile of pages and books.

     In addition to pages and books, the pile api also allocates instances of memory in formats with different sizes, but all aligned to a multiple of the format size. The yK 1K kilo format was recently added as an example, but its code has not yet been featured in a demo. A yC 16K chapter format is planned whose code looks like yK with a different size constant.

     Production code is often more useful when many different similar variations are present so a best fit is in hand when you reach for it. But filling out a full suite of useful possibilities is tedious — this api is still on the light and sketchy side. However, to avoid memory fragmentation, you should plan to avoid creating too high a watermark in one memory format that you'll never reach again, since this can reserve pools of memory in that size never used again. (This would ironically counter some of the intended benefit of this api.)

class yPBH «

     Public methods appear toward the end of this class declaration. (Method implementations shown in the right column are defined as close as possible to code use sites in other methods, in the hope some cache line locality will occur.)

     Wil's ordering in a C++ class api goes slightly against convention: it's no longer necessary to declare things before they are used in a C++ class, but Wil still likes to define private state and methods before other methods using them.

     If you prefer to read public before private, you can read the class api backwards: here's a link to the end.

class yPBH : public yH { // page/book heap for yK, yP, yC, yB «

public: // 64 books: 4M, 128 books: 8MB etc when kBsz is 64K // alignment to kBsz is done by *dicing one*, make this BIG: static const size_t khunkBooks = 128; // 8MB yB's per hunk « typedef size_t sz_t; // size type «

     Size type sz_t is probably unsigned, but it's defined here as standard size_t to match typical interfaces used to alloc memory. Some folks call this silly abbreviation.

     The number of books allocated together at one time in a single hunk of memory is khunkBooks: 128 in a row at a possibly non-aligned memory address, out of of which 127 aligned books can always be carved by splitting off ends. The comment says 8MB is allocated at one time — this assumes a yB book is 64K in size, but that constant can change and make this comment wrong (until someone fixes it).

static const sz_t khunkBytes = khunkBooks * yB::kBsz; // 8MB« static const sz_t kprime1k = e1i_1021_1k; // prime < 1K « static const sz_t kprime8k = e1i_8191_8k; // prime < 8K « static const sz_t kprime16k = e1i_16381_16k; // prime < 16K « //static const sz_t kprime64k = e1i_65521_64k; // prime<64K

     The actual number of bytes in a hunk is khunkBytes, which doesn't assume a 64K book size; it uses the proper formal constant defining book size.

     The various prime hashmap sizes listed above are taken from the primes demo, used to fit the most 32-bit pointers into a given format as possible in a prime bucket count. (When pointers are a different sizes — say 64 bits — then different prime constants must be used. This code has not been ifdef'd to automatically make a switch.)

private: mutable yx h_mutex; // mutable during const methods «

     Mutex h_mutex is a thin wrapper around pthread_mutex_t from the Posix pthreads library, shown in the mutex demo. This mutex is locked in most yPBH public entry points, but not every single public method because some methods are already thread-safe as long as you don't call the destructor.

     For example, inherited methods hmalloc(), hcalloc(), and hfree() are implemented now by delegating to the h_H source heap passed the constructor. Since these methods don't use or examine state in yPBH, none of them currently need to lock the mutex.

yH* h_H; // actual allocator used by this heap subclass « yl32 h_lqve; // list of yqve: 1 per hunk allocated from h_H «

     Source heap h_H allocates all the memory — yPBH delegates all raw memory allocation to h_H, so you can choose whatever allocation scheme you like by subclassing yH.

     Every large hunk of khunkBytes bytes of memory allocated from heap h_H goes into list h_lqve, which captures each block of memory allocated in yv run format using a yqve element shown in the list demo (cf »). This h_lqve list remembers all the allocations so they can be freed in the destructor (and so they can be examined when debugging).

     The memory for each yqve element appended to list h_lqve comes from inside each 8MB hunk allocated — so each hunk is chained using space inside the hunk itself. This space isn't necessarily from the head of each hunk, but it could be. (One book's worth of bytes in each hunk, 1/128 of each, is always subdivided into pages and raw space for list elements and nodes. This is the source of bytes allocated.) As an obvious consequence, freeing each hunk also frees the yqve list elem inside for that hunk.

yl32 h_lrP; // list of RAW yPns never constructed; space « yl32 h_lfP; // list of FREE yPns in queue; yPn::e_nfree « yl32 h_luP; // list of USED yPns in queue; « yl32 h_lhP; // list of internal HEAP yPns; yPn::e_nheap « n32 h_nPfuh; // sum of yPn list lengths for f, u, and h «

     The four lists above contain yPn page nodes from the page demo (cf «) subclassing yqe from the list demo (cf ») specifically so yPBH can hold page nodes in exactly one of the lists above.

     The first list, h_lrP, contains raw nodes that have never been constructed, pushed into that list wholesale from a whole or partial page. Nodes in h_lrP reference no yP pages, and are invalid until constructed. This is where yPBH allocates nodes for the other three lists which do reference yP pages.

     Letters f, u, and h stand for free, used, and heap, indicating how pages in each list are used. When constructed, each yPn page node is coupled one-to-one with a 4K yP page, so each page node represents the status of the page it owns. Allocating a yPn node is the same as allocating its yP page.

     Any f=free page is unused with a zero refcount and is ready for allocation by hPn(). List h_lfP is a free pool of pages.

     Each u=used page has already been allocated by hPn() and has nonzero refs. List h_luP is a pool of allocated pages.

     Each h=heap page was allocated by the yPBH pile itself for internal use. List h_lhP is a pool of private heap pages which are never later returned to a free or used list. Typically heap pages contain nodes or arrays of hashmap buckets, or anything else yPBH needs for bookkeeping.

     Member h_nPfuh is a count of all pages in the free, used, and heap lists. It's the sum of lengths of lists h_lfP, h_luP, and h_lhP, and is equal to the total number of yP pages known by yPBH. The name means number of Pages: free, used, or heap — explaining the slightly awkward name.

     Some pages are allocated when each new 8MB hunk memory block is allcocated from source h_H heap. But as long as the list of free books is not empty, new pages are allocated by dividing a book into pages when free list of pages h_lfP is empty.

yl32 h_lrK; // list of RAW yKns never constructed; space « yl32 h_lfK; // list of FREE yKns in queue; yKn::e_nfree « yl32 h_luK; // list of USED yKns in queue « yl32 h_lhK; // list of internal HEAP yKns; yKn::e_nfree « n32 h_nKfuh; // sum of yKn list lengths for f, u, and h «

     These members for lists of yKn kilo nodes are identical to purpose and structure to lists shown above for yPn page nodes, so we won't repeat all that material literally with K substituted for P, even though it's roughly correct. Let's compress a little.

     However, unlike yP pages which are sometimes allocated directly from the 8MB hunk instead of a book, each yK kilo is always allocated from a yB book by dividing one book into as many kilos as fit in a book. This is done partly to foster more locality in kilos, since they all share a common subset of a process address space. But it's also for simpler, smaller code.

     List h_lrK holds raw yKn kilo nodes never constructed, having been pushed in that list from a page allocated to hold nothing but yKn kilo nodes. Nodes in h_lrK reference no yK kilos, and are invalid until constructed. This is where yPBH allocates nodes for the other three lists which do reference yK kilos.

     Any f=free kilo is unused with a zero refcount and is ready for allocation by hKn(). List h_lfK is a free pool of kilos.

     Each u=used kilo has already been allocated by hKn() and has nonzero refs. List h_luK is a pool of allocated kilos.

     Each h=heap kilo was allocated by the yPBH pile itself for internal use. List h_lhK is a pool of private heap kilos never later returned to a free or used list.

     Member h_nKfuh is a count of all kilos in the free, used, and heap lists. It's the sum of lengths of lists h_lfK, h_luK, and h_lhK, and is equal to the total number of yK kilos known by yPBH.

     Note yK kilos aren't used much yet. They were recently added as an example extension to show a yPBH pile can allocate objects in other formats. An early use Wil has in mind is a new version of yPq lists of pages (cf «) to avoid allocating nodes separately without incurring overhead cost of one entire page when the first list member is added.

//yl32 h_lrB; // list of RAW yBns never constructed; space « yl32 h_lfB; // list of FREE yBns in queue; yBn::e_nfree « yl32 h_luB; // list of USED yBns in queue « yl32 h_lhB; // list of internal HEAP yBns; yBn::e_nheap « n32 h_nBfuh; // sum of yBn list lengths for f, u, and h «

     These members for lists of yBn book nodes are identical to purpose and structure to lists shown above for yPn page nodes, so we won't repeat all that material literally with B substituted for P, even though it's roughly correct. Let's compress a little.

     Unlike pages which are allocated from books or the leftovers from allocating books from an 8MB hunk, all books are allocated solely from the 8MB hunks added to the h_lqve list. No h_lrB list of raw, unconstructed yBn book nodes is needed because yPBH simply allocates them all at once when an array of yB books is created, adding them all directly to the h_lfB free list of books.

     When constructed, each yBn book node is coupled one-to-one with a 64K yB book, so each book node represents the status of the book it owns. Allocating a yBn node using hBn() is the same as allocating its associated yB book.

     Any f=free book is unused with a zero refcount and is ready for allocation by hBn(). List h_lfB is a free pool of books.

     Each u=used book has already been allocated by hBn() and has nonzero refs. List h_luB is a pool of allocated books.

     Each h=heap book was allocated by the yPBH pile itself for internal use. List h_lhB is a pool of private heap books never later returned to a free or used list.

     Member h_nBfuh is a count of all books in the free, used, and heap lists. It's the sum of lengths of lists h_lfB, h_luB, and h_lhB, and is equal to the total number of yB kilos known by yPBH.

u32 h_htag; // e_hlive='yPBH' while still constructed « enum { e_hlive = 0x79504248, e_hdead = 0xbabedead };

     Member h_htag is a magic signature that must equal e_hlive while this pile is alive. Checking this value often helps detect memory corruption, early unexpected destruction, and unplanned early use before construction (which isn't as rare as you might think). Having a magic signature like this helps rule out some bizarre options when triaging odd system behavior. Checking it doesn't cost much when branch prediction expects the value is correct, as it always is when things go well.

yKn** h_Kpv; // vector of buckets of length h_Kslots slots « static const sz_t h_Kslots = kprime16k; // full book of yKn* «

     Array h_Kpv is vector of pointers to yKn used as a hashmap of list buckets to which yK kilo istances hash by address (with fewer collisions than you'd think — see the end of hash demo).

     The array has a prime number of buckets, and each bucket is a linked list of yKn chained through its n_n member which is only used by yPBH for this purpose. Every constructed yKn kilo node is added to this hashmap to support future lookups using mfindK().

     Today an entire 64K yB book is used for this hashmap when the first yKn kilo node allocation occurs using hKn(). This might be either overkill or underkill in size, depending on how many yK kilos are used. Under a theory many tens of thousands of kilos might be allocated, Wil errs on the large side to accomodate the worst by default. However, a larger map might be needed for enormous kilo populations. The right thing to do might be adding an api for construction that hints how many kilos might be needed. But for now, overhead of 64K for all kilos is not a terrible price to pay as a percentage of pile footprint. When Wil has time, he plans to grow this map over time as population size increases, starting with a page, then switching to a book and larger as needed.

yPn** h_Ppv; // vector of buckets of length h_Pslots slots « static const sz_t h_Pslots = kprime16k; // full book of yPn* «

     Comments about kilo map h_Kpv above also apply to page map h_Ppv. Except one book is always allocated for h_Ppv from the start when the first hunk is allocated.

     Array h_Ppv is vector of pointers to yPn used as a hashmap of list buckets to which yP page instances hash by address.

     The array has a prime number of buckets, and each bucket is a linked list of yPn chained through its n_n member which is only used by yPBH for this purpose. Every constructed yPn page node is added to this hashmap to support future lookups using mfindP().

     In the future Wil plans to add code growing the map above one book in size when the map becomes overloaded with members. A new hunk can easily carve out four or sixteen contiguous books to use as a page map if necessary.

yBn** h_Bpv; // vector of buckets of length h_Bslots « static const sz_t h_Bslots = kprime8k; // HALF book of yBn* «

     Comments about kilo map h_Kpv and page map h_Ppv above also apply to book map h_Bpv. Except today only one half a book is allocated for h_Bpv from the first hunk allocated.

     Array h_Bpv is vector of pointers to yBn used as a hashmap of list buckets to which yB book instances hash by address.

     The array has a prime number of buckets, and each bucket is a linked list of yBn chained through its n_n member which is only used by yPBH for this purpose. Every constructed yBn book node is added to this hashmap to support future lookups using mfindB().

     The map is currently only half a book because 8K pointers worth of map is enough to cover a very large amount of 32-bit address space with books at 64K apiece. As with the other maps, Wil expects to later grow the map as necessary depending on current occupancy. (The initial 32K block used for the map now is half the divided end-piece book when the first hunk is split to force alignment, since at least one piece of a book cut in two must be at least half a book in size.)

kilo push and pop «

     The next two private inline methods push and pop elements on the the free list of yKn kilo nodes.

void _hpushK(yKn* e) { // push in K free list; annotate « h_lfK.lpush_front(e); e->n_16 = 0; // no id e->n_fuh = yKn::e_nfree; // available }

yKn* _hpopK() { // pop page node from free list of Kn « yKn* n = (yKn*) h_lfK.lpop_front(); if (yunlikely(n && !n->Kngood())) n->Knbad("_hpopK"); // should never happen return n; }

page push and pop «

     The next two private inline methods push and pop elements on the the free list of yPn page nodes.

void _hpushP(yPn* e) { // push in P free list; annotate « h_lfP.lpush_front(e); e->n_16 = 0; // no id e->n_fuh = yPn::e_nfree; // available }

yPn* _hpopP() { // pop page node from free list of Pn « yPn* n = (yPn*) h_lfP.lpop_front(); if (yunlikely(n && !n->Pngood())) n->Pnbad("_hpopP"); // should never happen return n; }

book push and pop «

     The next two private inline methods push and pop elements on the the free list of yBn book nodes.

void _hpushB(yBn* e) { // push in B free list; annotate « h_lfB.lpush_front(e); e->n_16 = 0; // no id e->n_fuh = yBn::e_nfree; // available }

yBn* _hpopB() { // pop book node from free list of Bn « yBn* n = (yBn*) h_lfB.lpop_front(); if (yunlikely(n && !n->Bngood())) n->Bnbad("_hpopB"); // should never happen return n; } // _hshelve is code common to _hinit() and _bnewB() void _hshelve(u8* bv, n32 nbooks, yv* hi, yv* lo); void _hinit(); // lazy construction on first use

     Late lazy construction is done in _hinit() while _hshelve() puts away books, pages, and nodes of raw space still available in the last hunk block allocated by the caller.

new and find «

yBn* _hnewB(); // alloc books; add to free list; pop one yPn* _hnewP(); // alloc pages; add to free list; pop one yKn* _hnewK(); // alloc kilos; add to free list; pop one yBn* _hfindB(yB* key, yBHz& bucket) const; yBn** _hrefB(yB* key, yBHz& bucket) const; yPn* _hfindP(yP* key, yPHz& bucket) const; yPn** _hrefP(yP* key, yPHz& bucket) const; yKn* _hfindK(yK* key, yKHz& bucket) const; yKn** _hrefK(yK* key, yKHz& bucket) const;

     Private internal new methods _hnewB(), _hnewP(), and _hnewK() handle the problems of really allocating new books, pages, and kilos with an associated node. The caller can be a public allocation method like hBn(), hPn(), or hKn(); or the caller can be an internal client getting space for some purpose. These new methods do no refcounting on nodes used only internally.

     The private, internal find methods search the appropriate map for the desired node associated with the given address. Finding nodes in the map is part of both finding old objects and adding new ones, because adding a new node pushes a new member on the top of the bucket returned.

internal printing «

void _hcite(yo& o, bool closed=true) const; yo& _hdumpK(yo& o) const; yo& _hdumpP(yo& o) const; yo& _hdumpB(yo& o) const;

internal add «

void _haddB(yBn& n); // after n is constructed void _haddP(yPn& n); // after n is constructed void _haddK(yKn& n); // after n is constructed

error messages «

void _hnilheap() const { yH::hnil("yPBH::h_H"); } «

void _hnilbuckets() const { yH::hnil("yPBH::h_buckets"); } «

memory alignment «

// cuts a hunk of books into aligned & nonaligned parts // \a hunk: start of N=nbooks non-aligned books, kBsz each // \a before returns leading chunk cut to starting alignment // \a after returns trailing hunk of kBsz-before.v_n bytes // \return nbooks-1 aligned to yB, waste in before & after yB* _halignB(void* hunk, n32 n, yv& before, yv& after);

     Method _halignB() finds aligned books are in an allocated 8MB hunk, returning a book's worth of leftover space in before and after, one of which must be at least a half book size.

///// ----- pages ----- // cuts hunk of pages into aligned & nonaligned parts // \a hunk: start of N=npages non-aligned pages, kPsz each // \a before returns leading chunk cut to starting alignment // \a after returns trailing hunk of kPsz-before.v_n bytes // \return npages-1 aligned to yP, waste in before & after yP* _halignP(void* hunk, n32 n, yv& before, yv& after); void _hrawPages(yv const& raw); // raw to P's & yPn nodes void _hpushrawPn(yv const& raw); // list unconstructed nodes void* _hpoprawPn(); // one unconstructed Pn (alloc at need) // this only gets called from where one page MUST be free: yqe* _hnewrawPn(); // alloc 1 page as raw nodes; pop one

     Method _halignP() finds aligned pages in n pages worth of memory at any alignment used by source h_H heap, returning excess non-pages in before and after. Other methods above manage the h_lrP list of raw yPn page nodes.

///// ----- kilos ----- void _hpushrawKn(yv const& raw); // list unconstructed nodes void* _hpoprawKn(); // one unconstructed Kn (alloc at need) // this only gets called from where one page MUST be free: yqe* _hnewrawKn(); // alloc 1 page as raw nodes; pop one

     Kilo methods above manage the h_lrK list of raw yKn kilo nodes. No method is needed to align yP kilos in memory because allocation occurs only in existing yB aligned books, which already have 1K alignment from superior book alignment.

public: bool hgood() const { return e_hlive == h_htag; } « void hbad() const; // call when hgood() is false static yPBH g_1PBH; // singleton instance using yH::g_1H yPBH(); // uses yH::g_1H yPBH(yH* heap); // uses heap for allocations virtual ~yPBH();

     Goodness depends on a valid magic signature. The constructor needs a source heap subclassing the yH api for all pile memory allocation. The destructor frees all 8MB hunks allocated by the pile. But blocks allocated by hmalloc() and hcalloc() below are not freed automatically when the pile is destroyed — at least in this particular version.

inherited api «

public: // override public yH api to delegate to h_H intance virtual void* hmalloc(n32 size); // h_H->hmalloc(size); virtual void* hcalloc(n32 n, n32 size); // h_H->hcalloc() virtual void hfree(void* p); // h_H->hfree(p);

     Here yPBH overrides the base yH heap api, implementing these simply by delegating to the source heap passed to the constructor. The problem of freeing all blocks allocated by these methods before destructor ~yPBH() is left to the client application. However, since this api specifies these methods just use the source heap, you can free outstanding blocks via the source heap after the pile is destroyed. (What do you expect? Magic?)

free node pools «

protected: // you must allocate books as refcounted nodes friend class yBn; // able to _hxBn() from yBn::nzeroRefs() void _hxBn(yBn* n); // only called from yBn::nzeroRefs() friend class yPn; // able to _hxPn() from yPn::nzeroRefs() void _hxPn(yPn* n); // only called from yPn::nzeroRefs() friend class yKn; // able to _hxKn() from yKn::nzeroRefs() void _hxKn(yKn* n); // only called from yKn::nzeroRefs()

     The on-zero-refs handlers of each node subclass use methods above to inverse hBn(), hPn(), and hKn() operations, returning the nodes to free lists h_lfB, h_lfP, and h_lfK inside the pile. These private free methods balance the public alloc methods.

     But nodes aren't literally freed — they go into free pools for re-alloc, and the generation of each node changes both on free and re-alloc, so stale node refs due to bugs will be exposed by handles checking when refs go up and down.

node allocation «

     Books, pages, and kilos are allocated by getting a first ref to their associated nodes; each ref is returned in a handle passed as an argument. The object returned will remain allocated as long as you maintain at least one ref in a handle somewhere. When the last handle departs, your book, page, or kilo will be freed and made available to the next caller who allocates a new node.

public: /// \brief alloc book in new node: firstRef gets first ref yBn* hBn(yht<yBn>& firstRef, u16 id); yBn* hcBn(yht<yBn>& firstRef, u16 id); // book bytes zeroed yPn* hPn(yht<yPn>& firstRef, u16 id); yPn* hcPn(yht<yPn>& firstRef, u16 id); // page bytes zeroed yKn* hKn(yht<yKn>& firstRef, u16 id); yKn* hcKn(yht<yKn>& firstRef, u16 id); // kilo bytes zeroed

     Each of the methods above allocate an object of the desired type from a free list of books, pages, or kilos, with a brand new generation number for the node placed in the handle passed to receive the first node ref. If no space remains in the pile, a new hunk of 8MB will be allocated from the source heap passed to the constructor, and this hunk is split into books, pages, kilos, and nodes as necessary to satisfy subsequent requests.

     If you avoid calling inherited hmalloc(), hcalloc(), and hfree() methods yourself, the source heap will see only calls to malloc and free 8MB hunks later sub-divided to satisfy these methods.

     If the source heap ever returns nil when failing to alloc memory, the methods above can return a nil pointer and will otherwise fail gracefully without corrupting the pile.

     This pile will try to alloc as much memory as you request, so if you want a max footprint imposed, you need to impose this limit yourself. Limits aren't part of this api.

     The 16-bit id argument is used to annotate the n_16 field of any yBn, yPn, or yKn node allocated. This is intended mainly to color information shown when a yPBH heap is dumped, so you can see what objects were used for what sort of purpose. Otherwise this 16-bit "type" field doesn't have any enforced semantics. You can see objects that yPBH allocates for internal use classified by arbitrary 16-bit IDs it uses to color them.

node query «

/// \brief search for K, P, C, or B allocated by yPBH: /// \param p is any address used as a key /// \param nh gets a ref for returned node if one is found /// \return node for containing p if one is present in map yBn* hfindB(yht<yBn>& nh, void* p); // nil if unknown B yPn* hfindP(yht<yPn>& nh, void* p); // nil if unknown P yKn* hfindK(yht<yKn>& nh, void* p); // nil if unknown K

     If a book, page, or kilo is currently used and still allocated, you can look it up with a find method below, which gives you a new ref in the handle you pass. You can safely pass any pointer address you like, because if the address does not fall inside a used object, you'll simply get a nil pointer returned to say "not found."

     Only currently used objects can be found. You can't find free or internal-use heap objects this way.

pile printing «

struct Hq { « yPBH const& q_h; Hq(yPBH const& h): q_h(h) { } }; Hq quote() const { return Hq(*this); } // to request dump « void hprint() const; // dump to stdout via yout void hdump(yo& o) const; void hcite(yo& o) const; }; // class yPBH

     The printing api closely resembles that used in most demos. Please see the out and quote demos for further explanation of printing api and object quoting.

inline yo& operator<<(yo& o, yPBH const& x) { « x.hdump(o); return o; } inline yo& operator<<(yo& o, yPBH::Hq const& x) { « x.q_h.hdump(o); return o; } inline yo& operator<<(yo& o, yct<yPBH> const& x) { « x.c_t.hcite(o); return o; }

     A sample of pile printing appears in the book demo (cf «).

kilos and chapters «

     "Where's the code for yK kilos?" Stu wondered. "Does it look like code in the page and book demos?"

     "Yes," Wil confirmed. "The code is so similar it seems a waste to add a new demo just for 1K kilos like pages and books."

     "So I can clone yK and yKn from yP and yPn ... just by cut-and-paste?" Stu asked doubtfully. "Sounds too easy."

     "That's what I did," Wil shrugged. "I only needed to change yK::kKsz to say 1024, then change every P to K."

     "But then you also had to add new kilo methods to the pile api shown here, right?" Stu prompted.

     "Yes, and that required a little thinking," Wil agreed. "But now it should be easy to add the next format by cloning the yK kilo code, just making tiny changes to account for new sizes."

     "But you'd still need to be very careful," Stu objected. "Newly hacked code is sharp as a scapel: you can get cut."

     "Whatever," Wil shrugged. "Folks who can't do it shouldn't be in the business. It's not rocket science."

     Stu drummed his fingers. "You say that like you mean it's 'only memory management' like that's easy," he observed.

     "It is easy," Wil nodded. "Just incredibly tedious, and that's why folks make mistakes: no damned patience!"

     "Don't get worked up, dude," Stu smiled.

     "Why are you still here?" Wil asked. "What do you want?"

     "You mentioned chapters earlier," Stu recalled.

     Wil sighed. "Yeah, I plan to add a yC 16K chapter format next," Wil confirmed. "But I won't do that until I need a chapter instance. Until then it'd be writing code I don't need today."

     "So it's just part of design space now?" Stu restated.

     "Yeah, and if you ask my schedule for coding it, I'll have to hurt you," Wil joked. "Never ask when? for free code."

     Stu rolled his eyes up sharply to suggest prevarication: "I'd never dream of it. I wasn't going to ask that."

     "Good," Wil smiled. "It's funny how looking up is universally considered a sign of lying, even when that conclusion is wrong. It also means you're thinking, or remembering, or trying to find a way to explain a complex idea."

     "You look up when you're remembering?" Stu asked with a sideways look. "Cyborg data access mode?"

     "Very funny," Wil said to Stu's emerging smirk. "My memory is almost entirely visual. And it's somewhat impaired when I'm talking because verbal process interferes with my visual thinking. At least, it's hard for me to get visual mode completely turned on without partially disengaging from full low latency chat."

     Stu looked askance. "You're strange," he teased. "Does this mean you do have Asperger's after all?"

     "No," Wil glared. "It's just full memory overrides my vision a bit, and I can't really see your eyes when I'm off ransacking mental files. I look away to say, 'I cannot follow what your eyes say for a moment.' Seems like simple polite behavior."

     "Other folks can't access their memory the way you do," Stu explained. "There's no other space except invention from whole cloth when lying — at least in stupid people, and there's a whole lot more of them around than smart folks."

     "So you're telling me ... ," Wil prompted.

     "That Bayes' Theorem teaches most folks from observation that looking up is probably lying," Stu explained. "Because that's what it usually is. No one allows for folks like you, Wil. And it doesn't help at all that you're naturally inventive."

     "Wonderful," Wil muttered glumly.

A submenu for demos appears below, letting you go to the page on a topic written as a demo (as the demos page defines it).

menu

     thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile « Þ, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype

     (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile)

     Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes.

pile methods «

     It's tempting to put explanations inside methods shown below, instead of afterwards like normal. But this would make some methods long enough not to fit on a screen, and generally Wil aims to see as much code as possible for maximum context. (This has quite a bit to do with general absence of vertical whitespace in Wil's code: space just spreads the code out more without making it more clear.)

yPBH.cpp «

     Non-inline pile methods appear below. Let's start with the single global object initialized at static init time.

yPBH yPBH::g_1PBH; // handy singleton global yPBH instance «

     This yPBH instance named yPBH::g_1PBH is a single well-known global instance constructed at static init time, which uses another global shown in the heap demo: yH::g_1H, which uses ::malloc() and ::free() for allocation. Wil defines g_1PBH partly to show this works, and partly to avoid the issue of when to destroy an instance of yPBH. This one is destroyed at static destruction time — which might not be a good idea because relative order of destruction with yH::g_1H isn't well-defined. Hmm. Maybe yPBH::g_1PBH isn't such a good idea.

     Before Wil added a mutex to make yPBH thread safe, a global instance was tempting fate since some folks assume global scope implies safe for use in multiple threads, when this was not yet true. However, now yPBH is thread safe and g_1PBH can be used by multiple threads.

yPBH::yPBH() : h_H(&yH::g_1H), h_lqve() « , h_lrP(), h_lfP(), h_luP(), h_lhP(), h_nPfuh(0) , h_lrK(), h_lfK(), h_luK(), h_lhK(), h_nKfuh(0) , /*h_lrB(),*/ h_lfB(), h_luB(), h_lhB(), h_nBfuh(0) , h_htag(e_hlive), h_Kpv(0), h_Ppv(0), h_Bpv(0) { // init doesn't happen until first used to get page or book // (avoid big static init time constructor allocations) }

     Pile constructors notably avoid allocating any memory during construction, just in case the source heap isn't ready to be used. For example, global base source heap yH::g_1H might not be constructed yet when the constructor above executes for yPBH::g_1PBH.

yPBH::yPBH(yH* heap) : h_H(heap), h_lqve() « , h_lrP(), h_lfP(), h_luP(), h_lhP(), h_nPfuh(0) , h_lrK(), h_lfK(), h_luK(), h_lhK(), h_nKfuh(0) , /*h_lrB(),*/ h_lfB(), h_luB(), h_lhB(), h_nBfuh(0) , h_htag(e_hlive), h_Kpv(0), h_Ppv(0), h_Bpv(0) { // init doesn't happen until first used to get page or book // (avoid big static init time constructor allocations) }

     No space is allocated until first use.

yP* yPBH::_halignP(void* hunk, n32 n, yv& before, yv& after) { « // cuts hunk of pages into aligned & nonaligned parts // \a hunk: start of N=npages non-aligned pages, kPsz each // \a before returns leading chunk cut to starting alignment // \a after returns trailing hunk of kPsz-before.v_n bytes // \return npages-1 aligned to yP, waste in before & after // figure: // kPz==10 & hunk==13: lowBits==3, before.m_n==7, Pv==20 // |<------------ yP::kPsz ---------->| // |<-- yP::kPsz - low -->|<-- low -->| after.m_n==low u8* Pv = (u8*) hunk; // page vector: aligned before return yP::p_t addr = (yP::p_t) hunk; // ptrdiff_t for arithmethic yP::p_t lowBits = addr & yP::klow; // low bits not aligned if (lowBits) { // not aligned? add complement of lowBits? u8* p = Pv; Pv = (u8*) (yP::kmask & (yP::p_t)(p + yP::klow)); // next before.vinit(p, Pv - p); // before 1st aligned page at Pv u8* end = p + (n * yP::kPsz); // 1 u8 past end of hunk after.vinit(end - lowBits, lowBits); // before complement if (yunlikely(lowBits + before.v_n != yP::kPsz)) { // ?? yellf(__LINE__,__FILE__, "_halignP() low=%#x before.v_n=%#x kPsz=%#x", (int) lowBits, (int) before.v_n, (int) yP::kPsz); } } else if (h_lrP.lsize() < 128) { // need raw space? // all of leading aligned P for before; empty after after.vclear(); before.vinit(Pv, yP::kPsz); // all of 1st page Pv += yP::kPsz; // skip first page } else { // nothing returned in before and after before.vclear(); after.vclear(); } if (yunlikely(yP::klow & (yP::p_t) Pv)) yellf(__LINE__,__FILE__, "_halignP() failed"); // when non-nil, after.v_p also has page alignment, versus // before.m_p with min original alignment from hmalloc() return (yP*) Pv; // lowBits or complement MUST be min 2K }

     Method _halignP() finds aligned yP pages inside the array of n possibly non-aligned pages starting at address hunk. Generally speaking, the first (valid) aligned yP is returned as method value; out parameters before and after return contiguous memory fragments that had to be skipped due to lack of page alignment.

     However, if address hunk is already page aligned and the h_lrP list of raw yPn page nodes is low, then the first aligned page is returned in before and the second page address is returned as method value. This is done because space for raw nodes is always needed to create yPn nodes used to manage yP pages. (It might not be necessary to do this, but reasoning about it gets complex. By only doing this when length of h_lrP is small, we avoid making more raw nodes than we'll ever need, just from allocating new hunks divided into books.)

yB* yPBH::_halignB(void* hunk, n32 n, yv& before, yv& after) { « // cuts a hunk of books into aligned & nonaligned parts // \a hunk: start of N=nbooks non-aligned books, kBsz each // \a before returns leading chunk cut to starting alignment // \a after returns trailing hunk of kBsz-before.v_n bytes // \return nbooks-1 aligned to yB, waste in before & after // figure: // kBz==10 & hunk==13: lowBits==3, before.m_n==7, Bv==20 // |<------------ yB::kBsz ---------->| // |<-- yB::kBsz - low -->|<-- low -->| after.m_n==low u8* Bv = (u8*) hunk; // book vector aligned before return yB::p_t addr = (yB::p_t) hunk; // ptrdiff_t for arithmethic yB::p_t lowBits = addr & yB::klow; // low bits not aligned if (lowBits) { // not aligned? add complement of lowBits? u8* p = Bv; Bv = (u8*) (yB::kmask & (yB::p_t)(p + yB::klow)); // next before.vinit(p, Bv - p); // before 1st aligned book at Bv u8* end = p + (n * yB::kBsz); // 1 u8 past end of hunk after.vinit(end - lowBits, lowBits); // complement before if (yunlikely(lowBits + before.v_n != yB::kBsz)) { // ?? ylog(0, "_halignB() low=%#x before.v_n=%#x kBsz=%#x", (int) lowBits, (int) before.v_n, (int) yB::kBsz); } } else { // all of leading aligned B in before; empty after after.vclear(); before.vinit(Bv, yB::kBsz); // all of leading aligned B Bv += yB::kBsz; // skip first book } if (yunlikely(yB::klow & (yB::p_t) Bv)) yellf(__LINE__,__FILE__, "_halignB() failed"); // when non-nil, after.v_p also has book alignment, versus // before.m_p with min original alignment from hmalloc() return (yB*) Bv; // lowBits or complement MUST be min 32K }

     Method _halignB() finds aligned yB books inside the array of n possibly non-aligned books starting at address hunk. Generally speaking, the first aligned yB is returned as method value. Out parameters before and after return contiguous memory fragments that had to be skipped due to lack of book alignment.

     However, if address hunk is already book aligned, then the first aligned book is returned in before and the second book address is returned as method value. This is done because space for pages and yBn book nodes to manage yB books themselves are needed, and this ensures such ancilliary state has been set aside for this purpose, starting from the very first 8MB hunk allocation.

     In current code, the first 8MB hunk also provides a half book array of pointers for the h_Bpv book map, which needs 32K of space. No matter how bytes are skipped for alignment when one book is cut into before and after runs of bytes, one of the two — either before or after — must be at least 32K in size since a half and half division is the smallest the larger can get; this guarantees contiguous 32K for h_Bpv.

yPn* yPBH::_hnewP() { // alloc Ps, add to free list; pop one « yBn* b = this->_hpopB(); // get a book to dice into pages // careful: if _hpopB() worked, other lists can be empty if (!b) { b = this->_hnewB(); } // alloc more and pop again if (yunlikely(!b)) { ylog(0,"yPBH::_hnewP() _newB failed"); return 0; } if (yunlikely(!b->Bnfree())) { // book not free?? b->Bnbadfree("yPBH::_hnewP"); // how can this happen? } b->n_fuh = yBn::e_nheap; // show internal heap usage h_lhB.lpush_back(b); // add to internal heap book list b->n_16 = 0x7067; // 'pg'=7067 shows book contains pages yB* B = b->n_B; u8* pv = (u8*) B; // page vector using u8 address arithmetic if (yunlikely(!B->Bgood())) { // not aligned?? recover B->Bbad("yPBH::_hnewP"); // can't occur; make pages anyway yv vbook = b->asv(); this->_hrawPages(vbook); return this->_hpopP(); // just follow raw path } n32 npages = yB::kBsz / yP::kPsz; // aligned pages in book yqe* rpe = h_lrP.lpop(); // check old raw yPn node list if (!rpe) { // raw list empty? turn 1 page into raw nodes? yv rawnodes(pv, yP::kPsz); // entire page for raw nodes pv += yP::kPsz; // first page is raw node space this->_hpushrawPn(rawnodes); // add 1 page worth to h_lrP --npages; // 1 page lost to nodes: 1 less for free list // now we can't exhaust h_lrP without a free page for more } else { // can safely put page in free list for _hpoprawPn() yP* P1st = (yP*) pv; if (yunlikely(!P1st->Pgood())) P1st->Pbad("yPBH::_hnewP()"); yPn* nd = new(rpe) yPn(this, P1st, /*id*/0); // construct h_lfP.lpush_back(nd); // make free list of pages nonempty this->_haddP(*nd); // push each new yPn in hashmap bucket pv += yP::kPsz; // start of next page aligned to yP addr --npages; // one page already added manually to free list } for (sz_t i = 0; i < npages; ++i) { // h_lfP +=node per page yP* P = (yP*) pv; if (yunlikely(!P->Pgood())) P->Pbad("yPBH::_hnewP()"); void* rawPn = this->_hpoprawPn(); // sizeof(yPn) for new() if (yunlikely(!rawPn)) { ylog(0, "_hpoprawPn() fail??"); return 0; } yPn* nd = new(rawPn) yPn(this, P, /*id*/0); // construct h_lfP.lpush_back(nd); // append node for page P at pv this->_haddP(*nd); // push each new yPn in hashmap bucket pv += yP::kPsz; // start of next page aligned to yP addr } h_nPfuh += npages; // sum list lengths h_lfP, h_luP, h_lhP return this->_hpopP(); // pop lost one added to free list }

     Method _hnewP() is the most complex pile method: it allocates one yP page with associated yPn page node. It's only called in hPn() or other methods when _hpopP() fails to find a free page node in the h_lfP list of free pages. So by definition it's called when we need to allocate new pages from a yB book or a new 8MB hunk if no free books remain.

     If the h_lfB free book list is empty and a new 8MB hunk is allocated and chopped into books, pages, and nodes, this result is actually the simple case because suddenly both page and space nodes are in hand. The hard case occurs when popping the free book list succeeds, because in this case it's possible all other lists are empty.

     Allocating a new yB book can fail only when the source heap h_H refuses to allocate a new 8MB hunk. In this case _hnewP() returns nil. But when we do get a yB book to use for new pages, we dice it up into page aligned yP pages.

     As long as the h_lrP list of raw yPn page nodes contains at least one node, we can add at least the first page in the book to the h_lfP list of free pages. Because when we add the second page, _hpoprawPn() cannot fail to allocate a raw yPn page node because it can always dice the first page put in the h_lfP free page list into raw nodes. So the only complex case occurs when the h_lrP list of raw page nodes is empty at the very start. In this case we manually take the first page in the book and turn it into raw page nodes using _hpushrawPn(). After this, none of the rest of _hnewP() can fail because enough raw nodes are guaranteed to push all the remaining pages in the free list.

     Every new free yP page is pushed by its yPn page node into the h_lfP list of free pages. And each page pushed is also added to the h_Ppv page map by calling _haddP(), which pushes each new page on top of the bucket to which the page address hashes.

yBn* yPBH::_hnewB() { « // alloc books & add them to free list; pop & return one // parts identical to yPBH::_hinit() as marked: SECTION A { n32 nbooks = khunkBooks; // count of full books in hunk sz_t bytes = nbooks * yB::kBsz; // check expected value: if (yunlikely(bytes != (8 * 1024 * 1024))) // not 8MB? fprintf(stderr,"bytes=%#x\n", bytes); void* hunk = h_H->hmalloc(bytes); // 8MB for 128 * 64K each if (yunlikely(!hunk)) { // can't get a big piece of memory? ylog(0, "yPBH::_hnewB() cannot alloc %#lx bytes\n", (long) khunkBytes); return 0; // EXIT without allocating new books } yv before; yv after; // sum: yB::kBsz nonaligned bytes yB* Bvec = this->_halignB(hunk, nbooks, before, after); yv* hi = &before; yv* lo = &after; // assume before is larger than after if (after.v_n > before.v_n) { hi = &after; lo = &before; // fix if wrong } yqve* e = (yqve*) lo->vtake(sizeof(yqve)); // try alloc if (!e) // must alloc from big end (with almost 64K?) e = (yqve*) hi->vtake(sizeof(yqve)); // must succeed e->e_v.vinit(hunk, khunkBytes); // describe hunk for h_lqve h_lqve.lpush_back(e); // hunk freed later in yPBH::~yPBH() u8* bv = (u8*) Bvec; // simplify our u8 address arithmetic // } SECTION A (_hinit() and _hnewB() common sections) --nbooks; // one lost to before and after when aligning this->_hshelve(bv, nbooks, hi, lo); // add nbooks as nodes return this->_hpopB(); // just added book from B free list }

     Method _hnewB() allocates one yB book with associated yBn book node. It's only called in hBn() or other methods when _hpopB() fails to find a free book node in the h_lfB list of free books. So by definition it's called when we need to allocate new books from a new 8MB hunk block allocated from the source heap. Allocating a new book can only fail if the source heap h_H refuses to allocate another 8MB block of memory, causing _hnewB() to return nil.

     Once a new 8MB hunk of memory is allocated, the hunk is divided into books and non-book-aligned leading and trailing fragments using _halignB(), which always divides at least one book's worth into before and after fragments — alignment or no — to ensure space is set aside for pages, nodes, and other list elements.

     Then one yqve element is allocated from the before or after fragment, which gets pushed into the h_lqve list of all hunks allocated to represent this new hunk instance. We try to get the yqve instance from the smaller fragment first, only falling back on the larger when the smaller is too tiny. (This guarantees the larger remains at least 32K.)

     Finally, _hshelve() puts away books in the h_lfB list of free books, and then divvies up the remainder of before and after fragments as pages and nodes. The last line calls _hpopB() to pop a free book node, now guaranteed to be present.

void yPBH::_hinit() { « // init on demand after construction SECTION A { if (yunlikely(!h_H || h_Ppv)) return; // noop if heap missing, or if init done already n32 nbooks = khunkBooks; // count of full books in hunk sz_t bytes = nbooks * yB::kBsz; if (yunlikely(bytes != (8 * 1024 * 1024))) // not 8MB? fprintf(stderr,"bytes=%#x\n", bytes); void* hunk = h_H->hmalloc(bytes); // 8MB for 128 * 64K each if (yunlikely(!hunk)) { // can't get a big piece of memory? ylog(0, "yPBH::_hinit() cannot alloc %#lx bytes\n", (long) khunkBytes); return; // EXIT without allocating new books } yv before; yv after; // sum: yB::kBsz nonaligned bytes yB* Bvec = this->_halignB(hunk, nbooks, before, after); yv* hi = &before; yv* lo = &after; // assume before is larger than after if (after.v_n > before.v_n) { hi = &after; lo = &before; // fix if wrong } yqve* e = (yqve*) lo->vtake(sizeof(yqve)); // try alloc if (!e) // must alloc from big end (with almost 64K?) e = (yqve*) hi->vtake(sizeof(yqve)); // must succeed e->e_v.vinit(hunk, khunkBytes); // describe hunk for h_lqve h_lqve.lpush_back(e); // hunk freed later in yPBH::~yPBH() u8* bv = (u8*) Bvec; // simplify our u8 address arithmetic // } SECTION A (_hinit() and _hnewB() common sections) u8* pageBuckets = bv; bv += yB::kBsz; // alloc 1st book ::memset(pageBuckets, 0, yB::kBsz); // all nil pointers h_Ppv = (yPn**) pageBuckets; // 1st book is page hashmap u8* kiloBuckets = bv; bv += yB::kBsz; // alloc 2nd book ::memset(kiloBuckets, 0, yB::kBsz); // all nil pointers h_Kpv = (yKn**) kiloBuckets; // 1st book is page hashmap sz_t bookMapSize = h_Bslots * sizeof(yBn*); // < kBsz/2 u8* bookBuckets = hi->vtake(bookMapSize); // lion's share ::memset(bookBuckets, 0, bookMapSize); // all nil pointers h_Bpv = (yBn**) bookBuckets; nbooks -= 3; // one for each hashmap (half for book hashmap) this->_hshelve(bv, nbooks, hi, lo); // + nbooks in yBn nodes }

     Method _hinit() lazily initializes the pile when the first allocation occurs. Page map h_Ppv indicates whether lazy init by _hinit() is still needed. Methods check h_Ppv for zero to trigger lazy init.

     A large leading block of code is enclosed in comments marking this code as SECTION A to emphasize this part it the same as that in _hnewB() above. (This could be put in a separate method, awkwardly, at further cost of making _hinit() and _hnewB() even harder to understand.) First you should read comments about _hnewB() up to where it diverges with _hinit() after the section in common.

     Instead of shelving all the books and and pages in fragments before and after like _hnewB() does, _hinit() first removes enough space for all three hashmaps: h_Ppv to map page addresses, h_Kpv to map kilo kilo address, and h_Bpv to map book addresses.

     The page and kilo hashmaps get one whole book apiece as an array of map buckets. In contrast, the book map currently gets only half a book, allocated from the larger of fragments before and after, one of which must be at least 32K in size.

     Finally _hinit() calls _hshelve() to put away remaining books, pages, and nodes in free lists, the same way _hnewB() does, but with fewer books and pages when hashmap space is removed.

void yPBH::_hshelve(u8* bv, n32 nbooks, yv* hi, yv* lo) { « // put books away from _hinit() or _hnewB(); alloc B nodes // for N books starting at bv, using space in spare hi vec if (hi->v_n < lo->v_n) { yv* t = hi; hi = lo; lo = t; // swap hi if small } if (yunlikely(hi->v_n < yB::kBsz/4)) yellf(__LINE__,__FILE__,"_hshelve() hi < B/4 ??"); sz_t bnSize = (nbooks * sizeof(yBn)); // sz for free nodes yBn* rawNodes = (yBn*) hi->