|
problem
symbol — a symbol is a kind of gc box strongly related a strings, typically used to identify other objects more efficiently than strings by using box address instead of content bytes inside as the key in some map. This page covers box formats and maps supporting symbols in toy programming language lathe: a lisp dialect plus smalltalk class system (using þ library under a mu-babel license — see mu for context). For purposes of use in Lisp and Smalltalk, it's only necessary that two symbols with the same name are identically the same object (with special exceptions allowed for uninterned symbols with different identity from any other symbol, whether or not names are the same). In Lisp dialects like Scheme, an unquoted identifier like lambda is understood to be a symbol, whereas "lambda" is a string. In Smalltalk, a literal written #lambda is a symbol, and 'lambda' is a string. (And awkward symbol names containing non-identifier letters can be understood as symbols anyway by quoting. So for example, to embed a space in a symbols one can write |foo bar| in Lisp and #"foo bar" in Smalltalk, and Lathe will support all these variants.) However, most of this page doesn't concern symbols as they appear in either Lisp or Smalltalk; instead a C++ representation for symbols in memory is described. Also described is an efficient hashmap for representing canonical instances of symbols with a given name: an intern table in traditional Lisp jargon.
summary
«
The format of symbols is deliberately strongly tied to that of strings, so code can use either as text. Lathe's primitive string box, ysb, uses pointer plus length format with string address pointing to the first octet in a string, and with length held in the tag preceding the string. A totally gratuitous redundant null byte follows content of a string, to be friendly with C runtimes, and to act as a sentinel detecting overwrites. In effect, a primitive Lathe string looks like a C string, but with a dynamic type tag in 16 bits of the immediately preceding four bytes. Any null byte found inside is not null termination; if you use null as a length marker, you'll have bugs: it's wrong. Strings contain octets, not characters — neither charset nor codepoint size are specified, and both are imposed from without at a higher, non-standard level. Lathe's primitive symbol box, yyb, has format identical to ysb strings with a few minor exceptions, so both symbols and strings can be used as text objects — sequences of octets — without awkward games to force similarity. Exceptions making symbols differ from strings include the following:
The purpose of a hash box before each symbol is twofold: 1) to cache a collision resistant hash better then a symbol's address, and 2) to provide hashmap bucket list linkage without breaking format compatibility with ysb strings. In some revs of Wil's Lathe runtime, immutable symbols might all be packed together in shared pages in book instances devoted to immutable objecdts, which are mapped readonly (each time a page fills) to enforce the cooperative yt::e_c const bit in yt tags with greater vigor.
hash and symbol boxes
«
The remainder of this column originally appeared on box with other gc box formats. yvh32 « "Class yvh32 is just a utility — it's not a box," Wil said. "It belongs in the crc demo, and uses the yh32 api there by way of yv::vhash() to use crc32." struct yvh32 { // yv + hash (not yv subclass on purpose) «
// has-a yv, instead of is-a yv: make them distinct types
yv h_v; // copy of original yv «
h32 h_32; // crc32-based hash of h_v «
"Why does it appear here?" Stu asked. "To keep it close to where it's used? Do hash boxes use it?" "Yes to both," Wil said. "The yvh32 constructor uses crc32 to hash the byte run; the value lands in member h_32. It's a way to use types to ensure hashing gets done." "Your hash demo shows why you like using crc32," Stu observed. "Tell folks they should read that." yhashb « "So this yhashb box is where the crc32 hash ends up?" Zé prompted. "Why the extra b_next pointer member?" "The b_next link is for open/chaining hashmap buckets," Wil explained. "It might be applicable to other objects needing a hash, but I really want this for yyb symbol boxes." // minimal support for hashmaps of symbols kept in ysb format
// yhashb holds a hash for yyb, and a link for making lists
struct yhashb { «
static const int kind = yt::k_hash; «
yhashb* b_next; // linked list of hashes (this is a gc peg) «
u32 b_h32; // aims to be zlib crc32() hash of a symbol «
"Isn't it awkward to bundle a next link with a hash like that?" Stu asked. "Doesn't it couple state too much?" "Too much for what?" Wil asked. "It's only barely a hack: it doesn't say what sort of object uses a hash and link." yhashb() : b_next(0), b_h32(0) { } «
yhashb(yhashb* p, u32 h) : b_next(p), b_h32(h) { }
yhashb(yhashb* p, yvh32 const& h)
: b_next(p), b_h32(h.h_32) { }
yhashb(yhashb* p, yv const& v)
: b_next(p), b_h32(v.vhash()) { }
yhashb(u32 h, yhashb* p) : b_next(p), b_h32(h) { }
yhashb(yvh32 const& h, yhashb* p)
: b_next(p), b_h32(h.h_32) { }
yhashb(yv const& v, yhashb* p)
: b_next(p), b_h32(v.vhash()) { }
"Wow, that's a lot of constructors," Stu murmurred. "Why so many? Does it make other code simpler?" "I like having one I want in hand already," Wil said. "It does reduce hoops to jump through when using it. As inlines, cost is negligible. And the api spec is better." const yt& btag() const { return ((const yt*) this)[-1]; } «
enum { e_bk = yt::k_hash,
e_bt = ((sizeof(yp)*2)<<16)|yt::e_c|e_bk
};
bool bkind() const { «
return ((const yt*) this)[-1].iskind(e_bk);
}
"Hey, look," Zé pointed. "The e_bt box tag for yhasbb includes the yt::e_c const bit. Does it limit client choice?" "Yes, so far I intend to require a hash box be immuable," Wil said. "That fits use with const symbols better. I won't change it unless I see reason later." bool bt() const { return e_bt == ((const u32*) this)[-1]; }
void badt(const char* where) const; // on false bt() is
}; // yhashb must precede yyb, but yyb need not follow yhashb
"It's disturbing the value to be hashed isn't also in the box," Stu worried. "Does it come after?" "It does in the case of yyb symbol boxes shown below," Wil qualified. "I have no other use for yhasbb yet; I might not bother using it again. For now it's a symbol helper." "So every hash box must be followed by a symbol?" Stu asked. "That seems awkward." "That's backwards," Wil corrected. "Every symbol must be preceded by a hash box, but not the reverse." "I guess a gc engine can preserve that easily enough," Zé considered. "Any time a collectable symbol is moved, gc can also ensure the preceding hash goes with it." "Aren't all symbols collectable?" Stu asked. "Or are all symbols not collected? I'm confused." "Symbols can be interned in a symbol table or not," Wil explained. "And they can be in gc space or not. Normally I'd prefer symbols not live in gc space." "But I guess you don't have a choice in apps that dynamically create more symbols," Zé suggested. "A sandboxed app must not create arbitrary symbols not subject to gc." "That's right," Wil confirmed. "Otherwise denial of service would be possible in untrusted apps. A sandbox would be able to consume memory that's not reclaimed." "You're getting ahead again," Stu sighed. "Are you doing it on purpose? Why not save this for the yyb box below?" Wil half-smiled. "An implicit message is this: things aren't tidy. It's not safe to read about symbols in one place only. You have to read the entire page. And other pages, too." "If you want to work on the C++ code, that is," Zé qualified. "A Lisp and Smalltalk view won't expose complexity." Wil squinted in thought. "I might need to make it possible to ask more about symbols in Lisp. But not required." yhashb.cpp « "Non-inline methods for a yhashb only involve logging errors," Wil dimissed. "This reports invalid tags." void yhashb::badt(const char* where) const { «
// call when bt() is false
ylog(1, "yhashb::badt(at=%s) tag=%#x not e_bt=%#x\n",
where, (int) ((const u32*) this)[-1], (int) e_bt);
}
yhashb_t « "The following yhashb_t struct is a kludge for seeing distance between a hash box and the following symbol," Wil explained. "Every box is separated by a yt tag." "Oh, I see," Zé chimed. "Since each symbol is preceded by a hash box, you use sizeof(yhashb_t) as pointer offset." // sizeof(yhashb_t) is distance from yyb to preceding yhashb
struct yhashb_t : yhashb { // support for yyb::hash() «
yt b_t; // tag between yhashb and the subsequent yyb
};
"Member b_t represents the tag after the hash box and before the symbol box?" Stu asked. "Oops, I see that's exactly what the comment says." "Dex would complain a _t suffix on the yhashb_t means 'type' and should be a synonym for yhashb," Zé observed. "Well here he'd be wrong," Wil grinned. "The underscore separates a tag+box pair, and the t means tag." "I'm just sayin'," Zé mollified. "Don't get mad." "I'm not mad," Wil said calmly. "Names I use are consistent. Mostly. I just cut a few corners." yyb « "You mentioned yyb is a symbol box so many times already," Zé rubbed his chin, "it no longer looks as cryptically short. Was that a deliberate tactic? You sly dog." "Ah, yes," Wil said. "Not sneaky — just practical. The first y means thorn and the second y means symbol. The final b means box. I wanted a short name for symbols." "I get it," Stu nodded. "Most of your global code names begin with y to say it's a global symbol. It's a bit of a pun." "Let's go on," Wil suggested. "Yes, it's a good mnemonic. But at heart I think names are arbitrary." // yyb looks like ysb with mandatory hash and is always const
// yyb is identical in format to ysb, but preceded by yhashb
struct yyb { «
static const int e_bk = yt::e_c|yt::v_sym; // yt::e_c=CONST «
u8 b_u8[4]; // first four of this->len() u8 slots «
"Member b_u8 is the array of octets in a symbol," Wil introduced. "The number of octets in the symbol is in the high 16 bits of the tag. The first four are reserved here." "That you say octets here instead of characters is telling," Zé said. "Can symbols be anything? Even Unicode?" "Physically any byte sequence, yes," Wil nodded. "But the toy parser is still going to think keywords, for example, are octet sequences of ascii names." "No problem, just checking," Zé assured. "I wanted to confirm infrastructure doesn't require ascii." "Okay, well add two more details to your symbol model's physical representation," Wil said. "First, physical length of a box must be a multiple of four bytes to maintain alignment, so allocated space is rounded up. Four for b_u8 is a minimum." "Okay, and the second detail?" Zé prompted. "By convention," Wil continued, "each symbol is followed by a null byte, which is not part of the octet sequence. This is meant to be friendly to C runtimes, especially when debugging." "You can use it to help detect corruption too," Zé suggested. "For length N, you can assert the byte at index N is zero." "Yes, but it's not null termination," Wil reminded. "Symbols can have null bytes inside." u32 vblen() const { return ((const yt*) this)[-1].len(); } «
u32 blen() const { return ((const yt*) this)[-1].len(); } «
bool bkind() const { «
return ((const yt*) this)[-1].cvkind() == (u32) e_bk;
}
"So blen() is symbol length in octets," Stu restated. "Yes, logical length of yyb is blen()," Wil elaborated. "And the physical length is at least one more, then rounded up to a multiple of four. And gc must do this to skip each symbol." "How do you figure total bytes needed?" Stu asked. // bmore() says extra size needed not already in b_u8[4];
// extra allocation bytes needed after first sizeof(yyb)
static n32 bmore(n32 n) { return n & ~((n32)3); } «
static n32 bmore(yv const& s) { return s.v_n & ~((n32)3); }
"Right now I use bmore() to reckon number of extra bytes needed after sizeof(yyb)," Wil indicated. "It drops the low two bits of the size, rounding down to a multiple of four." "Hmm," Zé considered. "And since b_u8[4] covers three content bytes and a trailing null byte, that never fails to be enough." "But doesn't it add an extra four bytes when length is exactly a multiple of four?" Stu asked. "Wasting four bytes?" "Well, no," Wil said gently. "After N content bytes, you need one trailing null byte, which must be rounded up to four." "Ah," Stu slapped his head. "Trickier than I thought." const yt& btag() const { return ((const yt*) this)[-1]; } «
bool bescaped() const { «
return ((const yt*) this)[-1].isescaped();
}
"Okay, what's an escaped symbol?" Stu asked. "Does it affect printing again like it does escaped pairs?" "Yes," Wil confirmed. "It means a symbol name contains a value needing an explicit delimitor for a parser to read correctly." "You can capture that right when a symbol is created, can't you?" Zé noticed. "By scanning a symbol at instantiation?" "That's right," Wil nodded. "And yesc() above returns either zero or escape bit yt::e_e for the symbol's tag. I guess it should be named besc(). And it works only on ascii." "Huh?" Zé paused. "Oh, you're saying the escape bit only means a symbol must be escaped when it's being written for an ascii reader. I guess it depends on round trip method." "Yeah," Wil agreed. "And if you're using Unicode symbols, I figure you don't care how slow the system runs and I don't worry about optimizing print speed." "So for example," Zé wrote on a piece of paper, "if I wanted quotes in my symbol name the parser needs to use syntax |"foo"| to make a symbol whose name is "foo"?" "That's right," Wil nodded. "Then yesc() returns the escape bit when it sees a " quote inside the symbol name. So later it prints once again as |"foo"| with vertical bar quotes." "Cool," Zé smiled. "Sorry, let's go on." yyb() { *b_u8 = 0; } // just final null for empty vector «
yyb(yv const& s) {
// caller ensures at least s.v_n+1 u8s allocated for this
b_u8[s.v_n] = 0; if(s) ::memcpy(b_u8, s.v_p, s.v_n);
}
"Constructors set symbol name and trailing null," Wil commented. "But it's up to a caller to make sure a symbol is big enough with sizeof(yyb)+bmore(s.v_n) bytes." "Method h2y() is a non-inline dynamic cast from hash box to symbol box," Wil said. "It returns nil when anything looks wrong, such as when the following box is not a symbol." static u32 n2t(n32 n) { return (n<<16)|e_bk; } «
// yyb differs from ysb in:
// 1. always const, 2. yhashb always before
"Methods n2t() and v2t() return the symbol's tag needed for a given length or run of bytes," Wil said. static yyb* notb(yp const& p); // log err, ALWAYS return zero
static yyb* mustb(yp const& p) { yyb* b = (yyb*) p.u.p_box; «
return ((3 & (yp::p_t) b) == 0 && b->bkind())? b : notb(p);
}
static yyb* mayb(yp const& p) { yyb* b = (yyb*) p.u.p_box; «
return ((3 & (yp::p_t) b) == 0 && b->bkind())? b : 0;
}
"These notb(), mustb(), and mayb() methods look just like the dynamic casts for ypairb pair boxes," Zé noticed. "Yes, both mayb() and mustb() cast a peg pointer to the symbol box," Wil said. "Both return zero if the peg is not a symbol, and the latter logs an error for non-symbol pegs." const yhashb& bhash() const { // immediately preceding hash «
return ((const yhashb_t*) this)[-1];
}
u32 yhash() const { // hash value in preceding hash box «
return ((const yhashb_t*) this)[-1].b_h32;
}
"Inlines above very cheaply get the symbol's hash value," Wil explained. "It ought to be as cheap as normal member access. Type yhashb_t is only used to get distance right." yv asv() const { «
yv v(this, ((const yt*) this)[-1].len()); return v;
}
operator yv() const { «
yv v(this, ((const yt*) this)[-1].len()); return v;
}
"And inlines above convert a symbol to pointer and length representation using a yv run," Wil gestured. "It's actually kinda cool symbols are also C strings," Stu said. "I guess so," Wil agreed. "It should improve interoperation with C as long as bytes are kept immutable." "What's the maximum length of a symbol?" Zé asked. "Is it smaller than any other string?" "Darn it, I avoided discussing size limits so far," Wil mused. "Um, I think the parser imposes a limit of about 1024 bytes in one symbol name before it's split by a read buffer." "But outside the reader, there's no limit except what will fit inside one book instance?" Zé guessed. "Basically yes, and the tag's length field is only 16 bits," Wil noted. "If a book is bigger than 64K, then a tag's length field is the limit. You'll ask about string limits now, right?" "Hey, yeah," Stu chimed in. "How big can a string get? Is it limited to one book in size? 64K currently?" "As a single contiguous vector of octets, yes," Wil said. "But strings will also use discontiguous iovec formats." "So strings as discontiguous arrays composed of fragments can be as big as I want?" Zé asked. "Yes, but that's a slightly involved feature," Wil warned. "It won't be ready until I code it. Not complex though." bool operator==(yvh32 const& key) const { «
if (key.h_32 == ((const yhashb_t*) this)[-1].b_h32) {
yv myself(this, ((const yt*) this)[-1].len());
return myself == key.h_v; // yv::operator==()
}
return false; // hash or yv bytes themselves don't match
}
"That operator looks interesting," Zé said. "It says whether a symbol equals a yvh32 argument? The key supplied has both hash and byte run, just like the symbol." "That's right," Wil confirmed. "This is used when searching a symbol hashmap to see if a symbol is already present." "So you don't have to stage a dummy yyb for comparison," Zé noted. "You can compare directly to a run where it sits." "Hey, you only compare content bytes when hash is equal," Stu appreciated. "So most of the time you'll only do a byte by byte comparison when a match is likely." "Nice analysis; and length has to match first too if you look at code for yv::operator==()," Wil expanded. struct Yq { yyb const& q_y; Yq(yyb const& y): q_y(y) { } }; «
Yq quote() const { return Yq(*this); } «
void bprint() const; // dump to stdout via yout
void bdump(yo& o) const; void bcite(yo& o) const;
//void bshow(yo& o) const; n32 bcage(n32 max) const;
}; // yyb (symbol box)
"Above and below is the usual C++ debug printing boilerplate," Wil waved his hands. "Can I just point you at the out and quote demos? Lisp printing is separate." "Ah, I was going to ask about the Lisp printer," Stu explained. "Never mind, I'll wait to you reach that part." inline yo& operator<<(yo& o, yyb const& x) { «
o << x.asv(); return o; }
inline yo& operator<<(yo& o, yyb::Yq const& x) { «
x.q_y.bdump(o); return o; }
inline yo& operator<<(yo& o, yct<yyb> const& x) { «
x.c_t.bcite(o); return o; }
yyb.cpp « "Below are yyb symbol box non-inline methods," Wil said. "Let's start with a dynamic cast from leading hash box to following symbol box. Note a symbol need not follow a hash box." yyb* yyb::h2y(yhashb* h) { // hash box to symbol box: «
// (get symbol box after hash box in hashmap)
static const char* here = "yyb::h2y";
if (!ybox::goodb(h)) { ybox::badb(h, here); return 0; }
if (!h->bt()) // bad hash box tag?
h->badt(here); // hash looks invalid?
else {
// skip yhashb and following yt:
yyb* y = (yyb*) (( (yhashb_t*) h ) + 1);
int cvk = (int) ((const yt*) y)[-1].cvkind();
if (e_bk == cvk) // is const, & vector & has symbol kind?
return y; // looks good
else { // looks like corruption (this is pretty bad)
ylog(1, "yyb::h2y() y=%#lx cvk=%#x not e_bk=%#x",
(long) y, (int) cvk, (int) e_bk);
}
}
return 0;
}
"Basically what h2y() does is offset by sizeof(yhashb_t) and check the following box's tag to see if it's a simple," Zé read. "Good summary," Wil agreed. "The following globals are used by yesc() to see if symbol octets need escape when printed." "Is there an initialization order dependency with yum and yutm maps from the ctype demo?" Zé wondered. "Yes, I think there's a static init time order dependency, so it's not safe in separate compilation units," Wil confirmed. "I had most things in one file at one time." "You're kidding," Stu was appalled at the idea of one file. "No, I'm serious," Wil repeated. "I might have to put globals back in one file to ensure good init order." "Okay, yyb_1utm is a predicate equal to isalnum()," Wil said. "It's true for digits, lowercase, and uppercase." "And yyb_1v enumerates more bytes I want to use without escape in symbols" Wil said. "Maybe I need more." "Finally," Wil pointed, "yyb_1noescmap is a predicate containing the union of the two above. It might not init correctly unless the yutm module is ready." "You can make modules lazy init on first use to remove dependency issues," Zé suggested. "You've done that before." "Yes, I have," Wil agreed. "But I hate explaining why that works. I know, I should do it anyway." u32 yyb::yesc(yv const& s) { «
// yt::e_e iff symbol needs escaped printing
return (s.vall(yyb_1noescmap))? 0 : yt::e_e;
}
"Whoa, that's short," Stu marveled. "I guess yv::vall() returns true if every octet is true in the predicate?" "Yeah, not much else to say," Wil agreed. "And notb() below logs failure of dynamic casts mustb() shown earlier." yyb* yyb::notb(yp const& p) { // log err, ALWAYS return zero «
yyb* b = (yyb*) p.u.p_box;
if ((3 & (yp::p_t) b) == 0) // box?
yellf(__LINE__,__FILE__,"yyb::notb(p=0x%lx) is %s",
(long) b, b->btag().tname());
else // immed
yellf(__LINE__,__FILE__,"yyb::notb(p=0x%lx) is immed",
(long) b);
return 0; // nil pointer
}
"Debug printing code below just prints symbols in debug format when dumping symbol hashmaps," Wil explained. "Dude," Zé pointed. "What's that value stored in a symbol tag's alt field — that's seven bits in size, right?" "Uh," Wil tried to remember. "I think symbols that are special form keywords have a nonzero enum value, and it's stored there. An interpreter would use that value in a switch." "You're saying you can justify some class named ykMgbv?" Stu balked. "What does that mean?" "I think it means const machine global box vector," Wil replied. "That would be an array of well known immutable objects of global scope, including special form symbols." "You mean," Stu struggled, "if lambda is a Scheme special form symbol, an enum value meaning lambda is stored in the alt field of the yyb box for lambda in the symbol table?" "That's exactly what I mean," Wil confirmed. "An interpreter can tell a symbol is a special form when a nonzero value appears in the alt field, then use a switch to execute suitable code." "Do I shout 'computed gotos rule' now?" Stu wondered. "Or is it too early to celebrate?" Zé held one hand level and teetered it back and forth like the pans of a balance scale wobbling in near balance. void yyb::bdump(yo& o) const { «
static char yt_tname_tmp[96]; yt_tname_tmp[96= 0;
yb tmpBuf(yt_tname_tmp, 0, 96);
yt t = this->btag();
u32 hash = this->yhash();
yv vsym = this->asv(); // same as operator yv()
if (t.alt()) {
ykMgbv::eboxid e = (ykMgbv::eboxid) t.alt();
o.of("<yyb alt='%s:%d' tag=%s hash=%lx name=\"",
asName(e), (int) e, t.tname(tmpBuf), (long) hash);
}
else
o.of("<yyb tag=%s hash=%lx name=\"",
t.tname(tmpBuf), (long) hash);
o << vsym.escape(); // cf yvw::wescape()
o.o3c('"', '/', '>');
}
void yyb::bcite(yo& o) const { «
yv vsym = this->asv(); // same as operator yv()
if (this->bescaped()) {
o.oc('|'); o << vsym; o.oc('|');
}
else
o << vsym;
}
tyhashb_tyyb « "You were completely serious about naming a struct tyhashb_tyyb to combine both hash and symbol boxes together with their tags?" Zé marveled. "Symbol table use this?" "Yes," Wil sighed. "For a new symbol, yyb::bmore(n) plus size of tyhashb_tyyb is exactly what I need to allocate, including both hash and symbol boxes and their tags." "Didn't I just say that?" Zé pretended to be cross. "You keep using C++ to avoid drudgery. Isn't that poor style?" "You're asking me about a future C rewrite?" Wil asked. "Maybe I'll use toy lathe to generate a new infrastructure." "You're funny," Zé tried to elbow Wil. "You'd have to write a high level assembler in Lisp. Are you serious?" struct tyhashb_tyyb { // format to alloc yyb plus metainfo «
yt t_hash; // yhashb::e_bt «
yhashb b_hash; // tag and box for hash «
yt t_yy; // yyb::v2t(sym) «
yyb b_yy; // actual bytes in the symbol's cord «
tyhashb_tyyb(u32 h, yhashb* next, yv const& sym);
tyhashb_tyyb(yvh32 const& sym);
tyhashb_tyyb() { }
};
tyhashb_tyyb.cpp « "You know what I hate about classes like tyhashb_tyyb?" Stu asked. "They declare stuff without showing what happens, even if it's a little interesting." "To see what happens," Wil suggested, "you should read the yyb symbol box constructor shown above — click on link yyb. Code below just says 'do it' and can't be faulted for brevity." "The don't-repeat-yourself principle implies you can't see things done in each place," Zé reminded. "You're really just saying you don't like C++ constructors because they say little." tyhashb_tyyb::tyhashb_tyyb(yvh32 const& sym) «
: t_hash(yhashb::e_bt), b_hash((yhashb*) 0, sym.h_32)
, t_yy(yyb::v2t(sym.h_v)), b_yy(sym.h_v)
{
}
tyhashb_tyyb::tyhashb_tyyb(u32 h, yhashb* next, yv const& sym)
: t_hash(yhashb::e_bt), b_hash(next, h)
, t_yy(yyb::v2t(sym)), b_yy(sym)
{
}
"Instead of tyhashb_tyyb.cpp, the constructors above actually appear right inside symbol hashmap code, next to where called," Wil corrected, "for better cache line locality."
license
«
All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. Neither feeds nor repackaging is allowed. You can link this page if you want folks to read it. |
menu
mu, toy, peg, imm, tag, box, symbol « Þ, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile (thorn, todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype)
errata
«
"Hey, Wil," interrupted Stu. "The comment next to member b_next in yhashb says it's really a peg." "That's what I was thinking," nodded Wil. "But it can only point to another yhashb hash box." "But the bucket chain in hashmap yym is zero terminated," Stu continued. "And that would put a zero pointer in the b_next member of yhashb." "Oops," Wil gulped. "You're right. I can either use yp::i_nil for list termination, or stop saying b_next is a peg." "I bet five dollars you stop saying it's a peg," Stu said. "Yeah," Wil agreed. "But I need to make sure it sticks: the gc engine had better know about it too."
symbol hashmap
«
"Is class yym below used as an intern table for canonical symbols instances in a virtual machine?" Zé asked. "I understand from names this name must mean thorn symbol map." "Yes," Wil confirmed. "But I won't apologize for not naming it SymbolMap. You've got my name conventions down." class yym « "How many instances of yym symbol map do you expect in a system," Stu asked. "Which is better: many or few?" "Few is better than many," Wil replied. "Each one can use a lot of resources, so many would be expensive." "And if symbols are immutable," Zé added, "Sharing of some kind ought to be easy — that's what you planned, right? Is yym part of future immutable world state?" "Yes, an instance of yym can be shared by each virtual macine," Wil confirmed. "It's a good place to put a canonical instance of each symbol appearing in modules for example." "What about symbols created dynamically at runtime?" Stu asked. "Say, unique symbols in requests to server?" "That's the problem case that must be solved in the long run," Wil said. "I can't let a symbol intern table permanently grow in size proportional to unique symbols in server requests." "So if you want tenured, global, shared symbols one place," Zé hypothesized, "and transient, local, isolated symbols elsewhere, does that imply a two tier scheme?" "Yes, I think it does," Wil agreed. "So one or few yym instances should be good for global sharing, and some other symbol table might be better for temp request-local symbol maps." "Can you guys hold off more big picture context until after we start the yym api, please?" Stu begged. class yym { // symbol hashmap (interns symbols in box format) «
private: // no by-value copy supported
yym(const yym&); yym& operator=(const yym&); // no copy
"Private copy methods means yym can't be copied," Wil indicated. "This prevents accidental copying." private:
yPBH* m_H; // the PB heap allocator used by this map «
yBqm m_Bqm; // book map: 1 per book of USED space in map «
yv m_v0; // empty dummy for vfree to refer when invalid «
yB* m_B; // the current yB book used for box allocation «
yv* m_vfree; // yv describing free space left in m_B «
n32 m_nlost; // unused bytes left in books preceding m_B «
"Pile instance m_H is the heap allocating all memory," Wil introduced, "including each book allocated by m_Bqm which keeps every book in both a list and a hashmap." "Looks like m_B is the current book in which new symbols are allocated," Stu chimed in. "And m_nlost counts fragmentation waste in earlier books? What's m_vfree?" "Aha!" shouted Zé. "I bet m_vfree points at the b_v member of the ytocb box at the start of the current book." "Bingo," Wil congratulated. "And m_v0 is where m_vfree points when it has no book to reference." static const size_t kprime16k = e1i_16381_16k; // prime < 16K
static const size_t kprime32k = e1i_32749_32k; // prime < 32K «
static const size_t kprime64k = e1i_65521_64k; // prime < 64K «
"Those look like the same primes you use for hash maps like those in book and pile demos," Stu noticed. "Yes, one of those values goes into m_slots below," Wil said, "which is the length of the m_ypv vector of pointers to yhashb hash boxes, each of which is followed by a symbol." "You mean hash boxes added to a map bucket are actually b_hash members of struct tyhashb_tyyb," Zé assumed. "Just so," Wil agreed. "Each elem added is actually a tyhashb_tyyb instance allocated from a current book." // m_ypv can be a book or much larger contiguous block (256K)
yhashb** m_ypv; // vector of buckets of length m_ypv
u32 m_slots; «
u32 m_n; // the number of symbols in the hashmap «
u32 m_cue; // count changes that would invalidate an iter «
"I see m_n is the number N of map symbols," Stu read. "But what is m_cue — a count of changes? What changes?" "Member m_cue helps me implement iterators," Wil explained. "It increments every time a pointer in the map changes that might foil an interator in the map." "Why do you use iterators?" Stu asked. "That's neither stylish in either Lisp nor Smalltalk, right?" "It's mainly for any helper code I write in C++," Wil shrugged. "I don't actually need iterators now. But it seems wrong not to permit the map to be examined." "Here you'll do your usual magic number thing: e_mlive stored in m_mlive while a maps is still alive," Zé noted. void* _mnewB(n32 boxSize); // alloc new book and boxSize box
tyhashb_tyyb* _mnew(yvh32 const& s); // alloc/init new symbol
"Non-public method _mnew() is an internal mean of allocating new map elements," Wil explained. "The second taking yvh32 intializes boxes, too. The first only allocates." public:
yPBH* mheap() const { return m_H; } «
yB* mfindB(const void* p) const { return m_Bqm.mfindB(p); } «
"I'm sure mheap() just returns the pile passed to the map constructor," Stu said. "But how is mfindB() useful?" "If it returns non-null," Wil replied, "mfindB() tells you a symbol address passed in as an argument was actually allocated by this symbol map, and not elsewhere." "That's a little subtle," Stu worried. "But I guess you just explained it clearly enough. Add more comments!" bool mgood() const { return e_mlive == m_mlive && m_H; } «
void mbad() const; // call mbad() when !mgood()
"Good and bad methods just check basic map aliveness," Wil dismissed. "Method mbad() logs an error." "Size and empty just refer to number of unique symbol members in this map," Wil glibly summarized. "You allocate all space used by yym symbol hashmaps using the yPBH passed to the constructor?" Zé asked. Wil nodded. "And the destructor frees everything allocated," Wil added. "Books and one large bucket vector." void mclear(); // remove all symbols; return books to heap
yyb* mfind(yv const& key) const; // zero if not present
yyb* madd(yv const& key); // interns key if not now present
yyb* ms2y(const char* s, unsigned alt); // interns if new
"Let me guess," Stu leaned in close, "those methods above are the main api when using yym, right?" "Sure looks like it," Zé agreed. "Add, find, and remove map elements. Is method ms2y() how you manage to associate a unique value denoting a special form for some symbols?" "Yes, it's only called at boot time," Wil confirmed. "Setting the alt field in symbols won't be exposed in high level languages — at least not usually." yymp mbegin() const; // iter starting at "first" symbol
const yhashb** mend() const { return (const yhashb**) 0; } «
"Begin and end methods act a bit like an STL api, right?" Stu guessed. "I assume code resembles the iter demo." "Good call," Wil said. "The iter comes after map code. As usual, the p suffix means pointer — in cases like this, a synonym for iterator when it acts like a pointer to an elem." struct Mq { yym const& q_m; Mq(yym const& m): q_m(m) { } }; «
Mq quote() const { return Mq(*this); } «
void mprint() const; // dump to stdout via yout
void mdump(yo& o) const; void mcite(yo& o) const;
void mout(yo& o) const; // 1 symbol per line, no extra markup
}; // class yym
"Once again, a lot of this is my standard api for debug printing," said Wil. "And out and quote demos provide adpi detail." inline yo& operator<<(yo& o, yym const& x) { «
x.mout(o); return o; }
inline yo& operator<<(yo& o, yym::Mq const& x) { «
x.q_m.mdump(o); return o; }
inline yo& operator<<(yo& o, yct<yym> const& x) { «
x.c_t.mcite(o); return o; }
"This last operator allows you to append a new octet run as a symbol," Wil said. "It adds a symbol if not already present." yym.cpp « "Non-inline methods for yym symbol maps appear below," Wil said. "The constructor gets space from the pile arg." yym::yym(yPBH* H) : m_H(H), m_Bqm(H), m_v0(0,0), m_B(0) «
, m_vfree(&m_v0), m_nlost(0), m_ypv(0), m_slots(0), m_n(0)
, m_cue(0), m_mlive(e_mlive)
{
// we can alloc 2 or N contiguous books, but this works fine:
m_slots = e1i_65521_64k; // e1i_32749_32k or e1i_65521_64k
m_ypv = (yhashb**) H->hcalloc(m_slots, sizeof(yhashb*));
if (!m_ypv) { // try smaller alloc?
m_slots = e1i_32749_32k; // e1i_32749_32k or e1i_65521_64k
m_ypv = (yhashb**) H->hcalloc(m_slots, sizeof(yhashb*));
}
if (!m_ypv) { // could not alloc hashmap buckets?
ylog(1, "yym::yym() cant alloc slots=%d\n", (int) m_slots);
m_H = 0; // not usable anyway
}
}
"As you can see," Wil continued. "The vector of bucket pointers is allocated as one contiguous block cleared to all zero." "Why not use book instances for that?" Stu asked. "You have no api yet to alloc several contiguous books at once?" "I wanted many more than 16K buckets than would fit in one book," Wil said. "Several adjacent books might be better." yym::~yym() { // return all allocated space to the heap «
if (yunlikely(!this->mgood())) { this->mbad(); return; }
++m_cue; // invalidate iters
this->mclear(); // free all symbols
yPBH* h = m_H; m_H = 0; // no more heap
if (h && m_ypv) // first time this yPBH was destroyed?
h->hfree(m_ypv); // calloc'ed space instead of using books
m_ypv = 0; // freed just now
m_mlive = e_mdead; // no longer alive
}
"The destructor frees all space," Wil explained. "The mclear() method below frees all books containing symbols, and then the bucket vector is freed." void yym::mclear() { «
// remove all symbols; return allocated books to heap
if (yunlikely(!this->mgood())) { this->mbad(); return; }
++m_cue; // invalidate iters
m_vfree = &m_v0;
m_v0.vinit(0,0); // no more free space in current book
m_nlost = 0;
m_B = 0; // most space is stored in books in map m_Bqm:
m_Bqm.mclear();
if (m_ypv) // have buckets? clear all buckets to nil ptrs?
::memset(m_ypv, 0, m_slots * sizeof(yhashb*)); // zero all
else
yH::hnil("yym::mclear() m_ypv");
}
"Maybe you should do some of mclear() directly in the destructor," Stu suggested, "to avoid the unnecessary memset() to zero. Does it happen often enough to matter?" "Not so far," Wil said. void* yym::_mnewB(n32 boxSize) { «
// alloc new book and then alloc boxSize box
static const char* here = "yym::_mnewB";
yB* B = m_Bqm.mnewB(/*'yy'*/0x7979);
if (!B->Bgood()) { B->Bbad(here); return 0; }
B->Bzero(); // clear all bytes (to help with bug detection)
m_nlost += m_vfree->v_n; // remains of last exhausted book
m_B = B; yv vbook(B, yB::kBsz); // all bytes in this book
void* toc = vbook.vtake(sizeof(tytocb)); // leading toc box
tytocb* tb = new(toc) tytocb(vbook); // remainder of book
m_vfree = tb->bv(); // free space is book's remaining space
return m_vfree->vtake(boxSize); // try to alloc box again
}
"Method _mnewB() above is only called when the current book in m_B has insufficient space," Wil said. "This is detected when zero is returned from vtake() on m_vfree." "It looks like you simply allocate a new book," Zé read, "then allocate a ytocb box at the start, whose b_v member is initialized to the book remainder. Then m_vfree points at b_v." "That's right," Wil agreed, "Then the desired box is allocated from m_vfree using vtake(), which is operation that failed on the last book. So mainly _mnewB() refills free space." tyhashb_tyyb* yym::_mnew(yvh32 const& s) { «
static const char* here = "yym::_mnew"; // for err messages
n32 need = sizeof(tyhashb_tyyb) + yyb::bmore(s.h_v.v_n);
if (!ybox::goodn(need)) // not aligned??
ybox::badn(need, here); // check 4-byte alignment
void* p = m_vfree->vtake(need); // from current free space
if (!p) p = this->_mnewB(need); // alloc book & re-alloc box
return (p)? new(p) tyhashb_tyyb(s) : 0; // placement new
}
"I see _mnew() uses _mnewB() if necessary to alloc one box," Zé analyzed. "First it figures total bytes needed, then afterward it constructs tyhashb_tyyb in space acquired." "Yes, and below I repeat the same tyhashb_tyyb constructors from the left column," Wil pointed. tyhashb_tyyb::tyhashb_tyyb(yvh32 const& sym)
: t_hash(yhashb::e_bt), b_hash((yhashb*) 0, sym.h_32)
, t_yy(yyb::v2t(sym.h_v)), b_yy(sym.h_v)
{
}
tyhashb_tyyb::tyhashb_tyyb(u32 h, yhashb* next, yv const& sym)
: t_hash(yhashb::e_bt), b_hash(next, h)
, t_yy(yyb::v2t(sym)), b_yy(sym)
{
}
"Hey, Wil," Stu hestiated. "I've been comparing mfind() and madd() below, and madd() begins with the same code as mfind(). Do you need to refactor the common code?" "Maybe," Wil nodded. "But I'm not sure mfind() actually gets used much compared to madd() since the intent is usually to get an interned symbol in hand." "Adding another function might slow it down," Stu considered. "I guess we should profile things first?" yyb* yym::madd(yv const& v) { // interns if not already present «
if (yunlikely(!this->mgood())) { this->mbad(); return 0; }
yvh32 key(v); // hash key and cache the hash in hkey.h_32
yhashb** pp = m_ypv + (key.h_32 % m_slots); // bucket
for (yhashb* p = *pp; p; p = p->b_next) { // not yet nil?
yyb* y = yyb::h2y(p); // find symbol box after hash box
if (y && *y == key) // symbol after hash? equals input key?
return y; // found: success (no new symbol is needed)
}
tyhashb_tyyb* pair = this->_mnew(key); // new symbol key copy
if (pair) { // created tagged pair (yt+hash, yt+symbol)?
++m_n; // hashmap is bigger
++m_cue; // invalidate iters before bucket change:
pair->b_hash.b_next = *pp;
*pp = &pair->b_hash; // push on bucket top
return &pair->b_yy; // the address of the yyb box inside
}
return 0;
}
"Above, madd() searches the bucket where a symbol should go," Wil described. "If missing, a new symbol is allocated, then pushed on top of that bucket." "And I see ms2y() uses madd() to find or add a symbol," Zé said. "But first it converts C string to yv. Then afterward it sets the 7-bit alt field of the tag. That's only for special forms?" "Yeah, only Lathe's boot code should call ms2y()," Wil expanded. "It knows which symbols are special forms, and using which alt values. Everyone else should leave it alone." yyb* yym::ms2y(const char* s, unsigned alt) { // interns if new «
yv v(s); yyb* y = this->madd(v);
if (y && alt && alt < 128) { // no more than 7-bits in alt?
const yt* kt = &y->btag();
yt* t = (yt*) kt;
t->talt(alt); // install enum code denoting special form
}
return y;
}
yyb* yym::mfind(yv const& v) const { // nil if not present «
if (yunlikely(!this->mgood())) { this->mbad(); return 0; }
yvh32 key(v); // hash key and cache the hash in hkey.h_32
yhashb** pp = m_ypv + (key.h_32 % m_slots); // bucket
for (yhashb* p = *pp; p; p = p->b_next) { // not yet nil?
yyb* y = yyb::h2y(p); // find symbol box after hash box
if (y && *y == key) // symbol after hash? equals input key?
return y; // found the target: success
}
return 0; // not found
}
"And mfind() looks just like the start of madd()," Stu repeated, "searching the expected bucket for a symbol." yymp yym::mbegin() const { «
if (yunlikely(!this->mgood())) { this->mbad(); return yymp(); }
return yymp(this);
}
"Method mbegin() constructs an iterator instance," Wil said, "whose api resembles STL iterators in usage. Code involved appears a ways below (cf »)." void yym::mdump(yo& o) const { «
if (yunlikely(!this->mgood())) {
this->mbad(); o << "<yym dump=bad/>"; return;
}
o.oftn(
"<yym m=%lx H=%lx #B=%d B=%lx free=%d lost=%d cap=%d n=%d>",
(long) this, (long) m_H, (int) m_Bqm.msize(), (long) m_B,
(int) m_vfree->v_n, (int) m_nlost,(int) m_slots,(int) m_n);
o << m_Bqm.quote(); // show map of allocated books
o.ounend("yym");
}
void yym::mcite(yo& o) const { «
if (yunlikely(!this->mgood())) {
this->mbad(); o << "<yym cite=bad/>"; return;
}
o.of(
"<yym m=%lx H=%lx #B=%d B=%lx free=%d lost=%d cap=%d n=%d/>",
(long) this, (long) m_H, (int) m_Bqm.msize(), (long) m_B,
(int) m_vfree->v_n, (int) m_nlost,(int) m_slots,(int) m_n);
}
"Debug printing code above does not show the contents of the hashmap," Wil warned. "But mout() below prints each symbols one per line, without extra markup." void yym::mout(yo& o) const { «
// one symbol per line with no extra markup
if (yunlikely(!this->mgood())) {
this->mbad(); o << "<yym out=bad/>"; return;
}
yrsum sum; u32 hi = 0; u32 all = 0;
yhashb** pp = m_ypv; // first bucket
yhashb** end = pp + m_slots; // one beyond last bucket
for (/*prep preincre*/ --pp; ++pp < end; ) {
u32 some = 0; // bucket length
for (yhashb* p = *pp; p; p = p->b_next) { // not yet nil?
++some; ++all;
yyb* y = yyb::h2y(p); // find symbol box after hash box
if (y) // symbol follows hash?
o << yendl << *y;
}
if (some) sum += some;
if (hi < some) hi = some;
}
o << yendl; // final newline
if (yunlikely(all != m_n)) { // not the number expected??
ylog(1, "yym::mout() all=%lu does not match m_n=%lu\n",
(long) all, (long) m_n);
}
#ifdef SPILL_INTERNAL_VIEW
ylog(1, "ymm buckets N=%d avg=%3.1f sdv=%3.1f hi=%d\n",
(int) sum.size(), (double) sum.avg(), (double) sum.sdv(),
(int) hi);
#endif /*SPILL_INTERNAL_VIEW*/
}
"I guess you could test mout() by loading yym with a dictionary, then dumping the words again using mout()," Zé considered. "Then sort and diff with the original dict." "That's exactly what I did," Wil beamed. "It's like you read my mind. Is that an easy skill to learn?" "You're easy to read," Zé dismissed. void yym::mbad() const { // call when hgood() is false «
if (e_mdead == m_mlive)
ylog(1, "mbad() m_mlive=e_mdead=%#lx\n", (long) e_mdead);
else if(e_mlive != m_mlive)
ylog(1, "yym::mbad() m_mlive==%#lx NOT e_mlive=%#lx\n",
(long) m_mlive, (long) e_mlive);
else if (!m_H)
ylog(1, "yym::mbad() m_H is nil\n");
else
ylog(1, "yym::mbad() but why?\n");
}
"Hmm," Stu glanced quickly. "Looks like mbad() is one of those boring error logging methods."
hashmap iterator
«
The following iterator class yymp visits symbols in a specific yym symbol hashmap instance. An instance is returned by method yym::begin() (cf «). class yymp « Almost no comments are offered for yymp because Wil thinks the code should be obvious after you read the iter demo, and because yymp is not currently used anywhere in other existing Lathe code. This implementation was written specifically for this page, mainly to clarify yym above further. public:
typedef yym m_t; typedef m_t map_t; // map types «
typedef yhashb e_t; typedef e_t element_t; // element types «
typedef size_t sz_t; typedef sz_t size_type; // size types «
protected: friend class yym;
const m_t* p_m; // (map's m_cue == p_c when iter is valid) «
u32 p_c; // copy of p_m->m_cue validating this iter's sync «
// h, e, and n structured to support remove() but still TBD
mutable e_t** p_h; // handle: pointer to slot holding p_e «
e_t* p_e; // curr elem if not nil or erased (nil when erased) «
e_t* p_n; // next elem (p_e->e_e just after p_e gets value) «
public:
void pnilmap(const char* at) const { «
yellf(__LINE__,__FILE__, "pnilmap(at=%s) p_m==nil", at);
}
void pbadcue(const char* at, u32 mapcue) const { // no sync «
yellf(__LINE__,__FILE__, "pbadcue(at=%s) p_c=%d m_cue=%d",
at, (int) p_c, (int) mapcue);
}
bool peq(yymp const& t) const { // same map, same element «
return (this->p_m == t.p_m) || (this->p_h == t.p_h);
}
bool operator==(yymp const& t) const { return peq(t); } «
bool operator==(const e_t** h) const { return (h == p_h); }
bool operator!=(yymp const& t) const { return !peq(t); } «
bool operator!=(const e_t** h) const { return (h != p_h); }
const e_t* pke() const { return p_e; } // current iter elem «
const yyb& operator*() const; // current symbol in iter
yymp& operator++() { this->pnext(); return *this; } // prefix «
yymp& operator++(int) { // same as prefix: err - don't use
yellf(__LINE__,__FILE__,
"yymp::operator++() POSTFIX deprecrated");
this->pnext(); return *this;
}
}; // yymp
yymp.cpp « Note iterator code might contain bugs since little testing was done besides sample code shown further below. tyhashb_tyyb g_1nobody_symbol;
const yyb& yymp::operator*() const { // current symbol in iter «
const yym* m = p_m; // the map
yhashb* e = p_e; // current elem
if (p_h && e && m) { // iter not done and elem not erased?
yyb* y = yyb::h2y(e); // find symbol box after hash box
if (y && p_c == m->m_cue) { // is symbol? map unchanged?
return *y;
}
else { // should never happen; map or iter is corrupt?
if (p_c != m->m_cue) { // map changed
ylog(0, "operator*() p_c=%#lx != cue=%#lx bad sync",
(long) p_c , (long) m->m_cue);
}
if (!y) {
ylog(0, "operator*() hash not followed by symbol");
}
p_h = 0; // end iteration when anything goes wrong
}
}
return g_1nobody_symbol.b_yy; // dummy (all zeroes?)
}
bool yymp::pnext() { «
const yym* m = p_m; // the map
yhashb** h = p_h; // handle to current map bucket
if (m && h && p_c == m->m_cue) {
yhashb* n = p_n; // next elem in current bucket
if (n) { // no need to change current bucket?
p_e = n;
p_n = n->b_next;
return true;
}
yhashb** x = m->m_ypv + m->m_slots; // 1 past last bucket
for (++h ; h < x && p_c == m->m_cue; ++h) {
yhashb* e = *h; // first hash box elem in bucket
if (e) { // found first map elem?
p_e = e;
p_h = h; // track current bucket in map
p_n = e->b_next; // next elem in iter
return true; // end map bucket scan
}
}
}
if (yunlikely(m && p_c != m->m_cue)) {
ylog(0, "pnext() p_c=%#lx != m_cue=%#lx out of sync",
(long) p_c , (long) m->m_cue);
}
p_h = 0; // iteration is over
return false;
}
yymp::yymp(const yym* m) «
: p_m(m), p_c(m->m_cue), p_h(0), p_e(0), p_n(0) {
if (yunlikely(!m->mgood())) { m->mbad(); return; }
yhashb** h = m->m_ypv; // first hashmap slot in map
if (h) { // map looks valid? (maybe assert this?)
yhashb** x = h + m->m_slots; // one past last map bucket
for ( ; h < x; ++h) {
yhashb* e = *h; // first hash box elem in bucket
if (e) { // found first map elem?
p_e = e;
p_h = h; // track current bucket in map
p_n = e->b_next; // next elem in iter
return; // end map bucket scan
}
}
// p_h = 0; // end of iter if return did not occur above
}
}
sample « int main(int argc, char * const argv[]) { «
yH& vanillaHeap = yH::g_1H; // malloc() and free()
yPBH pile(&vanillaHeap); // allocs pages and books
yym symbolMap(&pile);
yv foo("foo");
yv bar("bar");
yv space("white\t\nspace");
symbolMap << foo << bar << space;
yymp iter = symbolMap.mbegin();
yout << "# symbolMap post foo and bar added:";
for ( ; iter != symbolMap.mend(); ++iter) {
const yyb& sym = *iter;
yout << yendl << sym.quote();
}
yout << yendl;
yout << "# symbolMap end dump:" << yendl;
yout << symbolMap.quote() << yendl << ynow;
return 0;
}
Output of main() above appears on stdout below; light editing breaks a few lines with a ¬ glyph, and elides a few object addresses and hashes with a … glyph. # symbolMap post foo and bar added: «
<yyb tag=const-vec-symbol hash=76ff8caa name="bar"/>
<yyb tag=const-vec-symbol hash=8c736521 name="foo"/>
<yyb tag=const-esc-vec-symbol hash=… name="white\t\nspace"/>
# symbolMap end dump:
<yym … H=bffff924 #B=1 B=2030000 free=65452 lost=0 cap=65521 n=3>
<yBqm … H=bffff924 epp=2802000 Blen=1 Plen=2 vn=4072>
<yPq me=bffffa14 H=bffff924 len=2>
<yh0 a=1 k=S n=28013a8 g=6f><yPn me=28013a8 ru=1:1 g=6f¬
rwi=w+ tyl=Pu id=6862 P=2802000 crc=b19d03ea/></yh0>
<yh0 a=2 k=S n=28013d0 g=db><yPn me=28013d0 ru=1:1 g=db¬
rwi=w+ tyl=Pu id=4271 P=2803000 crc=fbb96ddd/></yh0>
</yPq>
<map vec=2802000 slots=1021 sz=1> <!-- sizeof(yBqme)=24 -->
<yBn … ru=1:1 g=42 rwi=w+ tyl=Bu id=7979 B=2030000/>¬
<!-- 1 i=864 -->
</map>
</yBqm>
</yym>
|