Þ   briarpig  » thorn  » demos  » hash


demos are explained here; a menu at top column right indexes actual topic demos. Here we demo hash.

problem

     Wil uses hashmaps for many purposes; each map needs a fast and collision-resistant hash function. The perennial questions are these: What hash function is best? Which is fast enough? Which has fewer collisions in expected population of map keys?

     By default Wil often chooses crc32() as a good hash function because crc's are collision-resistant, and because performance has been good, if not ideal. (Because crc32() is usually implemented with lookup tables, it touches many cache lines, so it gets more expensive as an app incurs memory pressure.)

     The remainder of this page presents further analysis in the form of dialog among characters discussing merits of crc32() versus other hash methods. For more information on þ's crc api, see the crc demo.

criticism «

     "I can't believe you use crc32() for hashing," Dex complained. "People say it sucks. Why not use Chongo's FNV hash?"

     "Do you have evidence fnv is a good hash?" Wil asked. "Or is this just hearsay again? Why do you hate crc32()?"

     "Crc is so old school," Dex dismissed. "And I see from web pages many people use fnv, so it has to be good."

     "Let's find out," Wil suggested. "I'll write a test and tell you the answer empirically. If fnv wins, I'll use it."

     Stu was almost out the door, but paused — this sounded interesting. Something in Wil's tone of voice got Stu's attention: maybe Dex would fall for a trap Wil is setting.

     "What are you looking at?" Dex snarled. "Wil's gonna find out his precious crc32() is a dog. Finally."

     "Should we make a bet on it?"" Wil wondered. "Just to make it interesting? Loser wears a t-shirt for a week that reads: 'I'm an idiot.' What do you say?"

     Dex's eyes got shifty, like he smelled something. "Did you already do the test? Is this a setup?"

     "Chicken," Stu razzed Dex. Then to Wil: "How can you test a hash? What's your criteria? Will the test be fair?"

     "When I use a hash function in a map," Wil explained, "what I want is low probability of collisions when a hash distributes keys as uniformly as possible across all hash buckets."

     "A closed/probing hashmap?" Stu asked. "Or an open/chaining map? Which is better for analysis?"

     "I'd rather test a closed/probing hashmap," Wil said. "They suffer badly if hash collisions create too many used buckets in a row. This will be easier to quantify."

     "How do we measure that?" Dex asked. "Do we look for distribution of empty buckets?"

     "Yes, exactly," Wil nodded. "All I want in a hash is empty buckets distributed as uniformly as possible — ideally empty buckets would be evenly spaced everywhere."

     "For an upper bound on how long it takes to find an empty slot?" Stu asked. "I guess the best hash would minimize distance between empty buckets, right?"

     "Yes, that's my criteria," Wil agreed. "Is there something else you think is important, Dex? Didn't I hear you mumbling about something called a hash's 'avalanche' behavior?"

     "Yeah," Dex confirmed. "Good avalanche is where a hash changes many bits from change in few input bits. Can we measure that?"

     "We're not going to measure it," Wil explained, "because I don't care. Good avalanche is irrelevant to me — all I want is uniform distance between empty buckets. I can measure patterns of holes between used buckets directly."

     "Okay, I bet you dinner fnv works better," Dex offered.

     "I really wanted to see you wear that shirt," Wil gnawed his lip. "Okay, you're on. Let's hurry, I'm hungry."

hash test

     "I did a test like this before, almost ten years ago," Wil noted. "That's when I noticed crc32() did a good job."

     Dex did a double take. "You just said you didn't do a test already. The bet's off if you already know the answer."

     "I never tested fnv before," Wil assured. "For all I know, maybe it's a lot better than a crc. I meant to say I have an idea how the test should go using a closed/probing map."

     "Can you run the test on your Mac laptop here?" Stu suggested. "Will the platform matter at all?"

     "Sure I can," Wil settled at the keyboard. "Let's fire up Xcode. We're not really interested in speed. Let's assume speed isn't an issue. All we care about is even empty bucket placement."

     "But if fnv is faster, then it wins if everything else is equal," Dex insisted. "I betcha it's faster."

     "Unless it's a lot faster, better hole spacing wins," Wil emphasized. "I'm not going to waste time optimizing this test. I'm going to code each algorithm the simplest possible way."

struct HashTest { // study hashmap distributions « typedef u32 (*Thash)(const u8* p, n32 n); « yv& t_v; // source of hashmap array memory « Thash t_hash; // the hash function to use « u32* t_map; // closed/probing hashmap array « n32 t_buckets; // number of hashmap buckets «

     Wil started with the HashTest class. "Let's write the test so the hash function plugs in — Thash is a pointer to a function taking a pointer and length in bytes, returning a 32-bit hash."

     "I hate your naming conventions," Dex whined. "And why is HashTest a struct? Why not a class?"

     "Shut up," Wil snapped. "You write tests your way, I'll write 'em mine. You're lucky I'm not charging you. Okay, the yv run instance named t_v is a block of memory provided in which we put the hashmap. We cast it's pointer to u32* for t_map and divide length in bytes by sizeof(u32) for t_buckets."

n32 t_zeroes; // count of zero hashes « n32 t_dups; // count of duplicate hashes « n32 t_capacity; // max number of buckets to use « n32 t_size; // actual number of map elems added « u64 t_ticks; // clock ticks spent hashing «

     "I'll make each map entry just an integer containing a hash, using zero to mean empty, ie empty," Wil explained. "So if any hash is zero, I'm just going to count it in the t_zeroes and ignore it. We'll decide whether that's significant if it happens."

     "Max t_capacity will be two thirds of all t_buckets, so a third of the buckets will be empty when the map hits max," Wil continued. The actual number of hashes added is t_size, and t_dups is a count of duplicate hashes if we see any."

     "Is t_ticks the number of elapsed clock ticks?" Stu guessed. "Will that appear in the time demo later?"

     "Yeah," Wil confirmed. "We'll read the hardware timestamp counter for high granularity time measurements."

     "I thought you said speed did not matter," Dex sniped. "Why bother measuring time at all?"

     "If I don't, you'll invent numbers out of thin air," Wil accused. "This is to ground you in reality. You can optimize the code later yourself if you want to get the times better."

n32 t_high; // max value added to histogram « enum { e_hist = 72 }; // base size of histogram « n32 t_histogram[ e_hist + 1 ]; // +1 for overflow «

     "Ooh, a histogram," Dex pretended to be awed.

     "Whatever works," Wil shrugged. "This histogram will actually get used more than one way. Initially it tracks number of buckets seen — after the first — to add a new map entry."

     "And later on?" Stu prompted.

     "Later we can use the same histogram to track count of used slots between empty slots," Wil explained.

     "Those aren't the same thing?" Dex asked. "Close?"

     "Do you ever think before you speak?" Wil marveled. "No they're not the same. Be patient, it'll make sense."

void tdistance(unsigned x) { // add x to histogram « if (t_high < x) // new highwater mark? t_high = x; if (x < e_hist) ++t_histogram[x]; // small x: separate counts else ++t_histogram[e_hist]; // overflow on x >= max }

     "This method populates the histogram," Wil said. "The highest distance seen goes in t_high in case it exceeds the histogram size. Note how t_histogram[e_hist] counts all events beyond the histogram size."

void tclear() { // back to after-construction state « ::memset(t_v.v_p, 0, t_v.v_n); // zero all buckets t_high = t_size = t_zeroes = t_dups = 0; t_ticks = 0; for (unsigned i = 0; i <= e_hist; ++i) t_histogram[i] = 0; }

HashTest(yv& v, Thash hash) // on hash=0, use crc32() « : t_v(v), t_hash(hash), t_map((u32*) v.v_p) , t_buckets(v.v_n / sizeof(n32)), t_zeroes(0), t_dups(0) , t_capacity((t_buckets * 2)/3) // max size allowed , t_size(0), t_high(0) { this->tclear(); }

     "Why call tclear() in the constructor?" Dex asked. "Make the constructor do it. Do you just like clear methods?"

     "We're going to call tclear() over and over in the tests instead of re-instantiating HashTest. And before you ask: no, I'm not going to write a destructor."

     "What's left?" Stu hurried. "I'm getting hungry."

     "A little more api," Wil said. "Then we'll define the non-inline methods. I'll model it on some tests I wrote for a symbol table last summer. I can populate the map with dictionary words."

n32 tadd(yv const& v); // hash and add one word n32 tadd(yi& i); // hash and add words from input n32 taddpages(const void* p, n32 count); void tlog(yo& o, const char* arc); // histogram etc void tcaliper_holes(); // histogram by hole distance };

     "Whoa," Dex objected. "Why is a dictionary a good place to find map keys? Won't a population favor one hash over another?"

     "I want to test keys that have many similarities," Wil explained. "Lots of dictionary words are leading prefixes of other words. Some are only barely different from each other. If you have another population to run through, be my guest."

     "You probably want to try page addresses," Stu suggested. "The maps in the page demo might be able to use a better hash function. Let's look at four byte integer keys, too."

     "Wil," Dex pointed at the api. "What the heck is tcaliper_holes()? Isn't there a better name?"

     "I dunno," Wil shrugged. "Let's look at it first. See, it zeroes all the histogram buckets and then re-populates the histogram with distances between empty buckets. It's like using calipers to gauge distance between holes, thus the name."

void HashTest::tcaliper_holes() { « int i = 0; for ( ; i <= e_hist; ++i) t_histogram[i] = 0; int last = -1; /*previous empty slot*/ u32* m = t_map; for (i = 0 ; i < t_buckets; ++i) { if (0 == m[i]) { // empty slot double betweenHoles = (i - last - 1); this->tdistance(betweenHoles); last = i; } } }

     "Ick," Dex made a face. "Doesn't the map wrap around from end to beginning? Why don't you track the distance correctly when it wraps around?"

     "Yes," Wil admitted. "This code pretends a break happens in the middle of the sequence of used buckets where end wraps around to the start. But if this is only one misstep in tens of thousands, it won't change numbers. Relax."

n32 HashTest::taddpages(const void* p, n32 count) { « n32 adds = 0; ptrdiff_t page = (ptrdiff_t) p; yv vpage(&page, sizeof(ptrdiff_t)); for (n32 i = 0; i < count && page < 0xfffff000; ++i) { if (this->tadd(vpage)) { ++adds; page += 4096; } else break; } return adds; }

     "Okay, Stu," Wil continued. "This taddpages() method adds count page addresses starting with input address p which is assumed page aligned. We can use that when we want to see how hash distribution affects a page address population."

     "Looks good to me," Stu shrugged. "But this next method adding all the words in an input stream looks kinda weird. Was the weirdness really necessary? Or did you just take a shortcut?"

n32 HashTest::tadd(yi& i) { // hash & add words from input « n32 adds = 0; char body[ 256 ]; yb buf(body, /*fill*/ 0, /*size*/ 256); ybo wordo(&buf); while (i.iword(wordo, 256)) { wordo.oflush(); if (this->tadd(buf)) { ++adds; wordo << yreset; } else break; } return adds; }

     "It's from a symbol table population test," Wil explained. "You'll see more of it below where I open the file as an in-stream. Basically, all this does is parse space delimited 'words' from the in-stream using yi::iword() (cf «) and then add each one to the hashmap using tadd() below."

     "This test is never going to be over," Dex whined. "Do you over-engineer everything, including unit tests?"

     "I'm just winging this without thinking," Wil bristled. "You have a hell of a lot of nerve to bitch about presence of structure where none is to be expected."

     "You've way too much kiss-my-assitude," Dex charged.

     "Thanks, I forgot to say," Wil nodded. "Kiss my ass."

     "Before you start on the next method," Stu interrupted, "can you show me the time api used below?"

     "Sure," Wil pulled up a header. "Let's just pull out the inline assembler here for reading the timestamp counter."

struct ytsc { // time stamp counter (based on intel's rdtsc) « static volatile u64 rd() { // rdtsc: read time stamp counter « union { u64 c_64; unsigned long c_32[2]; } u; asm volatile("rdtsc" : "=a" (u.c_32[0]), "=d" (u.c_32[1])); return u.c_64; // 64-bits of sub microsecond clock time } u64 c_ts; // 64 bit timestamp from hardware register ytsc() : c_ts(ytsc::rd()) { } // hardware register ticks operator u64() const { return c_ts; } }; // ytsc

     "I'm gonna hurl," Dex mimed barfing.

     "Hush," Wil snapped. "Just remember ytsc::rd() returns clock ticks in a 64-bit value. It costs almost nothing to do in inline assembler. Of course, it assumes you have an Intel processor. If you don't, that's not my problem."

n32 HashTest::tadd(yv const& v) { // hash and add one word « // return zero if and only if we reached max capacity if (yunlikely(t_size >= t_capacity)) // no room left? return 0; // sorry: no space Thash fn = t_hash; u32 hash = 0; u64 before = 0; u64 after = 0; if (fn) { // explicit hash function? before = ytsc::rd(); hash = (*fn)(v.v_p, v.v_n); after = ytsc::rd(); } else { // default to crc before = ytsc::rd(); hash = ::crc32(0L, Z_NULL, 0); hash = ::crc32(hash, (Bytef*) v.v_p, v.v_n); after = ytsc::rd(); } if (after > before) // no two processor skew? t_ticks += after - before;

     "Obviously the hash function is called above," Wil noted. "But as a special case, a zero hash function means use crc32() — my goal here was to avoid adding an extra function call compared to the other hash functions."

if (yunlikely(0 == hash)) { // hash is "empty" slot val? ++t_zeroes; // count instead of adding return 1; // lie and say one was added } u32* end = t_map + t_buckets; // one past last in map int i = (hash % t_buckets); // bucket index u32* b = t_map + i; // bucket (start) u32* start = b; // save so we can tell if we later wrap

     "The fragment above," Wil observed, "just preps the loop below by finding the starting map bucket. Wrapping back to start must be impossible since t_capacity is less than t_buckets."

     "Oh yeah, the impossible never happens," Stu dead-panned.

u32 count = 0; // slots AFTER FIRST in map we examined do { u32 here = *b; if (0 == here) { // found an empty map slot? *b = hash; ++t_size; break; // fall through to assert and tdistance() } else if (here == hash) { // dup? ++t_dups; return 1; } if (++b >= end) // hit end of map? b = t_map; // wrap to map start ++count; } while (b != start); // have not hit starting point? yassert(0 == count || b != start); this->tdistance(count); // populate histogram return 1; }

     "The final loop installs in the first empty bucket," Wil summarized. "If the same hash occurs twice — not necessarily for the same key — t_dups counts each repeat."

     "Do we have to look at the log printing?" Dex pleaded. "I'm sick of printing code. And I hate your api — why don't you use ostream? I know, I know. Some places won't let you use the C++ standard library."

     "Our bet rides on what the print method below reports," Wil reminded. "Maybe you should study what it outputs. The most interesting part is right in the beginning, where I calculate average and standard deviation of distances between holes that occur at empty buckets."

void HashTest::tlog(yo& o, const char* arc) { « yrsum holes; int last = -1; /*previous empty slot*/ u32* m = t_map; int i = 0; for ( ; i < t_buckets; ++i) { if (0 == m[i]) { // empty slot holes += (double) (i - last - 1); last = i; } } last = i - last; if (last > 1) holes += (double) last;

     "Where does the standard deviation appear," Dex searched. "How important is the standard deviation?"

     "The yrsum variable named holes is where the sum and sum of squares is tracked," Wil explained. "See the stat demo for api and usage. The standard deviation is very important. Other than the histogram, the standard deviation is the only thing that will show differences in hash effectiveness."

     "You're kidding," Dex could not believe his ears.

     "Oh no," Wil assured. "The average distance between holes will be exactly the same if each map has the same total number of holes. The only measure of uniform hole distribution is a smaller standard deviation."

     "So it's going to be math again," Dex groaned.

     "Yeesss," Wil said grimly, using the same deep tone of voice Morpheus uses in the Matrix when Neo sees agents approaching in his place of work, and curses.

u64 ticksPer = t_ticks; n32 events = t_size + t_dups + t_zeroes; if (events) ticksPer /= events; o.oftn("<HashTest-%s fn=%lx zero=%d slots=%d x=%d ", arc, (long) t_hash, (int) t_zeroes, (int) t_buckets, (int) t_capacity); o.ofn("n=%d ticks=%llu ticks-per=%llu high=%d dup=%d>", (int) t_size, (long long) t_ticks, (long long) ticksPer, (int) t_high, (int) t_dups);

o.ofn("<holes n=%llu avg=%5.3f sdev=%5.3f/>", (long long) holes.size(), (double) holes.avg(), (double) holes.sdv()); « o.oftn("<histogram max=%d over-%d=%d>", (int) e_hist, (int) e_hist, (int) t_histogram[e_hist]); unsigned count = 0; for (i = 0; i < e_hist; ++i) { o.of("%2u=%5d ", i, (int) t_histogram[i]); if (++count >= 6 && i != e_hist-1) { o << yendl; count = 0; } } o.ounend("histogram"); o.ounend("HashTest"); }

     "Okay, show me the money," Dex chivvied. "Do it. Show me some output. How about crc first?"

     "Coming right up," Wil agreed.

test harness «

     "Okay, let's go through a simple example," Wil introduced. "Let's say I have a dictionary of words in a file named dict.txt. Unfortunately I haven't done the file demo yet, so in-stream yfi for files is not ready to show. Here's a file opened:"

int twoFd = ::open("dict.txt", O_RDONLY ); « if (twoFd < 0) { int e = errno; perror("open('dict.txt')"); fprintf(stderr, "open('dict.txt')=%d\n", e); return 1; } yfdg twoFac(twoFd); // tell source to auto-close fd later yfi fi(twoFd, &yH::g_1H, (64*1024));

     "Oh my dear god," Dex emoted.

     "Don't be so histrionic," Wil rolled his eyes. "Just think of variable fi as a file in-stream we can parse for input words, then seek back to zero later."

     "Looks okay to me," Stu volunteered. "What's next?"

n32 bytes = 181000 * sizeof(u32); « u32* table = (u32*) valloc(bytes); yassert(0 != table); yv mapVec(table, bytes); HashTest ht(mapVec, /*crc32*/ 0);

     "My dict text file has just over 120,000 words in it," Wil explained. "So here I allocate space for about 50% more in the hashmap since capacity aims to limit size to two thirds all buckets. Then I construct HashTest with that space. The map starts empty. The first tadd() call below fills it:"

n32 adds = ht.tadd(fi); « ht.tlog(yout, "crc32-original");

     "Above we add all words in the dictionary," Wil noted. "Then we log the hash test, printing a histogram with stats on standard deviation — on slots seen before adding a new entry."

     "Can you print the histogram for distance between empty bucket holes next?" Stu suggested.

     "Yeah," Wil was already doing it.

yout << yendl << "# re-histogram" << yendl; ht.tcaliper_holes(); ht.tlog(yout, "crc32-caliper"); yout << ynow;

     "I'm done waiting," Dex warned. "Run that now. Show me some output. Change the color a little like usual."

<HashTest-crc32-original fn=0 zero=0 slots=181000 x=120666 « n=119888 ticks=32635498 ticks-per=270 high=63 dup=908> <holes n=61113 avg=1.962 sdev=4.127/> <histogram max=72 over-72=0> 0=80055 1=18556 2= 7771 3= 4296 4= 2464 5= 1669 6= 1176 7= 854 8= 638 9= 494 10= 372 11= 272 12= 245 13= 176 14= 131 15= 110 16= 86 17= 74 18= 62 19= 57 20= 40 21= 46 22= 44 23= 24 24= 17 25= 16 26= 21 27= 17 28= 15 29= 9 30= 9 31= 6 32= 3 33= 6 34= 8 35= 7 36= 5 37= 5 38= 4 39= 2 40= 2 41= 2 42= 2 43= 0 44= 5 45= 1 46= 3 47= 2 48= 1 49= 1 50= 1 51= 0 52= 1 53= 2 54= 0 55= 1 56= 0 57= 1 58= 0 59= 0 60= 0 61= 0 62= 0 63= 1 64= 0 65= 0 66= 0 67= 0 68= 0 69= 0 70= 0 71= 0 </histogram> </HashTest>

# re-histogram <HashTest-crc32-caliper fn=0 zero=0 slots=181000 x=120666 « n=119888 ticks=32635498 ticks-per=270 high=87 dup=908> <holes n=61113 avg=1.962 sdev=4.127/> <histogram max=72 over-72=2> 0=31459 1=10771 2= 5581 3= 3362 4= 2202 5= 1620 6= 1126 7= 895 8= 695 9= 531 10= 424 11= 370 12= 293 13= 227 14= 220 15= 180 16= 167 17= 107 18= 100 19= 87 20= 94 21= 73 22= 60 23= 60 24= 55 25= 45 26= 37 27= 32 28= 22 29= 29 30= 20 31= 18 32= 18 33= 13 34= 14 35= 8 36= 8 37= 8 38= 6 39= 11 40= 9 41= 6 42= 8 43= 3 44= 2 45= 2 46= 7 47= 3 48= 2 49= 2 50= 1 51= 2 52= 1 53= 1 54= 0 55= 3 56= 1 57= 1 58= 2 59= 3 60= 0 61= 0 62= 0 63= 0 64= 1 65= 1 66= 0 67= 0 68= 0 69= 1 70= 0 71= 0 </histogram> </HashTest>

     "Danger Wil Robinson," Dex roboticized. "Information overload. Can you give me a play by play on these numbers?"

     "Don't overuse that Robinson joke," Wil warned.

     "How about 'I'm not like you, Wil Munny,'" Stu asked.

     Wil smiled. "In moderation," he granted. "First note the two histograms above are not the same, even though both of them describe the aftermath of adding n=119888 unique hash values in a map of slots=181000 buckets. So both of them have exactly the same avg=1.962 mean distance between empty buckets with sdev=4.127 standard deviation — because those are calculated from the same sequence of empty buckets each time."

     "When the standard deviation is sdev=4.127," Dex pointed, "what does that mean exactly?"

     "Roughly speaking," Wil began with a proviso, "that means most data points are covered by three or four standard deviations, except for unusual outliers. The highest value ever seen was high=63 used buckets when adding, and high=87 measuring every single run of used buckets."

     "Except the bucket run that wraps around," Stu noted. "That one gets broken in two. But since you have n=61113 empty bucket holes, I don't think one matters much."

     "I see the average ticks per hash was ticks-per=270," Dex observed. "What's the scale on that? How many ticks per microsecond? Is this a multi-GHz Intel Mac?"

     "Yes, but I don't want to scale the ticks," Wil hesitated. "It's approximately 800 ticks or so per microsecond, give or take a couple hundred. As I'll explain in the time demo later, my laptop has two processors, and the hardware timestamp counters are not in sync. Apparently. So when I sleep a known period of time to gauge ticks per elapsed time, the process flip flops between one CPU and another."

     "But you're not sleeping in any of these tests," Stu remarked. "Does that mean one thread runs on one CPU only?"

     "Not necessarily," Wil warned. "When the process gets scheduled, either CPU can get the task. Our test doesn't have CPU affinity. So times are rough, okay?"

     "Okay, let's compare this side by side with a hash based on FNV," Dex prompted. "I'm in the mood for sushi. And it's the most expensive thing that comes to mind."

     "Let's start a new column," Wil pointed up toward the top of column right. "Just follow me. (cf »)"

A submenu for demos appears below, letting you go to the page on a topic written as a demo (as the demos page defines it).

menu

     thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash « Þ, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype

     (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile)

     Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes.

64-bit hashes «

     (Wil re-ran these tests using 64-bit hashes to see whether FNV did better when more bits were involved. But the results were about the same. In interest of brevity, the 64-bit versions won't be shown. But you're welcome to take the same test harness and run it yourself on more data sets, you lazy slacker, you.)

fnv hash «

     "The code below clones the basic 32-bit fnv algorithm shown on Noll's FNV hash web page," Wil introduced. "I think I did this right. Let me know if you see something wrong."

u32 yfnv32_1a(const u8* p, n32 n) { « u32 hash = 2166136261UL; /*32 bit offset_basis*/ const u8* end = p + n; for (--p; ++p < end; ) { hash *= 16777619; /*32 bit FNV_prime*/ hash ^= *p; } return hash; }

     "Then we just plug the function pointer into field t_hash after clearing the map to a clean state," Wil proceeded.

yout << yendl << "# redo: t_hash=yfnv32_1a" << yendl; « ht.tclear(); ht.t_hash = yfnv32_1a; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "fnv32-caliper"); yout << ynow;

     "The output on stdout appears below," Wil pointed. "Note the average ticks per hash is much higher, but remember we're reading one byte at a time. I know for a fact crc32() reads four bytes at once and this makes it run twice as fast as otherwise."

     "Okay, we get it: don't get hung up on times," Stu acknowledged. "Does look slow in this test, though. But like you said: under memory pressure crc32() should go slower and lose its advantage."

# redo: t_hash=yfnv32_1a <HashTest-fnv32-caliper fn=691b7c zero=0 slots=181000 x=120666 « n=119891 ticks=132784974 ticks-per=1099 high=98 dup=905> <holes n=61110 avg=1.962 sdev=4.183/> <histogram max=72 over-72=4> 0=31545 1=10770 2= 5517 3= 3300 4= 2247 5= 1609 6= 1122 7= 900 8= 681 9= 587 10= 446 11= 350 12= 290 13= 231 14= 200 15= 152 16= 167 17= 122 18= 103 19= 84 20= 95 21= 78 22= 53 23= 54 24= 37 25= 46 26= 35 27= 27 28= 30 29= 23 30= 25 31= 21 32= 25 33= 16 34= 8 35= 8 36= 8 37= 3 38= 7 39= 7 40= 5 41= 12 42= 7 43= 4 44= 3 45= 3 46= 2 47= 5 48= 8 49= 2 50= 1 51= 3 52= 4 53= 1 54= 3 55= 4 56= 2 57= 1 58= 1 59= 1 60= 0 61= 1 62= 0 63= 0 64= 1 65= 0 66= 0 67= 0 68= 0 69= 1 70= 0 71= 1 </histogram> </HashTest>

     "Then let's put the output from crc32() right below for comparison," Wil suggested.

# re-histogram <HashTest-crc32-caliper fn=0 zero=0 slots=181000 x=120666 « n=119888 ticks=32635498 ticks-per=270 high=87 dup=908> <holes n=61113 avg=1.962 sdev=4.127/> <histogram max=72 over-72=2> 0=31459 1=10771 2= 5581 3= 3362 4= 2202 5= 1620 6= 1126 7= 895 8= 695 9= 531 10= 424 11= 370 12= 293 13= 227 14= 220 15= 180 16= 167 17= 107 18= 100 19= 87 20= 94 21= 73 22= 60 23= 60 24= 55 25= 45 26= 37 27= 32 28= 22 29= 29 30= 20 31= 18 32= 18 33= 13 34= 14 35= 8 36= 8 37= 8 38= 6 39= 11 40= 9 41= 6 42= 8 43= 3 44= 2 45= 2 46= 7 47= 3 48= 2 49= 2 50= 1 51= 2 52= 1 53= 1 54= 0 55= 3 56= 1 57= 1 58= 2 59= 3 60= 0 61= 0 62= 0 63= 0 64= 1 65= 1 66= 0 67= 0 68= 0 69= 1 70= 0 71= 0 </histogram> </HashTest>

     "No, you must have made a mistake," Dex read the numbers and started moaning. "Standard deviation sdev=4.183 is higher and the longest run of used buckets high=98 is bigger. But this is close, real close. I'm not sure this is significant. Maybe if we hashed another key population. Dict words might not be representative."

     "You're right, that's close," Wil agreed. "But you were pretty sure fnv would win outright. Think you were a little hasty?"

     "The FNV guys seem to prefer a power of two number of buckets," Dex noted. "Why don't we try that instead?"

primes «

     "No. You can do it if you want," Wil begged off. "But we could try a prime number of buckets. Okay?"

     "Where do you get your primes?" Stu asked. "Do you write your own sieve of Eratosthenes?"

     "I usually go to http://www.prime-numbers.org/ and punch in a value under the range I want to study," Wil replied. "Okay, I see 180959 is a prime just under the 181000 buckets we used before. Let's just change t_buckets to that and re-run the test.

ht.t_buckets = 180959; «

yout << yendl << "# redo: t_hash=yfnv32_1a - prime" << yendl; « ht.tclear(); ht.t_hash = yfnv32_1a; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "fnv32-prime"); yout << ynow;

yout << yendl << "# redo: t_hash=crc32 - prime" << yendl; « ht.tclear(); ht.t_hash = 0; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "crc32-prime"); yout << ynow;

     "And here's the output on stdout," Wil announced. "It looks like they both got a little better. The standard deviations improved by about 1%, which might not be worth the bother."

# redo: t_hash=yfnv32_1a - prime « <HashTest-fnv32-prime fn=691b7c zero=0 slots=180959 x=120666 n=119891 ticks=125196876 ticks-per=1036 high=72 dup=905> <holes n=61068 avg=1.963 sdev=4.162/> <histogram max=72 over-72=1> 0=31507 1=10830 2= 5503 3= 3287 4= 2180 5= 1591 6= 1194 7= 865 8= 644 9= 569 10= 444 11= 375 12= 274 13= 268 14= 214 15= 191 16= 147 17= 110 18= 109 19= 105 20= 70 21= 72 22= 62 23= 54 24= 45 25= 37 26= 31 27= 34 28= 27 29= 28 30= 23 31= 20 32= 18 33= 14 34= 8 35= 10 36= 11 37= 11 38= 11 39= 8 40= 7 41= 9 42= 6 43= 3 44= 3 45= 1 46= 3 47= 3 48= 2 49= 2 50= 3 51= 1 52= 1 53= 0 54= 2 55= 3 56= 2 57= 0 58= 3 59= 3 60= 0 61= 1 62= 0 63= 1 64= 2 65= 1 66= 0 67= 2 68= 0 69= 0 70= 2 71= 0 </histogram> </HashTest>

# redo: t_hash=crc32 - prime « <HashTest-crc32-prime fn=0 zero=0 slots=180959 x=120666 n=119888 ticks=30961126 ticks-per=256 high=78 dup=908> <holes n=61072 avg=1.963 sdev=4.082/> <histogram max=72 over-72=1> 0=31303 1=10819 2= 5493 3= 3465 4= 2262 5= 1637 6= 1143 7= 863 8= 656 9= 541 10= 444 11= 356 12= 330 13= 245 14= 200 15= 150 16= 150 17= 134 18= 94 19= 102 20= 89 21= 78 22= 72 23= 50 24= 49 25= 46 26= 36 27= 35 28= 31 29= 21 30= 15 31= 17 32= 18 33= 16 34= 13 35= 11 36= 10 37= 13 38= 4 39= 3 40= 4 41= 6 42= 8 43= 1 44= 8 45= 2 46= 4 47= 1 48= 4 49= 3 50= 4 51= 1 52= 0 53= 2 54= 1 55= 0 56= 1 57= 1 58= 0 59= 1 60= 0 61= 0 62= 1 63= 0 64= 0 65= 0 66= 0 67= 2 68= 1 69= 0 70= 0 71= 0 </histogram> </HashTest>

     "Looks like crc still wins on standard deviation," Stu observed. "It improved more than fnv did, and fnv only managed to get a slightly lower highwater mark."

pages «

     "We need to try another population," Dex insisted. "Let's try a set of page addresses instead of dict words. Maybe fnv is better with really short keys, all similar to one another."

     "If fnv loses again," Stu warned. "I think you lose the bet."

     "I'll restore the original non-prime number of buckets," Wil narrated. "Then add a pages to hit max map capacity."

ht.t_buckets = 181000; // original buckets value « yout << yendl << "# pages: t_hash=yfnv32_1a" << yendl; ht.tclear(); ht.t_hash = yfnv32_1a; ht.taddpages((void*) 0x1234000, ht.t_capacity); ht.tcaliper_holes(); ht.tlog(yout, "fnv32-pages"); yout << ynow;

yout << yendl << "# pages: t_hash=crc32" << yendl; « ht.tclear(); ht.t_hash = 0; ht.taddpages((void*) 0x1234000, ht.t_capacity); ht.tcaliper_holes(); ht.tlog(yout, "crc32-pages"); yout << ynow;

     "Come on baby," Dex crossed his fingers, willing fnv to succeed. "Show me the magic. Resist those collisions."

# pages: t_hash=yfnv32_1a « <HashTest-fnv32-pages fn=691b7c zero=0 slots=181000 x=120666 n=120666 ticks=21299628 ticks-per=176 high=526 dup=0> <holes n=60335 avg=2.000 sdev=17.165/> <histogram max=72 over-72=294> 0=55654 1= 209 2= 243 3= 122 4= 316 5= 177 6= 278 7= 128 8= 339 9= 201 10= 281 11= 121 12= 346 13= 187 14= 275 15= 126 16= 78 17= 39 18= 33 19= 0 20= 62 21= 36 22= 25 23= 0 24= 66 25= 36 26= 24 27= 0 28= 86 29= 44 30= 40 31= 0 32= 0 33= 0 34= 0 35= 5 36= 0 37= 8 38= 32 39= 31 40= 34 41= 29 42= 32 43= 19 44= 34 45= 13 46= 2 47= 0 48= 0 49= 0 50= 0 51= 3 52= 19 53= 32 54= 24 55= 36 56= 16 57= 29 58= 16 59= 27 60= 9 61= 0 62= 0 63= 0 64= 0 65= 0 66= 0 67= 4 68= 0 69= 4 70= 2 71= 8 </histogram> </HashTest>

# pages: t_hash=crc32 « <HashTest-crc32-pages fn=0 zero=0 slots=181000 x=120666 n=120666 ticks=15609580 ticks-per=129 high=76 dup=0> <holes n=60334 avg=2.000 sdev=4.196/> <histogram max=72 over-72=1> 0=30922 1=10639 2= 5425 3= 3322 4= 2200 5= 1558 6= 1185 7= 912 8= 734 9= 537 10= 437 11= 356 12= 308 13= 235 14= 190 15= 189 16= 163 17= 127 18= 108 19= 93 20= 73 21= 67 22= 63 23= 64 24= 47 25= 41 26= 33 27= 39 28= 25 29= 24 30= 28 31= 22 32= 16 33= 21 34= 20 35= 10 36= 11 37= 12 38= 9 39= 4 40= 8 41= 3 42= 8 43= 4 44= 6 45= 1 46= 3 47= 5 48= 2 49= 3 50= 1 51= 2 52= 3 53= 3 54= 0 55= 2 56= 1 57= 4 58= 0 59= 0 60= 1 61= 0 62= 1 63= 0 64= 1 65= 1 66= 0 67= 0 68= 1 69= 0 70= 0 71= 0 </histogram> </HashTest>

     "Ahhh!" Dex let out a little scream, then stared aghast at the results. "No, no. This can't be."

     "That looks bad for fnv," Stu drawled. "The standard deviation shot up four times as big as before."

     "That's interesting," Wil agreed. "And crc32() is about the same: only slightly worse. Well, there's actually more members now, with several hundred fewer holes. I guess crc didn't get worse."

     "But look at the 0=55654 entry for fnv," Dex begged. "That's awesome! It means fnv had 25,000 more zero distance between holes compared to crc!"

     "I hate to tell you this," Wil broke gently. "But zero distance between holes means back-to-back empty buckets. That's not good. It means the holes are less dispersed from one another, compared to crc32(). Call it, Stu."

     Stu turned to Dex with a serious expression. "You lose," he told Dex. "You better have your wallet."

one-at-a-time «

     "Let's try a few other hashes first," Dex suggested. "You look hungry. Let's wait a while so you work up an appetite."

     "Jerk," Stu pronounced.

     "What do you want to try next?" Wil asked.

     "How about Jenkin's one-at-a-time hash?" Dex requested.

     "You want a side of fries with that?" Wil started setting it up.

u32 oat_hash(const u8* p, n32 len) { « unsigned h = 0; for (n32 i = 0; i < len; i++) { h += p[i]; h += ( h << 10 ); h ^= ( h >> 6 ); } h += ( h << 3 ); h ^= ( h >> 11 ); h += ( h << 15 ); return h; }

     "Alright, there's the hash," Wil said. "Let's run both dict words and page addresses through the test."

yout << yendl << "# redo: t_hash=oat_hash" << yendl; « ht.tclear(); ht.t_hash = oat_hash; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "oat32-dict"); yout << ynow;

yout << yendl << "# pages: t_hash=oat32" << yendl; « ht.tclear(); ht.t_hash = oat_hash; ht.taddpages((void*) 0x1234000, ht.t_capacity); ht.tcaliper_holes(); ht.tlog(yout, "oat32-pages"); yout << ynow;

     "Maybe this one is better than crc," Dex suggested. "There must be a reason people say bad things about crc."

# redo: t_hash=oat_hash « <HashTest-oat32-dict fn=6919ba zero=0 slots=181000 x=120666 n=119889 ticks=134647086 ticks-per=1114 high=77 dup=907> <holes n=61112 avg=1.962 sdev=4.163/> <histogram max=72 over-72=2> 0=31436 1=10992 2= 5470 3= 3332 4= 2166 5= 1531 6= 1200 7= 899 8= 697 9= 536 10= 429 11= 340 12= 289 13= 243 14= 209 15= 172 16= 156 17= 121 18= 83 19= 125 20= 86 21= 76 22= 53 23= 58 24= 47 25= 32 26= 36 27= 36 28= 29 29= 20 30= 27 31= 28 32= 18 33= 13 34= 13 35= 10 36= 9 37= 5 38= 9 39= 7 40= 10 41= 5 42= 5 43= 8 44= 3 45= 5 46= 4 47= 4 48= 2 49= 5 50= 1 51= 4 52= 2 53= 1 54= 0 55= 2 56= 2 57= 1 58= 1 59= 0 60= 0 61= 0 62= 0 63= 1 64= 1 65= 3 66= 0 67= 0 68= 0 69= 1 70= 0 71= 0 </histogram> </HashTest>

# pages: t_hash=oat32 « <HashTest-oat32-pages fn=6919ba zero=0 slots=181000 x=120666 n=120666 ticks=24041178 ticks-per=199 high=157 dup=0> <holes n=60334 avg=2.000 sdev=4.316/> <histogram max=72 over-72=9> 0=30751 1=10725 2= 5573 3= 3316 4= 2209 5= 1603 6= 1138 7= 921 8= 703 9= 546 10= 422 11= 337 12= 287 13= 243 14= 204 15= 179 16= 152 17= 134 18= 103 19= 97 20= 88 21= 89 22= 53 23= 56 24= 44 25= 47 26= 35 27= 30 28= 24 29= 23 30= 28 31= 17 32= 14 33= 16 34= 7 35= 10 36= 12 37= 7 38= 7 39= 9 40= 3 41= 8 42= 2 43= 4 44= 6 45= 2 46= 5 47= 3 48= 1 49= 6 50= 7 51= 0 52= 2 53= 4 54= 1 55= 1 56= 0 57= 3 58= 1 59= 0 60= 1 61= 1 62= 1 63= 0 64= 1 65= 0 66= 0 67= 1 68= 0 69= 1 70= 0 71= 1 </histogram> </HashTest>

     "It still loses to crc both times," Stu summarized. "But it doesn't suck as badly as fnv on page addresses. Actually it's pretty good. And elapsed time is good — it might beat crc under cache line memory pressure. Any more hashes you want to try?"

rand32 and crc64 «

     "I want to try a 64-bit crc and someting based on a pseudo random number generator. Here are two new hash functions. The 4294967291 constant is the largest prime in 32-bits, equal to 0xfffffffb. I'm using 64-bit multiplication, which might be slow. Oh yeah: 123456791 is the first nine digit prime I saw exceeding 123456780."

u32 yrand32_vs_fnv(const u8* p, n32 n) { « u64 x = 123456791ULL; /*32 bit offset_basis*/ const u8* end = p + n; for (--p; ++p < end; ) { x ^= (*p) << 13; x = (x * 48271) % 4294967291ULL; } return x; };

u32 ycrc64_vs_fnv(const u8* p, n32 n) { « yh64 x64(p, n); u64 hash = x64.hcrc(); u32 high32 = ((hash >> 32) & 0xffffffff); return high32 ^ hash; // xor-fold };

     "I guess I'd expect crc64() to be better," Dex guess. "But I'm learning to be cautious."

     "No you're not," Stu countered. "People don't change."

     "Hey wait," Dex realized. "Add the 64-bit fnv hash too to be fair. Maybe I can still win the bet."

     Wil sighed. "Okay, fine. Here ya go."

u32 yfnv64_1a(const u8* p, n32 n) { « u64 hash = 14695981039346656037ULL; /*offset_basis*/ const u8* end = p + n; for (--p; ++p < end; ) { hash *= 1099511628211ULL; /*64 bit FNV_prime*/ hash ^= *p; } u32 high32 = ((hash >> 32) & 0xffffffff); return high32 ^ hash; // xor-fold }

     "Perfect," Dex approved. "Let 'er rip."

yout << yendl << "# redo: t_hash=yfnv64_1a" << yendl; « ht.tclear(); ht.t_hash = yfnv64_1a; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "fnv64-dict"); yout << ynow;

yout << yendl << "# redo: t_hash=rand " << yendl; « ht.tclear(); ht.t_hash = yrand32_vs_fnv; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "rand32-dict"); yout << ynow;

yout << yendl << "# redo: t_hash=ycrc64_vs_fnv" << yendl; « ht.tclear(); ht.t_hash = ycrc64_vs_fnv; fi.iseek(0); ht.tadd(fi); ht.tcaliper_holes(); ht.tlog(yout, "crc64-dict"); yout << ynow;

     "Wait for it," Wil teased. "Okay, here's output on stdout."

# redo: t_hash=yfnv64_1a « <HashTest-fnv64-dict fn=691c54 zero=0 slots=181000 x=120666 n=119891 ticks=123634420 ticks-per=1023 high=90 dup=905> <holes n=61110 avg=1.962 sdev=4.146/> <histogram max=72 over-72=4> 0=31486 1=10662 2= 5565 3= 3356 4= 2242 5= 1645 6= 1183 7= 857 8= 714 9= 562 10= 430 11= 389 12= 319 13= 231 14= 191 15= 157 16= 140 17= 128 18= 106 19= 96 20= 77 21= 66 22= 58 23= 48 24= 60 25= 39 26= 32 27= 31 28= 30 29= 19 30= 15 31= 23 32= 16 33= 8 34= 11 35= 10 36= 13 37= 4 38= 8 39= 5 40= 5 41= 10 42= 12 43= 2 44= 7 45= 1 46= 3 47= 5 48= 2 49= 1 50= 3 51= 5 52= 1 53= 1 54= 1 55= 1 56= 4 57= 2 58= 3 59= 0 60= 0 61= 0 62= 0 63= 0 64= 0 65= 0 66= 0 67= 0 68= 1 69= 0 70= 1 71= 2 </histogram> </HashTest>

     "Ha, ha," Stu jabbed Dex. "Even the 64-bit fnv is about the same. Sorry, Dex, looks like dinner is on you."

# redo: t_hash=rand « <HashTest-rand32-dict fn=691a6e zero=0 slots=181000 x=120666 n=119889 ticks=412633046 ticks-per=3415 high=73 dup=907> <holes n=61112 avg=1.962 sdev=4.107/> <histogram max=72 over-72=1> 0=31264 1=10817 2= 5601 3= 3429 4= 2254 5= 1585 6= 1140 7= 924 8= 706 9= 591 10= 426 11= 368 12= 261 13= 274 14= 211 15= 156 16= 135 17= 134 18= 101 19= 103 20= 77 21= 54 22= 67 23= 49 24= 36 25= 35 26= 31 27= 31 28= 22 29= 20 30= 21 31= 29 32= 14 33= 20 34= 11 35= 10 36= 15 37= 7 38= 9 39= 8 40= 5 41= 7 42= 4 43= 5 44= 6 45= 2 46= 2 47= 3 48= 2 49= 2 50= 4 51= 2 52= 5 53= 1 54= 2 55= 1 56= 0 57= 0 58= 1 59= 3 60= 0 61= 0 62= 1 63= 2 64= 0 65= 3 66= 0 67= 0 68= 0 69= 1 70= 0 71= 0 </histogram> </HashTest>

     "Hmm," Wil considered. "This is slow but looks slightly better than anything else so far. Maybe it's worth optimizing the 64-bit code instead of taking whatever gcc emits."

     "Did you make that one up just now?" Stu asked. "How did you do that? Maybe I should give it a try. With a prime number of buckets it would be even better. Think you might promote this?"

     "No, hashing isn't my shtick," Wil replied. "The only part of this test I like is settling stupid arguments once and for all."

# redo: t_hash=ycrc64_vs_fnv « <HashTest-crc64-dict fn=691b30 zero=0 slots=181000 x=120666 n=119890 ticks=121262960 ticks-per=1003 high=87 dup=906> <holes n=61111 avg=1.962 sdev=4.161/> <histogram max=72 over-72=4> 0=31489 1=10793 2= 5524 3= 3299 4= 2208 5= 1627 6= 1234 7= 869 8= 668 9= 549 10= 430 11= 367 12= 302 13= 234 14= 205 15= 169 16= 131 17= 132 18= 127 19= 88 20= 74 21= 86 22= 51 23= 55 24= 45 25= 38 26= 42 27= 27 28= 27 29= 29 30= 21 31= 14 32= 15 33= 9 34= 20 35= 11 36= 7 37= 5 38= 7 39= 13 40= 9 41= 3 42= 7 43= 4 44= 4 45= 7 46= 3 47= 4 48= 2 49= 1 50= 1 51= 1 52= 2 53= 2 54= 1 55= 2 56= 1 57= 0 58= 1 59= 0 60= 1 61= 2 62= 0 63= 0 64= 2 65= 2 66= 0 67= 1 68= 0 69= 0 70= 2 71= 0 </histogram> </HashTest>

     "Oops," Wil blurted. "That actually looks inferior to crc32(). That's interesting. I could ... nah, I have other demos to write."

identity hash «

     "Are we done?" Stu begged.

     "Just one more," Wil urged. "I want to see what happens to page addresses when I use the identity function as a hash. I've been worried about collisions. Here's my chance to see any problem."

u32 self_vs_fnv(const u8* p, n32 n) { « return *(u32*) p; };

     "Is that legal?" Dex squirmed. "You're assuming the length is at least 32-bits and that it's okay to cast any address passed to type u32*. There's just so many things wrong with that."

     "I only expect this to work for the test case I run below," Wil assured. "Don't get bent out of shape. First I'll use a prime for bucket count again, following my recommendation in the primes demo."

ht.t_buckets = 180959; «

yout << yendl << "# pages: PRIME=self_vs_fnv" << yendl; « ht.tclear(); ht.t_hash = self_vs_fnv; ht.taddpages((void*) 0x1234000, ht.t_capacity); ht.tcaliper_holes(); ht.tlog(yout, "self-pages"); yout << ynow;

     "Hey, check the output below," Wil enthused. "That's even more interesting than I'd hoped. The highwater mark is just four."

# pages: PRIME=self_vs_fnv <HashTest-self-pages fn=691a5a zero=0 slots=180959 x=120666 « n=120666 ticks=10745476 ticks-per=89 high=4 dup=0> <holes n=60294 avg=2.001 sdev=1.319/> <histogram max=72 over-72=0> 0= 1 1=34934 2= 7852 3= 0 4=17506 5= 0 6= 0 7= 0 8= 0 9= 0 10= 0 11= 0 12= 0 13= 0 14= 0 15= 0 16= 0 17= 0 18= 0 19= 0 20= 0 21= 0 22= 0 23= 0 24= 0 25= 0 26= 0 27= 0 28= 0 29= 0 30= 0 31= 0 32= 0 33= 0 34= 0 35= 0 36= 0 37= 0 38= 0 39= 0 40= 0 41= 0 42= 0 43= 0 44= 0 45= 0 46= 0 47= 0 48= 0 49= 0 50= 0 51= 0 52= 0 53= 0 54= 0 55= 0 56= 0 57= 0 58= 0 59= 0 60= 0 61= 0 62= 0 63= 0 64= 0 65= 0 66= 0 67= 0 68= 0 69= 0 70= 0 71= 0 </histogram> </HashTest>

     "That's the lowest standard deviation so far by a big margin," Stu exclaimed. "What happened?"

     "Maybe the page addresses all hashed perfectly," Wil guessed. "I should have logged the histogram before calling tcaliper_holes(), to see the original histogram."

     "What does the original histogram show again?" Stu asked. "Buckets after the first seen while trying to add a new map entry?"

     "Yeah, exactly," Wil confirmed. "Let's run it again, but this time we'll show the histogram without regenerating it to track distance between empty buckets."

yout << yendl << "# pages: PRIME=self_vs_fnv rewind" << yendl; « ht.tclear(); ht.t_hash = self_vs_fnv; ht.taddpages((void*) 0x4234000, ht.t_capacity); ht.tlog(yout, "self-rewind");

     "Check it out," Wil waved to Dex. "What do you think?"

# pages: PRIME=self_vs_fnv rewind « <HashTest-self-rewind fn=691a5a zero=0 slots=180959 x=120666 n=120666 ticks=10745476 ticks-per=89 high=0 dup=0> <holes n=60294 avg=2.001 sdev=1.319/> <histogram max=72 over-72=0> 0=120666 1= 0 2= 0 3= 0 4= 0 5= 0 6= 0 7= 0 8= 0 9= 0 10= 0 11= 0 12= 0 13= 0 14= 0 15= 0 16= 0 17= 0 18= 0 19= 0 20= 0 21= 0 22= 0 23= 0 24= 0 25= 0 26= 0 27= 0 28= 0 29= 0 30= 0 31= 0 32= 0 33= 0 34= 0 35= 0 36= 0 37= 0 38= 0 39= 0 40= 0 41= 0 42= 0 43= 0 44= 0 45= 0 46= 0 47= 0 48= 0 49= 0 50= 0 51= 0 52= 0 53= 0 54= 0 55= 0 56= 0 57= 0 58= 0 59= 0 60= 0 61= 0 62= 0 63= 0 64= 0 65= 0 66= 0 67= 0 68= 0 69= 0 70= 0 71= 0 </histogram> </HashTest>

     "Holy cow," Dex grinned appreciatively. "Perfect score: no collisions at all when adding each page address."

     "Now I see why you weren't very concerned about hashing addresses of pages or books," Stu gestured.

     "I was kinda hoping that would happen," Wil smiled gently.

     "Who's up for sushi!" Stu clapped his hands loudly. "I can eat a record number of dishes. How 'bout you Wil?"

     "Wearing an idiot shirt would've been cheaper," Dex mulled.

     "You're not really an idiot — I just want you to stop making so many assumptions," Wil reminded. "Especially those suggested by random web pages you read. It's like you never learned a lesson about not believing everything you read. Try to verify something once in a while."

     "I call dibs on winning next time," Dex chirped.

     "You can't call dibs on that," Stu grimaced.

license

     All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. Neither feeds nor repackaging is allowed. You can link this page if you want folks to read it.