|
demos are
explained here;
a menu at top column right indexes actual topic demos.
Here we demo primes.
problem
While prime numbers are seldom important in many tasks Wil undertakes, primes are useful in sizing hashmaps using keys that might otherwise collide in buckets too often because map keys have patterns. Wil takes particular care to use prime numbers to size hashmaps whose keys are pointer addresses with strong alignments like pages. It helps prevent collisions. Sample code below trivially shows the value of relatively prime hashmap sizes and hashes for table keys. You might infer the value of prime hashmap sizes from the result shown — seeing can help you believe. A hashmap size of 127 is chosen because 127 is prime. Then Wil writes code to add 127 integers with increasing values separated by the same step increment. This is meant to simulate an effect of adding keys all sharing a numerical relationship, like all being multiples of a common factor. When hashmap size is relatively prime to step size, an interesting pattern emerges: there are no collisions. The result is independent of step size; each hashmap bucket has a single hit. int vec[ 127 ];
int hit[ 127 ];
unsigned i = 0;
for ( ; i < 127; ++i) { // zero all
vec[i] = hit[i] = 0;
}
The loop above zeroes both arrays. Array hit[i] counts the number of times a value is written into vec[i] below. unsigned max = 0;
unsigned j = 4;
unsigned step = 4;
for (i = 0; i < 127; j += step, ++i) { // 127 values of j
unsigned n = j % 127;
if (max < ++hit[n]) // new max hit in one slot?
max = hit[n];
if (!vec[n]) // first added to this slot?
vec[n] = j;
}
This new loop adds 127 integers — increasing by step each time — to array vec[n] where index n is just j mod the prime hashmap size. We count the number of times a value is stored at the same index n by bumping hit[n], and the highest hit count seen is stored in max. The following loop prints all: fprintf(stderr, "\nmax=%u\n", max);
for (j = 0, i = 0; i < 127; ++i) {
if (hit[i] > 1) // special format for hits over one?
fprintf(stderr, "{{%u#%u:%u}} ", i, hit[i], vec[i]);
else
fprintf(stderr, "%u:%u ", i, vec[i]); // one hit only
if (++j >= 8) { // eight appear on this line now?
fputc('\n', stderr);
j = 0;
}
}
Output from this code appears below. The value of max is one: no collisions were observed. The loop prints any slot with more than one hit in a different format, but that case doesn't appear in the output. Each key lands in a unique slot. max=1
0:508 1:128 2:256 3:384 4:4 5:132 6:260 7:388
8:8 9:136 10:264 11:392 12:12 13:140 14:268 15:396
16:16 17:144 18:272 19:400 20:20 21:148 22:276 23:404
24:24 25:152 26:280 27:408 28:28 29:156 30:284 31:412
32:32 33:160 34:288 35:416 36:36 37:164 38:292 39:420
40:40 41:168 42:296 43:424 44:44 45:172 46:300 47:428
48:48 49:176 50:304 51:432 52:52 53:180 54:308 55:436
56:56 57:184 58:312 59:440 60:60 61:188 62:316 63:444
64:64 65:192 66:320 67:448 68:68 69:196 70:324 71:452
72:72 73:200 74:328 75:456 76:76 77:204 78:332 79:460
80:80 81:208 82:336 83:464 84:84 85:212 86:340 87:468
88:88 89:216 90:344 91:472 92:92 93:220 94:348 95:476
96:96 97:224 98:352 99:480 100:100 101:228 102:356 103:484
104:104 105:232 106:360 107:488 108:108 109:236 110:364 111:492
112:112 113:240 114:368 115:496 116:116 117:244 118:372 119:500
120:120 121:248 122:376 123:504 124:124 125:252 126:380
Of course, a random increase between keys added would show collisions. This demonstration only intends to show value in keeping hashmap size relatively prime to some common factor in keys added. A prime hashmap size will always be relatively prime when the common factor is not the hashmap size itself. |
A submenu for demos appears below, letting you
go to the page on a topic written as a demo (as the
demos page defines it).
menu
thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes « Þ, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile) Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes.
powers of two
Wil usually collects lists of primes in sizes convenient for progressively larger space allocation schemes. In particular, Wil almost always wants a list of the largest primes less than powers of two, within ranges of plausible hashmap sizes. enum yEPrimes { // sample of possibly useful prime numbers «
e1i_127 = 127, // largest prime < 128 «
e1i_251 = 251, // largest prime < 256 «
e1i_509 = 509, // largest prime < 512 «
e1i_1021_1k = 1021, // largest prime < 1024 (1K) «
e1i_2039_2k = 2039, // largest prime < 2048 (2K) «
e1i_4093_4k = 4093, // largest prime < 4096 (4K) «
e1i_8191_8k = 8191, // largest prime < 8192 (8K) «
e1i_16381_16k = 16381, // largest prime < 16384 (16K) «
e1i_32749_32k = 32749, // largest prime < 32768 (32K) «
e1i_65521_64k = 65521, // largest prime < 65536 (64K) «
e1i_131071_128k = 131071, // largest prime < 131072 (128K) «
e1i_262139_256k = 262139, // largest prime < 262144 (256K) «
e1i_524287_512k = 524287, // largest prime < 524288 (512K) «
e1i_1048573_1024k = 1048573, // prime < 1048576 (1024K) «
};
"Do you always use a prime for hashmap size?" Stu asked. "No," Wil replied. "Actually, I don't use primes often at all. Usually I have something more important to worry about, like choosing a hash function with good collision resistance." "Well, when would you use one of the primes above?" Stu asked. "Is there any pattern? A minimal hash function?" "Sometimes I carefully manage heap space to avoid fragmentation," Wil explained. "And I use a variety of block sizes aligned to powers of two. For example, the future page and book demos are about page and super page allocation." "Do you put hashmaps inside those?" Stu guessed. "Yes, exactly," Wil confirmed. "When I put a hashmap in a block containing a power of two bytes, I'm afraid of address alignment related patterns in bucket collisions." "For example, a 4K page contains 1024 32-bit pointers," Stu explored. "If each is a hashmap bucket, you prefer to use only 1021 of them — 1021 is the largest prime under 1024?" "Bingo. I couldn't say it better myself," Wil nodded. "It sounds weirdly static," Stu complained. "Like you don't plan to dynamically resize hashmaps." "In some cases that's right," Wil agreed. "In some first drafts of programming language elements, some hashmaps don't resize. But often they're already much bigger than they need to be." "But if each block is uniform in size like pages and books," Stu considered, "then you'd have no fragmentation?" "Yes, that's basically the reason," Wil confirmed. "Especially when I allocate temporary space for operations, there's no reason to allocate less than a 4K page. It will just go back in the free pool after, with zero fragmentation. Same for books on a bigger scale." Stu squinted. "It sounds like you're simplifying somewhere. Do you really get away with just two sizes: large and small?" Wil was nodding. "Yes, having sizes in only two granularities is a simplfication. In real life things get more complex and involve more sizes. But such tuning can always wait." "Do you use hashmaps for the page and book allocator, too?" Stu guessed. "Ooh, I just had an idea. Do you use pages and books for space managing allocation of pages and books?" "Yeeees," Wil drawled. "You can see how it poses a self referential problem unless care is taken. It's one of the reasons for statically sized hashmaps using primes from the list above. I don't want feedback in map resizing when objects represent their own free lists." "Thinking about it gives me a headache already," Stu winced. "The code might give you a headache too," Wil warned. |
demos « Þ
+ todo + names + fd + iovec + assert + log + run + hex + crc + buf + in + out + quote + escape + compare + file + deck + cow + arc + blob + tree + slice + rand + time + stat + hash + heap + node + primes « Þ + page + book + pile + stack + atomic + lock + mutex + thread + map + meter + list + iter + ctype |