|
Early drafts of
þ docs appear first on this page before moving to
better final resting places.
scalars
Wil uses the term scalar to loosely mean native primitive integer and floating point types in C and C++, as well as any other native type resembling integers, like pointers of all stripes including function pointers. Anything where bitwidth is a concern. At the top of mu.h (or another header included from mu.h), the following scalar types are defined in the mu namespace:
namespace mu {
typedef uint8_t c8; // 8 bit unsigned codepoint
typedef uint16_t c16; // 16 bit unsigned codepoint
typedef uint32_t c32; // 32 bit unsigned codepoint
typedef uint32_t x32; // 32 bit unsigned maximum
typedef uint32_t h32; // 32 bit hash (eg crc32)
typedef uint64_t h64; // 64 bit hash (eg crc64)
typedef int8_t i8; // 8 bit signed int
typedef int16_t i16; // 16 bit signed int
typedef int32_t i32; // 32 bit signed int
typedef int64_t i64; // 64 bit signed int
typedef uint32_t n32; // 32 bit unsigned length
typedef uint32_t p32; // 32 bit unsigned position
typedef float r32; // 32 bit real (floating point)
typedef double r64; // 64 bit real (floating point)
typedef long double r128; // 96 or 128 bit floating point
typedef uint8_t u8; // 8 bit unsigned int
typedef uint16_t u16; // 16 bit unsigned int
typedef uint32_t u32; // 32 bit unsigned int
typedef uint64_t u64; // 64 bit unsigned int
typedef int32_t zn32; // 32 bit signed (slice) length
typedef int32_t zp32; // 32 bit signed (slice) position
typedef uint32_t ys32; // 32 bit: 1st 4 bytes of a string
#define YS32(cstr) (*(ys32*) (#cstr))
// fn32 assumes 32-bit function pointers (review later)
typedef void (*yfn32_m)(); // pointer to function -> void
}; // namespace mu
The most remarkable thing about these names, in the light of other þ sources and the license, is the lack of a leading y in most of them to show membership in the þ library. (One exception is the YS32(cstr) macro, which is mentioned on the tricks page under string-alignment.) There are several reasons for this, all casually motivated. The most casual reason is that these types are not very interesting or important, and need little special care. Wil is used to almost always having to change the names of these types on every project, and the issue just isn't interesting anymore. Wil thinks a y prefix is overkill here; it's useful to see a difference between native primitive types and other global symbols. And really primitive types should have really short names. The trailing integer denoting width in bits is sufficient to prevent name collisions with other þ types. The most disturbing type here is yfn32_m, which assumes a code address is 32 bits in size. Obviously this is not a good assumption. However, it makes it easy to review every place in the code that assumes function pointers are only 32 bits: look wherever yfn32_m is used to store the value.
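For readers wondering what YS32 actually yields, here is a minimal sketch, not taken from þ. It assumes the concern behind the "string-alignment" trick is that the cast in YS32 performs a type-punned, possibly unaligned 32-bit load. The names ys32_of and YS32_SAFE are hypothetical; only ys32 comes from the listing above. The resulting integer depends on the platform's byte order, so it is useful for comparing tags within one build, not as a portable value.

```cpp
#include <cstdint>
#include <cstring>

typedef uint32_t ys32; // 32 bit: 1st 4 bytes of a string

// Hypothetical helper (not part of þ): copy the first 4 bytes of a string
// into a ys32 with memcpy instead of dereferencing a cast pointer.
// The caller must supply at least 4 readable bytes (the null counts).
inline ys32 ys32_of(const char* str) {
    ys32 out = 0;
    std::memcpy(&out, str, sizeof(out));
    return out;
}

// Hypothetical macro with the same spelling convenience as YS32:
// stringify the token, then load its first 4 bytes.
#define YS32_SAFE(tok) (ys32_of(#tok))
```

With this sketch, YS32_SAFE(abcd) and ys32_of("abcd") produce the same value, and two different four-letter tags produce different values.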
reals
There's a difference between "real" and "floating point" and some folks are bound to point this out. But Wil doesn't need an actual real type, so he feels comfortable using real as a short synonym for floating point, especially since Wil often wants to use the letter r in names to abbreviate real. (Note r also means other things, like reverse, discussed elsewhere.) However, Wil expects a few conversations with Dex to go as follows, making him thankful Ira shoos him away: "But," objected Dex, working up a lather, "you can't do that! The term real has a specific mathematical meaning and code systems must use APIs that respect that. Programmers have been getting away with lazy approximation too long. And I for one..." "Shut the hell up," interrupted Ira. A struggle ensued, with Ira getting a headlock on Dex, who threatened to tell Mom.
C strings
Type const char* in þ interfaces almost always means "address of null-terminated C string." If you pass a value to such methods, you'd better make sure a terminating null byte is there. It's the contract and þ is sticking to it. You can always pass a literal C string constant to such a method, assuming you avoid wide characters; þ doesn't understand (read: doesn't give a damn about) wide characters, so if you play with them in þ, expect to get burned. Don't do that. Here's the yv constructor taking a null-terminated C string:
struct yv { // vector of bytes (like iovec)
u8* v_p; // byte pointer
n32 v_n; // length in bytes
// ...
explicit yv(const char* cstr)
: v_p((u8*) cstr), v_n((cstr)? ::strlen(cstr) : 0) { }
}; // yv
This yv constructor shows how const char* gets treated in most interfaces: it's implicitly understood to be a C string, and one with indefinitely long lifespan at that, because the passed pointer might be kept as long as the receiver (yv in this case) remains alive. (Allocation interfaces to copy strings appear elsewhere.) |
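The borrowing behavior described above can be sketched as follows. This is a trimmed copy of the yv shown earlier with only the two members from the excerpt, plus a hypothetical helper yv_borrows added purely for illustration. An explicit cast to n32 is added on the strlen result as an assumption, to make the narrowing visible.

```cpp
#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;  // 8 bit unsigned int
typedef uint32_t n32; // 32 bit unsigned length

struct yv { // vector of bytes (like iovec), trimmed to the members shown
    u8* v_p; // byte pointer (borrowed from the caller, never copied)
    n32 v_n; // length in bytes, not counting the terminating null
    explicit yv(const char* cstr)
        : v_p((u8*) cstr), v_n((cstr) ? (n32) ::strlen(cstr) : 0) { }
}; // yv

// Hypothetical helper: true when v still points at the caller's bytes.
inline bool yv_borrows(const yv& v, const char* s) {
    return v.v_p == (const u8*) s;
}
```

A string literal is the easy case because it lives for the whole program. Passing the address of a stack buffer that goes out of scope while the yv is still alive would leave v_p dangling, which is the lifespan contract the paragraph above describes.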
menu
thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile) Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes. General type theory will not be discussed on this page — only basic scalar types and a few compound types, plus rules for naming classes and structs related to one another.
types
Some libraries root their type hierarchy in a header file named types.h, but everything in þ is just lumped together in the notional mu.h header for all thorn APIs. Typedefs for integers (see the scalars section) provide a lot of synonyms for the same types, with the same sign and bit width, because sometimes a clue to purpose is helpful when reading code. Except for hashes, which don't count anything, all those unsigned integers count something; the leading letter suggests the semantic meaning of what's counted. Types starting with u for unsigned are non-specific.
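A small sketch of what "synonyms for the same types" means in practice. The typedefs are copied from the scalars listing; the function end_of is hypothetical. Because typedef creates an alias rather than a distinct type, the names help a reader but the compiler will not catch a swapped argument.

```cpp
#include <cstdint>
#include <type_traits>

typedef uint32_t n32; // 32 bit unsigned length
typedef uint32_t p32; // 32 bit unsigned position
typedef uint32_t u32; // 32 bit unsigned int

// A typedef introduces an alias, not a new type: the compiler sees one type.
static_assert(std::is_same<n32, p32>::value, "n32 and p32 are the same type");
static_assert(std::is_same<n32, u32>::value, "n32 and u32 are the same type");

// Hypothetical function: the parameter types document intent (a position
// plus a length gives a position), but callers can swap the arguments
// without any diagnostic.
inline p32 end_of(p32 pos, n32 len) { return pos + len; }
```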
sign
Almost all the integer types are unsigned: it's the default assumption. The exceptions are types starting with i and z (for int and slice, where z means "slice" only because s means string). Note yip32 is in the i camp, because ptrdiff_t is signed to represent differences in pointers (thus the name). Signed zn32 and zp32 are direct analogs to unsigned n32 and p32 used to denote length and position (offset). These signed versions of length and position are used in APIs expressing sequence subsets using offset and length notation, where negative values mean relative to the entire sequence length (about the same way this is done in Python). þ even has a slice type named yz whose state members look like this (showing a primary use of zn32 and zp32):
struct yz { // slice (starting pos and len in a sequence)
zp32 z_p; // signed pos (neg: relative to eof)
zn32 z_n; // signed len (neg: relative to eof)
// ...
}; // yz
This usage is really what those two signed 32-bit integer types are for: to accommodate end-of-sequence relative expressions. Subclasses of yz end in z and generally add a reference to the sequence sliced (to delay evaluation until use in lhs or rhs expressions occurs).
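To make the end-relative idea concrete, here is a hedged sketch of how a yz might be resolved against a known sequence length. The page does not spell out the exact rules, so these are assumptions: a negative position counts back from the end, a negative length means "stop that many bytes before the end", and out-of-range values are clamped. The names span32 and resolve are hypothetical and not part of þ.

```cpp
#include <cstdint>

typedef uint32_t n32;  // 32 bit unsigned length
typedef uint32_t p32;  // 32 bit unsigned position
typedef int32_t  zn32; // 32 bit signed (slice) length
typedef int32_t  zp32; // 32 bit signed (slice) position

struct yz { // slice (starting pos and len in a sequence)
    zp32 z_p; // signed pos (neg: relative to eof)
    zn32 z_n; // signed len (neg: relative to eof)
}; // yz

struct span32 { p32 pos; n32 len; }; // hypothetical resolved result

// Hypothetical resolver. Assumed rules:
//   z_p < 0 : start at seq_len + z_p
//   z_n < 0 : end at seq_len + z_n (stop |z_n| bytes before eof)
// Start and end are clamped into [0, seq_len], and end never precedes start.
inline span32 resolve(const yz& z, n32 seq_len) {
    int64_t n = seq_len; // widen so the arithmetic below cannot overflow
    int64_t start = (z.z_p < 0) ? n + z.z_p : z.z_p;
    if (start < 0) start = 0;
    if (start > n) start = n;
    int64_t stop = (z.z_n < 0) ? n + z.z_n : start + z.z_n;
    if (stop < start) stop = start;
    if (stop > n) stop = n;
    span32 out = { (p32) start, (n32) (stop - start) };
    return out;
}
```

Under these assumed rules, a slice of { -3, 2 } over a 10-byte sequence resolves to position 7 with length 2, and { 1, -1 } resolves to position 1 with length 8.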
pointer size
Type yip32 is defined to be the same size as ptrdiff_t, which is wrong on platforms where ptrdiff_t is actually larger than 32 bits. (The name means "integer big enough to hold a pointer, which I assume is 32 bits.") This is similar to the way yfn32_m (see the scalars section) is also wrong when function pointers are bigger than 32 bits. In both cases, when function pointers are 64-bit and ptrdiff_t is also 64-bit, you need to find all appearances of yip32 and yfn32_m in the code and replace them with yip64 and yfn64_m. The idea here is to review the consequences when sizes change, so you see the effects of structures getting larger. (It might matter.) Of course, in your clone of þ where you get to do whatever you want, feel free to use some other way of handling this. Or in your local copy of þ, you know where to find the assumption and change it: that's what a typedef in mu.h is for. (Just don't publish your changed version outside your clone.) |
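As one example of "some other way of handling this", the sketch below picks an integer type from the actual size of ptrdiff_t and reports whether the 32-bit assumptions hold in the current build. Everything here is hypothetical (int_of_size, yip, and the two checker functions are not þ names). Note the tradeoff: hiding the width behind a computed alias removes the 32 from the name, which defeats the grep-and-review approach described above.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical size-to-type table; only 4 and 8 byte widths are covered,
// so any other ptrdiff_t size fails to compile rather than guessing.
template <size_t Bytes> struct int_of_size;
template <> struct int_of_size<4> { typedef int32_t type; };
template <> struct int_of_size<8> { typedef int64_t type; };

// Hypothetical alias sized to match ptrdiff_t on this platform.
typedef int_of_size<sizeof(ptrdiff_t)>::type yip;

// True when this build matches the 32-bit assumption behind yip32.
inline bool yip32_assumption_holds() { return sizeof(ptrdiff_t) == 4; }

// True when this build matches the 32-bit assumption behind yfn32_m.
inline bool yfn32_assumption_holds() { return sizeof(void (*)()) == 4; }
```

A stricter variant would turn either checker into a static_assert so that a 64-bit build stops at the typedef instead of silently truncating.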