Here we demo ctype.

problem

     When categorizing octets using character predicates resembling ctype.h in C's standard library, Wil likes to generalize apis with maps. That includes maps that replace direct ctype.h calls with dynamic usage (still based on ctype.h), using yutm, shown first below.

     Several þ classes use yutm and yum in their apis to include or exclude filtered octets by rule. Both map an octet key to an associated slot in an array of 256 values containing information about the octet. But the yutm map array is a singleton shared by all yutm instances; the yum map array is per instance and 256 bytes in size. A yum map is easily made from a yutm predicate.

     The value associated with an octet key in yum is just another octet. When used as a predicate like yutm, that value in yum is tested for zero or nonzero as a boolean. But yum can also be used to map octets to octets; for example, a map of opening delimiters in a language might contain closing delimiters as values.

     In contrast, the value associated with an octet key in yutm is a set of bits categorizing an octet key with ctype.h predicates. The first section immediately below defines predicate bitflags; later we show how a shared singleton map for yutm is constructed at static init time to map predicates in ctype.h by bitflag.

ctype

     Class name yutm means thorn unsigned (octet) type map where the u really means u8 because 8-bit is default when unspecified in þ. This class is just ctype.h repackaged in a new api letting this reified version of ctype.h be used like a value. (For example, you can make a yum map from a yutm instance describing isalpha which can then be edited.) The yutm class api depends on the following enum which names ctype.h predicates with bit values.

enum yE_ctype { // bitflags for ctype.h predicates
    ye_isupper  = (1<<0),  /* UPPERCASE. */
    ye_islower  = (1<<1),  /* lowercase. */
    ye_isalpha  = (1<<2),  /* Alphabetic. */
    ye_isdigit  = (1<<3),  /* Numeric. */
    ye_isxdigit = (1<<4),  /* Hexadecimal numeric. */
    ye_isspace  = (1<<5),  /* Whitespace. */
    ye_isprint  = (1<<6),  /* Printing. */
    ye_isgraph  = (1<<7),  /* Graphical. */
    ye_isblank  = (1<<8),  /* Blank (usually SPC and TAB). */
    ye_iscntrl  = (1<<9),  /* Control character. */
    ye_ispunct  = (1<<10), /* Punctuation. */
    ye_isalnum  = (1<<11), /* Alphanumeric. */
};

     This enum uses 12 of 16 bits in each u16 of the map shown below. When Wil has more predicates to test, he sometimes adds them to this map. But often using yum instead is just as easy, so trying to use all available bits in yutm has low priority.
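     To make the spare bits concrete, here is a minimal sketch of adding one more predicate in bit 12. The names kAlnum, kIdent and flags_for are hypothetical and not part of þ; the identifier rule (alphanumeric or underscore) is only an assumed example of an extra predicate someone might want.

```cpp
#include <cctype>
#include <cstdint>

// Bit 11 mirrors ye_isalnum above; bit 12 is a made-up extra predicate
// that still fits inside one u16 map slot.
enum : std::uint16_t {
    kAlnum = 1u << 11,
    kIdent = 1u << 12  // hypothetical: octets allowed in a C identifier
};

// Compute the flags one map slot would hold for octet value c (0..255).
inline std::uint16_t flags_for(int c) {
    std::uint16_t f = 0;
    if (std::isalnum(c)) f |= kAlnum;
    if (std::isalnum(c) || c == '_') f |= kIdent;
    return f;
}
```

     A custom bit like this costs nothing per lookup, but it does change the shared map for every user, which is one reason a separate per-instance map can be the easier choice.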

     "Why did you define a bit for isalpha?" asked Stu. "Isn't that just the union of isupper and islower?"

     "Yes," replied Wil. "And representing isalpha in terms of other bits is traditional in ctype.h implementations."

     "But what?" prompted Stu. "You're not implementing ctype.h yourself? Spell it out for me."

     "Right," confirmed Wil. "Class yutm is just a map of whatever ctype.h says — anything it says — captured once only at static init time. So whatever C library's ctype.h returns for isalpha(), that's what yutm returns for ye_isalpha."

     "A subtle distinction," noted Stu. "Why should I care?"

     "This way yutm is a replacement for ctype.h," Wil explained, "because it actually is ctype.h in a different form factor. If you substitute yutm in your code, nothing should change."

     "Ah," Stu looked satisfied. "A no-change argument."

     "Yes," nodded Wil. "It comes from a refactoring mindset, where the game is avoiding change despite reorganization."

     "So if ctype.h has a bug," Stu considered, "you preserve it perfectly? Is that good?"

     "Yes, because unintentional change is worse," Wil explained. "The problem is doubt. Any time you make a change that might alter behavior, other folks wonder if you interfered with their code and they want you to settle their questions."

     "Time out the window?" Stu asked.

     "Yep," Wil chirped. "Okay, let's look at the class api."

yutm

     State members in yutm consist of a singleton map shared by every instance, and one u32 of predicate bitflags:

class yutm { // u8 type map predicates keyed by bitmask code
private:
    static u16* c_map; // [256]; // map of yE_ctype for every u8
    // inits yutm::c_map above: bitmap image of ctype predicates
    class Minit { public: u16* s_map; Minit(); }; // used once
    static Minit s_init; // constructed at static init time only
public:
    u32 m_bits;
    yutm(u32 bits) : m_bits(bits) { }
    // continued...

     Each yutm instance is only sizeof(u32) bytes in size because m_bits is just a bitwise OR of one or more yE_ctype bitflags (see the previous section). Private nested class Minit is used one time only to initialize singleton s_init, whose s_map member inside becomes the shared c_map containing yE_ctype bitflags for each possible octet value. The bitwise AND of per instance m_bits and a shared c_map entry narrows the predicate to include only m_bits when an input byte is used to look up a map entry:

    bool operator[](yE_ctype e) { return (m_bits & e) != 0; }
    bool operator[](u8 c) const { return (c_map[c]&m_bits)!=0; }
    bool mtype(u8 c) const { return (c_map[c]&m_bits)!=0; }
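     The lookup-then-AND idea can be tried outside þ with a minimal sketch. Everything named here (kUpper, kDigit, FlagTable, shared_table, Mask) is a hypothetical stand-in, and the table is built lazily on first use through a function-local static, which is a swapped-in technique and not the static init scheme yutm uses.

```cpp
#include <cctype>
#include <cstdint>

// Hypothetical stand-ins for two yE_ctype bitflags (same bit positions).
enum : std::uint16_t { kUpper = 1u << 0, kDigit = 1u << 3 };

// One u16 of flags per octet, sampled from <cctype> when constructed.
struct FlagTable {
    std::uint16_t slot[256];
    FlagTable() {
        for (int c = 0; c < 256; ++c) {
            std::uint16_t f = 0;
            if (std::isupper(c)) f |= kUpper;
            if (std::isdigit(c)) f |= kDigit;
            slot[c] = f;
        }
    }
};

// Shared by every Mask; constructed once, on first call.
inline const FlagTable& shared_table() {
    static const FlagTable table;
    return table;
}

// A predicate is only a mask: membership is one lookup plus one AND.
struct Mask {
    std::uint32_t bits;
    bool operator[](std::uint8_t c) const {
        return (shared_table().slot[c] & bits) != 0;
    }
};
```

     With these stand-ins, Mask{kUpper | kDigit} answers true for 'A' and '7' and false for 'a', and each instance stays the size of its one u32.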

     To exercise operator[] more easily in sample code below, Wil wrote the following new yv::vpick() resembling yv::vspn() in the run demo. A variant for yum instead of yutm looks exactly the same except for input param type.

u32 yv::vpick(yo& o, yutm const& accept) const { // matches
    u8* p = v_p; // first byte
    u8* end = p+v_n; // one beyond last byte
    n32 n = 0;
    for (/*prep preincr*/ --p; ++p < end; ) { // another octet?
        if (accept[*p]) { // another match?
            o.oc(*p); // copy this matching byte to out stream
            ++n; // count
        }
    }
    return n;
}

     Then Wil used this new vpick() method to select octets matching sample yutm predicates from a C string converted into a yv run, containing a selection of ascii characters:

const char* s = " aByZ,:69cd12$!";
yv sample(s); // yv::yv(const char* s)
yutm mlower(ye_islower); // selects islower()
yutm mupper(ye_isupper); // selects isupper()
yutm mdigit(ye_isdigit); // selects isdigit()
yutm mpunct(ye_ispunct); // selects ispunct()
yutm mdigitpunct(ye_isdigit|ye_ispunct); // both
yutm mblank(ye_isblank); // selects isblank()
yutm malnum(ye_isalnum); // selects isalnum()
yutm mxdigit(ye_isxdigit); // selects isxdigit()

yout << "# sample lower" << yendl << "|";
sample.vpick(yout, mlower); yout << "|" << yendl;
yout << "# sample upper" << yendl << "|";
sample.vpick(yout, mupper); yout << "|" << yendl;
yout << "# sample digit" << yendl << "|";
sample.vpick(yout, mdigit); yout << "|" << yendl;
yout << "# sample punct" << yendl << "|";
sample.vpick(yout, mpunct); yout << "|" << yendl;
yout << "# sample digit and punct" << yendl << "|";
sample.vpick(yout, mdigitpunct); yout << "|" << yendl;
yout << "# sample blank" << yendl << "|";
sample.vpick(yout, mblank); yout << "|" << yendl;
yout << "# sample alnum" << yendl << "|";
sample.vpick(yout, malnum); yout << "|" << yendl;
yout << "# sample xdigit" << yendl << "|";
sample.vpick(yout, mxdigit); yout << "|" << yendl;
yout << ynow; // flush to stdout

     And the output of this appears on stdout as follows:

# sample lower
|aycd|
# sample upper
|BZ|
# sample digit
|6912|
# sample punct
|,:$!|
# sample digit and punct
|,:6912$!|
# sample blank
| |
# sample alnum
|aByZ69cd12|
# sample xdigit
|aB69cd12|

     Note how ye_ispunct and ye_isdigit together create a predicate true for either one or the other, illustrating how you can efficiently combine multiple predicates at need.

     The next part of the class api shows how to alter a predicate after construction, by adding or subtracting a single bitflag, or all bitflags in another instance of yutm:

    void operator+=(yE_ctype e) { m_bits |= e; }
    void operator+=(yutm const& x) { m_bits |= x.m_bits; }
    void operator-=(yE_ctype e) { m_bits &= ~((u32)e); }
    void operator-=(yutm const& cb) { m_bits &= ~cb.m_bits; }

     Since those operators should be obvious, no usage sample is given. Instead let's wrap up the rest of the class api, so we can go on to initialization of the shared singleton map.

    struct Mq { yutm const& q_m; Mq(yutm const& m): q_m(m) { } };
    Mq quote() const { return Mq(*this); } // to request dump
    void mprint() const; // mdump() to stdout for use under gdb
    void mdump(yo& o) const;
    void mcite(yo& o) const;
    void mbits(yo& o) const; // up:lo:al:di:xd:sp:pr:gr:bl:cn:pu:an:
}; // class yutm

inline yo& operator<<(yo& o, yutm::Mq const& x) {
    x.q_m.mdump(o); return o;
}
inline yo& operator<<(yo& o, yct<yutm> const& x) {
    x.c_t.mcite(o); return o;
}

     This boilerplate for debug printing looks like that for any other þ class. Try the iovec demo's section on debug printing or the quote demo for a guide to use of the quote() inline method convention. Let's wrap up this column with initialization.

singleton

     At static init time (before main() is called) the following constructor is called to initialize the singleton predicate map used by all yutm instances:

static u16 yutm_map[ 256 ]; // actual map singleton
u16* yutm::c_map = yutm_map; // value shared everywhere
/*static*/ yutm::Minit yutm::s_init; // static init time

yutm::Minit::Minit() : s_map(yutm_map) { // called once
    for (int c = 0; c < 256; c ++) { // for every u8 octet
        u16 flags = 0;
        if (::isupper(c)) flags |= ye_isupper;
        if (::islower(c)) flags |= ye_islower;
        if (::isalpha(c)) flags |= ye_isalpha;
        if (::isdigit(c)) flags |= ye_isdigit;
        if (::isxdigit(c)) flags |= ye_isxdigit;
        if (::isspace(c)) flags |= ye_isspace;
        if (::isprint(c)) flags |= ye_isprint;
        if (::isgraph(c)) flags |= ye_isgraph;
        if (::isblank(c)) flags |= ye_isblank;
        if (::iscntrl(c)) flags |= ye_iscntrl;
        if (::ispunct(c)) flags |= ye_ispunct;
        if (::isalnum(c)) flags |= ye_isalnum;
        s_map[c] = flags;
    }
}

     Essentially all this does is sample the value of each ctype.h predicate with a bitflag assigned, for each of 256 possible octet values, so yutm_map contains all the same information but in a form easier to use efficiently with yutm's bitflags.
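     The "no change" claim is easy to check mechanically. The sketch below is an assumed test harness, not part of þ: it samples only two predicates into a hypothetical table (build_table), then counts disagreements against direct <cctype> calls (count_mismatches). Zero mismatches is what the sampling approach should yield.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>

// Hypothetical flags at the same bit positions as ye_isalpha, ye_isspace.
enum : std::uint16_t { kAlpha = 1u << 2, kSpace = 1u << 5 };

// Sample two <cctype> predicates for every octet value.
inline void build_table(std::uint16_t (&table)[256]) {
    for (int c = 0; c < 256; ++c) {
        std::uint16_t flags = 0;
        if (std::isalpha(c)) flags |= kAlpha;
        if (std::isspace(c)) flags |= kSpace;
        table[c] = flags;
    }
}

// Count octets where the table disagrees with a direct <cctype> call.
inline int count_mismatches(const std::uint16_t (&table)[256]) {
    int bad = 0;
    for (int c = 0; c < 256; ++c) {
        bool alpha = (table[c] & kAlpha) != 0;
        bool space = (table[c] & kSpace) != 0;
        if (alpha != (std::isalpha(c) != 0)) ++bad;
        if (space != (std::isspace(c) != 0)) ++bad;
    }
    return bad;
}

// Build and verify in one step; aborts if the table ever disagrees.
inline void check_no_change() {
    std::uint16_t table[256];
    build_table(table);
    assert(count_mismatches(table) == 0);
}
```

     One caveat worth keeping in mind: the table is a snapshot. If a program calls setlocale() after the table is built, <cctype> answers may change while the table does not.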

     The only thing left is debug print code showing the state of per instance m_bits bitflags and the resulting filtered view of the shared singleton predicate map. After that comes the class api for yum, which explicitly maps octets instead of mapping predicate bitflags.

yutm print

     Below is actual code to debug print state of yutm, whose class api appears above. However, before the print code let's show an example of output since format of results is the purpose of the code. The code aims to list octet members clearly:

yutm hilo(ye_islower | ye_isupper | ye_iscntrl | ye_isblank);
yout << hilo.quote() << yendl; // show map with four bits
yout << ynow; // flush to stdout

     Executing that prints the following on stdout:

<yutm me=0xbffffad0 n=86 u16=0303 bits=up:lo:bl:cn:>
  00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e
  0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d
  1e 1f 20 A B C D E F G H I J K L M N O P Q R
  S T U V W X Y Z a b c d e f g h i j k l m n o
  p q r s t u v w x y z 7f
</yutm>

     This illustrates four different aims of the print code:

  • attribute bits=up:lo:bl:cn: reveals m_bits bitflags
  • every non isgraph() octet is shown in two-glyph hex
  • every isgraph() octet is shown one-glyph as itself
  • line length is constrained

     If you choose octets to print as themselves with isprint() instead of isgraph(), blank (0x20) only appears as a gap. Here's the source code:

void yutm::mprint() const { // dump to stdout
    yout << yendl;
    this->mdump(yout);
    yout << yendl << ynow;
}

void yutm::mcite(yo& o) const { // one line only
    int n = 0;
    for (int i = 0; i < 256; i++ ) {
        if ((*this)[i])
            ++n;
    }
    o.of("<yutm me=%#lx n=%d u16=%04x bits=",
        (long) this, (int) n, (int) m_bits);
    this->mbits(o);
    o.o2c('/', '>');
}

void yutm::mdump(yo& o) const {
    int line = 0;
    int n = 0;
    for (int i = 0; i < 256; i++ ) {
        if ((*this)[i]) // operator[] counts octet members
            ++n;
    }
    o.oft("<yutm me=%#lx n=%d u16=%04x bits=",
        (long) this, (int) n, (int) m_bits);
    this->mbits(o); // show bitflags as abbreviations
    o.ocn('>');
    for (int c = 0; c < 256; c++ ) {
        if ((*this)[c]) { // operator[]
            if (line > 44) { // line too long?
                o.on(); // newline and tab to indent
                line = 0;
            }
            if (isgraph(c)) { // show as self instead of hex?
                o.o2c(c, ' ');
                line += 2;
            }
            else {
                o.of("%02x ", (int) c);
                line += 3;
            }
        }
    }
    o.ouend("yutm");
}

     And the mbits() method abbreviates bitflag names, in a format Wil thinks is "good enough" despite its brevity:

void yutm::mbits(yo& o) const {
    u32 bits = m_bits; // up:lo:al:di:xd:sp:pr:gr:bl:cn:pu:an:
    if (bits & ye_isupper) o << "up:";
    if (bits & ye_islower) o << "lo:";
    if (bits & ye_isalpha) o << "al:";
    if (bits & ye_isdigit) o << "di:";
    if (bits & ye_isxdigit) o << "xd:";
    if (bits & ye_isspace) o << "sp:";
    if (bits & ye_isprint) o << "pr:";
    if (bits & ye_isgraph) o << "gr:";
    if (bits & ye_isblank) o << "bl:";
    if (bits & ye_iscntrl) o << "cn:";
    if (bits & ye_ispunct) o << "pu:";
    if (bits & ye_isalnum) o << "an:";
}
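     The self-or-hex choice made inside mdump() can be isolated in a small sketch. The helper show_octet and its use_isprint switch are hypothetical names for illustration, not part of yutm. The switch shows why isgraph() is the better test here: with isprint(), octet 0x20 would be emitted as a bare space.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Format one octet the way the dump does: graphic octets as themselves,
// everything else as two lowercase hex digits.
inline std::string show_octet(unsigned char c, bool use_isprint = false) {
    bool as_self = use_isprint ? (std::isprint(c) != 0)
                               : (std::isgraph(c) != 0);
    if (as_self)
        return std::string(1, static_cast<char>(c));
    char buf[3];  // two hex digits plus terminating nul
    std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
    return std::string(buf);
}
```

     Under the default C locale, show_octet(0x20) gives "20" while show_octet(0x20, true) gives a single space, which is invisible in a space-separated listing.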

     The rest of this page describes similar class yum, so further yutm material only appears in interaction with yum.

yum

     Class name yum means thorn unsigned (octet) map where (just as with yutm) the u really means u8 because 8-bit is default when unspecified in þ. Each yum instance is mainly a 256 byte map of all possible u8 octet values. But yum also counts the nonzero values in the map (to track the number of "yes" members when such a map is used as a predicate resembling yutm).

struct yum { // u8 map for predicate of nonzero entries
    u8 m_u8[ 256 ]; // nonzero for c if predicate is true for c
    u32 m_len; // map entries in array that are nonzero (yes) vals
    u8 m_yes; // default value used to set m_u8[c] to nonzero "yes"

    u8 operator[](u8 c) const { return m_u8[c]; }
    // continued ...

     Code comments and material below both usually assume the only purpose of yum is to represent a predicate, where only the difference between zero and nonzero in the map is meaningful.

     But another use of yum is representing octet associations where a key octet maps to an associated value octet. (And since all state is public as a struct, you're explicitly free to use yum for any other useful purpose too.) One of the constructors takes two input sequences so a set of keys can map to a set of values.

     This example maps open delimiters to close delimiters:

yum open2close("'([{\"<", "')]}\">"); // delimiters
yout << open2close.quote() << yendl;
yout << ynow; // flush to stdout

     This sample code writes the following on stdout:

<yum me=0xbffff9c8 len=6 yes=2b:+>
  "=" '=' (=) <=> [=] {=}
</yum>

     Each key and value is separated by equal sign (=) and otherwise each octet prints the same way yutm does: using two-glyph hex for non-isgraph() octets, and as themselves otherwise.

     Note the yum example above can be used as a predicate meaning "is an open delimiter" for the keys involved.
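     Both readings of the same map, predicate and association, fit in a few lines. This is a minimal sketch using a hypothetical stand-in (OctetMap) and a hypothetical helper (is_wrapped); neither is part of þ, and the real yum api differs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a yum-like map: one value octet per key octet.
struct OctetMap {
    std::uint8_t slot[256];
    OctetMap(const char* key, const char* val) {
        std::memset(slot, 0, sizeof slot);
        for (int i = 0; key[i] && val[i]; ++i)
            slot[static_cast<std::uint8_t>(key[i])] =
                static_cast<std::uint8_t>(val[i]);
    }
    std::uint8_t operator[](std::uint8_t c) const { return slot[c]; }
};

// True when s starts with an open delimiter and ends with the close
// delimiter the map associates with it.
inline bool is_wrapped(const OctetMap& open2close, const char* s) {
    std::size_t n = std::strlen(s);
    if (n < 2) return false;
    // Nonzero doubles as the predicate "s[0] is an open delimiter".
    std::uint8_t close = open2close[static_cast<std::uint8_t>(s[0])];
    return close != 0 && close == static_cast<std::uint8_t>(s[n - 1]);
}
```

     The single lookup answers two questions at once: whether the first octet opens anything, and which octet should close it.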

     The following methods and operators edit yum map content by adding or subtracting entries one at a time or en masse. When unspecified, the value used on add is m_yes which defaults to + (plus).

    /// \brief for non-nil key and val, add m_u8[key[i]] = val[i]
    /// \param key and val are both nul terminated c strings
    void madd(const char* key, const char* val); // key[i] = val[i]
    void madd(yv const& v); // for i in 0..v.v_n-1, m[v.v_p[i]] = yes
    void madd(yutm const& m); // ctype predicate converted to yum
    void madd(u8 c); // add this specific octet

    yum& operator+=(yv const& v) { madd(v); return *this; }
    yum& operator+=(const char* s) { yv v(s); madd(v); return *this; }
    yum& operator+=(yutm const& m) { madd(m); return *this; }
    yum& operator+=(u8 c) { madd(c); return *this; }

    void msub(yv const& v); // subtract (zero) every byte in src v
    void msub(yutm const& m); // subtract bytes true for predicate m
    void msub(u8 c); // remove this specific octet

    yum& operator-=(yv const& v) { msub(v); return *this; }
    yum& operator-=(const char* s) { yv v(s); msub(v); return *this; }
    yum& operator-=(yutm const& m) { msub(m); return *this; }
    yum& operator-=(u8 c) { msub(c); return *this; }

     Obviously operators simply call madd() or msub() as appropriate, so no demo for operators is given below. The constructors — shown next — generally just call madd() after initializing with minit(). Let's add print api boilerplate with constructors to finish the class api:

    void minit(yv const& v, u8 yes='+'); // clear all, then madd(v)

    yum(const char* s, u8 y='+') { yv v(s); this->minit(v, y); }
    yum(yv const& v, u8 y='+') { this->minit(v, y); }
    yum(yutm const& m, u8 y='+');
    yum(yutm const& m, yv const& v, u8 y='+');
    yum(u8 y='+') { yv v(""); this->minit(v, y); }
    yum(const char* key, const char* val, u8 y='+') {
        yv v(""); this->minit(v, y); this->madd(key, val);
    }

    struct Mq { yum const& q_m; Mq(yum const& m): q_m(m) { } };
    Mq quote() const { return Mq(*this); } // to request dump
    void mprint() const; // mdump() to stdout for use under gdb
    void mdump(yo& o) const;
    void mcite(yo& o) const;
}; // struct yum

inline yo& operator<<(yo& o, yum::Mq const& x) {
    x.q_m.mdump(o); return o;
}
inline yo& operator<<(yo& o, yct<yum> const& x) {
    x.c_t.mcite(o); return o;
}

     If you study it all, you'll see nearly everything is done by minit() and madd(), plus debug printing by mdump(). A mixture of source code and examples is shown below.

     Code for minit() is tiny and also calls madd():

void yum::minit(yv const& src, u8 yes) { // zero then madd(src)
    ::memset(this, 0, sizeof(yum));
    m_yes = yes;
    this->madd(src);
}

     So an empty yv run instance is needed to get a perfectly empty map to start. (But why start with an empty map?)

     Two non-inline constructors basically do an inline minit():

yum::yum(yutm const& m, yv const& src, u8 yes) {
    ::memset(this, 0, sizeof(yum));
    m_yes = yes;
    this->madd(m);
    this->madd(src);
}

yum::yum(yutm const& m, u8 yes) {
    ::memset(this, 0, sizeof(yum));
    m_yes = yes;
    this->madd(m);
}

     Of several madd() methods, the interesting one is shown first, taking two arguments to specify explicit values for the keys passed:

void yum::madd(const char* key, const char* val) {
    if ( key && val ) { // neither nil? (key[i] maps to val[i])
        u8* map = m_u8;
        for (int i = 0; key[i] && val[i]; i++ ) {
            u8 k = (u8) key[i]; // as u8: plain char may be signed
            if (!map[k]) ++m_len; // one more only if not already yes
            map[k] = val[i];
        }
    }
}

     The next two madd() methods merely use m_yes as the value for each source octet. Each time a value changes from zero to nonzero, member m_len increments to track the total number of nonzero values.

void yum::madd(const yv& src) { // add yes for bytes in src
    u8* map = m_u8;
    if (!m_yes) m_yes = '+';
    u8* p = src.v_p;
    u8* end = p+src.v_n; // one beyond last byte
    --p; // prepare for preincrement
    while (++p < end) {
        if (!map[*p]) { // not already nonzero? need to count?
            map[*p] = m_yes; // pick some nonzero value
            ++m_len;
        }
    }
}

void yum::madd(const yutm& src) { // add yes for bytes true for src
    u8* map = m_u8;
    if (!m_yes) m_yes = '+';
    for (unsigned i = 0; i < 256; i++) {
        if (src[i]) { // i is true in source predicate?
            if (!map[i]) ++m_len; // going from zero to nonzero? one more
            map[i] = m_yes;
        }
    }
}

     The last madd() variant adds octets true in yutm, which creates a copy of the yutm predicate in yum when the map starts empty — an example below shows yum copying an input yutm:

yutm tmisc(ye_isspace|ye_isupper|ye_iscntrl|ye_isxdigit);
yum misc(tmisc);
yout << "# original yutm bitflag predicates:" << yendl;
yout << tmisc.quote() << yendl;
yout << "# yum copy of yutm with + as value:" << yendl;
yout << misc.quote() << yendl;
yout << ynow; // flush to stdout

     This code writes the following on stdout:

# original yutm bitflag predicates:
<yutm me=0xbffffacc n=76 u16=0231 bits=up:xd:sp:cn:>
  00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e
  0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d
  1e 1f 20 0 1 2 3 4 5 6 7 8 9 A B C D E F G H
  I J K L M N O P Q R S T U V W X Y Z a b c d e
  f 7f
</yutm>
# yum copy of yutm with + as value:
<yum me=0xbffff8bc len=76 yes=2b:+>
  00=+ 01=+ 02=+ 03=+ 04=+ 05=+ 06=+ 07=+ 08=+
  09=+ 0a=+ 0b=+ 0c=+ 0d=+ 0e=+ 0f=+ 10=+ 11=+
  12=+ 13=+ 14=+ 15=+ 16=+ 17=+ 18=+ 19=+ 1a=+
  1b=+ 1c=+ 1d=+ 1e=+ 1f=+ 20=+ 0=+ 1=+ 2=+ 3=+
  4=+ 5=+ 6=+ 7=+ 8=+ 9=+ A=+ B=+ C=+ D=+ E=+ F=+
  G=+ H=+ I=+ J=+ K=+ L=+ M=+ N=+ O=+ P=+ Q=+ R=+
  S=+ T=+ U=+ V=+ W=+ X=+ Y=+ Z=+ a=+ b=+ c=+ d=+
  e=+ f=+ 7f=+
</yum>

     Unlike yutm, this map can be edited octet by octet; for example, subtracting lower and uppercase b's and p's like this:

misc -= "pPbB"; // operator-=(const char* s)
yout << misc.quote() << yendl;

     The result shows three fewer nonzero members afterward, not four, because lowercase p was never a member:

<yum me=0xbffff8b8 len=73 yes=2b:+>
  00=+ 01=+ 02=+ 03=+ 04=+ 05=+ 06=+ 07=+ 08=+
  09=+ 0a=+ 0b=+ 0c=+ 0d=+ 0e=+ 0f=+ 10=+ 11=+
  12=+ 13=+ 14=+ 15=+ 16=+ 17=+ 18=+ 19=+ 1a=+
  1b=+ 1c=+ 1d=+ 1e=+ 1f=+ 20=+ 0=+ 1=+ 2=+ 3=+
  4=+ 5=+ 6=+ 7=+ 8=+ 9=+ A=+ C=+ D=+ E=+ F=+ G=+
  H=+ I=+ J=+ K=+ L=+ M=+ N=+ O=+ Q=+ R=+ S=+ T=+
  U=+ V=+ W=+ X=+ Y=+ Z=+ a=+ c=+ d=+ e=+ f=+ 7f=+
</yum>
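     The member counts in these two dumps can be reproduced without þ. The sketch below uses a hypothetical stand-in (OctetSet, with add, sub and sub_each) that keeps a running count the same way m_len does, and a hypothetical make_misc() that samples the same four predicates straight from <cctype>. It assumes the default C locale.

```cpp
#include <cctype>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a yum-like set: 256 value octets plus a
// running count of nonzero slots.
struct OctetSet {
    std::uint8_t slot[256];
    std::uint32_t len;
    OctetSet() : len(0) { std::memset(slot, 0, sizeof slot); }
    void add(std::uint8_t c, std::uint8_t yes = '+') {
        if (!slot[c]) ++len;  // count only zero-to-nonzero changes
        slot[c] = yes;
    }
    void sub(std::uint8_t c) {
        if (slot[c]) --len;   // count only nonzero-to-zero changes
        slot[c] = 0;
    }
    void sub_each(const char* s) {  // remove every octet in a C string
        for (; *s; ++s) sub(static_cast<std::uint8_t>(*s));
    }
};

// Same four predicates as tmisc above, sampled directly from <cctype>.
inline OctetSet make_misc() {
    OctetSet m;
    for (int c = 0; c < 256; ++c) {
        if (std::isspace(c) || std::isupper(c) ||
            std::iscntrl(c) || std::isxdigit(c))
            m.add(static_cast<std::uint8_t>(c));
    }
    return m;
}
```

     In the C locale, make_misc() should hold 76 members, and sub_each("pPbB") should leave 73, since removing an octet that is already absent leaves the count alone.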

     Both msub() methods resemble madd() in reverse, subtracting instead of adding map members:

void yum::msub(const yv& src) { // zero every byte in source
    u8* map = m_u8;
    if (!m_yes) m_yes = '+';
    u8* p = src.v_p;
    u8* end = p+src.v_n; // one beyond last byte
    --p; // prepare for preincrement
    while (++p < end) {
        if (map[*p]) { // not already zero? need to remove?
            --m_len;
            map[*p] = 0; // only zero means "not a member"
        }
    }
}

void yum::msub(const yutm& src) { // zero every byte true for src
    u8* map = m_u8;
    if (!m_yes) m_yes = '+';
    for (unsigned i = 0; i < 256; i++) {
        if (src[i]) { // i is true in source predicate?
            if (map[i]) --m_len; // from nonzero to zero? one fewer
            map[i] = 0; // only zero means "not a member"
        }
    }
}

     The yum debug print code below closely resembles the way yutm methods print. Formats are only slightly different, mainly since here values must also be printed, and not just keys:

void yum::mprint() const { // dump to stdout
    yout << yendl;
    this->mdump(yout);
    yout << yendl << ynow;
}

void yum::mcite(yo& o) const { // single line only
    char cyes = (isgraph(m_yes))? (char) m_yes : '.';
    o.of("<yum me=%#lx len=%d yes=%02x:%c/>",
        (long) this, (int) m_len, (int) m_yes, (char) cyes);
}

void yum::mdump(yo& o) const { // multi line
    int line = 0;
    char cyes = (isgraph(m_yes))? (char) m_yes : '.';
    o.oftn("<yum me=%#lx len=%d yes=%02x:%c>",
        (long) this, (int) m_len, (int) m_yes, (char) cyes);
    for (int i = 0; i < 256; i++ ) {
        int c = m_u8[i];
        if (c) {
            if (line > 44) {
                o.on(); // newline and tab to indent
                line = 0;
            }
            if (isgraph(i) && isgraph(c)) {
                o.of("%c=%c ", (char) i, (char) c);
                line +=4;
            }
            else if (isgraph(c)) {
                o.of("%02x=%c ", (int) i, (char) c);
                line +=5;
            }
            else {
                o.of("%02x=%02x ", (int) i, (int) c);
                line +=6;
            }
        }
    }
    o.ouend("yum");
}

     Before a final dialog, let's look at another sample which gratuitously changes the value of m_yes between adding one set of members and another set of members:

yutm tlower(ye_islower); // islower() predicate
yutm tupper(ye_isupper); // isupper() predicate
yum lohi(tlower, 'L'); // 'L' for all lower values
lohi.m_yes = 'U'; // now 'U' before upper values
lohi += tupper; // add all of isupper predicate
yout << lohi.quote() << yendl;

     Output shows upper and lowercase use different values:

<yum me=0xbffff7ac len=52 yes=55:U>
  A=U B=U C=U D=U E=U F=U G=U H=U I=U J=U K=U L=U
  M=U N=U O=U P=U Q=U R=U S=U T=U U=U V=U W=U X=U
  Y=U Z=U a=L b=L c=L d=L e=L f=L g=L h=L i=L j=L
  k=L l=L m=L n=L o=L p=L q=L r=L s=L t=L u=L v=L
  w=L x=L y=L z=L
</yum>

     The dialog below comments lightly on charsets.

ascii

     You might have little use for ctype style predicates when using charsets not compatible with Latin1 character predicates. In this case, don't use this api. Use something else instead geared for charsets and streams of codepoints in sizes your app needs.

     "Ahem," Dex cleared his throat self-importantly. "You shouldn't support ascii or Latin1 charsets at all because only Unicode matters, you western chauvinist pig, you."

     Before Wil could respond, Ira tapped on Dex's shoulder, making him flinch. "Got a wild hair problem?" Ira asked Dex.

     "No, no," Dex slowly got the squeak in his voice under control. "I just wondered why Wil didn't focus on Unicode exclusively."

     Wil said pleasantly, "I do still write apps needing no Unicode. For example, sometimes I write command line tools."

     "And that programming language of yours," Dex worried. "Are you going to have it parse Latin1 characters?"

     "Is he bothering you?" Ira asked Wil.

     "It's okay," Wil replied. Then Wil addressed Dex. "Yes, only ascii at first since I've no reason to bother with Unicode."

     "But what if I want to use Unicode?" Dex whined. "How can I use a language unless Unicode support is baked-in from the start? I just won't use your language then."

     "First," Wil ticked off on his fingers, "I couldn't care less if you use any programming language I publish. You keep acting like I give a crap about what you want, despite my insistence I don't."

     "I mean it," Dex warned. "I won't use your language."

     "Second," Wil continued without pause, "If I should happen to process some Unicode with my toy language — even when it has Latin1 based syntax — I'll have no problem since all my strings are in pointer plus length representation. Null bytes: no problem."

     "But," wailed Dex, "You won't display native code strings well in Unicode. Your user interface will suck."

     "Who cares," Wil dismissed. "I should be so lucky that I need a better UI for a language to get adoption. Right now all I care about is scratching one itch at a time."

     "But," puzzled Dex, "can you write servers using Unicode?"

     "You're not that stupid, are you?" wondered Wil. "Unicode content in data and whatever charsets I use in code development have nothing to do with each other. You knew that, right?"

     "Ha, ha," Dex dithered, "of course I knew that. But what if code really needs to grok Unicode. Won't you need library support?"

     "Yes," agreed Wil. "I just don't believe the silly idea everything can be done transparently without needing to deal with details."