

demos are explained here; a menu at top column right indexes actual topic demos. Here we demo in. See the license.

problem

     To unify treatment of input streams parsed for various purposes, Wil wants one api to read buffered i/o streams, and yi is the abstract api he uses for this in-stream purpose. Wil shows a few subclasses and types related to yi.

     Class yi0 shows a basic virtual api with all the virtual methods in yi. Wil's plan for yi0 is vague, subtle, and possibly useless, but can't do any harm. (To wit: objects managing to expose a yi0 api can support a yi subclass, an idea not currently pursued in any way.)

class yi0 { // abstract yi origin (completely non-concrete)
protected: // only subclasses see; public api uses ic() only
  virtual int _ic(); // neg on eof or err (yiu::ic() support)
public: // required virtual methods for yi subclasses
  virtual ~yi0();
  virtual p32 ipos() const = 0; // current byte position
  virtual p32 ilen() const = 0; // size in bytes
  virtual p32 ileft(); // bytes here & after: ilen() - ipos()
  virtual int iread(void* p, size_t n) = 0; // neg on err
  virtual int ipread(void* p, size_t n, p32 pos) const;
  int iin(void* p, size_t n) { return this->iread(p, n); }
  virtual int iseek(p32 p); // subclass might not support
  // iget() is a trivial wrapper around virtual ipread()
  int iget(yb& dest, p32 pos) const; // requires iseek()
  // i2b() is a trivial wrapper around virtual iread()
  int i2b(yb& b); // 0 on eof and b.v_n==0, neg on err
};

     Most yi0 methods are described below when code is shown. First let's focus on a problem caused by making seek a primitive base class operation: not all streams support seek, so what happens when you seek a stream that cannot? It fails: the base class versions just log a complaint and return zero. In that case you are urged not to do that. Streams that cannot seek are unable to perform iseek(), iget(), or ipread(), and the api has no way to ask a stream whether seek is supported.

     Why did Wil do this? In earlier framework designs Wil tried to factor separable api features in a sensible way; it always led to a mess and Wil hated the result. It smelled of over-engineering, even when Wil added a virtual method to query capabilities. So the current design aims to avoid over-planning; it works with informal dynamic typing — you can avoid seeking streams with no seek support. If you get stuck, change the code (it's re-editable, not re-usable). Wil finds early fancy design often leads to evil; his new ethic: sketch in broad strokes first, handle fine detail later.

yi0::~yi0() { }
int yi0::_ic() { ylog(1, "_ic() NO OVERRIDE\n"); return -1; }
int yi0::iseek(p32 p) { // might not be supported in subclass
  ylog(0, "yi0::iseek(pos=%#lx) NO OVERRIDE\n", p); // backtrace?
  return 0;
}
int yi0::ipread(void* p, size_t n, p32 pos) const { // need seek
  ylog(0, "yi0::ipread(pos=%#lx) NO OVERRIDE\n", pos); // btrace?
  return 0;
}

     You needn't override iseek() or ipread() unless used; by default they just complain. You also needn't override ileft() below unless you can do it faster than calling ilen() and ipos() separately to query remaining stream bytes.
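     The "complain by default" pattern can be tried in isolation. What follows is only a minimal sketch under stated assumptions: In0, PipeIn, try_seek(), the complaints counter, and the use of stderr in place of ylog() are all hypothetical names invented here, not part of thorn. The try_seek() helper is one possible caller-side convention (not something the yi api provides) for noticing that a nonzero seek target was ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

typedef unsigned long pos_t;  // stand-in for p32 (assumed unsigned)

// Hypothetical base: optional operations default to "complain, return 0".
class In0 {
public:
    static int complaints;  // counts NO OVERRIDE messages (for testing)
    virtual ~In0() { }
    virtual int iread(void* p, size_t n) = 0;  // required
    virtual int iseek(pos_t pos) {             // optional
        ++complaints;
        std::fprintf(stderr, "iseek(pos=%#lx) NO OVERRIDE\n", pos);
        return 0;
    }
};
int In0::complaints = 0;

// Hypothetical non-seekable stream: overrides only what it can do.
class PipeIn : public In0 {
public:
    virtual int iread(void*, size_t) { return 0; }  // always eof here
};

// Assumed convention: a seek that does not land on a nonzero target
// is treated as "seek not supported (or target past eof)".
inline bool try_seek(In0& in, pos_t pos) {
    assert(pos != 0);  // a zero target is ambiguous under this convention
    return in.iseek(pos) == (int) pos;
}
```

     With this shape, calling try_seek() on a PipeIn logs one complaint and reports false, which matches the informal dynamic typing described above: nothing stops the call, it just does not work.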

p32 yi0::ileft() { // bytes remaining at & after here:
  p32 end = this->ilen();
  p32 n = this->ipos();
  return (end > n)? (end - n): 0;
}
int yi0::i2b(yb& dest) { // eof: 0 & dest.v_n=0, neg on err
  int actual = this->iread(dest.v_p, dest.b_x);
  dest.v_n = (actual > 0)? (n32) actual : 0;
  return actual;
}
int yi0::iget(yb& dest, p32 pos) const { // requires iseek()
  int actual = this->ipread(dest.v_p, dest.b_x, pos);
  dest.v_n = (actual > 0)? (n32) actual : 0;
  return actual;
}

     The last two methods, i2b() and iget(), are just buffer-friendly wrappers (see the buf demo) around virtual iread() and ipread(); the latter resemble the read() and pread() system calls. With a buffer already in hand, i2b() and iget() can be easier to use.

     Note i2b() overwrites existing buffer content, versus iout(yb&) below (cf ») appending at current buffer length.

readv

     Why no ireadv() method simulating a readv() system call? Because Wil hasn't needed it yet in this code rev, and didn't feel like adding it just for this demo. Consider it an obvious extension. If a subclass has an especially interesting ireadv() (say in the deck demo) Wil can show it later anyway.

     The method Wil actually wants to use most often in his work is preadv(), so that's more likely to appear first in the api. However, the current api uses slice subclass yiz (cf ») for some of the behavior of preadv().

streaming

     A main purpose of yi is to hold the triple of pointers (i_0, i_p, i_x) shown below, describing the current buffer being read by an in-stream, so inlines like ic() can read octets directly without a virtual function call. As long as pointer i_p has not yet reached max i_x (one byte past the last in the buffer), another byte can be read without a refill via virtual method. This amortizes the cost of virtual methods across many octets; cheap octet reads are a main design motivation. As long as an in-stream is buffered, its buffer space can be used this way.

     An unbuffered stream can simply make the triple of pointers all equal to zero, indicating an empty buffer since i_p is then never less than i_x; a buffer is exhausted when i_p equals i_x.
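     To make the amortization concrete, here is a small sketch, not the thorn implementation. MiniIn, ChunkIn, drain(), and the refills counter are hypothetical names, and the four-byte buffer is deliberately tiny so the refill count is easy to reason about. The inline ic() touches the virtual _ic() only when the triple says the buffer is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char u8;

// Hypothetical stand-in for yi: a pointer triple plus one virtual refill.
class MiniIn {
protected:
    u8* i_0;  // first byte in buffer
    u8* i_p;  // cursor: i_0 <= i_p <= i_x
    u8* i_x;  // one past the last byte in buffer
    virtual int _ic() = 0;  // refill, then return next byte or negative
public:
    MiniIn() : i_0(0), i_p(0), i_x(0) { }  // all zero: empty buffer
    virtual ~MiniIn() { }
    // fast path: no virtual call while the buffer still has bytes
    int ic() { return (i_p < i_x) ? *i_p++ : _ic(); }
};

// Hypothetical subclass: serves a C string through a 4-byte buffer.
class ChunkIn : public MiniIn {
    const char* src_;  // bytes not yet buffered
    size_t left_;      // count of bytes not yet buffered
    u8 buf_[4];
public:
    int refills;       // how many times virtual _ic() ran
    ChunkIn(const char* s)
        : src_(s), left_(std::strlen(s)), refills(0) { }
protected:
    virtual int _ic() {
        assert(i_p == i_x);     // only called on an exhausted buffer
        ++refills;
        if (!left_) return -1;  // eof
        size_t n = (left_ < sizeof(buf_)) ? left_ : sizeof(buf_);
        std::memcpy(buf_, src_, n);
        src_ += n;
        left_ -= n;
        i_0 = i_p = buf_;
        i_x = buf_ + n;
        return *i_p++;
    }
};

// Drain the stream, returning the count of bytes read before eof.
inline int drain(MiniIn& in) {
    int n = 0;
    while (in.ic() >= 0) ++n;
    return n;
}
```

     Draining ten bytes through this four-byte buffer costs four virtual calls (three refills plus the final eof), and the ratio improves as the buffer grows.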

class yi : public yi0 { // buffered in/source stream
protected:
  u8* i_0; // in origin: first byte in buffer
  u8* i_p; // in cursor: must be such that i_0 <= i_p <= i_x
  u8* i_x; // one beyond end of buf
  mutable int i_e; // zero or some error status

     When yvi subclasses yi further below (cf ») to read from a yv run of octets (see the run demo) the following occurs:

        |<------------ v_n ------------>|
 v_p -> |
        +---+---+---+---+---+---+---+---+
        | a   b   c   d   e   f   g   h |  run of eight bytes
        +---+---+---+---+---+---+---+---+
        |               |               |
        ^i_0            ^i_p            ^i_x

     This resembles the organization of the triples used by out-streams. The base class inits all the triple pointers to zero, and subclasses set them to something else if a buffer can be used.

public:
  yi() : i_0(0), i_p(0), i_x(0), i_e(0) { } // init
  virtual ~yi();
  bool igood() const { return i_0 <= i_p && i_p <= i_x; }
  void ibad() const; // call when igood() is false
  static void inil(const char* slot); // call on nil in slot
  static void isay(const char* what); // call when what occurs
  static void isayf(const char* fmt, ...); // formatted isay()
  int ierr() const { return i_e; } // get err status
  void ifail(int e) { i_e = e; } // save err status
  void isayfail(const char* what, int e) const {
    this->isay(what); i_e = e; } // say and then fail

     Most methods above involve error handling or logging, and aim to minimize code for that in in-stream implementations. Any time igood() is false, it means the main invariant of yi is false: triple pointers are not ordered as required. So ibad() logs this and might assert as well since it's considered corruption equal to using nil as a valid memory address.

public: // utils
  // (cf yv::vspn()) i2o(): count leading bytes in acceptMap[]
  u32 i2o(yum const& acceptMap, yo& o, n32 max); // strspn()
  u32 i2o(yutm const& acceptMap, yo& o, n32 max); // strspn()
  u32 iword(yo& o, n32 max); // skip space; i2o(!isspace, o, max)
  // i2oc() complement: count leading bytes NOT in rejectMap[]
  u32 i2oc(yum const& rejectMap, yo& o, n32 max); // strcspn()
  u32 i2oc(yutm const& rejectMap, yo& o, n32 max); // strcspn()

     Methods i2o() and i2oc() (c here means complement) resemble yv methods vspn() and vcspn() in the run demo (cf «) using octet predicate maps shown later in the ctype demo to write to an out stream only leading octets selected by a given map. Special case iword() skips leading whitespace then writes a whitespace terminated word on the out stream (more for illustration and unit testing than practical use).

public: // byte i/o: ic() returns negative on either eof or error
  int ic() { return ( i_p < i_x )? *i_p++ : _ic(); }
  void iunc() { if (i_p > i_0) --i_p; } // unget last read byte
  void ipush(int c) { // push non-eof, non-err back on stream
    if ( i_p > i_0 && c >= 0 ) *--i_p = (u8) c; }
  int iline(yb& dest); // read 1 line into buf; neg on eof/err

     Method ic() is the reason yi exists: to read octets as cheaply as possible until a buffer is exhausted, after which protected _ic() refills a buffer (if possible) and tries again, returning negative on either eof or error. Negative one (-1) is the conventional value for eof, but another might be used as well, so yi clients must always interpret values less than zero as eof or error.

     Method iunc() is the moral equivalent of ungetc(), the same way ic() is the equivalent of getc(). It unreads an octet previously read, but only one octet capacity of pushback is guaranteed. (A few more bytes of pushback are available in ycg elsewhere on this page: cf ».) Method ipush() writes back an octet which might replace the original — not a good idea if the buffer was intended immutable space; it won't push back eof or an error.
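     The difference between iunc() and ipush() is easiest to see on mutable bytes. The sketch below copies the three inlines onto a hypothetical PushIn struct over caller-owned memory; PushIn and the ipeek() helper are invented for illustration and are not part of the yi api.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char u8;

// Hypothetical cut-down in-stream over caller-owned, mutable bytes.
struct PushIn {
    u8* i_0;  // first byte
    u8* i_p;  // cursor
    u8* i_x;  // one past the last byte
    PushIn(u8* p, size_t n) : i_0(p), i_p(p), i_x(p + n) {
        assert(p || !n);  // nil is only acceptable for an empty span
    }
    int ic() { return (i_p < i_x) ? *i_p++ : -1; }
    void iunc() { if (i_p > i_0) --i_p; }  // step back one byte
    void ipush(int c) {                    // step back and overwrite
        if (i_p > i_0 && c >= 0) *--i_p = (u8) c;
    }
    // Illustrative helper: look at the next byte without consuming it.
    int ipeek() {
        int c = ic();
        if (c >= 0) iunc();  // never unread after eof
        return c;
    }
};
```

     Here iunc() only moves the cursor, so the underlying bytes are untouched, while ipush('X') after reading 'a' leaves 'X' in the caller's memory. That is the hazard noted above for buffers meant to be immutable.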

  yiz iz(zp32 p, zn32 n) const { return yiz(p, n, *this); }
  yiz iz(yz const& z) const { return yiz(z, *this); }
  yiz operator()(zp32 p, zn32 n) const { // slice
    return yiz(p, n, *this); }
  yiz operator[](yz const& z) const { return yiz(z, *this); }

     All these methods make yiz (cf ») in-stream slice instances selecting a byte span in this yi, typically to be read by ipread(), or to create a new yi for a subset. (Hmm, Wil uses yii for an in-stream based on another in-stream, but wasn't planning to show that in this demo; let's see how big this page gets first.)

  struct Iq { yi const& q_i; Iq(yi const& i): q_i(i) { } };
  Iq quote() const { return Iq(*this); } // to request dump
  void iprint() const; // idump() to stdout for use under gdb
  void idump(yo& o) const;
  void icite(yo& o) const;
  void iout(yo& o);
  void iout(yb& b);
  void iout(yh32& crc);
}; // class yi
inline yo& operator<<(yo& o, yct<yi> const& x) {
  x.c_t.icite(o); return o; }
inline yo& operator<<(yo& o, yi::Iq const& x) {
  x.q_i.idump(o); return o; }
inline yo& operator<<(yo& o, yi& x) { x.iout(o); return o; }
inline yb& operator<<(yb& b, yi& x) { x.iout(b); return b; }
inline yh32& operator<<(yh32& h, yi& x) { x.iout(h); return h; }

     The api to debug print yi resembles that of other þ types (see the quote demo) to print, debug, or cite. Variants of iout() handle writing to destination streams. Here only yo and yb are main targets; Wil overloads iout() for other types at need. For an example, Wil added iout(yh32&) for crc support for this demo.

yi.cpp

     Below Wil annotates base yi in-stream source code.

yi::~yi() { i_0 = i_p = i_x = 0; }

     For good form, buffer is empty after destruction.

void yi::iprint() const { // to stdout
  yout << yendl; this->idump(yout); yout << yendl << ynow;
}
void yi::idump(yo& o) const { // multi-line
  o.oft("<yi me=%lx i0=%lx p-0=%d ip=%lx x-p=%d ix=%lx ie=%d>",
    (long) this, (long) i_0, (int) (i_p-i_0), (long) i_p,
    (int) (i_x-i_p), (long) i_x, (int) i_e); // %d needs int
  if (this->igood()) {
    if (i_p > i_0) { // before i_p?
      yv before(i_0, i_p - i_0);
      before.vshow(o, "0:p", (16*1024)); // vshow(), vhexmax()
    }
    if (i_x > i_p) { // after i_p?
      yv after(i_p, i_x - i_p);
      after.vshow(o, "p:x", (16*1024)); // vshow(), vhexmax()
    }
  }
  else
    this->ibad();
  o.ounend("yi");
}
void yi::icite(yo& o) const { // one line only
  o.of("<yi me=%lx i0=%lx p-0=%d ip=%lx x-p=%d ix=%lx ie=%d/>",
    (long) this, (long) i_0, (int) (i_p-i_0), (long) i_p,
    (int) (i_x-i_p), (long) i_x, (int) i_e); // %d needs int
}

     Better explanations and examples of debug print methods appear in the run, hex, and out demos. These methods print in-stream fields and hex dump spans already read (before i_p) and to be read later (after i_p). Just use icite() to avoid hex dumps.

void yi::ibad() const { // call when igood() is false
  if (i_0 > i_p)
    ylog(1, "yi::ibad() i_0>i_p by=0x%lx\n", (long) (i_0 - i_p));
  if (i_p > i_x)
    ylog(1, "yi::ibad() i_p>i_x by=0x%lx\n", (long) (i_p - i_x));
}
/*static*/ void yi::inil(const char* slot) { // on nil seen
  if (!slot) slot = "?"; // backtrace too?
  ylog(1, "yi::inil(slot=%s) NIL MEMBER VAR\n", slot);
}
/*static*/ void yi::isay(const char* what) { // if what happens
  if (!what) what = "?"; // backtrace too?
  ylog(1, "yi::isay(what=%s) UNEXPECTED\n", what);
}
/*static*/ void yi::isayf(const char* fmt, ...) { // formatted
  if (!fmt) { yi::inil("isayf(fmt=nil)"); return; }
  char temp[ 1024 + 2 ];
  va_list args;
  va_start(args,fmt);
  vsnprintf(temp, 1024, fmt, args); // max-1 to save end u8 nul
  va_end(args);
  temp[1024] = 0; // whether or not vsnprintf() also wrote nul
  ylog(1, "%s", temp);
}

     Methods logging messages are for convenience only.

u32 yi::iword(yo& o, n32 max) {
  u32 n = 0; register int c;
  while (n < max && (c = this->ic()) >= 0) {
    if (!isspace(c)) { ++n; o.oc(c); }
    else if (n)
      break; // stop on first whitespace after adding to o
  }
  return n; // count of bytes added to o
}

     Method iword() is an informal word tokenizer for space delimited "words", using isspace() as the whitespace predicate. While seldom really useful, Wil likes iword() for a few unit tests, and to illustrate a general idea handled more flexibly by i2o() et al.

     Below is a bit of sample code showing what iword() returns from a static C string converted into a yv run (cf «) once represented as a yvi in-stream (shown below on this page). The destination in this sample is a ybo buf out stream (cf «) writing to the local temp buf in this example. (Here you can see how earlier demos begin to stack in terms of available low-effort unit test experiments.)

yv words("\tjolly boots\n\tof doom\n"); // (invader zim)
yvi in(words); // in-stream reading from words
char temp[128];
yb buf(temp, 0, 128); // local buf
ybo out(&buf); // out-stream writing to buf
n32 len = 0; // length of word in bytes
int count = 0; // words we've seen so far
do {
  buf.v_n = 0; out.boclear(); // empty buf and out
  len = in.iword(out, 512); // read next word
  if (len) {
    yout.of("# len=%d word=", (int) len); // word length
    out << ynow; // flush makes buf length up-to-date
    yout << "'" << buf << "'" << yendl; // quoted word
    if (++count == 2) // debug print after 2nd word?
      yout << in.quote() << yendl;
  }
} while (len); // until words is exhausted
yout << ynow; // flush to stdout

     Naturally this sample is a bit contrived, but it gets the job done. The following appears on stdout:

# len=5 word='jolly'
# len=5 word='boots'
<yi me=bffffae8 i0=ed9f p-0=13 ip=edac x-p=9 ix=edb5 ie=0>
<0:p p=0xed9f n=13 crc='0xea225ab:13'>
00000: 09 6a 6f 6c 6c 79 20 62 6f 6f 74 73 ; .jolly boots
0000c: 0a ; .
</0:p>
<p:x p=0xedac n=9 crc='0xd0512a3a:9'>
0000d: 09 6f 66 20 64 6f 6f 6d 0a ; .of doom.
</p:x>
</yi>
# len=2 word='of'
# len=4 word='doom'
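     For readers without the thorn sources at hand, the same tokenizing loop can be tried against the standard library. This is a sketch only: StrIn and next_word() are hypothetical stand-ins for yvi and iword(), with std::string playing the role of both the run and the out-stream.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical cursor over a std::string, standing in for a yvi in-stream.
struct StrIn {
    std::string s;
    size_t pos;  // count of bytes consumed so far
    explicit StrIn(const std::string& str) : s(str), pos(0) { }
    int ic() { return (pos < s.size()) ? (unsigned char) s[pos++] : -1; }
};

// Same loop shape as iword(): skip leading space, copy up to max
// non-space bytes into out, stop at the first space after a word starts.
inline size_t next_word(StrIn& in, std::string& out, size_t max) {
    size_t n = 0;
    int c;
    out.clear();
    while (n < max && (c = in.ic()) >= 0) {
        if (!std::isspace(c)) { ++n; out.push_back((char) c); }
        else if (n)
            break;  // the terminating space is consumed, not copied
    }
    return n;  // count of bytes added to out
}
```

     Run over the same input string, this yields words of length 5, 5, 2, and 4, and the cursor sits at offset 13 after the second word, agreeing with p-0=13 in the dump above because the newline that ends "boots" has already been consumed.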

     The following methods filter leading in-stream bytes using octet map predicates explained in ctype demo, writing accepted octets to the yo out-stream destination. The effect resembles std C lib strspn() and strcspn(), also resembling comparable yv methods in the run demo (cf «). As with earlier iword(), these are still meant to illustrate a general idea you ought to tune to fit your app needs more specifically, rather than always using this api.

u32 yi::i2o(yum const& accept, yo& o, n32 max) { // strspn()
  // (cf yv::vspn()) i2o(): count of leading bytes in map[]
  u32 n = 0; register int c;
  while (n < max && (c = this->ic()) >= 0 && accept[(u8) c]) {
    ++n; o.oc(c);
  }
  return n; // count of bytes added to o
}
u32 yi::i2o(yutm const& accept, yo& o, n32 max) { // cf strspn()
  u32 n = 0; register int c;
  while (n < max && (c = this->ic()) >= 0 && accept[(u8) c]) {
    ++n; o.oc(c);
  }
  return n; // count of bytes added to o
}
u32 yi::i2oc(yum const& deny, yo& o, n32 max) {
  /// i2oc() complement: count of leading bytes NOT in map[]
  u32 n = 0; register int c;
  while (n < max && (c = this->ic()) >= 0 && !deny[(u8) c]) {
    ++n; o.oc(c);
  }
  return n; // count of bytes added to o
}
u32 yi::i2oc(yutm const& deny, yo& o, n32 max) { // strcspn()
  u32 n = 0; register int c; /// cf ::strcspn()
  while (n < max && (c = this->ic()) >= 0 && !deny[(u8) c]) {
    ++n; o.oc(c);
  }
  return n; // count of bytes added to o
}

     Note how equivalent yum and yutm methods are identical but for input arg type; the yum and yutm usage api is the same.
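     A standalone sketch of the accept-map idea follows. ByteMap, StrIn, and copy_while() are hypothetical names standing in for yum, yvi, and i2o(); the real octet maps live in the ctype demo and may differ. The loop is copied in shape from i2o() so one consequence can be observed directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical octet predicate map: one flag per byte value.
struct ByteMap {
    bool m[256];
    ByteMap() { for (int i = 0; i < 256; ++i) m[i] = false; }
    ByteMap& add(const char* set) {  // mark every byte in set
        for (; *set; ++set) m[(unsigned char) *set] = true;
        return *this;
    }
    bool operator[](unsigned char c) const { return m[c]; }
};

// Hypothetical cursor over a std::string (stand-in for an in-stream).
struct StrIn {
    std::string s;
    size_t pos;
    explicit StrIn(const std::string& str) : s(str), pos(0) { }
    int ic() { return (pos < s.size()) ? (unsigned char) s[pos++] : -1; }
};

// Same loop shape as i2o(): append leading accepted bytes to out.
inline size_t copy_while(StrIn& in, const ByteMap& accept,
                         std::string& out, size_t max) {
    size_t n = 0;
    int c;
    while (n < max && (c = in.ic()) >= 0 && accept[(unsigned char) c]) {
        ++n;
        out.push_back((char) c);
    }
    return n;  // count of bytes added to out
}
```

     With a digits map and input "123abc", three bytes are copied, and the next ic() returns 'b' rather than 'a': when the loop stops on a rejected byte, that byte has already been consumed. A caller who needs it back would have to unread it, for example with something like iunc().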

void yi::iout(yo& dest) { // copy in to out
  int actual = 0;
  char buf[4096+1]; // 4096 is arbitrary
  do {
    actual = this->iread(buf, 4096);
    if (actual > 0)
      dest.owrite(buf, (u32) actual);
  } while (actual > 0); // until eof or err
}
void yi::iout(yh32& crc) { // take crc32 of in stream
  int actual = 0;
  char buf[4096+1]; // 4096 is arbitrary
  do {
    actual = this->iread(buf, 4096);
    if (actual > 0)
      crc.hadd(buf, (u32) actual);
  } while (actual > 0); // until eof or err
}

     (See a discussion of the bug caused by not checking the return of owrite() above for error, cf ».)

     Sample code below calls iout() using operator<<() (cf «) but doesn't show an issue about the "best" way to write iout() efficiently. So let's discuss that first before sample code.

     Both iout() methods read all remaining input and append it to the destination — an out-stream or crc checksum. In both cases we use a local 4K stack buffer to hold content read. Why? Isn't 4K arbitrary? Wouldn't some other approach work better?

     Presumably a bigger stack buffer is more efficient for big writes. Maybe 8K or 16K would be better. However, this code officially doesn't care what size would be best. That's something to be tweaked after profiling. Wil tends to choose space with size "about one page" when picking arbitrary preliminary numbers.

     When writing to out-stream yo in particular, this local stack copy should be slower than using yo methods otake() and ogive() for direct writes into out-stream buffers. But this version of iout() has an advantage: simplicity. So it's better to debug in an app's first working version. A faster version passing the same unit tests can come later. (Wil tries several plausible options for i/o when tuning app performance for an empirical measure of "best.")
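     As a companion to the note about unchecked owrite() results, here is a hedged sketch of the same copy loop with the write result tested. Source, Sink, MemSource, CappedSink, and copy_all() are hypothetical; the sketch assumes a write returns negative on error and otherwise the full count written, which may not match the real yo::owrite() contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical interfaces; a negative return means error.
struct Source {
    virtual ~Source() { }
    virtual int iread(void* p, size_t n) = 0;
};
struct Sink {
    virtual ~Sink() { }
    virtual int owrite(const void* p, size_t n) = 0;
};

struct MemSource : Source {  // reads from a string
    std::string s;
    size_t pos;
    explicit MemSource(const std::string& str) : s(str), pos(0) { }
    virtual int iread(void* p, size_t n) {
        size_t more = s.size() - pos;
        if (n > more) n = more;
        if (n) std::memcpy(p, s.data() + pos, n);
        pos += n;
        return (int) n;  // 0 on eof
    }
};

struct CappedSink : Sink {  // fails once capacity would be exceeded
    std::string got;
    size_t cap;
    explicit CappedSink(size_t c) : cap(c) { }
    virtual int owrite(const void* p, size_t n) {
        if (got.size() + n > cap) return -1;  // simulated "disk full"
        got.append((const char*) p, n);
        return (int) n;
    }
};

// Copy until eof; return total bytes copied, or negative on first error.
inline long copy_all(Source& in, Sink& out) {
    char buf[4096];  // 4096 is arbitrary, as in iout()
    long total = 0;
    for (;;) {
        int actual = in.iread(buf, sizeof(buf));
        if (actual < 0) return actual;  // read error
        if (actual == 0) return total;  // eof
        int wrote = out.owrite(buf, (size_t) actual);
        if (wrote < 0) return wrote;    // write error: stop reading
        assert(wrote == actual);        // sketch assumes no short writes
        total += wrote;
    }
}
```

     The only structural change from iout() is that a failed write ends the loop and is reported to the caller, instead of the source being silently drained into a sink that has stopped accepting bytes.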

     The following example calls both versions of iout():

yv words("\tjolly boots\n\tof doom\n"); // (invader zim)
yvi in(words); // in-stream reading from words
char temp[128];
yb buf(temp, 0, 128); // local buf
ybo out(&buf); // out-stream writing to buf
out << in << ynow; // yi::iout(yo&) then out.oflush()
yout << "# buf:" << yendl << buf.quote() << yendl;
in.iseek(0); // reset to start of stream
yh32 crc;
crc << in; // yi::iout(yh32&)
yout << "# crc:" << yendl << crc.quote() << yendl;
yout << "# words:" << yendl << words.quote() << yendl;
yout << ynow;

     And that code writes the following on stdout:

# buf:
<yb p=0xbffffa1c n=22 x=128 crc='#ed08f90e:22'>
00000: 09 6a 6f 6c 6c 79 20 62 6f 6f 74 73 ; .jolly boots
0000c: 0a 09 6f 66 20 64 6f 6f 6d 0a ; ..of doom.
</yb>
# crc:
<yh32 crc=0xed08f90e len=22/>
# words:
<yv p=0xec5d n=22 crc='0xed08f90e:22'>
00000: 09 6a 6f 6c 6c 79 20 62 6f 6f 74 73 ; .jolly boots
0000c: 0a 09 6f 66 20 64 6f 6f 6d 0a ; ..of doom.
</yv>

     This shows iout(yo&) wrote a perfect copy of the original words inside buf, and shows iout(yh32&) generated the correct crc32 since it matches the crc of both words and buf.

yvi

     Class yvi is a yi subclass reading from yv with implementation as trivial as it gets. (Examples of use: above and below.)

class yvi : public yi { // source from contiguous octet vector
public: // virtual methods
  virtual p32 ipos() const;
  virtual p32 ilen() const;
  virtual int iread(void* p, size_t n); // neg on err
  virtual int iseek(p32 p); // might not be supported in subclass
  virtual int ipread(void* p, size_t n, p32 pos) const; // err: neg
protected:
  virtual int _ic(); // called only from public yi::ic()
public: // unmake
  virtual ~yvi(); // clear all slots to zero
public: // make
  yvi(const yv& v); // read from bytes in this contiguous yv
  p32 vipos() const { return (p32) (i_p - i_0); } // ipos()
  void viseek(const void* vp) { u8* p = (u8*) vp; // ptr seek
    i_p = (i_0 <= p && p <= i_x)? p : i_x; }
}; // class yvi

     The final two vipos() and viseek() inlines are just fast, non-virtual versions of ipos() and iseek(), returning the same answers more quickly when you statically know the type is yvi.

/*virtual*/ p32 yvi::ipos() const {
  return i_p - i_0;
}
/*virtual*/ int yvi::iseek(p32 p) {
  n32 n = i_x - i_0; // eof position
  i_p = (n >= p)? (i_0 + p) : i_x;
  return (i_p - i_0); // return position after seek
}

     Method iseek() cannot seek after eof.

yvi::yvi(const yv& v) : yi() { // read from contig vector
  u8* p = v.v_p;
  i_0 = i_p = p;
  i_x = p + v.v_n;
}
/*virtual*/ yvi::~yvi() { // clear all slots to zero
  i_0 = i_p = i_x = 0; // empty buffer (same as yi::~yi())
}
/*virtual*/ p32 yvi::ilen() const {
  return i_x - i_0; // distance from origin to max
}

     The constructor and ilen() are trivial; the destructor is redundant with yi::~yi(). And _ic() next is also trivial in the sense it always returns negative for eof — it can only be called from public yi::ic() on empty buffer, so testing i_p < i_x is just good form.

/*virtual*/ int yvi::_ic() { // same as ic(), so -1 expected
  return (i_p < i_x)? *i_p++ : -1; // always -1 in practice
}

     So the only methods that do anything of note are iread() and ipread() below. They need merely check for nil pointers and avoid reading more bytes than remain.

/*virtual*/ int yvi::iread(void* dest, size_t n) {
  u8* p = i_p;
  n32 more = i_x - p; // bytes remaining in in-stream
  if (dest && n && p && more) { // both valid & nonempty?
    if (n > more) // requested more than remaining bytes?
      n = more;
    ::memcpy(dest, p, n); // copy current ptr pos to dest
    i_p = p + n; // advance beyond bytes just read
    return (int) n; // number of bytes copied
  }
  return 0;
}
int yvi::ipread(void* dest, size_t n, p32 pos) const {
  n32 eof = i_x - i_0; // eof position
  if (pos < eof) {
    u8* p = i_0 + pos; // analog to i_p
    n32 more = i_x - p; // bytes left in this in-stream
    if (dest && n && more) { // valid & nonempty src & dest?
      if (n > more) // requested more than remaining bytes?
        n = more;
      ::memcpy(dest, p, n); // copy current ptr pos to dest
      return (int) n; // number of bytes copied
    }
  }
  return 0;
}

     Method ipread() does nearly the same thing as iread() but uses an explicit position in the stream, avoiding current read pointer i_p updates. Here's an example showing the difference:

const char* s = "abcdefghijklmnopqrst";
yv v(s); // yv::yv(const char*)
yvi i(v); // reads from v
char tmp[64]; // more than enough buf space
yout << "# vlo (run 20 lowercase bytes)" << yendl;
yout << v.quote() << yendl; // debug print v
i.iseek(3); // make i_p point at 'd'
yout.ofn("# i pos=%d len=%d left=%d:",
  (int) i.ipos(), (int) i.ilen(), (int) i.ileft());
yout << i.quote() << yendl; // debug print yvi
int n = i.iread(tmp, 6); // read six bytes "defghi"
yout.ofn("# i pos=%d len=%d left=%d tmp='%.*s':",
  (int) i.ipos(), (int) i.ilen(), (int) i.ileft(),
  /*tmp='%.*s'*/ (int) n, (const char*) tmp);
yout << i.quote() << yendl; // debug print yvi again
n = i.ipread(tmp, /*size*/ 5, /*pos*/ 1); // "bcdef"
yout.ofn("# same: pos=%d len=%d left=%d v(1,5)='%.*s':",
  (int) i.ipos(), (int) i.ilen(), (int) i.ileft(),
  /*tmp='%.*s'*/ (int) n, (const char*) tmp);
yout << i.quote() << yendl; // debug print yvi again
n = i.iread(tmp, 6); // read six bytes "jklmno"
yout.ofn("# i pos=%d len=%d left=%d tmp='%.*s':",
  (int) i.ipos(), (int) i.ilen(), (int) i.ileft(),
  /*tmp='%.*s'*/ (int) n, (const char*) tmp);
yout << i.quote() << yendl; // debug print yvi again
i.viseek(s + 7); // seek using pointer address
yout.ofn("# i pos=%d len=%d left=%d:",
  (int) i.ipos(), (int) i.ilen(), (int) i.ileft());
yout << ycite(i) << yendl; // cite yvi only
yout << ynow; // flush to stdout

     The output appears on stdout as below. In summary, the sample code does the following. A C string s of 20 bytes is converted to yv octet run v, which is read by yvi in-stream i. After seeking to offset 3 (skipping abc), six bytes are read, expecting defghi. Then five bytes (bcdef) are read from offset 1 using ipread(), which doesn't move the read position. So the next read of six bytes (jklmno) with iread() continues where the previous iread() left off, at offset 9.

# vlo (run 20 lowercase bytes)
<yv p=0xfca4 n=20 crc='0x1a596ae5:20'>
00000: 61 62 63 64 65 66 67 68 69 6a 6b 6c ; abcdefghijkl
0000c: 6d 6e 6f 70 71 72 73 74 ; mnopqrst
</yv>
# i pos=3 len=20 left=17:
<yi me=bffffad4 i0=fca4 p-0=3 ip=fca7 x-p=17 ix=fcb8 ie=0>
<0:p p=0xfca4 n=3 crc='0x352441c2:3'>
00000: 61 62 63 ; abc
</0:p>
<p:x p=0xfca7 n=17 crc='0x84754271:17'>
00003: 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f ; defghijklmno
0000f: 70 71 72 73 74 ; pqrst
</p:x>
</yi>
# i pos=9 len=20 left=11 tmp='defghi':
<yi me=bffffad4 i0=fca4 p-0=9 ip=fcad x-p=11 ix=fcb8 ie=0>
<0:p p=0xfca4 n=9 crc='0x8da988af:9'>
00000: 61 62 63 64 65 66 67 68 69 ; abcdefghi
</0:p>
<p:x p=0xfcad n=11 crc='0xd37c922:11'>
00009: 6a 6b 6c 6d 6e 6f 70 71 72 73 74 ; jklmnopqrst
</p:x>
</yi>
# same: pos=9 len=20 left=11 v(1,5)='bcdef':
<yi me=bffffad4 i0=fca4 p-0=9 ip=fcad x-p=11 ix=fcb8 ie=0>
<0:p p=0xfca4 n=9 crc='0x8da988af:9'>
00000: 61 62 63 64 65 66 67 68 69 ; abcdefghi
</0:p>
<p:x p=0xfcad n=11 crc='0xd37c922:11'>
00009: 6a 6b 6c 6d 6e 6f 70 71 72 73 74 ; jklmnopqrst
</p:x>
</yi>
# i pos=15 len=20 left=5 tmp='jklmno':
<yi me=bffffad4 i0=fca4 p-0=15 ip=fcb3 x-p=5 ix=fcb8 ie=0>
<0:p p=0xfca4 n=15 crc='0x519167df:15'>
00000: 61 62 63 64 65 66 67 68 69 6a 6b 6c ; abcdefghijkl
0000c: 6d 6e 6f ; mno
</0:p>
<p:x p=0xfcb3 n=5 crc='0xe87cf305:5'>
0000f: 70 71 72 73 74 ; pqrst
</p:x>
</yi>
# i pos=7 len=20 left=13:
<yi me=bffffad4 i0=fca4 p-0=7 ip=fcab x-p=13 ix=fcb8 ie=0/>

     The final viseek() is done with a pointer address inside the buffer by adding an offset to the address of the string start. So this is different than passing an integer offset to iseek() in the yi api; the final citation shows the expected offset for i_p.
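     The read versus positioned-read distinction can be reproduced without thorn. The sketch below uses a hypothetical MemIn class over plain memory; unlike yvi, it implements iread() in terms of ipread(), which is a simplification chosen here and not the author's code. It also clamps iseek() at eof the way yvi::iseek() does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char u8;

// Hypothetical memory in-stream mirroring the shape of yvi.
class MemIn {
    const u8* i_0;  // origin
    const u8* i_p;  // cursor
    const u8* i_x;  // one past the end
public:
    MemIn(const void* p, size_t n)
        : i_0((const u8*) p), i_p(i_0), i_x(i_0 + n) { }
    size_t ipos() const { return (size_t) (i_p - i_0); }
    size_t ilen() const { return (size_t) (i_x - i_0); }
    size_t iseek(size_t pos) {  // clamps at eof, returns new position
        i_p = (pos <= ilen()) ? i_0 + pos : i_x;
        return ipos();
    }
    // positioned read: never touches the cursor
    int ipread(void* dest, size_t n, size_t pos) const {
        if (!dest || pos >= ilen()) return 0;
        size_t more = ilen() - pos;
        if (n > more) n = more;
        if (n) std::memcpy(dest, i_0 + pos, n);
        return (int) n;
    }
    // streamed read: positioned read at the cursor, then advance
    int iread(void* dest, size_t n) {
        int got = ipread(dest, n, ipos());
        if (got > 0) i_p += got;
        return got;
    }
};
```

     Replaying the sample above (seek to 3, read 6, positioned read of 5 at offset 1, read 6) gives positions 9, 9, and 15, the same values printed in the dumps, and a seek past the end lands on offset 20.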


table of contents

     Because this page shows many classes, here's an index:

  • yi0 (cf «) abstract base class with virtual methods
  • yi (cf «) base class for all in-streams
  • yvi (cf «) in-stream subclass reading from yv
  • yiz (cf ») in-stream slice: subset of an in-stream
  • yugi (cf ») in-stream reading pseudo-random generator
  • ycg (cf ») char generator: position and line meter

files

     Obviously files have a yi subclass for reading, but yfi appears in the file demo because this page is filled with other types.

slices

     A subset of a yi in-stream is slice yiz subclassing the yz slice base class. (See the slice demo for an intro to slices and the yz api.) Base class yz is just an offset and length inside a sequence, to name a subset.

     Subclass yiz adds a reference to yi, so you can think of yiz as triple (pos, len, ref) whose fields are named (z_p, z_n, z_i).

     A most important caveat when using yiz is that it requires a yi in-stream supporting random access, with yi::ipread() implemented. A yiz slice reads content from yi without changing the stream's read position, so every read is positioned explicitly.

struct yiz : public yz { // const slice of yi
  yi& z_i;
  yiz(zp32 p, zn32 n, yi const& i) : yz(p, n), z_i(*(yi*)&i) { }
  yiz(yz const& z, yi const& i) : yz(z), z_i(*(yi*)&i) { }
  int zread(void* dest, size_t len);
  struct Zq { yiz const& q_z; Zq(yiz const& z): q_z(z) { } };
  Zq quote() const { return Zq(*this); } // to request dump
  void zprint() const; // zdump() to stdout for use under gdb
  void zdump(yo& o) const;
  void zcite(yo& o) const;
  void zout(yb& b) const; // transfer content to b, unchanged
  void zout(yo& o) const;
  void zout(yh32& crc) const;
}; // yiz
inline yo& operator<<(yo& o, yct<yiz> const& x) {
  x.c_t.zcite(o); return o; }
inline yo& operator<<(yo& o, yiz::Zq const& x) {
  x.q_z.zdump(o); return o; }
inline yb& operator<<(yb& b, yiz const& x) {
  x.zout(b); return b; }
inline yo& operator<<(yo& o, yiz const& x) {
  x.zout(o); return o; }
inline yh32& operator<<(yh32& h, yiz const& x) {
  x.zout(h); return h; }

     For a change, let's start with the most interesting method, zread(), since it needs little explanation and because it's used inside other methods like zout() to actually get things done.

int yiz::zread(void* dest, size_t len) {
  if (dest && len) {
    if (z_p < 0 || z_n < 0) { // either is negative?
      yz nz(*this, z_i.ilen()); // normalize relative to length
      yassert(nz.z_p >= 0 && nz.z_n >= 0); // no more negatives
      this->zassign(nz);
    }
    if (len > (size_t) z_n) // request is more than slice length?
      len = (size_t) z_n;
    if (len) {
      int actual = z_i.ipread(dest, len, (p32) z_p);
      if (actual > 0) // cut actual bytes from head of slice?
        this->zskip((unsigned) actual);
      return actual;
    }
  }
  return 0; // nothing requested, or slice is empty
}

     Let's try an example to help show what this does: zread() does streamed reads over a described yi subset, using yz::zskip() to self-adjust after a read, making the slice itself smaller each time. When zread() must be used on const yiz instances (see zout() below) we need only make temp mutable copies that avoid altering an original slice.

     This next example creates a slice of a yvi in-stream shown column left (cf «). Successive reads consume more of the slice until eof is indicated when the slice is exhausted.

const char* s = "abcdefghijklmnopqrst";
yv v(s); // yv::yv(const char*)
yout << "# v (20 bytes):" << yendl << v.quote() << yendl;
yvi i(v); // reads from v
char tmp[64]; // more than enough buf space
yv vz = v(5, 10); // slice of original yv
yout << "# v(5, 10):" << yendl << vz.quote() << yendl;
yiz z = i(5, 10); // slice of yvi i, cf yi::iz()
yout << "# z = i(5, 10)" << yendl << z.quote() << yendl;
int n = z.zread(tmp, 6);
int t = (n < 0)? 0 : n;
yout.ofn("# (FIRST) z_p=%d z_n=%d n=%d tmp='%.*s':",
  (int) z.z_p, (int) z.z_n, n, t, tmp);
yout << z.quote() << yendl;
n = z.zread(tmp, 15); // too many: rest of slice
t = (n < 0)? 0 : n;
yout.ofn("# (REMAINDER) z_p=%d z_n=%d n=%d tmp='%.*s':",
  (int) z.z_p, (int) z.z_n, n, t, tmp);
yout << z.quote() << yendl;
n = z.zread(tmp, 15); // empty read
t = (n < 0)? 0 : n;
yout.ofn("# (EMPTY) z_p=%d z_n=%d n=%d tmp='%.*s':",
  (int) z.z_p, (int) z.z_n, n, t, tmp);
yout << z.quote() << yendl;
yout << ynow; // flush to stdout

     This code names a ten byte slice of yvi in-stream i starting five bytes inside. Then it reads: six bytes once and fifteen bytes twice. The first two reads consume the slice in two parts; the third returns eof because the slice is empty. The output on stdout shows:

# v (20 bytes):
<yv p=0xfc80 n=20 crc='0x1a596ae5:20'>
00000: 61 62 63 64 65 66 67 68 69 6a 6b 6c ; abcdefghijkl
0000c: 6d 6e 6f 70 71 72 73 74 ; mnopqrst
</yv>
# v(5, 10):
<yv p=0xfc85 n=10 crc='0x4a50025c:10'>
00000: 66 67 68 69 6a 6b 6c 6d 6e 6f ; fghijklmno
</yv>
# z = i(5, 10)
<yiz me=bffffabc zp=5 zn=10 i=bffffad0 N=20 _p=5 _n=10/>
# (FIRST) z_p=11 z_n=4 n=6 tmp='fghijk':
<yiz me=bffffabc zp=11 zn=4 i=bffffad0 N=20 _p=11 _n=4/>
# (REMAINDER) z_p=15 z_n=0 n=4 tmp='lmno':
<yiz me=bffffabc zp=15 zn=0 i=bffffad0 N=20 _p=15 _n=0/>
# (EMPTY) z_p=15 z_n=0 n=0 tmp='':
<yiz me=bffffabc zp=15 zn=0 i=bffffad0 N=20 _p=15 _n=0/>

     The debug printing output shown above was written like this:

void yiz::zprint() const { // dump to stdout «
  yout << yendl;
  this->zcite(yout);
  yout << yendl << ynow;
}
void yiz::zdump(yo& o) const { // can be multi-line «
  this->zcite(o); // write to the caller's stream, not always yout
}
void yiz::zcite(yo& o) const { // one line only «
  n32 len = z_i.ilen();
  yz nz(*this, len); // normalize relative to length
  o.of("<yiz me=%lx zp=%ld zn=%ld i=%lx N=%ld _p=%ld _n=%ld/>",
    (long) this, (long) z_p, (long) z_n, (long) &z_i,
    (long) len, (long) nz.z_p, (long) nz.z_n);
}

     A normal approach to coding zdump() would also debug print the z_i in-stream inside, but that makes output more complex. So zdump() is just zcite() here for the sake of brevity.

     The following two zout() methods look nearly identical to code in yi::iout() (cf «) and the same comments about iout() efficiency apply here (ie using give/take methods or size of reads).

void yiz::zout(yo& dest) const { «
  char buf[4096+1]; // arbitrary small buf size
  int n = 0; // actual count of bytes read
  yiz iz(*this); // mutable copy for zread() to alter
  do {
    if ((n = iz.zread(buf, 4096)) > 0)
      dest.owrite(buf, (u32) n);
  } while (n > 0); // no err or eof?
}
void yiz::zout(yh32& crc) const {
  char buf[4096+1]; // arbitrary small buf size
  int n = 0; // actual count of bytes read
  yiz iz(*this); // mutable copy for zread() to alter
  do {
    if ((n = iz.zread(buf, 4096)) > 0)
      crc.hadd(buf, (u32) n);
  } while (n > 0); // no err or eof?
}

     The version of yiz shown here is somewhat experimental. It's intended to rough out basics to be later adjusted for specific circumstances. Wil has written libraries in servers needing to read streams from subsets of a larger stream, and yiz is a gesture in this direction. However, a more useful representation of a yiz is a new yi subclass reading only from a subset slice. This demo doesn't show that, so you're being deprived of the most fun part.
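     Since the demo stops short of that subclass, here is one rough guess at its shape. Everything in it is an assumption: InStream is a hypothetical interface much smaller than yi, StringIn stands in for yvi, and SliceIn is an invented name. Note the shortcut that SliceIn shares the read position of its source instead of keeping its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal in-stream interface (far smaller than yi).
struct InStream {
    virtual ~InStream() {}
    virtual int read(void* dest, std::size_t n) = 0; // 0 on eof, neg on err
};

// In-stream reading from a string, standing in for yvi.
class StringIn : public InStream {
    std::string data_;
    std::size_t pos_;
public:
    explicit StringIn(const std::string& s) : data_(s), pos_(0) {}
    bool seek(std::size_t p) {
        if (p > data_.size()) return false;
        pos_ = p;
        return true;
    }
    int read(void* dest, std::size_t n) override {
        std::size_t part = std::min(n, data_.size() - pos_);
        if (part) std::memcpy(dest, data_.data() + pos_, part);
        pos_ += part;
        return static_cast<int>(part);
    }
};

// In-stream exposing only the next `len` bytes of a source stream,
// starting at whatever position the source has when reads begin.
class SliceIn : public InStream {
    InStream& src_;     // source stream (not owned, position is shared)
    std::size_t left_;  // bytes of the slice not yet read
public:
    SliceIn(InStream& src, std::size_t len) : src_(src), left_(len) {}
    int read(void* dest, std::size_t n) override {
        std::size_t want = std::min(n, left_);
        if (!want) return 0; // slice exhausted (or zero-size request)
        int got = src_.read(dest, want);
        if (got > 0) left_ -= static_cast<std::size_t>(got);
        return got;
    }
};
```

     Code written against the interface cannot tell a SliceIn from any other stream, which is the appeal; the cost is that reading the source directly while a slice is live would disturb the slice.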

     Wil finds it harder to explain compromises in such things; use your imagination, and note them as obvious when you see them.

pseudo random

     This section describes a new yi in-stream subclass reading from a pseudo random number generator. The name yugi means thorn unspecified bytes generator in-stream, which uses u in a recklessly casual way; but Wil tends to use u in class names with a lot of hand waving. The 32-bit pseudo random generator h32rand() used by yugi will appear later in the rand demo (cf »).

     Wil uses yugi in unit tests needing pseudo random content, when the normal way to acquire content is from streams.

class yugi : public yi { // random number source stream «
protected:
  enum { e_u32_len = 8, e_body_size = sizeof(u32)*e_u32_len };
  p32 g_pos; // pos of g_body (bytes before current buffer)
  u32* g_seed; // generator seed shared with other folks
  u32 g_body[ e_u32_len ]; // random bytes
public: // virtual methods
  virtual p32 ipos() const;
  virtual p32 ilen() const;
  virtual int iread(void* p, size_t n); // neg on err
protected:
  virtual int _ic();
  n32 _g_fill_rand(); // put random bytes in buf
public: // unmake
  virtual ~yugi();
public: // make
  yugi(u32* seed); // read sequence of random bytes from seed
  void setSeed(u32* seed) { g_seed = seed; } «
}; // class yugi

     Note yugi does not override iseek() or ipread() so random access i/o is not supported. Wil sees no need for it.

     The g_body array of eight u32's is 32 bytes in size: the value of e_body_size. This is the buffer that triple (i_0, i_p, i_x) describes. Each time the buffer is filled, eight pseudo random 32-bit values are generated and buffered in g_body for subsequent demand. Perhaps a larger buffer would be faster, but Wil doesn't expect to use yugi in performance critical situations, and "about one cache line" is fine as buf size in this case. Filling the buffer looks like this:

n32 yugi::_g_fill_rand() { // put random bytes in buf «
  u32 seed = *g_seed;
  u32* p = g_body;
  u32* end = p + e_u32_len;
  do {
    *p = seed = h32rand(seed);
  } while (++p < end);
  *g_seed = seed;
  return (sizeof(u32) * e_u32_len); // e_body_size
}

     The constructor takes a pointer to an external random number seed, so many unit tests can share one seed and draw from the same stream of "random" values. This keeps behavior deterministic (as long as threads are not involved with non-deterministic scheduling).

yugi::~yugi() { i_0 = i_p = i_x = 0; g_seed = 0; } «
u32 g_yugi_seed = 1; // only used if absolutely necessary
yugi::yugi(u32* seed) // read seq of rand bytes from seed «
: yi(), g_pos(0), g_seed(seed? seed : &g_yugi_seed)
{
  i_0 = i_p = (u8*) g_body;
  i_x = i_0 + _g_fill_rand();
}

     The first call to _g_fill_rand() occurs in the constructor. Note how a nil pointer for seed address causes a global seed to be used.

p32 yugi::ipos() const { return g_pos + (i_p - i_0); } «
p32 yugi::ilen() const { return g_pos + (i_p - i_0); } «

     Neither length nor read position is very meaningful in yugi, so Wil defines both as the number of bytes returned so far from yugi; thus yugi always appears positioned at eof. This works fine. Wil uses member variable g_pos to record the position of the first byte in the buffer, which is consistent with further g_pos bumps below when _g_fill_rand() fills the buffer each time.

int yugi::_ic() { «
  if (i_p >= i_x) { // rand buffer is empty?
    g_pos += _g_fill_rand();
    i_p = i_0;
  }
  return *i_p++;
}

     As always, protected _ic() is called from public yi::ic() only when the buffer is exhausted, so the test condition here is always true, refilling the buffer and resetting i_p to the origin.

int yugi::iread(void* pdest, size_t sz) { «
  u8* dest = (u8*) pdest; // nil invalid only when sz is nonzero
  if ( !this->igood() ) { this->ibad(); errno = EINVAL; return -1; }
  if ( !sz ) { return 0; } // zero size okay (just a noop)
  if ( !dest ) { errno = EINVAL; return -1; } // nil ptr invalid
  n32 outSize = 0;
  n32 more = (i_x - i_p); // space can't be neg after igood()
  n32 part = sz; // quantum of contiguous transfer
  if (part > more)
    part = more; // min(part, more)
  if (part) {
    ::memcpy(dest, i_p, part);
    outSize += part; dest += part; i_p += part; sz -= part;
  }
  while (sz) { // have not yet satisfied all of request?
    n32 more = _g_fill_rand();
    g_pos += more;
    i_p = i_0;
    part = (sz > more)? more: sz;
    if (part) {
      ::memcpy(dest, i_p, part);
      outSize += part; dest += part; i_p += part; sz -= part;
    }
  }
  return outSize;
}

     Method iread() is the source of pseudo random bytes read in bulk, when ic() is not used to read single bytes. Basically iread() counts down the sz number of requested bytes until this many have been written to destination dest. As the buffer becomes exhausted, new content is generated on demand using _g_fill_rand().
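     The same countdown can be seen in isolation in the sketch below, which is not yugi itself. It assumes a hypothetical RandIn class, and because h32rand() is not shown on this page it swaps in a plain xorshift32 step named next_rand as a placeholder generator (which needs a nonzero seed). Only the shape is meant to match: a 32-byte buffer, a shared seed written back after each fill, and a position counting bytes returned so far.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder generator (xorshift32); NOT the h32rand() used by yugi.
inline std::uint32_t next_rand(std::uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

// Hypothetical stand-in for yugi: a 32-byte buffer refilled on demand.
class RandIn {
    static const std::size_t kWords = 8;
    static const std::size_t kBytes = 32; // kWords * sizeof(uint32_t)
    std::uint32_t* seed_;        // shared external seed (not owned)
    std::uint32_t body_[kWords]; // buffered random words
    std::size_t used_;           // bytes already consumed from body_
    std::size_t base_;           // bytes returned before the current buffer

    void fill() {
        std::uint32_t s = *seed_;
        for (std::size_t i = 0; i < kWords; ++i) body_[i] = s = next_rand(s);
        *seed_ = s; // write back so other readers continue the sequence
        used_ = 0;
    }
public:
    explicit RandIn(std::uint32_t* seed) : seed_(seed), used_(0), base_(0) {
        fill();
    }

    // Bytes returned so far; doubles as length, as in the text above.
    std::size_t pos() const { return base_ + used_; }

    // Always satisfies the whole request, refilling whenever the
    // buffer runs dry.
    int read(void* pdest, std::size_t n) {
        unsigned char* dest = static_cast<unsigned char*>(pdest);
        const unsigned char* bytes =
            reinterpret_cast<const unsigned char*>(body_);
        std::size_t done = 0;
        while (done < n) {
            if (used_ == kBytes) { base_ += kBytes; fill(); }
            std::size_t part = kBytes - used_;
            if (part > n - done) part = n - done;
            std::memcpy(dest + done, bytes + used_, part);
            used_ += part;
            done += part;
        }
        return static_cast<int>(done);
    }
};
```

     Two RandIn objects built from separate seed variables holding the same value return identical bytes regardless of how the reads are split, which is the property unit tests care about.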

     As Wil thinks about writing a demo for yugi, it becomes apparent the absence of max stream size makes it possible to write an infinite loop by calling iout() (cf «) with yugi as a source. But this is realistic in situations where an in-stream might be unbounded as long as a system remains operational. So the problem might be in the original version of iout() — yes, that's it. The iout() api ought to allow a max transfer size. And with or without a max transfer size, the first draft of iout() has a bug when it doesn't check to see if writing makes progress. (The first time iout() hits a zero return from writing, it ought to stop transfer, instead of waiting for eof on read.) «
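     One way those two fixes might look is sketched below; this is not the real iout(). The function copy_bounded and the ReadFn and WriteFn callback types are hypothetical, chosen so the example stands alone without yi or yo.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical callbacks: each returns bytes transferred, 0 for eof or
// no progress, and a negative value on error.
typedef std::function<int(char*, std::size_t)> ReadFn;
typedef std::function<int(const char*, std::size_t)> WriteFn;

// Copy at most maxBytes from src to dst. Returns bytes written, or -1 on
// error. Stops at eof on read, at the limit, or as soon as a write makes
// no progress.
long copy_bounded(const ReadFn& src, const WriteFn& dst,
                  std::size_t maxBytes) {
    char buf[4096];
    std::size_t total = 0;
    while (total < maxBytes) {
        std::size_t want = maxBytes - total;
        if (want > sizeof(buf)) want = sizeof(buf);
        int got = src(buf, want);
        if (got < 0) return -1;
        if (got == 0) break; // eof on read
        int off = 0;
        while (off < got) { // tolerate short writes
            int put = dst(buf + off, static_cast<std::size_t>(got - off));
            if (put < 0) return -1;
            if (put == 0) return static_cast<long>(total); // writer stalled
            off += put;
            total += static_cast<std::size_t>(put);
        }
    }
    return static_cast<long>(total);
}
```

     With an endless source this returns once maxBytes have been written, and a destination that stops accepting bytes ends the transfer instead of spinning forever.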

     Okay, here's a short demo of yugi emphasizing use of both ic() and iread() with debug printing to reveal change in buffer content when the buffer is exhausted.

char temp[ 64 ]; // buf space for reading «
u32 seed = 0x1234;
yugi i(&seed);
int a = i.ic();
int b = i.ic();
yout.ofn("# after reading a=0x%02x b=0x%02x of 32:", a, b);
yout << i.quote() << yendl;
int n = i.iread(temp, 30); // rest of rand buf
yv v(temp, n);
yout << "# next 30 bytes in one yv:" << yendl;
yout << v.quote() << yendl;
yout << "# yugi with buffer exhausted:" << yendl;
yout << i.quote() << yendl;
int c = i.ic();
yout.ofn("# after reading c=0x%02x of next 32:", c);
yout << i.quote() << yendl;
yout << ynow; // flush to stdout

     This code sample writes the following on stdout:

# after reading a=0x0c b=0x5b of 32:
<yi i0=bffffac4 p-0=2 ip=bffffac6 x-p=30 ix=bffffae4 ie=0>
<0:p p=0xbffffac4 n=2 crc='0x11d5d58f:2'>
00000: 0c 5b ; .[
</0:p>
<p:x p=0xbffffac6 n=30 crc='0x2057e160:30'>
00002: 68 0d 74 bf 27 20 2a 51 cd 26 9e 67 ; h.t.' *Q.&.g
0000e: 4b 74 92 94 37 56 8f b4 27 7a d3 93 ; Kt..7V..'z..
0001a: c7 64 52 1b 0b 5c ; .dR..\
</p:x>
</yi>
# next 30 bytes in one yv:
<yv p=0xbffffa68 n=30 crc='0x2057e160:30'>
00000: 68 0d 74 bf 27 20 2a 51 cd 26 9e 67 ; h.t.' *Q.&.g
0000c: 4b 74 92 94 37 56 8f b4 27 7a d3 93 ; Kt..7V..'z..
00018: c7 64 52 1b 0b 5c ; .dR..\
</yv>
# yugi with buffer exhausted:
<yi i0=bffffac4 p-0=32 ip=bffffae4 x-p=0 ix=bffffae4 ie=0>
<0:p p=0xbffffac4 n=32 crc='0x2eb66e2f:32'>
00000: 0c 5b 68 0d 74 bf 27 20 2a 51 cd 26 ; .[h.t.' *Q.&
0000c: 9e 67 4b 74 92 94 37 56 8f b4 27 7a ; .gKt..7V..'z
00018: d3 93 c7 64 52 1b 0b 5c ; ...dR..\
</0:p>
</yi>
# after reading c=0x65 of next 32:
<yi i0=bffffac4 p-0=1 ip=bffffac5 x-p=31 ix=bffffae4 ie=0>
<0:p p=0xbffffac4 n=1 crc='0xefda7a5a:1'>
00000: 65 ; e
</0:p>
<p:x p=0xbffffac5 n=31 crc='0xf0d6ec21:31'>
00001: 02 45 12 54 9d 4e 62 bd 22 74 25 bf ; .E.T.Nb."t%.
0000d: 6a 62 35 55 33 5e 18 60 3c 51 4a 1a ; jb5U3^.`<QJ.
00019: a7 b7 2b ec af 4d 4a ; ..+..MJ
</p:x>
</yi>

     On a final note, clearly yugi is badly named if you think the word "random" really ought to appear in the class name somewhere. However, Wil knows the content is pseudo random — and there's only one in-stream like this, so he doesn't get confused. So the class is only badly named if Wil prioritizes your need to be informed.

     But Wil doesn't; your needs don't rank. If Wil forgets the class name, he opens yi.cpp, searches for "rand", and the problem is solved. This point would be a good place for a dialog in which Dex gives Wil grief for not naming the class PseudoRandomInStream. But you could write that one easily. So the following short circuit exchange is a joke.

     "You know," Dex patronized, "normal people give meaningful names to classes. Would that trouble you too much?"

     "I don't mind if you burn," Wil replied.

     "That's not the question I asked," objected Dex.

     "Bite me," snapped Wil.

     "I'm lodging a complaint with the management," huffed Dex.

     "Good luck with that," wished Wil.

spigot

     The next class ycg uses an instance of yi and repackages its output, but is not itself a yi in-stream. You could make a yi based on ycg, but Wil doesn't need this (and Wil isn't driven to normative consistency without a reason). Class ycg is almost — but not quite — as badly named as yugi in the last section. It means thorn char generator where an eight bit codepoint is understood since unspecified.

     In the 90's Wil often used the name spigot for this class, and that's not any better, is it? It was where Wil metered out his input when parsing content that had to be annotated with line numbers.

     The code below mainly pursues a couple requirements Wil needs in language parsers using only a few bytes of token lookahead:

  • several bytes of pushback are needed, and
  • offset and line stay correct despite pushback.

     Before we start api and code for ycg, this is a good spot to mention the main problem Wil has with þ's runtime as it pertains to dynamic language implementations: i/o is sync with no continuations, so reading and parsing input is awkwardly monolithic and batch-oriented in a way that's strictly backward compared to languages with continuations.

     In other words, you won't see much of Wil's approach to async i/o in this rev of þ, and that's why you might develop an itchy feeling when perusing code that reads and parses in-streams. (And the toy language's first version will be the same — one of the reasons it will just be a toy.) The first version of i/o for a toy language for processing source code must be used to bootstrap another async i/o system — so parsing and compiling must be written twice. The C++ version is a throw-away bootstrap layer. Cool, huh? As a side effect, a later more elaborate parsing engine built in a dynamic runtime can choose to support more upscale charsets and encodings. A first version can be dumber to bootstrap without limiting a later version.

class ycg { // char generator (special parsing reader for yi) «
protected: // mithril called this a 'spigot'
  yi* cg_i; // byte source
  p32 cg_p0; // original source position cg_i->ipos()
  p32 cg_p; // bytes returned less cg_saved
  n32 cg_line; // current one-based line num in source from cg_p0
  enum { e_back_len = 16 }; // max octet pushback supported
  u8 cg_back[ e_back_len ];
  mutable n32 cg_saved; // bytes in cg_back (mutable to correct)
public:
  yi* gi() const { return cg_i; } // source in-stream «
  p32 gp0() const { return cg_p0; } // original source position «
  p32 gp() const { return cg_p; } // current position «
  n32 gline() const { return cg_line; } // line count at gp() «
  yiz gz(zp32 p, zn32 n) const { return yiz(p, n, *cg_i); } «
  yiz gz(yz const& z) const { return yiz(z, *cg_i); }
  yiz operator()(zp32 p, zn32 n) const { return yiz(p, n, *cg_i); }
  yiz operator[](yz const& z) const { return yiz(z, *cg_i); }
  int gc(); // next char (at position gp()-1), or else -1 upon eof
  void gpush(int c); // limit: max pushback (negative c is IGNORED)
  ~ycg() { cg_i = 0; }
  ycg(yi* i) : cg_i(i), cg_p0((i)? i->ipos() : 0) «
  , cg_p( cg_p0 ) , cg_line( 1 ) , cg_saved( 0 ) { }
  /// \brief default copy is just fine to remember earlier state:
  // ycg(const ycg& other); ycg& operator=(const ycg& other);
  struct Gq { ycg const& q_a; Gq(ycg const& a): q_a(a) { } };
  Gq quote() const { return Gq(*this); } // to request dump «
  void gprint() const; // gdump() to stdout for use under gdb
  void gdump(yo& o) const;
  void gcite(yo& o) const;
}; // class ycg
inline yo& operator<<(yo& o, ycg const& x) { x.gdump(o); return o; }
inline yo& operator<<(yo& o, ycg::Gq const& x) { x.q_a.gdump(o); return o; }
inline yo& operator<<(yo& o, yct<ycg> const& x) { x.c_t.gcite(o); return o; }

     Note Wil's interesting decision to return a slice of the source yi in-stream when slicing ycg — this works because content appears at the same position in ycg that it does in the underlying yi.

     Almost all the api is inline, so all that remains is gc() to read a byte, gpush() to push back a byte, and debug printing. Let's start with printing boilerplate and conclude with good stuff.

void ycg::gprint() const { // dump to stdout «
  yout << yendl;
  this->gdump(yout);
  yout << yendl << ynow;
}
void ycg::gdump(yo& o) const { // multi-line «
  o.oft("<ycg me=%lx i=%lx p0=%d p=%ld line=%d saved=%d>",
    (long) this, (long) cg_i, (int) cg_p0, (long) cg_p,
    (int) cg_line, (int) cg_saved);
  if (cg_i) {
    o << yendl << ycite(*cg_i) << yendl;
  }
  if (cg_saved) {
    if (cg_saved > e_back_len)
      cg_saved = e_back_len;
    yv back(cg_back, cg_saved);
    back.vshow(o, "cg_back", 1024);
  }
  o.ouend("ycg");
}
void ycg::gcite(yo& o) const { // one line only «
  o.of("<ycg me=%lx i=%lx p0=%d p=%ld line=%d saved=%d/>",
    (long) this, (long) cg_i, (int) cg_p0, (long) cg_p,
    (int) cg_line, (int) cg_saved);
}

     The gdump() output here only cites the cg_i in-stream, occupying only one line — which is nice for brevity, but lacks detail as you'll see in sample code below.

     Code for gpush() next reveals it only considers 0xA when counting line-endings, so pushing back 0xA has the effect of reducing the line count. Note you can push back a different byte than one that was originally read. This is considered a feature when useful for content rewriting — but dangerous when it might alter token positions so they no longer match source.

void ycg::gpush(int c) { // negative c is IGNORED «
  if ( c >= 0 && cg_saved < e_back_len ) {
    if ( c == 0xA ) { // need to decrement line number?
      if ( cg_line ) // not already down to zero?
        --cg_line;
    }
    if ( cg_p ) // able to decrement byte count?
      --cg_p;
    cg_back[ cg_saved++ ] = (u8) c; // save
  }
}

     Reading the next octet with gc() increments position when eof is not encountered, and increments line count when 0xA is returned. (Wil's portable version of this in the 90's was complex when working with Mac, Linux, and Windows line endings. This one is trivial.)

int ycg::gc() { «
  register int c = -1; // default to eof on errors
  yi* source = cg_i;
  if ( source ) { // has not been halted?
    if ( cg_saved == 0 ) { // no cached bytes in pushback buf?
      c = source->ic(); // typically an inline read from a buffer
      if ( c >= 0 ) { // not eof? count byte & see line endings?
        ++cg_p; // another content byte returned bumps position
        if ( c == 0xA ) // line ending?
          ++cg_line; // another line ending bumps line number
      }
    }
    else { // need to pop a byte from the pushback buffer
      if (cg_saved > e_back_len) {
        yellf(__LINE__,__FILE__,"ycg::gc() cg_saved=%d > len=%d",
          (int) cg_saved, (int) e_back_len);
        cg_saved = e_back_len;
      }
      c = cg_back[ --cg_saved ]; // pop
      ++cg_p; // always advance pos since it can't be eof
      if ( c == 0xA ) // it was a line ending pushed back?
        ++cg_line; // count another line ending
    }
  }
  return c;
}
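     The interplay of gc() and gpush() can be tried outside þ with the sketch below. It assumes a hypothetical CharGen class reading from a std::string instead of a yi; the names CharGen, get, and push are invented for illustration, and only the position and line bookkeeping is meant to mirror the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ycg, reading from a std::string.
class CharGen {
    static const std::size_t kBack = 16; // max bytes of pushback
    std::string src_;
    std::size_t next_;          // next unread byte in src_
    std::size_t pos_;           // bytes returned less bytes pushed back
    unsigned line_;             // one-based line number at pos_
    unsigned char back_[kBack]; // pushback stack (last pushed pops first)
    std::size_t saved_;         // bytes held in back_
public:
    explicit CharGen(const std::string& s)
        : src_(s), next_(0), pos_(0), line_(1), saved_(0) {}

    std::size_t pos() const { return pos_; }
    unsigned line() const { return line_; }

    // Next byte, or -1 on eof. Pushed-back bytes are returned first.
    int get() {
        int c;
        if (saved_)
            c = back_[--saved_];
        else if (next_ < src_.size())
            c = static_cast<unsigned char>(src_[next_++]);
        else
            return -1;
        ++pos_;
        if (c == '\n') ++line_;
        return c;
    }

    // Undo one get(): a negative c or a full stack is ignored.
    void push(int c) {
        if (c < 0 || saved_ >= kBack) return;
        if (c == '\n' && line_) --line_;
        if (pos_) --pos_;
        back_[saved_++] = static_cast<unsigned char>(c);
    }
};
```

     Pushing back the byte just read restores both pos() and line() to their earlier values, so token offsets recorded before and after a lookahead agree.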

     The following code sample shows the inner state of ycg before reading the first newline, after reading it, and after pushing it back again. The token parsed before the first newline is captured in a buffer, along with its start and end positions, and is then read again directly from ycg using slice notation. The slice is printed for comparison with what was parsed, invoking yiz::zout() from an earlier demo on this page (cf «). This is an expected practical application of this api: fetching original source by offset when presenting code commentary.

yv santa("\tsanta:\n\tjolly boots\n\tof doom\n"); «
yvi i(santa); // in-stream reading from santa
i.iseek(1); // skip 1st byte so start pos != 0
yout << "# yvi reading from santa:" << yendl;
yout << i.quote() << yendl;
ycg spigot(&i); // reads from i
yout << "# spigot start pos not zero:" << yendl;
yout << spigot.quote() << yendl;
u8 line[ 128 ];
u8* p = line; // cursor
u8* end = p + 128; // one past last usable u8
p32 startPos = spigot.gp(); // current position
int c = 0;
while (p < end && (c = spigot.gc()) != '\n') {
  *p++ = (u8) c;
}
yout << "# spigot after first newline:" << yendl;
yout << spigot.quote() << yendl;
yassert(c == '\n');
spigot.gpush(c); // push back last byte
yout << "# spigot after newline pushback:" << yendl;
yout << spigot.quote() << yendl;
p32 lfPos = spigot.gp(); // position of '\n'
n32 tokenLen = lfPos - startPos;
yv first(line, p - line);
yout << "# first line='" << first << "'" << yendl;
yout << first.quote() << yendl;
yout << "# reading slice of spigot:" << yendl;
yout << "|" << spigot(startPos, tokenLen) << "|" << yendl;
yout << ynow; // flush to stdout

     The result of code above appears on stdout below. Since the in-stream inside is not dumped each time, you need to find integer values in more spots to verify expected behavior.

# yvi reading from santa:
<yi me=bffffabc i0=fb18 p-0=1 ip=fb19 x-p=29 ix=fb36 ie=0>
<0:p p=0xfb18 n=1 crc='0xabde5729:1'>
00000: 09 ; .
</0:p>
<p:x p=0xfb19 n=29 crc='0xf5e59814:29'>
00001: 73 61 6e 74 61 3a 0a 09 6a 6f 6c 6c ; santa:..joll
0000d: 79 20 62 6f 6f 74 73 0a 09 6f 66 20 ; y boots..of
00019: 64 6f 6f 6d 0a ; doom.
</p:x>
</yi>
# spigot start pos not zero:
<ycg me=bffffa90 i=bffffabc p0=1 p=1 line=1 saved=0>
<yi me=bffffabc i0=fb18 p-0=1 ip=fb19 x-p=29 ix=fb36 ie=0/>
</ycg>
# spigot after first newline:
<ycg me=bffffa90 i=bffffabc p0=1 p=8 line=2 saved=0>
<yi me=bffffabc i0=fb18 p-0=8 ip=fb20 x-p=22 ix=fb36 ie=0/>
</ycg>
# spigot after newline pushback:
<ycg me=bffffa90 i=bffffabc p0=1 p=7 line=1 saved=1>
<yi me=bffffabc i0=fb18 p-0=8 ip=fb20 x-p=22 ix=fb36 ie=0/>
<cg_back p=0xbffffaa0 n=1 crc='0x32d70693:1'>
00000: 0a ; .
</cg_back></ycg>
# first line='santa:'
<yv p=0xbffffa10 n=6 crc='0xaacd43d9:6'>
00000: 73 61 6e 74 61 3a ; santa:
</yv>
# reading slice of spigot:
|santa:|

     Instead of a funny dialog here at the end, let's just wrap things up. Way past midnight on Sunday night is no time to exercise one's sense of humor — not with a full day ahead.

license

     All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. No feeds or repackaging is allowed. You can link this page if you want folks to read it.