|
demos are
explained here;
a menu at top column right indexes actual topic demos.
Here we demo slice.
problem
Many tasks in content editing can be expressed as picking and choosing which subset of existing sequences to add or remove from prior sequences. This sort of activity can be simplified a bit with a uniform style of access expressing subsets of sequences using some slice notation, specifying an offset z_p and a length z_n within a sequence of interest. Several years ago Wil started thinking about how best to use slices in C++ in a way resembling slices in the Python programming language, where slices express subsequence range selections very clearly. But a big constraint was a desire to make slices in C++ as primitive as possible, with the smallest amount of object orientation Wil could manage. (Wil controls complexity by keeping as many mechanisms primitive at the bottom of his software stack as possible. The idea is that some things at the bottom can be completely specified seven ways from Sunday, so they have no degrees of freedom at all. With invariable, clear, primitive mechanisms in hand, more complex things can be reasoned about without introducing ambiguity within minor tools themselves. For this reason, Wil wants state in each slice to be achingly concrete, and thus absent any subtlety.)
slice name
By convention, classes use z in their names to mean slice because s is reserved to mean string. A class name ending with z is usually a subclass of the base slice type yz.
yv parallel
The members in slice class yz and the two names of it's members intentionally mimic yv and it's two members v_p and v_n. (See the run demo for more yv detail.) Here's the state in yz: struct yz { // slice (starting position and length in sequence) «
zp32 z_p; // signed position (neg is relative to eof)
zn32 z_n; // signed length (neg is relative to eof)
// ... continued
Field z_p is a pointer to a member in an ordered collection, expressed as a zero-relative offset from the start. Zero is the first collection member and, in a collection of length N, value N-1 points at the last member. Field z_n is the length of the slice: the number of members in the slice starting at position z_p in a collection. Note a collection might not have z_n members at position z_p (and might not have a meaningful position z_p at all when z_p exceeds collection length N). A slice is a description of a hypothetical sequence subset that might or might not exist. By convention, a slice description is understood to mean the intersection of an actual physical collection and the hypothetical subsequence described by a slice. But this interpretation is up to whatever code applies a slice to a collection in some operation. As explained on the types page (column right), the 32-bit integer types zp32 and zn32 are signed to express total-length relative meanings with negative values. If a collection has length N, then negative z_p is understood to mean N+z_p, and negative z_n is understood to mean N+z_n. To refer to a position starting p before the end, use -p; and to refer to a length equal to all-but-n, use -n. Try some examples. If absolute values of negative z_p and/or z_n exceed length N, the result might be considered the empty set (the default for yz) or might be handled another way: see both implementations of yv::vz() in the run demo (cf «). By convention slices are written as ordered pair (z_p, z_n):
// ... struct yz continued
yz(zp32 p, zn32 n) : z_p(p), z_n(n) { } «
void zinit(zp32 p, zn32 n) { z_p = p; z_n = n; } // set «
void zassign(yz const& z) { z_p = z.z_p; z_n = z.z_n; } «
yz() : z_p(0), z_n(0) { } «
Methods above set the (z_p, z_n) pair. The default copy constructor and assignment operator can be used as well. Wil only uses zassign() for assignment when he wants to emphasize the yz api more clearly. (It's easier to search for zassign() than a particular use of operator=().)
yz addition
Operators support adding two yz instances (union: zjoin()) or adding a position pointer (zmax()) which is also a sort of union operation, since adding a position after a slice's end extends the slice to reach that position. /// \brief union of this & z (assert: neither z_n can be negative)
/// \usage expected useful for union of token slices in in-streams
void zjoin(yz const& z); // union of both and everything between
yz& operator+=(yz const& z) { this->zjoin(z); return *this; } «
yz operator+(yz const& z) { yz x(*this); x.zjoin(z); return x; } «
Wil decided to define addition of two slices as the union of each span and everything between the two spans. In practice, Wil sometimes wants to do this in parsers when yz is used to span a token. In this context, joining two slices refers to the slice of an input stream spanning two tokens. /// \brief increase z_n so z_p+z_n == after if after follows z_p+z_n
void zmax(zp32 after); // extend slice so after is 1st after slice
yz& operator+=(zp32 after) { this->zmax(after); return *this; } «
yz operator+(zp32 after) { yz x(*this); x.zmax(after); return x; }«
Method zmax() is actually related to zjoin(): zmax() means the same thing as zjoin() on a zero length slice (after, 0) provided after follows the slice. (If after precedes z_p, then nothing happens.) void yz::zjoin(yz const& z) { // union of both and all between «
yassert(z_n >= 0 && z.z_n >= 0); // neither z_n negative
zp32 tx = z_p + z_n; // one byte past last in this slice
zp32 zx = z.z_p + z.z_n; // one byte past last byte in z
zp32 p = (z_p < z.z_p)? z_p : z.z_p; // leftmost of both
zp32 x = (tx > zx)? tx : zx; // rightmost of both
z_n = (x > p)? (x - p) : 0; // distance from p to x
z_p = p;
}
void yz::zmax(zp32 after) { «
// extend slice so after is first byte after slice
zp32 p = z_p;
zp32 x = p + z_n;
if (after > x) // to right of previous end?
z_n = after - p; // extend length so new end reaches after
}
The next two sections show a little more of the yz class api. Wil wanted to show zjoin() and zmax() above before finishing yz's declaration.
inline skip
This zskip() strongly resembles yv::vskip() in the run demo (cf «) by removing the leading n members of the slice. // \brief zskip() uses yz as a fast sub-slice space consumer
// \limit only yz normalized to non-neg z_p and z_n will work here
void zskip(unsigned n) { // like yv::vskip() but on slices «
if (z_n > (zn32) n) { // enough members remain to cut n of them?
z_p += n; z_n -= n; // advance beyond n and cut n from length
}
else { // skip all remaining members of this slice
z_p += z_n; z_n = 0; // advance beyond remaining and zero length
}
}
Wil might later add more methods resembling the yv run api.
length relative slices
This small part of the yz api handles length-relative slice calculations. If either z_p or z_n is negative, then zrelative() returns true and this is understood (by convention) to mean a slice relative to a collection's total length of N, once length N is known. // \brief if either z_p or z_n is negative, become N+z_p or N+z_n
// \limit if result of N+z_p or N+z_p is negative, make zero instead
// \bounds if sequence length=N, don't let slice ref elems outside
yz(yz const& z, n32 N); // normalize relative to N if z_p<0 or z_n<0
void zabsolute(n32 N); // normalize relative to N if z_p<0 or z_n<0
bool zrelative() const { return 0 > z_p || 0 > z_n; } // either neg? «
}; // struct yz
You can provide the missing collection length explicitly with a length argument to either zabsolute() or a copy constructor taking the absolute length in addition to the source slice. You can see an example use of zabsolute() in the iovec demo inside ydvp::pcut() (cf «). yz::yz(yz const& z, n32 N) «
// normalize relative to N if z_p or z_n is negative
: z_p(z.z_p), z_n(z.z_n)
{
if (z_p < 0) { // negative? need N+z_p with lower bound of zero?
if ((z_p += (long) N) < 0) // relative to N becomes negative?
z_p = 0; // first elem rather than elem at negative offset
}
if (z_n < 0) { // negative? need N+z_n with lower bound of zero?
if ((z_n += (long) N) < 0) // relative to N becomes negative?
z_n = 0; // zero elems rather than negative elems
}
if (z_p > N) // points at elem beyond last in collection?
z_p = N; // one beyond last in collection
zn32 remaining = N - z_p; // elems at and after z_p before pos=N
if (z_n > remaining) // slice describes more than remaining elems?
z_n = remaining; // no more than remaining elems
}
void yz::zabsolute(n32 N) { «
// normalize relative to N if z_p or z_n is neg
if (z_p < 0) { // negative? need N+z_p with lower bound of zero?
if ((z_p += (long) N) < 0) // relative to N becomes negative?
z_p = 0; // first elem rather than elem at negative offset
}
if (z_n < 0) { // negative? need N+z_n with lower bound of zero?
if ((z_n += (long) N) < 0) // relative to N becomes negative?
z_n = 0; // zero elems rather than negative elems
}
if (z_p > N) // points at elem beyond last in collection?
z_p = N; // one beyond last in collection
zn32 remaining = N - z_p; // elems at and after z_p before pos=N
if (z_n > remaining) // slice describes more than remaining elems?
z_n = remaining; // no more than remaining elems
}
Obviously there's code duplication here, but at least the duplication appears in two methods immediately adjacent to one another. Because the effect might already look opaque to some folks, Wil prefers to avoid making the code even harder to understand by using zabsolute() inside the constructor. But the entire body of the constructor shown above could be replaced with a call to zabsolute() to get the same effect.
yz application
Throughout all demos, Wil's convention to apply yz slices to each collection type follows a common recipe verging on boilerplate: an ordered sequence s sliced by pair (z_p, z_n) creates triple (z_p, z_n, s) subclassing yz, whose interpretation is delayed until some operation uses the slice to constrain what is read or written in the collection. For example, you can see this done in the earlier iovec demo where yz subclass ydvpz slices iovec iterator collection ydvp (cf «). The following is a template for the slice recipe, assuming you have a sequence type named ys. First you define a slice subclass that adds a reference to the ys sequence named z_s: class ys; // forward declare some sequence type
struct ysz : public yz { // slice of ys «
ys& z_s;
ysz(zp32 p, zn32 n, ys const& x) : yz(p, n), z_s(*(ys*)&x) { } «
ysz(yz const& z, ys const& x) : yz(z), z_s(*(ys*)&x) { } «
struct Zq { ysz const& q_z; Zq(ysz const& z): q_z(z) { } }; «
Zq quote() const { return Zq(*this); } // to request dump
void zprint() const; // dump to stdout via yout for gdb use
void zdump(yo& o) const; void zcite(yo& o) const;
}; // struct ysz
inline yo& operator<<(yo& o, ysz::Zq const& x) {
x.q_z.zdump(o); return o; }
inline yo& operator<<(yo& o, yct<ysz> const& x) {
x.c_t.zcite(o); return o; }
Then in the ys class api itself — defined afterwards — you add sz() and operator()() as follows, both of which cast away const in a way you might find disconcerting if you prefer more exact const use. class ys { // hypothetical sequence class «
// ...
public: // slicing
ysz sz(zp32 p, zn32 n) const { return ysz(p, n, *this); } «
ysz operator()(zp32 p, zn32 n) const { return ysz(p, n, *this); } «
// ...
}; // class ys
Then given an instance of ys named s1, you can slice it with pair (p, n) using either of these two notations: s1(p, n) or s1.sz(p, n). Both return an instance of ysz by value you can pass anywhere you like. Why did Wil decide to always cast away const? In a first draft of slices, it makes things less confusing since defining yksz to mean const slice would create more classes to be processed in various places. Nothing stops Wil from defining yksz later if that clarifies a problem. (Wil would then define const and non-const versions of sz() and operator()() to return yksz or ysz as appropriate.) In practice Wil tends to see far too many different slice classes already when every single one is handled manually instead of using some generic technique. All these demos strive to show the simplest version of things that might get more complex if need arises in real problems, and only if need actually arises. One must resist looking too far ahead. Throughout all demos, Wil's convention to apply yz slices to each collection type follows a common recipe verging on boilerplate: an ordered sequence s sliced by pair (z_p, z_n) creates triple (z_p, z_n, s) subclassing yz, whose interpretation is delayed until some operation uses the slice to constrain what is read or written in the collection.
parso ergo finito «
"Maybe I'm missing something," Stu interposed. "I haven't seen slice classes used much in interfaces I see so far. Does that mean you have more slice usage coming later?" "Yes," confirmed Wil. "When I added support for slices, this made it possible to get very fine-grained control over reading and writing subsequences of collections in very type-specific ways." "But you didn't code it yet?" Stu asked. "No, not yet," agreed Wil. "It looked like it would take forever to code every possible combination of effects I thought would be useful." "Well, forever is a long time," Stu noted. "I exaggerated," Wil granted. "When you're coding, forever is an amount of time stalling you from making progress long enough to derail your plans so you don't finish them." "Ah," nodded Stu. "Now I see the relation to forever. Dead projects are often dead forever. You meant to say: you avoided stalling." "I can write lots of slices," Wil agreed, "but I don't need to write them all right now. Just-in-time coding is often hard to plan." "I'm waiting for string slices, of course," warned Stu. "I bet ysz is actually the string slice class, right?" "Yes," Wil acknowledged. "Or rather, it's the octet string slice. The Unicode string slice is named ySz." "Dex will be thrilled," joked Stu. "Why don't you just punch him in the nose the next time he nags you about Unicode?" "Yeah, sure," Wil laughed. "I'm going to turn into Indiana Jones in my old age and punch people when they piss me off." "Are you going to use utf8 or utf16 for your Unicode?" Stu asked gently. "Don't punch me, I'm just curious." "All of them," Wil gestured expansively. "A Unicode string representation can easily indicate codepoint size." "If there's an out-of-band place to put it," qualified Stu. "I guess that means Unicode strings won't just be a naked yv run of octets?" "You're pumping me for information I haven't written about yet?" whined Wil, glowering a little for theatric effect. "Heh," Stu started and then considered temporizing. "Yeah, actually. Why don't you write about things you know folks care more about? Is it to tweak their noses because they're idiots?" Wil gnawed his lip. "That's not true ... as far as you know." "What?" objected Stu. "Are you Chevy Chase in Caddyshack now? That's the weakest denial I ever heard." "I thought it was funny," Wil murmurred. "I don't write about topics other folks demand stupidly without thinking about consequences. The idea of catering to a bread-and-circuses crowd makes me sick. No one deserves to get what they want just because they demand it. I've heard folks in each workplace say stupid things about Unicode for fifteen years, and I'm tired of it. I associate Unicode with a bankrupt one-true-context worldview supposing software naturally heads toward the best of all possible worlds, without any problems or gotchas." "Instead of a speech," Stu groaned in a slumped heap, "why don't you just punch me next time instead?" "Sit up straight," Wil ordered, "and I'll answer your question. I gather you want to know more about Unicode strings?" "Yeah," Stu nodded and watched Wil uneasily. Wil drummed his fingers. "Okay, I was thinking this time I'd have both ys octet sequences and yS character sequences both work something like C++ std::string instances, in the sense of being by-value wrappers around references to actual string storage." "With reference counting?" guessed Stu. "Yes," nodded Wil. "And the physical representation would vary depending on how much content was involved, and how it was created in a sequence of editing operations." "Why?" prompted Stu. "So I can use copy-on-write as described in the cow demo," explained Wil. "One format will resemble one shown in the deck demo." "But you haven't written those demos yet," objected Stu. "See?" Wil raised his eyebrows. "I wasn't ready to talk about this yet. I'd rather just wait until it falls out of a demo some day." "Can't you go faster?" teased Stu. "How many demos have you written?" needled Wil. "I know, I know," Stu held up his hands. "I'm sorry." "Okay," continued Wil, preparing to count on his fingers. "Here's the run down on strings. One: multiple formats. Two: ref counting. Three: scatter gather. Four: copy on write. Five: some manual control, some automatic. Six: some damned Unicode library or another — I really don't care which one, and it shouldn't matter." "What about ICU?" suggested Stu. "Whatever," Wil brushed it away in a gesture. "I'm not interested in how text is interpreted as characters. I write parts of libraries managing space in binary without caring what content means." "Agnostic to the end?" charged Stu. "Yep," Wil nodded. "I support parsing strings as char subsequences, but other than that, I'm not in the loop. To me it's all just binary, and at the bottom layer it's just octet sequences I handle with yv runs." Stu snickered at an idea. "What's Latin for that? Parso ergo finito?" "I parse, therefore I'm done?" Wil laughed. "I don't know. Sounds good to me. That's probably funnier than correct Latin."
read yizi «
Method yizi::ipread() below appears here to juxtapose it with yizi::ipread() near the bottom of column right. int yizi::iread(void* pdest, size_t n) { «
if (!this->igood()) { this->ibad(); return 0; }
u8* dest = (u8*) pdest; // for u8 arithmetic
if (!n) { return 0; } // nothing? then no-op
if (!dest) { yi::inil("iread dest");
errno = EINVAL; return -1;
}
n32 outSize = 0; // total actual bytes written to pdest
n32 more = i_x - i_p; // cannot be negative after igood()
p32 eof = zi_z.z_n; // usable offsets: 0 to eof-1
yassert(zi_pos <= eof); // assert invariant is maintained
if (more) { // any bytes buffered locally in m_buf?
n32 part = (more < n)? more : n; // min(more, n)
::memcpy(dest, i_p, part); outSize += part; // copy
dest += part; i_p += part; n -= part; // net changes
}
n32 rest = eof - zi_pos; // content left before eof
if (n > rest) // more than remainder of this slice?
n = rest; // read only remaining bytes before eof
if (n) { // anything left to be copied?
// Note we make no attempt here to populate the buf if n
// is small and zi_v.v_n is big; we can optimize it later.
int actual = zi_i->ipread(dest, n, zi_z.z_p + zi_pos);
if (actual >= 0) { // not an error?
zi_sum += actual; // sum of all bytes read from zi_i
outSize += (n32) actual; // this much more put into dest
if ((zi_pos += actual) > eof) { // went past eof??
zi_log_actual("iread", n, actual); // actual > n ??
zi_pos = eof; // never let zi_pos be more than eof
}
}
else
return actual; // return negative to show error
}
return (int) outSize;
}
|
A submenu for demos appears below, letting you
go to the page on a topic written as a demo (as the
demos page defines it).
menu
thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice « Þ, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile) Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes.
license
All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. No feeds or repackaging is allowed. You can link this page if you want folks to read it.
related classes
Before starting the main demo content (yz subclassing), let's look at a couple classes related to yz. The first is a pair of yz instances inside y2z: struct y2z { // pair of slices «
yz z_1; // first
yz z_2; // second
y2z() : z_1(0, 0), z_2(0, 0) { }
y2z(zp32 p, zn32 n, zp32 p2, zn32 n2) : z_1(p, n), z_2(p2, n2) { }
y2z(yz const& z1, yz const& z2) : z_1(z1), z_2(z2) { }
void zclear() { ::memset(this, 0, sizeof(y2z)); }
};
Of course y2z is trivial, but that's the point. This makes it simple to declare an ordered pair of slices when needed. And the existence of y2z as a type makes it possible to override methods in other classes taking this type. The next related class is yzz below, and it's basically an alternative way to formulate yz using two positions instead of position and length. Of course the name yzz is slightly odd because it seems to say slice slice, which doesn't make any sense. Wil prefers to think of it as slice prime, where doubling a letter suggests an extension via emphasis. Basically, Wil couldn't think of a short name but wanted yzz as a basic type on par with yz. Wil's intended use for yzz is to describe spans of bytes in a shared memory segment, which cannot contain pointers in data structures when more than one process might use a segment mapped at different addresses. Thus members zz_b and zz_a aim to be pointers, and pointer-sized integer yip32 is used to say these members are like pointers; but the values of zz_b and zz_a are understood (by convention) to mean offsets from the start of a memory segment. (See the types page for a short discussion of yip32 and the size of pointers; it explains how yip64 is used in 64-bit systems.) // cannot change this format: yew depends on it
class yew; // event world (async event shared memory description)
struct yzz { // span (starting position and ending position) «
yip32 zz_b; // before: offset of 1ST byte INSIDE span
yip32 zz_a; // after: offset of 1ST byte AFTER span (not inside)
yzz() : zz_b(0), zz_a(0) { }
yzz(yip32 b, yip32 a) : zz_b(b), zz_a(a) { }
yzz(yzz const& one, yzz const& two) // union of one & two
: zz_b((one.zz_b < two.zz_b)? one.zz_b : two.zz_b) //min
, zz_a((one.zz_a > two.zz_a)? one.zz_a : two.zz_a) //max
{
}
void zunion(yzz const& two) { // union of this and two
zz_b = ((zz_b < two.zz_b)? zz_b : two.zz_b); //min
zz_a = ((zz_a > two.zz_a)? zz_a : two.zz_a); //max
}
void operator+=(yzz const& two) { this->zunion(two); }
yzz(yz const& z) : zz_b(z.z_p), zz_a(z.z_p + z.z_n) { }
yip32 zsize() const { return zz_b - zz_a; }
bool zempty() const { return zz_b >= zz_a; }
bool in(const void* base, const void* p) const {
return p >= (((u8*) base)+zz_b) && p < (((u8*) base)+zz_a);
}
bool in(yew const& w, const void* p) const { // in event world
return p >= (((u8*) &w)+zz_b) && p < (((u8*) &w)+zz_a);
}
bool in(yip32 d) const { return zz_b <= d && d <= zz_a; }
bool operator[](yip32 d) const { return zz_b <= d && d <= zz_a; }
}; // struct yzz
Wil's format for self-describing shared memory segments is named yew and isn't otherwise described anywhere yet. However, using a pointer to one of these via forward reference works perfectly well when the only thing we care about is position in memory. The several flavors of overloaded in() all aim to say whether a pointer or offset is "inside" the described memory span.
yizi example
Before we show the yizi subclass of yi, let's look at a simple example of use. We want to define a yi subclass that knows how to read from a yiz in-stream slice (cf «). The example below starts with a yv run of bytes (cf «) and then creates an instance of yvi, a subclass of yi that reads from yv (cf «). The expression ilo(5,10) shown below is a slice of that yvi in-stream, and the yizi instance constructed from that is a new in-stream which reads only the slice from the underlying in-stream named ilo. yv lower("abcdefghijklmnopqrstuvwxyz"); «
yout << "# lower:" << yendl;
yout << lower.quote() << yendl; // hexdump lower
yvi ilo(lower); // in-stream reading from lower
yizi lozi(ilo(5,10)); // yi reading from ilo(5,10)
char buf[64]; // to hold bytes read
int actual = lozi.iread(buf, 64); // read slice
yassert(actual > 0); // expect success this time
yv vbuf(buf, actual); // yv run for all bytes read
yout << "# vbuf(buf, actual):" << yendl;
yout << vbuf.quote() << yendl; // hexdump buf
yout << ynow; // flush to stdout
The output appearing on stdout for this code fragment includes hexdump of lower and a hex dump of the ten bytes read into the buffer: # lower:
<yv p=0xed48 n=26 crc='0x4c2750bd:26'>
00000: 61 62 63 64 65 66 67 68 69 6a 6b 6c ; abcdefghijkl
0000c: 6d 6e 6f 70 71 72 73 74 75 76 77 78 ; mnopqrstuvwx
00018: 79 7a ; yz
</yv>
# vbuf(buf, actual):
<yv p=0xbffffa30 n=10 crc='0x4a50025c:10'>
00000: 66 67 68 69 6a 6b 6c 6d 6e 6f ; fghijklmno
</yv>
This particular example is a little too lightweight; it's barely informative. And an in-stream reading from a yv memory fragment is too easy. However, the general idea is much the same no matter what in-stream is involved.
yizi in stream «
The yizi class shown in this section strongly resembles the yozo out-stream shown in the out demo (cf «) which was written first. The source code for yizi derives from the source code for yozo. However, Wil has used classes like yizi more often in the past, given more call for reading fixed-sized subsets of in-streams. Note this code for yizi still hasn't been tested yet, other than the small example just above. Both yozo and yizi were written for demos just a week ago. The purpose of yizi is to read from a slice of another yi in-stream. The source of yizi bytes is a yiz slice describing a contiguous subset of another stream. Superficially, yizi resembles yfi shown in the file demo rather than the in demo (which was already too long to hold yfi too). class yizi : public yi { // in-stream reading from yiz «
//u8* i_0; // in origin
//u8* i_p; // cursor: must be such that i_0 <= i_p <= i_x
//u8* i_x; // one beyond end of buf
//mutable int i_e; // zero or some error status
protected: // yizi strongly resembles yozo
yz zi_z; // the span of zi_i that can be read
yi* zi_i; // the yi actually read for bytes
yv zi_v; // copy of original yv buffer provided
// valid values for zi_pos are only in range [0, zi_z.z_n]
p32 zi_pos; // pos of byte AFTER buf: where i_x points
mutable n32 zi_sum; // total bytes actually read from zi_i
enum { e_body = 8 }; // size of tiny default buffer:
u8 zi_body[ e_body ]; // only used if necessary
The original source is zi_i, which can be any yi subclass, like yfi shown in the file demo (cf «) or yvi in the in demo (cf «). The part of zi_i that can be read is described by slice zi_z: the N=zi_z.z_n bytes starting at offset zi_z.z_p inside zi_i; so obviously yizi always has length zi_z.z_n denoting eof in the slice. Triple (i_0, i_p, i_x) points to space in iovec zi_v (see yv in the run demo): a buffer constructor arg, or an 8 byte zi_body[] array in yizi. Valid offsets for yizi::iseek() vary from zero to zi_z.z_n, and zi_pos describes an offset in this range saying where first byte i_0 in the buffer will be read next time when the buffer is re-filled — so zi_pos describes the position of i_x, which is the byte after the last in the buffer. Since zi_z.z_p is the offset in underlying zi_i, the offset in zi_i corresponding to i_x is actually zi_z.z_p + zi_pos. Let's compare this briefly with zo_pos in yozo: Both zi_pos in yizi and zo_pos in yozo (cf «) describe where the next i/o will occur on the underlying stream if normal forward progress is made. So zi_pos is where the next read occurs in yizi to fill the buffer, and zo_pos is where the next write occurs in yozo to flush the buffer. It affects arithmetic calculating current stream position because zi_pos is typically a point i_x after i_p, while zo_pos is typically a point o_0 before o_p. Reads falling outside the target slice are ignored. Total bytes read from zi_i are counted in zi_sum; this can be more than zi_z.z_n since nothing stops you from re-reading the same bytes again and again. protected: // internal utils
void zi_init(yv const& v); // common to constructors
public: // non-virtual getters
yz ziz() const { return zi_z; } // slice bounding reads «
yi* zii() const { return zi_i; } // source yi «
// the length of yizi is constant: the length of the slice:
p32 zilen() const { return zi_z.z_n; } // same as ilen() «
n32 zisum() const { return zi_sum; } // sum of all reads «
// zipos() is a faster way to get same answer as ipos():
// zi_pos corresponds to i_x, but current pos is at i_p:
p32 zipos() const; // returns: zi_pos - (i_x - i_p);
protected:
void zi_log_actual(const char* where, n32 n, int actual) const;
void zi_log_pos(const char* where) const; // zi_pos problem
bool zi_fill(); // called when oc() runs out of bytes
public: // virtual methods
virtual p32 ipos() const;
virtual p32 ilen() const;
virtual int iread(void* p, size_t n); // neg on err
virtual int iseek(p32 p); // p is valid whenless than zilen()
virtual int ipread(void* d, size_t n, p32 p) const; // err: neg
protected:
virtual int _ic();
public: // unmake
virtual ~yizi(); // clear slots to zero
public: // make
yizi(yiz iz); // (pass by value) uses zi_body[] as buf
yizi(yiz iz, yv const& v); // (pass by value) v as buf
yizi(yi& i, yz const& z, yv const& v); // v as buf
}; // class yizi
Source code for yizi below is described only briefly, since you can get the general idea for each method from descriptions of the same methods in yfi (once the file demo appears). Material here dwells on how limiting reads to a target slice affects otherwise generic code. Let's start with creation: yizi::yizi(yiz iz) : yi(), zi_z(iz), zi_i(&iz.z_i) «
, zi_v(zi_body, e_body), zi_pos(0), zi_sum(0) {
this->zi_init(zi_v);
}
yizi::yizi(yiz iz, yv const& v) : yi(), zi_z(iz) «
, zi_i(&iz.z_i), zi_v(v), zi_pos(0), zi_sum(0) {
this->zi_init(v);
}
yizi::yizi(yi& i, yz const& z, yv const& v) : yi(), zi_z(z) «
, zi_i(&i), zi_v(v), zi_pos(0), zi_sum(0) {
this->zi_init(v);
}
void yizi::zi_init(yv const& v) { // common constructor «
if (v.v_p && v.v_n) {
i_0 = i_p = i_x = v.v_p; // empty
}
else { // unbuffered
i_0 = i_p = i_x = 0;
}
n32 eof = zi_i->ilen(); // offset of u8 beyond end
if (zi_z.zrelative()) { // z_p or z_n is negative?
yz zabs(zi_z); // new copy we can make absolute
zabs.zabsolute(eof); // relative to length (eof)
zi_z = zabs; // adopt absolute form as canonical
}
else { // just need to error check slice values?
if (zi_z.z_p >= eof) { // position at/after eof?
zi_z.z_p = eof; // maintain zo_pos invariant
zi_z.z_n = 0; // zero sized slice at stream eof
}
else { // see whether slice end goes past eof?
n32 rest = eof - zi_z.z_p; // space before eof
if (zi_z.z_n > rest) // more than what remains?
zi_z.z_n = rest; // up to eof but no further
}
}
// zi_pos is always where next read should occur:
zi_pos = 0; // starting offset of slice (NOT zi_z.z_p)
}
As you can see, a call to ilen() on zi_i gives the entire length of zi_i — this is used to constrain the zi_z slice so it corresponds to an actual part of the underlying in-stream. Calls to yz::zrelative() and yz::zabsolute() convert negative offset and length to absolute values relative to total length (cf «). Internal logging methods below just centralize reporting of unexpected conditions. For example, since zi_pos should describe the position of i_x, the distance between i_0 and i_x should not be able to exceed zi_pos because that would imply a negative position for i_0. void yizi::zi_log_pos(const char* where) const { «
// log zi_pos vs i_x - i_0 problem
n32 more = i_x - i_0; // total number of buffered bytes
if (more > zi_pos) { // more in buf than offset of buf end??
// this should be impossible; let's log it and go on:
ylog(0, "yizi::%s() i_x-i_0=%d zi_pos=%d", where,
(int) more, (int) zi_pos);
}
}
void yizi::zi_log_actual(const char* at, n32 n, int act) const { «
// presumably only called when act > n unexpectedly
ylog(0, "yizi::%s() request=%d but actual=%d", at,
(int) n, (int) act);
}
The destructor below and calculations of current read position are fairly simple. Because zi_pos is defined as the position of buffer pointer i_x, the current read pos is zi_pos less the distance between i_x and i_p. yizi::~yizi() { // halt «
i_0 = i_p = i_x = 0;
zi_i = (yi*) (void*) (yip32) 0xdeadbeef;
}
p32 yizi::ipos() const { // current byte position «
return this->zipos();
}
p32 yizi::zipos() const { «
// zi_pos corresponds to i_x, but current pos is at i_p:
if (!this->igood()) { ibad(); return 0; }
n32 more = i_x - i_p; // distance i_p to i_x (zi_pos is at i_x)
if (zi_pos >= more) { // expected ordering of pos and more?
return zi_pos - more; // back up from pos of i_x to i_p
}
// more in buffer than the position of buffer's end? huh?
this->zi_log_pos("zipos"); // zi_pos should correspond to i_x
return 0; // do not return negative even though it's implied
}
p32 yizi::ilen() const { // size in bytes (zolen() is faster) «
return (p32) zi_z.z_n; // constant
}
The iseek() method is only slightly complex because it tries to preserve any bytes in the current buffer if possible. In general, seeks of short distances (say around the size of one token in parsers) might often still be inside the current buffer span. So as an optimization, iseek() tries to see if the destination position corresponds to a point between origin i_0 and end i_x. int yizi::iseek(p32 p) { // change read pos to p «
p32 eof = zi_z.z_n; // usable offsets: 0 to eof-1
if (!this->igood()) { this->ibad(); return 0; }
if (p > eof) // beyond end of the slice?
p = eof; // cannot seek a position after eof
// note zi_pos points at offset corresponding to position i_x:
n32 local = i_x - i_0; // total bytes in current buffer
if (local > zi_pos) { // more buffered than pos of buf end? huh?
this->zi_log_pos("iseek"); // zi_pos should correspond to i_x
local = zi_pos; // prevent neg val from (zi_pos - local) below
}
p32 first = zi_pos - local; // offset of 1st u8 in local buf
if (first <= p && p < zi_pos) { // p in current buf?
// we can keep whatever is now in the i_x - i_0 buffered bytes
i_p = i_0 + (p - first); // make i_p point at next u8 to read
}
else { // cannot continue using any currently buffered bytes
i_0 = i_p = i_x = zi_v.v_p; // empty (pointers might be nil)
}
zi_pos = p; // where next read will occur to refill the buffer
return p;
}
Protected _ic() is only called when public inline yi::ic() exhausts the current buffer (cf «) so the first thing it does is call zi_fill() to re-fill the buffer by reading more bytes if any remain in the slice following the current buffer span. int yizi::_ic() { // called when ic() runs dry; fill and pop 1st «
return (this->zi_fill() && i_p < i_x)? *i_p++ : -1;
}
bool yizi::zi_fill() { // called when () runs out of bytes «
u8* p = zi_v.v_p;
n32 n = zi_v.v_n;
p32 eof = zi_z.z_n; // usable offsets: 0 to eof-1
yassert(zi_pos <= eof); // assert invariant is maintained
if (p && n && zi_pos < eof) { // have buffer? more content?
n32 rest = eof - zi_pos; // content left before eof
if (n > rest) // buffer holds more than slice remainder?
n = rest; // read only what remains before slice eof
int actual = zi_i->ipread(p, n, zi_z.z_p + zi_pos);
if (actual > 0) { // something read successfully?
if ((zi_pos += actual) > eof) { // went past eof??
zi_log_actual("zi_fill", n, actual); // actual > n ??
zi_pos = eof; // never let zi_pos be more than eof
}
i_0 = i_p = p; // put cursor back at origin of buffer
i_x = p + actual; // i_x from size read, not buf size
return true;
}
}
return false;
}
When yizi::ipread() finally calls yi::ipread() on underlying in-stream zi_i below, the local yizi coordinates of zero to zi_z.z_n are mapped to global yi coordinates by adding zi_z.z_p, which is the offset inside zi_i. All other business in yizi::ipread() involves avoiding reads outside the target slice, and counting actual bytes read in zi_sum. int yizi::ipread(void* pdest, size_t n, p32 pos) const { «
if (!this->igood()) { this->ibad(); return 0; }
p32 eof = zi_z.z_n; // usable offsets: 0 to eof-1
yassert(zi_pos <= eof); // assert invariant is maintained
// Note our basic plan here is to ignore any buffered bytes
// and just go directly to the source yi zi_i; but as a tiny
// optimization, it's tempting to see if currently buffered
// bytes completely cover all the desired request byte range.
if (pos < eof) { // request pos starts before slice eof?
n32 rest = eof - pos; // content left at pos before eof
if (n > rest) // more than remainder of this slice?
n = rest; // read only remaining bytes before eof
p32 px = pos + n; // one byte past end of requested range
n32 more = i_x - i_0; // total number of buffered bytes
if (more > zi_pos) { // more in buf than offset of buf end??
this->zi_log_pos("ipread"); // should be impossible
more = zi_pos; // avoid negative result in math below:
}
p32 p0 = zi_pos - more; // position of byte at origin i_0
if (p0 <= pos && px <= zi_pos) { // request is inside buf?
// positions of buffer cover requested position ends:
const u8* src = i_0 + (pos - p0); // buf byte for pos
::memcpy(pdest, src, n); // copy direct from local buf
return (int) n; // all of request satisfied from buf
}
int actual = zi_i->ipread(pdest, n, zi_z.z_p + pos);
if (actual > 0)
zi_sum += actual; // sum of all bytes read from zi_i
return actual;
}
return 0; // pos >= eof means a zero-sized read beyond eof
}
As with the similar buffer optimization in yizi::iread() (bottom column left «), nothing bad happens if the desired range to read only partially intersects the buffer instead of being totally inside or outside. The yizi api assumes the in-stream does not change, so no cache coherency issue can be involved. (To make an in-stream that copes with an underlying medium that changes, you'll need to do something more complex anyway.) Once we decide not all the bytes are in the buffer, we simply go direct to the underlying zi_i in-stream for all the bytes. Since current code makes little effort to fill the buffer, it might never contain any bytes unless ic() is called, which indirectly fills the buffer using zi_fill(). Further optimization and tuning in this code should decide when and whether to fill the buffer — this would be especially appropriate for small reads near one another. |
demos « Þ
+ todo + names + fd + iovec + assert + log + run + hex + crc + buf + in + out + quote + escape + compare + file + deck + cow + arc + blob + tree + slice « Þ + rand + time + stat + hash + heap + node + primes + page + book + pile + stack + atomic + lock + mutex + thread + map + meter + list + iter + ctype |