|
demos
Lately I write demos under thorn with a todo list using C++ under a BriarPig mu-babel license. The run and hex demos were done on 13apr2008; crc on 14apr2008; buf on 20apr2008; in on 27apr2008; ctype on 04may2008; out on 18may2008; slice on 25may2008; quote on 31may2008; escape on 31may2008; mutex on 14jun2008; rand on 16jun2008; stat on 17jun2008; primes on 19jun2008; list on 23jun2008; heap on 29jun2008; iter on 02jul2008 and 04jul2008; atomic on 06jul2008; node on 13jul2008; page on 23jul2008; hash on 27jul2008; book on 27jul2008; pile on 03aug2008; stack on 07aug2008; mu on 12aug2008; toy on 16aug2008; weight on 23aug2008; symbol on 02sep2008; imm on 04sep2008; gc on 06sep2008; map on 14sep2008; meter on 16sep2008; thread on 22oct2008; this menu links all demos: mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile thorn: todo, names, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, list, iter, ctype 29oct08
¶
ids vs superegos
busy december ¶ I expect to be rather busy this December and January, so I likely won't do as much on this site during that period except during holidays and weekends when I turn to hobby play. Expecting that slowdown, I might finish a little material before December in new arc, blob, and tree demos I just added as stubs. The latter two will reprise some IronDoc designs in capsule form; I'll try to make the code as small as I can, leaving out details needed for database use, which you might re-add by thinking about other ways to represent an arc besides in-memory pointers favored in demos. opening paragraphs ¶ There's not much to read on those pages yet beyond opening remarks. And here I quote all the opening paragraphs, so there's no need to see the new demos until I post more updates. arc opening ¶ As described in the graphs subsection of the node demo, Wil sees most computing system representions as graphs in both code and data. Nodes are typically somewhat over-represented; structs and objects for state are typically favored in code description to the exclusion of arcs: the way you refer to something else. But careful arc design is a good way to solve problems too. Tuning graphs can target either arcs or nodes — sometimes both. This page uses the term arc to mean the same thing as arrow or dart or directed edge in an abstract pointing role in a graph. You should think "arrow" but focus on graph edges. Concrete arcs can be pointers, absolute offsets, relative offsets, map keys, and/or combinations of these in series. A chain of arrows acts like one arrow with extra internal detail exposed. Wil rarely bothers discussing this perspective, because arc design folds seamlessly into effective coding. However, this also means an arc perspective is usually buried, and Wil rarely tells you the other options that would have worked too. So when Wil shows a very specific way to code blobs or btrees, you might have trouble seeing the other ways to code them using different arc formats to point between nodes. To help you with this, Zé offers material below on arc ideas, and later nags Wil to present arc agnostic designs in future blob and tree demos, along with Wil's concrete code. blob opening ¶ In the middle 90's Wil started using a specialized btree format to represent blobs in storage systems — where a blob is basically hard to tell apart from a file embedded in a database — in imitation of tree formats described by Exodus at the University of Wisconsin (cf Storage Management for Objects in Exodus). Given any block-oriented media for allocating space for use as nodes in tree structures (for example: pages in files using pseudo virtual memory) you can build a "file system" representing blob "files" as btrees in a manner called Exodus-style btrees in IronDoc literature. This page calls them blob btrees. Designs and code shown on this page present a way to encode scaling byte sequences as blobs, much like files, using a tree to index leaf nodes of content. Blob btrees index using a simple scheme: inner tree nodes just count bytes of leaf content. So seeking any offset can simply skip bytes to reach a desired point. Seeks, inserts, and deletes are all logarithmic (O(logN) means "log time") at all offsets in a blob file; blob edits have the same order of performance as appending to a Unix file. So for example, deleting the first kilobyte of a 100MB blob is just as fast as truncating a kilobyte at the end — just another efficient btree edit. It's unnecessary to touch bytes downstream from a blob btree edit. tree opening ¶ For use in persistent maps (or dictionaries) which scale much larger than main memory, Wil prefers to use a format called a btree (wikipedia) whose name means balanced tree. Traditionally the name is written b-tree, or sometimes b*-tree (denoting a variation with non-root nodes aiming for least two thirds full, not just half). But Wil adopts a common practice: dropping hyphens in frequently used terms. So Wil writes simply: btree. 26oct08
¶
insensitive clods
mental drafts ¶ During early phases of design or exploring ideas, it often goes faster to imagine everything instead of drafting a code api, because actually writing is slow and distracts with details not pertinent to issues of interest. I've been imagining a mixture of data structures, diagrams, and approaches to plain english descriptions. I'm half certain I'll precede a btree blob demo with another about arcs, if you use the term arc as a synonym for pointer, or reference, or some way you refer to something you want as part of a node in a graph. (If you can't stop yourself from thinking of Graham's Arc programming language, just imagine I'm saying "arrow" or "dart" instead, or your favorite synonym for arrow.) The arc demo would be "written" by Zé instead of Wil, since it's mostly about ideas and options, and doesn't involve code per se, which is Wil's territory by convention. Zé wants to briefly touch all the many possible options in representing an arc, in order to introduce the idea any particular way you pick to write code for graphs — btrees in particular — will tend to prevent you from representing other sorts of graphs using different arcs. The idea would be to introduce a little terminology, and maybe some conventions for representing schematic graphs in s-expression syntax, to enable description of btree psuedocode without committing to a specific way it would appear in C and/or C++, even though Wil's code will embody algorithms in concrete choices. I think I'll make one pretty Photoshop diagram of a schematic btree graph with some shared structure. Then I'll present a text representation of the same thing (using Lisp notation for labeling shared structure) to define linear text conventions intended to mean the same thing as pretty diagrams I'm not going to draw. Then all the pseudocode descriptions of btree operartions will be illustrated with diagrams of the linear text sort. blind chess ¶ When I was in my 30's I had to actually write code to work out all the details. Staring at an api helped prop up my working memory so I could imagine uses of it and possible conflicts. Sometime in the last few years, I noticed I didn't need to do that anymore. By the time I worked at Netscape I could design code right in the middle of conversations where folks stated requirements. But I still got a better picture of the design later when looking at the code. A few years later, it stopped making a difference: looking at code didn't make me work out answers faster or even better. And I can actually visualize what C++ code will look like in my mind's eye when I want, so writing it is just transcribing from memory. But usually I don't work out every detail, when it's obvious, unless really interesting. The net effect of this is what you'd expect: I don't need to sit at a keyboard to get work done. I can code when I'm driving, and so I do, of course, as much as it might chagrin you to learn. But usually there's not much reason. Sometimes in meetings at work, when my attention isn't really needed, I'm actually writing code or planning how it will go, working out the details since the limit in speed is mainly knowing exactly what the plan really is. Maybe 18 months ago, the project director would occasionally glance at me, and I'd say yeah, I'm working out the code plan — what I'm going to write next when I get back to my keyboard. He'd say, yeah, the plan takes all the time. However, actually writing code is when you can try to make it shorter and clearer than it might have been. Size and complexity cost nothing in your mind's eye, but you can see both take up space in black and white text. I imagine another developer reading my code as I write, and guess whether parts will be too unclear. When you're typing is an opportunity to sharpen clarity. Or, when you haven't worked out the whole plan yet, you spend half your mental cycles working out the rest of the plan while filling in the place you see right now. 25oct08
¶
big fish eat little fish
btree sideshow ¶ I keep thinking I'll go right back to toy language material. But it doesn't seem most appealing at the moment. I find myself thinking about btrees again instead. For some reason that seems fun right now. And I try not to question my urges in pursuit of fun hobby stuff, since I don't want to scare off the muse by being a grind when it comes to scratching itches. Being a grind about what you think you should be doing is a good way to leech all the joy out of the process. And I've been looking for the joy. There must be a phrase for what I've been doing. (Or maybe I'm recalling part of a science fiction novel.) If there isn't a phrase, we should coin one. Feel free to tell me which one is used most often by folks in creative pursuits. Anyway, I've been courting inspiration by going through the motions in my demos and toy language material. Maybe you thought I was having fun, but I wasn't quite — not just yet anyway. But I was asking for it. I thought if I went about my business as if I was having fun, eventually it would start to happen, and I'd finally be back in the zone: the good place where work and joy meet in a mutual, nurturing, virtuous cycle. I thought this would happen first in toy language stuff, but it still has some uninteresting grind phases to go through first, recapturing known past stuff I know without learning anything in the process. I could tell something was wrong because it wasn't quite clear what I was going to do when it was working. Getting in the zone works a whole lot better when everything you do enables the next thing you want to do: when this tool makes it easier to investigate the next one. The trick is partly perceiving a dependency between one hobby pursuit and another, so they amplify one another. For example, if I had something in particular I wanted to drive from my toy language, I'd go about it more enthusiastically. At the moment I find I want to play with btrees again, which is great: the important thing is just to want something, because that can be carefully stoked so tech wishes motivate each other. In this particular case, I was planning to code scaling strings as btrees in garbage collected memory, so I can turn a btree junket now into a study for gc runtime btrees for arrays and strings in a language. And I can feed that back into an interaction between language and persistence systems. So I think I'm going to write about Exodus style btrees for blobs, with the usual sort of copy-on-write support, but in memory-only form so no one will mistake this for database work right now. It will fall awkwardly short of addressing file formats and gc box formats, by heading somewhere down in the middle, but in spitting distance of both. I'll probably crib from my old IronDoc sources in the process, but I'll write everything using thorn api and class support. Probably it will work as a useful bridge between disk-based and memory-based code in the future, and it will also give me something to eagerly port to toy language builtin data structures. This is also a good point in time to document some of my old copy-on-write work since it's completely irrelevant to my current day job, but it's one of those things companies tend to be obnoxiously possessive about when it does have any bearing on work problems (as if it wasn't all sort of basically obvious from first principles). Maybe thinking about another IronDoc style page cache recently when writing the thread demo got me in the mood. All I know is I've been thinking about it off and on all day, and I started looking forward to working out some new experimental code for btrees without having it revolve around file formats. I might actually learn something, which is maybe why it has my juices going. It will seem more like generic library work than anything else. I'll avoid language and database slants deliberately, which will also make it simpler when it's abstract and app agnostic. scaling and probability ¶ I often advertise my interest in scaling, but I suspect some folks think it means something different than what I intend. These days, most of the time folks talk about "scaling" they're talking about building out hardware infrastructure in support of scaling. But that's not what I mean. I'm not into hardware, and I don't design and spec hardware for server farms for scaling. I do local system scaling and not global hardware farm scaling. The only part focused on scaling in distributed systems is making each node cope with the fact all content involved participates in async conversation with many nodes, and coding this for correctness. In my day job, when I talk about scaling I often mean something in particular: making every operation constant time if possible. Folks get sick of hearing me say the phrase "order one time" over and over again. What didn't I like about your last idea? It can't be done in order one time. In systems with a big network performance goal, you can't afford to spend much time doing anything. For example, can't physically go visit millions of objects you want to invalidate all at one time, even as a non-blocking background task, if it runs CPU usage up to high levels — that alone can cause network traffic to start backing off. Near 100% CPU utilization for several seconds in a row is a bad idea. You can usually turn this sort of task into a lazy duty performed incrementally as you go about your other constant time tasks, adding a tiny bit of overhead to each one, spreading large costs evenly across a zillion small costs. I had really meant to make this blog piece be all about a specific view of probability, but I added scaling to the picture because that's the context in which analysis of probability is interesting: what happens to certain probabilities as you scale something up to use all available resources to do whatever it is? I wanted to write a little about my meta reasoning, applied to some topics in dedup, because the way I reason about the issue is generically useful and not specific to dedup in particular. However, I'm not going to tell you any of the results of my reasoning, when it comes to dedup anyway. The results have something of a trade secret flavor to them, because the results were found on the clock in pursuit of day job problems. They're not mine to share. But some of my results were found by the manner in which I went about thinking, then testing hypotheses empirically. How I go about reasoning is something I can share. It really amounts to little more than undergraduate discrete math coupled with basic scientific method: trying things and seeing how it fits models you form. There's hardly any trick to it. Let's say you write some app, or system, or service that scales really big. And part of this software runtime contains sets of different kinds of objects in relationships satisfying your needs so the app works. The basic question you have to ask yourself is a simple one: what can you conclude about the entire population of these objects, taken altogether? For example, how many of them are there? What are the relative ratios of different sets? What do actual and potential ratios tell you about probabilities of good or bad situations with respect to your app? If the actual probability — not the probability you were hoping would obtain — has a big potential impact on the value or quality of your product, then you should find out what it is by generating and analyzing actual populations at the scale of interest. Then you can think about making changes altering probabilities in directions to your advantage. And when you already have means to empirically measure factors of interest, you can directly explore the effect of changes you think might work better. Some of the data sets and populations are not easy to measure directly off your running system, because you don't see all the potential emergent relationships — just the ones during a current run. To measure all potential cases, you might need to write code explicitly constructing a full set of potential values, even if you'd never see all of them in practice. Otherwise, you won't know certain probabilities. For example, you only have a theory about a count of all potential cases until constructed and measured. Any time your code has reasoning that includes parts like "this data set contains a set of things, and I'm going to find some of them" — you should find out how many of those things are actually there to be found. Don't guess or assume. 22oct08
¶
bobbing for apples
finished thread demo ¶ The thread demo is now fully done. The tail end of the right column contains a small piece of random fiction, added in lieu of writing a much longer piece I'd rather if I had time. However, there's not much reason to add fiction to the demos. In this case, adding a tiny bit lends characters a little flavor, as if they actually have existence apart from foils in dialogs. 21oct08
¶
vacant parking lots
thread demo ¶ The thread demo is around two thirds to three quarters done now. I'll finish it later this week. It's missing explanations for thread subclasses and the output for sample execution runs. But all the code is marked up and the test runs will show a few interesting effects when I explain them. 18oct08
¶
wheels within wheels
pending demo ¶ I'm working on a thread demo, but so far I've been slicing and dicing sample code and examples in pursuit of meaningful but tiny pieces. It's hard to write a demo of threads small enough to mean anything while avoiding totally fake and arbitrary results. It's tempting to write another thread-safe page cache of the sort I used in IronDoc, since I end up coding one in day jobs every few years, and it doesn't change much while being useful, realistic, and informative. The kind I've been favoring for a while now amounts to a simulation of memory mapping, but with more control over when i/o occurs. (It turns out that real memory mapping tends to make terrible performance and latency choices when the OS lacks app specific usage details.) But in the short term I'm just going to show basic stuff, like how to manage a thread pool, how to make them stop cleanly using barriers, and how to write a pthread barrier clone when your system doesn't define barriers in pthread.h. (Barriers are present in Linux, but not on my Mac laptop, so I need to write a barrier class if I want it in the demo. I wrote it today but it's not tested.) non-determinism ¶ This week was a total grind at work because we reached that stage again where I need to convince folks the dedup codec works according to plan, despite it's inherently fuzzy, heuristic, stochastic, non-deterministic algorithm. Actually, encoding is deterministic if you start with exactly the same system start state and present exactly the same sequence of things to encode in exactly the same sizes of pieces. But if you vary anything, the end result is only approximately similar. It's defined not to be exact. I didn't come up with the original algorithm. I just tuned it and reworked all the pieces to leave out unnecessary details, leveraging the most useful parts. I cleaned up definitions, squeezing them for efficiency by exploiting improved definitions. But it's still just as heuristic and inexact in nature as it started. I just increased probabilities of good effects and decreased those of bad ones. But the result is still probabilistic. So it's really strange when folks expect exact, repeatable, deterministic behavior when even such small details as size of network frames will alter encoding. Especially when someone testing doesn't guess results are highly content specific. It turns out it matters exactly what was inside network traffic, because there's (basically) an infinite number of legal ways to encode it each time. One preferred approach is pursued, but it's heavily influenced by actual content and past history, both. Folks ask, why did it do X here? And I reply, it's defined to be allowed to do X there if appropriate. So they follow up with, but why did it happen this time? So I say, it depends on everything — for example, what was content this time? And how big were the pieces? So this week I wrote code every day to reveal what was happening in detail not previously tracked or reported. And it's still not yet sufficient detail to eyeball a report and be satisfied you know why this case acted thus. But the level of detail has already grown to a scale that loses the simplification of models inherent in the algorithm. You can't see the forest because now you see not only the trees, but all the leaves, imperfections in bark, and ants crawling on trunks. It causes detail shock. Part of the problem is in what folks expect to be visible in text displays. Even when I write a data structure to build an overall story explaining everything that happened (and where events affected which part of an encoding) when this is printed as a single linear display, it's inadequate. The actual picture of what happens is two dimensional and layered like transparencies on an overhead projector. And I can't show this in a text view whipped together in an hour. It would still be hard to show in a fancy GUI — you'd essentially have to write a codec debugger to really understand without getting lost. (Instead of source code, a codec debugger might display patterns of similarity, search, and references in content flows already too large for comfortable human digestion.) Anyway, some people would only be satisfied if encoding was trivial and exact, when in fact it's gnarly and inexact. They confuse the idea of correct behavior with simple and repeatable behavior, when neither of those apply to correct solutions in adaptive, opportunistic algorithms with strong internal feedback looks in system state. We can verify it always generates correct answers. But oddly folks wish only ideal things would happen, as if no heuristic fallback code path would ever be reached. It shows a mixture of ideal and non-ideal results, in a mix resulting from heuristic strategies. When that's how it's defined, non-ideal results occur here and there. Whining about it changes very little. You should pursue a more block-oriented approach if you want endless permutations to look simpler. 14oct08
¶
decisions
vacation ¶ Okay yeah, I've been taking days off my project here to relax. The past couple days I kept planning to start the posix thread demo (because that link still draws a lot of hits) but I let myself read escapist fiction instead: a Heinlein novel I last read over thirty years ago. There's no point in mentioning the title since I haven't anything to say about the content. But it's always interesting to note how my mind has changed since the last time I read something. So, I think I'm going to write a thread demo next, featuring a dialog developing a very simple C++ wrapper class to use pthreads using the posix api. But I'm not sure when I'll start. Then I suppose I'll go back to toy language stuff. opportunity ¶ I'm trying to make up my mind about an opportunity that's really tempting because it has prospects of really interesting work again, though it would involve working in Java a lot, which I've hardly used. They don't mind my lack of Java experience, but I'm wondering how much I'll mind lack of control when trying to do optimization work. I would almost certainly net less income if I make a change, and I haven't finished digging myself out of debt yet, playing a big part in how I think about risks of diving into something new. But worrying about my finances is kind of backward looking, when I should really think about where I want to go instead of escaping money craters. What I do now is really narrow, and not very challenging. And the last round of team load-balancing had the effect of isolating me from i/o parts I enjoy more. However, I have a lot of time to work on projects like a toy language on this site, because my current work is not particularly demanding. If I started a new job, I'd spent most of my time coming up to speed for quite some time, and output here might fall off to light, sporadic updates for a few months. I've been in my current position two years and I'd like to make it three. I don't want to change jobs often any more. It's just this current possibility is really high on my list of best places to work in terms of fit with my interests. I've had my eye on them several years. So it just has all the signs of the right thing to do, except for downside risks. If I take a new job, I'll have to throttle time I spend here even more, to focus on just basic toy language stuff alone, with an aim to get it done and wrap it up. The problem with making a decision so far is that I can construct positive and negative arguments for and against making a change that don't resemble each other. The gap is very wide: best and worst case outcomes are far apart, which suggests I'm really guessing blindly, motivated by hope. It's been a while since I hoped for better things. Of course, if I do change jobs, I won't tell you. I'll keep my policy of saying as little as possible about day jobs here, unless a really good reason to change arises. |
Entries appear in reverse chronological order.
Content here is permanent: Each entry has a permalink
(¶) to
the long-lived persistent copy here. Clearly, to link
anything, you'd best link the permanent copy.
11oct08
¶
places for everything
same bat channel ¶ I might return to programming material tomorrow. But then I must also put in a extra hours at work tomorrow afternoon, so maybe I won't write much this weekend. I could launch into a little code right now, but I decided instead to shortchange you with a little fluff best described as "blogging." It's topical in a sense of relating to financial accounting systems, which folks recently noticed (again) aren't always all they're cracked up to be. Someone is willing to game every system, so it's inevitable. accounting ¶ I know more about accounting than most geeks. At one point when I was around twenty years old, I double majored in computer science and accounting for a year or a bit more. I'd developed a bit of curiosity about business systems when I was working at McDonald's between school years at the end of the 70's. I could tell a few funny stories about McDonald's, but actually I don't want to remember. I did a good job of misplacing most of those memories and want to keep it that way. You'd think a guy with an IQ of 160 could find better work than McDonald's, but not where I lived in Iowa in the late 70's, which was — in a word — barren. You guys don't know how lucky you are to grow up in an age where you can find work at a distance using the internet. I grew up in a small urban oasis in the middle of a corn desert. In the Dark Ages. But they had books. For example, I read The Sun Never Sets on IBM while working fast food, which told the rise and fall of IBM in a style mocking the rise and fall of the British Empire. I started reading about accounting because businesses seemed a little bizarre. I had once read a science fiction short story where a teenager was caught dabbling in the dark arts of accounting, clearly cast in the role of black magic. When confronted by adults, he protected himself by conjuring a daemon with double zeroes in place of eyes. My family had predisposed me to interest in accounting. Visiting relatives, seeing me absorbed in detailed projects around the age of eight or nine, always observed, "One day he'll be a certified public accountant with attention to detail like that." They were all working class, so this was apparently the only use they could imagine for an attention span in a child longer than any adult's. I'd keep books for other people who did something useful, like make money. By that time my father had started his own business — in Tennessee, where my status was damn yankee to other kids — selling "business systems" to small business owners, consisting mainly of business records storage (on microfiche for example). If you like, think of his business as selling clay tablets with attached stylus to shop keepers using cuneiform for business records. It's about as comical, and foreshadows the quick collapse of his business during rapid innovation in storage systems. Instead of declaring bankruptcy, which once had quite a stigma attached, he instead spent years paying off his debts and finally hit a net worth of zero again by the end of my high school years. He seemed a bitter and miserable man, but I supposed I would do better since I had a gift for arithmetic; simple mathematics, like that tested in SAT's for example, was a breeze to me. In retrospect, it's a little disturbing to find myself sucked into storage systems in professional computing work now and again. I changed majors to computer science a couple years into college because programming was a lot more interesting than circuits in electrical engineering, even though I'd aced all the calculus and physics classes. It was clear EE would suck as a job because you could never do anything interesting by yourself, whereas in programming you could do meaningful work all alone. (You can probably see the irony here: over the years it has become very hard to do meaningful work all by yourself in programming.) But for a while I also double majored in business administration because I wanted accounting classes to explain how ordinary businesses worked. I walked into local businesses and asked rude questions about inventory practices. I pissed off business owners by asking how they acquired their positions. I toured facilities at my home town's newspaper because I wanted to know how it worked. (For some reason, I was treated like a visiting VIP at the newspaper for reasons completely beyond my comprehension.) I read books about the acquisition of wealth. Standard advice was simple: don't spend everything you earn — invest some of it. But a couple years of learning the nuts and bolts of accounting systems exhausted my supply of curiosity, so I went back to just computer science. But not until after I'd aced three quarters of accounting. In the middle 80's when I lived in Berkeley, while I taught myself compiler writing before I returned to school to finish my degree, I worked for a while as a clerk in a law office in Oakland. There they set me to work running their accounting systems when I wasn't filing away paperwork for legal cases. I keyed in all time and expenses billed to clients, cleared checks in the general register, and balanced their accounts before generating final balance statements. An actual honest-to-god bookkeeper came in once a month to double check my work. Why did they have me do this sort of work? Because I tended to not make mistakes: I was freakishly precise, and I noticed any data that didn't fit. By the 90's I was working full time as a software developer, and in 1994 I worked for a primary vendor of software in the derivatives market segment. They were experts in software for the sort of dodgy financial instruments you have come to loathe in Wall Street's practices. I read books about derivatives and about packaging and re-selling financial instruments like mortgages. For example, the explanation of how Fannie Mae worked — and rationales behind mortgage valuations — had a kind of rank smell to it compared to any more pure form of mathematical system. By the standards of accounting I had learned in college, mortgages had a fishy smell because they were divorced from clear auditing that might provide evidence grounding you in reality. Assuming you cared. Sometimes I call myself a plumber in computing systems, but it's actually more accurate to describe my role (some of the time) as that of an accountant. In some tasks, my job is to optimize memory and CPU budgets, since you can't spend more than you have in the real world, unless you have a scam going. At the same time, the idea is to pinch every bit of efficiency out of resources available, like an anal bean counter. When tweaking runtime systems in software, I'm like the accountant who asks, "What do you want the answer to be?" since with a little creative accounting, I can probably manage something like what you have in mind. That's what you get if I'm in left brain mode: balanced books with a place for everything, and everything in it's place. But I'm much more right brained than left brained, and if I'm in that mode, you get something else — and it's not something they can teach in school. Most folks aren't any good at it. To call it thinking "outside the box" is a dumb way to say it. It's better to think of it as a facility for imagining boxes of all kinds — sometimes very strange boxes — and simulating what happens given sets of different kinds of boxes. Among other things, it leads directly to systems without standard terminology, and it's very easy to imagine the basis of a coordinate system incomprehensible to others (who imagine the only way things could ever be is similar to what exists now). But in practice, with short term goals, only small changes are useful when most things must stay compatible. Right brained thinking is most useful when you're stuck. I can imagine being unstuck several different ways. It's easy, and you get your choice of side effects for each way to get unstuck. It's also very useful when you want to know what would break the current system — lots of things will break it. Everything has weaknesses. You can even do armchair debugging this way: when evidence says an odd thing occurs, just ask, what might cause that? Right here I'm going to drop a couple digressions and wrap this up by heading for thematic closure. Remember my poor dad? The poor shmuck who got himself in debt up to his eyeballs? I was pretty sure I'd never let that happen to me. Turns out, when you get married, you're in business with your spouse, in the eyes of the law. If you marry a complete idiot with numbers, who can't be responsible even in a pinch, you end up having the same result you would if you had selected a moron as a business partner. Which you would know not to do, right? And it would be even worse if your partner was dishonest, much craftier than you imagined, and diabolical as an adversary. The phrase "taken to the cleaners" is not just a bit of rhetoric folks started using in the context of divorce for purposes of color. It's a fairly accurate description. But it falls short of what can actually happen. You can lose more than everything you have. In the business world, about the worst thing that can happen is going bankrupt. Going negative is not enforced in business law, but it is enforced in divorce law and child support. You can go negative by an amount that becomes daunting very quickly, especially if you struggle. (Don't do that: just give up fast.) Okay, now imagine owing your ex an amount of money that's a sizeable fraction of your gross annual salary. What do you think will be necessary to pay that debt back? As a clue here, try to imagine the folks designing the rules as mathematical idiots, who can't do simple arithmetic. As far as they're concerned, you've got the money. First off, you're going to pay almost all the tax on your income whether or not a hefty chunk goes to your ex after taxes, because you pay tax on all the part labeled "child support," and it's the big piece. After taxes, you're going to split the remainder nearly in two, and one piece goes to you ex. Whether or not your ex works, her income now exceeds yours, by a lot. Out of your net income, you pay your rent and all other expenses. Probably you'll be asked to pay more expenses for the kids above and beyond what's already paid in child support. And of course, you're feeding, clothing, and supporting them yourself half the time as well, and it's not cheap. (They're on track to cost me a million dollars apiece by end of high school, once I include my ex.) You pay your ex back the money you owe from whatever's left. In round numbers, you might need four or five dollars of gross income for each one dollar of discretionary spending after taxes, support, and living expenses. For each dollar you repay against a debt, you also give your ex more than a dollar in support from the same five dollars of gross income, and pay taxes on both. The overhead on paying back the debt is spectacular. My net worth will be back up to zero shortly, as my older son hits the end of his high school years. So to him I look like the same kind of loser my old man looked to me. It's only taken about four years after the low point to get back up to almost dead broke. All I had to do to get here was trust someone far too much. Do I trust you? I never trust anyone, anymore. 08oct08
¶
battening down hatches
home stuff ¶ Sorry about low output here this week. I've been taking care of the real world at home. It seemed a good time to upgrade my sons' beds, which have been ad hoc and informal for a couple years now. One evening I bought new pillows, sheets, and associated folderol, then washed everything old and new. The next evening I dragged my sons to a mattress place to pick upgrades they like at reasonable cost. financial fairy tales ¶ I find the economic financial meltdown a bit distracting. News of this sort tends to engage the part of your mind that projects alternative future possible realities (let's say future sight or some other simulation name) to anticipate how details might affect you personally. Anyway, evaluating risk in the context of alternative permutations of situations is very nearly the same sort of task involved in designing some parts of code, so the code creation part of my mind has been busy in evenings of late. Either that, or I'm rationalizing. 05oct08
¶
splitting hairs
lexical syntax ¶ I wrote an informal description of lexical syntax on the lisp page for the toy language, after finishing my pass over all the Scheme SRFI's to be included in Aim. team sizes ¶ Sometimes I'm asked odd questions I fail at first to see the motivation behind. For example, I was once asked about sizes of teams I have worked on before. I answered the question as if it had been asked, "What team sizes are good? What's normal size in the industry?" But afterwards I think the question really meant to ask, "How many people worked on your part of the project?" Ah, now that's a completely different sort of question. Whenever I say in writing (like on a resumé for example) that "I did XYZ," this means the team size working on XYZ was exactly one — me. I use the word "we" when we did something as a group. For example, when I explain the objective we were aiming to achieve as an overall product, in order to provide context. I might say we were writing a server to do ABC. But in that server, if I say I did the XYZ part, then no one helped me, and in many cases I did everything from specify the problem, write all the code, test all the cases, and debug all the problems. In such cases, other team members only had the vaguest idea about details, and only when they needed to share my api — for example, when writing a draft VM to execute code emitted by code from my compiler targeting the VM I both invented and specified. Of course, I understand why such questions are asked, because when I interview candidates coming into my projects, I want to know the same thing: did you do anything on your resumé? Or did you just happen to be there when it happened? It's very common for people to put stuff on their vitae which summarizes what the entire team was doing, even if they had nothing to do with any of it, in a remarkable display of (either) intellectual dishonesty or failure to grasp the idea. Several years ago I had a job at a startup where I did things sounding like output of four or five engineers in the course of a couple years of development. But I did all of it myself. One of the reasons I was so productive was because I personally wrote more than 90% of the APIs I had to use, which meant I never spent any time trying to figure out what something might mean, because I already knew. Anyway, almost all productive engineering teams have a fractal structure, with typically three to five engineers doing something similar, with close interaction. Large and complex products can have fifteen or eighteen engineers altogether, and up to that many more folks doing QA, project management, marketing, docs writers etc. A startup might have five engineers that do basically everything, with three of the senior guys doing all heavy lifting. A bigger company has bigger teams, partly because they can afford benchwarmers (for lack of a better word) and excessive depth in QA teams who sometimes seem to do nothing. But there's still usually only a handful of engineers that do heavy lifting. I do the heavy lifting when it has to do with data structures, memory management, thread safety, using lots of resources, efficiency and safety of exact algorithms, and reducing exotically messy problems into relatively simple parts. It varies from job to job, but usually any task that requires someone really know what's happening at a low level in C++ is handed to me, because I deliver a superior result in several says: 1) it actually works, 2) it stops having bugs, 3) I can explain what it does, 4) management can understand my docs, 5) I made it simpler, 6) you'll want to patent some clever part, 7) it goes faster, 8) it scales larger, 9) it gets refactored, 10) I keep alternatives you can pick, 11) it has stats revealing inner workings. But I can generate results you don't like, especially if you pile on constraints in the hope of killing one of the options by making it too complex to solve. Uh no, I'll still solve it. And then you'll be stuck with this complex thing. If having it be really simple in the end is one of the constraints, you have to say that explicitly instead of pretending all the complex requirements are really critical, expecting that will derail things and make the result moot. I'll make it work anyway, but you'll hate it. Just tell me not to do it if you don't want it finished. Usually teams have a few folks you can bounce ideas off, instead of having to work out every detail in a vacuum. I like to bounce ideas off folks so they can tell me not to do something that sounds too complex. But most of the ideas I pursue are ones I had myself, as opposed to those suggested by others — I don't know why other folks suggest few ideas. I work on whatever problems need attention. But over time I tend to get stuck with families of related problems once I've gotten deeply involved in reworking how things address earlier problems. Other folks get assigned tasks that sound like "just get something working, even if it's wrong" because I'm more likely to do it right if that takes slightly longer. I'm not very good at randomly scaffolding temporary hacks without any well-defined explanation. But I'm really good at doing parts you want to believe are correct — such as parts that would otherwise lose user data if behavior was erratic. I always work closely with several engineers, and we cooperatively decide who is doing what. But parts assigned to me are usually done without help. I'm not sure why that happens. No one minds when it gets done correctly. Usually I work out the answer and then run a solution by whichever engineer has the best grasp of a problem to see if my solution has favor. 03oct08
¶
where magic goes after
lisp specs ¶ I started a new lisp page for the toy language, specifying early, cursory specs for a Scheme subset named Aim. I was distracted the last few days. This weekend I should be able to fill in that page somewhat more. I need to finish a review of all the SRFI's while picking candidates for inclusion in Aim. And I need to flesh out lexical syntax details. 01oct08
¶
hill views
sample code ¶ Here's a short intro to code appearing in demos from the perspective of sample code even though all the C++ code here is written in a fairly odd style (as described under thorn by names and style pages). Most of the code you find here is actually production quality C++, except for one very unusual practice: using comically terse class names. The extremely short names started as a private experiment to see how short it was possible to make C++ code names and still work. Almost none of the code published here uses an acceptable work style, because names are short and cryptic, and whitespace is cramped both horizontally and vertically. However, I have used clones of this code in production servers using more conventional naming and whitespace to match project style guidelines. (Correct code style to use in work environments is always what looks just like code written by everyone else.) Viewed as sample code, you can't tell a whole lot from code you see here, except that I know how to do pointer arithmetic and memory management. (And liberal use of character dialogs with occasional profane language shows I expect very little professional exposure to this material — this site really isn't for professional purposes. The main purpose is really publishing old and new hobby code related to programming languages, without drawing too much undue attention. For example, you can't find any code in a form that can be downloaded, built, and executed, because if I did that folks would write and ask questions. Currently there's no social purpose to this site.) Many of the linked pages are just stubs for future writing, so the todo page should be consulted before clicking on interesting looking links. So the thread page is still a stub, while the mutex page has interesting stuff to read — if the short names don't freak you out. More recent code, like the writer which pretty prints Lisp code trees, shows some high level application logic specific to getting some task done. But most of the code is a library of C++ utilities, starting with the extremely primitive pointer-plus-length iovec utility code in the run, which I tend to use heavily (with better names) in most real C++ projects. Most of the code is what is called "utilities" in most projects, and clones of many of these classes end up in some utils source code directory shared by all other project libraries because the utilities don't depend on anything else. If you're looking for code related for specific topics, you might examine the following demo pages:
page re-org ¶ As I mentioned I would several days ago, I started moving exploratory fiction to a separate page so just-for-fun content isn't mixed with language project material. |