

problem

     toy: following progress in the toy programming language lathe, a lisp dialect plus a smalltalk class system, built using the þ library under the mu-babel license. This page sits under mu, which presents broad context with a main focus on usefulness to the author.

     A "toy" language has a cursory initial implementation of limited scope, typically with a pedagogical or exploratory purpose, and seeks small size for many reasons, including lower cost, expendable investment, reduced cognitive footprint, swappable parts, simpler mechanisms, shorter specs, loose choices, informal decisions, less inertia, little commitment, open options, growth headroom, easy scaffolding, and stable testbeds.

     In short, Wil feels the point of a toy language is to use a cheap starting point to make more languages, with tools that have the bootstrap potential to kite into higher territory. Given one simple toy language of stable behavior and dimensions, you can use it indefinitely as a satellite tool for making other tools and languages of greater complexity, as long as its warts are not too big. A "real" Lathe might be easier to build with a toy lathe, using more than one generation to bootstrap bigger things.

     Among other things, no one version need follow a one true way until your magnum opus (if you must seek one). A first simple version should be easy to get your head around.

incremental «

     Wil aims to write mu pages incrementally without finishing any one for a while, dabbling here and there as mood and inspiration strike. Pages progress in order from simpler, concrete, specific topics to more complex, abstract, general topics, adjusted to group similar things. Early content jumps around.

memory format «

     The first generation of lathe will focus on memory format more than anything else, besides seeking right answers to simple questions about doing each basic thing needed.

     Wil's philosophy usually aims to do something correctly first, before worrying about complex optimization and ideal convenience in relationships. You can then always compare a more elaborate later generation of code with a simpler predecessor to see whether it preserves correctness. First get it done and get it right. If you spend too long thinking up front, you're off track.
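     As a small illustration of comparing generations, here is a hedged C sketch: a plain first-generation function kept as a reference, a bit-twiddling later generation, and a check that both agree over a range of inputs. The names popcount_v1, popcount_v2, and generations_agree are hypothetical, not part of lathe.

```c
#include <assert.h>
#include <stdint.h>

/* First generation: obviously correct, one bit at a time. */
static unsigned popcount_v1(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Later generation: SWAR bit counting, faster but harder to read. */
static unsigned popcount_v2(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the bytes */
}

/* Returns 1 if both generations agree on every input in [lo, hi]. */
static int generations_agree(uint32_t lo, uint32_t hi) {
    assert(lo <= hi);
    for (uint32_t x = lo; ; x++) {
        if (popcount_v1(x) != popcount_v2(x))
            return 0;
        if (x == hi) /* stop before x wraps around at UINT32_MAX */
            break;
    }
    return 1;
}
```

     The simple version stays around as an oracle: any later rewrite can be checked against it without re-deriving why the clever version is right.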

     The first version of lathe will seek formats that are easy to specify and use correctly, with api contracts enforced by the compiler as much as possible, targeting efficiency only when it is easy and free.

     (But using page and book objects for memory, allocated by a pile, amounts to some big design up front, which Wil used to plan how lathe could be used in day jobs, even in toy form.)

maps and morphisms «

     Wil typically uses math metaphors when reasoning about software, favoring morphism and homomorphism (wikipedia) as notions great for explaining why solving a problem in one context also solves it in another, when a mapping preserves all relevant relationships. You do this all the time too when debugging: for example, you assume that when you fix a debug version of software, you're also fixing the release version. Wil is just casual about extending the idea to scales where folks feel it means a rewrite, and this is correct: Wil finds a rewrite often keeps the essential roles.
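     A classic small instance of this idea is casting out nines: reducing integers mod 9 preserves both addition and multiplication, so a cheap check done in the small world of residues says something about the big computation. The C sketch below is only an illustration of the principle; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* h maps a non-negative integer to its residue mod 9. */
static unsigned h(uint64_t x) { return (unsigned)(x % 9u); }

/* The same operations, performed in the small world of residues. */
static unsigned add9(unsigned a, unsigned b) { return (a + b) % 9u; }
static unsigned mul9(unsigned a, unsigned b) { return (a * b) % 9u; }

/* 1 if h preserves + and * for this pair, which it always should. */
static int preserves_ops(uint64_t a, uint64_t b) {
    assert(a <= UINT32_MAX && b <= UINT32_MAX); /* keeps a * b in range */
    return h(a + b) == add9(h(a), h(b)) && h(a * b) == mul9(h(a), h(b));
}

/* Casting out nines: 0 means the claimed product is certainly wrong;
   1 means it is consistent, not proven right, since an error that
   changes the result by a multiple of 9 goes unnoticed. */
static int product_plausible(uint64_t a, uint64_t b, uint64_t claimed) {
    return mul9(h(a), h(b)) == h(claimed);
}
```

     Because h preserves the relevant relationships, a mismatch found in the cheap context is a real bug in the expensive one.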

     Parts of toy lathe ought to be rewritten for some purposes, so the first simple version should aim to do things clearly enough that you can rewrite them as needed. Wil certainly will.

     At the same time, Wil isn't in the business of making totally tweakable language architectures, so little effort will go into making it easy for you to change things like, for example, syntax. Wil rolls parsers by hand; so if you have a Yacc fetish, you'll have to yacc-ify yourself. (Given the state of the art in Yacc parser error handling though, Wil suggests you leave Yacc alone.)

topics «

     This section introduces each page topic.

     peg: as described in the stack demo, a peg is a pointer-sized value containing either a box pointer or an immediate value packed in the same space. A peg is a uniform gc value.
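     A minimal sketch of one common way to build such a value in C, assuming boxes are at least 2-byte aligned so the low bit of a real pointer is always 0; a set low bit then marks an immediate integer. The names (peg, peg_from_int, and so on) and the one-bit encoding are assumptions for illustration, not lathe's actual peg api.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t peg; /* pointer-sized: a box pointer or an immediate */

#define PEG_IMM_BIT ((uintptr_t)1) /* assumed: low bit set means immediate */

static int peg_is_imm(peg p) { return (p & PEG_IMM_BIT) != 0; }

static peg peg_from_box(void *box) {
    assert(((uintptr_t)box & PEG_IMM_BIT) == 0); /* boxes must be aligned */
    return (peg)box;
}

static void *peg_to_box(peg p) {
    assert(!peg_is_imm(p));
    return (void *)p;
}

/* Immediate integers give up one bit of range to the tag. */
static peg peg_from_int(intptr_t n) {
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return ((uintptr_t)n << 1) | PEG_IMM_BIT;
}

/* Converting back and shifting right relies on two's complement and an
   arithmetic right shift: implementation-defined in C, but what
   mainstream compilers do. */
static intptr_t peg_to_int(peg p) {
    assert(peg_is_imm(p));
    return (intptr_t)p >> 1;
}
```

     Since every peg is one machine word, a gc can scan any slot uniformly and only follow those whose low bit is clear.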

     imm: many different types of immediate value can be represented inside a peg, including integers, characters, enums, and even getter and setter methods needing only a slot index. This page centralizes immediate value spec details.
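     To make room for several immediate kinds, a peg can reserve a few more low bits as an immediate subtype. The sketch below assumes a 3-bit low field where 000 marks a box pointer (so boxes need 8-byte alignment) and other values select a kind. The particular codes, and the use of a getter's slot index as payload, are illustrative assumptions; the imm page holds lathe's real spec.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t peg;

/* Assumed 3-bit kind field in the low bits; 0 is reserved for boxes. */
enum imm_kind {
    IMM_BOX = 0, IMM_INT = 1, IMM_CHAR = 2, IMM_ENUM = 3, IMM_GETTER = 4
};

enum { KIND_BITS = 3, KIND_MASK = (1 << KIND_BITS) - 1 };

static enum imm_kind peg_kind(peg p) { return (enum imm_kind)(p & KIND_MASK); }

/* Pack a non-negative payload above the kind field.
   (Signed integers would also need sign extension, omitted here.) */
static peg imm_make(enum imm_kind kind, uintptr_t payload) {
    assert(kind != IMM_BOX);
    assert(payload <= (UINTPTR_MAX >> KIND_BITS));
    return (payload << KIND_BITS) | (uintptr_t)kind;
}

static uintptr_t imm_payload(peg p) {
    assert(peg_kind(p) != IMM_BOX);
    return p >> KIND_BITS;
}

/* Example constructors: a character code point and a getter slot index. */
static peg imm_char(uint32_t code_point) { return imm_make(IMM_CHAR, code_point); }
static peg imm_getter(unsigned slot) { return imm_make(IMM_GETTER, slot); }
```

     A getter encoded this way needs no box at all: the slot index alone says which field of the receiver to return.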

     tag: in garbage collected (gc) memory, each box of allocated space is preceded by a tag describing the box format, which in many cases also identifies the class of primitive types. Each tag is explicit box metainformation.
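     One way to picture a tag is a single header word with bit fields. The sketch assumes a 64-bit header holding an 8-bit box format code and a 56-bit box length in words; the field widths, the format codes, and the tag_* names are hypothetical, not lathe's tag api.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t tag; /* header word placed just before each box */

enum { TAG_FORMAT_BITS = 8 };
#define TAG_FORMAT_MASK ((UINT64_C(1) << TAG_FORMAT_BITS) - 1)
#define TAG_MAX_WORDS (UINT64_MAX >> TAG_FORMAT_BITS)

/* Hypothetical box format codes; a real list would live on the box page. */
enum box_format { FMT_PAIR = 1, FMT_STRING = 2, FMT_SYMBOL = 3, FMT_VECTOR = 4 };

static tag tag_make(enum box_format fmt, uint64_t words) {
    assert((uint64_t)fmt <= TAG_FORMAT_MASK);
    assert(words <= TAG_MAX_WORDS);
    return (words << TAG_FORMAT_BITS) | (uint64_t)fmt;
}

static enum box_format tag_format(tag t) {
    return (enum box_format)(t & TAG_FORMAT_MASK);
}

static uint64_t tag_words(tag t) { return t >> TAG_FORMAT_BITS; }
```

     With the length in the tag, a collector can step from one box to the next without knowing anything else about the box struct.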

     box: in memory subject to gc, space is allocated in box granularity, where every box is an instance of some C struct format, directly preceded by a tag saying (among other things) which box struct it is. This page will enumerate every primitive box format understood by the garbage collector.
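     The tag-before-box layout can be sketched with a bump allocator over a word array: each allocation writes the tag word, then returns a pointer just past it, so code can always step back one word from a box to find its tag. The arena, the box_alloc name, the tag layout, and the pair struct are assumptions for illustration; the box page enumerates lathe's actual formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

/* Hypothetical tag layout: format code in the low 8 bits, size in words above. */
static word tag_make(unsigned fmt, size_t words) { return ((word)words << 8) | fmt; }

enum { FMT_PAIR = 1 };

/* A pair box: two word-sized fields, e.g. pegs. */
struct pair { word car, cdr; };

enum { ARENA_WORDS = 1024 };
static word arena[ARENA_WORDS];
static size_t arena_top; /* index of the next free word */

/* Allocate a box of `bytes` (rounded up to whole words), preceded by its tag. */
static void *box_alloc(unsigned fmt, size_t bytes) {
    size_t words = (bytes + sizeof(word) - 1) / sizeof(word);
    if (arena_top + 1 + words > ARENA_WORDS)
        return NULL; /* a real heap would run gc here */
    arena[arena_top] = tag_make(fmt, words);
    void *box = &arena[arena_top + 1];
    arena_top += 1 + words;
    return box;
}

/* The tag always sits in the word directly before the box. */
static word box_tag(const void *box) {
    assert(box != NULL);
    return ((const word *)box)[-1];
}
```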

     symbol: interned strings in box format represent symbols used in both lisp and smalltalk as strings with unique addresses. A lathe symbol is identical in format to the string box format except for the leading tag, but is always preceded by another hash box. Symbols are central in naming and value binding.
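     Interning is what gives symbols unique addresses: every request for the same spelling returns the same object, so symbols compare with ==. Below is a minimal C sketch using a fixed-size open-addressing table and FNV-1a hashing. It stores plain C strings rather than lathe's string boxes, and the names intern and SYM_CAP are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SYM_CAP = 256 }; /* power of two; no resizing in this sketch */

static char *sym_table[SYM_CAP];
static size_t sym_count;

/* FNV-1a string hash. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Return the unique copy of `name`, adding it on first use. */
static const char *intern(const char *name) {
    size_t i = fnv1a(name) & (SYM_CAP - 1);
    while (sym_table[i] != NULL) {            /* linear probing */
        if (strcmp(sym_table[i], name) == 0)
            return sym_table[i];
        i = (i + 1) & (SYM_CAP - 1);
    }
    assert(sym_count < SYM_CAP - 1);          /* keep one slot empty */
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, name, len);
    sym_table[i] = copy;
    sym_count++;
    return copy;
}
```

     After interning, the reader can hand out the same pointer for every occurrence of a name, and later lookups by symbol cost one pointer comparison instead of a string comparison.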

     token: when parsing, a reader turns an input text stream into a sequence of tokens representing indivisible language elements in the first layer of representation.
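     As a rough picture of that first layer, here is a tiny C tokenizer for lisp-like text that recognizes parentheses, the quote character, and runs of other characters as atoms. It is a hedged sketch: real lathe tokens (numbers, strings, chars, comments) cover much more, and the next_token interface is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum tok_kind { TOK_EOF, TOK_LPAREN, TOK_RPAREN, TOK_QUOTE, TOK_ATOM };

struct token {
    enum tok_kind kind;
    const char *start; /* points into the input text */
    size_t len;
};

/* Scan one token starting at *pos and advance *pos past it. */
static struct token next_token(const char **pos) {
    const char *p = *pos;
    assert(p != NULL);
    while (isspace((unsigned char)*p))
        p++;
    struct token t = { TOK_EOF, p, 0 }; /* stays TOK_EOF at end of text */
    if (*p == '(' || *p == ')' || *p == '\'') {
        t.kind = *p == '(' ? TOK_LPAREN : *p == ')' ? TOK_RPAREN : TOK_QUOTE;
        t.len = 1;
        p++;
    } else if (*p != '\0') {
        t.kind = TOK_ATOM;
        while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')')
            p++;
        t.len = (size_t)(p - t.start);
    }
    *pos = p;
    return t;
}

/* Count the tokens in text, handy for quick checks. */
static size_t count_tokens(const char *text) {
    size_t n = 0;
    while (next_token(&text).kind != TOK_EOF)
        n++;
    return n;
}
```

     Tokens point back into the input rather than copying it, so a later layer decides whether an atom becomes a number, a symbol, or something else.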

     number: many different token formats correspond to one or another kind of number representation; under lathe quite a few alternative number representations will be supported — probably a union of C, Scheme, and smalltalk numbers.

     bigint: a token for a number can have many digits, and integers can be stored as a specific box format designed to hold big integers; lathe's reader and writer will handle big ints, but support for primitive arithmetic might lag behind a while. Physical memory format will be described here.
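     A bigint box can be pictured as an array of fixed-size digits (limbs) in a large base. The sketch below assumes base 2^32 limbs stored least significant first, and shows the reader-side job of converting decimal digits by repeatedly multiplying by 10 and adding. The struct and function names are hypothetical, and the physical format described on the bigint page may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

enum { BIG_MAX_LIMBS = 16 }; /* fixed capacity: values up to 512 bits */

struct bigint {
    size_t n;                     /* limbs in use; 0 represents zero */
    uint32_t limb[BIG_MAX_LIMBS]; /* base 2^32, least significant first */
};

/* x = x * m + a, for small m and a. */
static void big_mul_add(struct bigint *x, uint32_t m, uint32_t a) {
    uint64_t carry = a;
    for (size_t i = 0; i < x->n; i++) {
        uint64_t t = (uint64_t)x->limb[i] * m + carry;
        x->limb[i] = (uint32_t)t;
        carry = t >> 32;
    }
    if (carry != 0) {
        assert(x->n < BIG_MAX_LIMBS);
        x->limb[x->n++] = (uint32_t)carry;
    }
}

/* Parse a non-negative decimal string; returns 0 on a bad character. */
static int big_from_decimal(struct bigint *x, const char *s) {
    x->n = 0;
    if (*s == '\0')
        return 0;
    for (; *s; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
        big_mul_add(x, 10, (uint32_t)(*s - '0'));
    }
    return 1;
}
```

     For example, "4294967296" (2^32) parses to two limbs, 0 and 1. A writer would run the inverse, repeatedly dividing by 10 (or a larger power of 10) and collecting remainders.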

     class: every lathe value including immediates will be an object with a smalltalk style class (turtles all the way down) describing instances and their methods. This page will describe bootstrapping a smalltalk object system, and physical format of classes.

     method: in addition to lisp style procedures created by evaluating lambda expressions, lathe objects will also have smalltalk methods dispatched by selector symbols.
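     Smalltalk-style dispatch can be sketched as a lookup keyed by selector that walks from an object's class up its superclass chain. This C sketch assumes selectors are interned strings compared by address and uses a linear method array per class; the structs, names, and example classes are hypothetical, and real implementations typically add method caches.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *selector;       /* assumed interned: compared by address */
typedef int (*method_fn)(void *self);

struct method { selector sel; method_fn fn; };

struct class {
    const char *name;
    const struct class *super;      /* NULL at the root of the hierarchy */
    const struct method *methods;
    size_t n_methods;
};

/* Find the method for sel, searching up the superclass chain. */
static method_fn lookup(const struct class *cls, selector sel) {
    assert(sel != NULL);
    for (; cls != NULL; cls = cls->super)
        for (size_t i = 0; i < cls->n_methods; i++)
            if (cls->methods[i].sel == sel)
                return cls->methods[i].fn;
    return NULL; /* Smalltalk would send doesNotUnderstand: here */
}

/* A tiny example hierarchy: Square inherits "sides" from Quad. */
static const char SEL_AREA[] = "area";
static const char SEL_SIDES[] = "sides";

static int quad_sides(void *self) { (void)self; return 4; }
static int square_area(void *self) { int side = *(int *)self; return side * side; }

static const struct method quad_methods[] = { { SEL_SIDES, quad_sides } };
static const struct method square_methods[] = { { SEL_AREA, square_area } };

static const struct class Quad = { "Quad", NULL, quad_methods, 1 };
static const struct class Square = { "Square", &Quad, square_methods, 1 };
```

     Lisp procedures are called directly, while a method is found by asking the receiver's class; that indirection is what lets one selector mean different things for different classes.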

     reader: a reader in a read-eval-print loop has the job of turning input text into parse trees representing either data or code to be interpreted or compiled.

     writer: a writer performs the print task in a read-eval-print loop. The first lathe writer will pretty print lisp syntax with cycle and shared structure handling.
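     A writer must not loop forever on circular lists. One standard technique, shown here as a sketch rather than lathe's writer, is Floyd's tortoise-and-hare check before printing. The cell struct and function names are assumptions, and full shared-structure output (like Scheme's #0= datum labels) needs a table of visited cells instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cell { int car; struct cell *cdr; };

/* Floyd's tortoise and hare: 1 if following cdr links loops forever. */
static int list_has_cycle(const struct cell *head) {
    const struct cell *slow = head, *fast = head;
    while (fast != NULL && fast->cdr != NULL) {
        slow = slow->cdr;
        fast = fast->cdr->cdr;
        if (slow == fast)
            return 1;
    }
    return 0;
}

/* Write a proper list of ints as lisp text into buf.
   Returns 0 (and writes a placeholder) if the list is circular. */
static int write_list(char *buf, size_t size, const struct cell *head) {
    assert(buf != NULL && size > 0);
    if (list_has_cycle(head)) {
        snprintf(buf, size, "#<circular list>");
        return 0;
    }
    size_t used = (size_t)snprintf(buf, size, "(");
    for (const struct cell *c = head; c != NULL && used < size; c = c->cdr)
        used += (size_t)snprintf(buf + used, size - used,
                                 c == head ? "%d" : " %d", c->car);
    if (used < size)
        snprintf(buf + used, size - used, ")");
    return 1;
}
```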

     eval: conventionally handles the process of turning expression trees from the reader into associated values, depending on context and symbol scoping rules.

     env: the environment is the representation of symbol bindings in a context permitting values to be found for symbols. Under lathe this might union several models including a module scheme with namespaces.
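     Under the simplest lisp model, an environment is a chain of frames, each holding bindings, searched innermost first so inner bindings shadow outer ones. This C sketch assumes interned symbol names compared by address and fixed-size frames of int values; module and namespace schemes are out of scope, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { FRAME_CAP = 8 };

struct binding { const char *sym; int value; };

struct frame {
    struct binding b[FRAME_CAP];
    size_t n;
    const struct frame *parent; /* enclosing scope, NULL for global */
};

/* Add a binding to the given frame. */
static void define(struct frame *f, const char *sym, int value) {
    assert(f->n < FRAME_CAP);
    f->b[f->n].sym = sym;
    f->b[f->n].value = value;
    f->n++;
}

/* Innermost-first search; returns NULL if sym is unbound. */
static const int *lookup(const struct frame *f, const char *sym) {
    for (; f != NULL; f = f->parent)
        for (size_t i = 0; i < f->n; i++)
            if (f->b[i].sym == sym)
                return &f->b[i].value;
    return NULL;
}
```

     A closure would capture a pointer to its defining frame, which is why frames reachable from live closures must themselves live in gc memory.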

     vm: a virtual machine is the runtime context for executing code and accessing data, both as visible in a particular runtime. It means several related things from instruction set interpreter to collection of all state and code used to run an app process. Initial detail focuses on state in memory formats.

     gc: garbage collection will use a Cheney-style stop-and-copy technique to recover unused space and compact space containing every box reachable from a gc root.
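     A compact sketch of Cheney's algorithm in C, under simplifying assumptions: every box is a fixed-size pair, fields are plain pointers or NULL, and each box has a dedicated forward field instead of overwriting its tag with a forwarding address, as a real collector usually does. To-space doubles as the breadth-first queue: scan walks copied boxes while to_free marks the end of copied space. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    struct pair *forward; /* set once copied: the box's to-space address */
    struct pair *car, *cdr;
    int value;            /* payload so copies can be told apart */
};

enum { HEAP_CAP = 64 };

static struct pair space_a[HEAP_CAP], space_b[HEAP_CAP];
static struct pair *from = space_a, *to = space_b;
static size_t used;    /* pairs allocated in from-space */
static size_t to_free; /* next free slot in to-space during gc */

static struct pair *alloc_pair(int value, struct pair *car, struct pair *cdr) {
    assert(used < HEAP_CAP); /* a real system would collect here */
    struct pair *p = &from[used++];
    p->forward = NULL;
    p->car = car;
    p->cdr = cdr;
    p->value = value;
    return p;
}

/* Copy p to to-space once; later calls return the forwarding address. */
static struct pair *evacuate(struct pair *p) {
    if (p == NULL)
        return NULL;
    if (p->forward == NULL) {
        struct pair *copy = &to[to_free++];
        *copy = *p;
        copy->forward = NULL;
        p->forward = copy;
    }
    return p->forward;
}

/* Cheney collection: roots are updated in place to the new copies. */
static void collect(struct pair **roots, size_t n_roots) {
    to_free = 0;
    for (size_t i = 0; i < n_roots; i++)
        roots[i] = evacuate(roots[i]);
    for (size_t scan = 0; scan < to_free; scan++) { /* scan chases to_free */
        to[scan].car = evacuate(to[scan].car);
        to[scan].cdr = evacuate(to[scan].cdr);
    }
    struct pair *tmp = from; /* flip the semispaces */
    from = to;
    to = tmp;
    used = to_free; /* unreachable boxes simply were never copied */
}
```

     Forwarding addresses are what keep shared structure shared and cycles finite: a box reached a second time is not copied again.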

     world: a world is a memory and code context containing every vm in a lathe instantiation, including state shared by every vm instance, especially when it's immutable, and possibly when copy-on-write shares more parts than strictly immutable.

     pcode: bytecode or pseudo-code (wikipedia) is code intended to be executed by software, unlike native code usually run by hardware (except when simulated). This page will explain formats and ways to use more than one pcode instruction set.
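     To make the pcode idea concrete, here is a minimal stack-machine interpreter in C: a switch over one-byte opcodes, one of which takes an inline operand byte. The instruction set is invented for illustration and is not lathe's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_PUSH, OP_ADD, OP_MUL, OP_DUP, OP_HALT };

enum { STACK_CAP = 64 };

/* Run code until OP_HALT; returns the value on top of the stack. */
static long run(const uint8_t *code) {
    long stack[STACK_CAP];
    size_t sp = 0; /* number of values on the stack */
    for (size_t pc = 0; ; ) {
        switch ((enum op)code[pc++]) {
        case OP_PUSH: /* operand: the next byte, as an unsigned constant */
            assert(sp < STACK_CAP);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_DUP:
            assert(sp >= 1 && sp < STACK_CAP);
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"bad opcode");
            return 0;
        }
    }
}
```

     A compiler targeting this machine would turn (* (+ 3 4) (+ 3 4)) into bytes like PUSH 3, PUSH 4, ADD, DUP, MUL, HALT, which evaluates to 49.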

     compiler: a compiler translates an input body of mostly data into an output body of mostly code in some format — typically pcode or native code, or even just a new abstract syntax tree (ast) different from an old one. Under lathe this will be whatever code turns reader output into something that executes on a vm.

     asm: an assembler describes code at a lower level, typically symbolic pcode or symbolic native code, along with directives about handling any machine code to which it translates, including both alignment and binding details. Wil hopes to write assemblers using lathe almost exclusively.

     lathe: a toy language called lathe — named by joining lambda and thorn (λ and þ) meaning "lisp on thorn" — will look and act a lot like Scheme, except for behavior like smalltalk. This page will eventually enumerate differences from other languages.

     lisp: aspects of lisp dialect Scheme present in lathe will be lightly covered on this page, as well as introductory features common to typical Lisp languages. Standard Scheme extensions should also be treated here.

     smalltalk: aspects of smalltalk present in lathe will be lightly covered, as well as introductory Smalltalk features like class and method approaches to polymorphic object dispatch.

     design: Wil aims to make only old languages in lathe rather than a new language. But aspects in which lathe is not exactly the same as Scheme and Smalltalk should be motivated by design remarks putting ideas in context.

     weight: to clarify Wil's priorities, potential motivations will be listed as pros and cons with positive and negative point scores roughly showing what things are desired and avoided. This might explain some of Wil's taste in tech choices, and prevent casual readers from projecting their own biases.

     jar: tech elements related to persistent storage will be addressed only on this page, likely only much later in lathe's code development.

     card: sample applications of lathe related to HyperCard style features will later be tackled on this page.

     harp: Wil vaguely intends to support event scripting in lathe apps, and this page might host related material later.

     debug: design factors for debugging lathe apps should appear somewhere, touching points in compiler support as well as possible debugger app designs.

     profile: support for profiling code written in lathe should make it possible to measure gross performance factors and monitor general behavior, likely also useful in testing.

menu

     mu, toy « Þ, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile

     (thorn, todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype)

updates «

     Reverse chronological list of recent changes:

05oct08   lisp   lexical syntax, tokens, chars
03oct08   lisp   aim, namespaces, syntax, srfi's
20sep08   writer   intro, class, source, issues
08sep08   design   affordances, arity, become, frank
07sep08   design   scope, syntax, semantics
06sep08   gc   cheney, weak, mark/sweep
04sep08   imm   bits, enums, turtles, cache lines
04sep08   box   vtable, map, iter, err, fiction
03sep08   box   tuples, vectors, strings, trees
02sep08   symbol   finish map, iter, & commentary
01sep08   symbol   problem, summary, map, iter
30aug08   symbol   list latency, hashes, symbols
29aug08   box   toc, p1, pairs, length/cycles
28aug08   box   first meat and box basics
27aug08   box   intro and dialog start
25aug08   design   character Zé alter ego
24aug08   box   marked-up gc box structs
23aug08   weight   numerics, risks, threads, etc
22aug08   weight   dialog with extra profanity
21aug08   peg   peg api without commentary
20aug08   tag   tag api without commentary
17aug08   lisp   credit Manfred Spiller's logo
17aug08   weight   intro; priority to finish
17aug08   peg   opening problem description
16aug08   toy   intros for each page topic
15aug08   toy   most of this page drafted
12aug08   debug   opening paragraph or two

     (Space reserved for growing the table above very long.)

graphs «

     Since Wil views computing systems as graphs of code and data, he sees every programming language as composed of such graphs, while providing features whose purpose is to create and edit more graphs of code and data. Obviously this is highly incestuous and self-referential, which is clearly where a lot of power derives in languages good at graph manipulation. Graphs are Wil's focus in lathe, not specific qualities other folks desire in programming languages, like a favorite syntax or the best (native) code for specific processors.

     Once control of graph creation and editing is suitably easy and convenient, Wil can worry more about syntax and code performance in a particular context. Wil's first and main concern is control over use of memory to make graphs in any structure he needs.

     When lathe creates and executes graphs with ease, Wil can use lathe to make new graphs of code and data, preserving relationships by mapping, that execute with new desired qualities. Then it's just a matter of iterative development to get where he wants to be.

native code «

     None of Wil's early toy lathe material aims at native code generation. Wil mentions native code only in passing when it clarifies another detail in context. Native code generation is a todo item far off on the distant horizon, for when it's the last fun thing left to do.

     Many folks making languages prioritize speed over every other concern, as if the only measure of goodness is generating the fastest code for every problem domain. But since Wil can simply use C and/or C++ to achieve speed, what he really wants is flexibility to get any result he wants quickly and cheaply, while still having access to C and C++ when speed matters, with all code involved based solely in graphs of code and data under Wil's control.

     Both lisp and smalltalk provide especially simple ways to manage graphs of memory with gc, closures, continuations, and other high level features painfully missing in C++, and both can be implemented in C++ using mechanisms Wil has in hand, with coding techniques Wil has used before to implement both.

     As Scott Johnson said on LtU: "It's a fair bit easier to Greenspun up a reasonable higher-level environment on top of C++ (download a GC here, throw in a bunch of stuff from Boost there, etc.) then it is to do reasonable systems programming in Java." This is a good way to characterize what Wil is doing with lathe — it's a way to greenspun up whatever Wil needs.

     (This refers to Philip Greenspun's famous tenth rule of programming stating, "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp.")

other languages «

     Clearly parsers and compilers can be written in lisp and smalltalk as easily as in C and C++; in fact, one would expect it to be easier using as high level a language as possible. If Wil wants to code support for other languages, it'll be easier to do this in lathe than directly in C++, the language in which lathe will first be written.

     Doing Lisp and Smalltalk first is no slight against other languages. It's just giving priority to simplicity in limited time. (But Wil sees no time for another language on the horizon.)

plan «

     Wil first mentioned a plan to focus on a toy language in February 2008; this page aims to index content added to the pages listed above. But despite an original idea to tell all in the form of a story, Wil no longer thinks he has time for detailed stories. The time it took to write demos under thorn told Wil brevity is needed. Finishing has high weight in priority. Anything that increases scope seems a bad idea, including attention to anyone else's needs.

     There's no deadline, because there's no scheduled rendezvous with a consumer: Wil is the consumer. However, a minimum amount of momentum is necessary to keep a sense of useful progress, since currently neither the demos under thorn nor the language work under mu involve learning, as Wil already knows these things. It would be nice to reach novel experience by summer 2009. (Wil now thinks in terms of years instead of months, given the last slow year of writing.)

story «

     One day in 2008, Wil decided to rev a new toy version of an old language project, instead of getting carried away with grand new plans of larger scope. This section once contained a long history of Wil's language work, but it was too Wil centric.

     Instead of telling a story about Wil's past, some of these pages tell a story about Wil's present when asking and answering questions about representation and modeling in toy language details. This would likely do a good job of motivating many parts of language systems, but there might not be time for a good creative approach like this. So Wil expects to use story format sparingly or even rarely.

active voice «

     Yes it gets a little tedious talking about this Wil character all the time. A few folks might assume it's really just an odd new form of "royal we" used by the author as a pretentious form of self reference. (Come on, admit it — you at least toyed with the idea.)

     Usually the author claims this device merely promotes more active voice, improving clarity, but that's only part of the story. Another more subtle purpose is involved: emphasis on valuation from one person's perspective — this Wil guy, for example. But your point of view is just as valid as Wil's, and in your own work, you should use your own goals and criteria to decide what's important.

     An underlying premise is that value is contextual, and the best context to use is often the viewpoint of one person, because it leads to simpler problem descriptions when asking, "What should be done?" Because if Kip is your character, you should be asking instead, "What does Kip want?" What effect does Kip require in this situation? Kip is a stand-in for you or your user. The author uses Wil — same thing, but the author gets to choose Wil's priorities, and you only get to choose for Kip; don't get them confused.

     When you don't agree with Wil, this is a clue you should think about changing that part of code in your clone. It's not Wil's job to worry about what you want — get this Kip character of yours on the job if you want something. Do it yourself.

     In short, an underlying message is: one-size-fits-all models are wrong. So every user gets to pick a good size suiting themselves. Wil is the author's stand-in for what the author wants. Please feel free to change the toy language to whatever suits your needs. Don't sell your preferences as universal rules Wil must obey.

books «

     Wil stopped reading tech books years ago ... actually, more like a decade ago. Like man, we're talking none whatsoever (the web is too effective as a library). As a result, there are no references Wil can point you at for background on this toy language, because Wil isn't using one. Wil does whatever he wants, directly suiting effects he wants to occur, because he knows how to define what he wants and what technique will satisfy.

     The last book Wil read that comes close to being relevant is Andrew Appel's Compiling With Continuations, which Wil read around 1994 or so, along with reading much of the C source code published for the New Jersey ML implementation. Actually, Wil got a lot more out of reading the ML source code than Appel's text.

     A lot of Wil's basic memory format designs for boxes, tags, immediates, and gc are directly or indirectly related to ideas found in ML source code from the early 90's. Except Wil feels free to change any detail to whatever seems more useful just now. And this is exactly the way you should feel about it too.

     Wil didn't make up many novel ideas, except maybe the way he plans to use books to process weak refs during gc. (Whether or not you preserve a weakly referenced object depends on whether it's reachable from a strong ref. So you have to delay copying a weakly referenced object until it's reached from a strong ref. But space used to keep track of weak refs can lead to an awkward design. Wil realized he could just stage weakly referenced objects in a different set of books for a weak ref todo list, processed after all other gc is done. Then this set of books can be freed independently of any other memory when gc is done, with no net increase in space cost after gc.)
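     The deferred weak-ref idea can be sketched independently of any particular copying collector: during the strong pass, weak refs are only recorded on a side list; afterward, each recorded ref is either updated to its target's new address (if the target was reached strongly and forwarded) or cleared. This C sketch uses a fixed array as a stand-in for the separate set of books, and the box/forward fields and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct box {
    struct box *forward; /* non-NULL once copied by the strong pass */
    int value;
};

struct weak_ref { struct box *target; };

enum { WEAK_CAP = 32 };

/* Stand-in for a separate set of books holding the weak-ref todo list. */
static struct weak_ref *weak_todo[WEAK_CAP];
static size_t weak_count;

/* Called when the strong pass meets a weak ref: defer it, don't copy. */
static void defer_weak(struct weak_ref *w) {
    assert(weak_count < WEAK_CAP);
    weak_todo[weak_count++] = w;
}

/* After all strong copying: retarget refs to survivors, clear the rest.
   Emptying the todo list afterward means no extra space remains after gc. */
static void process_weak_refs(void) {
    for (size_t i = 0; i < weak_count; i++) {
        struct weak_ref *w = weak_todo[i];
        if (w->target != NULL)
            w->target = w->target->forward; /* NULL if never reached */
    }
    weak_count = 0;
}
```

     The key ordering constraint is that process_weak_refs runs only after every strong path has been followed, so a NULL forward field really does mean the target was only weakly reachable.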

     (Yes there's a pun here: two unrelated book meanings.)