problem
lisp: the toy programming language named lathe is a dialect of Lisp using a smalltalk class system, built atop the þ C++ library under the mu-babel license. As a basic floor plan, Wil wants to target Scheme as the Lisp variant, not Common Lisp. Even so, Wil doesn't want to follow every single Scheme convention, and makes no attempt to follow the most recent Scheme revisions. Thus Wil must solve a problem: stating how much of Scheme belongs in a spec for Lathe. To acknowledge the lack of full Scheme compatibility (Lathe pursues only a Scheme subset), Wil uses the new codename Aim to describe features from Scheme planned for Lathe, because "aim" is a short synonym (per a thesaurus) for "scheme." Aim names the Scheme subset described on this page.
mess
«
This page will likely remain a mess for a while, until details start to converge on stable choices.
aim
«
Aim resembles Scheme: it's a subset of Scheme with a few arbitrary changes to suit Wil's taste, specifying Lisp features Wil aims to have in toy language lathe. Wil wants you to think of Aim as meaning the same thing as Scheme, but with a few features pruned and details changed, to make a spec for use in Lathe. Using a different name emphasizes Aim is not Scheme.
references
«
You are assumed already familiar with the Scheme programming language. This page defines very few Scheme features included in Aim; definitions appear elsewhere: other sites and other pages. Here's a growing list of Scheme references used here:
The point of this page is to state the relation of Aim to Scheme as described in standard specs.
namespaces
«
Original simple Scheme has a single flat top level namespace containing all top level definitions. Proposals for module systems permit more namespaces, but here we assume Scheme has only one for sake of discussion. Aim puts Scheme names in at least one namespace named sch, while Aim specific names go in a namespace named aim. More than one namespace can be visible at a time, and names can be explicitly qualified using some (as yet) unspecified namespace notation. On this page, assume % (percent) is used like a namespace operator not unlike :: in C++, so sch%cons refers to cons in the sch namespace. Note this is only a metalanguage convention used on this page and elsewhere until a more practical spec for namespace notation is finalized. (It might be necessary to write something cluttered, like #%%sch%cons, to use # as a general escape mechanism. Here #% starts a namespace path where % is the separator — #%% would mean define % as the value of the default % namespace separator, or something like that. So #%/sch/cons would mean the same thing.)
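The % metalanguage convention above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `split_qualified` and `resolve`, the dictionary layout, and the rule that an unqualified name is searched through the visible namespaces in order. The page itself leaves the real notation and lookup rules unspecified.

```python
def split_qualified(name, sep="%"):
    """Split 'sch%cons' into ('sch', 'cons'); unqualified names get (None, name).

    Only a single namespace level is handled; nested paths are out of scope.
    """
    ns, found, ident = name.partition(sep)
    if not found:
        return (None, name)
    return (ns, ident)


def resolve(name, namespaces, visible):
    """Look up a possibly qualified name (hypothetical lookup order).

    namespaces maps a namespace name to a dict of definitions;
    visible lists the namespaces searched for unqualified names.
    """
    ns, ident = split_qualified(name)
    search = [ns] if ns is not None else visible
    for candidate in search:
        table = namespaces.get(candidate, {})
        if ident in table:
            return table[ident]
    raise KeyError(name)
```

With `namespaces = {"sch": {"cons": "scheme cons"}, "aim": {"cons": "aim cons"}}`, the qualified name `sch%cons` always selects the sch definition, while a bare `cons` depends on which namespaces are visible and in what order.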
details
«
This section calls out specific Scheme features or revisions to say how Aim treats them.
r6rs «
Wil has no interest in the most recent R6RS revision, so Aim ignores it completely. This is unlikely to change.
data types «
Aim's primitive literal data types include Scheme's pairs, symbols, numbers, booleans, characters, strings, vectors, and lists; procedures, continuations, and ports can be created dynamically at runtime. At least one hashmap extension type will be native in Aim. A smalltalk object system (specified by Gab) adds class and compound objects based on vectors and maps.
case sensitivity «
Unlike Scheme, Aim is case sensitive: symbols differing only in case are not the same symbols. (Notation denoting numbers is never case sensitive; for example, hexadecimal is not case sensitive, and neither are letters following # in a number.)
operator precedence «
Like Scheme, Aim writes all expressions in fully parenthesized notation, so Aim has no operator precedence.
numerical tower «
Because Wil has no interest in numerics, Scheme's numerical tower beyond simple arithmetic will appear in Aim only slowly, if at all, with the lowest priority for support. Full api for some numeric types might only be added at some future time when Wil audits Lathe for compliance with an Aim spec.
tail recursion «
As in Scheme, proper TCO (tail call optimization) is required by the Aim subset, so tail calls do not deepen a stack.
comments «
Aim uses a semicolon (;) to introduce comments until end-of-line; comments are ignored and act like whitespace. Nestable multiline comments are enclosed by #| and |#.
brackets «
Square brackets can be used anywhere parentheses are used. Note a closing ] must balance an opening [, so parens and brackets can only be mixed if each closing delimiter matches its opening delimiter. Aim allows either (…) or […], but never (…] or […).
macros «
Aim will have only simple non-hygienic macros, added with low priority. Complex macros might come later.
r3rs vs r4rs
«
This section enumerates Scheme symbols in namespace sch which should have the same meaning in Aim when namespace sch is used, or when sch% is used as a namespace prefix. (Remember this namespace prefix notation is just a temporary stand-in for a later design.)
low priority «
Symbols in R3RS but not in R4RS have low implementation priority, as do several other procedures, especially those having anything to do with numerical tower support. Some symbols are shown in gray to indicate they have low priority, and might not be added to Aim until late.
non intersection «
Symbols below appearing only in R3RS or only in R4RS are written in bold with a superscript denoting the revision using the symbol. For example, nil³ is only in R3RS.
r4 and r3 «
From practical-scheme's wiki: R3RS, R4RS:
exprs: quote, lambda, if, set!, cond, case, and, or, let, let*, letrec, begin, do, delay, quasiquote
program: define
bools: not, boolean?, nil³, t³
equivalence: eqv?, eq?, equal?
pairs: pair?, cons, car, cdr, set-car!, set-cdr!, null?, list, length, append, reverse, list-tail, list-ref, last-pair, memq, memv, member, assq, assv, assoc
symbols: symbol?, string->symbol, symbol->string
numbers: number?, complex?, real?, rational?, integer?, zero?, positive?, negative?, odd?, even?, exact?, inexact?, =, <, >, <=, >=, max, min, +, *, -, /, abs, quotient, remainder, modulo, numerator, denominator, gcd, lcm, floor, ceiling, truncate, round, rationalize, exp, log, sin, cos, tan, asin, acos, atan, sqrt, expt, make-rectangular, make-polar, real-part, imag-part, magnitude, angle, exact->inexact, inexact->exact, number->string, string->number (int³, rat³, fix³, flo³, sci³, rect³, polar³, heur³, exactness³, radix³)
characters: char?, char=?, char<?, char>?, char<=?, char>=?, char-ci=?, char-ci<?, char-ci>?, char-ci<=?, char-ci>=?, char-alphabetic?, char-numeric?, char-whitespace?, char-upper-case?, char-lower-case?, char->integer, integer->char, char-upcase, char-downcase
strings: string?, make-string, string-length, string-ref, string-set!, string=?, string-ci=?, string<?, string>?, string<=?, string>=?, string-ci<?, string-ci>?, string-ci<=?, string-ci>=?, substring, string-append, string->list, list->string, string-copy, string-fill!
vectors: vector?, make-vector, vector, vector-length, vector-ref, vector-set!, vector->list, list->vector, vector-fill!
control: procedure?, apply, map, for-each, force, call-with-current-continuation (call/cc)
i/o: call-with-input-file, call-with-output-file, input-port?, output-port?, current-input-port, current-output-port, with-input-from-file, with-output-to-file, open-input-file, open-output-file, close-input-port, close-output-port, read, read-char, char-ready?, eof-object?, write, display, newline, write-char, load, transcript-on, transcript-off
r5rs
«
Differences between R5RS and predecessor R4RS are summarized below when these changes also apply to Aim. (cf scheme-punks.org and schemers.org)
requests for implementation
«
Aim should add many common SRFI extensions, including ones listed below. Note the selections were made in a quick, cursory pass while scanning the srfi documents. Criteria influencing Wil's choices include the following:
The last item above might be restated, "Don't eat anything bigger than your head." Wil plans to write a short behavioral spec later for every feature in Aim, so anything that can't be said in reasonably few words is too complex to be an early feature. Wil expects never to implement features he supposes might be too hard to explain completely to bright folks new to a language.
SRFIs «
Wil's picks from SRFIs are sketched as follows:
srfi-1 (list library): cons*, make-list, list-copy, proper-list?, circular-list?, not-pair?, list=, caar, cadr, ..., cdddar, cddddr, list-ref, first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, car+cdr, take, drop, take-right, drop-right, split-at, last, last-pair, length+, concatenate, append-reverse, zip, count, fold, fold-right, reduce, reduce-right, append-map, append-map!, map!, filter, partition, remove, find, find-tail, take-while, drop-while, span, break, any, every, list-index
srfi-2: and-let*
srfi-4: s8vector, u8vector, s16vector, u16vector, s32vector, u32vector, s64vector, u64vector, f32vector, f64vector, etc, (s8vector?, make-s8vector, s8vector-length, s8vector-ref, s8vector-set!, s8vector->list, list->s8vector, etc)
srfi-5 (let form with signatures and rest arguments): let
srfi-6 (string ports): open-input-string, open-output-string, get-output-string
srfi-8 (multiple values): values, call-with-values, receive
srfi-11 (multiple values): let-values, let*-values
srfi-16 (varargs proc syntax): case-lambda
srfi-23 (error reporting): error
srfi-28 (format strings): format
srfi-30 (nested multiline comments): #|...|#
srfi-31 (recursive evaluation): rec
srfi-38 (shared structure): write-with-shared-structure, read-with-shared-structure
srfi-43 (vector library): vector-unfold, vector-unfold-right, vector-copy, vector-reverse-copy, vector-append, vector-concatenate, vector-empty?, vector=, vector-fold, vector-fold-right, vector-map, vector-for-each, vector-count, vector-index, vector-index-right, vector-skip, vector-skip-right, vector-binary-search, vector-any, vector-every, reverse-vector->list, reverse-list->vector
srfi-56 (binary i/o): (Wil might later support srfi-56 using a Gab binary i/o api.)
srfi-69 (hash tables): make-hash-table, hash-table?, alist->hash-table, hash-table-equivalence-function, hash-table-hash-function, hash-table-ref, hash-table-ref/default, hash-table-set!, hash-table-delete!, hash-table-exists?, hash-table-update!, hash-table-update!/default, hash-table-size, hash-table-keys, hash-table-values, hash-table-walk, hash-table-fold, hash-table->alist, hash-table-copy, hash-table-merge!, hash, string-hash, string-ci-hash, hash-by-identity
srfi-71 (multi values): (Instead of following srfi-71, Wil prefers allowing a list of formals to replace a variable inside let.)
srfi-74 (binary blocks): endianness (big, little, native), blob?, make-blob, blob-length, blob-u8-ref, blob-s8-ref, blob-u8-set!, blob-s8-set!, blob-uint-ref, blob-sint-ref, blob-uint-set!, blob-sint-set!, blob-u16-ref, blob-s16-ref, blob-u16-native-ref, blob-s16-native-ref, blob-u16-set!, blob-s16-set!, blob-u16-native-set!, blob-s16-native-set!, blob-u32-ref, blob-s32-ref, blob-u32-native-ref, blob-s32-native-ref, blob-u32-set!, blob-s32-set!, blob-u32-native-set!, blob-s32-native-set!, blob-u64-ref, blob-s64-ref, blob-u64-native-ref, blob-s64-native-ref, blob-u64-set!, blob-s64-set!, blob-u64-native-set!, blob=?, blob-copy!, blob-copy, blob->u8-list, u8-list->blob, blob->uint-list, blob->sint-list, uint-list->blob, sint-list->blob
srfi-88 (keyword objects): (Since Gab symbols in Aim will use symbols ending in colons, Wil can never support keywords that must evaluate to themselves, if merely ending with a colon defines a keyword.)
srfi-95 (sorting and merging): sorted?, merge, merge!, sort, sort!
srfi-98 (environment variables): getenv, get-environment-variable, get-environment-variables
(icon credit: Lisp logo designed by Manfred Spiller, found by way of Bill Clementson's link to normal-null.)
delimiters
«
Aim uses the MIT Scheme definition of delimiters: characters breaking tokens that can't be in identifiers. In addition to all whitespace characters, the following characters are always delimiters: ( ) ; " ' ` | [ ] { }
And although not delimiters, sharp (#) and comma (,) cannot start an identifier, but they can appear in the middle. While using them in the middle of identifiers is legal, it's considered poor practice by convention. Implementations might use such identifiers for internal purposes. (Note , is a legal binary operator in Smalltalk, which would be awkward when trying to access the same operator using Aim syntax in Lathe. So the Gab subset of Smalltalk might reserve the , comma operator as a metacharacter in tuples.)
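The delimiter rules above can be sketched as two small Python predicates. This is only an illustration under stated assumptions: the names `DELIMITERS`, `is_delimiter`, and `can_start_identifier` are hypothetical, characters stand in for octets, and Python's `str.isspace` is used as the notion of whitespace.

```python
# Characters that always break tokens, as listed above.
DELIMITERS = set('();"\'`|[]{}')


def is_delimiter(ch):
    """True for whitespace and for the always-delimiter characters."""
    return ch.isspace() or ch in DELIMITERS


def can_start_identifier(ch):
    """Sharp and comma are not delimiters, yet cannot begin an identifier."""
    return not is_delimiter(ch) and ch not in "#,"


def can_continue_identifier(ch):
    """Any non-delimiter, including # and comma, may appear mid-identifier."""
    return not is_delimiter(ch)
```

Under these rules `a#b` and `a,b` are each a single (if poorly styled) identifier, while `#b` and `,b` are not identifiers at all.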
read macros
«
The phrase read macro traditionally means a simple transformation performed by a reader which parses Lisp syntax, at read time. Typically such a macro fires when a certain character is seen. Several characters related to quoting and macros have a long-standing tradition: ' ` , ,@
The four tokens shown above always act like prefix operators wrapping the next expression, whatever it is, in a short list of length two according to the following rules:
There are no exceptions: any time you precede a value x with a ' single quote, that expression becomes (quote x) in the reader: it's defined that way. Similarly, a writer normally prints (quote x) as 'x because the latter is short and clear, and the reader will later reverse the transformation.
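A toy Python reader can illustrate how these prefix tokens wrap the next expression in a two-element list, and how a writer reverses the transformation. This is a sketch, not Lathe's reader: the function names are hypothetical, only symbols and parenthesized lists are handled, nested Python lists stand in for Lisp lists, and the expansion names quasiquote, unquote, and unquote-splicing are the conventional Scheme ones, assumed here to carry over to Aim.

```python
# Reader prefixes and the list heads they expand to (conventional Scheme names).
PREFIXES = {"'": "quote", "`": "quasiquote", ",": "unquote", ",@": "unquote-splicing"}
SHORTHAND = {name: prefix for prefix, name in PREFIXES.items()}


def tokenize(text):
    """Split text into parens, prefix tokens, and symbols."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith(",@", i):
            tokens.append(",@")
            i += 2
        elif ch in "()'`,":
            tokens.append(ch)
            i += 1
        else:
            j = i
            # A comma may appear inside a symbol, so it does not end one.
            while j < len(text) and not text[j].isspace() and text[j] not in "()'`":
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def read(tokens):
    """Read one expression, expanding prefix tokens into two-element lists."""
    tok = tokens.pop(0)
    if tok in PREFIXES:
        return [PREFIXES[tok], read(tokens)]
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing paren
        return items
    return tok


def write(expr):
    """Print an expression, using the short prefix form where it applies."""
    if isinstance(expr, list):
        if len(expr) == 2 and isinstance(expr[0], str) and expr[0] in SHORTHAND:
            return SHORTHAND[expr[0]] + write(expr[1])
        return "(" + " ".join(write(e) for e in expr) + ")"
    return expr
```

For example, `read(tokenize("'x"))` yields `["quote", "x"]`, and `write` turns that back into `'x`. Unbalanced input is not handled gracefully in this sketch.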
read syntax
«
A few extensions to read syntax will be supported when they simplify or improve expressiveness in specifying code and/or data. For example, several extra literal string constant formats will be supported in addition to Scheme's. More detail may appear here later. For now, here's one detail imitating Chicken's non-standard string read syntax:
#<<TAG
Aim should support notation similar to that shown above (where TAG can be any identifier) as an alternative quoting mechanism for multiline strings, terminated only by the first appearance of TAG (or by end of file if that comes first). But where exactly should the string begin? If the string begins with a character that is legal in an identifier, what separates the tag from the start of the string? That seems awkward. So let's say exactly one delimiter character (for example any whitespace, or vertical bar |, which is a good choice) ends TAG and is not part of the string. The string begins immediately after.
#<<TAG abcdTAG
The example above is the same as "abcd" because a single space ends TAG, and then the next TAG closes the string.
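The tentative rule (one delimiter ends TAG, the string runs to the next TAG or end of file) can be sketched in Python as follows. The function name `read_tagged_string` and its return shape are hypothetical, and since the page has not settled this syntax, the code illustrates only the rule as stated here.

```python
DELIMITERS = set('();"\'`|[]{}')


def read_tagged_string(text, start=0):
    """Parse '#<<TAG<delim>body TAG' beginning at text[start].

    Returns (string, index just past the closing TAG or end of text).
    """
    if not text.startswith("#<<", start):
        raise ValueError("expected #<<")
    tag_start = start + 3
    j = tag_start
    # TAG extends up to the first delimiter (whitespace or a listed character).
    while j < len(text) and not (text[j].isspace() or text[j] in DELIMITERS):
        j += 1
    tag = text[tag_start:j]
    if not tag:
        raise ValueError("missing TAG after #<<")
    body_start = j + 1  # exactly one delimiter is consumed, not part of the string
    end = text.find(tag, body_start)
    if end == -1:
        # End of file came first, so the rest of the text is the string.
        return text[body_start:], len(text)
    return text[body_start:end], end + len(tag)
```

Applied to `#<<TAG abcdTAG` this returns the string `abcd`; using a vertical bar as the separator, `#<<END|two\nlinesEND` returns `two\nlines`.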
lexical syntax
«
Lexical syntax is a set of rules dividing program text into a lexeme sequence, where each lexeme is usually called a token. In other words, lexical syntax describes the tokenizer within the Lisp reader, which consumes text and generates objects. Informally, each token parsed from input text is a contiguous sequence of bytes (note: bytes and not characters) recognized as a single lexeme by the reader, along with any interpretation of this text. For example, if a token represents a number, then the token's state typically includes the scalar value of that number, a code (denoting the type of the token) that means number, and the location of that token in source text (in case debug info or error messages are also needed). The token page will specify lexical syntax detail more precisely. In the meantime, here's an informal breakdown explaining which octet values lead to which kinds of tokens.
tokens «
Each time the reader parses the next input token, logic something like the following is used:
Ascii control characters (octets less than 0x20) are treated the same as whitespace even when isspace() is false, since the only practical result otherwise is an error, possibly caused by characters that are 'invisible' from your editor's perspective. Non-whitespace control characters might elicit a warning to let you know you have unusual extra octets in your source code. (For example, a zero-width control character in the middle of what seems a single identifier to you will turn into two identifiers when read.) As a side effect, you currently can't use Unicode in utf16 for symbol names because (eg) null bytes will be treated as whitespace delimiters.
Comments started by a semicolon (;) last until end of line, while comments started by #| last until a closing |#. If another #| appears before the closing |#, then one more closing |# is needed to end the comment. (In other words, the multiline comment syntax nests, ending only when the total number of closing |# equals the number of opening #|.) A comment has the same effect as a whitespace octet: once ignored, the parser keeps skipping whitespace until the first non-whitespace, non-comment octet. (Some folks use pairs of semicolons (;;) to start comments, but only the first is meaningful. After the first semicolon, everything is a comment until end of line, the same as // in C++.)
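The skipping behavior described in the last two paragraphs can be sketched over a Python `bytes` object, matching the page's emphasis on octets. The function name `skip_blank` is hypothetical, and raising an error on an unterminated #| comment is an assumption; the page does not say what happens in that case.

```python
def skip_blank(data, i=0):
    """Return the index of the first octet that is not whitespace,
    an ASCII control octet, or part of a comment."""
    n = len(data)
    while i < n:
        c = data[i]
        if c <= 0x20:
            # Space and every control octet below it count as whitespace.
            i += 1
        elif c == ord(";"):
            # Line comment: runs to end of line (the newline is skipped above).
            while i < n and data[i] != ord("\n"):
                i += 1
        elif data.startswith(b"#|", i):
            # Nesting comment: track depth until every #| has its |#.
            depth, i = 1, i + 2
            while i < n and depth > 0:
                if data.startswith(b"#|", i):
                    depth += 1
                    i += 2
                elif data.startswith(b"|#", i):
                    depth -= 1
                    i += 2
                else:
                    i += 1
            if depth > 0:
                raise SyntaxError("unterminated #| comment")
        else:
            break
    return i
```

Because a comment is handled inside the same loop as whitespace, the reader keeps skipping after a comment ends, which gives the "same effect as a whitespace octet" behavior described above.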
Aim string literals look just like C string literals: bounded at both ends by " double quotes, using roughly the same backslash escape sequences used in C, with Unicode extensions.
Aim uses vertical bar | as a way to quote symbols in a manner similar to the way " quotes string literals. The next vertical bar seen ends the symbol, and unlike strings, no escape sequences are supported (for now).
If comma , is followed immediately by @ the resulting token is ,@ for unquote-splicing. Otherwise , by itself is an unquote token. In both cases, the reader uses these tokens to adjust the result of subsequent expressions in read macros. Note this also applies to quote and quasiquote read macros mentioned next.
Both quote tokens have the effect of suppressing evaluation (at runtime) of the following expression parsed by the reader, but while quote is hard and fast, quasiquote permits evaluation at points inside the next expression that are unquoted. (This will be explained at length later; this is just a tiny intro to quasiquotation if the idea is wholly new to you.)
Aim includes Scheme tokens beginning with #, and there are so many they're treated separately in another section below. Basically # acts a bit like an escape sequence opening a subspace of lexical syntax interpretation. Tokens beginning with # include booleans, numbers, string literals, character literals, vector literals, and miscellaneous magic values, like an eof-object literal. To work with Unix shell scripts, as a special case the sequence #!/ typically denotes a specialized comment (until end of line) so this notation can be used on first lines to mock shell script syntax. (Actually, interpretation of #! depends on the next octet seen; unless the name of a unique value appears, the rest of the line is understood as a comment, in which case / seen next is just a subcase of all comment interpretations.)
Lists are enclosed in pairs of ( and ) tokens that must be balanced. Aim also uses [ and ] for the same purpose, the same as Scheme's R6RS spec does (despite the fact Aim ignores most everything else about R6RS). While Scheme reserves { and } for future language extensions, Aim will probably use these in constant literals for maps — likely prefixed with # to imitate vector literals — in some future feature extension for dictionaries.
When parsing lists (enclosed by ( ) or by [ ]) a single period . when used by itself is a token used by the reader to modify the meaning of the last expression appearing in a list: a dotted list is one where . separates the penultimate list member from the last list member — such a dotted list puts the last value in the cdr of the last pair instead of terminating the list with nil. (Yes, it seems a bit exotic the first time you hear it, if Lisp syntax is new to you.) But when . appears next to any other non-whitespace, non-delimiter octet — whether at beginning, middle, or end — the resulting token treats . the same as any other letter in an identifier or number, except numbers can contain period . only once, as the decimal.
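A small Python sketch can show how a reader might treat matched delimiters and a lone . token while building a list. It makes several assumptions for illustration: input is already split into tokens, a pair is modeled as a `(car, cdr)` tuple with `None` standing in for nil, and the function names and error messages are hypothetical.

```python
CLOSER = {"(": ")", "[": "]"}


def read_expr(tokens):
    """Read one expression from a list of pre-split tokens."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok in CLOSER:
        return read_list(tokens, CLOSER[tok])
    if tok in (")", "]", "."):
        raise SyntaxError("unexpected " + tok)
    return tok  # symbols and numbers are left as plain strings here


def read_list(tokens, closer):
    """Read members up to closer; a lone '.' puts the final member in the cdr."""
    items, tail = [], None
    while True:
        if not tokens:
            raise SyntaxError("missing " + closer)
        tok = tokens[0]
        if tok in (")", "]"):
            tokens.pop(0)
            if tok != closer:
                raise SyntaxError("expected " + closer + " but saw " + tok)
            break
        if tok == ".":
            if not items:
                raise SyntaxError("dot needs a preceding member")
            tokens.pop(0)
            tail = read_expr(tokens)
            if not tokens or tokens.pop(0) != closer:
                raise SyntaxError("exactly one member may follow a dot")
            break
        items.append(read_expr(tokens))
    # Build the chain of pairs from the back so the tail lands in the last cdr.
    for item in reversed(items):
        tail = (item, tail)
    return tail
```

Reading the tokens of `( a b . c )` gives `("a", ("b", "c"))`, whereas `[ a b ]` gives `("a", ("b", None))`. A token such as `a.b` is never split, since only a period standing by itself is the dot token.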
Symbols and numbers are both sequences of non-delimiter octets, starting with any non-whitespace octet except , or # or one of the delimiters listed earlier (();"'`|[]{}). At first the reader makes no attempt to distinguish symbols and numbers: both are collected as sequences of non-delimiter octets. But once a non-delimiter sequence ends (usually terminated by a delimiter) the reader tries to interpret the token as a number, using several different sorts of syntax including Scheme, C, and Smalltalk number syntax. Anything that doesn't match the octet grammar of a number is simply a symbol. Several different letters can be used to mean exponent: in the same position where C uses only e, Aim will also allow several variants implying number of bits in the floating point representation: eEsSfFlLdD. Smalltalk numbers allow a number constant to begin with an arbitrary radix as the base before r preceding the actual number literal; so Smalltalk's 16r20 means the same thing as Scheme's #x20 and C's 0x20, and all these must be understood by Aim's reader.
sharp «
As already mentioned, sharp # begins many sorts of tokens in Scheme and therefore in Aim too. The following list is presented informally. (More precise and exact conditions for # lexemes might be addressed later on the token page.)
By convention, the runtime considers only #f to be false, and all other values are 'true' — thus #t is just a canonical true value. Both are represented as immediate unboxed scalars.
Both #( and #[ begin constant vector literals which are almost identical in syntax to lists, except for the leading #, and except for the use of . in dotted lists. (Dotted vectors don't make any sense.) Delimiters work like lists: a vector started by #( is closed by ), and a vector started by #[ is closed by ].
A Scheme style number constant starting with # should specify a radix (b, o, d, or x) once, and optionally specify an exactness (e or i) at most once, where radix and exactness can occur in any order before digits begin.
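The radix and exactness prefix rule, together with the C and Smalltalk radix forms mentioned earlier on this page, can be sketched in Python for integer tokens only. The function names are hypothetical, exactness is recognized but otherwise ignored, and fractions, exponents, and the C leading-zero octal form are deliberately left out; the real number grammar is deferred to the token page.

```python
RADIX = {"b": 2, "o": 8, "d": 10, "x": 16}
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def digits_value(text, radix):
    """Convert an optionally signed digit string, or return None if invalid."""
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text or any(ch not in DIGITS[:radix] for ch in text):
        return None
    return sign * int(text, radix)


def parse_integer(token):
    """Return the integer a token denotes, or None if it is not an integer."""
    text = token.lower()  # number notation is never case sensitive
    radix, exactness, seen_radix = 10, None, False
    # Scheme style: #b #o #d #x once, and #e or #i at most once, in any order.
    while text.startswith("#") and len(text) >= 2:
        flag = text[1]
        if flag in RADIX and not seen_radix:
            radix, seen_radix = RADIX[flag], True
        elif flag in ("e", "i") and exactness is None:
            exactness = flag
        else:
            return None
        text = text[2:]
    if not seen_radix and exactness is None:
        if text.startswith("0x"):
            radix, text = 16, text[2:]  # C style
        elif "r" in text:
            base, _, digits = text.partition("r")  # Smalltalk style
            if base.isascii() and base.isdigit() and 2 <= int(base) <= 36:
                radix, text = int(base), digits
    return digits_value(text, radix)
```

With this sketch `#x20`, `0x20`, and `16r20` all produce 32, `#e#X20` is accepted because the two prefixes may come in either order, and `#x#x20` is rejected because the radix appears twice. A token such as `car` yields `None`, which the reader would then treat as a symbol.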
Notation #\ always begins a Scheme character literal (the same way $ does in Smalltalk), but the number of octets used in the literal can vary, depending on how you want to encode the character. The very first octet following #\ is the character value if the next input octet is a delimiter (like whitespace). But if the first octet after #\ is a non-delimiter, and the next octet after that is too, then a sequence of non-delimiter octets is collected before deciding what character is denoted by the text. Some characters have standard names:
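The decision rule for character literals can be sketched in Python as follows. Several parts are assumptions: the function name `read_char_literal`, the choice to treat a delimiter immediately after #\ as a one-character literal, the error on unknown names, and the `NAMES` table, which holds only the two names Scheme traditionally defines (space and newline) rather than Aim's own list.

```python
DELIMITERS = set('();"\'`|[]{}')

# Assumed name table: just the two traditional Scheme character names.
NAMES = {"space": " ", "newline": "\n"}


def is_delimiter(ch):
    return ch.isspace() or ch in DELIMITERS


def read_char_literal(text, start=0):
    """Parse a literal beginning with #\\ and return (character, next index)."""
    if not text.startswith("#\\", start):
        raise ValueError("expected #\\")
    i = start + 2
    if i >= len(text):
        raise SyntaxError("nothing follows #\\")
    j = i + 1
    # A delimiter right after #\ or right after the first octet means
    # the literal is that single octet.
    if is_delimiter(text[i]) or j >= len(text) or is_delimiter(text[j]):
        return text[i], j
    # Otherwise collect the whole run of non-delimiters and look it up by name.
    while j < len(text) and not is_delimiter(text[j]):
        j += 1
    name = text[i:j]
    if name not in NAMES:
        raise SyntaxError("unknown character name: " + name)
    return NAMES[name], j
```

So `#\a` followed by a space denotes the letter a, `#\space` denotes a space character, and `#\(` denotes an open parenthesis under the assumption noted above.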