-- Some notes for a text on blogme3, that will be implemented as an
-- extension of miniforth.
-- Links:
-- http://angg.twu.net/blogme3.html
-- http://angg.twu.net/miniforth/blogme3.txt
-- http://angg.twu.net/miniforth/blogme3.txt.html
-- http://angg.twu.net/miniforth/miniforth3.lua
-- http://angg.twu.net/littlelangs.html
def [[ * 2 a,b return a*b ]]
def [[ + 2 a,b return a+b ]]
[* [+ 1 2] [+ 3 4]]
[if ...]
A more precise description
==========================
The core of Blogme is made of a parser that recognizes a very simple
language, and an interpreter coupled to the parser; as the parser
processes the input text, the interpreter takes each output of the
parser and interprets it immediately.
This core engine should be thought of as if it had layers. At the base,
a (formal) grammar; then functions that parse and recognize constructs
from that grammar; then functions that take what the parser reads,
assemble that into commands and arguments for those commands, and
execute those commands.
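
As a rough illustration of the two code layers described above (parsing
functions, then functions that assemble commands and run them at once),
here is a minimal Python sketch. It is not Blogme's actual code: the
names COMMANDS, parse_arg, parse_block and run are made up for this
example, and it assumes that every argument is either an integer or a
nested []-block.

```python
import operator

# Hypothetical command table: name -> (number of arguments, function).
COMMANDS = {
    "*": (2, operator.mul),
    "+": (2, operator.add),
}

def skip_spaces(text, pos):
    """Return the first position at or after pos that is not whitespace."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def parse_arg(text, pos):
    """Read one argument: a nested []-block or a bare integer.
    Returns (value, position after the argument)."""
    pos = skip_spaces(text, pos)
    if text[pos] == "[":
        return parse_block(text, pos + 1)
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "[]":
        end += 1
    return int(text[pos:end]), end

def parse_block(text, pos):
    """pos is just after a '['. Parse the command name and its arguments,
    execute the command immediately, and return
    (result, position after the matching ']')."""
    pos = skip_spaces(text, pos)
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "[]":
        end += 1
    arity, fn = COMMANDS[text[pos:end]]
    pos = end
    args = []
    for _ in range(arity):
        value, pos = parse_arg(text, pos)
        args.append(value)
    pos = skip_spaces(text, pos)
    if pos >= len(text) or text[pos] != "]":
        raise SyntaxError("expected ']' at position %d" % pos)
    return fn(*args), pos + 1

def run(text):
    """Evaluate a script that consists of a single []-block."""
    pos = skip_spaces(text, 0)
    if text[pos] != "[":
        raise SyntaxError("expected '[' at position %d" % pos)
    value, _ = parse_block(text, pos + 1)
    return value

result = run("[* [+ 1 2] [+ 3 4]]")  # (1+2) * (3+4)
```

Note that there is no intermediate tree here: each command runs as soon
as its closing "]" is reached, which is the "interprets it immediately"
behaviour, in miniature.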
I think that the best way to describe Blogme is to describe these
three layers and the implementation of the top two layers - the
grammar layer doesn't correspond to any code. Looking at the actual
code of the core is very important; the core is not a black box at all
- the variables are made to be read by and changed by user scripts,
and most functions are intended to be replaced by the user eventually,
either by less simplistic versions with more features, or, sometimes,
by functions only thinly connected to the original ones.
Influences and rationale
========================
I know that it sounds pretentious to say that, but it's true... Blogme
descends from three important "extensible" programming languages -
Forth, Lisp, and Tcl - and borrows some of its ideas from each of them.
(1) Forth. This is a Forth program that prints "3 Hello20":
1 2 + . ." Hello" 4 5 * .
Forth reads one word at a time and executes it immediately
(sometimes it "compiles" the word instead of running it, but we can
ignore this now). `.' is a word that prints the number at the top of
the stack, followed by a space; `."' is a word that prints a string;
it's a tricky word because it _interferes with the parsing_ to get the
string to be printed. I've always thought that this permission to
interfere with the parsing was one of Forth's most powerful features,
and I have always thought about how to implement something like that
- maybe as an extension - in other languages.
So - the Forth interpreter (actually the "outer interpreter" in
Forth's jargon; the "inner interpreter" is the one that executes
bytecodes) reads the word `."', and then it calls the associated
code to execute it; at that point the pointer to the input text -
let's call it "pos" - is after the space after the `."', that is, at
the `H'; the code for `."' advances pos past the `Hello"' and prints
the "Hello"; after that, control returns to the outer
interpreter, which happily goes on to interpret "4 5 * .", without
ever touching the `Hello"'.
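
The behaviour of `."' described above can be imitated in a few lines of
Python. This is only a toy model of an outer interpreter, not real
Forth: the function names are invented for the example, only the words
used in the program above are defined, and output is collected in a
list instead of being printed. The point is that the code for `."'
reads and advances the same "pos" that the main loop uses.

```python
def outer_interpreter(text):
    """Toy outer interpreter: read one word at a time, run it at once.
    Returns everything the program would have printed, as one string."""
    stack, out = [], []
    pos = 0  # pointer into the input text

    def next_word():
        """Read the next whitespace-delimited word and step over the
        single delimiter that follows it."""
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        start = pos
        while pos < len(text) and not text[pos].isspace():
            pos += 1
        word = text[start:pos]
        if pos < len(text):
            pos += 1
        return word

    def dot_quote():
        """The code for `."': it interferes with the parsing by consuming
        the input up to the closing double quote itself."""
        nonlocal pos
        end = text.index('"', pos)
        out.append(text[pos:end])
        pos = end + 1

    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        ".": lambda: out.append(str(stack.pop()) + " "),
        '."': dot_quote,
    }

    while True:
        word = next_word()
        if not word:
            break
        if word in words:
            words[word]()
        else:
            stack.append(int(word))  # anything else is taken as a number
    return "".join(out)

printed = outer_interpreter('1 2 + . ." Hello" 4 5 * .')
```

The main loop never sees `Hello"' as a word, because dot_quote has
already moved pos past it by the time next_word runs again.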
(2) Lisp. In Lisp all data structures are built from "atoms" (numbers,
strings, symbols) and "conses"; a list like (1 2 3) is a cons - a
pair - holding the "first element of the list", 1, and the "rest of
the list", which is the cons that represents the list (2 3). Trees
are also built from conses and atoms, and programs are trees - there
is no distinction between code and data. The Lisp parser is very
simple, and most of the semantics of Lisp lies in the definition of
the "eval" function. The main idea that I borrowed from Lisp's
"eval" is that of having two kinds of evaluation strategies: in
(* (+ 1 2) (+ 3 4))
the "*" is a "normal" function, that receives the _results_ of (+ 1
2) and (+ 3 4) and returns the result of multiplying those two
results; but in
(if flag (message "yes") (message "no"))
the "if" is a "special form", that receives its three arguments
unevaluated, then evaluates the first one, "flag", to decide if it
is going to evaluate the second one or the third one.
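
The difference between the two evaluation strategies can be sketched
with a very small evaluator. This is a Python illustration, not code
from any real Lisp: Python lists stand in for Lisp lists, the Symbol
class and the FUNCTIONS table are inventions of this example, and
"message" just records its argument in a list so that we can see which
branch was evaluated.

```python
import operator

class Symbol(str):
    """A Lisp symbol. Plain Python strs stand for Lisp string literals."""

S = Symbol

messages = []  # what `message` would have shown, in order

def message(text):
    messages.append(text)
    return text

# "Normal" functions: they receive the results of their arguments.
FUNCTIONS = {"*": operator.mul, "+": operator.add, "message": message}

def evaluate(expr, env):
    if isinstance(expr, Symbol):
        return env[expr]              # variable lookup
    if not isinstance(expr, list):
        return expr                   # numbers and strings evaluate to themselves
    head, *args = expr
    if head == "if":
        # Special form: the arguments arrive unevaluated, and only the
        # condition and one of the two branches are ever evaluated.
        condition, then_branch, else_branch = args
        chosen = then_branch if evaluate(condition, env) else else_branch
        return evaluate(chosen, env)
    # Normal function: evaluate all the arguments first, then apply.
    return FUNCTIONS[head](*[evaluate(arg, env) for arg in args])

product = evaluate([S("*"), [S("+"), 1, 2], [S("+"), 3, 4]], {})
answer = evaluate(
    [S("if"), S("flag"), [S("message"), "yes"], [S("message"), "no"]],
    {"flag": True},
)
```

If "if" were a normal function, both (message "yes") and (message "no")
would run before "if" got a chance to choose between them.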
(3) Tcl. In Tcl the main data structure is the string, and Tcl doesn't
even have the distinction that Lisp has between atoms and conses -
in Tcl numbers, lists, trees and program code are just strings that
can be parsed in certain ways. Tcl has an evaluation strategy, given
by 11 rules, that describes how to "expand", or "substitute", the
parts of the program that are inside ""s, []s, and {}s (plus rules
for "$"s for variables, "#"s for comments, and a few other things).
The ""-contexts and []-contexts can nest inside one another, and
what is between {}s is not expanded, except for a few backslash
sequences. In a sense, what is inside []s is "active code", to be
evaluated immediately, while what is inside {}s is "passive code",
to be evaluated later, if at all.
Here are some examples of Tcl code:
set foo 2+3
set bar [expr 2+3]
puts $foo=$bar ;# Prints "2+3=5"
proc square {x} { expr $x*$x }
puts "square 5 = [square 5]" ;# Prints "square 5 = 25"
Blogme descends from a "language" for generating HTML that I
implemented on top of Tcl in 1999; it was called TH. The crucial
feature of Tcl on which TH depended was that _in ""-expansions
the whitespace is preserved, but []-blocks are evaluated_. TH
scripts could be as simple as this:
htmlize {Title of the page} {
[P A paragraph with a [HREF http://foo/bar/ link].]
}
but it wasn't hard to construct slightly longer TH scripts in which
a part of the "body of the page" - the second argument to htmlize -
would become, say, an ASCII diagram that would be formatted as a
<pre>...</pre> block in the HTML output, keeping all the whitespace
that it had in the script. That would be a bit hard to do in Lisp;
_it is only trivial to implement new languages on top of Lisp when
the code for programs in those new languages is made of atoms and
conses_. I wanted something more free-form than that, and I couldn't
do it in Lisp because the Lisp parser can't be easily changed; also,
sometimes, if a portion of the source script became, say, a cons, I
would like to be able to take this cons and discover from which part
of the source script that cons came... in Blogme this is trivial to
do, as []-blocks in the current Blogme scripts are represented
simply by a number - the position in the script just after the "[".
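
To make the "a []-block is just a number" idea concrete, here is a
small Python sketch. It is not taken from Blogme: the function names
are hypothetical, and it assumes that brackets in the script are
balanced and never escaped. Given only the position just after a "[",
we can always recover the exact source text of that block, whitespace
included.

```python
def matching_close(script, pos):
    """pos is just after a '['. Return the index of the matching ']'."""
    depth = 1
    for i in range(pos, len(script)):
        if script[i] == "[":
            depth += 1
        elif script[i] == "]":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("no matching ']' for the block at position %d" % pos)

def block_positions(script):
    """Every []-block in the script, each represented by one number:
    the position just after its '['."""
    return [i + 1 for i, ch in enumerate(script) if ch == "["]

def block_source(script, pos):
    """The source text of the block that starts at pos, brackets excluded."""
    return script[pos:matching_close(script, pos)]

script = "[P A paragraph with a [HREF http://foo/bar/ link].]"
positions = block_positions(script)
inner = block_source(script, positions[1])
outer = block_source(script, positions[0])
```

Here "inner" is the text of the HREF block and "outer" is the text of
the whole P block, with the nested block still in its unevaluated,
textual form.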
The Lisp parser can't be easily changed to