Crim 0.01 - a Forth-like language
Copyleft (C) Eduardo Ochs, 2000
This is the README file.
2000jul19


Introduction
------------

Crim is a very loosely-defined language based on Forth. Forth is
already extremely extensible and one can change its syntax completely
with relative ease; Crim is intended to let people do the same with
even less effort. One can test entirely different inner interpreters
in Crim just by adding new "modes" to its current inner interpreter,
and the Crim tools used to implement words like Forth's LIT or <.">,
that take immediate data, should be very easy to extend to complex
parsers.

The simple ideas behind Crim's way of dealing with immediate data -- the
streams stack plus the RSR words -- were the fruit of many weeks of
very hard thinking in the middle of '95, and for me (I'm biased, of
course:-) they still sound like the Forth counterpart of one of those
mathematical theorems that establish deep connections in a simple way,
like the Curry-Howard isomorphism, which, by the way, is as marginal in
Mathematics as Forth is marginal in Computer Science...

Five years after getting to those ideas I'm still convinced that they
are the way to go, and now that I've got some insights from Darrell
Johnson's Perpol (http://www.boswa.com/misc/) about using Nasm to do
the boring stuff I'm resurrecting the whole thing and releasing an
implementation that does some more interesting things, like calling C
functions.

Note: the Forth inner interpreter's three modes are not generally
named; here I'll call them "head", "forth", and "assembler". See the
other docs in this directory.


Compiling and running Crim
--------------------------

You must have Tcl, Make and Nasm. Just unpack everything and run
"make"; then running "demo0", "demo1", etc. will give you lots of
debugging output from the demos. If you want to run any of them without
debugging info, give them a numeric argument of 0, like: "demo2 0".

The most interesting demo is demo2, as it is about calling C
functions. It just prints "Hello There" on two lines, but the really
interesting part is its bytecode, which is in demo2.lst... the
engine that runs it is demo2.engine.c; both are generated from
demo2.tf by tclstuff, as described below.

Note that this is still a hacker's version -- there's no interactive
mode. This is not (yet) a Forth... But, on the other hand, the code is
very simple and I hope that it shows the ideas clearly.


The .tf files
-------------

Urgh. The .tf ("Tcl-Forth") files are processed by tclstuff to
generate a C file and a nasm file, which are then compiled and linked
together to generate an executable; a .tf file contains a Crim
"program" that is to be executed by the Crim "engine".

The syntax of the .tf files is somewhat horrible. At this moment Crim
doesn't have any definite syntax, just some ideas about its bytecode
-- and even the details of the bytecode change when we change the set
of instructions with one-byte forms or when we change the
engine-xxx.c, for example to add more primitives or more modes.

I'm using nasm to generate the array (well, sort of...) that contains
the Crim bytecode that the engine will execute. IT IS MUCH BETTER TO
EXAMINE THE BYTECODE BY INSPECTING THE .lst FILE GENERATED BY NASM THAN
TO TRY TO UNDERSTAND EVERYTHING BY LOOKING AT THE .tf FILES. THE .tf
FILES ARE JUST HACKS!!! YOU HAVE BEEN WARNED!!!

Having said that, we can proceed to the technicalities.

Tclstuff is usually invoked like this:

  cat foobar.tf | tclstuff foobar

This will produce a "foobar.asm" and maybe a "foobar.engine.c" (if the
"engine" variable is set; read on).

Here's a short description of the .tf file syntax. Lines beginning
with certain strings are processed in special ways:

* Lines that start with "#" are ignored.

* Lines that start with "asm" are sent to stdout immediately. The nasm
  code is also sent to stdout as it is produced, and so the lines
  starting with "asm" will usually get into the nasm code.

* Lines that start with "tcl" are evaluated by Tcl (technical details:
  in the toplevel, with "uplevel #0 $restofline"). Some examples of
  usage:

  * "tcl parray a_code" -
    Show the contents of the a_code array. This is useful to
    understand how tclstuff works.

  * "tcl set engine bletch.c" -
    If your .tf file has a line like this then tclstuff will produce
    a C file after writing out all nasm code; its name will be derived
    from the name of the .tf file -- foobar.tf generates
    foobar.engine.c, for example -- and it will contain some #defines
    and array definitions (more technically: the result of a "[join
    $c_defs "\n"]") followed by a copy of bletch.c. The resulting
    foobar.engine.c is a valid C file (or at least it's meant to!);
    bletch.c usually lacks some definitions.

Other lines are processed in a Forthish fashion, one "word" at a time;
as usual in the Forth world, words are delimited by whitespace.
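
The line-by-line dispatch described above can be sketched in a few
lines. This is only an illustration in Python, not the actual tclstuff
code (which is Tcl); the function name classify_line is hypothetical,
and stripping the "asm"/"tcl" prefix plus the following whitespace is
an assumption about what "the rest of the line" means.

```python
def classify_line(line):
    """Return (kind, payload) for one line of a .tf file.

    Hypothetical helper: it mirrors the rules in the README, not the
    real tclstuff implementation.
    """
    if line.startswith("#"):
        # Comment lines are ignored.
        return ("comment", "")
    if line.startswith("asm"):
        # Assumption: only the text after the "asm" prefix is passed on.
        return ("asm", line[3:].lstrip())
    if line.startswith("tcl"):
        # The rest of the line is what Tcl would evaluate.
        return ("tcl", line[3:].lstrip())
    # Anything else is split into whitespace-delimited words.
    return ("words", line.split())
```

For instance, classify_line("tcl parray a_code") gives the pair
("tcl", "parray a_code"), and a line of ordinary words comes back as a
list of those words.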

The only predefined words are the "tick words" listed below. Those
ending with two ticks ("''") gobble the two words coming after them;
those ending with a single tick ("'") gobble one word. The double-tick
words are like their single-tick friends but they also define synonyms
with better-behaved names that C and nasm can accept. Quick descriptions:

  tick word:	used to define:				"X' E" defines:

    HPRIM'	a primitive head, like DOCOL (":")	H_E E

    FPRIM'	a primitive Forth word, like DUP	F_E E

    SFPRIM'	a primitive word with a one-byte	F_E SF_E E
		("short") form, like EXIT (";")

    FIPPRIM'	a Forth-IP primitive. When the engine	FIP_E E
		tries to execute Forth code in the
		address covered by a FIP it executes
		the FIP primitive instead. FIPs are
		often pushed on the return stack when
		we want to call words and have them do
		something special when they finish.

    ' or F'	a Forth word whose head is located at	F_E ADR_E E
		the current ("HERE") address.

    S'		same, but the word will also have a	F_E ADR_E SF_E E
		short form and will usually be called
		using the short form.
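
The third column of the table can be read as a small lookup, sketched
here in Python. The prefixes come straight from the table; reading
"H_E" as the prefix "H_" glued to the word E is an assumption, and the
names TICK_PREFIXES and defined_names are hypothetical.

```python
# Prefixes copied from the third column of the table above.
TICK_PREFIXES = {
    "HPRIM'":   ["H_"],
    "FPRIM'":   ["F_"],
    "SFPRIM'":  ["F_", "SF_"],
    "FIPPRIM'": ["FIP_"],
    "'":        ["F_", "ADR_"],
    "F'":       ["F_", "ADR_"],
    "S'":       ["F_", "ADR_", "SF_"],
}

def defined_names(tick_word, name):
    """List the names that "tick_word name" would define.

    Assumption: each table entry such as "F_E" means prefix + name,
    and the bare name is always defined last.
    """
    return [prefix + name for prefix in TICK_PREFIXES[tick_word]] + [name]
```

Under that reading, "SFPRIM' EXIT" would define F_EXIT, SF_EXIT and
EXIT, in that order.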

These tick words will define some words that can be later used in a
.tf file; the action associated with each of the defined words is always
like "output $a_code($word) to the nasm file"; I haven't yet extended
tclstuff to support other actions for defined words.
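
A rough Python sketch of that word-by-word loop follows. It is not the
tclstuff code: process_words, TICK_WORDS and the define callback are
hypothetical names, the a_code values are placeholders, only the
single-tick words from the table are handled, and any output the tick
words might produce themselves is ignored.

```python
# Single-tick words from the table; each gobbles the one word after it.
TICK_WORDS = {"HPRIM'", "FPRIM'", "SFPRIM'", "FIPPRIM'", "'", "F'", "S'"}

def process_words(words, a_code, define):
    """Walk one whitespace-split line and return the nasm text it emits.

    a_code maps a defined word to the text to output for it.
    define(tick_word, name) is a hypothetical callback that returns the
    new a_code entries created by that definition.
    """
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TICK_WORDS:
            # A tick word consumes the next word and defines it.
            a_code.update(define(word, words[i + 1]))
            i += 2
        else:
            # A defined word just outputs its stored nasm text.
            out.append(a_code[word])
            i += 1
    return out
```

With a define callback that stores some placeholder text for DUP, the
word list FPRIM' DUP DUP DUP defines DUP once and then outputs its text
twice.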

A line like "tcl parray a_code" in a .tf file will show which words
have been defined up to that point, and for each one the corresponding
value of $a_code($word); this can be useful for understanding
tclstuff.