Chapter 14. eForth System

The file EF24.F contains all the high level words in P24 eForth.

This implementation follows the eForth model closely. The words listed below were removed because they are not absolutely necessary for embedded applications. In this implementation the size constraint is severe, and the existence of every word must be rigorously justified.

Words removed from the eForth model:

CATCH, THROW, PRESET, XIO, FILE, HAND, I/O

CONSOLE, RECURSE, USER, VER, HI, 'BOOT

Many of the user variables are eliminated:

SP0, RP0, '?KEY, 'EMIT, 'EXPECT, 'TAP, 'ECHO

'PROMPT, CSP, 'NUMBER, HANDLER, CURRENT, NP

Only these user variables remain, and they are implemented as macros:

HLD, SPAN, >IN, #TIB, 'TIB, 'EVAL, BASE, tmp

CP, CONTEXT, LAST, 'ABORT, TEXT

14.1 Overview of the P24 eForth system

Figure 14.1 is a graphic representation of the P24 eForth system operating inside a P24 chip. Upon power-up, the eForth system is initialized and enters the Forth interpreter loop, where it accepts commands from the user, executes them, and returns for more commands. Occasionally it falls into the compiling loop and compiles new routines into the system.

Figure 14.1 The eForth system in a P24 chip

The P24 eForth system can be specified more rigorously by the following list of words, together with their pseudo code:

COLD boots Forth, prints the sign-on message and jumps to QUIT

QUIT repeats the sequence: accepts a line of text and executes

the commands in sequence. The pseudo code is:

: QUIT BEGIN QUERY EVAL AGAIN ;

QUERY accepts one line of text, up to 80 characters long or terminated

by a carriage return.

EVAL parses out tokens in the text and evaluates them:

: EVAL BEGIN TOKEN WHILE 'EVAL @EXECUTE REPEAT .OK ;

TOKEN parses out one word from the input text.

'EVAL contains $INTERPRET in the interpret mode or $COMPILE

in the compiling mode.

@EXECUTE executes either $INTERPRET or $COMPILE.

.OK prints out the "OK" message.

$INTERPRET ( a ) searches the dictionary for a word of the

text string at a. If the word exists, execute it.

Else, convert the string into a number on the stack.

Failing to convert the string to a number, it prints an

error message and aborts to QUIT.

: $INTERPRET NAME? IF EXECUTE ELSE NUMBER?

IF ELSE ERROR THEN THEN ;

$COMPILE ( a ) searches the dictionary for a word of the

text string at a. If the word exists, compile it.

Else, convert the string to a number and compile the

number as a literal. Failing the conversion, it prints

a message and aborts to QUIT.

: $COMPILE NAME? IF , ELSE NUMBER?

IF LITERAL ELSE ERROR THEN THEN ;

NAME? calls 'find' to locate a word of the name parsed out

of the input text string.

NUMBER? ( a ) converts the text string at a to a number.

ERROR prints the offending text string and aborts to QUIT.

LITERAL ( n ) compiles n as a literal into the current word

being compiled.

The words above serve as a top-down map of the eForth operating system. The eForth source code builds up to QUIT and COLD, and most words in EF24.F are necessary in that building process. The eForth system can be viewed as a very sophisticated application of P24; most applications are much simpler. You can model your application code on eForth and use all the tools it contains.
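
The top-down map above can be mimicked in a few lines of Python. This is only a sketch of the control flow, not the P24 implementation: the two-word dictionary, the Python list used as a data stack, and the names make_forth, interpret and evaluate are all assumptions made for illustration.

```python
def make_forth():
    """Build a toy interpreter; returns its data stack and its EVAL routine."""
    stack = []
    dictionary = {                       # hypothetical two-word dictionary
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "DUP": lambda: stack.append(stack[-1]),
    }

    def interpret(token):                # plays the role of $INTERPRET
        word = dictionary.get(token)     # NAME?
        if word is not None:
            word()                       # EXECUTE
            return
        try:
            stack.append(int(token))     # NUMBER?
        except ValueError:
            raise LookupError(token + " ?")   # ERROR

    vectors = {"'EVAL": interpret}       # a compiler would store $COMPILE here

    def evaluate(line):                  # plays the role of EVAL
        for token in line.split():       # TOKEN
            vectors["'EVAL"](token)      # 'EVAL @EXECUTE

    return stack, evaluate
```

Feeding the line "2 3 + DUP" to evaluate leaves two copies of 5 on the stack, and an unknown word raises an error carrying the offending name followed by a question mark, in the spirit of ERROR.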

14.2 Serial Port

The No-Cost UART uses very little hardware and gives us a powerful tool to access and examine the P24 CPU.

On executing the SHR instruction, the least significant bit in T, T(0), is shifted into a flip-flop whose output is connected to the serial output port. At the same time, the state of the serial input port is latched into the carry bit, which is bit T(24). Repeating SHR 8 times sends out a character. A character is captured by waiting for the start bit on the serial input port and then testing the port at intervals of 100 us. One must be very careful in using the SHR instruction: in order not to disturb the output port, you should always set T(0) to 1 before executing SHR. This way, the serial output port stays at the mark level.

50us delays 52 us, half of a bit at 9600 baud.

100us delays 104 us, one bit frame at 9600 baud.
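
The two delay figures follow directly from the baud rate. The sketch below only redoes that arithmetic; the function names are invented for illustration, and the assumption that KEY samples eight data bits, each at the middle of its bit frame, is read off the KEY listing rather than stated in the text.

```python
def bit_period_us(baud=9600):
    """Length of one bit frame in microseconds."""
    return 1_000_000 / baud

def sample_times_us(baud=9600, data_bits=8):
    """Times, measured from the start-bit edge, at which data bits are sampled.

    Assumes one half-bit delay (50us) followed by one full-bit delay (100us)
    before each sample, so bit k is read 1.5 + k bit periods after the edge.
    """
    bit = bit_period_us(baud)
    return [(1.5 + k) * bit for k in range(data_bits)]
```

At 9600 baud a bit frame is about 104.17 us and half a frame about 52.08 us, which is where the names 100us and 50us and their actual delays of 104 us and 52 us come from.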

EMIT ( c ) sends character c to the serial output port.

KEY ( -- c ) waits for a character from the serial input port. The serial ports are actually connected to the T register.

CRR .( Character IO ) CRR

CODE 50us

2 ldi skip

CODE 100us

1 ldi

then

sta -138 ldi

begin lda add

-until

drop

ret

CODE EMIT ( c -- )

$7F ldi and

shl $FFFF01 ldi xor

$0A ldi

FOR shr 100us NEXT

drop ret

CODE KEY ( -- c )

$FFFFFF ldi

begin shr

-until

repeat ( wait for start bit )

50us

7 ldi

FOR

100us shr

-if $80 ldi xor then

NEXT

$FF ldi and

100us ret

14.3 Simple Utility Words

These common functions are too complicated to code in machine instructions, and are left in the high level form.

CRR .( Common functions ) CRR

:: U< ( u u -- t ) 2DUP XOR 0< IF SWAP DROP 0< EXIT THEN - 0< -;'

:: < ( n n -- t ) 2DUP XOR 0< IF DROP 0< EXIT THEN - 0< -;'

:: MAX ( n n -- n ) 2DUP < IF SWAP THEN DROP ;;

:: MIN ( n n -- n ) 2DUP SWAP < IF SWAP THEN DROP ;;

:: WITHIN ( u ul uh -- t ) \ ul <= u < uh

OVER - >R - R> U< -;'

14.4 Division

UM/MOD and /MOD share the same body, which divides a 48-bit dividend by a 24-bit divisor using the DIV machine instruction. The higher half of the dividend is placed in T and the lower half in A. The divisor is negated and placed on the data stack below T. The negated divisor is added to T in the adder. If a carry is generated, indicating that T is big enough to subtract the divisor, the sum is accepted into T, and then the T-A combination is shifted left by one bit. The most significant bit in A is shifted into T(0), and Carry is shifted into A(0). If the adder does not generate a carry, the subtraction is not done: the T-A combination is shifted left by one bit, and a 0 is shifted into A(0).

This divide step, the DIV instruction, is repeated 25 times to generate the proper quotient in A. The remainder is in T, once T is shifted right by one bit.

The only restriction in this division procedure is that the divisor and the dividend must be positive. It cannot handle a negative divisor or a negative dividend. This is not a serious limitation, because the special word M/MOD does signed division by first converting both divisor and dividend to positive numbers for the division, and then placing the appropriate signs on the quotient and remainder.

UM/MOD, /MOD, /, and MOD all assume that divisors and dividends are positive. In the eForth system, this is not a problem. Nevertheless, users must be aware of this limitation when writing code that must handle negative numbers.
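
The divide step described above can be modelled in Python to check the bookkeeping. This is a sketch under stated assumptions, not a description of the hardware: T is modelled as an unbounded integer so that it can hold the extra carry bit, and the names div_step and um_mod are invented for illustration.

```python
W = 24                      # register width in bits
MASK = (1 << W) - 1

def div_step(t, a, neg_divisor):
    """One DIV step: conditional subtract, then shift T-A left by one bit."""
    s = t + neg_divisor                 # add the negated divisor to T
    carry = s >> W                      # carry out means T >= divisor
    if carry:
        t = s & MASK                    # accept the difference into T
    t = (t << 1) | (a >> (W - 1))       # MSB of A moves into T(0)
    a = ((a << 1) | carry) & MASK       # carry moves into A(0)
    return t, a

def um_mod(dividend_hi, dividend_lo, divisor):
    """Model of UM/MOD ( ud u -- ur uq ); returns (remainder, quotient)."""
    neg = (-divisor) & MASK
    t, a = dividend_hi, dividend_lo
    for _ in range(W + 1):              # 25 divide steps
        t, a = div_step(t, a, neg)
    return t >> 1, a                    # remainder is T shifted right once
```

For example, um_mod(0, 13, 3) gives a remainder of 1 and a quotient of 4. The model assumes a positive divisor and a high half smaller than the divisor, which matches the restriction stated above.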

CRR .( Divide ) CRR

CODE UM/MOD ( ud u -- ur uq )

com 1 ldi add sta

push lda push sta

pop pop

skip

CODE /MOD ( n n -- r q )

com 1 ldi add push

sta pop 0 ldi

then

div div div div

div 1 ldi xor shr

push drop pop lda

ret

CODE MOD ( n n -- r )

/MOD

drop ret

CODE / ( n n -- q )

/MOD

push drop pop ret

:: M/MOD ( d n -- r q ) \ floored

DUP 0< DUP >R

IF NEGATE >R DNEGATE R>

THEN >R DUP 0< IF R@ + THEN R> UM/MOD R>

IF SWAP NEGATE SWAP THEN ;;

14.5 Multiplication

UM* multiplies two unsigned 24-bit integers and produces a 48-bit product. The multiplier is placed in the A register, and the multiplicand is placed on the data stack below T. T is cleared to zero. The MUL machine instruction looks at bit A(0). If it is a one, the multiplicand is added to T, the T-A combination is shifted to the right by one bit, and Carry is shifted into T(23). If A(0) is a zero, the multiplicand is not added; the T-A combination is shifted to the right, and a zero is shifted into T(23). After the MUL instruction is repeated 24 times, a 48-bit product is produced in the T-A combination. T has the more significant half and A has the less significant half of the product.

Both UM* and * do unsigned multiplication. M* does signed multiplication. For correctness, * should call M* to do the multiplication. However, here * calls UM* for speed. You should be aware of this property in your applications. As the eForth system only does unsigned multiplications, it is not a problem.
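
The multiply step can be modelled the same way. The sketch below assumes that shifting the T-A combination right moves T(0) into A(23), which the text implies but does not spell out; mul_step and um_star are names invented for illustration.

```python
W = 24                      # register width in bits
MASK = (1 << W) - 1

def mul_step(t, a, multiplicand):
    """One MUL step: conditional add, then shift T-A right by one bit."""
    if a & 1:
        t += multiplicand               # sum may need 25 bits; bit 24 is Carry
    a = (a >> 1) | ((t & 1) << (W - 1)) # T(0) moves into A(23) (assumed)
    t >>= 1                             # Carry, or zero, lands in T(23)
    return t, a

def um_star(u1, u2):
    """Model of UM* ( u u -- ud ); returns (low half, high half)."""
    t, a = 0, u1 & MASK
    for _ in range(W):                  # 24 multiply steps
        t, a = mul_step(t, a, u2 & MASK)
    return a, t
```

The pair is returned low half first, mirroring the order in which UM* leaves the double number on the data stack, so that dropping the high half, as * does, keeps the low 24 bits.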

CRR .( Multiply ) CRR

CODE UM* ( u u -- ud )

sta 0 ldi

mul mul mul mul

push drop lda pop

ret

:: * ( n n -- n ) UM* DROP ;;

:: M* ( n n -- d )

2DUP XOR 0< >R ABS SWAP ABS UM* R> IF DNEGATE THEN ;;

:: */MOD ( n n n -- r q ) >R M* R> M/MOD -;'

:: */ ( n n n -- q ) */MOD SWAP DROP ;;

14.6 Memory Access Words

There are three buffer areas used often in the eForth system. HERE returns the address of the first free location above the code dictionary, where new words are compiled. PAD returns the address of the text buffer where numbers are constructed and text strings are stored temporarily. TIB is the terminal input buffer where input text string is held.

@EXECUTE is a special word supporting the vectored execution words in eForth. It takes the word address stored in a memory location and executes the word. It is used extensively to execute the vectored words in the user area.

A memory array is generally specified by a starting address and its length in words. In a string array, the first word is a count, specifying the number of words in the following string. This is called a counted string.

COUNT converts a string array address to the address-length representation of a counted string.

CMOVE copies a memory array from one location to another. FILL fills a memory array with the same byte.

>CHAR filters out non-printable characters for TYPE. It thus ensures that TYPEing a non-printable character will not choke the printer.

CRR .( Bits & Bytes ) CRR

:: >CHAR ( c -- c )

$7F LIT AND DUP $7F LIT BL WITHIN

IF DROP ( CHAR _ ) $5F LIT THEN ;;

CRR .( Memory access ) CRR

:: HERE ( -- a ) CP @ ;;

:: PAD ( -- a ) CP @ 50 LIT + ;;

:: TIB ( -- a ) 'TIB @ ;;

CRR

:: @EXECUTE ( a -- ) @ ?DUP IF EXECUTE THEN ;;

:: CMOVE ( b b u -- )

FOR AFT >R DUP @ R@ ! 1+ R> 1+ THEN NEXT 2DROP ;;

:: FILL ( b u c -- )

SWAP FOR SWAP AFT 2DUP ! 1+ THEN NEXT 2DROP ;;

14.6 String Packing and Unpacking Words

PACK$ packs the string at b with length u into memory located at a, three bytes to a 24-bit program word. It calls B> to do the packing. This packing function greatly reduces the total size of the P24 code image. The packing also speeds up the dictionary searches because three bytes are compared at once. The system scratch variable TMP is used to store the byte count which directs the bytes to their proper location. After the byte string is fully packed, the last packed program word is left justified and empty slots are filled with NUL bytes.

:: PACK$ ( b u a -- a ) \ null fill

dup push

1 ldi tmp sta st

sta dup push st

lda pop

FOR AFT ( b a )

tmp sta ld

IF ld 1 ldi xor

IF dup dup xor st

1 ldi add

ELSE 2 ldi st

THEN

ELSE 1 ldi st

THEN

THEN NEXT

tmp sta ld

IF ld 2 ldi xor

IF sta ld

shl shl shl shl

st lda

THEN

sta ld

shl shl shl shl

st lda

THEN

drop drop pop

;;

UNPACK$ unpacks a packed string at address a into a counted byte string at b. It calls >B to unpack a 24-bit word into three bytes. It allows names of words to be printed, and in-line packed strings to be accessed as byte strings.

:: UNPACK$ ( a b -- b )

DUP >R ( save b )

>B $1F LIT AND 3 LIT /

FOR AFT

>B DROP

THEN NEXT

2DROP R>

;;
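
The packed layout can be illustrated with a short Python sketch. It assumes that the count byte occupies the most significant byte of the first program word, followed by the characters three to a word, with NUL bytes filling the last word. That layout is inferred from the way NAME> and UNPACK$ mask and divide the first word, and is not stated outright in the text. The names pack, unpack and word_count are invented for illustration.

```python
def pack(text: bytes) -> list:
    """Pack a count byte plus text into 24-bit words, three bytes per word."""
    data = bytes([len(text)]) + text
    data += b"\x00" * (-len(data) % 3)          # null-fill the last word
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data), 3)]

def unpack(words) -> bytes:
    """Recover the byte string; the count is masked to 5 bits as in UNPACK$."""
    data = b"".join(w.to_bytes(3, "big") for w in words)
    count = data[0] & 0x1F
    return data[1:1 + count]

def word_count(first_word: int) -> int:
    """Number of program words in a packed string, as computed in NAME>."""
    return (first_word & 0x3FFFFF) // 0x30000 + 1
```

Under these assumptions the name DUP packs into two words, $034455 and $500000: the count and the first two characters share the first word, which is why a dictionary search can reject most candidates after a single comparison.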

14.7 Number Output Words

All numbers in P24 are stored internally as 24-bit binary patterns. To make the numbers visible to the user, they are converted to strings of digits to be printed. A number is converted one digit at a time. It is divided by the value stored in BASE, and the remainder is converted to a digit by DIGIT. The quotient is divided further by BASE to build a complete numeric string suitable for printing. The output numeric string is built backward below the memory buffer at PAD, using HLD as the pointer moving backward. Additional formatting characters can be inserted into the output string by HOLD.

This numeric output mechanism is extremely flexible and can produce numbers in a wide variety of formats for tables and arrays. It also allows the user to display numbers in any reasonable base, like decimal, hexadecimal, octal, and binary, among other non-conventional bases.

DIGIT converts an integer to a digit.

EXTRACT extracts the least significant digit from a number n. n is divided by the radix in BASE; the quotient is returned on the stack together with the digit character.

The output number string is built below the PAD buffer. The least significant digit is extracted from the integer on the top of the data stack by dividing it by the current radix in BASE. The digits thus extracted are added to the output string backwards, from PAD toward low memory. The conversion is terminated when the integer is divided to zero. The address and length of the number string are made available by #> for outputting.

An output number conversion is initiated by <# and terminated by #>. Between them, # converts one digit at a time, #S converts all the digits, while HOLD and SIGN insert special characters into the string under construction. This set of words is very versatile and can handle all different output formats.
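
The conversion can be sketched in Python as follows. A Python list stands in for the buffer below PAD, and inserting at the front of the list stands in for HOLD moving HLD backward; the function names digit and number_string are invented for illustration.

```python
def digit(u):
    """Convert 0..35 to a character, skipping the 7 codes between '9' and 'A'."""
    return chr(u + (7 if u > 9 else 0) + 0x30)

def number_string(n, base=10):
    """Build the digit string backward, in the manner of <# #S SIGN #>."""
    held = []                       # stands in for the buffer below PAD
    u = abs(n)
    while True:                     # #S: at least one digit is always produced
        u, remainder = divmod(u, base)      # EXTRACT
        held.insert(0, digit(remainder))    # HOLD
        if u == 0:
            break
    if n < 0:
        held.insert(0, "-")         # SIGN
    return "".join(held)
```

Because each digit is placed in front of the ones already converted, the least significant digit, which is produced first, ends up last in the string.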

CRR .( Numeric Output ) CRR \ single precision

:: DIGIT ( u -- c )

9 LIT OVER < 7 LIT AND +

( CHAR 0 ) 30 LIT + ;;

:: EXTRACT ( n base -- n c )

0 LIT SWAP UM/MOD SWAP DIGIT -;'

:: <# ( -- ) PAD HLD ! ;;

:: HOLD ( c -- ) HLD @ 1- DUP HLD ! ! ;;

:: # ( u -- u ) BASE @ EXTRACT HOLD -;'

:: #S ( u -- 0 ) BEGIN # DUP WHILE REPEAT ;;

CRR

:: SIGN ( n -- ) 0< IF ( CHAR - ) 2D LIT HOLD THEN ;;

:: #> ( w -- b u ) DROP HLD @ PAD OVER - ;;

:: str ( n -- b u ) DUP >R ABS <# #S R> SIGN #> -;'

:: HEX ( -- ) 10 LIT BASE ! ;;

:: DECIMAL ( -- ) 0A LIT BASE ! ;;

14.8 Number Input Words

Numbers are entered into P24 as strings of digits, delimited by spaces and other white-space characters like CR, TAB, NUL, etc. A numeric string is converted to internal binary form, most significant digit first, by multiplying the running total by the value in BASE and adding each digit in turn until the digits are exhausted.

DIGIT? converts a digit to its numeric value according to the current base.

NUMBER? converts a string of digits to a single integer. If the first character is a $ sign, the number is assumed to be in hexadecimal. Otherwise, the number will be converted using the radix value stored in BASE. For negative numbers, the first character should be a - sign. No other characters are allowed in the string. If a non-digit character is encountered, the address of the string and a false flag are returned.
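
The rules above can be restated as a Python sketch. It returns None where NUMBER? would return the string address and a false flag; the names digit_value and parse_number are invented for illustration, and the order of the prefix checks ($ before -) follows the NUMBER? listing.

```python
def digit_value(c, base):
    """Value of digit character c in the given base, or None if invalid."""
    v = ord(c) - 0x30
    if v > 9:
        v -= 7                      # close the gap between '9' and 'A'
        if v < 10:
            return None             # characters between '9' and 'A'
    return v if 0 <= v < base else None

def parse_number(text, base=10):
    """Convert text to an integer, or return None on any non-digit."""
    if text.startswith("$"):        # a leading $ forces hexadecimal
        base, text = 16, text[1:]
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if not text:
        return None
    n = 0
    for c in text:
        v = digit_value(c, base)
        if v is None:
            return None
        n = n * base + v            # accumulate, most significant digit first
    return -n if negative else n
```

Like DIGIT?, this sketch accepts only upper-case letters as digits above 9.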

CRR .( Numeric Input ) CRR \ single precision

:: DIGIT? ( c base -- u t )

>R ( CHAR 0 ) 30 LIT - 9 LIT OVER <

IF 7 LIT - DUP 0A LIT < OR THEN DUP R> U< -;'

:: NUMBER? ( a -- n T | a F )

BASE @ >R 0 LIT OVER COUNT ( a 0 b n)

OVER @ ( CHAR $ ) 24 LIT =

IF HEX SWAP 1+ SWAP 1- THEN ( a 0 b' n')

OVER @ ( CHAR - ) 2D LIT = >R ( a 0 b n)

SWAP R@ - SWAP R@ + ( a 0 b" n") ?DUP

IF 1- ( a 0 b n)

FOR DUP >R @ BASE @ DIGIT?

WHILE SWAP BASE @ * + R> 1+

NEXT DROP R@ ( b ?sign) IF NEGATE THEN SWAP

ELSE R> R> ( b index) 2DROP ( digit number) 2DROP 0 LIT

THEN DUP

THEN R> ( n ?sign) 2DROP R> BASE ! ;;

Following is the set of words displaying characters to the output device.

DO$ is an internal system word which unpacks a packed string compiled in-line with program words. It digs up the starting address of the packed string on the return stack, unpacks the string to location a, and then moves the return address past the packed string. Execution can then continue, skipping the in-line packed string.

$"| is compiled before a packed string. It unpacks the string and returns the address of the TEXT buffer where the unpacked string is stored.

."| is also compiled before a packed string. It unpacks the string and displays it on the output device.

CRR .( Basic I/O ) CRR

:: SPACE ( -- ) BL EMIT -;'

:: CHARS ( +n c -- )

SWAP 0 LIT MAX

FOR AFT DUP EMIT THEN NEXT DROP ;;

:: SPACES ( +n -- ) BL CHARS -;'

:: TYPE ( b u -- )

FOR AFT DUP @ >CHAR EMIT 1+

THEN NEXT DROP ;;

:: CR ( -- ) ( =Cr )

0A LIT 0D LIT EMIT EMIT -;'

:: do$ ( -- a )

R> R@ TEXT UNPACK$

R@ R> @ $3FFFFF LIT AND $30000 LIT / 1+ +

>R SWAP >R ;;

CRR

:: $"| ( -- a ) do$ -;'

:: ."| ( -- ) do$ COUNT TYPE -;'

:: .R ( n +n -- )

>R str R> OVER - SPACES TYPE -;'

:: U.R ( u +n -- )

>R <# #S #> R> OVER - SPACES TYPE -;'

:: U. ( u -- ) <# #S #> SPACE TYPE -;'

:: . ( n -- )

BASE @ 0A LIT XOR

IF U. EXIT THEN str SPACE TYPE -;'

:: ? ( a -- ) @ . -;'

With the number formatting word set shown above, one can format numbers for output in any form desired. The free output format is a number string preceded by a single space. The fixed-column format displays a number right-justified in a column of pre-determined width. The words ., U., and ? use the free format. The words .R and U.R use the fixed-column format.

14.9 String Parser

TOKEN parses out the next string in the input stream, delimited by spaces. The string is packed and placed on the top of the dictionary, so that it can be used to do dictionary searches, and becomes the name field if the string just happened to be the name of a new definition.

PARSE allows the user to specify the delimiting character to parse out the next string in the input stream. It calls 'parse' to do the dirty work.

'parse' scans the input stream and skips the leading blanks if SPACE is the delimiting character. The parsed string starts with the next non-delimiting character and is terminated by the next delimiting character. It returns b, the beginning address of the parsed word; u, the length of the remaining characters in the input stream; and delta, the length of the parsed word. It is a very long word with many nested and interlaced structures, and is a challenge even to very experienced Forth programmers.

PARSE parses out the next string in the Terminal Input Buffer (TIB), starting where >IN is pointing. The c specifies the delimiting character of the string. It returns the address b of the string in TIB and its length u.

TOKEN is the crucial word in the Forth text interpreter which scans the terminal input buffer for the next string delimited by spaces. It packs the string into the word buffer at HERE, ready for dictionary search.

WORD is similar to TOKEN, except that it takes the delimiting character from the stack. TOKEN is used by the system. WORD is intended for users who have to do special parsing on their input strings.
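
The skip-then-scan behaviour of the parser can be sketched in Python. A Python string stands in for the terminal input buffer and an index stands in for >IN; the name parse and the returned pair (string, next index) are choices made for this illustration, not the stack interface of the Forth word.

```python
def parse(buffer, start, delimiter=" "):
    """Return the next delimited string and the index at which to resume."""
    i = start
    blank = delimiter == " "
    if blank:
        # 'skip': with a blank delimiter, pass over leading blanks and
        # control characters
        while i < len(buffer) and buffer[i] <= " ":
            i += 1
    begin = i
    # 'scan': advance to the next delimiter or to the end of the buffer
    while i < len(buffer):
        c = buffer[i]
        if (c <= " ") if blank else (c == delimiter):
            return buffer[begin:i], i + 1   # resume after the delimiter
        i += 1
    return buffer[begin:i], i
```

Calling parse repeatedly with the returned index walks through a line one word at a time, which is what TOKEN does with BL as the delimiter; passing a double quote as the delimiter picks up a string literal the way WORD is used by the string compiler.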

CRR .( Parsing ) CRR

:: (parse) ( b u c -- b u delta ; <string> )

tmp ! OVER >R DUP \ b u u

IF 1- tmp @ BL =

IF \ b u' \ 'skip'

FOR BL OVER @ - 0< NOT

WHILE 1+

NEXT ( b) R> DROP 0 LIT DUP EXIT \ all delim

THEN R>

THEN OVER SWAP \ b' b' u' \ 'scan'

FOR tmp @ OVER @ - tmp @ BL =

IF 0< THEN WHILE 1+

NEXT DUP >R

ELSE R> DROP DUP 1+ >R

THEN OVER - R> R> - EXIT

THEN ( b u) OVER R> - ;;

:: PARSE ( c -- b u ; <string> )

>R TIB >IN @ +

#TIB @ >IN @ -

R> (parse) >IN +! ;;

:: TOKEN ( -- a ;; <string> )

BL PARSE 1F LIT MIN 2DUP

DUP TEXT ! TEXT 1+ SWAP CMOVE

HERE 1+ PACK$ -;'

:: WORD ( c -- a ; <string> )

PARSE HERE 1+ PACK$ -;'

14.10 Dictionary Search

'find' follows the linked list in the dictionary, and compares the names of each compiled word with the packed string stored at address a. va points to the starting name field of the dictionary. If a match is found, it returns the execution address (code field address) and the name field address of the matching word in the dictionary. If it failed to find a match, it returns the address of the packed string and a 0 for a false flag.

'find' runs through the dictionary very quickly, because it compares the length and the first two characters in the names. Most Forth words are unique in these three characters. For words with the same lengths and identical first two characters, 'find' calls SAME? to determine whether the remaining characters of the packed strings match.

NAME> converts a name field address na to a code field address xt.

NAME? searches the dictionary for the string at address a, starting from the top of the dictionary. The name field address of the last word stored in the dictionary is maintained in the variable CONTEXT. This is where the dictionary search begins.
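
The two-stage comparison can be sketched in Python. The dictionary is modelled as a list whose last element is the newest word, and each entry is a small dict; this data layout, the helper pack, and the names entry and find are assumptions made for illustration. The packed format assumed here places the count byte and the first two characters in the first word.

```python
LEXICON_MASK = 0x3FFFFF     # strips the two lexicon bits from the first word

def pack(text: bytes) -> list:
    """Count byte plus text, three bytes per 24-bit word, null-filled."""
    data = bytes([len(text)]) + text
    data += b"\x00" * (-len(data) % 3)
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data), 3)]

def entry(name: bytes, xt: int, flags: int = 0) -> dict:
    """Hypothetical dictionary entry: packed name with lexicon bits, plus xt."""
    words = pack(name)
    words[0] |= flags
    return {"name": words, "xt": xt}

def find(name: bytes, entries: list):
    """Search newest-first; compare the first word, then the remainder."""
    target = pack(name)
    for candidate in reversed(entries):
        packed = candidate["name"]
        if (packed[0] & LEXICON_MASK) != target[0]:
            continue                    # length or first two characters differ
        if packed[1:] == target[1:]:    # the job done by SAME?
            return candidate
    return None
```

Because the search runs from the newest entry backward, a redefinition hides the older word of the same name.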

CRR .( Dictionary Search ) CRR

:: NAME> ( na -- xt )

DUP @ $3FFFFF LIT AND

$30000 LIT / + 1+ ;;

:: SAME? ( a a u -- a a f \ -0+ )

$30000 LIT /

FOR AFT OVER R@ + @

OVER R@ + @ - ?DUP

IF R> DROP EXIT THEN

THEN NEXT

0 LIT ;;

:: find ( a va -- xt na | a F )

SWAP \ va a

DUP @ tmp ! \ va a \ get cell count

DUP @ >R \ va a \ count

1+ SWAP \ a' va

BEGIN @ DUP \ a' na na

IF DUP @ $3FFFFF LIT AND

R@ XOR \ ignore lexicon bits

IF 1+ -1 LIT

ELSE 1+ tmp @ SAME?

THEN

ELSE R> DROP SWAP 1- SWAP EXIT \ a F

THEN

WHILE 1- 1- \ a' la

REPEAT R> DROP SWAP DROP

1- DUP NAME> SWAP ;;

:: NAME? ( a -- xt na | a F )

CONTEXT find -;'

14.11 Terminal Input

^H processes the Back Space encountered in the input stream. It backs up the character pointer and erases the character preceding the Back Space.

TAP echoes an input character and deposits it into the terminal input buffer.

kTAP detects a Carriage Return to terminate the input stream. It also calls ^H to process a Back Space, and TAP to process ordinary characters. These words allow the interpreter to handle a human user on the terminal smoothly and in a friendly manner.

CRR .( Terminal ) CRR

:: ^H ( b b b -- b b b ) \ backspace

>R OVER R> SWAP OVER XOR

IF ( =BkSp ) 8 LIT EMIT

1- BL EMIT \ destructive

( =BkSp ) 8 LIT EMIT \ backspace

THEN ;;

:: TAP ( bot eot cur c -- bot eot cur )

DUP EMIT OVER ! 1+ ;;

:: kTAP ( bot eot cur c -- bot eot cur )

DUP ( =Cr ) 0D LIT XOR

IF ( =BkSp ) 8 LIT XOR

IF BL TAP ELSE ^H THEN

EXIT

THEN DROP SWAP DROP DUP ;;

QUERY accepts a line of characters typed in by the user and puts them in the terminal input buffer for interpreting or compiling. The line is terminated at the 80th input character or by a Carriage Return.

'accept' waits for input characters and places them in the terminal input buffer at b with length u. It returns the same buffer address b with the length of the character string actually received.

EXPECT receives the input stream and stores the length in the variable SPAN.
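
The line-editing behaviour of these words can be sketched in Python. A string of key codes stands in for successive calls to KEY, and the function returns both the accepted line and what would have been echoed to the terminal; the name accept is reused from the text, but this signature is an assumption made for illustration.

```python
def accept(keys, limit=80):
    """Collect one line; returns (line, echoed characters)."""
    line, echoed = [], []
    for c in keys:
        if len(line) == limit:          # buffer full: line is terminated
            break
        code = ord(c)
        if 32 <= code < 127:            # printable: TAP stores and echoes it
            line.append(c)
            echoed.append(c)
        elif code == 13:                # Carriage Return ends the line
            break
        elif code == 8:                 # Back Space: ^H, unless at line start
            if line:
                line.pop()
                echoed += ["\b", " ", "\b"]
        else:                           # other control codes become a blank
            line.append(" ")
            echoed.append(" ")
    return "".join(line), "".join(echoed)
```

The backspace-blank-backspace sequence is what erases the character on the screen: the cursor steps back, overwrites the character with a blank, and steps back again.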

CRR

:: accept ( b u -- b u )

OVER + OVER

BEGIN 2DUP XOR

WHILE KEY DUP BL - 5F LIT U<

IF TAP ELSE kTAP THEN

REPEAT DROP OVER - ;;

:: EXPECT ( b u -- ) accept SPAN ! DROP ;;

:: QUERY ( -- )

TIB 50 LIT accept #TIB !

DROP 0 LIT >IN ! ;;

14.12 Error Handling Words

ABORT actually executes QUIT, which is defined much later. Here it is defined as a vectored execution word which gets the execution address in the system variable 'ABORT. This mechanism also gives the user some flexibility in how the application should handle an error condition.

abort" aborts after displaying a warning message if the flag on the stack is true. Otherwise, it skips the message and continues.

ERROR prints the character string stored in the TEXT buffer before aborting. The TEXT buffer contains the word just parsed out of the input stream. This is the word which the interpreter/compiler fails to recognize. The natural error message is the name of this word followed by a ? mark.

CRR .( Error handling ) CRR

:: ABORT ( -- ) 'ABORT @EXECUTE ;;

:: abort" ( f -- )

IF do$ COUNT TYPE ABORT THEN do$ DROP ;;

:: ERROR ( a -- )

SPACE TEXT COUNT TYPE

$3F LIT EMIT CR ABORT

14.13 Text Interpreter

$INTERPRET interprets the word just parsed out of the input stream. It searches the dictionary for this word. If a match is found, it executes the word, unless the word is marked as a compile-only word. If a match is not found in the dictionary, it converts the word into a number. If successful, the number is left on the data stack. If not successful, it exits with ERROR.

[ activates the text interpreter by storing the execution address of $INTERPRET into the variable 'EVAL, which is executed in EVAL while the text interpreter is in the interpretive mode.

.OK prints the familiar 'ok' prompt after executing to the end of a line. 'ok' is printed only when the text interpreter is in the interpretive mode. While compiling, the prompt is suppressed.

EVAL is the interpreter loop which parses words from the input stream and invokes whatever is in 'EVAL to handle that word, either execute it with $INTERPRET or compile it with $COMPILE.

QUIT is the operating system, or shell, of the eForth system. It is an infinite loop that eForth never leaves. It uses QUERY to accept a line of commands from the terminal and then lets EVAL parse out the words and execute them. After a line is processed, it displays 'ok' and waits for the next line of commands. When an error occurs during execution, it displays the command that caused the error, followed by an error message.

Because the behavior of EVAL can be changed by storing either $INTERPRET or $COMPILE into 'EVAL, QUIT exhibits the dual nature of a text interpreter and a compiler.

CRR .( Interpret ) CRR

:: $INTERPRET ( a -- )

NAME? ?DUP

IF @ 400000 LIT AND

ABORT" $LIT compile only" EXECUTE EXIT

THEN DROP TEXT NUMBER?

IF EXIT THEN ERROR

:: [ ( -- )

forth' $INTERPRET >body forth@ LIT 'EVAL !

;; IMMEDIATE

:: .OK ( -- )

forth' $INTERPRET >body forth@ LIT 'EVAL @ =

IF ."| $LIT OK" CR

THEN ;;

:: EVAL ( -- )

BEGIN TOKEN DUP @

WHILE 'EVAL @EXECUTE \ ?STACK

REPEAT DROP .OK -;'

CRR .( Shell ) CRR

:: QUIT ( -- )

( =TIB) $730 LIT 'TIB !

[ BEGIN QUERY EVAL AGAIN

14.14 Compiler

After wading through the text interpreter, the Forth compiler is a piece of cake, because the compiler uses almost all the modules used by the text interpreter. What the compiler does, over and above the text interpreter, is to build the various structures required by the new words we want to add to the existing system. Here is a list of these structures:

Name headers

Colon definitions

Constants

Variables

Integer literals

String literals

Address literals

Control structures

The concept of immediate words is difficult to grasp at first. It is required in the compiler because of the need to build different data and control structures in a colon definition. To understand the Forth compiler fully, you have to be able to differentiate and relate the actions taken at compile time and the actions taken at execution time. Once these concepts are clear, the whole Forth system will become fairly transparent.

Here is a group of words which support the compiler to build new words in the code dictionary.

' (tick) searches the dictionary for the next word in the input stream. It returns the execution address of the word if successful. Otherwise, it displays an error message.

ALLOT allocates n bytes of memory on the top of the code dictionary. Once allocated, the compiler will not touch the memory locations.

, (comma) adds the execution address of a word on the top of the data stack to the code dictionary, and thus compiles a word to the growing word list of the word currently under construction.

COMPILE is used in a colon definition. It causes the next word after COMPILE to be added to the top of the code dictionary. It therefore forces the compilation of a word at the run time.

[COMPILE] acts similarly, except that it compiles the next word immediately. It causes the following word to be compiled, even if the following word is an immediate word which would otherwise be executed.

LITERAL compiles an integer literal to the current colon definition under construction. The integer literal is taken from the data stack, and is preceded by the word doLIT. When this colon definition is executed, doLIT will extract the integer from the word list and push it back on the data stack. LITERAL compiles an address literal if the compiled integer happens to be an execution address of a word. The address will be pushed on the data stack at the run time by doLIT.

$," compiles a string literal. The string is taken from the input stream and is terminated by the double quote character. $," only copies the counted string to the code dictionary. A word which makes use of the counted string at the run time must be compiled before the string. It is used by ." and $".

CRR .( Compiler Primitives ) CRR

:: ' ( -- xt )

TOKEN NAME? IF EXIT THEN

ERROR

:: ALLOT ( n -- ) CP +! ;;

:: , ( w -- ) HERE DUP 1+ CP ! ! ;;

:: [COMPILE] ( -- ; <string> )

' $100000 LIT OR , -;' IMMEDIATE

CRR

:: COMPILE ( -- ) R> DUP @ , 1+ >R ;;

:: LITERAL $29E79E LIT , ,

-;' IMMEDIATE

:: $," ( -- ) ( CHAR " )

22 LIT WORD @ 1+ ALLOT -;'

?UNIQUE displays a warning message when the name of a new word duplicates that of a word already in the dictionary. eForth does not mind your reusing the same name for different words. However, giving many words the same name is a potential cause of problems in maintaining software projects. It is to be avoided if possible, and ?UNIQUE reminds you of it.

$,n builds a new entry in the name dictionary using the name already moved to the bottom of the name dictionary by PACK$. It pads the word field with the address of the top of the code dictionary where the new code is to be built, and links the link field to the current vocabulary. A new word can now be built in the code dictionary.

CRR .( Name Compiler ) CRR

:: ?UNIQUE ( a -- a )

DUP NAME?

IF TEXT COUNT TYPE ."| $LIT reDef "

THEN DROP ;;

:: $,n ( a -- )

DUP @

IF ?UNIQUE

( na) DUP DUP NAME> CP !

( na) DUP LAST ! \ for OVERT

( na) 1-

( la) CONTEXT @ SWAP ! EXIT

THEN ERROR

$COMPILE compiles the word just parsed out of the input stream. It searches the dictionary for this word. If a match is found, it compiles the word as a subroutine call, unless the word is marked as an immediate word. An immediate word is executed by the compiler. If a match is not found in the dictionary, it converts the word into a number. If successful, the number is compiled as a literal. If not successful, it exits with ERROR.
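
The two lexicon bits decide what the interpreter and the compiler do with a word once it is found. The Python sketch below restates that decision. The bit values are read off the $INTERPRET and $COMPILE listings, assuming the unprefixed constant 400000 is hexadecimal; the function names and the returned strings are invented for illustration.

```python
IMMEDIATE = 0x800000        # bit tested by $COMPILE
COMPILE_ONLY = 0x400000     # bit tested by $INTERPRET

def interpreter_action(first_name_word):
    """What $INTERPRET does with a word that was found in the dictionary."""
    return "abort" if first_name_word & COMPILE_ONLY else "execute"

def compiler_action(first_name_word):
    """What $COMPILE does with a word that was found in the dictionary."""
    return "execute" if first_name_word & IMMEDIATE else "compile call"
```

A word such as ; carries the immediate bit, so the compiler executes it instead of compiling a call to it; that is how it gets the chance to end the definition.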

OVERT links a new definition to the current vocabulary and thus makes it available for dictionary searches.

; terminates a colon definition. It compiles an RET to the end of the word list, links this new word to the current vocabulary, and then reactivates the interpreter.

] turns the interpreter to a compiler.

: creates a new header and starts a new colon word. It takes the following string in the input stream to be the name of the new colon definition, and builds a new header with this name in the name dictionary. Now the code dictionary is ready to accept a word list. ] is then invoked to turn the text interpreter into a compiler, which compiles the following words in the input stream into a list of subroutine calls in the dictionary. The new colon definition is terminated by ;, which compiles an RET to terminate the word list and executes [ to turn the compiler back into the text interpreter.

CRR .( FORTH Compiler ) CRR

:: $COMPILE ( a -- )

NAME? ?DUP

IF @ $800000 LIT AND

IF EXECUTE

ELSE $3FFFF LIT AND $100000 LIT OR ,

THEN EXIT

THEN DROP TEXT NUMBER?

IF LITERAL EXIT

THEN ERROR

:: OVERT ( -- ) LAST @ CONTEXT ! ;;

:: ; ( -- )

$5E79E LIT , [ OVERT -;' IMMEDIATE

:: ] ( -- )

forth' $COMPILE >body forth@ LIT 'EVAL ! ;;

:: : ( -- ; <string> )

TOKEN $,n ] -;'

With “:” thus defined, the eForth system is essentially complete. It runs generally as a text interpreter. When “:” is encountered, it compiles a new word and adds it to the existing system. This is Forth.

14.15 Debugging Tools

eForth provides a set of very powerful tools to help users debug their programs. Since most Forth words can be executed interactively under the interpreter, there is no need to set up break points for tracing a complicated program. One simply executes the component words sequentially and examines the stack and memory to determine whether the words behave properly.

What it does provide are:

DUMP to dump the contents of a range of memory.

WORDS to dump the names of words in the dictionary.

.S to dump the contents of the data stack.

SEE to decompile a colon word.

DUMP dumps u words starting at address b to the terminal. It dumps 8 words to a line. A line begins with the address of the first word, followed by 8 words shown in hex, 7 columns per word.

dm+ displays u words from b1 in one line. It leaves the address b1+u on the stack for the next dm+ command to use.

_TYPE is similar to TYPE. It displays u characters starting from b. Non-printable characters are replaced by underscores.

CRR .( Tools ) CRR

:: dm+ ( b u -- b )

OVER 7 LIT U.R SPACE

FOR AFT DUP @ 7 LIT U.R 1+

THEN NEXT ;;

:: DUMP ( b u -- )

BASE @ >R HEX 8 LIT /

FOR AFT CR 8 LIT dm+

THEN NEXT DROP R> BASE ! ;;
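The line layout produced by DUMP and dm+ can be illustrated with a small Python model. It assumes memory is a plain list of cells and that output is hexadecimal, as set by HEX in the listing; the function names `dm_plus` and `dump` are hypothetical.

```python
def dm_plus(memory, b, u):
    """Format one line: the address, a space, then u cells in 7 columns each."""
    line = f"{b:7X} "
    for offset in range(u):
        line += f"{memory[b + offset]:7X}"
    return line, b + u  # b + u is the address for the next line


def dump(memory, b, u):
    """Return the lines DUMP would print: u // 8 lines of 8 cells each."""
    lines = []
    for _ in range(u // 8):
        line, b = dm_plus(memory, b, 8)
        lines.append(line)
    return lines


memory = list(range(0x100, 0x120))  # 32 hypothetical cells
for line in dump(memory, 0, 16):
    print(line)
```

Because the count is divided by 8 before the loop, a request that is not a multiple of 8 words is rounded down to whole lines in this model.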

WORDS allows you to examine the dictionary and to look for the correct names of words in case you are not sure of their spellings. WORDS follows the vocabulary thread in the user variable CONTEXT and displays the name of each entry in the name dictionary. The vocabulary thread can be traced easily because the link field in the header of a word points to the name field of the previous word. The link field of the next word is one cell below its name field.

WORDS displays all the names in the context vocabulary. The order of words is reversed from the compiled order. The most recently defined word is shown first.

.ID displays the name of a word, given the word's name field address. It also replaces non-printable characters in a name by underscores.

Since the name fields are linked into a list in the name dictionary, it is fairly easy to locate a word by searching for its name in the name dictionary. However, finding the name of a word from the execution address of the word is more difficult, because the execution addresses of words are not organized in any systematic way.

It is necessary to find the name of a word from its execution address if we want to decompile the contents of a word list in the code dictionary. This reverse search is accomplished by the word >NAME.

>NAME finds the name field address of a word from the execution address of the word. If the word does not exist in the context vocabulary, it returns a false flag. It is the mirror image of the word NAME>, which returns the execution address of a word from its name address. Since the execution address of a word is stored in the word field, two cells below the name, NAME> is trivial. >NAME is more complicated because the entire name dictionary must be searched to locate the word. >NAME only searches the vocabulary threaded from CONTEXT.
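The thread walk shared by WORDS and >NAME can be sketched in Python. The header layout follows the description above (link field one cell below the name field, execution address two cells below), but memory as a dictionary, names stored as whole Python strings, and the addresses used are assumptions made for illustration only.

```python
def name_to_xt(mem, na):
    """NAME>: the execution address is stored two cells below the name field."""
    return mem[na - 2]


def words(mem, context):
    """Follow the thread from CONTEXT, newest word first."""
    names = []
    na = mem[context]
    while na:
        names.append(mem[na])  # name text, simplified to a Python string
        na = mem[na - 1]       # link field, one cell below the name field
    return names


def to_name(mem, context, xt):
    """>NAME: search the whole thread for xt; return na, or 0 if absent."""
    na = mem[context]
    while na:
        if name_to_xt(mem, na) == xt:
            return na
        na = mem[na - 1]
    return 0


CONTEXT = 1  # hypothetical address of the CONTEXT variable
mem = {
    CONTEXT: 30,
    18: 0x100, 19: 0, 20: "DUP",     # oldest word: link is 0
    28: 0x200, 29: 20, 30: "SWAP",   # newest word: link points to DUP
}
print(words(mem, CONTEXT))
print(to_name(mem, CONTEXT, 0x100))
```

The forward lookup is a single fetch, while the reverse lookup may visit every header, which is why >NAME is the more expensive of the two.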

SEE searches the dictionary for the next word in the input stream and returns its code field address. Then it scans the list of subroutine calls (words) in the colon definition. If the address of the subroutine matches the execution address of a word in the name dictionary, the name will be displayed by the command '.ID'. If the word does not match any subroutine in the dictionary, it must be part of a structure and it is displayed by 'U.'. This way, the decompiler ignores all the data structures and control structures in the colon definition, and only displays valid subroutine calls in the word list.

CRR

:: >NAME ( xt -- na | F )

CONTEXT

BEGIN @ DUP

WHILE 2DUP NAME> XOR

IF 1-

ELSE SWAP DROP EXIT

THEN

REPEAT SWAP DROP ;;

:: .ID ( a -- )

TEXT UNPACK$

COUNT $01F LIT AND TYPE SPACE -;'

CRR

:: SEE ( -- ; <string> )

' CR

BEGIN

20 LIT FOR

DUP @ DUP FC0000 LIT AND

DUP

IF 100000 LIT XOR THEN

IF U. SPACE

ELSE 3FFFF LIT AND >NAME

?DUP IF .ID THEN

THEN 1+

NEXT KEY 0D LIT = \ can't use ESC on terminal

UNTIL DROP ;;

:: WORDS ( -- )

CR CONTEXT

BEGIN @ ?DUP

WHILE DUP SPACE .ID 1-

REPEAT ;;
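The test that SEE applies to each cell in the listing above can be modelled in Python. The masks come from the listing; the `names_by_xt` mapping is a hypothetical replacement for >NAME and .ID, and printing the raw cell in hexadecimal is an assumption about the BASE in effect.

```python
OPCODE_MASK = 0xFC0000  # top six bits of a 24-bit cell
CALL_OPCODE = 0x100000  # pattern of a subroutine call
ADDR_MASK = 0x3FFFF     # address field of a call


def decompile_cell(cell, names_by_xt):
    """Return the text SEE would show for one cell ('' when nothing is shown)."""
    top = cell & OPCODE_MASK
    if top and top != CALL_OPCODE:
        return f"{cell:X}"  # not a call: show the raw cell, as U. does
    # A call (or a cell with no opcode bits): show the name if one matches.
    return names_by_xt.get(cell & ADDR_MASK, "")


names_by_xt = {0x123: "DUP"}
for cell in (0x100123, 0x5E79E, 0x100999):
    print(repr(decompile_cell(cell, names_by_xt)))
```

In this model a call to an address with no matching name prints nothing, which mirrors the `?DUP IF .ID THEN` phrase in the listing.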

The data stack is the working place of the Forth computer. It is where words receive their parameters and also where they leave their results. In debugging a newly defined word which uses stack items and which leaves items on the stack, the best way to check its function is to inspect the data stack. The number output words may be used for this purpose, but they are destructive. You print out the number from the stack and it is gone. To inspect the data stack non-destructively, a special utility word .S is provided in most Forth systems.

.S dumps the contents of the data stack on the screen in free format. The top of the stack is aligned to the left. .S does not change the data stack, so it can be used to inspect the data stack non-destructively at any time. As the P24 has a 16 level hardware data stack, and the stack pointer is not available to software, we really do not know how deep the stack is. We only know the top of the stack, and can push it or pop it. However, if we dump all 16 items out, the stack will come to rest at the same point as before the dump. Hence, .S dumps T and the 16 levels of the data stack, and the stack is preserved.

CODE .S ( dump all 17 stack items )

PAD sta stp

stp stp stp stp

DROP PAD $10 LIT

FOR DUP ? 1+ NEXT

DROP PAD @ CR -;'
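Why a full dump can leave the stack unchanged is easiest to see in a model. The Python sketch below assumes that the hardware stack behaves as a 16-entry circular buffer under the T register and that popping does not erase entries; both are assumptions drawn from the description above, not from the P24 hardware definition. It is a model of the idea, not a transcription of the CODE word.

```python
DEPTH = 16  # levels of hardware data stack below T


class RingStack:
    """Hypothetical model: register T on top of a 16-entry circular buffer."""

    def __init__(self, items):
        self.t = items[0]
        self.ring = list(items[1:DEPTH + 1])
        self.p = 0  # index of the entry just below T

    def pop(self):
        value = self.t
        self.t = self.ring[self.p]  # entries are read, not erased
        self.p = (self.p + 1) % DEPTH
        return value

    def push(self, value):
        self.p = (self.p - 1) % DEPTH
        self.ring[self.p] = self.t
        self.t = value

    def state(self):
        return self.t, self.p, tuple(self.ring)


def dot_s(stack):
    """Pop T and all 16 levels, then push the saved T back."""
    dumped = [stack.pop() for _ in range(DEPTH + 1)]
    stack.push(dumped[0])
    return dumped


stack = RingStack(list(range(100, 117)))  # T = 100, then 101..116
before = stack.state()
print(dot_s(stack))
print(stack.state() == before)
```

In the model, seventeen pops carry the pointer once around the ring and one step further, and the final push of the saved T brings it back to where it started.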

14.16 Start Up

After powering up, the P24 CPU starts executing the instruction at location 0. A small set of instructions following location 0 sets up the user variables necessary for the proper operation of the interpreter and the compiler. Then, it jumps to COLD and starts eForth.

COLD first executes DIAGNOSE, to help the hardware designer make sure that the CPU executes the most commonly used machine instructions correctly. This routine is very useful in verifying the CPU design in VHDL. A VHDL simulator can be invoked to trace through the instructions in DIAGNOSE.

DIAGNOSE executes a sequence of words to leave the ASCII code of ‘ForthMl’ on the data stack for the hardware designer to see.
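The arithmetic in the DIAGNOSE listing that follows can be replayed in Python to see which values end up on the stack. The sketch assumes 24-bit cells, that UM+ returns a sum and a carry, that 0< returns all ones for a negative number, and that FOR ... NEXT with count n runs the body n+1 times; the helper names are hypothetical.

```python
MASK = 0xFFFFFF  # 24-bit cell


def um_plus(a, b):
    """UM+ ( u u -- sum carry ) on 24-bit cells."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> 24


def zero_less(n):
    """0< : all ones if the sign bit is set, else zero."""
    return MASK if n & 0x800000 else 0


def diagnose():
    stack = [0x65]                                   # first line of the listing
    minus_one, _ = um_plus(zero_less(0), zero_less(-2 & MASK))
    s, carry = um_plus(minus_one, 3)                 # 2 with carry 1
    s, _ = um_plus(s, carry)                         # 3
    stack.append(um_plus(s, 0x43)[0])                # 'F'
    stack.append(((0x4F ^ 0x6F) & 0xF0) | 0x4F)      # 'o'
    a, b = 6, 8                                      # 8 6 SWAP leaves 6 8
    r = a & ((b ^ a) & 3)                            # OVER XOR 3 AND AND
    stack.append(um_plus(r, 0x70)[0])                # 'r'
    if 0:                                            # 0 IF: branch not taken
        stack.append(0x3F)
    stack.append(0x74 if MASK else 0x21)             # -1 IF ... ELSE ... THEN
    memory = {0x700: 0x68}                           # $68 $700 !
    stack.append(memory[0x700])                      # $700 @
    stack.append(0x4D & 0x4D)                        # >R R@ R> AND
    n = 1
    for _ in range(0x6A + 1):                        # FOR runs count + 1 times
        n, _ = um_plus(n, 1)
    stack.append(n)                                  # 'l'
    return stack


print("".join(chr(c) for c in diagnose()))
```

Under these assumptions the model ends with eight cells on the stack; the bottom one is the $65 pushed by the first line of the listing, which is the ASCII code for 'e'.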

COLD starts eForth by first displaying the sign-on message ‘P24 v1.02’, and then jumps to QUIT to start the Forth interpreter.

CRR .( Hardware reset ) CRR

:: DIAGNOSE ( -- )

$65 LIT

\ 'F' prove UM+ 0< \ carry, TRUE, FALSE

0 LIT 0< -2 LIT 0< \ 0 FFFF

UM+ DROP \ FFFF ( -1)

3 LIT UM+ UM+ DROP \ 3

$43 LIT UM+ DROP \ 'F'

\ 'o' logic: XOR AND OR

$4F LIT $6F LIT XOR \ 20h

$F0 LIT AND

$4F LIT OR

\ 'r' stack: DUP OVER SWAP DROP

8 LIT 6 LIT SWAP

OVER XOR 3 LIT AND AND

$70 LIT UM+ DROP \ 'r'

\ 't'-- prove BRANCH ?BRANCH

0 LIT IF $3F LIT THEN

-1 LIT IF $74 LIT ELSE $21 LIT THEN

\ 'h' -- @ ! test memory address

$68 LIT $700 LIT !

$700 LIT @

\ 'M' -- prove >R R> R@

$4D LIT >R R@ R> AND

\ 'l' -- prove 'next' can run

1 LIT $6A LIT FOR 1 LIT UM+ DROP NEXT

;;

CRR

:: COLD ( -- )

diagnose

CR ."| $LIT P24 v"

66 LIT <# # # ( CHAR . ) 2E LIT HOLD # #> TYPE

CR QUIT
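The version string in COLD is built with pictured numeric output, which produces digits from the least significant end. The Python sketch below models the phrase `<# # # 2E LIT HOLD # #>` applied to hex 66. It assumes BASE is decimal at that point, which is an inference from the sign-on message quoted above rather than something the listing states; `format_version` is a hypothetical name.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def format_version(n, base=10):
    """Model <# # # '.' HOLD # #> : two digits, a period, one more digit."""
    out = []  # characters are produced right to left

    def sharp():
        nonlocal n
        n, digit = divmod(n, base)
        out.append(DIGITS[digit])

    sharp()
    sharp()
    out.append(".")  # 2E LIT HOLD inserts a period
    sharp()
    return "".join(reversed(out))


print("P24 v" + format_version(0x66))
```

Hex 66 is decimal 102, so the two low digits and the remaining digit fall on either side of the period.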

14.17 Control Structure Words

This is the set of compiler words which allows the user to build control structures in a colon word. The structures include:

Conditional:

IF … THEN

IF … ELSE … THEN

Finite loop:

FOR … NEXT

FOR … AFT … THEN … NEXT

Infinite loop:

BEGIN … AGAIN

Indefinite loop:

BEGIN … UNTIL

BEGIN … WHILE … REPEAT

These compiler directives are not compiled into a colon word like regular Forth words. Instead, they compile machine instructions like JZ, JMP, doNEXT, and >R into the colon word, with the proper address information, so that the control structures behave properly when the colon word is executed. All these words are ‘IMMEDIATE’ words, which are executed, not compiled, in colon words.

CRR .( Structures ) CRR

:: IF ( -- A ) HERE $80000 LIT , -;' IMMEDIATE

:: FOR ( -- a ) $71E79E LIT , HERE -;' IMMEDIATE

:: BEGIN ( -- a ) HERE -;' IMMEDIATE

:: AHEAD ( -- A ) HERE 0 LIT , -;' IMMEDIATE

CRR

:: AGAIN ( a -- ) , -;' IMMEDIATE

:: THEN ( A -- ) HERE SWAP +! ;; IMMEDIATE

:: NEXT ( a -- ) COMPILE doNEXT , -;' IMMEDIATE

:: UNTIL ( a -- ) $80000 LIT + , -;' IMMEDIATE

CRR

:: REPEAT ( A a -- ) AGAIN THEN -;' IMMEDIATE

:: AFT ( a -- a A ) DROP AHEAD BEGIN SWAP ;; IMMEDIATE

:: ELSE ( A -- A ) AHEAD SWAP THEN -;' IMMEDIATE

:: WHILE ( a -- A a ) IF SWAP ;; IMMEDIATE
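The way these words cooperate through addresses left on the stack can be modelled in Python. The sketch keeps only the branch words; FOR and NEXT are omitted. The constant $80000 is taken from the listing and is treated here as the conditional-jump pattern, while the bare address compiled by AGAIN and AHEAD is assumed to act as an unconditional jump. The class and method names are hypothetical, and addresses are passed as arguments instead of on a data stack.

```python
JZ = 0x80000  # pattern compiled by IF and UNTIL in the listing


class Compiler:
    """Hypothetical model of the code dictionary and its HERE pointer."""

    def __init__(self, here=0x100):
        self.mem = {}
        self.here = here

    def comma(self, value):
        self.mem[self.here] = value
        self.here += 1

    def if_(self):               # IF ( -- A ): branch with address to fill in
        a = self.here
        self.comma(JZ)
        return a

    def ahead(self):             # AHEAD ( -- A ): jump with address to fill in
        a = self.here
        self.comma(0)
        return a

    def then(self, a):           # THEN ( A -- ): add HERE into the open cell
        self.mem[a] += self.here

    def begin(self):             # BEGIN ( -- a )
        return self.here

    def again(self, a):          # AGAIN ( a -- )
        self.comma(a)

    def until(self, a):          # UNTIL ( a -- )
        self.comma(JZ + a)

    def else_(self, a):          # ELSE ( A -- A ): AHEAD SWAP THEN
        b = self.ahead()
        self.then(a)
        return b

    def while_(self, a):         # WHILE ( a -- A a ): IF SWAP
        return self.if_(), a

    def repeat(self, big_a, a):  # REPEAT ( A a -- ): AGAIN THEN
        self.again(a)
        self.then(big_a)


c = Compiler()
fwd = c.if_()     # cell 0x100 holds JZ with no target yet
c.comma(0x111)    # stand-in for the body of the IF clause
c.then(fwd)       # target becomes 0x102
print(hex(c.mem[0x100]))
```

Forward references are compiled with a zero address and patched by THEN, while backward references are complete as soon as they are compiled because BEGIN has already recorded the target.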

14.18 Redefine Macro Words

As many Forth words are actually P24 machine instructions, the P24 Metacompiler tries its best to assemble machine instructions instead of compiling subroutine calls. Macros were defined, as shown in Section 6.2, to produce optimized code in the eForth system.

However, the macros are only tools in the metacompiler, and are not available in the target system. The end users still need all these Forth words for interpreting and compiling. These words must be included in the final system as ordinary Forth words. They are defined here.

CRR .( macro words ) CRR

CODE EXIT pop drop ret

CODE EXECUTE push ret

CODE ! sta st ret

CODE @ sta ld ret

CRR

CODE R> pop sta pop lda push ret

CODE R@ pop sta pop dup push lda push ret

CODE >R sta pop push lda ret

CRR

CODE SWAP

push sta pop lda ret

CODE OVER

push dup sta pop

lda ret

CODE 2DROP

drop drop ret

CRR

CODE + add ret

CODE NOT com ret

CODE NEGATE

com 1 ldi add ret

CODE 1-

-1 ldi add ret

CODE 1+

1 ldi add ret

CRR

CODE BL

20 ldi ret

CODE +!

sta ld add st

ret

CODE -

com add 1 ldi add

ret

CRR

CODE DUP dup ret

CODE DROP drop ret

CODE AND and ret

CODE XOR xor ret

CODE COM com ret
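NEGATE and - in the listing above are built from com and add, which is the usual two's complement identity. A short Python check on 24-bit cells illustrates it; the function names are hypothetical and the 24-bit mask is the only assumption.

```python
MASK = 0xFFFFFF  # 24-bit cell


def com(x):
    """One's complement, as the com instruction."""
    return ~x & MASK


def add(a, b):
    """24-bit addition, as the add instruction."""
    return (a + b) & MASK


def negate(n):
    """NEGATE: com 1 ldi add"""
    return add(com(n), 1)


def minus(a, b):
    """- : com add 1 ldi add"""
    return add(add(a, com(b)), 1)


print(hex(negate(1)), minus(5, 3), hex(minus(3, 5)))
```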

14.19 Final System Words

ABORT" compiles an error message. This error message is displayed when the top item on the stack is non-zero. The rest of the words in the definition are skipped and eForth re-enters the interpreter loop. This is the universal response to an error condition.

." packs and compiles a character string literal which will be printed when the word containing it is executed at run time.

$" packs and compiles a character string literal. When it is executed, only the address of the unpacked string is left on the data stack. The programmer will use this address to access the string and individual characters in the string as a string array.

CODE starts a new word containing machine code mnemonics.

CREATE defines an array in memory. Its size must be specified by ALLOT.

VARIABLE defines a variable in memory. Its initial value is 0.

( starts a comment like ( this is a comment. ) The string until and including ) is ignored.

\ starts a comment line until the next end-of-line.

.( starts a comment which is printed on the terminal. The string up to but not including ) is printed.

IMMEDIATE marks the word last defined as ‘immediate’. Immediate words are not compiled in a colon word. They are executed to build control structures.

CRR

:: ABORT" ( -- ; <string> ) COMPILE abort" $," ;; IMMEDIATE

:: $" ( -- ; <string> ) COMPILE $"| $," ;; IMMEDIATE

:: ." ( -- ; <string> ) COMPILE ."| $," ;; IMMEDIATE

:: CODE ( -- ; <string> ) TOKEN $,n OVERT -;'

:: CREATE ( -- ; <string> ) CODE doVAR ;;

:: VARIABLE ( -- ; <string> ) CREATE 0 LIT , -;'

CRR

:: .( ( -- ) 29 LIT PARSE TYPE -;' IMMEDIATE

:: \ ( -- ) #TIB @ >IN ! ;; IMMEDIATE

:: ( 29 LIT PARSE 2DROP ;; IMMEDIATE

:: IMMEDIATE $800000 LIT LAST @ @ OR LAST @ ! ;;

CRR
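The three comment words differ only in how they move the input pointer and in whether they print. The Python sketch below models them on a single input line, starting from the position just after the comment word itself has been parsed. The class `Input`, its attribute names, and the behaviour when no closing parenthesis is found are assumptions for illustration.

```python
class Input:
    """Hypothetical model of the terminal input buffer and the >IN pointer."""

    def __init__(self, text):
        self.tib = text
        self.to_in = 0  # models >IN

    def parse(self, delimiter):
        """Return text up to the delimiter and step past the delimiter."""
        end = self.tib.find(delimiter, self.to_in)
        if end < 0:                       # no delimiter: take the rest
            end = len(self.tib)
            text = self.tib[self.to_in:end]
            self.to_in = end
        else:
            text = self.tib[self.to_in:end]
            self.to_in = end + 1
        return text

    def paren(self):
        """( : parse to ')' and discard the text."""
        self.parse(")")

    def dot_paren(self):
        """.( : parse to ')' and return the text that would be typed."""
        return self.parse(")")

    def backslash(self):
        """Backslash: move >IN to the end of the line (#TIB @ >IN !)."""
        self.to_in = len(self.tib)


line = Input(" this is a comment ) DUP")
line.paren()
print(repr(line.tib[line.to_in:]))
```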