cmFORTH FOR RTX2000 FORTH ENGINE
CONTENTS
1. INTRODUCTION
    1.1. cmForth For Novix NC4000
    1.2. cmForth For Harris RTX2000
    1.3. Available Literature
2. SPECIAL FEATURES IN cmFORTH
    2.1. Direct Threaded Code
    2.2. Dual Vocabulary Structure
    2.3. Optimizing Compiler
    2.4. Meta-Compiler
    2.5. Native Control Structures
    2.6. Math Step Instructions
    2.7. Multiple Function Single Cycle Instructions
    2.8. Auto-Indexing Memory Instructions
    2.9. Terminal Interface and Serial Disk
3. ADAPTING cmFORTH TO RTX2000
    3.1. Indelko Implementation of cmForth
    3.2. Motorola Style Byte Orientation
    3.3. Single Cycle Multiply
    3.4. Accessing ASIC Bus
    3.5. Timers
    3.6. Interrupt Controller
    3.7. Internal Stacks
4. INDELKO FORTH KIT
    4.1. Hardware Design
    4.2. cmForth in ROM
    4.3. Debugging Forth Kit
    4.4. Host Interface
    4.5. Serial Disk
5. PROGRAMMING IN cmFORTH
    5.1. cmForth Source Code
    5.2. Utilities
    5.3. Double and Quad Math Words
    5.4. Multiple Precision Math Words
Listing 1. RTX cmForth Source Listing
Listing 2. cmForth Utilities
Listing 3. Double and Quad Math Words
Listing 4. Multiple Precision Math Words
1. INTRODUCTION
Forth was invented by Charles H. Moore in the late 1960s as a tool to make himself more productive. He developed it into a programming language, and it was adopted by the International Astronomical Union as the language for observatory automation because Mr. Moore used it to program many telescopes. Forth thus spread to many distant countries along with the telescopes. Forth evolved into a general purpose programming language after Mr. Moore and his colleagues formed Forth, Inc. to market Forth in the forms of microForth, imageForth, and finally polyForth, together with programming services based on Forth.
However, widespread enthusiasm for Forth was generated by a non-profit organization, the Forth Interest Group (FIG), in the San Francisco area. FIG released a public domain version of Forth implemented on several microprocessors, including the 8080, 6800, 6502, PDP-11, PACE, and 9900. This became well known as the figForth model.
Because Forth is simple, terse, and built on modular, high level constructs, it can be easily ported to different CPUs. In fact, Forth is claimed to exist on every commercial CPU. However, Mr. Moore was not entirely satisfied with implementing Forth as a software layer on top of CPUs that were not optimized to take advantage of the best features of the Forth language, such as the dual stack architecture, the threaded address lists, and the loop and control structures. In the early 1980s, Mr. Moore left Forth, Inc. and started to design a microprocessor optimized for the Forth language. The chip was built by Novix, Inc. and released as NC4000 in 1985.
NC4000 was a very powerful chip. It could execute several Forth instructions in a single machine cycle, using an 'external microcode' instruction set. However, Novix released NC4000 too soon, before many of its bugs were fixed.
Instead of fixing the bugs in NC4000, Novix embarked on a more ambitious project to redesign it into a much more versatile chip, NC6000, while marketing NC4000 as a real chip. The NC6000 effort was equally unsuccessful, and sealed the fate of Novix.
Many Forth programmers were excited by NC4000, in spite of its shortcomings. An NC4000 Users Group was organized in the Silicon Valley Chapter of the Forth Interest Group to explore NC4000 applications and to distribute information and source code on NC4000. A newsletter, 'More on NC4000', is published quarterly.
Harris Semiconductor, as the leading producer of CMOS logic for high reliability applications, was building its ASIC capability to supply customized ASICs. It acquired the patents on NC4000/6000 from Novix and included this design as a CPU core in its cell library. The NC4000 design fits perfectly into this strategy because of its small size and high execution speed. Harris first built a chip set consisting of a CPU core, two stack controllers, a single cycle multiplier, and an interrupt controller. Later, this chip set was merged into a single microprocessor, RTX2000. RTX stands for Real Time Express.
Many Forth languages and development systems support RTX2000. Harris sells an RTX Development Board with supporting software. Silicon Composers also builds boards based on RTX2000 with its own software development system. Forth, Inc. has a version of polyForth for RTX.
Mr. Thor-Bjorn Bladh of Indelko in Lund, Sweden has been an active member of the NC4000 Users Group. He modified the cmForth on NC4000 to cross-compile a cmForth that runs on an RTX2000 system. He generously released this version of cmForth to the public domain. He also builds kits for people who like to experiment with RTX chips. cmForth is part of this kit.
1.1. cmForth For Novix NC4000
After Mr. Moore designed the NC4000 chip, he was the first person to receive prototype chips for evaluation and experimentation. He built a small PC board to host an NC4000 chip, some memory, and a few TTL glue chips. He modified polyForth and wrote a small yet complete Forth system for NC4000. This Forth went through many revisions in 1985 as Mr. Moore designed and revised his ForthKit (originally called the Gamma Board), which was distributed through his company, Computer Cowboys. Silicon Composers, then called Software Composers, was also building small single board computers based on NC4000.
Mr. Moore kindly donated his Forth to the public domain to encourage people to build NC4000 computers and develop applications. He called this Forth cmForth, which stands for Chuck Moore's Forth. It was the Forth used by Silicon Composers in their first product, the Delta Board.
cmForth is a gem. It occupies about 6 Kbytes, and the source code is less than 30 screens. It summarizes Mr. Moore's programming experience over the prior 20 years. Many innovative ideas are incorporated in it, including an optimizing compiler, immediate vocabulary search, efficient target compilation, a serial disk, and down-counting loops. It is required reading for any serious Forth programmer. NC4000 and cmForth were thoroughly discussed in C. H. Ting's book 'Footsteps in an Empty Valley', published by Offete Enterprises, Inc.
1.2. cmForth For Harris RTX2000
RTX2000 is very similar to NC4000 in its internal architecture and in the general design of its instruction set. However, the instruction sets are sufficiently different that code cannot be ported without a specialized cross-compiler. Porting cmForth from NC4000 to RTX2000 is thus no trivial task.
Mr. Bladh did a marvelous job of this port. He developed a cross-compiler, running on an NC4000 machine, to recompile cmForth into RTX code. cmForth was rewritten so that it could emit the correct RTX code. Once the RTX system was running, Mr. Bladh developed another version of cmForth which can be recompiled by the RTX machine itself. This cmForth can then be modified or extended by users on their own RTX Forth machines.
Mr. Bladh preserved as many features of the original cmForth as he possibly could. Most high level words were not modified at all. However, low level words which deal directly with the RTX machine had to be modified to suit the new environment.
One of the worst problems encountered in programming NC4000 is that it is a word oriented machine, and it is very difficult to access the upper byte of a 16 bit word. Most Forth systems assume that memory is addressable by bytes. Therefore, it is awkward to port existing Forth applications to NC4000 cmForth. RTX2000 has hardware to do byte swapping during memory accesses, and it can treat memory as bytes or as words. In byte addressing, RTX can use either the Intel (little-endian) or the Motorola (big-endian) byte order. Mr. Bladh chose to address memory in the Motorola byte order, which makes it easy to reach any byte in the 64 Kbyte address space.
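The byte addressing described above can be illustrated with a small model. This is only a sketch in Python, not cmForth code: memory is modelled as a list of 16 bit words, and the helper names fetch_byte and store_byte are hypothetical. It assumes the usual meaning of Motorola (big-endian) order, in which the even byte address holds the high-order byte of a word.

```python
# Sketch only: a list of 16-bit words stands in for RTX2000 memory.
# Assumption: big-endian (Motorola) order, so the even byte address
# selects the high-order byte of the word at index byte_addr >> 1.

def fetch_byte(words, byte_addr):
    """Return the byte stored at byte_addr."""
    word = words[byte_addr >> 1]
    if byte_addr & 1 == 0:
        return (word >> 8) & 0xFF      # even address: high byte
    return word & 0xFF                 # odd address: low byte

def store_byte(words, byte_addr, value):
    """Replace the byte at byte_addr, leaving the other byte unchanged."""
    index = byte_addr >> 1
    if byte_addr & 1 == 0:
        words[index] = (words[index] & 0x00FF) | ((value & 0xFF) << 8)
    else:
        words[index] = (words[index] & 0xFF00) | (value & 0xFF)

memory = [0x4142, 0x4344]              # the characters A B C D
text = bytes(fetch_byte(memory, a) for a in range(4))
```

Reading byte addresses 0 through 3 returns the characters in the order they would appear in a string, which is why this byte order is convenient for text handling.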
1.3. Available Literature
Information and literature on NC4000 and RTX2000 are scattered and not easily obtained. The NC4000 Users Group attempts to collect news, products, applications, and source code for these chips and to redistribute them to its members in the 'More on NC4000' newsletter.
2. SPECIAL FEATURES IN cmFORTH
2.1. Direct Threaded Code
Most Forth systems are built on Indirect Threaded Code. A high level word contains a list of execution addresses. The inner interpreter scans this list and executes the words represented by their execution addresses. An execution address points to a memory location which contains an address pointing to a piece of executable code. In a Direct Threaded Code Forth, the execution address points directly to executable code.
Because cmForth is implemented on a Forth engine, which has a very efficient subroutine call/return mechanism, it does not distinguish high level words from low level code words. All words contain executable code. A colon word contains a list of executable code: some entries are machine instructions, and some are subroutine calls, which are essentially 15 bit addresses with the MSB cleared. The address lists of an Indirect Threaded Forth are replaced by lists of subroutine call instructions, which can be mixed with other machine instructions. In this direct threaded system, one layer of address referencing is eliminated. Speed is raised while code size is reduced.
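The idea that a colon word is a mixture of call cells and machine instructions can be sketched with a toy model. This is a Python illustration, not RTX2000 code: the only detail taken from the text is that a cell with the MSB cleared is a call to the address in its lower 15 bits. The three opcodes PUSH1, ADD, and RETURN are hypothetical stand-ins for real machine instructions.

```python
# Toy model: cells with the MSB clear are subroutine calls.
# The opcodes below are hypothetical, chosen only for illustration.
RETURN = 0x8000   # return from subroutine
PUSH1 = 0x8001    # push the number 1 on the data stack
ADD = 0x8002      # add the top two data stack items

def run(memory, entry):
    """Execute cells starting at entry until the outermost RETURN."""
    data, rstack = [], []
    pc = entry
    while True:
        cell = memory[pc]
        pc += 1
        if cell & 0x8000 == 0:         # MSB clear: call lower 15 bits
            rstack.append(pc)
            pc = cell & 0x7FFF
        elif cell == PUSH1:
            data.append(1)
        elif cell == ADD:
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif cell == RETURN:
            if not rstack:             # nothing to return to: finished
                return data
            pc = rstack.pop()
        else:
            raise ValueError(f"unknown cell {cell:#06x}")

# Address 2: a word that leaves 2 on the stack.
# Address 6: a word that calls it twice and adds the results.
memory = [RETURN, RETURN,
          PUSH1, PUSH1, ADD, RETURN,
          0x0002, 0x0002, ADD, RETURN]
result = run(memory, 6)
```

The word at address 6 mixes two call cells with an ordinary instruction, and no separate inner interpreter or address list is involved.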
2.2. Dual Vocabulary Structure
In conventional Forth systems, we need a special class of immediate words to serve as compiler directives. In a colon definition, words are compiled into a list of addresses, which will be executed when the colon word is executed. Immediate words are not compiled, but are executed immediately when they are encountered in a colon definition. Immediate words are needed to build structures in a colon definition.
Mr. Moore got rid of immediate words by putting them in a special vocabulary, COMPILER. Ordinary words are placed in the FORTH vocabulary. When we compile a colon definition in cmForth, the COMPILER vocabulary is searched first. Words found in the COMPILER vocabulary are executed, not compiled. If a word is not found in the COMPILER vocabulary, the FORTH vocabulary is searched. Words found in the FORTH vocabulary are compiled into the new definition.
A separate COMPILER vocabulary is very useful in cmForth, because many Forth words can be compacted into one machine instruction. These smart words must know how to compile themselves optimally in a colon definition, while still exhibiting the correct behavior when they are interpreted. The two distinctly different behaviors are best encoded in two separate vocabularies under the same name. Using immediate words to specify only the compile-time behavior is cumbersome.
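The search order during compilation can be sketched as follows. This is a Python model under simplifying assumptions, not the cmForth source: the two vocabularies are plain dictionaries, the addresses are invented, and compile_word is a hypothetical name.

```python
# Sketch of the compile-time search order: COMPILER first, then FORTH.
# Vocabularies are modelled as dicts; addresses are made up.

def compile_word(name, compiler_vocab, forth_vocab, definition):
    """Process one word met inside a colon definition."""
    if name in compiler_vocab:
        compiler_vocab[name](definition)     # executed now, not compiled
    elif name in forth_vocab:
        definition.append(("call", forth_vocab[name]))
    else:
        raise KeyError(name)

# FORTH holds the interpreted behavior of every word.
forth = {"DUP": 0x0100, "SQUARE": 0x0120}

# COMPILER holds a smart DUP that emits one machine instruction
# instead of a subroutine call.
compiler = {"DUP": lambda definition: definition.append(("inline", "DUP"))}

definition = []
for word in ["DUP", "SQUARE"]:
    compile_word(word, compiler, forth, definition)
```

DUP exists in both vocabularies: the COMPILER entry decides how it is compiled, while the FORTH entry is what the interpreter would run. SQUARE exists only in FORTH and is compiled as an ordinary call.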
2.3. Optimizing Compiler
Mr. Moore developed a very simple technique to optimize compiled code. He selected a few classes of words which are most susceptible to optimization. These words examine the previous word just compiled to see if the current word can be compacted into the previous word by modifying its bit pattern. The following cases are good candidates for optimization:
. Shifts can be merged into ALU words
. Stack operations can be merged into ALU words
. ALU operations can be merged into stack words
. ALU operations can be merged into memory words
. Return can be merged into ALU and memory words
. Short colon words can be expanded to machine code
. Short literals can be compiled as machine code
The optimizing compiler is not included in the kernel of cmForth, but exists as an application which can be loaded when needed.
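One of the cases listed above, merging a return into the preceding ALU or memory word, can be sketched as a look-back at the last compiled instruction. This is a Python model, not the cmForth optimizer: instructions are represented as dictionaries with a hypothetical "kind" field and a "ret" flag standing in for the return bit, and compile_exit is an invented name.

```python
# Sketch of a look-back optimization: fold a return into the previous
# instruction when that instruction has a free return bit.
# The dict layout and the "nop" filler instruction are assumptions.

def compile_exit(code):
    """Compile a subroutine return at the end of the code list."""
    last = code[-1] if code else None
    if last and last["kind"] in ("alu", "memory") and not last["ret"]:
        last["ret"] = True                       # no new instruction needed
    else:
        code.append({"kind": "alu", "op": "nop", "ret": True})

# A definition ending in an ALU word: the return costs nothing extra.
ends_in_alu = [{"kind": "alu", "op": "+", "ret": False}]
compile_exit(ends_in_alu)

# A definition ending in a call: a separate returning instruction is added.
ends_in_call = [{"kind": "call", "addr": 0x0120}]
compile_exit(ends_in_call)
```

A real optimizer must also make sure that no branch target falls between the two merged words; that check is omitted here.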
2.4. Meta-Compiler
The meta-compiler in cmForth is designed so that it can recompile itself and build its core image in upper memory. A user can add applications to the cmForth kernel and build ROM based applications through this meta-compilation.
The meta-compiler is very interesting in its brevity and versatility. The technique Mr. Moore used is to switch the dictionary pointer between the regular Forth dictionary in lower memory and the target dictionary in higher memory. Target words are compiled into the target dictionary, while compiler utility words can still be compiled into the regular Forth dictionary. Target words are still linked to the FORTH and COMPILER vocabularies in the host dictionary. After the entire target dictionary is completed, target words are trimmed off the regular vocabularies and re-linked into the FORTH and COMPILER vocabularies of the target system.
The user can optionally add an offset to all the addresses compiled into the target dictionary so that words in the target dictionary can still be executed by the host. This feature makes it easy to test the target system in the host before committing the target image to EPROMs. If this dictionary offset is initialized to 0, the resulting words in the target vocabulary cannot be tested interactively, because the execution addresses in the target words refer to addresses in the target system, not to addresses in the host system.
This meta-compiler consists of 9 words, and the source code is only one screenful. cmForth was designed to facilitate meta-compilation.
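The pointer switching technique can be sketched with two dictionary pointers, only one of which is active at a time. This Python model is an illustration, not the nine-word meta-compiler itself: the class name, the method names switch and comma, and the start addresses are all hypothetical, and the address offset and vocabulary re-linking are left out.

```python
# Sketch: one memory, two dictionary pointers, and a switch between them.
# Start addresses are invented for the example.

class Dictionary:
    def __init__(self, host_start, target_start):
        self.here = host_start        # active pointer (host at first)
        self.other = target_start     # inactive pointer
        self.memory = {}

    def switch(self):
        """Exchange the active and inactive dictionary pointers."""
        self.here, self.other = self.other, self.here

    def comma(self, cell):
        """Compile one cell at the active pointer and advance it."""
        self.memory[self.here] = cell
        self.here += 1

d = Dictionary(host_start=0x0400, target_start=0x2000)
d.comma(0x1111)      # a compiler utility goes into the host dictionary
d.switch()
d.comma(0x2222)      # a target word goes into the target dictionary
d.switch()
d.comma(0x3333)      # back in the host dictionary, where it left off
```

Because each pointer keeps its own position across switches, host utilities and target words can be interleaved freely in the source.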
2.5. Native Control Structures
Control structures are a very important ingredient of modern high level languages. Forth engines like NC4000 and RTX2000 have machine instructions which directly support the control structures. However, these machine instructions are buried under cmForth, because the compiler has the intelligence to translate high level structure commands and build the proper code sequences. It is appropriate to summarize the machine instructions and the high level structure words that use them:
CALL      Call the subroutine whose address is in the lower 15 bits of this instruction. The MSB is 0. Call instructions are compiled in colon definitions unless the optimizer substitutes them with machine code.
RETURN    Not an instruction, but Bit 5 in an ALU or memory instruction.
BRANCH    Jump to the address specified by the address field in the instruction. In NC4000, Bits 0 to 11 contain the offset address in the current 4 Kword page. In RTX2000, Bits 0 to 8 contain the offset, and Bits 9 and 10 contain a page specifier, which allows jumping to the current page, the next page, the previous page, or page 0. BRANCH is compiled by ELSE, REPEAT, and AGAIN.
ZBRANCH   Jump to the specified address if the top of the data stack is 0; otherwise a Noop. The address specifier is the same as in BRANCH. ZBRANCH is compiled by IF, WHILE, and UNTIL.
LOOP      Test the top item on the return stack. If it is zero, pop it off the return stack and continue executing the next instruction. If it is not zero, decrement it and jump to the address specified in this instruction. The address specifier is the same as in BRANCH. LOOP is compiled by NEXT.
REPEATS   Repeat the next instruction if the count on top of the return stack is not zero. The count is also decremented. If the count is zero, pop the return stack and continue executing the following instruction. REPEATS is compiled by TIMES or OF(.
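The address fields described for BRANCH can be pulled apart with simple masks. The sketch below is a Python illustration based only on the bit positions given above; the function names are hypothetical. For RTX2000 it returns the raw page specifier without interpreting it, because the text does not say which 2 bit code selects which page. For NC4000 it assumes the page is taken from the upper 4 bits of a program counter value supplied by the caller.

```python
# Sketch: extracting branch address fields from a 16-bit instruction.
# Bit positions follow the text; everything else is an assumption.

def nc4000_branch_target(instr, pc):
    """Combine the 12-bit offset with the 4 Kword page taken from pc."""
    return (pc & 0xF000) | (instr & 0x0FFF)

def rtx2000_branch_fields(instr):
    """Return (page_specifier, offset): Bits 10-9 and Bits 8-0."""
    return (instr >> 9) & 0x3, instr & 0x01FF

target = nc4000_branch_target(0x9123, pc=0x2ABC)
page, offset = rtx2000_branch_fields(0x8455)
```

The shorter RTX2000 offset field is what makes the page specifier necessary: nine bits reach only 512 locations, so the two extra bits select a neighbouring page or page 0.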
cmForth does not support the traditional DO-LOOP structure found in most Forth systems. The FOR-NEXT loop is a more efficient loop structure with one decrementing counter. The DO-LOOP structure can be synthesized, but only with considerable execution overhead. The REPEATS instruction is used frequently to implement complicated math operations, like shifts, multiply, divide, and square root, from the appropriate math step instructions. It is also useful for repeating auto-indexing memory instructions.
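The behavior of the LOOP instruction inside a FOR-NEXT loop can be modelled directly from its description. This is a Python sketch, not cmForth code; it assumes that FOR places the count on the return stack before the loop body, which the text implies but does not state. The function names are hypothetical.

```python
# Sketch of FOR-NEXT built on the LOOP instruction as described:
# if the return stack top is zero, pop it and fall through;
# otherwise decrement it and jump back.

def loop_instruction(rstack):
    """Return True to jump back to the loop start, False to fall through."""
    if rstack[-1] == 0:
        rstack.pop()
        return False
    rstack[-1] -= 1
    return True

def for_next(count, body):
    """Run body with a down-counting index held on a return stack."""
    rstack = [count]                 # assumption: FOR pushes the count
    while True:
        body(rstack[-1])
        if not loop_instruction(rstack):
            break

seen = []
for_next(3, seen.append)
```

Under this reading the body runs with the index counting down to zero inclusive, so a count of 3 gives four passes. A single counter compared against zero is what makes the loop cheaper than a DO-LOOP with separate index and limit.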
2.6. Math Step Instructions
Regular arithmetic and logic instructions can be completed in a single machine cycle. Complicated math functions like multiply and divide must be synthesized from simpler instructions. To facilitate the construction of these often used functions, Mr. Moore added some logic in hardware to implement several special step instructions. The step instructions include the following:
*'    Multiply step.
*"    Signed multiply step. The last step in a signed multiplication operation.
*F    Fraction multiply step.
/1'   First divide step.
/'    Divide step.
/"    Last divide step.
S'    Square root step.
S"    Last square root step.
High level multiply, divide, and square root operations are defined using the above step instructions. REPEATS is used to repeat a step instruction many times. Users generally do not have to worry about the step instructions, because they are wrapped in the high level math words. RTX2000 has a built-in single cycle hardware multiplier. RTX cmForth uses the hardware multiplier to implement its multiplication functions, which are much faster than the stepped multiplications.
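How a full multiplication is built by repeating one step can be shown with the classic shift-and-add method. The sketch below is a generic Python illustration of that method for unsigned 16 bit operands; it is not a bit-exact model of the NC4000 `*'` instruction, and the function names are hypothetical.

```python
# Generic shift-and-add multiplication: one conditional add and one
# right shift per step, repeated once for every multiplier bit.
# This illustrates the idea of a multiply step, not the exact hardware.

def multiply_step(acc, multiplier, multiplicand):
    """Perform one step on the (acc, multiplier) register pair."""
    if multiplier & 1:
        acc += multiplicand                      # may carry into bit 16
    # Shift the combined pair right by one bit: the low bit of acc
    # moves into the top bit of the multiplier register.
    multiplier = (multiplier >> 1) | ((acc & 1) << 15)
    acc >>= 1
    return acc, multiplier

def umultiply(a, b):
    """Unsigned 16 x 16 -> 32 bit product using 16 repeated steps."""
    acc, multiplier = 0, b & 0xFFFF
    for _ in range(16):
        acc, multiplier = multiply_step(acc, multiplier, a & 0xFFFF)
    return (acc << 16) | multiplier

product = umultiply(1234, 567)
```

The loop of 16 identical steps plays the role that REPEATS plays in the text: the high level word only has to set up the operands and the count.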
2.7. Multiple Function Single Cycle Instructions
There are six major classes of instructions in NC4000 and RTX2000: CALL, BRANCH, ZBRANCH, LOOP, ALU, and memory instructions. In the first four classes, the address specifier occupies a large field in the instruction, and the instructions are dedicated to controlling the execution flow. In the ALU and memory instructions, there are many fields which operate different sections of the CPU in parallel. Thus many Forth words can be encoded in the various fields and executed together in a single machine cycle.
An ALU instruction contains an ALU field specifying the arithmetic/logic operation to be performed on the top two items on the data stack, a stack field specifying how the data is routed among the ALU, other registers, and the stacks, a return bit to do an optional subroutine return, and a shift field allowing data from the ALU to be shifted before it is stored back into the top stack register. Very complicated and powerful instructions can thus be constructed and executed in a single machine cycle.
The optimizing compiler in cmForth is encoded in all the Forth ALU and stack words. The basic strategy is to examine the instruction last compiled into the dictionary. If the last instruction has an empty field into which the current function can be fitted, there is no need to compile a new instruction. The new function is simply added to the last instruction and the compiling process can proceed.
In a memory instruction, the ALU field, the stack field, and the return bit can all be used. Data fetched from memory can be operated on while it is in transit. The least significant 5 bits of the instruction form a literal field, in place of the shift field in the ALU instruction. The literal field can specify a 5 bit small literal, an offset into the user area, an internal register, an external register on the ASIC bus, or an index for memory auto-indexing.
Although the optimizing compiler does its best to compact functions into a single instruction, it may not succeed on every occasion. For time critical routines in which every machine cycle must do as much as possible, a user can hand code all or part of the routine to further enhance performance.
2.8. Auto-Indexing Memory Instructions
In a normal memory fetch or store operation, an address must be pushed on the top of the data stack. After the memory operation, the address is removed. When we need to access data arrays or strings in memory, it is cumbersome to push consecutive addresses on the stack. NC4000 and RTX2000 have auto-indexing memory instructions which allow the user to read and write consecutive memory locations very conveniently. The following instructions access consecutive words in memory:
n @+    Fetch data from the memory location pointed to by the top item on the stack. The data obtained is pushed below the top item. The address on top is incremented by n, which is between 0 and 31 inclusive. Repeating this instruction pushes an array of data on the data stack below the address on top.
n @-    Same as @+, but the address is decremented by n.
n !+    Pop the second item on the data stack and store it in the memory location pointed to by the address on top of the stack. The address is then incremented by n.
n !-    Same as !+, but the address is decremented by n.
In NC4000, n above is a word count because NC4000 is a word machine. However, in this cmForth for RTX2000, the CPU is configured as a byte machine, so n is a byte count. To access consecutive words, n must be 2. Because of the byte addressability of RTX2000, cmForth also includes the following instructions to access byte arrays:
n C@+   Fetch a byte from memory and increment the address by n.
n C@-   Fetch a byte from memory and decrement the address by n.
n C!+   Store a byte to memory and increment the address by n.
n C!-   Store a byte to memory and decrement the address by n.
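The stack effects of the word-wide auto-indexing instructions can be modelled in a few lines. This Python sketch follows the descriptions of `n @+` and `n !+` given above; memory is a dictionary keyed by byte address, and the function names fetch_plus and store_plus and the addresses used are hypothetical.

```python
# Sketch of `n @+` and `n !+` on a byte-addressed machine.
# The data stack is a list with the top item at the end.

def fetch_plus(stack, memory, n):
    """n @+ : push the fetched word below the address, then add n."""
    addr = stack.pop()
    stack.append(memory[addr])
    stack.append(addr + n)

def store_plus(stack, memory, n):
    """n !+ : store the second item at the address, then add n."""
    addr = stack.pop()
    memory[addr] = stack.pop()
    stack.append(addr + n)

memory = {0x0100: 11, 0x0102: 22, 0x0104: 33}

# Read three consecutive words; n is 2 because n counts bytes here.
stack = [0x0100]
for _ in range(3):
    fetch_plus(stack, memory, 2)

# Write the three values out again starting at another address.
out = [33, 22, 11, 0x0200]
for _ in range(3):
    store_plus(out, memory, 2)
```

After the three fetches the data lies on the stack beneath the advanced address, which is the array-reading pattern the text describes when one of these instructions is repeated.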
2.9. Terminal Interface and Serial Disk
A useful computer system must include a human interface and an interface to at least one mass storage device. cmForth supports these interfaces with minimal hardware. In the NC4000 implementation, a serial RS-232 port is implemented using two bits in the X-port. In the RTX2000 implementation, the serial port is implemented using the Boot pin as the transmitter and an interrupt line as the receiver.
The baud rate of this serial communication is determined during boot-up. After power-up or manual reset, cmForth falls into a waiting loop, expecting the user to type a 'B' or 'b' on the terminal. cmForth determines the baud rate from this character and starts the text interpreter, ready to execute commands sent through the serial line. cmForth can reliably transmit and receive at rates up to 38.4 Kbaud. Most terminals and host computers can keep up only to about 9600 baud.
Mass storage is absolutely necessary for serious programming. It is costly to build a disk drive for the small single board computers based on NC4000 or RTX2000, although many people, including Mr. Moore, did build systems with disk drives and other fancy peripherals. The simplest solution provided in cmForth is called a 'Serial Disk', which sends disk access requests through the serial RS-232 line. It assumes that there is a host computer at the other end to provide the requested services.
The serial disk protocol is very simple. Normal serial transmission uses the ASCII character set. When cmForth needs to read or write to disk, it sends an ASCII NUL followed by two bytes carrying a block number. If the MSB of the block number is zero, cmForth is requesting to read that block. The host must locate that block and send 1024 bytes of data to cmForth. If the MSB of the block number is one, cmForth is requesting to write that block, and it immediately sends 1024 bytes to the host. cmForth does not use any other handshake signals. It is the responsibility of the host to provide timely disk services.
Many users have contributed host interface programs for many different host computers, including the IBM-PC and the Apple Macintosh. Source code and data are generally maintained as files on the host computer. The host computer must provide data to cmForth in the block format.
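The host side of the serial disk protocol can be sketched as a single request handler. This is a Python illustration, not one of the contributed host programs. Several details are assumptions, since the text does not specify them: the two block number bytes are taken high byte first, the lower 15 bits are used as the block number, an unknown block reads as 1024 blanks, and blocks are kept in a dictionary rather than a file. LoopbackPort is a hypothetical stand-in for a real serial port.

```python
import io

BLOCK_SIZE = 1024

class LoopbackPort:
    """Stand-in for a serial port: preset input, captured output."""
    def __init__(self, incoming):
        self.rx = io.BytesIO(incoming)
        self.tx = bytearray()

    def read(self, n):
        return self.rx.read(n)

    def write(self, data):
        self.tx += data

def serve_request(port, blocks):
    """Handle one request; return (kind, block) or None if not a request."""
    if port.read(1) != b"\x00":          # requests start with ASCII NUL
        return None
    header = port.read(2)
    number = (header[0] << 8) | header[1]   # assumption: high byte first
    block = number & 0x7FFF                 # assumption: low 15 bits
    if number & 0x8000:                     # MSB set: cmForth writes
        blocks[block] = port.read(BLOCK_SIZE)
        return ("write", block)
    # MSB clear: cmForth reads, so the host sends the block contents.
    port.write(blocks.get(block, b" " * BLOCK_SIZE))
    return ("read", block)

blocks = {}
write_port = LoopbackPort(b"\x00\x80\x05" + b"x" * BLOCK_SIZE)
write_result = serve_request(write_port, blocks)

read_port = LoopbackPort(b"\x00\x00\x05")
read_result = serve_request(read_port, blocks)
```

Since the protocol has no handshake, a real host program must answer a read request quickly enough for cmForth to receive all 1024 bytes; the sketch ignores timing entirely.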