(July 2021)

Fork me on GitHub

It was raining hard, a week ago.

And what could you possibly do on a rainy Saturday afternoon?

Well... I was inspired by a series of articles on Hackaday to...

...make a Forth interpreter/compiler from scratch...
...then put it inside a 1.5$ Blue Pill microcontroller...
...then repeatedly squash it until it fits inside an Arduino UNO...
...i.e. inside 2K of RAM!

I haven't done anything even remotely close to this in decades.
I loved building it :-)

The rainy afternoon turned into a week-long hackfest (was looking forward every day to the post-work FORTH-tinkering in the afternoon...)

The result: a tiny, mini-Forth. In portable C++ :-)
It has...

Here's a recording of it in action:

...and another one, where a small Forth program blinks the Arduino LED:

ArduinoSTL and Valgrind / AddressSanitizer checks

I meant it when I said "portable" - the source code compiles as-is for both native and Arduino builds.

Another part of my reasoning was that in addition to targeting multiple platforms (e.g. BluePill and Arduino) I wanted to be able to use Valgrind and AddressSanitizer to detect - in the host! - any issues I have with my memory handling. And just as important, debugging with GDB in the native host is much easier/faster.

Since I had embedded targets in mind, I tried ArduinoSTL - but it was too wasteful memory-wise. It also made the build process significantly slower. I therefore built my own memory pool, as well as list, tuple and string-like C++ templates. It was a nice challenge, re-inventing a tiny C++ STL...

And I understand STL a lot better now, after building small pieces of it myself :-)

Simulation / Debugging

I setup simulation via simavr. This tremendously improved my developing speed, since a simulator spawns and runs much faster than the real board. Due to the code being portable, debugging took place mostly in the host GDB; and after Valgrind and AddressSanitizer gave their blessing, I usually found out that the simulator (and the real board) worked fine as well.

BluePill vs Arduino UNO

Thanks to ArduinoSTL, I quickly reached the point of running inside the BluePill. The 1.5$ mini-monster has 10 times more SRAM than an Arduino UNO; so in a couple of days, I had a working branch.

Before the Arduino UNO, I run my mini-FORTH inside the 1.5$ Blue Pill

Before the Arduino UNO, I run my mini-FORTH inside the 1.5$ Blue Pill

But as said above, that wasn't nearly enough to make it work in my Arduino UNO. That required far more work (see below).

As for the BluePill, I should note that, as in all my other embedded targets, I prefer a development workflow that is based on normal bootloaders (not on programmers). I therefore burned the stm32duino bootloader on the BluePill, which allowed me to easily program it in subsequent iterations via the USB connection (and a simple make upload).

The same USB connection would then function as a serial port immediately afterwards - allowing me to interact with the newly uploaded Forth in the BluePill.

The screenshot below is from a tmux: on the left, the output from make upload; and on the right, I used picocom to interact with my mini-Forth over the serial port:

Compiling, uploading and testing

Compiling, uploading and testing

Memory - the final frontier

That covered the first two days.

But when I tried compiling for the Arduino UNO, I realised that the ArduinoSTL was not enough. I run out of memory...

So I built my own mini-STL, and tightly controlled all memory utilisation.

I also used macro-magic to move all strings to Flash at compile-time (see dprintf in the code)... And saved memory everywhere I could, re-using error messages across various operations - and storing the entire array of native operations in Flash.

Nothing flexes your coding muscles as much as optimising; whether it is for speed or for space. See the implementation of ".S" for example, where the (obvious) stack reversal code is also the most wasteful... Changing it to a slower but memory-preserving algorithm allowed me to use ".S" even when almost all my memory is full.

C++ vs C

I know that many developers hate C++. I even wrote a blog post about it.

And I understand why - they see code like this...

#include "mini_stl.h"

template<class T>
typename forward_list<T>::box *forward_list<T>::_freeList = NULL;

...and they start screaming - "what the hell is that", "incomprehensible madness", etc.

But there are very important benefits in using C++ - and templates in particular. You write less code, with no additional run-time or memory overhead compared to C, and with a lot more compile-time checks that watch your back (for things that would otherwise blow up in your face).

See my Optional for example, that emulates (badly) the optional type of Rust/OCaml/F#/Scala/Kotlin etc. It forces you to check your returned error codes:

Optional<int> Forth::needs_a_number(const __FlashStringHelper *msg)
    if (_stack.empty())
        return error(emptyMsgFlash, msg);
    auto topVal = *_stack.begin();
    if (topVal._kind == StackNode::LIT)
        return topVal._u.intVal;
        return FAILURE;

You can't "forget" to check the potential for a failure coded inside your returned value - because your code has to "unwrap" it. I could have done this better, but I chose to implement it via simple tuples (this was a one-weeks-afternoons hack, after all :-)

As for the template "magic" incantation above - it is true magic: My forward_list template is using free-lists to store the pop_front-ed elements and reuse them in subsequent allocations. I wanted these free-lists to be global (i.e. static members) because lists of the same type must re-use a single, commonly-shared free-list. The magic spell tells the compiler I want to instantiate these globals once, for each type T that I use in any lists in my code.

My Forth test scenario - including a FizzBuzz!

Yep, FizzBuzz - we are fully Turing complete. And would surely pass Joel's interview :-)

." Reset... " RESET
." Check comments... " \ Yes, we support the new-style comments :-)
." Computing simple addition of 3 + 4... " 3 4 + .
." Is 1 = 2 ?... " 1 2 = .
." Is 1 > 2 ?... " 1 2 > .
." Is 1 < 2 ?... " 1 2 < .
." Define pi at double-word precision... " : pi 355 113 */ ;
." Use definition to compute 10K times PI... " 10000 pi .
." Check: 23 mod 7... " 23 7 MOD .
." Defining 1st level function1... " : x2 2 * ;
." Defining 1st level function2... " : p4 4 + ;
." 2nd level word using both - must print 24... " 10 x2 p4 . 
." Defining a variable with value 123... " 123 variable ot3
." Printing variable's value... " ot3 @ .
." Defining The Constant (TM)... " 42 constant lifeUniverse
." Printing The Constant (TM)... " lifeUniverse .
." Setting the variable to The Constant (TM)... " lifeUniverse ot3 !
." Printing variable's value... " ot3 @ .
." Setting the variable to hex 0x11... " $11 ot3 !
." Printing variable's value... " ot3 @ .
." Defining helper... " : p5 5 U.R . ;
." Defining 3 times loop... " : x3lp 3 0 DO I p5 LOOP ;
." Calling loop... " x3lp
." Defining loop calling loop 2 times... " : x6lp 2 0 DO x3lp LOOP ;
." Nested-looping 2x3 times... " x6lp
." Inline: " : m 3 1 DO 3 1 DO CR J p5 I p5 ." = " J I * p5 LOOP LOOP ;
." Use inline loops with two indexes... " m
." Make multiples of 7 via DUP... " : m7s 10 0 DO DUP I * . LOOP DROP ;
." Print them and DROP the 7... " 7 m7s
." Reset... " RESET
\ Time for Turing completeness...
." Let's do Fizz-Buzz! " \ Turing Completeness check...
\ fizz ( n -- 0_or_1 n )
." Define fizz... " : fizz DUP 3 MOD 0 = IF ." fizz " 1 ELSE 0 THEN SWAP ;
\ buzz ( n -- 0_or_1 n )
." Define buzz... " : buzz DUP 5 MOD 0 = IF ." buzz " 1 ELSE 0 THEN SWAP ;
\ emitNum ( 0_or_1 0_or_1 n -- )
." Define emitNum... " : emitNum ROT ROT + 0 = if . ELSE DROP THEN ;
\ mainloop ( n -- )
." Define mainloop... " : mainloop fizz buzz emitNum ;
\ fb ( -- )
." Define fizzbuzz... " : fb 37 1 DO I mainloop LOOP ;
." Run it! " fb
." Report memory usage... " .S


I am a strong believer in automation. The final form of my Makefile therefore has many rules - e.g. make arduino-sim - that automate various parts of the workflow.

Here's what they do:

Another example of automation - the complete test scenario shown in the previous section, is not just an example in the documentation; it is extracted automatically from this text and fed into the Valgrind and AddressSanitizer tests... and also into the Python testing script that sends the data to the board in real-time.

DRY, folks.


I thoroughly enjoyed building this. I know full well that Forths are not supposed to be built in C++; they are supposed to be built in assembly, and also, utilise the Flash to store the user-compiled code at run-time.

But that wasn't the point of this - the point was to have fun and learn Forth.
What better way to learn a language... than to actually implement it! :-)

And... as a child of the 80s... I now know first-hand what Jupiter Ace was about :-)

Fork the code, and enjoy tinkering with it!

profile for ttsiodras at Stack Overflow, Q&A for professional and enthusiast programmers
GitHub member ttsiodras
Updated: Sun Nov 19 23:06:10 2023