submit to programming reddit

(July 2021)

Fork me on GitHub

It was raining hard, a week ago.

And what could you possibly do on a rainy Saturday afternoon?

Well... I was inspired by a series of articles on Hackaday to...

...make a Forth interpreter/compiler from scratch...
...then put it inside a 1.5$ Blue Pill microcontroller...
...then repeatedly squash it until it fits inside an Arduino UNO...
...i.e. inside 2K of RAM!

I haven't done anything even remotely close to this in decades.
I loved building it :-)

The rainy afternoon turned into a week-long hackfest (was looking forward every day to the post-work FORTH-tinkering in the afternoon...)

The result: a tiny, mini-Forth. In portable C++ :-)
It has...

Here's a recording of it in action:

...and another one, where a small Forth program blinks the Arduino LED:

ArduinoSTL and Valgrind / AddressSanitizer checks

I meant it when I said "portable" - the source code compiles as-is for both native and Arduino builds.

Another part of my reasoning was that in addition to targeting multiple platforms (e.g. BluePill and Arduino) I wanted to be able to use Valgrind and AddressSanitizer to detect - in the host! - any issues I have with my memory handling. And just as important, debugging with GDB in the native host is much easier/faster.

Since I had embedded targets in mind, I tried ArduinoSTL - but it was too wasteful memory-wise. It also made the build process significantly slower. I therefore built my own memory pool, as well as list, tuple and string-like C++ templates. It was a nice challenge, re-inventing a tiny C++ STL...

And I understand STL a lot better now, after building small pieces of it myself :-)

Simulation / Debugging

I setup simulation via simavr. This tremendously improved my developing speed, since a simulator spawns and runs much faster than the real board. Due to the code being portable, debugging took place mostly in the host GDB; and after Valgrind and AddressSanitizer gave their blessing, I usually found out that the simulator (and the real board) worked fine as well.

BluePill vs Arduino UNO

Thanks to ArduinoSTL, I quickly reached the point of running inside the BluePill. The 1.5$ mini-monster has 10 times more SRAM than an Arduino UNO; so in a couple of days, I had a working branch.

Before the Arduino UNO, I run my mini-FORTH inside the 1.5$ Blue Pill

Before the Arduino UNO, I run my mini-FORTH inside the 1.5$ Blue Pill

But as said above, that wasn't nearly enough to make it work in my Arduino UNO. That required far more work (see below).

As for the BluePill, I should note that, as in all my other embedded targets, I prefer a development workflow that is based on normal bootloaders (not on programmers). I therefore burned the stm32duino bootloader on the BluePill, which allowed me to easily program it in subsequent iterations via the USB connection (and a simple make upload).

The same USB connection would then function as a serial port immediately afterwards - allowing me to interact with the newly uploaded Forth in the BluePill.

The screenshot below is from a tmux: on the left, the output from make upload; and on the right, I used picocom to interact with my mini-Forth over the serial port:

Compiling, uploading and testing

Compiling, uploading and testing

Memory - the final frontier

That covered the first two days.

But when I tried compiling for the Arduino UNO, I realised that the ArduinoSTL was not enough. I run out of memory...

So I built my own mini-STL, and tightly controlled all memory utilisation.

I also used macro-magic to move all strings to Flash at compile-time (see dprintf in the code)... And saved memory everywhere I could, re-using error messages across various operations - and storing the entire array of native operations in Flash.

Nothing flexes your coding muscles as much as optimising; whether it is for speed or for space. See the implementation of ".S" for example, where the (obvious) stack reversal code is also the most wasteful... Changing it to a slower but memory-preserving algorithm allowed me to use ".S" even when almost all my memory is full.

C++ vs C

I know that many developers hate C++. I even wrote a blog post about it.

And I understand why - they see code like this...

#include "mini_stl.h"

template<class T>
typename forward_list<T>::box *forward_list<T>::_freeList = NULL;

...and they start screaming - "what the hell is that", "incomprehensible madness", etc.

But there are very important benefits in using C++ - and templates in particular. You write less code, with no additional run-time or memory overhead compared to C, and with a lot more compile-time checks that watch your back (for things that would otherwise blow up in your face).

See my Optional for example, that emulates (badly) the optional type of Rust/OCaml/F#/Scala/Kotlin etc. It forces you to check your returned error codes:

Optional<int> Forth::needs_a_number(const __FlashStringHelper *msg)
    if (_stack.empty())
        return error(emptyMsgFlash, msg);
    auto topVal = *_stack.begin();
    if (topVal._kind == StackNode::LIT)
        return topVal._u.intVal;
        return FAILURE;

You can't "forget" to check the potential for a failure coded inside your returned value - because your code has to "unwrap" it. I could have done this better, but I chose to implement it via simple tuples (this was a one-weeks-afternoons hack, after all :-)

As for the template "magic" incantation above - it is true magic: My forward_list template is using free-lists to store the pop_front-ed elements and reuse them in subsequent allocations. I wanted these free-lists to be global (i.e. static members) because lists of the same type must re-use a single, commonly-shared free-list. The magic spell tells the compiler I want to instantiate these globals once, for each type T that I use in any lists in my code.

My Forth test scenario - including a FizzBuzz!

Yep, FizzBuzz - we are fully Turing complete. And would surely pass Joel's interview :-)

." Reset... " RESET
." Check comments... " \ Yes, we support the new-style comments :-)
." Computing simple addition of 3 + 4... " 3 4 + .
." Is 1 = 2 ?... " 1 2 = .
." Is 1 > 2 ?... " 1 2 > .
." Is 1 < 2 ?... " 1 2 < .
." Define pi at double-word precision... " : pi 355 113 */ ;
." Use definition to compute 10K times PI... " 10000 pi .
." Check: 23 mod 7... " 23 7 MOD .
." Defining 1st level function1... " : x2 2 * ;
." Defining 1st level function2... " : p4 4 + ;
." 2nd level word using both - must print 24... " 10 x2 p4 . 
." Defining a variable with value 123... " 123 variable ot3
." Printing variable's value... " ot3 @ .
." Defining The Constant (TM)... " 42 constant lifeUniverse
." Printing The Constant (TM)... " lifeUniverse .
." Setting the variable to The Constant (TM)... " lifeUniverse ot3 !
." Printing variable's value... " ot3 @ .
." Setting the variable to hex 0x11... " $11 ot3 !
." Printing variable's value... " ot3 @ .
." Defining helper... " : p5 5 U.R . ;
." Defining 3 times loop... " : x3lp 3 0 DO I p5 LOOP ;
." Calling loop... " x3lp
." Defining loop calling loop 2 times... " : x6lp 2 0 DO x3lp LOOP ;
." Nested-looping 2x3 times... " x6lp
." Inline: " : m 3 1 DO 3 1 DO CR J p5 I p5 ." = " J I * p5 LOOP LOOP ;
." Use inline loops with two indexes... " m
." Make multiples of 7 via DUP... " : m7s 10 0 DO DUP I * . LOOP DROP ;
." Print them and DROP the 7... " 7 m7s
." Reset... " RESET
\ Time for Turing completeness...
." Let's do Fizz-Buzz! " \ Turing Completeness check...
\ fizz ( n -- 0_or_1 n )
." Define fizz... " : fizz DUP 3 MOD 0 = IF ." fizz " 1 ELSE 0 THEN SWAP ;
\ buzz ( n -- 0_or_1 n )
." Define buzz... " : buzz DUP 5 MOD 0 = IF ." buzz " 1 ELSE 0 THEN SWAP ;
\ emitNum ( 0_or_1 0_or_1 n -- )
." Define emitNum... " : emitNum ROT ROT + 0 = if . ELSE DROP THEN ;
\ mainloop ( n -- )
." Define mainloop... " : mainloop fizz buzz emitNum ;
\ fb ( -- )
." Define fizzbuzz... " : fb 37 1 DO I mainloop LOOP ;
." Run it! " fb
." Report memory usage... " .S


I am a strong believer in automation. The final form of my Makefile therefore has many rules - e.g. make arduino-sim - that automate various parts of the workflow.

Here's what they do:

Another example of automation - the complete test scenario shown in the previous section, is not just an example in the documentation; it is extracted automatically from this text and fed into the Valgrind and AddressSanitizer tests... and also into the Python testing script that sends the data to the board in real-time.

DRY, folks.


I thoroughly enjoyed building this. I know full well that Forths are not supposed to be built in C++; they are supposed to be built in assembly, and also, utilise the Flash to store the user-compiled code at run-time.

But that wasn't the point of this - the point was to have fun and learn Forth.
What better way to learn a language... than to actually implement it! :-)

And... as a child of the 80s... I now know first-hand what Jupiter Ace was about :-)

Fork the code, and enjoy tinkering with it!

profile for ttsiodras at Stack Overflow, Q&A for professional and enthusiast programmers
GitHub member ttsiodras
Back to index  My CVLast update on: Thu Jul 8 05:36:28 2021

The comments on this website require the use of JavaScript. Perhaps your browser isn't JavaScript capable or the script is not being run for another reason. If you're interested in reading the comments or leaving a comment behind please try again with a different browser or from a different connection.