Stack-checking a program that will execute in orbit

(September 2011, posted on Reddit and Hacker News)
(October 2021, updated to use -fstack-usage)

Things change.

For those of us that were raised with Z80 CPUs, it is abundantly clear that we are living in "times of plenty"; most developers no longer have to think about things like "fitting into memory" or "having enough stack space". Moore's Law provided orders of magnitude of improvements in both CPU speeds and memory sizes; so big in fact, that resource-wasting virtual machines dominate our everyday software, and we don't even feel it.

There are always exceptions, however: platforms where you still need to be careful about memory and CPU usage. Especially when the device operates in a critical environment (e.g. in orbit), you need to be very careful.

For the last 5 years I've been developing code for the European Space Agency, in the context of projects like TASTE; the target platform is an embedded Leon processor, an open-source SPARC derivative specifically designed for operating in space. You really don't want to run out of memory there - "blue screens of death" are too expensive when the nearest ground station operator... is some 36.000 km below :‑)

Heap and Stack

First of all, you can't use heap - not when you're in orbit. Why? Because quite simply, if you do use heap, and malloc returns null... what do you do? You can't ask the user for help - there are no human operators aboard the satellite, and telecommands sent remotely need a functioning program that listens to what is sent; if the code runs out of memory while handling the incoming telecommands, well, you have a dead satellite.

And what about stack? Stack usage also grows and diminishes as functions call other functions and return back... A "stack overflow" means a crashed program. If you use recursive functions, it's impossible to predict how much stack space you'll actually need at runtime; and even without recursive functions, when spawning the application, you need to know the total stack the code will use, and reserve it up-front - being absolutely sure that you won't run out of stack.

So... how do you enforce the restrictions? How do you cope with the stack requirements?

Space companies use custom-made tools and procedures that address these challenges - but just for the fun of it, we can do some hacking of our own :‑)

Meet objdump

objdump displays information from object files. It is a very useful tool, capable of showing many kinds of information from your executables.

In our case, we will use it's ability to disassemble the ELF binary of our program:

bash$ cat > hello.c
main()
{
    int i=314159;
    printf("Hello, 10K times pi is %d\n", i);
}
(Ctrl-D)

bash$ gcc hello.c
bash$ objdump -d ./a.out
...
Disassembly of section .text:
...
08048384 <main>:
 8048384:       55                      push   %ebp
 8048385:       89 e5                   mov    %esp,%ebp
 804838a:       83 ec 20                sub    $0x20,%esp
 804838d:       c7 44 24 1c 2f cb 04    movl   $0x4cb2f,0x1c(%esp)
 8048394:       00
 8048395:       8b 44 24 1c             mov    0x1c(%esp),%eax
 8048399:       89 44 24 04             mov    %eax,0x4(%esp)
 804839d:       c7 04 24 70 84 04 08    movl   $0x8048470,(%esp)
 80483a4:       e8 0f ff ff ff          call   80482b8 <printf@plt>
 80483a9:       c9                      leave
 80483aa:       c3                      ret
 80483ab:       90                      nop

So, what do we see here?

Apparently, there's a lot of information we can gather, by simple text-processing of this output.

Functions start in lines of the form: ADDRESS <FUNCTIONNAME>:
Calls to other functions follow this pattern: ADDRESS: OPCODES call ADDRESS <FUNCTIONNAME>
Each function starts by reserving the space it needs on the stack: it lowers the stack pointer by a specific amount, which the compiler calculated to provide enough space for local variables and anything else that is to be placed on the stack, e.g.
```
ADDRESS OPCODES  sub $0x20,%esp
```
In fact, GCC is more creative, sometimes:
```
ADDRESS OPCODES  add $0xFFFFFF84,%esp
```
...so the stack pointer is decreased via an 'add' instruction, which adds a negative value. Update: Fabien Sanglard pointed me to the reason behind this; it's a size (opcode-wise) optimization.

This means that by catching these patterns in the output of objdump -d, we can...

learn what functions are called (and check if any of the "forbidden" ones are called!)
learn how much stack is used by each function (how much it lessens the stack pointer upon entry) [1]
learn what functions are called by each function - and therefore, form a call graph!

Note: there are better ways to obtain some of this information; for example, objdump offers a "-t" option, that provides only symbols and their offsets. However, by using "objdump -d", we can obtain all the information we want in one pass.

Scripting it in Python

We will be using regular expressions to match our three targets. Here they are:

...
import re
...
    functionNamePattern = re.compile(r'^(\S+) <([a-zA-Z0-9_]+?)>:')
    callPattern = re.compile(r'^.*call\s+\S+\s+<([a-zA-Z0-9_]+)>')
    stackUsagePattern = re.compile(r'^.*[add|sub]\s+\$(0x\S+),%esp')

The regular expressions mark their "targets" (ADDRESS, FUNCTIONNAME, AMOUNTOFSTACK, etc) within parentheses, so we can extract the matched portion easily in Python, with:

...
    match = functionNamePattern.match(line)
    if match:
        # print "Detected function:", match.group(1)
        functionName = match.group(1)

We apply these pattern matches to the lines generated by objdump:

    ...
    callgraph = {}
    stackUsagePerFunction = {}

    for line in os.popen("objdump -d \"" + sys.argv[1] + "\"").readlines():
        # Search for function start
        match = functionNamePattern.match(line)
        if match:
            functionName = match.group(1)
            callGraph[functionName] = set()  # this will store called func names
            foundStackUsage = False
        # Search for function calls
        call = callPattern.match(line)
        if functionName != "" and call:
            calledFunction = call.group(1)
            callGraph[functionName].add(calledFunction)
        # Search for the first stack lowering opcode
        if not foundStackUsage and functionName!="":
            stackMatch = stackUsagePattern.match(line)
            if stackMatch:
                # make sure we dont re-update the stackusage for this function
                foundStackUsage = True
                value = stackMatch.group(1)
                # sub    $0x46,%esp
                value = int(stackMatch.group(1), 16)
                # unfortunately, GCC may also write:
                # add    $0xFFFFFF86,%esp
                if value > 2147483647:
                    value = 4294967296-value
                stackUsagePerFunction[functionName] = value

And we now have two things:

a preliminary use of stack space per function, in the stackUsagePerFunction dictionary
a call graph, showing us which functions are called by each function (callGraph[functionName] is a set of unique function names)

At this point, detection of use of forbidden functions is very easy - we just look in the call graph, to see if some function is calling malloc, free, etc - and report it.

Measuring stack usage (and detecting cycles)

What about the actual stack usage of a function? It's not just how much space it reserves itself - we have to trace all the call chains that start from it, to see the "longest" chain in terms of stack usage.

But before we do that, we have to make sure that there are no "cycles" in the call graph - that there is no recursion whatsoever. Here's a standalone example of an algorithm that detects cycles:

#!/usr/bin/env python
import sys

callGraph = {
    'a': ['b', 'c'],  # a calls b and c
    'b': ['d'],      # b calls d
    'c': ['e'],      # etc
    'd': [],
    'e': ['a']       # cycle
#    'e': []         # no cycle
}


def SearchForCycles(path):
    print path
    node = path[-1]  # Take the last step in the chain so far
    neighbours = callGraph[node]  # fetch its "neighbours"
    for neighbour in neighbours:
        if neighbour in path:  # see if we've met one previously
            print "Detected cycle", path, "->", neighbour
            sys.exit(1)
        # add neighbour to the path, check the new, longer path
        SearchForCycles(path + [neighbour])

for node in callGraph.keys():
    SearchForCycles([node])

This is relatively simple code: we want to detect recursion in the call chains, so we... write a recursive function :‑) SearchForCycles checks a path for cycles, by:

taking the last node of the path formed so far
fetching the neighbours of this last node (from the call graph)
checking to see if any of these neighbours were already in the path - if so, a cycle has been found
otherwise, it recurses futher down in the chain, by calling itself.

And its results:

bash$ ./detectCycles.py 
['a']
['a', 'b']
['a', 'b', 'd']
['a', 'c']
['a', 'c', 'e']
Detected cycle ['a', 'c', 'e'] -> a

Once we've made sure that there are no cycles in the call graph, we can now safely accumulate stack usage per path: For each function, we report the max of the accumulated step costs in all call chains that start from it.

# Calculate the total stack usage of each function, taking into account who it calls
def findStackUsage(foo, stackUsagePerFunction, callGraph, cache={}):
    if foo in cache:  # memoization
        return cache[foo]
    if foo not in stackUsagePerFunction:
        return 0
    if foo not in callGraph or not callGraph[foo]:
        cache[foo]=stackUsagePerFunction[foo]
        return stackUsagePerFunction[foo]
    # the largest of the possible call chains
    totalStackUsage = max(
        ((stackUsagePerFunction[foo] + findStackUsage(x, stackUsagePerFunction, callGraph))
         for x in callGraph[foo]))
    cache[foo]=totalStackUsage
    return totalStackUsage

And now, we can finally see some results: invoked on my renderer, I get...

bash$ stackUsage.py renderer-2.x/src/renderer | c++filt  | tail -15
       788: Scene::renderAmbient(Camera const&, Screen&)
       872: Scene::renderRaytracer(Camera&, Screen&, bool)
      1012: Scene::renderGouraud(Camera const&, Screen&)
      1040: Scene::renderPhongAndShadowed(Camera const&, Screen&)
      1048: lib3ds_chunk_dump_info
      1088: Scene::renderPhong(Camera const&, Screen&)
      1108: lib3ds_node_read
      1124: texture_map_read
      1184: lib3ds_material_read
      1340: mdata_read
      1416: lib3ds_file_read
      1460: lib3ds_file_load
      1652: lib3ds_mesh_calculate_normals
      4080: Scene::load(char const*)
      4096: main

...so I can see exactly how much stack space is used by each function. More importantly, I can see whether cycles (recursion) exists:

bash$ stackUsage.py renderer-2.x/src/renderer | c++filt
...
Detected cycle: 
    ['lib3ds_file_insert_node', 'lib3ds_file_insert_node']
Detected cycle: 
    ['CountDepth(BVHNode*, int, int&)', 'CountDepth(BVHNode*, int, int&)']
Detected cycle: 
    ['Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)', 
     'Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)']
Detected cycle: 
    ['CountBoxes(BVHNode*)', 'CountBoxes(BVHNode*)']
Detected cycle: 
    ['CountTriangles(BVHNode*)', 'CountTriangles(BVHNode*)']
...

There - the script detects recursive functions. It also detects "deeper" cycles: here's cycle detection in a SPARC/Leon Ada binary:

bash$ stackUsage.py /path/to/leonBinaryMadeWithAda
...
Detected cycle: [
    '__gnat_raise_nodefer_with_msg', 
    '__gnat_last_chance_handler', 
    '__gnat_rcheck_19', 
    '__gnat_raise_program_error', 
    '__gnat_raise_nodefer_with_msg'
]
...

Mission accomplished :‑)

Final comments

This kind of functionality is usually part of far more mature tools than... quickly-hacked Python scripts. Don't use this to send your spaceship in orbit :‑)

[1] In fact, the latest versions of GCC support the -fstack-usage option, which creates stack usage per function (individually) upon compiling the code. Unfortunately, compilers for embedded systems usually take a while to... catch up, in terms of these features. Parsing the stack "decrease" at each function startup is for now, a good-enough workaround. Note that for SPARC/Leon compilers, the corresponding stack search pattern is... ADDRESS OPCODES save %sp, -AMOUNT, %sp, so the final version of the script uses separate regexps for each architecture.

Note that the calculated stack usage per function assumes a single thread running; If there are N threads spawned in our binary, then of course the stack usage scales correspondingly.

Finally, if you compile with debug info enabled (-g) you will get much better output - many symbols are not visible without it.

Update, 10 years later...

I found some time to improve the script - it now uses -fstack-usage data, and also copes with the same function appearing twice in the binary (which can happen, when you reuse old libraries and/or use static at file-scope, to isolate symbols inside compilation units...).

Download

The script lives and evolves in Github - here. You can just clone the repo, or download the script directly from here. You can also download the (10 years old) version of the script that works without -fstack-usage from here.

Index

Updated: Tue Jun 13 21:43:21 2023