The Ulam spiral (Python and PyPy)

(January 2014)

For the TL;DR crowd:

Yesterday, I found this Youtube video about the Ulam prime spiral ; I read the relevant Wikipedia entry, and reliving my first programming memories from childhood, reimplemented it (this time in Python, not BASIC :‑)

I then tried running it for big grids, and using PyPy, saw it execute 50 times faster - from 70 seconds down to 1.4 seconds.

I really loved math, as a kid. It was like an oasis, in a vast desert of subjects that I cared nothing about. I always kept it last in my "study queue of the day", as the reward waiting for me after I finished all the rest. Sometimes I couldn't take it any more - and bored to tears from intellectually stimulating subjects like Geography and On being a citizen (don't ask), I would just... fall off the study wagon, towards the only thing that actually made sense to me: math.

To be accurate, it wasn't just math. I appreciated the sciences that studied deterministic processes, stuff that could be measured by numbers and be modelled - and predicted! - by mathematical constructs. For that reason, I loved physics almost as much as math - a crossing of domains, forming a bridge between the beauty of mathematics and the seemingly chaotic natural world. I really liked that.

And then, at the age of 13, I got to meet the ultimate bridge between mathematics and reality, through a friend's magazine (called, clairvoyantly enough, 'Computers for everyone'). That particular issue contained a BASIC program that calculated primes. I didn't know anything about computers, I was only vaguely aware of their existence... but the mathematical nature of the article, oh, that I could definitely understand.

I was hooked. This was magic!

A year later, at the age of 14, one of my brothers gave me the best present ever: he bought a Sinclair ZX Spectrum 48K... and my life found purpose: I wanted to code - more than anything else in the world.

Memory trigger

Decades later, that young kid still lives in me. The rewards are now far easier to get ; I just lie on my couch, and use my tablet to enjoy MinutePhysics, Veritasium and Numberphile (which is a Greek-derived word, translating as "someone who loves numbers").

While visiting Numberphile yesterday, I found a video about prime spirals. It awoke that long forgotten memory of my first contact with programming - calculating primes! As I was watching the video, I knew I would not be able to resist writing the code to draw these prime spirals, and verify them for myself.

Have a look - or read the relevant Wikipedia entry:

Prime spirals (from the Numberphile Youtube channel)

An unexpected pattern of diagonals emerges when arranging all the natural numbers in a spiral - prime numbers seem to 'prefer' specific diagonals.

Coding, decades later

As soon as I finished watching the video, I stood up from my couch, sat at my desktop, and launched my Python/VIM environment. For added bonus points, I wanted to do this using no external libraries, and via exactly the same algorithm that I first saw all these years ago, in BASIC. Yes, I still remember the code, and no, it wasn't an array-marking sieve.

The only luxury I allowed myself, was the application of the corresponding functional (instead of imperative) looping forms. Translation: the nasty C-style loops would be morphed into yield, itertools and generator expressions.

import math
import itertools

def primes():
    yield 2
    primesSoFar = [2]
    for candidate in itertools.count(3, 2):
        for prime in (i for i in primesSoFar if i <= int(math.sqrt(candidate))):
            if 0 == candidate % prime:
                break
        else:
            primesSoFar.append(candidate)
            yield candidate

def main():
    for p in primes():
        if p > 100:
            break
        print p,

if __name__ == "__main__":
    main()

The code is quite simple: the purpose of the function is to yield primes, one after the other - so it stores them as it finds them (inside primesSoFar), and checks each candidate natural number to see if it can be divided by the primes computed so far (checking up to the square root of the candidate, which, if you think about it, is the upper limit of any potential factor).

If you've never seen yield before, go read my blog post about it and come back afterwards - believe me when I say, it's a construct worth knowing about. As you can see, I am also using itertools.count to start counting from 3, in steps of 2.

So, getting the list of primes: check.

Wait a second. That wasn't really the target though, was it? No, what I wanted was to generate the prime spiral ; to generate one pixel for each natural number, black or white, depending on whether the number is a prime or not.

def primeSpiralPixels():
    yield 255  # 1 is not prime, so return white
    yield 0    # 2 is prime, so return black
    primesSoFar = [2]
    for candidate in itertools.count(3):  # start from 3, up to infinity...
        for prime in (
                p for p in primesSoFar if p <= int(math.sqrt(candidate))):
            if 0 == candidate % prime:
                yield 255  # non-prime, return white
                break
        else:
            primesSoFar.append(candidate)
            yield 0  # prime, return black

This mild refactoring provided what I needed - an infinite supply of black and white pixel intensities (with intensity result 0 corresponding to black, and intensity 255 to white). Basically, the refactoring was about itertools.count losing its step of 2, and having not just the primes, but also the non-primes yield a result (255 - white).

Time to implement the spiral arrangement of this infinite stream of pixel intensities.

Hmm... how?

Let me think...

Drawing in a spiral means that one moves in circles: first towards the right, then up, then left, then down, and then all over again. We switch directions, don't we? When?...

When we find that the block towards the next direction in our cycle is empty.

OK, clear enough... but how do we implement this cycling process?

Well, we're in Python: itertools offers a cycle, which can be used to keep track of our (dx,dy) vectors: that is, the current move direction, and the future move direction. We will call that one the check direction, since it is the direction we will switch to, when we check the corresponding tile and find it to be empty:

# The square we will draw will be rows x rows 
rows = 202 if len(sys.argv) == 1 else int(sys.argv[1])

# The 'screen' we will write to, is a dictionary,
# with the (x,y) coordinate of the pixel as key,
# and as value, the pixel's intensity (0=black, 255=white)
#
# Why am I using a dict and not e.g. a NumPy 2D array here?
# As I said, I wanted to write this without external dependencies,
# and also prefer the matrix[x,y] form to the m[x][y] (i.e. the
# list of lists form). 'x,y' is a constructor form of the (x,y) tuple,
# which we can use to index a dictionary with. This also means,
# that since we are not using a real 2D array, the order of indices
# has no bearing on speed (row-major vs column-major).
screen = {}

# The (dx,dy) directions we will cycle through: right,up,left,down
moves = itertools.cycle([(1, 0), (0, -1), (-1,  0), (0, 1)])

# Setup: the check direction is one step ahead of the move one!
check = moves.next()
move, check = check, moves.next()

# Start from the middle of the board:
currentPosition = [int(rows / 2), int(rows / 2)]

# Spiral away!
for c in itertools.islice(primeSpiralPixels(), rows * rows):
    screen[currentPosition[0], currentPosition[1]] = c
    currentPosition[0] += move[0]
    currentPosition[1] += move[1]
    checkPosition = (
        currentPosition[0] + check[0],
        currentPosition[1] + check[1])
    if checkPosition not in screen:
        move, check = check, moves.next()

itertools are used here to...

cycle through the 4 directions, keeping the check direction one step ahead of the move direction:
```
move, check = check, moves.next()
```
...and to islice the infinite list of pixel values returned by primeSpiralPixels, to get the pixels necessary for a rows x rows grid.

A mini-class could also be used here, equipped with an addition operator, which would clear up the inner loop further - into something like this:

for c in itertools.islice(primeSpiralPixels(), rows * rows):
    screen[currentPosition] = c
    currentPosition += move
    checkPosition = currentPosition + check
    if checkPosition not in screen:
        move, check = check, moves.next()

This is left as an exercise for the reader :‑)

And that was it.

The screen dictionary filled up with the pixel data, so I could now save it in any image format I wanted. To abide by my own rules - no external dependencies - but also quickly test what I did, I saved the data in a .pgm file. PGM is a simple, grayscale format: it expects width x height bytes (so you now see why I used 0 for black, and 255 for white - it's what .pgm expects!), with a simple header up front to provide the image dimensions:

image = open("ulam.pgm", "w")
image.write("P5\n" + str(rows - 1) + " " + str(rows - 1) + "\n255\n")
for row in xrange(1, rows):
    for col in xrange(1, rows):
        image.write(chr(screen[col, row]))

I know, I know - this is probably the lamest and slowest way to save an image, ever :‑)

And there it was - a prime spiral of my own making:

See? There they are, those prime diagonals...

What about larger grids? Is this naive implementation fast enough?

Optimizing?

We live in the days of CPUs with large caches - and one of the things that clashes with proper cache usage is increased memory usage. Have we sinned in that department?

Sure we have - we use Python, for one. An interpreter.

We also store the image data in a dictionary, instead of a 2D array. Tut-tut.

And to top it all off, even though I can't say the magic lazy word here (for fear of being lynched by a Haskell mob, :‑) we are not exactly optimal in that department, either... Use of yield and functional-style generators clears up the code, sure - but a mutating imperative-style for-loop terrorist is far more efficient, both instruction- and data-cache-wise.

But first things first - before we optimize, we need to be able to monitor the executions, to see a percentage completion indicator. We know the target number of pixels from the start, so this...

pixelsDone = 0
oldPct = 0
print "Generating picture...     ",
for c in itertools.islice(primeSpiralPixels(), rows * rows):
    ...
    pixelsDone += 1
    newPct = 100 * pixelsDone / (rows * rows)
    if newPct != oldPct:
        sys.stdout.write("\b\b\b\b%3d%%" % newPct)
        sys.stdout.flush()
        oldPct = newPct

...this should do the trick. The \b backtracks the cursor to the left, and thus new percentage reports overwrite the previous ones. We will now know during execution how far away we are from completing the picture.

Let's try this out with a larger grid:

bash$ time ./primeSpirals.py 500

Generating picture... 100%
Done! Now go open 'ulam.pgm' with your picture viewer (e.g. feh, gqview, etc)...

real    1m10.755s
user    1m10.703s
sys     0m0.017s

Hmm, 70 seconds.

I could switch to hyperdrive, and as I did with other problems write an imperative implementation in C. The speedup would probably be tremendous: compiled instead of interpreted, a 2D array instead of a dictionary for screen, array-modulo-based access of moves instead of itertools.cycle, much improved CPU cache utilization, etc...

But before I go all Terminator on the code, I better try for some low-hanging fruit first - namely, use PyPy:

bash$ # Run in good old CPython...

bash$ time ./primeSpirals.py 500

Generating picture... 100%
Done! Now go open 'ulam.pgm' with your picture viewer (e.g. feh, gqview, etc)...

real    1m10.755s
user    1m10.703s
sys     0m0.017s

bash$ # Checksum the generated image, to be sure PyPy doesn't break things

bash$ md5sum ulam.pgm 
4a7c62d1becfad1580de8e549417052e  ulam.pgm

bash$ # Run in PyPy...

bash$ time pypy ./primeSpirals.py 500
Generating picture... 100%
Done! Now go open 'ulam.pgm' with your picture viewer (e.g. feh, gqview, etc)...

real    0m1.339s
user    0m1.300s
sys     0m0.037s

bash$ # Checksum matches?

bash$ md5sum ulam.pgm 
4a7c62d1becfad1580de8e549417052e  ulam.pgm

Wait - what just happened?

PyPy runs this code 50 times faster than CPython!

And I made sure (via MD5 checking) that the generated picture was indeed the same.

OK, optimizing in this case is surely not what I am used to. Way to go, PyPy!

...

I lie back on the couch, and get back to watching more Numberphile videos - oh, there's this "pebbling a chessboard" that begs for a Javascript interface... :‑)

Index

Updated: Sat Oct 8 11:41:25 2022