Ten Years of AI Pictures

In 2015, it was all DIY.

Iconic Picasso, 14 Dec 2015

I posted a small picture on my Instagram feed on the 18th of November in 2015. A decade ago, as of next week.

This image, “Suburbia,” was one of the first I’d created using my then-newest computer, which I’d assembled specifically to make these sorts of pictures using software I’d written for that purpose. The computer sported the fastest Intel CPU I could find, a large-for-the-day SSD in addition to the usual system hard drive, and a fresh NVIDIA GTX Titan X graphics card to accelerate matrix math. Not long after, it contained two such graphics cards. The machine and its accessories occupied a wide corner of my home office, crowding out the other, smaller computers from half of my long desk. It had a big fan and a size-XL power supply that would audibly vibrate in bursts when the GPUs were calculating intensely, which was nearly all the time. Due to overheating issues I often ran it with the case side open. The machine’s hostname was “gerald,” for reasons I don’t recall. My (future) son-in-law Josh took a look, a listen, and immediately nicknamed it “Skynet.”

“Suburbia” was made using machine-learning techniques, cribbed awkwardly from the ideas behind “neural style transfer.” Even with gerald/Skynet’s GPUs, the process was very slow – the fast, standard libraries for what was then commonly called “deep learning” were brand-new or not yet released. No PyTorch; Theano, Keras, and TensorFlow were barely out of the gate. Only lower-level tools like NumPy and BLAS.

The programmers reading this may nod knowingly, and the non-programmers might instead nod off – so I won’t belabor the details, except to explain that “Suburbia” was made by teaching a program to make estimated re-creations of aerial photos based on a database of thousands of other aerial photos, and then to apply those learned patterns to images that were not aerial photos – in this case, a portrait. I imagined it all as a sort of layered collage where source prints were not just pasted together but almost liquified, their carefully stirred fragments then allowed to cool and gel into a totally new dessert.
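For the programmers who did nod knowingly: the “learned pattern” at the heart of this family of techniques is a statistic of feature correlations, the Gram matrix. The sketch below is not my 2015 software – the function names and toy data are illustrative – but it shows, in NumPy, the style statistic and the loss that a style-transfer optimizer tries to drive down.

```python
import numpy as np

def gram_matrix(features):
    # Correlations between feature channels: the "style" statistic
    # used in neural style transfer. `features` has shape
    # (channels, height, width), standing in for CNN activations.
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(gen_features, style_features):
    # Mean squared difference between the two Gram matrices.
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return float(np.mean((g_gen - g_style) ** 2))

# Toy feature maps in place of real network activations.
rng = np.random.default_rng(0)
style = rng.standard_normal((8, 16, 16))
gen = rng.standard_normal((8, 16, 16))

print(style_loss(gen, style))    # positive: the "styles" differ
print(style_loss(style, style))  # 0.0: identical style statistics
```

In a full system, the generated image itself is the thing being optimized, nudged repeatedly so that its feature statistics match the style source’s – which is one reason each picture took minutes to hours.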

Each image generation required hundreds of hours of preparatory computer training on my image collection(s) before even starting to make new pictures, and each generated picture could in turn take anywhere from 20 minutes to 3 hours to produce.

Collecting the source aerial imagery was the most time-consuming part, and it had the biggest effect on the results. Unlike the calculation, it had to be done “by hand”: browsing public sources, web sites, and so on, then reformatting or excerpting images to fit my (large, but also not so large) computer’s capabilities.

The results, after weeks of effort and long hours staring at various messes or crash logs, started to make themselves known around the end of 2015: what I knew was a new kind of picture, never seen before. Low-res at 512x512, a little blurry, and, when posted online, entirely ignored by the rest of the world.