An Explanation of DeepDream and Deep Neural Networks

In response to the question: It’s cool, but how is this what the computer sees?

Hello-

There’s a short version, a long version, and a-bunch-of-research-papers version of the explanation. The short version is that I give a picture to an image-recognition algorithm, and tell it “whatever you recognize, enhance.”

The longer version is that it’s based on deep neural networks, which are a relatively new innovation in machine learning. Neural networks, and deep neural networks (DNNs) in particular, are a really important innovation because they enable computers to learn to do things “naturally”, instead of us having to describe every aspect of the task, as with conventional programming. You and I have no problem recognizing a dog in an image, or telling a dog and a cat apart, or telling a jungle cat from a house cat. However, try describing what an image of a dog looks like. It’s nearly impossible; nobody could hand-write code for a dog detector. So what if we were to give a computer a few thousand images of dogs, and a few thousand of cats, and tell it to figure out the difference on its own? That’s what a neural network lets us do.

A neural network is a computation system, or algorithm, where a bunch of “neurons” are linked together in layers. Each connection between neurons has a weight, which is how much one neuron affects the neurons in the next layer.

These weights are “tuned” so that the network gives a desired output for a given input. (This tuning is a process where a sample set of inputs is given, and the corresponding outputs are scored for desired-ness. The weights are then randomly adjusted slightly, and the first step is repeated to see if the score improved. Think of it as evolution.)
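
As a toy illustration of that adjust-and-keep-improvements loop, here is a minimal sketch in Python. The single “neuron”, the score function, and the sample data are all invented for the example; real networks are tuned with much cleverer math.

    import math
    import random

    def predict(weights, inputs):
        # One "neuron": a weighted sum of the inputs, squashed to 0..1.
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-total))

    def score(weights, samples):
        # Lower is better: total squared error against the desired outputs.
        return sum((predict(weights, x) - want) ** 2 for x, want in samples)

    # Made-up sample set: (inputs, desired output).
    samples = [([0.0, 1.0], 1.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]

    weights = [random.uniform(-1.0, 1.0) for _ in range(2)]
    best = score(weights, samples)
    for _ in range(10000):
        # Randomly nudge the weights slightly...
        candidate = [w + random.gauss(0.0, 0.1) for w in weights]
        s = score(candidate, samples)
        if s < best:
            # ...and keep the nudge only if the outputs got more desirable.
            weights, best = candidate, s

    print(weights, best)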

A deep neural network is like a regular neural network, but vastly larger (the one I use has 30 layers). This depth means that it can learn higher-level patterns in images, like eyes. The DNN I used was trained (tuned) on a set of 2.5 million images, by being given the images and taught to guess what was in them.

Deep-dreaming is a way to learn about how a DNN works. Because of a DNN’s evolved nature, it acts as a black box, an algorithm where the inside is unknown. Sure, we can view the weights of each neuron, but it’s not like there’s some neuron that is dog-ness, for example. DeepDream was created as a way to see what happens inside.

DeepDream works by feeding an image into the DNN, and then adjusting the image to optimize (maximize*) the response of a particular layer of neurons. This amplifies whatever patterns in the image that layer recognizes. By varying which layer we optimize, we vary the types of features that get enhanced. For example, the low-to-middle layers (low being the first layers) produce results that look like impressionist art, with simple patterns of lines. About two-thirds of the way through is where the video was made; at this point, most of what gets recognized is on the scale of humans, buildings, faces, cars, and other similarly interesting objects. Above these layers the patterns become too fine, and the images mostly look dirty.

 *Maybe. The optimization is some fairly involved math, so I’ll leave it out, partly because I don’t fully understand it. I think it works with derivatives and vectors (gradient ascent).
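
For the curious, here is a rough sketch of that maximize-a-layer loop, written with PyTorch and its pretrained GoogLeNet (the original DeepDream code used Caffe; the layer choice, step size, and iteration count here are guesses, and the real code adds tricks like multi-scale processing that I’m skipping):

    import torch
    import torchvision

    # A pretrained GoogLeNet, the same family of network DeepDream used.
    model = torchvision.models.googlenet(weights="DEFAULT").eval()

    # Grab the output of one mid-level layer with a hook. Which layer you
    # pick (inception4c here) controls what kinds of patterns get enhanced.
    activations = {}
    model.inception4c.register_forward_hook(
        lambda module, inp, out: activations.update(layer=out))

    # Stand-in for a real photo, scaled to the network's input size.
    image = torch.rand(1, 3, 224, 224, requires_grad=True)

    for _ in range(50):
        model(image)
        # "How strongly does this layer fire?" -- the thing we maximize.
        response = activations["layer"].norm()
        response.backward()
        with torch.no_grad():
            # Gradient ASCENT: nudge each pixel in the direction that
            # makes the layer fire harder. Normalizing the gradient
            # keeps the step size sane.
            image += 0.02 * image.grad / (image.grad.abs().mean() + 1e-8)
            image.grad.zero_()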

To sum it up, DeepDream is enhancing the parts and patterns of images that a layer of a DNN image recognition algorithm detects, showing what a computer recognizes in the image.

Below I have a variety of samples from various layers through the DNN, showing how each layer ‘sees’ different, and increasingly complex, detail.

This is the first layer, which turns out to be mostly contrast.

Another fairly low-level layer. I’m not sure what the main focus of this layer is. Maybe something with lighting?

This is 1/4 to 1/3 of the way through, and it’s starting to gain more varying patterns. These are the ones that look like impressionist art.

A neat squares-and-dots effect shows up in this layer. It looks like some art style to me, but I’m not sure what.

Now we’re deep enough to start seeing more object-like patterns. Some eye-ish dots are forming along the river.

Here we start getting into some serious impressionist effects.

This is about half-way between the ‘art’ layers and the ‘stuff’ layers.

Animals are starting to appear now. This is around the layer I use for a lot of my stuff.

This is one of the last layers in the DNN. By this point, almost all of the detail is on a very small scale, leaving the image looking dirty.

Links to sources/more material:

the Google blog post that started it all:
http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

A Google research paper: http://arxiv.org/pdf/1409.4842v1.pdf

Tangents off of Euler’s Tonnetz (music theory / basic topology)

Hi, Adam here!

If you have ever done anything music-related, you will probably know that staying on the same chord throughout a piece of music doesn’t usually make for a great song. Music is built around chord progressions. But, of course, you can’t just take any random set of chords and mix them together, as this will usually return a dissonant conglomeration of messy frequency ratios. I had this in mind when I sat on a couch with a clipboard and a blank piece of paper. The other thing I had in mind was Leonhard Euler’s Tonnetz. At the time, I remembered a few things: the Tonnetz had a bunch of letter names of musical notes on it, these letters had lines running between them showing harmonic relationships, and they were arranged in a grid-like pattern. I half-decided then to reproduce this graph, or at least make something similar to it. So I started off by drawing the letter C and putting it in a circle. I couldn’t remember the Tonnetz very well then, so I assumed I was working with key signatures, instead of notes. Now I had to decide where to go next. There were two obvious choices right off the bat: F and G.

[Image: F, C, and G, connected by perfect fifths]

I placed the two chords/key signatures to the upper left and bottom right of the C in the middle. Now, why F and G? Continue reading

A (sort of) Brief Explanation of Gradient Descent

Hi, Adam here! I’m going to give an explanation of the gradient descent algorithm!

Let’s say you have a program with a whole lot of parameters in it. Say 30 or so. Changing these parameters affects the speed with which the program does its job. Keep in mind that these values could be anything from synaptic weights in an artificial neural network to how often you should grab values from an IMU.

You can easily find out how fast the program does its job, but you aren’t sure what the best combination of parameters to use for the program is. You could try every combination, but this might take a while. You could try and figure out the best set of parameters by trial and error, but this might take a little while, and could be pretty tedious. So we want to come up with an algorithm to try and find the set of parameters that will make our program run as fast as it possibly can. We’re going to need a little math.

Let’s start off by making the set of parameters we’re using into a vector. Of course, this will be a pretty high-dimensional vector, but in linear algebra we can have as many dimensions as we like. Let’s say we have 30 parameters, and we’ll call our vector v. We will let F(v) be the time our program takes to get done using the values in v as the parameters. We want to find a value of v that minimizes F without taking 10^30 samples.

So… how do we do this?
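
As a preview, here is a small sketch of the core idea in Python: estimate the gradient of F numerically with finite differences, then step v a little way downhill, over and over. The cost function here is a made-up stand-in; in real use F would time the actual program.

    def F(v):
        # Made-up cost: "fastest" when every parameter equals 3.
        return sum((x - 3.0) ** 2 for x in v)

    def gradient(F, v, h=1e-5):
        # Estimate each partial derivative with a finite difference:
        # bump one parameter a tiny bit and see how much F changes.
        grad = []
        for i in range(len(v)):
            bumped = list(v)
            bumped[i] += h
            grad.append((F(bumped) - F(v)) / h)
        return grad

    v = [0.0] * 30          # 30 parameters, as above
    step = 0.1              # step size (a placeholder value)
    for _ in range(200):
        g = gradient(F, v)
        # Move against the gradient: downhill in F.
        v = [x - step * gx for x, gx in zip(v, g)]

    print(F(v))  # should now be close to the minimum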

Continue reading