Visual Information Theory

sebastos · on Oct 14, 2015

I really enjoyed this article; it was very accessible. Recently I've been studying some literature on stochastic optimal control, and I've bumped into the KL-divergence concept a number of times, but never really understood it. I expected this article to be a fun read, but I never expected to learn something so directly useful! Information theory really does show up everywhere.

diego898 · on Oct 15, 2015

I am currently learning about stochastic optimal control, and am finding the lecture notes [1] for this course [2] to be extremely helpful. The notes are by a probabilist Ramon Van Handel at Princeton. I hope you find them useful!

[1] https://www.princeton.edu/~rvan/acm217/ACM217.pdf

[2] https://www.princeton.edu/~rvan/acm217/acm217.html

sebastos · on Oct 15, 2015

Thanks for the link, it looks great!

tlb · on Oct 14, 2015

I often avoid these sorts of visual methods, because:

- they only work with 2 variables, but many interesting problems require digging into more variables

- they only work with medium-sized numbers, and aren't readable when P<0.01 or P>0.99

So they're great for gaining intuition (like the Simpson's Paradox example), but when you try to solve a real problem you find yourself boxed in.

colah3 · on Oct 14, 2015

Yep. The visualization tricks in this article are for building understanding of basic ideas in probability theory and information theory.

In most real situations, they wouldn't be very practical. As you note, the core trick in this essay only works for 2 or 3 variables, assumes they're discrete, and doesn't scale to the variables having lots of values or really improbable values.

There are visualization techniques which are useful in the real world, at least some of the time -- a lot of my blog explores this in the context of neural networks -- but that wasn't my goal in this article.

FrankenPC · on Oct 14, 2015

It's a conceptual building block. YMMV. I enjoyed the concepts regardless of actual applicability.

escherize · on Oct 14, 2015

After careful consideration I've always enjoyed how:

    p(rain,coat) = p(rain) * p(coat | rain)

Can be pronounced: "the probability of rain and (wearing a) coat is the probability of rain times the probability of (my wearing a) coat given rain". This intuitively showcases how the order of independent events doesn't affect the outcome, since after all:

    p(coat,rain) = p(rain,coat) = p(rain) * p(coat | rain)
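As a minimal sketch of the identity above, the joint probability can be computed by factoring in either order. The 25% chance of rain is the figure mentioned elsewhere in this thread; the coat probabilities are illustrative assumptions, and the variable names are hypothetical. Note that nothing here requires the two events to be independent.

```python
from fractions import Fraction as F

# Assumed, illustrative numbers (only the 25% rain figure appears in the thread).
p_rain = F(1, 4)
p_weather = {"rain": p_rain, "sunny": 1 - p_rain}
p_coat_given = {"rain": F(3, 4), "sunny": F(1, 4)}  # p(coat | weather)

# Build the joint table from p(weather) * p(clothing | weather).
joint = {}
for w, pw in p_weather.items():
    joint[(w, "coat")] = pw * p_coat_given[w]
    joint[(w, "no coat")] = pw * (1 - p_coat_given[w])

# Marginal p(coat) and the reverse conditional p(rain | coat).
p_coat = joint[("rain", "coat")] + joint[("sunny", "coat")]
p_rain_given_coat = joint[("rain", "coat")] / p_coat

# Two factorizations of the same joint probability.
forward = p_rain * p_coat_given["rain"]   # p(rain) * p(coat | rain)
backward = p_coat * p_rain_given_coat     # p(coat) * p(rain | coat)

print(forward, backward)  # 3/16 3/16
```

Both orders give the same joint probability because each is just a different way of slicing the same table.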

jsprogrammer · on Oct 14, 2015

The problem with that example is that there is no reason to assume that coat wearing and rain are independent (in fact, you have even modeled that wearing a coat is partially dependent on it raining).

Maybe I missed your point?

escherize · on Oct 14, 2015

No, you're right: those events shouldn't be called independent!

A definition of independent events is:

    A and B are independent events iff P(A|B) = P(A) and P(B|A) = P(B)

So that was just plain wrong.
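A rough sketch of how that definition could be checked on a small joint table. It uses the product form P(A,B) = P(A) * P(B), which is equivalent to the conditional form above whenever the probabilities involved are nonzero. The function name `is_independent` and both tables are hypothetical, with illustrative numbers.

```python
from fractions import Fraction as F

def is_independent(joint):
    """joint maps (a, b) -> P(A=a, B=b).

    Returns True iff P(a, b) == P(a) * P(b) for every pair of outcomes.
    """
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0) + p  # marginal over b
        p_b[b] = p_b.get(b, 0) + p  # marginal over a
    return all(
        joint.get((a, b), 0) == p_a[a] * p_b[b]
        for a in p_a
        for b in p_b
    )

# Assumed numbers: coat wearing depends on the weather.
rain_coat = {
    ("rain", "coat"): F(3, 16), ("rain", "no coat"): F(1, 16),
    ("sunny", "coat"): F(3, 16), ("sunny", "no coat"): F(9, 16),
}

# Two fair coin flips, which are independent by construction.
two_coins = {(x, y): F(1, 4) for x in "HT" for y in "HT"}

print(is_independent(rain_coat))  # False
print(is_independent(two_coins))  # True
```

Exact fractions are used so the equality test is not thrown off by floating-point rounding.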

FrankenPC · on Oct 14, 2015

"I love the feeling of having a new way to think about the world. I especially love when there’s some vague idea that gets formalized into a concrete concept. Information theory is a prime example of this."

THIS! THIS is why I love programming and electronics/mechanical engineering. I live for the new ways to think.

colah3 · on Oct 14, 2015

I feel like 90% of my motivation for writing blog posts is vicariously reliving this feeling. :)

FrankenPC · on Oct 14, 2015

Personally, I feel like a moron most of the time when I'm here on HN. The caliber of engineers here is astounding. So, I comment here hoping for clarification. I'm nowhere near the level of engineering necessary to feel confident to blog.

EDIT: I understand optimizing search algorithms to create better P=NP O(n) solutions. So, I guess I'm not totally stupid.

nazka · on Oct 15, 2015

It shouldn't be a problem. If you want to write something, just do it!

Just avoid making statements about things you don't know. Start your article by saying that you are a beginner, and for things you don't know, pose open questions instead. You can also learn one simple thing well and write a post about it, or you can try something new and write a tutorial about it with a few conclusions from your experience. For instance, how to build a simple web app with React, Flux, and Node.js.

So there are still things you can write, just be open about your level and what you don't know, and write about what you learned. And even trivial things can be useful for others (like simple stats for devs, or simple python code for data scientists).

vezzy-fnord · on Oct 15, 2015

"I live for the new ways to think."

Philosophy should be right up your alley, in that case.

FrankenPC · on Oct 16, 2015

I got straight A's in college philosophy courses. LOVED them.

ljk · on Oct 14, 2015

Off-topic, but does anyone know what was used to draw the graphs? They look really clean.

I'm guessing LaTeX? The font looked like LaTeX's font.

colah3 · on Oct 14, 2015

I drew the graphs in Inkscape. It has a plugin for LaTeX equations.

misiti3780 · on Oct 15, 2015

I look forward to every article colah writes. The way he explained backpropagation a few weeks ago was really interesting; I had never thought about it that way, and it was very helpful!

incompatible · on Oct 15, 2015

It rains 25% of the time in California? Sounds like an unpleasant place.

colah3 · on Oct 15, 2015

We wish! We're actually in a drought where I live. I also don't think I wear a coat 75% of the time when it is sunny. :)

But I wanted to have nice numbers and it felt like a nice example.

hackaflocka · on Oct 14, 2015

The visual cortex is one of the largest and most powerful cortices in the human brain.

But it may be that vision's supposed to work in conjunction with the other senses.

I think visual explanations work well for very simple visuals. As soon as higher-order factors need to be factored in, visual explanations are only sensible to the highly trained expert (think Feynman diagrams).

Nice essay, nevertheless. A lot of time and work went into it, and I can appreciate that.