Monday, April 28, 2014

Recursion Explained

This is the first in a series of posts with a specific, targeted audience, which means there's going to be a relatively wide variation in level-of-explanation. Some of the posts are going to very clearly be me-writing-for-someone-who-already-knows-lots-of-relevant-information, and some (like today's) will be me-writing-for-someone-who-wouldn't-normally-read-this-blog. Anyway, today's topic is what recursion is and why we should care. If you already think recursion is important, this post'll be pretty surface-y and review-heavy. So, let's get started, shall we?


Recursive-Xzibit thinks Recursive-Xzibit thinks Recursive-Xzbit thinks ... that recursion is fun

So yeah, recursion.  The basic idea is something that acts on itself, or something with multiple different levels.  And oh, the places you can go with self reference!
The most recent/significant pop-cultural example I can think of is the movie Inception, where dreams could occur within dreams within dreams, and what happened in one level would bleed down into all the levels below it (e.g. when gravity stopped in the hotel because the truck they were dreaming in was falling off a bridge).  
Recursion comes in a few different classes, and I've yet to hear any good terms to differentiate them, so I'm going to make some up:

  • Unidirectional recursion
    • Feedback only goes in only one direction
    •  Example 1: The Fibonacci Sequence (each element is the sum of the previous two elements, the first few are \(0, 1, 1, 2, 3, 5, 8, 13, 21...\)) - the later parts of the list are completely unimportant in figuring out the first few
    • Example 2: Sorting a list (using quicksort) - one simple and efficient way to sort a list is a monotonic recursion - you pick the first number off the list, split the rest of it in two by putting all the numbers smaller than the first one on one side and bigger on the other, then sort each of those sublists

  • Bidirectional recursion
    • Feedback can go in either direction
    • Example: Inception - the environment you're dreaming in influences the dream, but the events within the dreams have lasting influence in the outside world (e.g. planting an idea in someone's head, taking useful information, etc)
    • Example 2: INT - this graph is sorta strange.  It's constructed entirely out of smaller versions of itself - any single point contains the whole graph.  The lower levels are exactly the same as the higher ones.  Cool, right?
    •  

  • Mutual recursion
    • Two things that are defined, each one in terms of the other
    • This one's generally a bit more complicated than the other two
    • One set of good examples can be found in Hofstadter Sequences - specifically, the Figure-Figure sequence
    • It's pretty neatly defined in terms of itself and its complement (the set of all numbers not in the sequence)
    • The first number of the sequence (\(R(0)\), if you will) is \(1\)
    • The first number not in the sequence (Let's call it \(S(0)\)) is \(2\)
    • The rest of the sequence is defined \(R(n) = R(n-1) + S(n-1)\)
    • So, for example, \(R(1) = R(0) + S(0) = 3\), and \(R(2) = R(1) + S(1) = (R(0) + S(0)) + S(1) = (2+1) + S(1) = 3 + S(1) = 3 + 4 = 7\)
    • Let's write out another one just for kicks: 
    •  \(R(3) = R(2) + S(2) =  (R(1) + S(1)) + S(2)\)
    • \(= ((R(0)+S(0))+S(1))+S(2) = (3+4)+5 = 12\)

     So, why should we care exactly?  Why does this fancy "recursion" thing even matter?  Well, it's by far the most elegant (and sometimes the only) way to define all those things up there, but let's say you didn't care about any of those.  Why recursion?

    Well, given that the target audience here is into music theory, let's start with the immediately relevant example: fugues.  Fugues are recursive - you start from a relatively well defined structure, then modify it according to various rules and weave the modifications together, and, if you know what you're doing, beautiful music comes out.



    Convinced you care yet?  No?  What was that?  "More examples"?  Great.

    Well, recursion is useful in understanding language.  I'd go into it, but Guy Steele did a presentation on this a while ago, and it's one of my favourite talks on the internet, so I'll link to that instead.


    That brings me to my next point pretty nicely actually - recursion is extremely important in programming.
    I'll summarize a bit, but recursion-in-programming really deserves its own post
    A relatively common idiom is to define a base case and a way to reduce anything else to said base case.  We saw a few examples of that earlier, with the Fibonacci and Figure-Figure sequences - we defined \(F(0)\), \(S(0)\), and \(R(0)\), then gave rules for computing the rest of the sequences in terms of the first element.
    But, to take a more programming-y example, let's say you've got some operation (we'll call it f) and you want to apply it to each element of a list. How would you go about that?
    Well, you'd probably start by applying the operation to the first result of the list, then start again with the rest of the list, until you've f'd every element.

    So, we've seen that recursion gives us pretty ways to define seemingly messy things, but the discussion is still incomplete.  Why?
    Simple.
    I haven't pulled out the pretty colors.
    A discussion of recursion always has to end with fractals, so we'll start with a joke, then follow up with a pretty picture:

    I recently saw someone post a question on Quora, "What's the prettiest image ever found by zooming in on the Mandelbrot set?" - The best answer was, of course, "The Mandelbrot set".
    The Mandelbrot set is defined by doing some funky things with complex numbers, and the trick is that it ends up containing infinitely many smaller versions of itself - Just about anywhere along the boundary, if you zoom in enough, you'll find a (often very slightly deformed) copy of the whole set.

    So, without further ado, pretty colors

Monday, April 7, 2014

Playing with Probabilities

MDists.lhs

The Header

Imports, pragmas, and the module declaration

> {-# LANGUAGE TupleSections #-}
> module MDists where
> import Data.Map (Map, (!))
> import qualified Data.Map as M
> import Data.Function
> import Data.List
> import Data.Ratio

The Code

> type P = Rational
> type PDist a = Map a P

Hypotheses are getting represented as probability distributions with a probability attached to them

> data Hypothesis a = H {conditionals :: PDist a,
>                          chance :: P} deriving (Show)

Next, the distribution over hypotheses - let’s make strings our labels, just because strings are easy to use

> data HDist a = HD {hps :: Map String (Hypothesis a)} deriving (Show)

Helpers

First, we want to implement a few basic funcitons that will allow the rest of these to work nicely For example, we’ll want to be able to apply some function [0, 1] → [0, 1] to the probabilities we’re juggling here

> applyToProbability :: (P -> P) -> Hypothesis a -> Hypothesis a
> applyToProbability f (H d p) = H d $ f p

That one’s pretty simple, but on its own not that useful. Instead, let’s wrap it up so we don’t have to deal with the individual hypotheses - instead, we’ll map it over a the hypothesis distribution

> applyToProbs :: (P -> P) -> HDist a -> HDist a
> applyToProbs f (HD hps) = HD $ M.map (applyToProbability f) hps

Now, the main reason we’re defining those two functions is to write a function renormalize that takes a distribution and makes sure the probabilities actually sum to 1 without changing any of the ratios between probabilities

> renormalize :: HDist a -> HDist a
> renormalize h@(HD dist) = applyToProbs (/normalizer) h
>   where normalizer = M.foldl' (+) 0 . M.map chance $ dist

Our basic guess function is just a small wrapper around M.lookup - it fetches the likelihood of a hypothesis on its identifier If we don’t have a hypothesis in our distribution, it assigns it a probability of 0, but otherwise just fetches the probability from the hypothesis

> guess :: Ord a => HDist a -> String -> P
> guess (HD hps) str = case M.lookup str hps of Nothing -> 0
>                                               Just h -> chance h

Updating!

Now, our not-so-secret goal here was to model Bayesian updates If you’re not familiar with Bayes’ Theorem, here it is:

$$p(H|D)=p(D|H)\frac{p(H)}{p(D)}$$

To explain: the probability of a hypothesis being true, on some data, is the probability of that data given the hypothesis (p(DH)) multiplied by the probability we had previously assigned the hypothesis (our “prior”, or p(H)), then divided by the probability of that data under all hypotheses (p(D)). We don’t deal directly with the probability of the data under all hypotheses - instead preferring to renormalize the distribution after the fact

Of course, we want to automate this adjustment. The first step is to multiply each probability by the numerator - p(DH) ⋅ p(H)

> pAndCond :: Ord a => a -> Hypothesis a -> Hypothesis a
> pAndCond event (H dist prior) = H dist $ prior * likelihood
>   where likelihood = M.findWithDefault 0 event dist

With that done, we just renormalize, effecitvely dividing by p(D) We know this works because if we divided by anything other than the renormalizer, the ending probabilities would sum to some number other than 1, so, on pain of contradiction,
p(D) = renormalizer

> updateOnEvent :: Ord a => HDist a -> a -> HDist a
> updateOnEvent (HD dist) event = renormalize . HD . M.map (pAndCond event) $ dist

Of course, we want to be able to update on a series of events, so we fold over a list of events This fold ends up being fully strict, so instead of a foldr we use foldl' (hooray tail recursion!)

> update :: Ord b => HDist b -> [b] -> HDist b
> update = foldl' updateOnEvent

Construction helpers

We’ll use these utilities to construct and solve some toy problems a little further down First, a few more helpers that’ll be really important for constructing the toy problems

This one just normalizes a hypothesis, so we can think of it in terms of allocating odds instead of probabilities

> normalizeHyp :: Hypothesis a -> Hypothesis a
> normalizeHyp (H m p) = if totalP == 1 then (H m p) else H (M.map (/totalP) m) p
>   where totalP = M.foldl' (+) 0 m

Next, this one just makes a hypothesis distribution from a list

> fromList :: [(String, Hypothesis a)] -> HDist a
> fromList = renormalize . HD . M.fromList

Toy Problems

Monty Hall

We’ll start out by getting a list of doors

> drs = "abc"

Next, our function for generating a hypothesis for which door gets opened based on our guess and the correct answer

> isIn :: Char -> Char -> Hypothesis Char
> isIn guess correct = normalizeHyp . flip H 1 . M.fromList $
>                      [(dr,1) | dr <- drs, dr /= guess, dr /= correct]

And, finally, the problem itself! The identifiers are each just the single character name of the door (a, b, or c) The hypotheses are each generated assuming the identifier is the correct door according to the isIn function

> montyHall :: Char -> HDist Char
> montyHall guess = fromList [([dr],isIn guess dr) | dr <- drs]
> shouldYouSwitch = guess (update (montyHall 'a') "b") "a" < 1 % 2

And there you go! The boolean shouldYouSwitch starts from you picking door a, then Monty opens b, and checks if you have worse than even chances of it being behind a

Cookies

We’ve got two bools of chocolate and vanilla cookies The first bowl is three quarters vanilla and one quarter chocolate, while the other is half and half

> bowl1 = normalizeHyp $ flip H 1 $ M.fromList [('v',3),('c',1)]
> bowl2 = normalizeHyp $ flip H 1 $ M.fromList [(a,1) | a <- "vc"]

Next we’ll make a distribution from both the bowls

> bowls = fromList  [("b1",bowl1),("b2",bowl2)]

From this, given a stream of cookies, it’ll guess the chance you were drawing from one of the bowls

M&M’s

Some background: The mixture of M&M’s has been changed a few times In 1995, blue M&M’s were introduced Given a bundle of M&M’s, what’s the probability it came from a 1994 mix vs a 1996 mix?

color percentage in ’94 percentage in ’96
brown 30 24
green 10 20
orange 10 16
yellow 20 14
red 20 13
tan 10 0
blue 0 13

Now, let’s encode that into hypotheses

> mix94 = normalizeHyp . flip H 1 $ M.fromList
>         [("brown",30),("green",10),("orange",10),("yellow",20),("red",20),("tan",10)]
> mix96 = normalizeHyp . flip H 1 $ M.fromList
>         [("brown",24),("green",20),("orange",16),("yellow",14),("red",13),("blue",13)]

And, finally, our hypothesis distribution

> bag = fromList [("94",mix94),("96",mix96)]