Monday, April 28, 2014

Recursion Explained

This is the first in a series of posts with a specific, targeted audience, which means there's going to be a relatively wide variation in level-of-explanation. Some of the posts are going to very clearly be me-writing-for-someone-who-already-knows-lots-of-relevant-information, and some (like today's) will be me-writing-for-someone-who-wouldn't-normally-read-this-blog. Anyway, today's topic is what recursion is and why we should care. If you already think recursion is important, this post'll be pretty surface-y and review-heavy. So, let's get started, shall we?

Recursive-Xzibit thinks Recursive-Xzibit thinks Recursive-Xzbit thinks ... that recursion is fun

So yeah, recursion. The basic idea is something that acts on itself, or something with multiple different levels. And oh, the places you can go with self reference!
The most recent/significant pop-cultural example I can think of is the movie Inception, where dreams could occur within dreams within dreams, and what happened in one level would bleed down into all the levels below it (e.g. when gravity stopped in the hotel because the truck they were dreaming in was falling off a bridge).
Recursion comes in a few different classes, and I've yet to hear any good terms to differentiate them, so I'm going to make some up:

Unidirectional recursion

Feedback only goes in only one direction
Example 1: The Fibonacci Sequence (each element is the sum of the previous two elements, the first few are $0, 1, 1, 2, 3, 5, 8, 13, 21...$) - the later parts of the list are completely unimportant in figuring out the first few
Example 2: Sorting a list (using quicksort) - one simple and efficient way to sort a list is a monotonic recursion - you pick the first number off the list, split the rest of it in two by putting all the numbers smaller than the first one on one side and bigger on the other, then sort each of those sublists

Bidirectional recursion

Feedback can go in either direction
Example: Inception - the environment you're dreaming in influences the dream, but the events within the dreams have lasting influence in the outside world (e.g. planting an idea in someone's head, taking useful information, etc)
Example 2: INT - this graph is sorta strange. It's constructed entirely out of smaller versions of itself - any single point contains the whole graph. The lower levels are exactly the same as the higher ones. Cool, right?

Mutual recursion

Two things that are defined, each one in terms of the other
This one's generally a bit more complicated than the other two
One set of good examples can be found in Hofstadter Sequences - specifically, the Figure-Figure sequence
It's pretty neatly defined in terms of itself and its complement (the set of all numbers not in the sequence)
The first number of the sequence ($R(0)$, if you will) is $1$
The first number not in the sequence (Let's call it $S(0)$) is $2$
The rest of the sequence is defined $R(n) = R(n-1) + S(n-1)$
So, for example, $R(1) = R(0) + S(0) = 3$, and $R(2) = R(1) + S(1) = (R(0) + S(0)) + S(1) = (2+1) + S(1) = 3 + S(1) = 3 + 4 = 7$
Let's write out another one just for kicks:
$R(3) = R(2) + S(2) = (R(1) + S(1)) + S(2)$
$= ((R(0)+S(0))+S(1))+S(2) = (3+4)+5 = 12$

extremely important

Monday, April 7, 2014

Playing with Probabilities

MDists.lhs

The Header

Imports, pragmas, and the module declaration

> {-# LANGUAGE TupleSections #-}
> module MDists where
> import Data.Map (Map, (!))
> import qualified Data.Map as M
> import Data.Function
> import Data.List
> import Data.Ratio

The Code

> type P = Rational
> type PDist a = Map a P

Hypotheses are getting represented as probability distributions with a probability attached to them

> data Hypothesis a = H {conditionals :: PDist a,
>                          chance :: P} deriving (Show)

Next, the distribution over hypotheses - let’s make strings our labels, just because strings are easy to use

> data HDist a = HD {hps :: Map String (Hypothesis a)} deriving (Show)

Helpers

First, we want to implement a few basic funcitons that will allow the rest of these to work nicely For example, we’ll want to be able to apply some function [0, 1] → [0, 1] to the probabilities we’re juggling here

> applyToProbability :: (P -> P) -> Hypothesis a -> Hypothesis a
> applyToProbability f (H d p) = H d $ f p

That one’s pretty simple, but on its own not that useful. Instead, let’s wrap it up so we don’t have to deal with the individual hypotheses - instead, we’ll map it over a the hypothesis distribution

> applyToProbs :: (P -> P) -> HDist a -> HDist a
> applyToProbs f (HD hps) = HD $ M.map (applyToProbability f) hps

Now, the main reason we’re defining those two functions is to write a function renormalize that takes a distribution and makes sure the probabilities actually sum to 1 without changing any of the ratios between probabilities

> renormalize :: HDist a -> HDist a
> renormalize h@(HD dist) = applyToProbs (/normalizer) h
>   where normalizer = M.foldl' (+) 0 . M.map chance $ dist

Our basic guess function is just a small wrapper around M.lookup - it fetches the likelihood of a hypothesis on its identifier If we don’t have a hypothesis in our distribution, it assigns it a probability of 0, but otherwise just fetches the probability from the hypothesis

> guess :: Ord a => HDist a -> String -> P
> guess (HD hps) str = case M.lookup str hps of Nothing -> 0
>                                               Just h -> chance h

Updating!

Now, our not-so-secret goal here was to model Bayesian updates If you’re not familiar with Bayes’ Theorem, here it is:

$$p(H|D)=p(D|H)\frac{p(H)}{p(D)}$$

To explain: the probability of a hypothesis being true, on some data, is the probability of that data given the hypothesis (p(D∣H)) multiplied by the probability we had previously assigned the hypothesis (our “prior”, or p(H)), then divided by the probability of that data under all hypotheses (p(D)). We don’t deal directly with the probability of the data under all hypotheses - instead preferring to renormalize the distribution after the fact

Of course, we want to automate this adjustment. The first step is to multiply each probability by the numerator - p(D∣H) ⋅ p(H)

> pAndCond :: Ord a => a -> Hypothesis a -> Hypothesis a
> pAndCond event (H dist prior) = H dist $ prior * likelihood
>   where likelihood = M.findWithDefault 0 event dist

With that done, we just renormalize, effecitvely dividing by p(D) We know this works because if we divided by anything other than the renormalizer, the ending probabilities would sum to some number other than 1, so, on pain of contradiction,
p(D) = renormalizer

> updateOnEvent :: Ord a => HDist a -> a -> HDist a
> updateOnEvent (HD dist) event = renormalize . HD . M.map (pAndCond event) $ dist

Of course, we want to be able to update on a series of events, so we fold over a list of events This fold ends up being fully strict, so instead of a foldr we use foldl' (hooray tail recursion!)

> update :: Ord b => HDist b -> [b] -> HDist b
> update = foldl' updateOnEvent

Construction helpers

We’ll use these utilities to construct and solve some toy problems a little further down First, a few more helpers that’ll be really important for constructing the toy problems

This one just normalizes a hypothesis, so we can think of it in terms of allocating odds instead of probabilities

> normalizeHyp :: Hypothesis a -> Hypothesis a
> normalizeHyp (H m p) = if totalP == 1 then (H m p) else H (M.map (/totalP) m) p
>   where totalP = M.foldl' (+) 0 m

Next, this one just makes a hypothesis distribution from a list

> fromList :: [(String, Hypothesis a)] -> HDist a
> fromList = renormalize . HD . M.fromList

Toy Problems

Monty Hall

We’ll start out by getting a list of doors

> drs = "abc"

Next, our function for generating a hypothesis for which door gets opened based on our guess and the correct answer

> isIn :: Char -> Char -> Hypothesis Char
> isIn guess correct = normalizeHyp . flip H 1 . M.fromList $
>                      [(dr,1) | dr <- drs, dr /= guess, dr /= correct]

And, finally, the problem itself! The identifiers are each just the single character name of the door (a, b, or c) The hypotheses are each generated assuming the identifier is the correct door according to the isIn function

> montyHall :: Char -> HDist Char
> montyHall guess = fromList [([dr],isIn guess dr) | dr <- drs]

> shouldYouSwitch = guess (update (montyHall 'a') "b") "a" < 1 % 2

And there you go! The boolean shouldYouSwitch starts from you picking door a, then Monty opens b, and checks if you have worse than even chances of it being behind a

Cookies

We’ve got two bools of chocolate and vanilla cookies The first bowl is three quarters vanilla and one quarter chocolate, while the other is half and half

> bowl1 = normalizeHyp $ flip H 1 $ M.fromList [('v',3),('c',1)]
> bowl2 = normalizeHyp $ flip H 1 $ M.fromList [(a,1) | a <- "vc"]

Next we’ll make a distribution from both the bowls

> bowls = fromList  [("b1",bowl1),("b2",bowl2)]

From this, given a stream of cookies, it’ll guess the chance you were drawing from one of the bowls

M&M’s

Some background: The mixture of M&M’s has been changed a few times In 1995, blue M&M’s were introduced Given a bundle of M&M’s, what’s the probability it came from a 1994 mix vs a 1996 mix?

color	percentage in ’94	percentage in ’96
brown	30	24
green	10	20
orange	10	16
yellow	20	14
red	20	13
tan	10	0
blue	0	13

Now, let’s encode that into hypotheses

> mix94 = normalizeHyp . flip H 1 $ M.fromList
>         [("brown",30),("green",10),("orange",10),("yellow",20),("red",20),("tan",10)]

> mix96 = normalizeHyp . flip H 1 $ M.fromList
>         [("brown",24),("green",20),("orange",16),("yellow",14),("red",13),("blue",13)]

And, finally, our hypothesis distribution

> bag = fromList [("94",mix94),("96",mix96)]