
G. Probability

Probability is about recognizing patterns and structure: telling when two different things are really the same.


  • History: Mosteller and Wallace on the Federalist Papers: trying to pin down the author of certain disputed papers.
  • Government: IQSS (Harvard's Institute for Quantitative Social Science)
  • Finance:
  • Gambling: games of chance (dice, cards, coins); the Fermat-Pascal correspondence in the 1650s. Newton advised gamblers.
  • Life: statistics is the logic of uncertainty; it quantifies uncertainty.


  • A sample space is the set of all possible outcomes of an experiment.
  • An event is a subset of the sample space.
    • SETS are the key concept that made statistics a mathematical subject.
    • Sets: unions, intersections, complements, etc.
    • the empty set
  • Naive definition of probability: P(A) = (# favorable outcomes) / (# possible outcomes). (Flipping coins: the coins are fair.) This assumes that all outcomes are equally likely and that the sample space is finite. Note: we will drop both conditions when we get to the formal definition.


Multiplication Rule: if an experiment has n_1 possible outcomes, and for each of those outcomes a second experiment has n_2 outcomes, ..., and for each outcome of the previous experiments the r-th experiment has n_r outcomes, then overall there are n_1 * n_2 * ... * n_r possible outcomes.
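A sketch of the rule in code (the stage counts here are made up for illustration):

```python
from math import prod

# Multiplication rule: 3 choices at the first stage, 4 at the second,
# 2 at the third gives 3 * 4 * 2 total outcomes (illustrative counts).
stage_counts = [3, 4, 2]
total = prod(stage_counts)
print(total)  # 24

# Special case: sampling k times with replacement from n objects,
# order mattering, gives n * n * ... * n = n**k outcomes.
print(prod([10] * 3))  # 10**3 = 1000
```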

combinations and permutations

  • Binomial coefficient:
  • sampling: pick k times from a set of n objects. With replacement, order matters: n^k. Without replacement, order matters: n(n-1)...(n-k+1). Without replacement, order doesn't matter: C(n, k). With replacement, order doesn't matter: C(n+k-1, k).

Note: in the "with replacement, order doesn't matter" case the objects are indistinguishable: they are not labeled.

Proof (with replacement, order doesn't matter):

Equivalently: how many ways are there to put k (= 6) indistinguishable particles into n (= 4) distinguishable boxes? [physics]

The problem becomes k (= 6) circles and n - 1 (= 3) separators. Therefore we have n + k - 1 positions, and we choose which k of them get circles, or equivalently which n - 1 get separators: C(n + k - 1, k).
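The stars-and-bars count C(n+k-1, k) can be computed directly; a small sketch:

```python
from math import comb

def bose_einstein(n, k):
    """Ways to put k indistinguishable particles into n distinguishable
    boxes: choose which k of the n + k - 1 slots hold circles."""
    return comb(n + k - 1, k)

print(bose_einstein(4, 6))  # 84 ways for 6 particles in 4 boxes
print(bose_einstein(2, 2))  # 3: the two-coin case (HH, TT, one of each)
```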

Flip two coins: Bose argued that there are 3 outcomes instead of 4: HH, TT, and HT/TH (one head and one tail). [Bose-Einstein condensation]

  • Problem: with 10 people, how many ways are there to make a group of 6 and a group of 4? How many ways to make two groups of 5 each?
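A sketch of the two counts (the division by 2 handles the interchangeable equal-size groups):

```python
from math import comb

# A group of 6 and a group of 4: the groups differ in size, so they are
# distinguishable; just choose the 6.
print(comb(10, 6))       # 210
# Two groups of 5: choosing one group determines the other, and the two
# equal-size groups are interchangeable, so divide by 2.
print(comb(10, 5) // 2)  # 126
```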

Story proof

Non-naive definition: a sample space S together with a function P that assigns each event A a number P(A), satisfying two axioms: (1) P(the empty set) = 0 and P(S) = 1; (2) the probability of a union of disjoint events is the sum of their probabilities.

The sample space is expanded to allow:

  • finite sample spaces,
  • countably infinite sample spaces,
  • uncountable sample spaces.

Breakthrough: one piece is the concept of a set; the other is these two axioms. With them, probability becomes a science that can be presented mathematically.

Birthday Problem

Problem: with k people, find the probability that at least 2 have the same birthday.

Exclude Feb. 29; assume the other 365 days are equally likely (in reality they are not: more babies are born in the ninth month after a holiday); assume birthdays are independent (no twins).

  • pigeonhole principle; hashing in computer science: a clever way of storing information in data structures. A collision happens when two items are stored in the same slot; you want to know the probability that a collision occurs.

If you ask how many people you need for a 50/50 chance that two people have either the same birthday or birthdays one day apart: k = 14.

Ref.: applets (from Stanford or elsewhere online) to run simulations of the birthday problem.
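The exact probability is easy to compute under the assumptions above; a minimal sketch:

```python
def p_match(k, days=365):
    """P(at least two of k people share a birthday), assuming
    365 equally likely days and independent birthdays."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (days - i) / days
    return 1 - p_no_match

# k = 23 is the smallest group size with a better-than-even chance
print(round(p_match(22), 3), round(p_match(23), 3))  # 0.476 0.507
```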

Properties of probability

Problem: de Montmort's problem (1713), the matching problem, from gambling.

Card game: n cards labeled 1 to n. Flip over one card at a time while counting aloud "one, two, ..."; if the number you say ever matches the number on the card, you win.

Note: e shows up in the answer; inclusion-exclusion; symmetry.
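A simulation sketch of the matching problem; the probability of at least one match approaches 1 - 1/e as n grows:

```python
import random
from math import e

def has_match(n=52):
    """Shuffle cards 0..n-1; a match means card i lands in position i."""
    deck = list(range(n))
    random.shuffle(deck)
    return any(pos == card for pos, card in enumerate(deck))

random.seed(0)
trials = 100_000
p_hat = sum(has_match() for _ in range(trials)) / trials
print(round(p_hat, 2), round(1 - 1 / e, 2))  # both about 0.63
```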

Independent events

Note: pairwise independence does not imply that three events are (mutually) independent. All four conditions must hold: P(A∩B) = P(A)P(B), P(A∩C) = P(A)P(C), P(B∩C) = P(B)P(C), and P(A∩B∩C) = P(A)P(B)P(C).

independence means multiply.

Problem: the Newton-Pepys problem (1693). We have fair dice. Which is most likely?

  • A. at least one 6 with 6 dice
  • B. at least two 6's with 12 dice
  • C. at least three 6's with 18 dice
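A sketch computing the three probabilities exactly via the binomial distribution:

```python
from math import comb

def p_at_least(m, n, p=1/6):
    """P(at least m sixes in n rolls of a fair die)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

a = p_at_least(1, 6)
b = p_at_least(2, 12)
c = p_at_least(3, 18)
print(round(a, 3), round(b, 3), round(c, 3))  # 0.665 0.619 0.597 -- A wins
```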

Conditional probability

How should we update probabilities / beliefs / uncertainty based on new evidence?

Use conditional probabilities to break up unconditional probabilities. (Law of total probability)

Conditioning is the soul of statistics.


Intuition 1. Pebble world

Sample space with 9 pebbles (outcomes); total mass is 1.

  • Event B = a subset: 4 green pebbles
  • Get rid of the pebbles not in B (those in B's complement)
  • Renormalize the remaining mass to total 1 again
  • Event A: red pebbles. P(A|B) = (mass of pebbles in A ∩ B) / (mass of pebbles in B).

Intuition 2. Frequentist world: repeat the experiment many times, e.g. flipping a coin 1000 times.

  • List the repetitions.
  • Writing each repetition as a binary string, circle the repetitions where B occurred. Among the circled ones, what fraction of the time did A also occur?

Bayes' rule: the most useful and deeply influential idea in statistics.

Thinking conditionally is a condition for thinking.

How to solve a problem.

  1. try simple and extreme cases.
  2. Break up problems into simpler pieces.

Law of Total Probability

The way to partition S is key.

Problem: get random two-card hand from standard deck. Find P(both aces|have ace), P(both aces |have ace of spade).

P(both aces | have ace) = P(both aces, have ace) / P(have ace) = P(both aces) / P(have ace) = [C(4,2)/C(52,2)] / [1 - C(48,2)/C(52,2)] = (1/221) / (33/221) = 1/33

P(both aces |have ace of spade) = 3/51 = 1/17

AS + ? (given the ace of spades, the other card is equally likely to be any of the remaining 51 cards, 3 of which are aces)
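Both answers can be checked by brute-force enumeration; a sketch in Python (the (rank, suit) card encoding is mine, with rank 0 for aces and (0, 0) for the ace of spades):

```python
from itertools import combinations

# Build a 52-card deck and enumerate every 2-card hand
deck = [(rank, suit) for rank in range(13) for suit in range(4)]
hands = list(combinations(deck, 2))

def both_aces(hand):
    return hand[0][0] == 0 and hand[1][0] == 0

have_ace = [h for h in hands if any(c[0] == 0 for c in h)]
have_as = [h for h in hands if (0, 0) in h]

p1 = sum(both_aces(h) for h in have_ace) / len(have_ace)  # 1/33
p2 = sum(both_aces(h) for h in have_as) / len(have_as)    # 1/17
print(p1, p2)
```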

Problem: a patient gets tested for a certain disease that afflicts 1% of the population, and tests positive.

Suppose the test is advertised as "95% accurate", and suppose this means P(T|D) = 0.95 = P(Tc|Dc).

Patient cares about P(D|T).

D: patient has disease. T: patient test positive.

Suppose we test 1000 patients: about 10 have the disease, and of the 990 healthy patients about 5% still test positive. So most of the positives are false positives.

Notes: people pay attention to the 95% accuracy and ignore that the disease is rare (1% of the population). If the patient has consistent symptoms, then we replace the 1% prior with the percentage of people with those symptoms who have the disease.

Coherence of Bayes' rule: suppose you have two pieces of evidence; this is the same as having one piece of evidence and later getting another. Updating on both at once gives the same answer as updating twice.
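The posterior the patient cares about can be computed directly; a sketch of the calculation:

```python
# Rare-disease test: prior P(D) = 1%, and "95% accurate" read as
# P(T|D) = P(T^c|D^c) = 0.95 (as in the notes).
p_d = 0.01
p_t_given_d = 0.95
p_t_given_dc = 0.05

# Law of total probability for the denominator, then Bayes' rule
p_t = p_t_given_d * p_d + p_t_given_dc * (1 - p_d)
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 3))  # 0.161 -- only about 16%
```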


(1) Confusing P(A|B) with P(B|A): the prosecutor's fallacy.

We want the probability that a person is guilty given all the evidence; the mistake is to report the probability of the evidence given innocence.

  • Problem: the Sally Clark case. Sally's two children died with no explanation (SIDS). The expert testified that, assuming she is innocent, the probability of a baby spontaneously dying for no apparent reason is about 1/8500, and multiplied: (1/8500) x (1/8500).
  • Mistake: assuming the two deaths are independent.
  • We want P(innocent | evidence). The prior matters: it is very rare for a mother to kill her babies.

(2) Confusing the prior P(A) with the posterior P(A|B): note P(A|A) = 1.

(3) confusing independence and conditional independence

DEFN. Events A, B are conditionally independent given C if P(A ∩ B | C) = P(A|C) P(B|C).

Does conditional independence given C imply independence? NO

Ex.: a chess opponent of unknown strength. It may be that game outcomes are conditionally independent given the opponent's strength, but not independent unconditionally: each win is evidence that the opponent is weak, which makes future wins more likely.

Does independence imply conditional independence given C? No.

Events can be caused by multiple factors. A: the fire alarm goes off; possible causes: F: fire, or C: making popcorn. Suppose F and C are independent, but P(F | A, Cc) = 1: given that the alarm went off and no one made popcorn, there must be a fire. So F and C are not conditionally independent given A.

Monty Hall

  • The game show: three doors; behind one door is a car, and behind each of the other two is a goat. The contestant picks a door. Then the host, Monty, opens one of the other doors to reveal a goat. Monty then asks whether the contestant wants to switch doors.
  • Assumptions: 1. the contestant has no prior information; 2. Monty knows where the car is, and he always opens a goat door; 3. if Monty has a choice of goat doors, he picks either with equal probability. (Sometimes he has no choice; or he might be too lazy to walk to the farther door.) People ignore these assumptions, especially the third.
  • By symmetry, assume you pick door one. Monty opens door two or door three to reveal a goat.

Note: if Monty opens door two, we learn both that door two has a goat and that Monty chose to open door two.

Solution One with tree diagram

  • Suppose Monty opens door 2. The surviving branches have probabilities 1/6 (car behind door 1, Monty picks door 2) and 1/3 (car behind door 3, Monty must open door 2); we normalize by dividing each by their sum, 1/2.
  • It is good to change your mind and choose door three.
  • P(success if switch | Monty opens door two) = 2/3
  • Intuition: your initial guess is correct 1/3 of the time.

Solution 2 with the law of total probability (LOTP)

To use LOTP, the key is deciding what to condition on. Ask yourself "what do I wish I knew?", then condition on that. This "I wish I knew" method is characteristic of statistics.

  • I wish I knew the car door.
  • S: succeed (assuming switch)
  • Dj: Door j has the car (j = 1, 2,3)
  • P(S) = P(S|D1)·1/3 + P(S|D2)·1/3 + P(S|D3)·1/3 = 0 + 1/3 + 1/3 = 2/3 (assuming you initially picked door one and always switch)
  • By symmetry, P(S | Monty opens door two) = 2/3
  • With symmetry, the conditional and unconditional probabilities are the same.
  • If Monty prefers door 2 over door 3 when he has a choice, the conditional probability changes.
  • applets in New York Times
  • If you don't know any statistics, you can simulate it: do it with cups, and try 1000 repetitions.
  • Extreme case: what if there are 1 million doors? You pick one; Monty opens 999,998 doors, leaving just one other door closed. Should you switch? With 1 million doors, your initial guess is almost surely wrong, so you should certainly switch.
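A simulation sketch encoding the three assumptions, including Monty choosing uniformly when he has a choice:

```python
import random

def monty(switch, trials=100_000):
    """Estimate the win probability for a switcher or a stayer."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a goat door other than your pick,
        # choosing uniformly when he has a choice
        opened = random.choice([d for d in range(3) if d not in (pick, car)])
        if switch:
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += (pick == car)
    return wins / trials

random.seed(0)
print(round(monty(True), 2), round(monty(False), 2))  # about 0.67 vs 0.33
```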

Simpson’s Paradox

Is it possible for the first doctor to have a higher success rate than the second at every single type of surgery imaginable, yet for the second doctor to have the higher overall success rate?

One doctor is better in every individual case, yet worse when you add up all the cases to get the total.

Dr. Hibbert vs. Dr. Nick, assuming two different types of surgery.

Conditional on heart surgery: H is better. 

The expert's success rate is not that great because he gets the hard cases.

1/3 "+" 2/5 = 3/8: combining 1 success out of 3 with 2 successes out of 5 gives 3 out of 8. Success rates combine by pooling the counts (the mediant), not by adding fractions.
  • A: successful surgery. (c for complement)
  • B: treated by Dr. Nick
  • C: heart surgery
  • P(A|B,C)<P(A|Bc, C)
  • P(A|B,Cc)<P(A|Bc, Cc)
  • Overall: P(A|B) > P(A|Bc).
  • C is a confounder; it is the variable we must control for.

Ex.: four jars of two kinds of jelly beans; you like one flavor more than the other. Jar one beats jar two, and jar three beats jar four, in the sense of having a higher percentage of the flavor you like. You pour jars one and three together, and jars two and four together. Sometimes the percentage of your flavor in the first mixture is lower than in the second.
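A sketch with illustrative counts (the specific numbers are assumed, chosen to exhibit the reversal):

```python
from fractions import Fraction as F

# (successes, surgeries) per surgery type; counts are illustrative
hibbert = {"heart": (70, 90), "band_aid": (10, 10)}
nick = {"heart": (2, 10), "band_aid": (81, 90)}

def rate(s, n):
    return F(s, n)

def overall(doc):
    return F(sum(s for s, _ in doc.values()), sum(n for _, n in doc.values()))

# Hibbert is better at every type of surgery...
assert rate(*hibbert["heart"]) > rate(*nick["heart"])
assert rate(*hibbert["band_aid"]) > rate(*nick["band_aid"])
# ...but Nick's overall rate is higher, because he gets the easy cases.
print(overall(hibbert), overall(nick))  # 4/5 83/100
```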

Statistics is about

(1) Conditioning: the soul of statistics

(2) Random variables and their distributions

Gambler’s Ruin (lecture 7)

Two gamblers A and B play a sequence of rounds, betting $1 each round; p = P(A wins a given round), q = 1 - p. The game is over when one player runs out of money. What is the probability that A wins the entire game (so that B is ruined)?

Assuming A starts with $i, B starts with $(N-i). Total is $N.

  • — repeated trials
  • — random walk: p = prob. of going right; absorbing state at 0, N.


  • recursive structure: either A loses the 1st round and moves one step left to i-1, or A wins the 1st round and moves one step right to i+1. Either way it is the same problem with a different starting point.
  • strategy: condition on the first step.

Difference equation: the discrete analogue of a differential equation; seldom taught in the US.

To solve a difference equation: form its characteristic polynomial and find the roots; if all the roots are distinct, the general solution is a linear combination of powers of those roots. Here the recurrence is p_i = p·p_{i+1} + q·p_{i-1} with p_0 = 0, p_N = 1, giving P(A wins) = (1 - (q/p)^i) / (1 - (q/p)^N) for p ≠ 1/2, and i/N for p = 1/2.

  • What we learn: think conditionally and detect the structure of the problem.
  • If it is a fair game, the chance of winning is proportional to the fraction of the total wealth that A has: if A has 2/3 of the money, he has a 2/3 chance of winning.
  • With equal starting stakes (i = N - i) and p = 0.49: if N = 20, P(A wins) is about 0.40; if N = 100, about 0.12; if N = 200, about 0.02.
  • Casino: it has more money than anybody, and the odds are unfair.
  • Could the game oscillate forever, with neither A nor B ruined? For a fair game, P(A wins the game) = i/N and P(B wins the game) = (N - i)/N; these add up to 1 with nothing left over, so the probability that the game goes on forever is 0.
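A minimal sketch of the resulting formula (P(A wins) = (1 - (q/p)^i)/(1 - (q/p)^N) for p ≠ 1/2, and i/N for a fair game), reproducing the numbers quoted above:

```python
def p_a_wins(i, N, p):
    """P(A reaches $N before $0), starting from $i, win prob p per round."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)

# Equal starting stakes, p = 0.49: the small edge compounds fast
for N in (20, 100, 200):
    print(N, round(p_a_wins(N // 2, N, 0.49), 2))  # 0.4, 0.12, 0.02
```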

Random Variable and their distributions

What is a variable?

x + 2 = 9, then x = 7.

To make "random variable" precise, we need a function.

An r.v. is a function from the sample space S to the real line. An r.v. is a numerical summary of an aspect of the experiment; the randomness comes from the experiment. Each outcome s in S is mapped to a number on the real line.

Distribution: like a blueprint that specifies the probability that the r.v. does this and the probability that it does that. It is a specification of the probabilities associated with the r.v. The variable is a function; the distribution says with what probability the variable behaves in its different ways.

DEFN (Bernoulli). An r.v. X is said to have the Bernoulli distribution if X has only 2 possible values, 0 and 1, with P(X = 1) = p and P(X = 0) = 1 - p. [One experiment: flip a coin once.]

Indicator view: for an event A, the indicator r.v. X satisfies X(s) = 1 if the outcome s is in A, and X(s) = 0 otherwise.

(1) Story: X is the number of successes in n independent Bern(p) trials, where p is the probability of success (e.g. coin flips).

(2) Sum of indicator r.v.s: X = X1 + X2 + ... + Xn, where X1, ..., Xn are i.i.d. Bern(p) (add 1 for each success, 0 for each failure). This breaks the complicated variable X into simple ones.

  • Xj = 1 if jth trial success;
  • Xj = 0 if otherwise

i.i.d.: independent and identically distributed.

(3) PMF (probability mass function): it specifies the probability that X equals 0, the probability that X equals 1, and so on.

(PMFs exist only for discrete r.v.s.)

DEFN. Binomial(n, p): the distribution of the number of successes in n independent Bern(p) trials. Its PMF is P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, 1, ..., n.

e.g. the outcome sequence 1110000 (n = 7, k = 3): any particular sequence with 3 successes has probability p^3 (1 - p)^4, and there are C(7, 3) such sequences.

[n times of experiment: flip a coin n times]

Sample space, there are many outcomes, we assign numbers to different outcomes. 

X = 7 is an event, a subset of the sample space: shorthand notation for {s : X(s) = 7}.

CDF of X (cumulative distribution function): F(x) = P(X <= x).

X <= x is an event.

Let's check that Binomial(n, p) gives a valid PMF: the terms are nonnegative, and they sum to 1 by the binomial theorem: sum over k of C(n, k) p^k q^(n-k) = (p + q)^n = 1.
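A quick numerical check of the validity claim (a sketch):

```python
from math import comb, isclose

def binom_pmf(n, p):
    """PMF of Binomial(n, p): P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Nonnegative terms summing to (p + q)^n = 1 by the binomial theorem
pmf = binom_pmf(10, 0.3)
print(all(x >= 0 for x in pmf), isclose(sum(pmf), 1.0))  # True True
```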

X and Y are two functions. To add functions, they need to have the same domain: the same pebble world (sample space).

First proof: by story

Second proof:

Third proof: Use PMF


Harvard intro class: work through this course to gain a solid understanding.

Books used for this series: Austin, Grew
