Vector spaces
- n-tuples:
- linear dependence, linear combination of the spanning set; subspace of V
- linearly independent and spanning: basis, dimension of V
- Theorem 1.6. Let V be a vector space over F. Then any two bases of V have the same size.
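A minimal numpy sketch (my own toy example, not from the lecture) of checking linear independence and dimension via matrix rank:

```python
import numpy as np

# Three vectors in R^3: they are linearly independent iff the matrix
# that stacks them as columns has full rank.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 3.0])   # v3 = v1 + v2, so the set is dependent

A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))  # 2: the span is a 2-dimensional subspace of R^3
```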
Linear transformations and matrices
- T(v1) is determined by the first column of the matrix: the image of each basis vector is the corresponding column. What is the analogue in a Markov chain? Each row (or column, depending on convention) of the transition matrix lists the one-step probabilities out of one state.
- Rank of a linear transformation
- Linear operator: the linear transformations we work with for Markov chains will be linear operators, i.e., maps from a space to itself.
- eigenvalues, eigenvectors, eigenspaces
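A small sketch (my own example) of the "first column" fact, plus the eigen-decomposition of an operator:

```python
import numpy as np

A = np.array([[0.9, 0.5],
              [0.1, 0.5]])       # a 2x2 linear operator on R^2
e1 = np.array([1.0, 0.0])
print(A @ e1)                    # T(e1) = first column of A

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                   # eigenvalues 1.0 and 0.4 for this A
```

Connection to Markov chains: a transition matrix acts as a linear operator that pushes a probability distribution one step forward.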
Probability spaces
- The function P is the probability measure
- measure theory
- CDF: cumulative distribution function (not "cumulative density function")
- PDF, probability density function
- Gaussian (normal distribution) – central limit theorem
- conditional probability – Bayes’s theorem
- Random variables:
- Expected value
- Law of Large Numbers
- i.i.d. Independent and identically distributed.
- Bernoulli distribution, Binomial distribution
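A quick simulation (my own, assuming numpy) of the Law of Large Numbers for i.i.d. Bernoulli draws: the running sample mean approaches the true mean p:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                               # Bernoulli(p) success probability
x = rng.random(100_000) < p           # i.i.d. Bernoulli(p) samples

for n in (10, 1_000, 100_000):
    print(n, x[:n].mean())            # sample mean -> p as n grows
```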
Markov chains
- Markov property: the future is conditionally independent of the past given the present.
- assumptions: the state space is a finite or countable set (discrete time and discrete space)
- the Markov property is a very strong assumption, though you can look n steps back with an n-th order Markov chain
- MCMC helped create modern scientific computing and is used everywhere. You don’t worry about whether the process you are observing follows a Markov chain; in MCMC you synthetically construct your own Markov chain. You can construct a clever Markov chain that converges to a distribution you’re interested in, then use the results of computer runs to study that distribution (even for extremely complicated simulations).
- two uses of Markov chains: (1) you have a natural system that you choose to model as a Markov chain; (2) in MCMC you build a Markov chain of your own.
- History: in the early 1900s, Markov used the idea to settle a religious debate over the existence of free will. The argument went: the Law of Large Numbers prevents free will, because in the long run everything converges to the mean, so where is the scope for free will? But the LLN had only been proved under i.i.d. assumptions, and human behavior is not i.i.d. That alone is not convincing, since failing to be i.i.d. does not prove the LLN fails. Markov wanted to go one step in complexity beyond i.i.d.: he proved the LLN applies to Markov chains, showing you don’t need i.i.d. for the LLN to hold.
- the first Markov chain: Markov analyzed a Russian novel (Pushkin’s Eugene Onegin), treating consonants and vowels as the two states.
- a Markov chain is governed by its transition probability matrix (entries are nonnegative; each row sums to 1)
- the transition matrix entry q_{ij} is the probability of going from i to j in one step
- question to answer: the probability of going from i to j in n steps
- at time n, X_n has a distribution given by a row vector s_n (its PMF listed out); we want P(X_{n+1} = j) = sum_i P(X_n = i) q_{ij}, i.e., s_{n+1} = s_n Q
- assumption: time-homogeneous (which step we are at does not matter; the probability of going from i to j is q_{ij} and doesn’t depend on time)
- Gambler’s ruin: an absorbing Markov chain: there are certain states that, once reached, we can never leave
- question: “the probability that the gambler goes home broke versus the probability that the gambler goes home with $7”
- the amount of time that must pass until we reach the absorbing state: the hitting time problem – martingale process
- weather: an ergodic Markov chain: we can keep getting from any state to any other state forever
- question: “What proportion of the days are sunny?”
- limiting distribution: the stationary distribution. Question: how many steps n of the MC does it take to get close to the limiting distribution?
- mixing times: card shuffling question: “how many of these riffle shuffles do we need to do before the deck is well-shuffled?”
- transition matrix: multi-step transition probabilities are the entries of the matrix power Q^n (see the numpy sketch below)
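A numpy sketch of these mechanics (a toy 2-state matrix of my own choosing): row sums are 1, the distribution evolves by s_{n+1} = s_n Q, and n-step transition probabilities come from Q^n:

```python
import numpy as np

# Transition matrix: Q[i, j] = P(X_{n+1} = j | X_n = i).
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
assert np.allclose(Q.sum(axis=1), 1.0)   # each row sums to 1

s = np.array([1.0, 0.0])                 # start in state 0
print(s @ Q)                             # distribution after one step

print(np.linalg.matrix_power(Q, 10))     # 10-step transition probabilities
```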
Absorbing Markov chains
- Communication classes:
- questions:
- Are we able to reach j from i?
- with what probability does there exist a t such that X_t = j? (the hitting time from state i to reach j)
- first hitting time, k-th hitting time (defined recursively), t-step transition probability
- two states i and j communicate – an equivalence relation: reflexivity, symmetry, transitivity
- using this equivalence relation to partition the state space into communication classes
- Irreducible: all states communicate (a single communication class)
- transience and recurrence
- question: starting from state j, do we eventually return to j with probability 1?
- recurrent: we return with probability 1; otherwise transient
- positive recurrent and null recurrent
- using transience/recurrence to decompose the state space (canonical decomposition)
- generating functions (there is a video that explains the concept in 7 minutes)
- Inverse Matrices
- bijection, surjective, injective
- determinant = product of the eigenvalues. A matrix is invertible if and only if 0 is not an eigenvalue.
- Absorption probabilities
- question: with several absorbing states, what is the probability of eventually getting trapped in each one? (see the sketch at the end of this section)
- fundamental matrix of the Markov chain: N = (I - Q)^{-1}, where Q is the transient-to-transient block
- Hitting times: the time until we first reach state j (j might be our absorbing state)
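A sketch of the fundamental-matrix computation on a small gambler’s ruin (my own toy instance: fortunes 0–4, fair coin, 0 and 4 absorbing):

```python
import numpy as np

# Canonical decomposition: transient states {1, 2, 3}.
p = 0.5
Qt = np.array([[0.0,   p, 0.0],      # transient -> transient block
               [1 - p, 0.0,   p],
               [0.0, 1 - p, 0.0]])
R = np.array([[1 - p, 0.0],          # transient -> absorbing {0, 4}
              [0.0,   0.0],
              [0.0,     p]])

N = np.linalg.inv(np.eye(3) - Qt)    # fundamental matrix N = (I - Q)^{-1}
print(N @ R)                         # absorption probabilities (rows: start $1, $2, $3)
print(N @ np.ones(3))                # expected steps until absorption
```

Starting from $2 this gives absorption probabilities (0.5, 0.5) and an expected 4 steps to absorption, matching the classic fair-game answer.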
Stationary distributions
- s (a 1×M probability row vector) is stationary if sQ = s, i.e., s is a left eigenvector of Q with eigenvalue 1. Intuitively, if the chain starts with distribution s, it has distribution s forever.
- Does a stationary distribution exist? We can solve the equation sQ = s, requiring s to be nonnegative with entries that sum to 1.
- Is it unique?
- Does the chain converge to s?
- How do we compute it? (see the sketch at the end of this section)
- irreducible Markov chain:
- question: starting from i, what is the probability of being in state j after t steps, for t large enough?
- aperiodic
- ergodic
- Stationary distribution: Ergodic Theorem for Markov chain
- Perron-Frobenius
- it is always possible to escape from S. ???
- Interpreting the stationary distribution
- invariant measure ??
- Law of Large Numbers
- Examples
- Random walk
- Uniform distribution
- projecting one Markov chain to another
- Ehrenfest urn model ??
- Homework
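A numpy sketch (same toy Q as above) of computing the stationary distribution as the left eigenvector for eigenvalue 1, and checking convergence:

```python
import numpy as np

Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# sQ = s means Q^T s^T = s^T: take the eigenvector of Q^T for
# eigenvalue 1 and normalize it to a probability vector.
eigvals, eigvecs = np.linalg.eig(Q.T)
k = np.argmin(np.abs(eigvals - 1.0))
s = np.real(eigvecs[:, k])
s = s / s.sum()

print(s)                                 # [4/7, 3/7] for this Q
print(np.allclose(s @ Q, s))             # True: s is stationary
print(np.linalg.matrix_power(Q, 50)[0])  # rows of Q^n converge to s
```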
Random Walks
- Recurrence for infinite Markov chains
- in a finite irreducible MC, every state is necessarily recurrent.
- in a finite irreducible MC, the expected value of the return time to state i is necessarily finite.
- The above two need not be true for an infinite MC.
- 1-dimensional random walk
- binomial distribution
- 2-dimensional random walk
- Stirling’s formula (see the worked estimate below)
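The standard worked estimate (the classic calculation, not copied from the notes): by Stirling’s formula, the probability that the simple 1-D walk is back at the origin after 2n steps is

```latex
P(S_{2n} = 0) = \binom{2n}{n} 2^{-2n} \sim \frac{1}{\sqrt{\pi n}},
\qquad \text{using } n! \sim \sqrt{2\pi n}\,\Bigl(\tfrac{n}{e}\Bigr)^{n}.
```

Since sum_n 1/sqrt(pi n) diverges, the expected number of returns to the origin is infinite, so the 1-D walk is recurrent. In 2-D the return probability is ~ 1/(pi n) and the sum still diverges (recurrent), while in 3-D it is O(n^{-3/2}) and the sum converges (transient): Pólya’s theorem.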
- Reflection principles
- Catalan number
- Random walks on groups (group theory)
- total variation distance (see the sketch after this block)
- coupling & Doeblin’s proof of the ergodic theorem
- infimum: greatest lower bound (dual to the supremum, the least upper bound)
- ex. card shuffling
- Convergence Theorem for ergodic Markov chains
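A small helper (my own, assuming numpy) for total variation distance, showing an ergodic chain’s distribution approaching stationarity:

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between two PMFs."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
s = np.array([4/7, 3/7])          # stationary distribution of Q
mu = np.array([1.0, 0.0])         # start in state 0
for n in range(5):
    print(n, tv_distance(mu, s))  # shrinks geometrically (factor 0.3 here)
    mu = mu @ Q
```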
- Stopping times: harmonic number
- Hitting time
- randomized stopping time
- stationary time
- strong stationary time
- separation distance
- halting state
- counting bound
- diameter bound
- d-regular
- bottleneck ratio
- edge measure
- distinguishing statistics
- Cauchy-Schwarz inequality
- Homework 1. Stopping time
Philosophy: stories are extremely important. Math is used to capture the key parts of a real-world problem and to resolve it. You need to
- try to associate each concept with one or two vivid stories or examples.
- in these stories, see what problem people raise and how to bring it into the mathematical world, so that all the math tools can be used.
- after people resolve the problem, more problems emerge. Then, by changing assumptions and methods, more solutions are created. Still… theory can never equal reality, so the struggles happily go on…
Tools:
- Law of Large Numbers
- Bayes’ theorem (from conditional probability)
- Stirling’s formula
- Central Limit Theorem
- generating functions
- LOTUS (Law of the Unconscious Statistician)
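For reference, LOTUS lets you compute E[g(X)] directly from the distribution of X, without deriving the distribution of g(X):

```latex
\mathbb{E}[g(X)] = \sum_{x} g(x)\,P(X = x) \quad \text{(discrete)},
\qquad
\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx \quad \text{(continuous)}.
```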
Some distributions
- Bernoulli (indicator concept)
- Binomial
- Uniform
- Normal
References:
- Markov chains
- Introduction to Random Walks (Theorem 5.3, using the Law of Large Numbers, etc.)
- Simple Random Walk (Alm) (monkey at a cliff)
- One-dimensional random walk
- Simple Random Walk (Caltech): reachable points, first return to zero, etc.
- 1-D random walk (distance in n steps)
- Random walk in 2-D (with angle theta)
- Random walk in 2-D (lattice)
- Random walks, Markov chains, and how to analyze them (drunkard’s walk)
- Random walk vs. Markov chain
- Random walk (MIT) (good explanation of the 1-D case with a bug; unfair gambling)
- Random walks on groups