Birkhoff Ergodic Theorem

Let (X,\mu, T) be an ergodic measure-preserving system, f\in L^1(\mu), and write S_nf=\sum_{k=0}^{n-1}f\circ T^k for the Birkhoff sum. Then the Birkhoff Ergodic Theorem states that for \mu-a.e. x\in X, the time average \frac{1}{n}S_nf(x) converges to the space average \mu(f):=\int f(x) d\mu(x). In the case that \mu(f)>0, we see that S_nf(x) \approx n\cdot \mu(f) as n\to \infty for \mu-a.e. x\in X.
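To see the convergence of time averages numerically, here is a minimal sketch. The system (an irrational rotation of the circle) and the observable cos^2(2\pi x), whose space average is 1/2, are assumptions chosen for illustration; the names birkhoff_average, rotation and observable are hypothetical.

```python
import math

def birkhoff_average(f, T, x, n):
    """Return (1/n) * S_n f(x), where S_n f(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

# Assumed example: rotation of the circle R/Z by an irrational angle,
# which is ergodic with respect to Lebesgue measure.
ALPHA = (math.sqrt(5) - 1) / 2

def rotation(x):
    return (x + ALPHA) % 1.0

def observable(x):
    # cos^2(2 pi x) has space average 1/2 with respect to Lebesgue measure.
    return math.cos(2 * math.pi * x) ** 2

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, birkhoff_average(observable, rotation, 0.1, n))
```

For this particular rotation the averages approach 1/2 for every starting point, not just almost every one; for a general ergodic system only the a.e. statement is available.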

In link there is an interesting observation:

Theorem. If S_nf(x) \to +\infty for \mu-a.e. x\in X, then \mu(f)>0.

Proof. Let \epsilon >0, A_{\epsilon}=\{x\in X: S_nf(x)\ge \epsilon \text{ for each } n\ge 1\}, and \displaystyle B=\bigcup_{k\ge 0}\bigcup_{\epsilon >0}T^{-k}A_{\epsilon}.

Note that the complement \displaystyle X\backslash B= \bigcap_{k\ge 0}\bigcap_{\epsilon >0}T^{-k}(X\backslash A_{\epsilon}).

So if x\notin B, then for each k\ge 0, for each \epsilon >0,
there exists n_{k,\epsilon}\ge 1 such that S_{n_{k,\epsilon}}f(T^kx) \le \epsilon.

Pick a sequence \epsilon_p =e^{-p}, k_0=0, n_p=n_{k_{p-1},\epsilon_p}, k_p=k_{p-1}+ n_p for each p\ge 1. Then

k_0=0, n_1=n_{0,e^{-1}} \ge 1, k_1=n_1 \ge 1: S_{n_1}f(x) \le e^{-1};

n_2=n_{k_1,e^{-2}} \ge 1, k_2=k_1+ n_2 \ge 2: S_{n_2}f(T^{k_1}x) \le e^{-2};

n_3=n_{k_2,e^{-3}} \ge 1, k_3=k_2+ n_3 \ge 3: S_{n_3}f(T^{k_2}x) \le e^{-3};

n_p=n_{k_{p-1},e^{-p}}\ge 1, k_p=k_{p-1}+ n_p \ge p: S_{n_p}f(T^{k_{p-1}}x) \le e^{-p};

Add them together: S_{k_p}f(x)\le e^{-1}+ \cdots + e^{-p} \le \frac{1}{1-e^{-1}} with k_p \ge p \to \infty.

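So S_nf(x) does not tend to +\infty when x\notin B. Applying the assumption of the theorem, we see that \mu(X\backslash B)=0. Since A_{\epsilon} increases as \epsilon decreases, B=\bigcup_{k\ge 0}\bigcup_{m\ge 1}T^{-k}A_{1/m} is a countable union, and T preserves \mu. Hence \mu(A_{\epsilon}) >0 for some \epsilon>0.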

Let n_k(x) be the k-th return of a typical point x \in A_{\epsilon} to the set A_{\epsilon}. Then S_{n_k}f(x)\ge k\epsilon. It follows that

\displaystyle \mu(f)=\lim_{k \to \infty}\frac{1}{n_k}S_{n_k}f(x) \ge \lim_{k\to\infty}\frac{k}{n_k}\cdot \epsilon = \mu(A_{\epsilon})\cdot \epsilon >0.

This completes the proof.


Some random variables

Consider a stochastic process X_n, n\ge 0, where X_0 =0, and X_{n+1} = \begin{cases}1+ X_n, & p=1/2, \\ - X_{n}, & p=1/2. \end{cases}

Then the conditional expectation E(X_{n+1}| X_n) =(1+X_n)/2 + (-X_n)/2 = 1/2. It follows that E(X_{n+1}) = E(E(X_{n+1}| X_n))=1/2.
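The identity E(X_n)=1/2 for n\ge 1 can be checked exactly by pushing the law of X_n forward step by step. This is only a small sketch with exact rational arithmetic; the helper names step, expectation and law_of_X are hypothetical.

```python
from fractions import Fraction

def step(dist):
    """Push the law of X_n forward one step: X -> 1 + X or X -> -X, each with probability 1/2."""
    half = Fraction(1, 2)
    new = {}
    for value, prob in dist.items():
        for nxt in (1 + value, -value):
            new[nxt] = new.get(nxt, Fraction(0)) + half * prob
    return new

def expectation(dist):
    """Expected value of a finitely supported law given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

def law_of_X(n):
    """Law of X_n, starting from X_0 = 0."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        dist = step(dist)
    return dist

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, expectation(law_of_X(n)))
```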

Now we consider another stochastic process: let A=\begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix}, and B=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. The process R_n is given by R_0= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, and R_{n+1} = \begin{cases}A\cdot R_n, & p=1/2, \\ B\cdot R_{n}, & p=1/2. \end{cases}

We will use the max norm \|(a_{ij})\|=\max |a_{ij}|. Consider two successive appearances of the matrix B, say the first one at time n=a, and the second one at time n=a+1+b:

R_a =\begin{bmatrix} 2^a & 0 \\ 0 & 2^{-a} \end{bmatrix},
R_{a+1} =\begin{bmatrix} 0 & 2^{-a} \\  2^{a} & 0 \end{bmatrix},
R_{a+1+b} =\begin{bmatrix} 0 & 2^{b-a} \\  2^{a-b} & 0 \end{bmatrix},
R_{a+1+b+1} =\begin{bmatrix} 2^{a-b} & 0 \\ 0  & 2^{b-a} \end{bmatrix}.

So if we break the process into pairs of segments of lengths (a_n, b_n), n\ge 1, where a_n \ge 0, b_n \ge 0, then the norms of the process follow the pattern \|R_{t_n}\|=2^{|s_n|}, where t_n=\sum_{k=1}^n(a_k + b_k+2) and s_n=\sum_{k=1}^n(a_k - b_k). One would guess that \frac{s_n}{t_n} \to 0 with probability one, and hence \|R_n\| grows subexponentially with probability one.
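The pattern \|R_{t_n}\|=2^{|s_n|} can be verified for any prescribed list of segment lengths by multiplying the matrices exactly. The sketch below does this deterministically (no randomness); the function names matmul, max_norm and run_segments are hypothetical.

```python
from fractions import Fraction

A = ((Fraction(2), Fraction(0)), (Fraction(0), Fraction(1, 2)))
B = ((Fraction(0), Fraction(1)), (Fraction(1), Fraction(0)))
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def max_norm(M):
    """The max norm ||(a_ij)|| = max |a_ij|."""
    return max(abs(entry) for row in M for entry in row)

def run_segments(segments):
    """Apply A^a, B, A^b, B (on the left) for each pair (a, b); return the norms ||R_{t_n}||."""
    R = IDENTITY
    norms = []
    for a, b in segments:
        for M in [A] * a + [B] + [A] * b + [B]:
            R = matmul(M, R)
        norms.append(max_norm(R))
    return norms

if __name__ == "__main__":
    # s_1 = 2, s_2 = -2, s_3 = 3, so the norms should be 4, 4, 8.
    print(run_segments([(3, 1), (0, 4), (5, 0)]))
```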

Some special actions

Consider the conjugation action \rho of GL(2,R) on M(2,R): \rho_A(M) = A MA^{-1}.

1. This action \rho factors through an action of PGL(2,R).

2. There exists a 3D invariant subspace E=\{M\in M(2,R): tr(M)=0\}.

3. The determinant \det M is an invariant quadratic form on E, and the signature of this form is (-, - ,+).
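The three properties above can be sanity-checked on explicit matrices. The following sketch uses exact rational arithmetic; the coordinates u=(b+c)/2, v=(b-c)/2 on E are an assumption made here to exhibit the signature, and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return M[0][0] + M[1][1]

def inverse(M):
    d = det(M)
    return ((M[1][1] / d, -M[0][1] / d), (-M[1][0] / d, M[0][0] / d))

def conjugate(A, M):
    """The action rho_A(M) = A M A^{-1}."""
    return matmul(matmul(A, M), inverse(A))

def det_in_coordinates(M):
    """For traceless M = [[a, b], [c, -a]], return -a^2 - u^2 + v^2 with u = (b+c)/2, v = (b-c)/2."""
    a, b, c = M[0][0], M[0][1], M[1][0]
    u, v = (b + c) / 2, (b - c) / 2
    return -a * a - u * u + v * v

if __name__ == "__main__":
    A = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(1)))
    M = ((Fraction(1), Fraction(2)), (Fraction(3), Fraction(-1)))
    N = conjugate(A, M)
    print(trace(N), det(N), det_in_coordinates(N))
```

Rescaling A by a nonzero scalar does not change conjugate(A, M), which is the statement that the action factors through PGL(2,R).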

Let Q=x_1^2 + x_2^2 - x_3^2 be a quadratic form on R^3, whose isometry group is O(2,1)=\{A\in M(3,R): A^TgA=g\}, where g=\mbox{diag}\{1, 1, -1\}.

This induces an injection PGL(2,R) \subset O(2,1), and an identification between PSL(2,R) and the identity component of O(2,1).

The action of O(2,1) on R^3 passes on to the projective space P^2. The cone C=Q^{-1}(0) is invariant, and separates P^2 into two domains: one of them is homeomorphic to a disk, while the other is a Möbius band. This induces an action of PSL(2,R) on the disk.

Equilibrium states

Let S=\{1,\dots, l\} be the space of symbols, A=(a_{ij}) be an l\times l matrix with a_{ij}\in\{0,1\}, \Sigma_A be the set of sequences x=(x_n) that are A-admissible, and \sigma be the shift map on \Sigma_A. Consider the dynamical system (\Sigma_A, \sigma). We assume this system is mixing.

Let f:\Sigma_A \to \mathbb{R} be a Hölder potential, which induces a transfer operator L_f on the space of continuous functions: \phi(x) \mapsto L_f\phi(x):=\sum_{\sigma y =x} e^{f(y)}\phi(y).

Let \lambda be the spectral radius of L_f. Then \lambda is also an eigenvalue of L_f, which is called the principal eigenvalue. Moreover, there exists a positive eigenfunction h such that L_f h =\lambda h. Replacing f by f-\log\lambda, we will assume \lambda =1.

Consider the dual operator L_f^{\ast} on the space of functionals (or signed measures). There is a positive eigenmeasure \nu such that L_f^{\ast} \nu =\nu.

We normalize the pair (h,\nu) such that \int h d\nu =1. Then the measure \mu:= h \nu is a \sigma-invariant probability measure. It is called the equilibrium state of (\Sigma_A, \sigma, f).

Two continuous functions f, g:\Sigma_A \to \mathbb{R} are called cohomologous if there exists a continuous function \phi:\Sigma_A \to \mathbb{R} such that
f(x)-g(x) =\phi(\sigma x) -\phi(x).

Let f, g:\Sigma_A \to \mathbb{R} be cohomologous. Then the two operators L_f and L_g are different, but \lambda(f) =\lambda(g)=1.
Their eigenfunctions and eigenmeasures are different, but the associated equilibrium states are the same.

To find a natural representative in the class [f] of functions that are cohomologous to f, we set g(x)=f(x)+ \log h(x) -\log h(\sigma x). Then we have

1). \displaystyle L_g1(x)=\sum_{\sigma y =x} e^{g(y)}\cdot 1= \sum_{\sigma y =x} e^{f(y)}h(y)/h(x)=\frac{L_fh(x)}{h(x)}=1. So 1 is the eigenfunction of L_g.

2). \displaystyle \int \phi dL_g^{\ast} \mu=\int L_g\phi d\mu =\int L_f(\phi h)d\nu =\int \phi\cdot h dL_f^{\ast}\nu =\int \phi h d\nu =\int \phi d\mu.
So \mu is the eigenmeasure of L_g.

From this point of view, we might pick g(x)=f(x)+ \log h(x) -\log h(\sigma x) as the representative of [f].
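Here is a small numerical sketch of this normalization in the simplest case. It assumes (this is not in the discussion above) that the potential depends only on the first two coordinates, f(y)=\log M_{y_0y_1}, so that L_f acts on functions of x_0 as a matrix: (L_f h)(j)=\sum_i h(i)M_{ij}. The normalized weights are then e^{g}=M_{ij}h(i)/(\lambda h(j)), and L_g1=1 says that they sum to 1 over i. The function names are hypothetical, and the eigenvector is found by plain power iteration.

```python
import math

def left_perron(M, iterations=500):
    """Power iteration for h M = lambda h with h > 0 (M is assumed primitive)."""
    n = len(M)
    h = [1.0 / n] * n
    lam = 1.0
    for _ in range(iterations):
        new = [sum(h[i] * M[i][j] for i in range(n)) for j in range(n)]
        lam = sum(new)              # since sum(h) == 1, this estimates lambda
        h = [v / lam for v in new]  # renormalize so that sum(h) == 1 again
    return lam, h

def normalized_weights(M):
    """Return P[i][j] = M[i][j] * h[i] / (lam * h[j]), the weights e^g of the normalized potential."""
    lam, h = left_perron(M)
    n = len(M)
    return [[M[i][j] * h[i] / (lam * h[j]) for j in range(n)] for i in range(n)]

# Assumed example: the golden mean shift with f = 0, so M is the adjacency matrix.
GOLDEN_MEAN = [[1.0, 1.0], [1.0, 0.0]]

if __name__ == "__main__":
    lam, h = left_perron(GOLDEN_MEAN)
    P = normalized_weights(GOLDEN_MEAN)
    print(lam, h)
    print([sum(P[i][j] for i in range(2)) for j in range(2)])  # (L_g 1)(j) for each j
```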

Notes. Some basic terms

1. Let R be a commutative ring, S be a multiplicatively closed subset in the sense that 1\in S and a,b\in S \Rightarrow ab \in S. Then we consider the localization S^{-1}R as the quotient R\times S/\sim, where (r,a)\sim (s,b) if (br-as)t=0 for some t\in S.

Let f\in R. We can construct a multiplicatively closed subset S=\{f^n: n\ge 0\}, and denote the corresponding localization by R_f=S^{-1}R.

Let p\triangleleft R be a prime ideal of R. Then S=R\backslash p is multiplicatively closed. We denote the corresponding local ring by R_p=S^{-1}R.

Let \text{Spec}R be the set of all prime ideals of R. For each ideal I\triangleleft R, let V_I=\{p\in \text{Spec}R: p\supset I\}. The Zariski topology on \text{Spec}R is defined by declaring the closed subsets to be exactly \{V_I: I\triangleleft R\}.

A basis for the Zariski topology on \text{Spec}R can be constructed as follows. For each f\in R, let D_f\subset \text{Spec}R be the set of prime ideals not containing f. Then each D_f= \text{Spec}R\backslash V_{(f)} is open, and the sets D_f, f\in R, form a basis since \text{Spec}R\backslash V_I=\bigcup_{f\in I}D_f.

The points corresponding to maximal ideals m \triangleleft R are closed points in the sense that the singleton \{m\}=V_m.

In the case R=C[x_1, \dots, x_n], we see that each maximal ideal m=\langle x_1-a_1,\dots, x_n-a_n \rangle corresponds to a point (a_1,\dots, a_n)\in C^n. So one can interpret this as C^n \subset X= \text{Spec} R. A non-maximal prime ideal p (a non-closed point) corresponds to an affine variety P, which is a closed subset in C^n. Then p is called the generic point of the variety P.

2. Let (M,\omega) be a symplectic manifold, G be a Lie group acting on M via symplectic diffeomorphisms. Let \mathfrak{g} be the Lie algebra of G. Each \xi \in \mathfrak{g} induces a vector field \rho(\xi):x\in M \mapsto \frac{d}{dt}\Big|_{t=0}\Big(\exp(t\xi)\cdot x\Big). Note that \rho(g^{-1}\xi g)=g_\ast \rho(\xi), and \rho([\xi,\eta])=[\rho(\xi),\rho(\eta)].

Consider the 1-form induced by the contraction \iota_{\rho(\xi)}\omega. Clearly this 1-form is closed: d\iota_{\rho(\xi)}\omega=L_{\rho(\xi)}\omega=0 since G preserves the form \omega.

Then the action is called weakly Hamiltonian, if for every \xi\in \mathfrak{g}, the one-form \iota_{\rho(\xi)} \omega is exact: \iota_{\rho(\xi)} \omega=dH_\xi for some smooth function H_{\xi} on M. Although H_\xi is only determined up to a constant C_\xi, the constant \xi \mapsto C_\xi can be chosen such that the map \xi\mapsto H_\xi becomes linear.

The action is called Hamiltonian, if the map \mathfrak{g} \to C^\infty(M), \xi\mapsto H_\xi is a Lie algebra homomorphism with respect to Poisson structure. Then \rho(\xi)=X_{H_\xi} and H_{g^{-1}\xi g}(x)=H_\xi(gx).

A moment map for a Hamiltonian G-action on (M,\omega) is a map \mu: M\to \mathfrak{g}^\ast such that H_\xi(x)=\mu(x)\cdot \xi for all \xi\in \mathfrak{g}. In other words, for each fixed x\in M, the map \xi \mapsto H_\xi(x) from \mathfrak{g} to \mathbb{R} is a linear functional on \mathfrak{g} and is denoted by \mu(x). Also note that \mu(gx)\cdot \xi=H_\xi(gx)=H_{g^{-1}\xi g}(x). So \mu(gx)=g\mu(x)g^{-1}.

There is no positively expansive homeomorphism

Let f be a homeomorphism on a compact metric space (X,d). Then f is said to be \mathbb{Z}-expansive, if there exists \delta>0 such that for any two points x,y\in X, if d(f^nx,f^ny)<\delta for all n\in\mathbb{Z}, then x=y. The constant \delta is called the expansive constant of f.

Similarly one can define \mathbb{N}-expansiveness if f is not invertible. An interesting phenomenon observed by Schwartzman states that

Theorem. A homeomorphism f cannot be \mathbb{N}-expansive (unless X is finite).

This result was reported in Gottschalk–Hedlund’s book Topological Dynamics (1955), and a proof was given in King’s paper A map with topological minimal self-joinings in the sense of del Junco (1990). Below we copied the proof from King’s paper.

Proof. Suppose on the contrary that there is a homeomorphism f on an infinite compact metric space (X,d) that is \mathbb{N}-expansive. Let \delta>0 be the \mathbb{N}-expansive constant of f, and d_n(x,y)=\max\{d(f^k x, f^k y): 1\le k\le n\}.

It follows from the \mathbb{N}-expansiveness that N:=\sup\{n\ge 1: d_n(x,y)\le\delta \text{ for some } d(x,y)\ge\delta\} is a finite number. Pick \epsilon\in(0,\delta) such that d_N(x,y)<\delta whenever d(x,y)<\epsilon.

Claim. If d(x,y)<\epsilon, then d(f^{-n} x, f^{-n}y)<\delta for any n\ge 1.

Proof of Claim. If not, we can prolong the N-string since f^{k}=f^{k+n}\circ f^{-n}. More precisely, let n\ge 1 be the smallest integer with d(f^{-n} x, f^{-n}y)\ge\delta, and set x'=f^{-n}x, y'=f^{-n}y. Then d(x',y')\ge\delta, while d(f^kx',f^ky')<\delta for 1\le k<n by the minimality of n, d(f^nx',f^ny')=d(x,y)<\epsilon<\delta, and d(f^{n+j}x',f^{n+j}y')=d(f^jx,f^jy)<\delta for 1\le j\le N by the choice of \epsilon. So d_{n+N}(x',y')<\delta with n+N>N, contradicting the definition of N.

Recall that a pair (x,y) is said to be \epsilon-proximal, if d(f^{n_i}x, f^{n_i}y)<\epsilon for some n_i\to\infty. The upshot of the above claim is that any \epsilon-proximal pair is \delta-indistinguishable: d(f^{n}x, f^{n}y)<\delta for all n\ge 0.

Cover X by open sets of diameter < \epsilon, and pick a finite subcover, say \{B_i:1\le i\le I\}. Let E=\{x_j:1\le j\le I+1\} be a subset consisting of I+1 distinct points. Then for each n\ge 0, there are two points in f^n E that share the same room B_{i(n)}, say f^nx_{a(n)} and f^nx_{b(n)}. Pick a subsequence n_i such that a(n_i)\equiv a and b(n_i)\equiv b. Clearly x_a\neq x_b, and d(f^{n_i}x_a,f^{n_i}x_b)<\epsilon. Hence the pair (x_a,x_b) is \epsilon-proximal and \delta-indistinguishable. This contradicts the \mathbb{N}-expansiveness assumption on f. QED.

Area under holomorphic maps

Let f be a map from (x,y)\in \mathbb{R}^2 to (a,b)\in \mathbb{R}^2. The area form dA=dx\wedge dy gives the Jacobian dA=da\wedge db= J(x,y)dx\wedge dy, where J(x,y)=a_xb_y- a_yb_x.

Now consider the complex setting, where \displaystyle dA=\frac{i}{2} dz\wedge d\bar z. Let f be a map from z\in \mathbb{C} to w\in \mathbb{C}. Then \displaystyle dA=\frac{i}{2} dw\wedge d\bar w= \frac{i}{2}f'(z)\overline{f'(z)} dz\wedge d\bar z. So this time the Jacobian J(z) becomes f'(z)\overline{f'(z)}.

Suppose \displaystyle f(z)=\sum_{n\ge 0} a_n z^n is a holomorphic map on the unit disk D. Then
\displaystyle J(z)=\sum_{n,m\ge 1}nm a_n \bar a_m z^{n-1} \bar z^{m-1}, and the area of f(D) (counted with multiplicity if f is not injective) is \displaystyle \int_D J(z) dA.

Using polar coordinates, we have dA= rdr\, d\theta, \displaystyle z^{n-1} \bar z^{m-1}=r^{n+m-2}e^{i\theta(n-m)},
and \displaystyle \int_D r^{n+m-2}e^{i\theta(n-m)} rdr\, d\theta= 0 if n\neq m, and =\frac{\pi}{n} if n=m.

So \displaystyle |f(D)|=\sum_{n\ge 1} n^2 |a_n|^2\cdot \frac{\pi}{n}=\pi \sum_{n\ge 1} n |a_n|^2.
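As a quick check of the area formula, one can compare \pi\sum n|a_n|^2 with a direct numerical integration of |f'(z)|^2 over the disk. The sketch below assumes f is a polynomial given by its coefficient list and uses a plain midpoint rule in polar coordinates; the function names and the test polynomial z+z^2/4 (which is injective on D) are choices made here for illustration.

```python
import cmath
import math

def area_by_series(coeffs):
    """pi * sum_{n>=1} n |a_n|^2 for f(z) = sum a_n z^n, with coeffs[n] = a_n."""
    return math.pi * sum(n * abs(a) ** 2 for n, a in enumerate(coeffs))

def area_by_integration(coeffs, radial_steps=400, angular_steps=400):
    """Midpoint rule for the integral of |f'(z)|^2 over the unit disk in polar coordinates."""
    def derivative(z):
        return sum(n * a * z ** (n - 1) for n, a in enumerate(coeffs) if n >= 1)

    dr = 1.0 / radial_steps
    dtheta = 2 * math.pi / angular_steps
    total = 0.0
    for i in range(radial_steps):
        r = (i + 0.5) * dr
        for j in range(angular_steps):
            z = cmath.rect(r, (j + 0.5) * dtheta)
            total += abs(derivative(z)) ** 2 * r * dr * dtheta
    return total

if __name__ == "__main__":
    coeffs = [0, 1, 0.25]  # f(z) = z + z^2 / 4
    print(area_by_series(coeffs), area_by_integration(coeffs))
```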


An interesting lemma about the Birkhoff sum

A few days ago I attended a lecture given by Amie Wilkinson. She presented a proof of Furstenberg’s theorem on the Lyapunov exponents of random products of matrices in SL(2,\mathbb{R}).

Let \lambda be a probability measure on SL(2,\mathbb{R}), \mu=\lambda^{\mathbb{N}} be the product measure on \Omega=SL(2,\mathbb{R})^{\mathbb{N}}. Let \sigma be the shift map on \Omega, and A:\omega\in\Omega\mapsto \omega_0\in SL(2,\mathbb{R}) be the projection. We consider the induced skew product (\sigma,A) on \Omega\times \mathbb{R}^2, (\omega,v)\mapsto (\sigma\omega, A(\omega)v), and write A_n(\omega)=A(\sigma^{n-1}\omega)\cdots A(\omega). The (largest) Lyapunov exponent of (\sigma,A) is defined to be the value \chi such that \displaystyle \lim_{n\to\infty}\frac{1}{n}\log\|A_n(\omega)\|=\chi for \mu-a.e. \omega\in \Omega.

To apply ergodic theory, we first assume \int\log\|A\| d\lambda < \infty. Then \chi(\lambda) is well defined. There are cases when \chi(\lambda)=0:

(1) the generated group \langle\text{supp}\lambda\rangle is compact;

(2) there exists a finite set \mathcal{L}=\{L_1,\dots, L_k\} of lines that is invariant for all A\in \langle\text{supp}\lambda\rangle.

Furstenberg proved that the above cover all cases with zero exponent:
\chi(\lambda) > 0 for all other \lambda.


Symplectic and contact manifolds

Let (M,\omega) be a symplectic manifold. It is said to be exact if \omega=d\lambda for some one-form \lambda on M.

(1) If \omega=d\lambda is exact, then, since \omega is non-degenerate, there is a canonical isomorphism between vector fields and 1-forms. In particular, there exists a vector field X such that \lambda=i_X\omega. Then we have \lambda(X)=\omega(X,X)=0, and L_X\lambda=i_X d\lambda+d i_X\lambda=i_X\omega +0=\lambda, and L_X\omega=d i_X\omega=d\lambda=\omega.

(2) Suppose there exists a vector field X on M such that its Lie-derivative L_X\omega=\omega (notice the difference with L_X\omega=0). Then Cartan’s formula says that \omega=i_X d\omega+ di_X\omega=d\lambda, where \lambda=i_X\omega. So \omega=d\lambda is exact, and L_X\lambda=i_Xd\lambda+di_X\lambda=i_X\omega+0=\lambda.



10. Let f_a:S^1\to S^1, a\in[0,1] be a strictly increasing family of homeomorphisms on the unit circle, \rho(a) be the rotation number of f_a. Poincaré observed that \rho(a)=p/q if and only if f_a admits some periodic points of period q. In this case f_a^q admits fixed points.

Note that a\mapsto \rho(a) is continuous and non-decreasing. However, \rho may not be strictly increasing. In fact, if \rho(a_0)=p/q and f_{a_0}^q\neq Id, then \rho is locked at p/q on a closed interval I_{p/q}\ni a_0. More precisely, if f_{a_0}^q(x) > x for some x, then \rho(a)=p/q on [a_0-\epsilon,a_0] for some \epsilon > 0; if f_{a_0}^q(x) < x for some x, then \rho(a)=p/q on [a_0,a_0+\epsilon] for some \epsilon > 0; while a_0\in \text{Int}(I_{p/q}) if both happen. (Here the comparison is made for the lift of f_{a_0}^q that has fixed points.)

Also observe that if r=\rho(a)\notin \mathbb{Q}, then I_r is a singleton. So assuming f_a is not unipotent (that is, f_a^q\neq Id for every q\ge 1) for each a\in[0,1], the function a\mapsto \rho(a) is a Devil’s staircase: it is constant on closed intervals I_{p/q}, whose union \bigcup I_{p/q} is dense in [0,1].
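The locking of the rotation number is easy to observe numerically. The sketch below uses the standard (Arnold) family F_a(x)=x+a+\frac{K}{2\pi}\sin(2\pi x) as an assumed example of a strictly increasing family; it is not taken from the discussion above, and the function names are hypothetical. For this family f_a has a fixed point exactly when |a|\le K/(2\pi), so \rho should vanish on that whole interval.

```python
import math

def lift(x, a, K):
    """Lift F_a(x) = x + a + (K / 2 pi) sin(2 pi x); a circle homeomorphism for 0 <= K < 1."""
    return x + a + K / (2 * math.pi) * math.sin(2 * math.pi * x)

def rotation_number(a, K, n=20000):
    """Approximate rho(a) = lim (F_a^n(x) - x) / n, starting from x = 0."""
    x = 0.0
    for _ in range(n):
        x = lift(x, a, K)
    return x / n

if __name__ == "__main__":
    K = 0.9
    for a in (0.0, 0.05, 0.1, 0.14, 0.2, 0.3):
        print(a, rotation_number(a, K))
```

The printed values should stay near 0 for the first few parameters (the plateau I_{0/1}) and then start to increase; the estimate is accurate only up to an error of order 1/n.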

9. Let X:M\to TM be a vector field on M, \phi_t:M\to M be the flow induced by X on M. That is, \frac{d}{dt}\phi_t(x)=X(\phi_t(x)). Then we take a curve s\mapsto x_s\in M, and consider the solutions \phi_t(x_s). There are two ways to take derivative:

(1) \displaystyle \frac{d}{dt}\phi_t(x_s)=X(\phi_t(x_s)).

(2) \displaystyle \frac{d}{ds}\phi_t(x_s)=D\phi_t\Big(\frac{d}{ds}x_s\Big), which induces the tangent flow D\phi_t:TM\to TM of \phi_t:M\to M.

Combine these two derivatives together:

\displaystyle \frac{d}{dt}D_x\phi_t(x_s')=\frac{d}{dt}\frac{d}{ds}\phi_t(x_s) =\frac{d}{ds}\frac{d}{dt}\phi_t(x_s)=\frac{d}{ds}X(\phi_t(x_s)) =D_{\phi_t(x)}X\circ D_x\phi_t(x_s').

This gives rise to an equation \displaystyle \frac{d}{dt}D_x\phi_t=D_{\phi_t(x)}X\circ D_x\phi_t.


Formally, one can consider the differential equation along a solution x(t):
\displaystyle \frac{d}{dt}D(t)=D_{\phi_t(x)}X\circ D(t), D(0)=Id. Then D(t) is called the linear Poincaré map along x(t). Suppose x(T)=x(0). Then D(T) determines if the periodic orbit is hyperbolic or elliptic. Note that the path D(t), 0\le t\le T contains more information than the above characterization.
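To make the variational equation concrete, here is a sketch that integrates the flow together with D(t) for the pendulum X(x,v)=(v,-\sin x). The pendulum, the Runge-Kutta integrator and all function names are assumptions chosen for illustration. Since this vector field is divergence free, one expects \det D(t)=1, and the first column of D(t) should match a finite-difference derivative of the flow in the x direction.

```python
import math

def vector_field(x, v):
    """Pendulum vector field X(x, v) = (v, -sin x) (an assumed example)."""
    return v, -math.sin(x)

def rhs(state):
    """Right-hand side for (x, v, D) with D' = DX(x, v) D; D is stored as d11, d12, d21, d22."""
    x, v, d11, d12, d21, d22 = state
    dx, dv = vector_field(x, v)
    c = -math.cos(x)  # DX(x, v) = [[0, 1], [-cos x, 0]]
    return (dx, dv, d21, d22, c * d11, c * d12)

def rk4_step(state, h):
    """One classical Runge-Kutta step for the combined system."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def flow_with_derivative(x0, v0, t, steps=2000):
    """Return (phi_t(x0, v0), D(t)) where D(0) = Id."""
    state = (x0, v0, 1.0, 0.0, 0.0, 1.0)
    h = t / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state[:2], (state[2:4], state[4:6])

if __name__ == "__main__":
    point, D = flow_with_derivative(1.0, 0.0, 3.0)
    print(point)
    print(D, D[0][0] * D[1][1] - D[0][1] * D[1][0])
```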
