Sinai Theorem on Local Ergodicity

Anosov’s ergodic theorem for uniformly hyperbolic diffeomorphisms f: M \to M relies on four properties of the stable and unstable foliations: they (a) cover the whole space, (b) are absolutely continuous, (c) have leaves of uniform size, and (d) are uniformly transverse to each other.

Sinai gave a systematic method to prove ergodicity for hyperbolic systems F: M \to M with singularities. Under some mild conditions, Katok and Strelcyn proved that the two foliations cover a full-measure subset of the space and are still absolutely continuous. However, their leaves can be arbitrarily short, and the angle between them can be arbitrarily small. Assume the singularity sets of the iterates F^{n} are regular. The Sinai Theorem states that local ergodicity holds if the stable and unstable cones are sufficiently small while the separation between the two cones is not too small. In that case the short stable leaves and the short unstable leaves can still be used to obtain the local ergodic theorem. Assume that there is a continuous invariant cone field \mathcal{C} on M. Then, by Liverani and Wojtkowski, a sufficient condition for both small cones and non-small separation of cones is that \sigma_{\mathcal{C}}(D_xF^n) > 3.

Some comparison series

The first one appeared in a paper of R. Mane.

Let a_n \in (0, 1), n\ge 1 be a sequence of positive numbers such that \sum n\cdot a_n is convergent. Then \sum a_n \cdot |\log a_n| is also convergent.

Proof. The difference between the two series is that we replace n by |\log a_n|. Naturally, we split the indices into two cases:

1). the mild ones: n\in G if |\log a_n| \le n. That is, a_n \ge e^{-n}.
Then it is clear that \displaystyle \sum_{G} a_n \cdot |\log a_n| \le \sum_{G} n\cdot a_n, which is finite.

2). the not-so-mild case: n\notin G. It follows that a_n < e^{-n}. Note that
x^{1/2}\cdot \log x \to 0 when x \to 0. In fact, x^{1/2}\cdot |\log x| \le 2/e \le 1 for all x\in (0,1).
Writing a_n \cdot |\log a_n| = a_n^{1/2}\cdot \big(a_n^{1/2} |\log a_n|\big), it follows that \displaystyle \sum_{n\notin G} a_n \cdot |\log a_n| \le \sum_{n\notin G} a_n^{1/2} \le \sum_{n\notin G} e^{-n/2}, which is also finite. QED
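The two cases of the proof can be checked numerically. The sketch below is only an illustration: the test sequence `a_seq` is a hypothetical choice (not from the paper) that mixes both cases, and all function names are made up for this example.

```python
import math

def a_seq(n):
    """Hypothetical test sequence in (0, 1): polynomial decay for odd n, very fast decay for even n."""
    return (n + 1) ** -3.0 if n % 2 else math.exp(-float(n * n))

def is_mild(n, a):
    """Case 1 of the proof: |log a_n| <= n, i.e. a_n >= e^{-n}."""
    return abs(math.log(a)) <= n

def proof_bound(n, a):
    """Termwise bound from the proof: n * a_n in case 1, e^{-n/2} in case 2."""
    return n * a if is_mild(n, a) else math.exp(-n / 2)

def sup_sqrt_log(grid=10_000):
    """Grid estimate of the supremum of x^{1/2} |log x| on (0, 1); the exact value is 2/e, at x = e^{-2}."""
    return max(math.sqrt(k / grid) * abs(math.log(k / grid)) for k in range(1, grid))

# Compare the partial sums of a_n |log a_n| with the partial sums of the bounds.
terms = [(n, a_seq(n)) for n in range(1, 21)]
lhs = sum(a * abs(math.log(a)) for _, a in terms)
rhs = sum(proof_bound(n, a) for n, a in terms)
```

Here `lhs` is a partial sum of \sum a_n |\log a_n| and `rhs` is the corresponding partial sum of the dominating series built in the proof.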

The second one appeared in a paper of C. Liverani and M. Wojtkowski.

Let a_n \in (0, 1), n\ge 1 be a sequence of positive numbers and S_n= a_1 + \cdots + a_n. Then \sum a_n is divergent if and only if \displaystyle \sum \frac{a_n}{S_n} is divergent.

One can also state the convergence version.

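Proof. If \sum a_n is convergent, then S_n \ge a_1 gives \displaystyle \frac{a_n}{S_n} \le \frac{a_n}{a_1}, so \displaystyle \sum \frac{a_n}{S_n} is convergent. So we only need to show the other direction: assume that \sum a_n is divergent, that is, S_n \to \infty.
Let k, l\ge 1 be two indices. Note that S_{k+l} \ge S_{k+j} and \displaystyle \frac{a_{k+j}}{S_{k+j}} \ge \frac{a_{k+j}}{S_{k+l}} for all 1\le j \le l.
Therefore, \displaystyle \sum_{1\le j \le l}\frac{a_{k+j}}{S_{k+j}} \ge \sum_{1\le j \le l}\frac{a_{k+j}}{S_{k+l}}= \frac{1}{S_{k+l}}(S_{k+l}- S_k) = 1-\frac{S_k}{S_{k+l}} \to 1 as l \to \infty, for each fixed k.
So the tails of \displaystyle \sum \frac{a_n}{S_n} do not tend to zero, and the series is divergent. QED.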
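The block estimate in the proof can be tested on a concrete divergent series. This is a minimal sketch, assuming the hypothetical choice a_n = 1/(n+1); the helper names are illustrative only.

```python
N = 2000
# a[n-1] = a_n = 1/(n+1): each term lies in (0, 1) and the series diverges.
a = [1.0 / (n + 1) for n in range(1, N + 1)]

# S[n-1] = S_n = a_1 + ... + a_n
S = []
running = 0.0
for term in a:
    running += term
    S.append(running)

def block_sum(k, l):
    """sum_{j=1..l} a_{k+j} / S_{k+j}, with 1-based indices as in the text."""
    return sum(a[k + j - 1] / S[k + j - 1] for j in range(1, l + 1))

def lower_bound(k, l):
    """(S_{k+l} - S_k) / S_{k+l} = 1 - S_k / S_{k+l}."""
    return 1.0 - S[k - 1] / S[k + l - 1]
```

For a fixed k, `lower_bound(k, l)` increases to 1 as l grows, since S_{k+l} tends to infinity; for this slowly diverging example the approach to 1 is very slow.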

Birkhoff Ergodic Theorem

Let (X,\mu, T) be an ergodic measure-preserving system, f\in L^1(\mu). Then the Birkhoff Ergodic Theorem states that for \mu-a.e. x\in X, the time average \frac{1}{n}S_nf(x) converges to the space average \mu(f):=\int f(x) d\mu(x). In the case that \mu(f)>0, we see that S_nf(x) \approx n\cdot \mu(f) as n\to \infty for \mu-a.e. x\in X.
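As a quick illustration of the time average converging to the space average, here is a minimal sketch for an irrational rotation of the circle, which is ergodic with respect to Lebesgue measure. The rotation number (the golden mean), the observable f(x)=x and the function name are assumptions chosen for this example.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # irrational rotation number (golden mean)

def birkhoff_average(f, x0, n, alpha=ALPHA):
    """(1/n) * S_n f(x0) for the rotation T(x) = x + alpha mod 1."""
    total = 0.0
    for k in range(n):
        total += f((x0 + k * alpha) % 1.0)
    return total / n

# Observable f(x) = x on [0, 1), whose space average is 1/2.
average = birkhoff_average(lambda x: x, 0.1, 20_000)
```

The value `average` should be close to \mu(f)=1/2, and S_nf(x) then grows like n/2.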

In link there is an interesting observation:

Theorem. If S_nf(x) \to +\infty for \mu-a.e. x\in X, then \mu(f)>0.

Proof. Let \epsilon >0, A_{\epsilon}=\{x\in X: S_nf(x)\ge \epsilon \text{ for each } n\ge 1\}, and \displaystyle B=\bigcup_{k\ge 0}\bigcup_{\epsilon >0}T^{-k}A_{\epsilon}.

Note that the complement \displaystyle X\backslash B= \bigcap_{k\ge 0}\bigcap_{\epsilon >0}T^{-k}(X\backslash A_{\epsilon}).

So if x\notin B, then for each k\ge 0, for each \epsilon >0,
there exists n_{k,\epsilon}\ge 1 such that S_{n_{k,\epsilon}}f(T^kx) \le \epsilon.

Pick a sequence \epsilon_p =e^{-p}, k_0=0, n_p=n_{k_{p-1},\epsilon_p}, k_p=k_{p-1}+ n_p for each p\ge 1. Then

k_0=0, n_1=n_{0,e^{-1}} \ge 1, k_1=n_1 \ge 1: S_{n_1}f(x) \le e^{-1};

n_2=n_{k_1,e^{-2}} \ge 1, k_2=k_1+ n_2 \ge 2: S_{n_2}f(T^{k_1}x) \le e^{-2};

n_3=n_{k_2,e^{-3}} \ge 1, k_3=k_2+ n_3 \ge 3: S_{n_3}f(T^{k_2}x) \le e^{-3};

n_p=n_{k_{p-1},e^{-p}}\ge 1, k_p=k_{p-1}+ n_p \ge p: S_{n_p}f(T^{k_{p-1}}x) \le e^{-p};

Add them together: S_{k_p}f(x)= \sum_{q=1}^{p} S_{n_q}f(T^{k_{q-1}}x) \le e^{-1}+ \cdots + e^{-p} \le \frac{1}{1-e^{-1}} with k_p \ge p \to \infty. So S_nf(x) does not tend to +\infty for any x\notin B.

Applying the assumption of the theorem, we see that \mu(X\backslash B)=0. Then \mu(A_{\epsilon}) >0 for some \epsilon>0.

Let n_k(x) be the k-th return of a typical point x \in A_{\epsilon} to the set A_{\epsilon}. Then S_{n_k}f(x)\ge k\epsilon. It follows that

\displaystyle \mu(f)=\lim_{k \to \infty}\frac{1}{n_k}S_{n_k}f(x) \ge \lim_{k\to\infty}\frac{k}{n_k}\cdot \epsilon = \mu(A_{\epsilon})\cdot \epsilon >0.

This completes the proof.

Some random variables

Consider a stochastic process X_n, n\ge 0, where X_0 =0, and X_{n+1} = \begin{cases}1+ X_n, & p=1/2, \\ - X_{n}, & p=1/2. \end{cases}

Then the conditional expectation E(X_{n+1}| X_n) =(1+X_n)/2 + (-X_n)/2 = 1/2. It follows that E(X_{n+1}) = E(E(X_{n+1}| X_n))=1/2.
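The identity E(X_n)=1/2 can be confirmed exactly by propagating the distribution of X_n. This is a small sketch; the function names are illustrative.

```python
from fractions import Fraction

def step(dist):
    """Push a distribution {value: probability} through one step of the process."""
    half = Fraction(1, 2)
    new = {}
    for x, p in dist.items():
        for y in (1 + x, -x):  # each branch is taken with probability 1/2
            new[y] = new.get(y, 0) + half * p
    return new

def expectations(n_steps):
    """Exact values E(X_1), ..., E(X_{n_steps}), starting from X_0 = 0."""
    dist = {0: Fraction(1)}
    out = []
    for _ in range(n_steps):
        dist = step(dist)
        out.append(sum(x * p for x, p in dist.items()))
    return out
```

Exact rational arithmetic avoids any sampling error, so the expectations can be compared with 1/2 directly.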

Now we consider another stochastic process: let A=\begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix}, and B=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. The process R_n is given by R_0= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, and R_{n+1} = \begin{cases}A\cdot R_n, & p=1/2, \\ B\cdot R_{n}, & p=1/2. \end{cases}

We will use the max norm \|(a_{ij})\|=\max |a_{ij}|. Consider two successive appearances of the matrix B, say the first one applied at time n=a and the second one at time n=a+1+b, with only the matrix A applied before and in between:

R_a =\begin{bmatrix} 2^a & 0 \\ 0 & 2^{-a} \end{bmatrix},
R_{a+1} =\begin{bmatrix} 0 & 2^{-a} \\  2^{a} & 0 \end{bmatrix},
R_{a+1+b} =\begin{bmatrix} 0 & 2^{b-a} \\  2^{a-b} & 0 \end{bmatrix},
R_{a+1+b+1} =\begin{bmatrix} 2^{a-b} & 0 \\ 0  & 2^{b-a} \end{bmatrix}.

If we break the process into pairs of segments of lengths (a_n, b_n), n\ge 1, where a_n \ge 0, b_n \ge 0, then the norms of the process follow the pattern \|R_{t_n}\|=2^{|s_n|}, where t_n=\sum_{k=1}^n(a_k + b_k+2) and s_n=\sum_{k=1}^n(a_k - b_k). One would guess that \frac{s_n}{t_n} \to 0 with probability one, and hence \|R_n\| grows subexponentially with probability one.
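The pattern above can be checked with exact arithmetic, and the guess about subexponential growth can be probed by simulation. The sketch below is an experiment, not a proof; the word encoding ("A"/"B" letters), the seed and the helper names are assumptions made for this example.

```python
from fractions import Fraction
import random

A = ((Fraction(2), Fraction(0)), (Fraction(0), Fraction(1, 2)))
B = ((Fraction(0), Fraction(1)), (Fraction(1), Fraction(0)))
IDENTITY = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))

def mul(P, Q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def run(word, R=IDENTITY):
    """Apply the letters of `word` from left to right: R -> A*R or R -> B*R."""
    for letter in word:
        R = mul(A if letter == "A" else B, R)
    return R

def max_norm(M):
    return max(abs(x) for row in M for x in row)

def log2_norm(M):
    """log_2 of the max norm; every nonzero entry of R_n is a power of 2."""
    m = max_norm(M)
    return m.numerator.bit_length() - m.denominator.bit_length()

# One pair of segments with a = 5 and b = 2: a copies of A, then B, b copies of A, then B.
R_pair = run("A" * 5 + "B" + "A" * 2 + "B")

# A seeded random word of length n, to look at (1/n) * log_2 ||R_n||.
rng = random.Random(0)
n = 4000
rate = log2_norm(run("".join(rng.choice("AB") for _ in range(n)))) / n
```

A small value of `rate` for large n is consistent with the guess that \|R_n\| grows subexponentially.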

Some special actions

Consider the conjugation action \rho of GL(2,R) on M(2,R): \rho_A(M) = A MA^{-1}.

1. This action \rho factors through an action of PGL(2,R).

2. There exists a 3D invariant subspace E=\{M\in M(2,R): tr(M)=0\}.

3. The determinant \det M is an invariant quadratic form on E, and the signature of this form is (-, - ,+).
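The three statements can be sanity-checked on examples with exact arithmetic. In this sketch the coordinates (a, u, v) on E, given by M=\begin{bmatrix} a & u+v \\ u-v & -a \end{bmatrix}, are a choice made for this illustration; in them \det M=-a^2-u^2+v^2, which exhibits the signature (-,-,+).

```python
from fractions import Fraction

def mul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return M[0][0] + M[1][1]

def inverse(P):
    d = Fraction(det(P))
    return ((P[1][1] / d, -P[0][1] / d), (-P[1][0] / d, P[0][0] / d))

def conjugate(P, M):
    """rho_P(M) = P M P^{-1}."""
    return mul(mul(P, M), inverse(P))

def scale(c, P):
    return tuple(tuple(c * x for x in row) for row in P)

def traceless(a, u, v):
    """Element of E in the (assumed) coordinates (a, u, v)."""
    return ((a, u + v), (u - v, -a))

P = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(3)))  # an invertible matrix
M = traceless(Fraction(3), Fraction(-1), Fraction(6))
```

Scaling `P` by a nonzero constant does not change `conjugate(P, M)`, which is the statement that \rho factors through PGL(2,R).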

Let Q=x_1^2 + x_2^2 - x_3^2 be a quadratic form on R^3, whose isometry group is O(2,1)=\{A\in M(3,R): A^TgA=g\}, where g=\mbox{diag}\{1, 1, -1\}.

This induces an injection PGL(2,R) \subset O(2,1), and an identification between PSL(2,R) and the identity component of O(2,1).

The action of O(2,1) on R^3 passes on to the projective space P^2. The cone C=Q^{-1}(0) is invariant, and separates P^2 into two domains: one of them is homeomorphic to a disk, while the other is a Möbius band. This induces an action of PSL(2,R) on the disk.

Equilibrium states

Let S=\{1,\dots, l\} be the space of symbols, A=(a_{ij}) be an l\times l matrix with a_{ij}\in\{0,1\}, \Sigma_A be the set of sequences x=(x_n) that are A-admissible. Consider the dynamical system (\Sigma_A, \sigma). We assume this system is mixing.

Let f:\Sigma_A \to \mathbb{R} be a Hölder potential, which induces a transfer operator L_f on the space of continuous functions: \phi(x) \mapsto L_f\phi(x):=\sum_{\sigma y =x} e^{f(y)}\phi(y).

Let \lambda be the spectral radius of L_f. Then \lambda is also an eigenvalue of L_f, which is called the principal eigenvalue. Moreover, there exists a positive eigenfunction h such that L_f h =\lambda h. Replacing f by f-\log\lambda, we will assume \lambda =1.

Consider the adjoint operator L_f^{\ast} on the space of functionals (or signed measures). There is a positive eigenmeasure \nu such that L_f^{\ast} \nu =\nu.

We normalize the pair (h,\nu) such that \int h d\nu =1. Then the measure \mu:= h \nu is a \sigma-invariant probability measure. It is called the equilibrium state of (\Sigma_A, \sigma, f).

Two continuous functions f, g:\Sigma_A \to \mathbb{R} are called cohomologous if there exists a continuous function \phi:\Sigma_A \to \mathbb{R} such that
f(x)-g(x) =\phi(\sigma x) -\phi(x).

Let f, g:\Sigma_A \to \mathbb{R} be cohomologous. Then the two operators L_f and L_g are different, but \lambda(f) =\lambda(g)=1.
Their eigenfunctions and eigenmeasures are different, but the associated equilibrium states are the same.

To find a natural representative in the class [f] of functions that are cohomologous to f, we set g(x)=f(x)+ \log h(x) -\log h(\sigma x). Then we have

1). \displaystyle L_g1(x)=\sum_{\sigma y =x} e^{g(y)}\cdot 1= \sum_{\sigma y =x} e^{f(y)}h(y)/h(x)=\frac{L_fh(x)}{h(x)}=1. So 1 is the eigenfunction of L_g.

2). Since L_g\phi = L_f(\phi h)/h and \mu=h\nu, we have \displaystyle \int \phi\, dL_g^{\ast} \mu=\int L_g\phi\, d\mu =\int L_f(\phi h)\,d\nu =\int \phi\cdot h\, dL_f^{\ast}\nu =\int \phi h\, d\nu =\int \phi\, d\mu.
So \mu is the eigenmeasure of L_g.

From this point of view, we might pick g(x)=f(x)+ \log h(x) -\log h(\sigma x) as the representative of [f].
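For a potential that depends only on the first two coordinates, everything above reduces to finite matrices, which makes the normalization easy to test. The following is a minimal sketch under that assumption: the transition matrix `A` (golden mean shift), the potential values `F[i][j]` and all names are hypothetical choices for illustration. Since \lambda is not normalized to 1 here, the code divides by \lambda explicitly.

```python
import math

A = [[1, 1], [1, 0]]             # transition matrix of a mixing subshift (assumed example)
F = [[0.3, -0.2], [0.5, 0.0]]    # hypothetical potential f(x) = F[x_0][x_1]
SYMBOLS = range(2)

# M[i][j] = a_ij * exp(F[i][j]); on functions of x_0 alone, (L_f phi)_j = sum_i M[i][j] * phi_i.
M = [[A[i][j] * math.exp(F[i][j]) for j in SYMBOLS] for i in SYMBOLS]

def transfer(phi):
    return [sum(M[i][j] * phi[i] for i in SYMBOLS) for j in SYMBOLS]

def principal_pair(iterations=200):
    """Power iteration for the principal eigenvalue lam and a positive eigenfunction h."""
    h = [1.0, 1.0]
    lam = 1.0
    for _ in range(iterations):
        Lh = transfer(h)
        lam = sum(Lh) / sum(h)
        total = sum(Lh)
        h = [v / total for v in Lh]
    return lam, h

lam, h = principal_pair()

# exp(g) for g = f + log h - log h(sigma .) - log lam, evaluated on the pair (i, j) = (y_0, y_1).
W = [[M[i][j] * h[i] / (lam * h[j]) for j in SYMBOLS] for i in SYMBOLS]

# L_g 1 evaluated at x_0 = j: the weights over all preimages should sum to 1.
L_g_one = [sum(W[i][j] for i in SYMBOLS) for j in SYMBOLS]
```

Each entry of `L_g_one` should equal 1 up to rounding, which is the identity L_g1=1 in this finite setting.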

Notes. Some basic terms

1. Let R be a commutative ring, S be a multiplicatively closed subset in the sense that 1\in S and a,b\in S \Rightarrow ab \in S. Then we consider the localization S^{-1}R as the quotient R\times S/\sim, where (r,a)\sim (s,b) if (br-as)t=0 for some t\in S.

Let f\in R. We can construct a m.c. subset S=\{f^n: n\ge 0\}, and denote the corresponding localization by R_f=S^{-1}R.

Let p\triangleleft R be a prime ideal of R. Then S=R\backslash p is m.c. We denote the corresponding local ring by R_p=S^{-1}R.

Let \text{Spec}R be the set of all prime ideals of R. For each ideal I\triangleleft R, let V_I=\{p\in \text{Spec}R: p\supset I\}. The Zariski topology on \text{Spec}R is defined by declaring the closed subsets to be exactly \{V_I: I\triangleleft R\}.

A basis for the Zariski topology on \text{Spec}R can be constructed as follows. For each f\in R, let D_f\subset \text{Spec}R be the set of prime ideals not containing f. Then each D_f= \text{Spec}R\backslash V_{(f)} is open.

The points corresponding to maximal ideals m \triangleleft R are closed points in the sense that the singleton \{m\}=V_m.

In the case R=C[x_1, \dots, x_n], we see that each maximal ideal m=\langle x_1-a_1,\dots, x_n-a_n \rangle corresponds to a point (a_1,\dots, a_n)\in C^n. So one can interpret this as C^n \subset X= \text{Spec} R. A non-maximal prime ideal p (a non-closed point) corresponds to an affine variety P, which is a closed subset in C^n. Then p is called the generic point of the variety P.
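The equivalence relation defining S^{-1}R in item 1 can be enumerated by brute force for a small ring. This is a sketch for the assumed example R=\mathbb{Z}/n\mathbb{Z} and S=\{f^k\}; the function name and the choice of ring are illustrative only.

```python
def localize(n, f):
    """Equivalence classes of pairs (r, a), with r in R = Z/nZ and a in S = {f^k mod n}."""
    S = []
    s = 1 % n
    while s not in S:          # collect the powers of f until they start repeating
        S.append(s)
        s = (s * f) % n
    pairs = [(r, a) for r in range(n) for a in S]

    def equivalent(p, q):
        (r, a), (r2, b) = p, q
        # (r, a) ~ (r2, b) if (b*r - a*r2) * t = 0 in R for some t in S
        return any(((b * r - a * r2) * t) % n == 0 for t in S)

    classes = []
    for p in pairs:
        for cls in classes:
            if equivalent(p, cls[0]):
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes

classes_6_2 = localize(6, 2)   # R = Z/6Z with 2 inverted
```

For R=\mathbb{Z}/6\mathbb{Z} and f=2 the factor t matters: (3,1)\sim(0,1) because (1\cdot 3-1\cdot 0)\cdot 2=0 in R, even though 3\neq 0. The localization R_f then has three elements, matching \mathbb{Z}/3\mathbb{Z}.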

2. Let (M,\omega) be a symplectic manifold, G be a Lie group acting on M via symplectic diffeomorphisms. Let \mathfrak{g} be the Lie algebra of G. Each \xi \in \mathfrak{g} induces a vector field \rho(\xi):x\in M \mapsto \frac{d}{dt}\Big|_{t=0}\Big(\exp(t\xi)\cdot x\Big). Note that \rho(g^{-1}\xi g)=g^\ast \rho(\xi), and \rho([\xi,\eta])=[\rho(\xi),\rho(\eta)].

Consider the 1-form induced by the contraction \iota_{\rho(\xi)}\omega. Clearly this 1-form is closed: d\iota_{\rho(\xi)}\omega=L_{\rho(\xi)}\omega=0 since G preserves the form \omega.

Then the action is called weakly Hamiltonian, if for every \xi\in \mathfrak{g}, the one-form \iota_{\rho(\xi)} \omega is exact: \iota_{\rho(\xi)} \omega=dH_\xi for some smooth function H_{\xi} on M. Although H_\xi is only determined up to a constant C_\xi, the constant \xi \mapsto C_\xi can be chosen such that the map \xi\mapsto H_\xi becomes linear.

The action is called Hamiltonian, if the map \mathfrak{g} \to C^\infty(M), \xi\mapsto H_\xi is a Lie algebra homomorphism with respect to Poisson structure. Then \rho(\xi)=X_{H_\xi} and H_{g^{-1}\xi g}(x)=H_\xi(gx).

A moment map for a Hamiltonian G-action on (M,\omega) is a map \mu: M\to \mathfrak{g}^\ast such that H_\xi(x)=\mu(x)\cdot \xi for all \xi\in \mathfrak{g}. In other words, for each point x\in M, the map \xi \mapsto H_\xi(x) from \mathfrak{g} to \mathbb{R} is a linear functional on \mathfrak{g} and is denoted by \mu(x). Also note that \mu(gx)\cdot \xi=H_\xi(gx)=H_{g^{-1}\xi g}(x)=\mu(x)\cdot (g^{-1}\xi g). So \mu(gx)=g\mu(x)g^{-1}.
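A worked example may help fix the conventions. The following is a sketch for the rotation action of S^1 on the plane, under assumptions that are not stated in the notes above: \omega=dx\wedge dy, the contraction is \iota_X\omega=\omega(X,\cdot), and \mathfrak{g} is identified with \mathbb{R}. With other sign conventions the moment map changes sign.

```latex
% Assumed setting: M = \mathbb{R}^2, \omega = dx \wedge dy, G = S^1 acting by
% rotation through the angle t, \mathfrak{g} \cong \mathbb{R}, \iota_X\omega = \omega(X,\cdot).
\begin{align*}
\rho(\xi)(x,y) &= \xi\,\bigl(-y\,\partial_x + x\,\partial_y\bigr), \\
\iota_{\rho(\xi)}\omega &= \xi\,\bigl(-y\,dy - x\,dx\bigr)
   = d\Bigl(-\tfrac{\xi}{2}\,(x^2+y^2)\Bigr), \\
H_\xi(x,y) &= -\tfrac{\xi}{2}\,(x^2+y^2), \qquad
\mu(x,y) = -\tfrac{1}{2}\,(x^2+y^2).
\end{align*}
```

Since S^1 is abelian, the equivariance \mu(gx)=g\mu(x)g^{-1} reduces to \mu(gx)=\mu(x), which holds because x^2+y^2 is rotation invariant.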

There is no positively expansive homeomorphism

Let f be a homeomorphism on a compact metric space (X,d). Then f is said to be \mathbb{Z}-expansive, if there exists \delta>0 such that for any two points x,y\in X, if d(f^nx,f^ny)<\delta for all n\in\mathbb{Z}, then x=y. The constant \delta is called the expansive constant of f.

Similarly one can define \mathbb{N}-expansiveness if f is not invertible. An interesting phenomenon observed by Schwartzman states that

Theorem. A homeomorphism f cannot be \mathbb{N}-expansive (unless X is finite).

This result was reported in Gottschalk–Hedlund’s book Topological Dynamics (1955), and a proof was given in King’s paper A map with topological minimal self-joinings in the sense of del Junco (1990). Below we reproduce the proof from King’s paper.

Proof. Suppose on the contrary that there is a homeomorphism f on an infinite compact metric space (X,d) that is \mathbb{N}-expansive. Let \delta>0 be the \mathbb{N}-expansive constant of f, and d_n(x,y)=\max\{d(f^k x, f^k y): 1\le k\le n\}.

It follows from the \mathbb{N}-expansiveness that N:=\sup\{n\ge 1: d_n(x,y)\le\delta \text{ for some } d(x,y)\ge\delta\} is a finite number. Pick \epsilon\in(0,\delta) such that d_N(x,y)<\delta whenever d(x,y)<\epsilon.

Claim. If d(x,y)<\epsilon, then d(f^{-n} x, f^{-n}y)<\delta for any n\ge 1.

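Proof of Claim. If not, let n\ge 1 be the smallest integer with d(f^{-n} x, f^{-n}y)\ge\delta, and set x'=f^{-n}x, y'=f^{-n}y. Then d(x',y')\ge\delta, while d(f^{k}x', f^{k}y')<\delta for 1\le k<n by the minimality of n, d(f^{n}x', f^{n}y')=d(x,y)<\epsilon<\delta, and d(f^{n+k}x', f^{n+k}y')=d(f^{k}x, f^{k}y)<\delta for 1\le k\le N since f^{k}=f^{k+n}\circ f^{-n}. Hence d_{N+n}(x',y')<\delta, which prolongs the N-string and contradicts the definition of N.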

Recall that a pair (x,y) is said to be \epsilon-proximal, if d(f^{n_i}x, f^{n_i}y)<\epsilon for some n_i\to\infty. The upshot of the above claim is that any \epsilon-proximal pair is \delta-indistinguishable: d(f^{n}x, f^{n}y)<\delta for all n\ge 0.

Cover X by open sets of diameter < \epsilon, and pick a finite subcover, say \{B_i:1\le i\le I\}. Since X is infinite, we can let E=\{x_j:1\le j\le I+1\} be a subset consisting of I+1 distinct points. Then for each n\ge 0, there are two points in f^n E that share the same room B_{i(n)}, say f^nx_{a(n)} and f^nx_{b(n)}. Pick a subsequence n_i such that a(n_i)\equiv a and b(n_i)\equiv b. Clearly x_a\neq x_b, and d(f^{n_i}x_a,f^{n_i}x_b)<\epsilon. Hence the pair (x_a,x_b) is \epsilon-proximal and \delta-indistinguishable. This contradicts the \mathbb{N}-expansiveness assumption on f. QED.
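The theorem concerns homeomorphisms only; non-invertible maps can be \mathbb{N}-expansive. A standard example is the doubling map x\mapsto 2x \bmod 1 on the circle, where the distance between two nearby points doubles at each step as long as it is below 1/4. The sketch below illustrates this with exact fractions; the function names and the cutoff `max_iter` are assumptions of this example.

```python
from fractions import Fraction

def circle_dist(x, y):
    """Distance on the circle R/Z for points represented in [0, 1)."""
    d = abs(x - y) % 1
    return min(d, 1 - d)

def doubling(x):
    return (2 * x) % 1

def separation_time(x, y, delta=Fraction(1, 4), max_iter=10_000):
    """Smallest n >= 0 with d(f^n x, f^n y) >= delta, or None if not found within max_iter steps."""
    for n in range(max_iter):
        if circle_dist(x, y) >= delta:
            return n
        x, y = doubling(x), doubling(y)
    return None

t = separation_time(Fraction(0), Fraction(1, 1024))
```

Two distinct points are always separated to distance at least 1/4 in finite time, so \delta=1/4 serves as an \mathbb{N}-expansive constant for this non-invertible map.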

Area under holomorphic maps

Let f be a map from (x,y)\in \mathbb{R}^2 to (a,b)\in \mathbb{R}^2. The area form dA=dx\wedge dy gives the Jacobian dA=da\wedge db= J(x,y)dx\wedge dy, where J(x,y)=a_xb_y- a_yb_x.

Now consider the complex setting, where \displaystyle dA=\frac{i}{2} dz\wedge d\bar z. Let f be a holomorphic map from z\in \mathbb{C} to w\in \mathbb{C}, so that dw=f'(z)dz. Then \displaystyle dA=\frac{i}{2} dw\wedge d\bar w= \frac{i}{2}f'(z)\overline{f'(z)} dz\wedge d\bar z. So this time the Jacobian J(z) becomes f'(z)\overline{f'(z)}=|f'(z)|^2.

Suppose \displaystyle f(z)=\sum_{n\ge 0} a_n z^n is a holomorphic map on the unit disk D. Then
\displaystyle J(z)=\sum_{n,m\ge 1}nm a_n \bar a_m z^{n-1} \bar z^{m-1}, and the area of f(D), counted with multiplicity, is \displaystyle \int_D J(z) dA.

Using polar coordinate, we have dA= rdr\, d\theta, \displaystyle z^{n-1} \bar z^{m-1}=r^{n+m-2}e^{i\theta(n-m)},
and \displaystyle \int_D r^{n+m-2}e^{i\theta(n-m)} rdr\, d\theta= 0 if n\neq m, and =\frac{\pi}{n} if n=m.

So \displaystyle |f(D)|=\sum_{n\ge 1} n^2 |a_n|^2\cdot \frac{\pi}{n}=\pi \sum_{n\ge 1} n |a_n|^2.
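The area formula can be compared with a direct numerical integration of |f'(z)|^2 over the disk. This is a rough sketch: the polynomial f(z)=z+z^2/2, the midpoint rule and the grid sizes are choices made for this illustration.

```python
import cmath
import math

def area_series(coeffs):
    """pi * sum_n n |a_n|^2 for f(z) = sum_n a_n z^n, with coeffs[n] = a_n."""
    return math.pi * sum(n * abs(a) ** 2 for n, a in enumerate(coeffs))

def area_integral(coeffs, nr=200, nt=200):
    """Midpoint-rule approximation of the integral of |f'(z)|^2 over the unit disk, in polar coordinates."""
    dr, dt = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            z = cmath.rect(r, (j + 0.5) * dt)
            df = sum(n * a * z ** (n - 1) for n, a in enumerate(coeffs) if n >= 1)
            total += abs(df) ** 2 * r * dr * dt
    return total

coeffs = [0.0, 1.0, 0.5]          # f(z) = z + z^2 / 2
series_value = area_series(coeffs)
numeric_value = area_integral(coeffs)
```

For this f the series gives \pi(1\cdot 1+2\cdot \tfrac14)=\tfrac{3\pi}{2}, and the numerical integral should agree up to the discretization error.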


An interesting lemma about the Birkhoff sum

A few days ago I attended a lecture given by Amie Wilkinson. She presented a proof of Furstenberg’s theorem on the Lyapunov exponents of random products of matrices in SL(2,\mathbb{R}).

Let \lambda be a probability measure on SL(2,\mathbb{R}), \mu=\lambda^{\mathbb{N}} be the product measure on \Omega=SL(2,\mathbb{R})^{\mathbb{N}}. Let \sigma be the shift map on \Omega, and A:\omega\in\Omega\mapsto \omega_0\in SL(2,\mathbb{R}) be the projection. We consider the induced skew product (\sigma,A) on \Omega\times \mathbb{R}^2, and write A_n(\omega)=A(\sigma^{n-1}\omega)\cdots A(\sigma\omega)A(\omega). The (largest) Lyapunov exponent of (\sigma,A) is defined to be the value \chi such that \displaystyle \lim_{n\to\infty}\frac{1}{n}\log\|A_n(\omega)\|=\chi for \mu-a.e. \omega\in \Omega.

To apply ergodic theory, we first assume \int\log\|A\| d\lambda < \infty. Then \chi=\chi(\lambda) is well defined. There are cases when \chi(\lambda)=0:

(1) the generated group \langle\text{supp}\lambda\rangle is compact;

(2) there exists a finite set \mathcal{L}=\{L_1,\dots, L_k\} of lines that is invariant for all A\in \langle\text{supp}\lambda\rangle.

Furstenberg proved that the above cover all cases with zero exponent:
\chi(\lambda) > 0 for all other \lambda.
