Week 5: The connection between statistical mechanics and ensemble forecasts I: Theory
In the following two weeks, we will go through some fundamental theory behind chaos and predictability. As we know, forecast uncertainty arises from two components: (1) imperfect model physics and (2) imperfect observations in the initial conditions. While most of the studies discussed from week 1 to week 4 focused on the intrinsic predictability limit (i.e., the upper limit of prediction), some efforts have been dedicated to the quantification/prediction of forecast error via statistical mechanics. In the context of predicting the forecast error, the Liouville equation arises as the general framework describing the time evolution of the ensemble density (i.e., the number of ensemble members in a finite region of phase space).
Introduction to the Liouville equation
The Liouville equation carries the name of the French mathematician Joseph Liouville. You might have heard his name in an applied math course when the Sturm–Liouville theory was introduced, and yes, these two “Liouville”s are the same person. The Liouville equation was first proposed by Joseph Liouville in 1838, about 100 years after Euler proposed the Euler equations of fluid dynamics (1752, 1757). While the two equations are highly similar, Liouville and Euler derived their formulas independently. The Liouville equation (LE hereafter) can be considered a more general form in Hamiltonian mechanics, which applies not only to fluid mechanics but to “any” dynamical system that is continuous in time and phase space.
Note
One should notice that we have made a very strong assumption here: the continuity of the dynamical system. Whether such an assumption holds for the Navier–Stokes equations is still, literally, a million-dollar question (see the Millennium Prize Problems posed by the Clay Mathematics Institute).
Jacobian matrix and determinant
Before diving right into the LE, we will start with something simple that will help us better understand the physical meaning behind it. The two major ingredients of the LE are (1) the Jacobian matrix and (2) the determinant.
Mathematically, the Jacobian matrix represents the mapping from one coordinate system to another through a simple linear transformation. For example, when we want to convert polar coordinates to Cartesian coordinates, we can use the following formula,

\[x = r\cos{\theta}, \qquad y = r\sin{\theta}\]
However, if the transformation is too complicated, we might want to find an easier way to approach the problem. One way to do that is through a simple linear transformation. So… how does it work? Let’s start from a nonlinear function \(y=f(x)\), which maps \(x\) to \(y\) or \(f(x)\). Suppose we focus only on a small range (say, centered at \(p\)) and assume that a linear function \(T(x)\) (i.e., \(T(x) = Ax+b\)) can well approximate the more complicated function \(f(x)\). This will give us

\[f(x) \approx T(x) = Ax + b,\]

or, choosing \(b\) so that \(T(p) = f(p)\),

\[T(x) = f(p) + A(x-p).\]
One way to test whether the linear assumption holds is to examine whether \(f(x)\) approaches \(T(x)\) as \(x\rightarrow p\), i.e.,

\[\lim_{x \to p} \frac{\lVert f(x) - T(x) \rVert}{\lVert x - p \rVert} = \lim_{x \to p} \frac{\lVert f(x) - f(p) - A(x-p) \rVert}{\lVert x - p \rVert} = 0\]
(33) can be read as “the difference between \(f(x)\) and \(T(x)\) is well approximated by the linear term \(A(x-p)\), so \(f\) is differentiable at point \(p\)”.
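As a quick sanity check (my addition, not from the original notes), the sketch below verifies this limit numerically for the toy choice \(f(x)=x^2\) at \(p=1\), where the best linear map is \(A=f'(p)=2\); the function and step sizes are illustrative assumptions.

```python
# Minimal sketch: the ratio |f(x) - T(x)| / |x - p| should vanish as x -> p
# when T(x) = f(p) + A*(x - p) with A = f'(p) is the correct linearization.
def f(x):
    return x ** 2           # toy nonlinear function (assumption)

p = 1.0
A = 2.0                     # f'(p) for f(x) = x**2
T = lambda x: f(p) + A * (x - p)

# Evaluate the ratio at shrinking distances h from p.
ratios = [abs(f(p + h) - T(p + h)) / h for h in (1e-1, 1e-2, 1e-3)]
print(ratios)               # shrinks roughly like h itself
```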
Now, we can extend the formula in (33) to higher dimensions. Let’s first assume that \(x\) has the vector form \(\mathbf{x}\) with \(n\) dimensions. In addition, we introduce the basis vector \(\mathbf{e}_j\), the \(j\)th member of an \(n\)-dimensional basis \([\mathbf{e}_1,\mathbf{e}_2,...,\mathbf{e}_j,...,\mathbf{e}_n]\), whose elements are all 0 except the \(j\)th, which equals 1, and we include a small (nearly 0) constant \(h\). If \(f\) is differentiable along any given dimension, then we can rewrite (33) as
and because \(A(h \mathbf{e}_j) = hA(\mathbf{e}_j)\), (34) leads to
The left-hand side of (35) is the partial derivative of \(f\) at point \(\mathbf{p}\). Therefore, the derivative along the \(j\)th dimension can be written as…
The full \(\mathbf{A}\) matrix can be derived by taking the partial derivative of every component of \(f\) along every dimension, i.e.,
Now, we can go back to our original question: converting polar coordinates to Cartesian coordinates. The \(\mathbf{A}\) matrix for this problem is

\[\mathbf{A} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \cos{\theta} & -r\sin{\theta} \\ \sin{\theta} & r\cos{\theta} \end{bmatrix}\]
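To make this concrete, here is a small numerical sketch (my addition, not from the notes): it compares a centered finite-difference Jacobian of the polar-to-Cartesian map against the analytic matrix. The helper names `fd_jacobian` and `jacobian_polar` are made up for illustration.

```python
import numpy as np

def jacobian_polar(r, t):
    """Analytic Jacobian of (x, y) = (r cos t, r sin t)."""
    return np.array([[np.cos(t), -r * np.sin(t)],
                     [np.sin(t),  r * np.cos(t)]])

def fd_jacobian(F, u, h=1e-6):
    """Centered finite-difference Jacobian of F: R^n -> R^m at point u."""
    u = np.asarray(u, dtype=float)
    cols = []
    for j in range(u.size):
        e = np.zeros_like(u)
        e[j] = h
        cols.append((F(u + e) - F(u - e)) / (2 * h))
    return np.column_stack(cols)

F = lambda u: np.array([u[0] * np.cos(u[1]), u[0] * np.sin(u[1])])
r, t = 2.0, 0.3
print(np.allclose(fd_jacobian(F, [r, t]), jacobian_polar(r, t), atol=1e-5))
```

Note that the determinant of this Jacobian is \(r\), the familiar area element \(r\,dr\,d\theta\) of polar coordinates, which previews the determinant discussion that follows.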
The Jacobian matrix is widely used in geoscience and geophysical fluid dynamics. One example is estimating the contribution of regional warming to the climate sensitivity, i.e., how much the Earth should warm when the CO2 concentration doubles. The role of the Jacobian matrix in climate sensitivity studies is that we can consider the final coordinates to be “the overall radiative forcing by greenhouse gases” and the initial coordinates (r.h.s.) to be the SST warming over different regions. The resulting Jacobian matrix then tells us the relative importance of warming over different regions. In climate sensitivity studies, it has an alternative name: the Green’s function. For some of the most up-to-date studies on climate sensitivity, check this workshop
Note
While the pattern effect (or the patch method) is a useful tool to identify the key regions responsible for global warming, its linear assumption is often questioned because the change in global temperature under CO2 forcing is not necessarily small. Thus, a growing community is turning its focus to the nonlinear effect on equilibrium climate sensitivity as well as its local maximum (i.e., when the global temperature is warm enough, the positive feedback will weaken, giving a bell-like curve).
The other key ingredient we need to learn is the “determinant”. The determinant evaluates how much the area spanned by the basis vectors changes after a linear transformation. For example, in week 1, we used the following example,
where the original basis vectors, \([1,0]\) and \([0,1]\) (the x-axis and y-axis, respectively), are moved to \([1,0]\) and \([0,2]\) after the transformation. We can easily see that the area originally spanned by \([1,0]\) and \([0,1]\) has doubled (from 1 \(\rightarrow\) 2). The change in area is exactly the determinant of the linear operator, i.e., \(\mathrm{det}{(\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix})} = 2\)
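A two-line check (my illustration, not from the notes) that the determinant of this matrix matches the area of the transformed unit square:

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [0.0, 2.0]])
# The unit square spanned by e1 = [1, 0] and e2 = [0, 1] has area 1.
# Its image is spanned by M @ e1 = [1, 0] and M @ e2 = [0, 2];
# the 2D cross product of the image vectors gives the new area.
v1, v2 = M @ np.array([1.0, 0.0]), M @ np.array([0.0, 1.0])
area_after = abs(v1[0] * v2[1] - v1[1] * v2[0])
print(area_after, np.linalg.det(M))   # both equal 2.0
```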
Now that both ingredients are ready, the only question left is: what is the connection among the Jacobian matrix, the determinant, and the LE? We can take a look at the example below. If we would like to convert a vector \(\begin{bmatrix} u \\ v \end{bmatrix}\) to a vector \(\begin{bmatrix} x \\ y \end{bmatrix}\), the corresponding linear transformation can be written as
If we use \(du\) and \(dv\) to represent the unit vectors in the original coordinates, then the unit vectors in the final coordinates will be…
The area spanned by \(dx\) and \(dy\) in the final coordinates can be written as
From (42), we can easily observe that the area spanned by the unit vectors has expanded (shrunk) by a factor of \(|\mathrm{det}\,J(u,v)|\).
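The sketch below (my illustration, reusing the polar map from earlier as the transformation) checks this area scaling numerically: a tiny \(du \times dv\) square in \((r,\theta)\) space is mapped to the plane, and the area of its image is compared with \(|\mathrm{det}\,J|\,du\,dv\).

```python
import numpy as np

F = lambda r, t: np.array([r * np.cos(t), r * np.sin(t)])
r0, t0, d = 2.0, 0.5, 1e-3                 # corner and side of a tiny square

# Map the four corners of the square [r0, r0+d] x [t0, t0+d] to (x, y).
corners = [F(r0, t0), F(r0 + d, t0), F(r0 + d, t0 + d), F(r0, t0 + d)]
xs = [c[0] for c in corners]
ys = [c[1] for c in corners]

# Shoelace formula: area of the (nearly parallelogram) image patch.
area = 0.5 * abs(sum(xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i]
                     for i in range(4)))

# |det J| = r0 for the polar map, so the image area should be ~ r0 * d * d.
print(area / (d * d))                      # close to r0 = 2.0
```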
Connections between the Liouville equation and ensemble forecasts
OK, now we know the mathematical (geometric) meaning of the Jacobian matrix and the determinant. To further understand their roles in the LE, let’s first take a look at what Joseph Liouville found in 1838. For a dynamical system with third-order differential terms
he subsequently assumed that the solution has the form
where \(a\), \(b\), and \(c\) are the initial state of this system. He further defined a Jacobian matrix that maps the initial state to the final state, i.e.,

The determinant of this Jacobian matrix can be written as
He then proved that
Note
(47) can be proved by using the chain rule, i.e., \(\frac{\partial x^{'''}}{\partial a}=\frac{\partial P}{\partial a} = \frac{\partial P}{\partial x}\frac{\partial x}{\partial a} + \frac{\partial P}{\partial x^{'}}\frac{\partial x^{'}}{\partial a}+\frac{\partial P}{\partial x^{''}}\frac{\partial x^{''}}{\partial a}\) (HW). Note that \(P\) depends on \(x\), \(x^{'}\), and \(x^{''}\), so the sum runs over these three arguments.
Let’s take a look at what (47) tells us. First, (44) describes how the dynamical system evolves with time. If \(x\) represents the forecast uncertainty or forecast error (i.e., the distance between any given ensemble member and the ensemble mean), then (44) is a “prognostic equation of model uncertainty”. Second, the determinant tells us how the volume spanned by the basis vectors changes after the linear transformation. Physically, the volume spanned by different ensemble members is the so-called ensemble spread. Thus, (47) is a “prognostic equation for ensemble spread” (i.e., it forecasts the entire PDF). In addition, it also tells us that the time tendency of the ensemble spread is related only to the derivative with respect to the highest-order argument of \(P\), i.e., \(\frac{\partial P}{\partial x^{''}}\).
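Liouville’s third-order example cannot be reproduced here without the original equations, but the same volume result is easy to check numerically for a linear system \(\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}\), where it reduces to the statement that a phase-space volume grows like \(e^{\mathrm{tr}(\mathbf{A})\,t}\). The sketch below (my addition, with an arbitrarily chosen \(\mathbf{A}\)) integrates two basis perturbations with RK4 and compares the determinant they span against the analytic factor.

```python
import numpy as np

# Arbitrary (assumed) linear dynamics dx/dt = A @ x.
A = np.array([[0.1, 1.0],
              [-1.0, -0.3]])

def rk4_step(x, dt):
    """One RK4 step of dx/dt = A @ x (columns of x evolve independently)."""
    k1 = A @ x
    k2 = A @ (x + 0.5 * dt * k1)
    k3 = A @ (x + 0.5 * dt * k2)
    k4 = A @ (x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

E = np.eye(2)                 # two perturbations spanning unit area at t = 0
dt, nsteps = 1e-3, 2000       # integrate out to t = 2
for _ in range(nsteps):
    E = rk4_step(E, dt)

t = dt * nsteps
print(np.linalg.det(E), np.exp(np.trace(A) * t))   # the two should match
```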
(47) also has some very interesting applications in information theory. If we rewrite (47) by dividing both sides by \(\alpha\), we have
We can further integrate both sides over the entire phase space. One can find that the integral of the right-hand side of (49) is always 0 (why?). Thus, (48) is also equivalent to
This also gives us \(\frac{\partial }{\partial t} \int\mathrm{ln}\,\alpha \,dx^{''} = 0\) (note that we swapped the order of integration and \(\frac{\partial }{\partial t}\), since both are linear operators). (49) is the so-called “conservation of information”. In information theory, the definition of “information” is very similar to the left-hand side of (49), where the information provided by the \(i\)th element is \(\mathrm{ln}\,\alpha(x_{i})\).
An alternative way to write (49) is in density form, where the density is defined as \(\rho=\frac{1}{\alpha}\) (the bigger the ensemble spread, the less likely we are to observe members in a fixed volume). Thus, a more widely used definition of information is \(-\mathrm{ln}\,\rho(x_{i})\).
Now, we can imagine how powerful (47) is, since it forecasts the entire probability density function. This makes the LE more powerful than the traditional probabilistic forecast method, which estimates the ensemble spread by generating tens to hundreds of ensemble simulations. However, there is no free lunch. Implementing the LE in a real operational forecast is extremely challenging, since we need to find the analytical solution of the dynamical system first (i.e., (44)) and then calculate its derivatives over all possible dimensions (i.e., the right-hand side of (47)). This is only possible when the dynamical system is simple enough.
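The contrast can be seen in a toy problem (my illustration; the decay rate and ensemble size are arbitrary). For \(\dot{x} = -\lambda x\), the LE answer is available in closed form: a Gaussian density stays Gaussian with spread \(\sigma(t) = \sigma_0 e^{-\lambda t}\). The brute-force route needs many sample trajectories to estimate the same spread:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma0, t = 0.5, 1.0, 2.0

# Brute force: evolve a large ensemble drawn from N(0, sigma0^2).
# Each member follows the exact solution x(t) = x0 * exp(-lam * t).
members = sigma0 * rng.standard_normal(100_000)
members = members * np.exp(-lam * t)

# LE route: one line, the analytic spread of the evolved density.
sigma_le = sigma0 * np.exp(-lam * t)

print(members.std(), sigma_le)   # sample spread converges to the LE answer
```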
Next week, we will walk through a few cases where we can analytically implement the LE to forecast the PDF.
Note
The Nobel Prize winner Roger Penrose (Penrose (1989)), who also served on Stephen Hawking’s Ph.D. committee, referred to the LE as a very beautiful formula, since it states that the volume of any region of phase space must remain the same. Penrose also proposed that a black hole can destroy information (i.e., \(\alpha\) in (47) is not conserved) and thus statistical mechanics might break down at a space-time singularity.
In addition to his achievements in theoretical physics, Penrose is also an artist, famous for the Penrose tiling and for impossible 3D figures such as the Penrose triangle. An award-winning mobile game, Monument Valley, was inspired by his art. NYC has a National Museum of Mathematics where you can find some of Penrose’s work.