Lithuanian Journal of Physics, Vol. **62**, No. 2, pp. 73–80 (2022)

© Lietuvos mokslų akademija, 2022

Received 20 January 2022; revised 20 March 2022; accepted 1 April 2022

This article suggests Liouville’s theorem as a pedagogical point of departure for the teaching of classical mechanics. The theorem is interpreted as the condition that expresses the conservation of information in classical mechanics. Hamilton’s equations and Hamilton’s principle of least action are derived from Liouville’s theorem.

**Keywords:** information, determinism, Liouville’s theorem, Hamilton’s equations, Hamilton’s principle

In this article, the theory of classical mechanics is approached from a different perspective. The purpose is entirely pedagogical. I have taught classical mechanics in the traditional way, starting with Newton’s laws of motion, and following up with Hamilton’s principle, the Euler–Lagrange equations of motion, Hamilton’s equations and Liouville’s theorem. The students have, in general, had problems seeing the relations between the standard mathematical representations of classical mechanics. In each class, there are typically a few students who ask whether there exists a foundational principle of classical mechanics that is independent of the specific mathematical representation being chosen. I have never been able to answer this question in a satisfactory manner. This article grew out of the desire to address this question.

Clearly, there is no reason to believe that there exists a unique principle. However, in this article a specific point of departure is identified and shown to lead to the traditional formulations. The suggested principle is that of the conservation of information. It is argued that Liouville’s theorem is the mathematical representation of this principle. Hamilton’s equations, Hamilton’s principle of least action and the invariance of the Poisson algebra are then understood as different manifestations of Liouville’s theorem.

Nothing in this article is new; everything is known from before. What then, one might ask, is the purpose and use of the article? The answer is threefold. First, and foremost, it suggests an alternative way to teach the subject. After many years of teaching, I find it clearly beneficial to have a diverse repertoire when it comes to presenting and explaining a topic. Personally, I take great pleasure in being able to explain the subject to my students in different ways. Secondly, it provides a different point of view on an old and well-known subject. Even though it might not be of any use in the near future, it is generally good to be aware of a multitude of equivalent perspectives on any given problem. Thirdly, to the best of my knowledge, Hamilton’s principle has never been derived from Liouville’s theorem.

In classical mechanics, it is a fundamental assumption that the evolution of a system is deterministic in both directions of time, i.e. both into the future and into the past. The deterministic evolution of a system means that it is possible, with absolute certainty, to say that any given state of the system evolved from a definite single state in the past and will evolve into a definite single state in the future. There cannot be any ambiguity in the evolutionary history of a system. Thus, the deterministic evolution implies that nowhere on the phase space can states converge or diverge (see Fig. 1).

Systems that appear to evolve non-deterministically give rise to the appearance of irreversible processes in nature. The reason for this is that if a system starts out in a given state, it is not necessarily the case that the system ends up at the same initial state by reversing the motion of the system in time. An example of a seemingly irreversible process is the sliding of a block of cheese along a table. Due to friction the block will always come to rest, apparently independently of the initial condition of the block. Thus, it appears as though the multitude of possible initial states for the block, given by the possibility of sending off the block with different initial speeds, all converge to the same final state where the block is at rest. Knowing the final state of the system does not help in predicting the initial state of the system. Therefore, the experiment with sending off the block of cheese seems to represent an evolution which is non-deterministic into the past.

The apparent violation of reversibility in physical processes does not originate in a fundamental property of the physical laws, but in the ignorance of the observer. The observer has not taken into account all the details of the system. Degrees of freedom of the system have been ignored. In the case of the sliding block of cheese, it is the individual motion of the atoms in the block and the table that has been ignored. Assuming that all degrees of freedom of the block and the table are followed in detail as the block slides on the table, it is clear that each unique initial state will give rise to a unique final state, where the distinction between the final states is given by the distinct final position and velocity of each atom in the block and the table.

A direct consequence of the assumption of deterministic evolution is that distinctions between physical states never disappear. If there is an initial distinction between the states, this distinction will survive throughout the entire motion of the system. That these distinctions between the states seem to disappear as time unfolds is merely a consequence of the difficulty for an observer to keep perfect track of the motion of all particles. In the case of the sliding block, for a human observer, the distinction between individual motions of atoms in the block and the table is too small to measure, and therefore it appears as though two distinct initial states, characterized by distinct initial speeds, which are easy to measure, converge to the same final state, i.e. that the block is at rest. In conclusion, the assumption of deterministic evolution can equivalently be stated as follows:

Due to the conservation of the distinction between physical states, any set of states which lie in the interior of some volume element on phase space will remain in the interior of this volume element as the system evolves in time.

If a system is followed in detail by an observer as it evolves in time, it means that the observer has perfect and complete knowledge about all the degrees of freedom of the system, i.e. the observer knows, with infinite precision, the exact positions and momenta of all particles within the system. In such an ideal scenario, the observer has no difficulty seeing the distinction between the states of the system. The amount of knowledge, or information, about the system possessed by the observer, at any instant of time, is complete. Since the ideal observer never loses track of the system, the distinction between states is never lost. In other words, the knowledge, or information, that the observer has about the system is not lost as the system evolves in time.

If, however, as is the case in practical reality, the observer has a limited ability to track the motion of individual particles, the observer does not possess complete information about the system. Even worse, the observer may, as is usually the case for complicated systems with many degrees of freedom, find it more and more difficult to track the system as time unfolds. In such a scenario, the amount of information about the system, possessed by the observer, decreases with time. In other words, from the perspective of the ignorant observer, information about the system is lost. However, it is important to emphasize that this apparent loss of information is entirely due to the ignorance of the observer. If all the degrees of freedom were tracked with an infinite precision, information would never be lost. In the case of the sliding block of cheese, the observer has lost information because the system was known to exist in one of two distinct initial states, obtained by measuring the initial speed of the block, whereas it is not possible to distinguish between the two final states.

In conclusion, the loss of the distinction between states implies that information has been lost. Thus, the conservation of distinction between the states can equivalently be stated as an assumption of information conservation:

In other words, the assumption that classical systems evolve deterministically, i.e. that the state of the system is perfectly predictable by an observer both into the future and back to the past, is equivalent to the statement that an observer of the system possesses complete information about the system and, assuming that the system is closed, this amount of information is never lost.

Consider the arbitrary region Ω on the 2-dimensional phase space, with the volume $V_{\Omega}$ and the volume element ∆*q*∆*p*. The mathematical condition imposing information conservation is

$$\frac{\Delta N}{\Delta t}=0, \qquad (1)$$

where *N* is the number of states within the phase space volume Ω. The condition states that *N* can neither increase nor decrease within the time interval ∆*t*. For this condition to be satisfied, it is necessary that the incoming and outgoing flows of states through Ω within ∆*t* cancel, i.e. that

$$\Delta\left(\rho(q,p)\,\dot{q}\right)+\Delta\left(\rho(q,p)\,\dot{p}\right)=0, \qquad (2)$$

where *ρ*(*q*, *p*) is the density of states on the phase space, and the flow differences are defined by, respectively,

$$\Delta\left(\rho(q,p)\,\dot{q}\right)\equiv\left\{\rho(q_{\text{out}},p)\,\dot{q}_{\text{out}}-\rho(q_{\text{in}},p)\,\dot{q}_{\text{in}}\right\}\Delta p \qquad (3)$$

and

$$\Delta\left(\rho(q,p)\,\dot{p}\right)\equiv\left\{\rho(q,p_{\text{out}})\,\dot{p}_{\text{out}}-\rho(q,p_{\text{in}})\,\dot{p}_{\text{in}}\right\}\Delta q. \qquad (4)$$

In differential form the condition, after having been extended to be valid for an arbitrary length of time, reads in vector notation as

$$\frac{\partial\rho}{\partial t}+\nabla\cdot\left(\rho\,\mathbf{v}\right)=0, \qquad (5)$$

where

$$\nabla\equiv\left(\frac{\partial}{\partial q},\frac{\partial}{\partial p}\right) \qquad (6)$$

is the differential operator on the phase space, and

$$\mathbf{v}\equiv\left(\dot{q},\dot{p}\right) \qquad (7)$$

is the velocity by which states flow on the phase space. Equation (5) is Liouville’s continuity equation [1] for the density of states on the phase space. It says that the number of states is locally conserved. The term $\nabla\cdot(\rho\,\mathbf{v})$ represents the net flow of states through Ω, i.e. the difference between the outflow and inflow of states. The continuity equation can be rewritten as

$$\frac{\text{d}\rho}{\text{d}t}+\rho\,\nabla\cdot\mathbf{v}=0, \qquad (8)$$

by using the total time derivative of the density of states, $\text{d}\rho/\text{d}t=\partial\rho/\partial t+\nabla\rho\cdot\mathbf{v}$, and the product rule applied to the net flow of states, $\nabla\cdot(\rho\,\mathbf{v})=\nabla\rho\cdot\mathbf{v}+\rho\,\nabla\cdot\mathbf{v}$. Thus, if the divergence of the phase flow velocity vanishes, i.e. if

$$\nabla\cdot\mathbf{v}=0, \qquad (9)$$

then, by the continuity equation, the density of states on the phase space is constant in time along the flow on the phase space, i.e.

$$\frac{\text{d}\rho}{\text{d}t}=0. \qquad (10)$$

In such a situation, the flow of the system on the phase space is incompressible: the density of states at any given location (*q*, *p*) on the phase space, within an arbitrary region Ω, does not change over time, which ensures that the states do not lump together. In conclusion, a necessary and sufficient condition for the flow of the system on the phase space to conserve information is that the divergence of the phase flow velocity vanishes. This is Liouville’s theorem [1].^{1}
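The condition of vanishing divergence can be checked numerically. The following is a minimal sketch, not part of the original derivation: it estimates the divergence of the phase flow velocity by central differences for a frictionless oscillator and for an oscillator with a friction term. The constants `M`, `K` and `GAMMA` and the two velocity fields are illustrative assumptions.

```python
# Hypothetical parameters chosen only for illustration.
M, K, GAMMA = 1.0, 4.0, 0.3  # mass, spring constant, friction rate


def divergence(velocity, q, p, h=1e-5):
    """Central-difference estimate of d(q_dot)/dq + d(p_dot)/dp at (q, p)."""
    dqdot_dq = (velocity(q + h, p)[0] - velocity(q - h, p)[0]) / (2 * h)
    dpdot_dp = (velocity(q, p + h)[1] - velocity(q, p - h)[1]) / (2 * h)
    return dqdot_dq + dpdot_dp


def oscillator(q, p):
    """Flow velocity (q_dot, p_dot) of a frictionless oscillator."""
    return (p / M, -K * q)


def damped_oscillator(q, p):
    """Same oscillator with an assumed friction force -GAMMA * p."""
    return (p / M, -K * q - GAMMA * p)


div_free = divergence(oscillator, 0.7, -1.2)
div_damped = divergence(damped_oscillator, 0.7, -1.2)
print("frictionless:", div_free)
print("with friction:", div_damped)
```

Under these assumptions the frictionless flow has zero divergence, while the friction term gives a divergence of `-GAMMA`, i.e. states lump together, as in the example of the sliding block when the atomic degrees of freedom are ignored.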

The 2-dimensional Liouville’s theorem generalizes straightforwardly to the 6*N*-dimensional phase space. Each conjugate pair ($q_j$, $p_j$) satisfies a continuity equation of the form

$$\frac{\text{d}\rho_j}{\text{d}t}+\rho_j\,\nabla\cdot\mathbf{v}_j=0, \quad j\in\left[1,3N\right], \qquad (11)$$

where $\rho_j$ and $\mathbf{v}_j\equiv(\dot{q}_j,\dot{p}_j)$ denote the density of states and the flow velocity on the subspace spanned by ($q_j$, $p_j$). Information is conserved if

$$\nabla\cdot\mathbf{v}_j=0 \quad \forall j\in\left[1,3N\right]. \qquad (12)$$

The vanishing divergence of the flow velocity $\mathbf{v}_j$ for all conjugate pairs ($q_j$, $p_j$) reads, in component form,

$$\frac{\partial\dot{q}_j}{\partial q_j}+\frac{\partial\dot{p}_j}{\partial p_j}=0 \quad \forall j\in\left[1,3N\right]. \qquad (13)$$

Let $\mathcal{H}$ be a smooth function on the 6*N*-dimensional phase space with the property that it contains no terms that mix different conjugate pairs, e.g. $p_i\cdot q_j$ with $i\neq j$. Equation (13) is then satisfied if the flow velocity is given by Hamilton’s equations

$$\dot{q}_j=\frac{\partial\mathcal{H}}{\partial p_j} \quad \forall j\in\left[1,3N\right], \qquad (14)$$

$$\dot{p}_j=-\frac{\partial\mathcal{H}}{\partial q_j} \quad \forall j\in\left[1,3N\right]. \qquad (15)$$
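As a quick check, substituting Eqs. (14) and (15) into Eq. (13) gives, for a smooth $\mathcal{H}$ whose mixed second derivatives commute,

$$\frac{\partial\dot{q}_j}{\partial q_j}+\frac{\partial\dot{p}_j}{\partial p_j}=\frac{\partial^{2}\mathcal{H}}{\partial q_j\,\partial p_j}-\frac{\partial^{2}\mathcal{H}}{\partial p_j\,\partial q_j}=0,$$

so the divergence of the flow velocity vanishes identically for each conjugate pair.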

Under these circumstances, Hamilton’s equations are, according to the Liouville–Arnold theorem [4, 5], integrable from a set of known initial conditions. This simply means that they describe an evolution of the system which is unique and deterministic. Thus, given the function $\mathcal{H}$, the flow of the system in time is determined by how $\mathcal{H}$ changes on the phase space. In this sense, $\mathcal{H}$ is said to be the generator of the motion in time of the system on the phase space. The flow of the system on the phase space, described by Hamilton’s equations, is referred to as a Hamiltonian flow.
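That a Hamiltonian flow carries a region of states without changing its area can be illustrated numerically. The sketch below is an assumption-laden example rather than the method of this article: it takes $\mathcal{H}=p^{2}/2+q^{2}/2$, integrates Hamilton’s equations with the leapfrog scheme (a standard area-preserving integrator), and compares the area of a small triangle of initial states before and after the evolution. All function names are hypothetical.

```python
def leapfrog_step(q, p, dt, force, mass=1.0):
    """One leapfrog (kick-drift-kick) step for q_dot = p/mass, p_dot = force(q)."""
    p_half = p + 0.5 * dt * force(q)
    q_new = q + dt * p_half / mass
    p_new = p_half + 0.5 * dt * force(q_new)
    return q_new, p_new


def evolve(state, steps, dt, force):
    """Evolve a single phase-space point (q, p) for a number of steps."""
    q, p = state
    for _ in range(steps):
        q, p = leapfrog_step(q, p, dt, force)
    return q, p


def triangle_area(a, b, c):
    """Area of the triangle with corners a, b, c in the (q, p) plane."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def spring_force(q):
    """Force -dH/dq for the assumed Hamiltonian H = p^2/2 + q^2/2."""
    return -q


# Three nearby initial states spanning a small region of phase space.
corners = [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1)]
area_before = triangle_area(*corners)
evolved = [evolve(c, 1000, 0.01, spring_force) for c in corners]
area_after = triangle_area(*evolved)
print(area_before, area_after)
```

Because the assumed flow is linear, the triangle is mapped to a triangle and the two areas agree up to rounding errors; for a nonlinear force the comparison would hold only approximately for a sufficiently small triangle.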

Equation (14), for a specific conjugate pair ($q_j$, $p_j$), can be written in integral form as

$$\mathcal{H}\left(p_j\right)=\int\text{d}p_j\,\dot{q}_j\left(p_j\right). \qquad (16)$$

The momentum $p_j$ and speed $\dot{q}_j$ are assumed to be in one-to-one correspondence. This means that for each value of $\dot{q}_j$ there is a unique value for $p_j$, and vice versa. The function $\mathcal{H}(p_j)$ is then the unique area under the curve $\dot{q}_j(p_j)$.

Due to the one-to-one correspondence between $p_j$ and $\dot{q}_j$ it is possible to define a related area, $\mathcal{L}(\dot{q}_j)$, given by the unique area under the curve $p_j(\dot{q}_j)$, i.e.

$$\mathcal{L}\left(\dot{q}_j\right)=\int\text{d}\dot{q}_j\,p_j\left(\dot{q}_j\right). \qquad (17)$$

This integral equation corresponds to the differential equation

$$\frac{\text{d}\mathcal{L}\left(\dot{q}_j\right)}{\text{d}\dot{q}_j}=p_j. \qquad (18)$$

The total area of the rectangle bounded by (0, $p_j$) and (0, $\dot{q}_j$) is the sum of the two areas, i.e.

$$\mathcal{L}\left(\dot{q}_j\right)+\mathcal{H}\left(p_j\right)=p_j\cdot\dot{q}_j. \qquad (19)$$
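The area relation (19) can be checked numerically for a relation between momentum and speed that is not simply linear. The sketch below assumes, purely as an illustration, the relativistic relation $p=m\dot{q}/\sqrt{1-\dot{q}^{2}/c^{2}}$ in units where $m=c=1$, and evaluates the two areas (16) and (17) with the trapezoidal rule. The function names are hypothetical.

```python
import math

C = 1.0  # speed of light in the assumed units
M = 1.0  # mass in the assumed units


def momentum(v):
    """Assumed one-to-one relation p(q_dot): relativistic momentum."""
    return M * v / math.sqrt(1.0 - (v / C) ** 2)


def speed(p):
    """Inverse relation q_dot(p) of the function above."""
    return p / math.sqrt(M ** 2 + (p / C) ** 2)


def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


v = 0.6
p = momentum(v)
lagrangian_area = integrate(momentum, 0.0, v)  # area under p(q_dot), Eq. (17)
hamiltonian_area = integrate(speed, 0.0, p)    # area under q_dot(p), Eq. (16)
print(lagrangian_area + hamiltonian_area, p * v)
```

The two areas differ from each other here, but their sum still fills the rectangle $p_j\cdot\dot{q}_j$, which is the content of Eq. (19).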

It is possible to include a dependence on the generalized coordinate $q_j$ under the constraint that any $q_j$-dependent terms in the functions $\mathcal{H}$ and $\mathcal{L}$ cancel, such that the total area is $q_j$-independent. Thus, in general, the functions $\mathcal{H}$ and $\mathcal{L}$, referred to as the Hamiltonian and Lagrangian, respectively, satisfy the so-called Legendre transformation, i.e.

$$\mathcal{L}\left(q_j,\dot{q}_j\right)+\mathcal{H}\left(q_j,p_j\right)=p_j\cdot\dot{q}_j, \qquad (20)$$

where

$$\mathcal{L}\left(q_j,\dot{q}_j\right)=\int_{0}^{\dot{q}_j}\text{d}\dot{q}_j\,p_j\left(\dot{q}_j\right)-U\left(q_j\right), \qquad (21)$$

$$\mathcal{H}\left(q_j,p_j\right)=\int_{0}^{p_j}\text{d}p_j\,\dot{q}_j\left(p_j\right)+U\left(q_j\right). \qquad (22)$$

The requirement that the total area is $q_j$-independent causes the Hamiltonian and Lagrangian to have a relative sign difference for the function $U(q_j)$.

For the 6*N*-dimensional phase space, the Hamiltonian and Lagrangian are defined by

$$\mathcal{L}\left(q,\dot{q}\right)=\sum_{j=1}^{3N}\int_{0}^{\dot{q}_j}\text{d}\dot{q}_j\,p_j\left(\dot{q}_j\right)-U\left(q\right), \qquad (23)$$

$$\mathcal{H}\left(q,p\right)=\sum_{j=1}^{3N}\int_{0}^{p_j}\text{d}p_j\,\dot{q}_j\left(p_j\right)+U\left(q\right), \qquad (24)$$

where the function *U*(*q*), defined by

$$U\left(q\right)\equiv\sum_{j=1}^{3N}U\left(q_j\right), \qquad (25)$$

is referred to as the potential energy of the system.

The pair of Hamilton’s equations

$$-\frac{\partial\mathcal{H}}{\partial q_j}-\dot{p}_j=0, \qquad (26)$$

$$\dot{q}_j-\frac{\partial\mathcal{H}}{\partial p_j}=0 \qquad (27)$$

is the local differential representation of the principle of information conservation on the phase space. A global, or integral, representation can be obtained by considering the entire evolutionary path from some initial time $t_{\text{i}}$ to some final time $t_{\text{f}}$, where Hamilton’s equations are integrated over time.^{2} For this purpose, multiply Hamilton’s equations by two independent arbitrary functions of time, $\delta q_j(t)$ and $\delta p_j(t)$, respectively, i.e.

$$\left(-\frac{\partial\mathcal{H}}{\partial q_j}-\dot{p}_j\right)\delta q_j\left(t\right)=0, \qquad (28)$$

$$\left(\dot{q}_j-\frac{\partial\mathcal{H}}{\partial p_j}\right)\delta p_j\left(t\right)=0. \qquad (29)$$

The displacements $\delta q_j(t)$ and $\delta p_j(t)$ represent variations of the path of the system on the phase space, i.e.

$$q_j\left(t\right)\to q_j\left(t\right)+\delta q_j\left(t\right), \qquad (30)$$

$$p_j\left(t\right)\to p_j\left(t\right)+\delta p_j\left(t\right). \qquad (31)$$

Equations (28) and (29) are equivalent to Hamilton’s equations since they hold for arbitrary variations. The fact that it is necessary to introduce two displacement functions is due to the independence of the state parameters $q_j$ and $p_j$. The boundary conditions are given by

$$\delta q_j\left(t_{\text{i}}\right)=\delta q_j\left(t_{\text{f}}\right)=0, \qquad (32)$$

$$\delta p_j\left(t_{\text{i}}\right)=\delta p_j\left(t_{\text{f}}\right)=0, \qquad (33)$$

i.e. the variations vanish at the initial and final times. Integrating Hamilton’s equations over time from $t_{\text{i}}$ to $t_{\text{f}}$ gives, to the leading order in the variations,

$$\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\left[\left(-\frac{\partial\mathcal{H}}{\partial q_j}-\dot{p}_j\right)\delta q_j\left(t\right)+\left(\dot{q}_j-\frac{\partial\mathcal{H}}{\partial p_j}\right)\delta p_j\left(t\right)\right]=0. \qquad (34)$$

Integration by parts and recalling the boundary conditions give

$$\delta\mathcal{A}\left(q_j,\dot{q}_j\right)=0, \qquad (35)$$

where

$$\mathcal{A}\left(q_j,\dot{q}_j\right)\equiv\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\,\mathcal{L}\left(q_j,\dot{q}_j\right) \qquad (36)$$

is the action of the system within the subset ($q_j$, $p_j$) of the phase space. Summing over all conjugate pairs, the same stationarity condition holds for the total action

$$\begin{array}{rl}\mathcal{A}\left(q,\dot{q}\right)&\equiv\displaystyle\sum_{j=1}^{3N}\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\,\mathcal{L}\left(q_j,\dot{q}_j\right)\\ &=\displaystyle\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\,\mathcal{L}\left(q,\dot{q}\right).\end{array} \qquad (37)$$

This is Hamilton’s formulation of the principle of stationary action, or briefly, Hamilton’s principle. It is a global representation of information conservation, i.e. a statement on the entire evolutionary path which must be satisfied if the system is to adhere to the principle of information conservation.
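The integration by parts that leads from Eq. (34) to Eq. (35) can be made explicit. As a sketch, assuming that the variation commutes with the time derivative, $\delta\dot{q}_j=\text{d}(\delta q_j)/\text{d}t$, the variation of the time integral of $p_j\dot{q}_j-\mathcal{H}$ is

$$\begin{array}{rl}\delta\displaystyle\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\left(p_j\dot{q}_j-\mathcal{H}\right)&=\displaystyle\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\left[\dot{q}_j\,\delta p_j+p_j\,\delta\dot{q}_j-\frac{\partial\mathcal{H}}{\partial q_j}\delta q_j-\frac{\partial\mathcal{H}}{\partial p_j}\delta p_j\right]\\ &=\displaystyle\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\left[\left(-\frac{\partial\mathcal{H}}{\partial q_j}-\dot{p}_j\right)\delta q_j+\left(\dot{q}_j-\frac{\partial\mathcal{H}}{\partial p_j}\right)\delta p_j\right]+\Big[p_j\,\delta q_j\Big]_{t_{\text{i}}}^{t_{\text{f}}}.\end{array}$$

The boundary term vanishes by Eq. (32), the remaining integral is the left-hand side of Eq. (34), and by the Legendre transformation (20) the integrand $p_j\dot{q}_j-\mathcal{H}$ equals $\mathcal{L}(q_j,\dot{q}_j)$, which reproduces Eq. (35).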

Since Hamilton’s principle can be derived from Hamilton’s equations, which in turn are an immediate consequence of the requirement that the divergence of the Hamiltonian flow velocity vanishes, it should be possible to obtain Hamilton’s principle directly from the requirement that $\nabla\cdot\mathbf{v}_j=0$ is invariant under the displacements (30) and (31). To leading order in the displacements, the flow velocity changes by

$$\begin{array}{rl}\delta\mathbf{v}_j&=\mathbf{v}_j\left(q_j+\delta q_j,p_j+\delta p_j\right)-\mathbf{v}_j\left(q_j,p_j\right)\\ &=\delta q_j\dfrac{\partial}{\partial q_j}\mathbf{v}_j+\delta p_j\dfrac{\partial}{\partial p_j}\mathbf{v}_j.\end{array} \qquad (38)$$

The divergence of the flow velocity transforms as

$$\nabla\cdot\mathbf{v}_j\to\nabla\cdot\left(\mathbf{v}_j+\delta\mathbf{v}_j\right)=\nabla\cdot\mathbf{v}_j+\nabla\cdot\delta\mathbf{v}_j. \qquad (39)$$

If $\nabla\cdot\delta\mathbf{v}_j\neq 0$, information is not conserved for the deviated path. Therefore, it is required that

$$\nabla\cdot\delta\mathbf{v}_j=0, \qquad (40)$$

which is equivalent to

$$\delta\left(\nabla\cdot\mathbf{v}_j\right)=0. \qquad (41)$$

This statement is for a blob of volume d*V* which encloses the single state ($q_j$, $p_j$). Integrating over a finite volume *V* and over time from $t_{\text{i}}$ to $t_{\text{f}}$ gives

$$\delta\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\int_{V}\text{d}V\,\nabla\cdot\mathbf{v}_j=0. \qquad (42)$$

Applying the divergence theorem

$$\int_{V}\text{d}V\,\nabla\cdot\mathbf{v}_j=\int_{\partial V}\mathbf{dS}\cdot\mathbf{v}_j \qquad (43)$$

gives

$$\delta\int_{t_{\text{i}}}^{t_{\text{f}}}\text{d}t\int_{\partial V}\mathbf{dS}\cdot\mathbf{v}_j=0. \qquad (44)$$

The integrand **dS** · $\mathbf{v}_j$ represents the density of the net Hamiltonian flow out of the tube. The surface area element is

$$\mathbf{dS}=\text{d}S\,\mathbf{n}, \qquad (45)$$

where $\mathbf{n}=(p_j,q_j)$ is the vector normal to the surface. The integrand then contains the scalar product

$$\left(p_j,q_j\right)\cdot\left(\dot{q}_j,\dot{p}_j\right)=p_j\dot{q}_j+q_j\dot{p}_j. \qquad (46)$$

Using that $q_{j} = \int \text{d}q_{j}$ and Hamilton’s equation $\dot{p}_{j} = -\frac{\partial \mathcal{H}}{\partial q_{j}}$, the integrand can be written as

$$p_{j}\dot{q}_{j} - \int \text{d}q_{j}\, \frac{\partial \mathcal{H}}{\partial q_{j}} = p_{j}\dot{q}_{j} - \int \text{d}\mathcal{H} = p_{j}\dot{q}_{j} - \mathcal{H}. \qquad (47)$$

Equivalently, the integrand could have been written as

$$q_{j}\dot{p}_{j} + \mathcal{H}, \qquad (48)$$

by using that $p_{j} = \int \text{d}p_{j}$ and Hamilton’s equation $\dot{q}_{j} = \frac{\partial \mathcal{H}}{\partial p_{j}}$. Identifying the integrand in Eq. (47) with the Lagrangian $\mathcal{L} = p_{j}\dot{q}_{j} - \mathcal{H}$, Eq. (44) becomes

$$\delta \int_{t_{\text{i}}}^{t_{\text{f}}} \text{d}t \int \text{d}S\, \mathcal{L} = 0. \qquad (49)$$

The equality must hold independently of the surface area of the tube, i.e. the principle of information conservation should hold independently of the number of states in which the system can exist. Therefore, the integration over the surface area can be taken outside the infinitesimal variation, giving

$$\delta \int_{t_{\text{i}}}^{t_{\text{f}}} \text{d}t\, \mathcal{L} = 0, \qquad (50)$$

which is, again, Hamilton’s principle. Thus, Hamilton’s principle can be derived directly from Liouville’s theorem.
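As a numerical illustration of Eq. (50), the following minimal Python sketch checks that the first variation of a discretised action vanishes on a solution of the equations of motion, but not on an arbitrary path with the same end points. The harmonic oscillator with unit mass and frequency, its Lagrangian $\mathcal{L} = \dot{q}^{2}/2 - q^{2}/2$ (obtained from $p\dot{q} - \mathcal{H}$ with $\dot{q} = p$), the midpoint discretisation, the test variation $\eta(t) = \sin(\pi t/T)$ and all function names are assumptions made for this example only.

```python
import math


def action(path, dt):
    """Midpoint-rule action for L = q_dot**2 / 2 - q**2 / 2 (assumed m = omega = 1)."""
    total = 0.0
    for q0, q1 in zip(path, path[1:]):
        q_dot = (q1 - q0) / dt          # finite-difference velocity on the segment
        q_mid = 0.5 * (q0 + q1)         # position at the segment midpoint
        total += (0.5 * q_dot ** 2 - 0.5 * q_mid ** 2) * dt
    return total


def first_variation(path, eta, dt, eps=1e-4):
    """Central-difference estimate of dS/d(eps) for the varied path q + eps * eta."""
    plus = [q + eps * e for q, e in zip(path, eta)]
    minus = [q - eps * e for q, e in zip(path, eta)]
    return (action(plus, dt) - action(minus, dt)) / (2.0 * eps)


N = 2000                                 # number of time steps (illustrative choice)
T = 1.0                                  # final time; initial time is 0
dt = T / N
times = [k * dt for k in range(N + 1)]

# Variation that vanishes at both end points, as Hamilton's principle requires.
eta = [math.sin(math.pi * t / T) for t in times]

# q(t) = sin(t) solves q_ddot = -q; the straight line shares its end points.
true_path = [math.sin(t) for t in times]
line_path = [math.sin(T) * t / T for t in times]

dS_true = first_variation(true_path, eta, dt)
dS_line = first_variation(line_path, eta, dt)

print("first variation on the true path:    ", dS_true)
print("first variation on the straight line:", dS_line)
```

On the true path the first variation is zero up to discretisation and rounding error, whereas on the straight line it is of order one.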

Given that the divergence of the Hamiltonian flow velocity vanishes, the Liouville equation can be written as

$$\frac{\partial \rho}{\partial t} + \nabla \rho \cdot \mathbf{v} = 0. \qquad (51)$$
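The vanishing divergence used here can be checked numerically. The sketch below, in Python, evaluates $\partial \dot{q}/\partial q$ and $\partial \dot{p}/\partial p$ by central finite differences for a flow generated by Hamilton’s equations. The particular Hamiltonian, the sample point, the step sizes and the function names are hypothetical choices made for illustration; the Hamiltonian is deliberately non-separable so that neither term vanishes on its own.

```python
def hamiltonian(q, p):
    """Hypothetical non-separable Hamiltonian used only for this check."""
    return 0.5 * p ** 2 * (1.0 + q ** 2) + 0.25 * q ** 4


def flow_velocity(q, p, h=1e-5):
    """Hamiltonian flow velocity (q_dot, p_dot) = (dH/dp, -dH/dq)."""
    dH_dq = (hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2.0 * h)
    dH_dp = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2.0 * h)
    return dH_dp, -dH_dq


def divergence_terms(q, p, h=1e-3):
    """Return the two terms d(q_dot)/dq and d(p_dot)/dp of the divergence."""
    dqdot_dq = (flow_velocity(q + h, p)[0] - flow_velocity(q - h, p)[0]) / (2.0 * h)
    dpdot_dp = (flow_velocity(q, p + h)[1] - flow_velocity(q, p - h)[1]) / (2.0 * h)
    return dqdot_dq, dpdot_dp


term_q, term_p = divergence_terms(0.7, -1.3)
divergence = term_q + term_p

print("d(q_dot)/dq =", term_q)
print("d(p_dot)/dp =", term_p)
print("divergence  =", divergence)
```

The two terms are individually non-zero and cancel, because both are mixed second derivatives of the same function $\mathcal{H}$.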

The Poisson bracket {*ρ*, $\mathcal{H}$} between the density of states *ρ* and the Hamiltonian $\mathcal{H}$ is defined by

$$\left\{\rho, \mathcal{H}\right\} \equiv \nabla \rho \cdot \mathbf{v} = \frac{\partial \rho}{\partial q}\frac{\partial \mathcal{H}}{\partial p} - \frac{\partial \rho}{\partial p}\frac{\partial \mathcal{H}}{\partial q}. \qquad (52)$$

In general, the Poisson bracket {*A*, *B*} between any two functions *A* and *B* on the phase space is defined by

$$\left\{A, B\right\} \equiv \frac{\partial A}{\partial q}\frac{\partial B}{\partial p} - \frac{\partial A}{\partial p}\frac{\partial B}{\partial q}. \qquad (53)$$

In this notation, Hamilton’s equations are written as

$$\dot{q} = \left\{q, \mathcal{H}\right\}, \qquad (54)$$

$$\dot{p} = \left\{p, \mathcal{H}\right\}. \qquad (55)$$
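As a brief worked example, Eqs. (53)–(55) can be evaluated for the one-dimensional harmonic oscillator. The Hamiltonian below, with mass $m$ and angular frequency $\omega$, is an assumption introduced only for illustration:

$$\mathcal{H} = \frac{p^{2}}{2m} + \frac{1}{2}m\omega^{2}q^{2}, \qquad \dot{q} = \left\{q, \mathcal{H}\right\} = \frac{\partial \mathcal{H}}{\partial p} = \frac{p}{m}, \qquad \dot{p} = \left\{p, \mathcal{H}\right\} = -\frac{\partial \mathcal{H}}{\partial q} = -m\omega^{2}q.$$

The brackets reproduce the familiar equations of motion, since $\{q, f\} = \partial f/\partial p$ and $\{p, f\} = -\partial f/\partial q$ for any function $f$ on the phase space.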

The Poisson bracket satisfies a set of algebraic properties. It is antisymmetric, i.e.

$$\left\{A, B\right\} = -\left\{B, A\right\}. \qquad (56)$$

It satisfies linearity, i.e.

$$\left\{aA + bB, C\right\} = a\left\{A, C\right\} + b\left\{B, C\right\}. \qquad (57)$$

Furthermore, it satisfies the product rule and the Jacobi identity, i.e.

$$\left\{AB, C\right\} = A\left\{B, C\right\} + \left\{A, C\right\}B, \qquad (58)$$

$$\left\{A, \left\{B, C\right\}\right\} + \left\{B, \left\{C, A\right\}\right\} + \left\{C, \left\{A, B\right\}\right\} = 0. \qquad (59)$$
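As a short consistency check of Eq. (59), consider the choice $A = q$, $B = p$, $C = \mathcal{H}$, made here purely for illustration. With $\{q, f\} = \partial f/\partial p$, $\{p, f\} = -\partial f/\partial q$ and $\{q, p\} = 1$ from Eq. (53), the three terms become

$$\left\{q, \left\{p, \mathcal{H}\right\}\right\} + \left\{p, \left\{\mathcal{H}, q\right\}\right\} + \left\{\mathcal{H}, \left\{q, p\right\}\right\} = -\frac{\partial^{2} \mathcal{H}}{\partial p\, \partial q} + \frac{\partial^{2} \mathcal{H}}{\partial q\, \partial p} + 0 = 0.$$

For this choice the identity reduces to the equality of the mixed second derivatives of $\mathcal{H}$, which is the same condition that makes the divergence $\partial \dot{q}/\partial q + \partial \dot{p}/\partial p$ of the Hamiltonian flow vanish.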

These properties define the Poisson algebra of classical mechanics. Since Liouville’s equation for the incompressible Hamiltonian flow can be expressed in terms of the Poisson bracket, Liouville’s theorem can equivalently be stated as follows: the time evolution of a system conserves information if it leaves the Poisson algebra invariant.

Liouville’s theorem is interpreted as the mathematical condition representing the physical conservation of information in classical mechanics. Hamilton’s equations, Hamilton’s principle and the invariance of the Poisson algebra are distinct, but equivalent, manifestations of the theorem.

[1] F. Bloch, *Fundamentals of Statistical Mechanics*, Manuscript and Notes of Felix Bloch, 3rd ed. (Imperial College Press and World Scientific Publishing, London, 2000).

[2] J.W. Gibbs, *Elementary Principles in Statistical Mechanics* (Charles Scribner’s Sons, New York, 1902).

[3] J. Liouville, Note sur la Théorie de la Variation des constantes arbitraires, J. Math. Pures Appl. **3**(1), 342–349 (1838).

[4] J. Liouville, Note sur l’intégration des équations différentielles de la Dynamique, J. Math. Pures Appl. **20**(1), 137–138 (1855).

[5] V.I. Arnold, *Mathematical Methods of Classical Mechanics*, 2nd ed. (Springer-Verlag, New York, 1989).

[6] H. Jeffreys and B.S. Jeffreys, *Methods of Mathematical Physics*, 3rd ed. (Cambridge University Press, 1956).

^{1} To the best of the author’s knowledge, the physical formulation and relevance of Liouville’s theorem were first stated by J.W. Gibbs in 1902 [2]. There it was referred to as the ‘Principle of Conservation of Density-in-phase’ or, equivalently, as the ‘Principle of Conservation of Extension-in-phase’. However, the mathematical background for the theorem dates back to J. Liouville in 1838 [3].

^{2} For the derivation of an integral representation on the configuration space starting from Newton’s second law of motion, see Chapter 10 in Ref. [6].