A. Henriksson

Stavanger Katedralskole, Haakon VII’s gate 4, 4005 Stavanger, Norway

Received 20 January 2022; revised 20 March 2022; accepted 1 April 2022

In this article, it is suggested that Liouville’s theorem can serve as a pedagogical point of departure in the teaching of classical mechanics. The theorem is interpreted as the condition that expresses the conservation of information in classical mechanics. Hamilton’s equations and Hamilton’s principle of least action are derived from Liouville’s theorem.

Keywords: information, determinism, Liouville’s theorem, Hamilton’s equations, Hamilton’s principle

1. Introduction

In this article, the theory of classical mechanics is approached from a different perspective. Its purpose is entirely pedagogical. I have taught classical mechanics in the traditional way, starting with Newton’s laws of motion and following up with Hamilton’s principle, the Euler–Lagrange equations of motion, Hamilton’s equations and Liouville’s theorem. The students have, in general, had problems seeing the relations between the standard mathematical representations of classical mechanics. In each class, there are typically a few students who ask whether there exists a foundational principle of classical mechanics that is independent of the specific mathematical representation chosen. I have never been able to answer this question in a satisfactory manner. This article grew out of the desire to address it.

Clearly, there is no reason to believe that there exists a unique principle. However, in this article a specific point of departure is identified and shown to lead to the traditional formulations. The suggested principle is the conservation of information. It is argued that Liouville’s theorem is the mathematical representation of this principle. Hamilton’s equations, Hamilton’s principle of least action and the invariance of the Poisson algebra are then understood as different manifestations of Liouville’s theorem.

There is nothing new in this article; everything in it is already known. What then, one might ask, is the purpose and use of the article? The answer is threefold. First, and foremost, it suggests an alternative way to teach the subject. Having taught for many years, I find it obvious that a diverse repertoire is beneficial when presenting and explaining a topic. Personally, I take great pleasure in being able to explain the subject to my students in different ways. Secondly, it provides a different point of view on an old and well-known subject. Even though it might not be of any use in the immediate, or near, future, it is generally good to be aware of a multitude of equivalent perspectives on any given problem. Thirdly, to the best of my knowledge, Hamilton’s principle has never been derived from Liouville’s theorem.

2. Determinism and information

In classical mechanics, it is a fundamental assumption that the evolution of a system is deterministic in both directions of time, i.e. both into the future and into the past. Deterministic evolution means that it is possible to say, with absolute certainty, that any given state of the system evolved from a single definite state in the past and will evolve into a single definite state in the future. There cannot be any ambiguity in the evolutionary history of a system. Thus, deterministic evolution implies that nowhere on the phase space can states converge or diverge (see Fig. 1).


Fig. 1. Non-deterministic evolution implies that system trajectories would cross each other on the phase space, here at point (q0, p0).
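Determinism in both directions of time can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes a unit-mass pendulum with potential U(q) = -cos(q) and uses the leapfrog integration scheme, which is time-reversible by construction. The function names and all numerical values are hypothetical choices made for illustration.

```python
import math

def force(q):
    # Assumed example system: unit-mass pendulum, U(q) = -cos(q), so F = -dU/dq.
    return -math.sin(q)

def leapfrog(q, p, dt, steps):
    """Advance the state (q, p) with the time-reversible kick-drift-kick scheme."""
    for _ in range(steps):
        p += 0.5 * dt * force(q)   # half kick
        q += dt * p                # drift
        p += 0.5 * dt * force(q)   # half kick
    return q, p

def there_and_back(q0, p0, dt=0.01, steps=2000):
    """Evolve forward, reverse the momentum, evolve again, reverse once more."""
    q1, p1 = leapfrog(q0, p0, dt, steps)
    q2, p2 = leapfrog(q1, -p1, dt, steps)
    return (q1, p1), (q2, -p2)

final, recovered = there_and_back(1.0, 0.3)
print("state after forward evolution:", final)
print("state recovered by reversal:  ", recovered)
```

Up to floating-point rounding, the recovered state coincides with the initial state (1.0, 0.3): the final state determines a single past state, as the text requires.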

Systems that appear to evolve non-deterministically give rise to the appearance of irreversible processes in nature. The reason is that if a system starts out in a given state, reversing its motion in time does not necessarily bring it back to that initial state. An example of a seemingly irreversible process is the sliding of a block of cheese along a table. Due to friction the block will always come to rest, apparently independently of its initial condition. Thus, it appears as though the multitude of possible initial states of the block, given by the possibility of sending it off with different initial speeds, all converge to the same final state where the block is at rest. Knowing the final state of the system does not help in determining its initial state. Therefore, the experiment with the sliding block of cheese seems to represent an evolution which is non-deterministic into the past.

The apparent violation of reversibility in physical processes does not originate in a fundamental feature of the physical laws; rather, it is due to the ignorance of the observer. The observer has not taken into account all the details of the system: degrees of freedom have been ignored. In the case of the sliding block of cheese, it is the individual motion of the atoms in the block and the table that has been ignored. If all degrees of freedom of the block and the table are followed in detail as the block slides on the table, each unique initial state gives rise to a unique final state, where the final states are distinguished by the final position and velocity of each atom in the block and the table.

A direct consequence of the assumption of deterministic evolution is that distinctions between physical states never disappear. If there is an initial distinction between the states, this distinction will survive throughout the entire motion of the system. That these distinctions seem to disappear as time unfolds is merely a consequence of the difficulty for an observer to keep perfect track of the motion of all particles. In the case of the sliding block, for a human observer, the distinction between the individual motions of atoms in the block and the table is too small to measure, and therefore it appears as though two distinct initial states, characterized by distinct initial speeds, which are easy to measure, converge to the same final state, i.e. that the block is at rest. In conclusion, the assumption of deterministic evolution can equivalently be stated as follows:

The distinction between the physical states of a closed system is conserved in time.

Due to the conservation of the distinction between physical states, any set of states which lie in the interior of some volume element on the phase space will remain in the interior of this volume element as the system evolves in time.

If a system is followed in detail by an observer as it evolves in time, the observer has perfect and complete knowledge about all the degrees of freedom of the system, i.e. the observer knows, with infinite precision, the exact positions and momenta of all particles within the system. In such an ideal scenario, the observer has no problem seeing the distinction between the states of the system. The amount of knowledge, or information, about the system possessed by the observer, at any instant of time, is complete. Since the ideal observer never loses track of the system, the distinction between states is never lost. In other words, the knowledge, or information, that the observer has about the system is not lost as the system evolves in time.

If, however, as is the case in practical reality, the observer has a limited ability to track the motion of individual particles, the observer does not possess complete information about the system. Even worse, the observer may, as is usually the case for complicated systems with many degrees of freedom, find it more and more difficult to track the system as time unfolds. In such a scenario, the amount of information about the system possessed by the observer decreases with time. In other words, from the perspective of the ignorant observer, information about the system is lost. However, it is important to emphasize that this apparent loss of information is entirely due to the ignorance of the observer. If all the degrees of freedom were tracked with infinite precision, information would never be lost. In the case of the sliding block of cheese, the observer has lost information because the system was known to exist in one of two distinct initial states, obtained by measuring the initial speed of the block, whereas it is not possible to distinguish between the two final states.

In conclusion, the loss of the distinction between states implies that information has been lost. Thus, the conservation of the distinction between states can equivalently be stated as an assumption of information conservation:

The information contained within a closed system is conserved in time.

In other words, the assumption that classical systems evolve deterministically, i.e. that the state of the system is perfectly predictable by an observer both into the future and back into the past, is equivalent to the statement that an observer of the system possesses complete information about the system and that, assuming that the system is closed, this information is never lost.

3. Liouville’s theorem

Consider an arbitrary region $\Omega$ of the 2-dimensional phase space, with volume $V_\Omega$ and volume element $\Delta q\,\Delta p$. The mathematical condition imposing information conservation is

$$\frac{\Delta N}{\Delta t} = 0, \tag{1}$$

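where $N$ is the number of states within the phase space region $\Omega$. The condition states that $N$ can neither increase nor decrease within the time interval $\Delta t$. For this condition to be satisfied, it is necessary that the incoming and outgoing flows of states through $\Omega$ within $\Delta t$ cancel, i.e. that

$$\Delta\big(\rho(q,p)\,\dot{q}\big) + \Delta\big(\rho(q,p)\,\dot{p}\big) = 0, \tag{2}$$

where $\rho(q,p)$ is the density of states on the phase space, and the flow differences are defined by, respectively,

$$\Delta\big(\rho(q,p)\,\dot{q}\big) \equiv \big[\rho(q_{\mathrm{out}},p)\,\dot{q}_{\mathrm{out}} - \rho(q_{\mathrm{in}},p)\,\dot{q}_{\mathrm{in}}\big]\,\Delta p, \tag{3}$$

$$\Delta\big(\rho(q,p)\,\dot{p}\big) \equiv \big[\rho(q,p_{\mathrm{out}})\,\dot{p}_{\mathrm{out}} - \rho(q,p_{\mathrm{in}})\,\dot{p}_{\mathrm{in}}\big]\,\Delta q. \tag{4}$$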

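In differential form the condition, after having been extended to be valid for an arbitrary length of time, reads in vector notation as

$$\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0, \tag{5}$$

where

$$\nabla \equiv \left(\frac{\partial}{\partial q}, \frac{\partial}{\partial p}\right) \tag{6}$$

is the differential operator on the phase space, and

$$\mathbf{v} \equiv (\dot{q}, \dot{p}) \tag{7}$$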

is the velocity with which states flow on the phase space. Equation (5) is Liouville’s continuity equation [1] for the density of states on the phase space. It says that the number of states is locally conserved. The term $\nabla\cdot(\rho\mathbf{v})$ represents the net flow of states through $\Omega$, i.e. the difference between the outflow and inflow of states. The continuity equation can be rewritten as

$$\frac{d\rho}{dt} + \rho\,\nabla\cdot\mathbf{v} = 0, \tag{8}$$

by using the total time derivative of the density of states, $d\rho/dt = \partial\rho/\partial t + \mathbf{v}\cdot\nabla\rho$, and the product rule applied to the net flow of states, $\nabla\cdot(\rho\mathbf{v}) = \mathbf{v}\cdot\nabla\rho + \rho\,\nabla\cdot\mathbf{v}$. Thus, if the divergence of the phase flow velocity vanishes, i.e. if

$$\nabla\cdot\mathbf{v} = 0, \tag{9}$$

then, by the continuity equation, the density of states on the phase space is constant in time along the flow on the phase space, i.e.

$$\frac{d\rho}{dt} = 0. \tag{10}$$

In such a situation, the flow of the system on the phase space is incompressible, because the condition that the density of states at any given location $(q, p)$ on the phase space, within an arbitrary region $\Omega$, does not change over time ensures that the states do not lump together. In conclusion, a necessary and sufficient condition for the flow of the system on the phase space to conserve information is that the divergence of the phase flow velocity vanishes. This is Liouville’s theorem [1].¹
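The incompressibility of the flow can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the author's construction: it takes a unit-mass, unit-frequency oscillator with flow velocity v = (p, -q - gamma p), where gamma is a hypothetical friction coefficient, and follows the three corners of a small triangle of initial states with a fourth-order Runge-Kutta integrator. Because this flow is linear, the triangle stays a triangle and its area measures the phase space volume occupied by the states. For gamma = 0 the divergence of v vanishes, and for gamma > 0 it equals -gamma.

```python
import math

def velocity(q, p, gamma):
    # Phase flow velocity (q_dot, p_dot); gamma is a hypothetical friction coefficient.
    return p, -q - gamma * p

def rk4_step(q, p, dt, gamma):
    """One classical fourth-order Runge-Kutta step of the phase flow."""
    k1q, k1p = velocity(q, p, gamma)
    k2q, k2p = velocity(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, gamma)
    k3q, k3p = velocity(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, gamma)
    k4q, k4p = velocity(q + dt * k3q, p + dt * k3p, gamma)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)

def triangle_area(points):
    (q1, p1), (q2, p2), (q3, p3) = points
    return 0.5 * abs((q2 - q1) * (p3 - p1) - (q3 - q1) * (p2 - p1))

def area_ratio(gamma, t_end=2.0, dt=0.001):
    """Ratio of final to initial area of a small triangle of states."""
    points = [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1)]
    initial_area = triangle_area(points)
    for _ in range(round(t_end / dt)):
        points = [rk4_step(q, p, dt, gamma) for q, p in points]
    return triangle_area(points) / initial_area

print("area ratio, divergence-free flow:", area_ratio(0.0))
print("area ratio, flow with friction:  ", area_ratio(0.5))
```

In the divergence-free case the ratio stays at one, while with friction the area contracts by the factor exp(-gamma t), so initially distinct states are squeezed together.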

The 2-dimensional Liouville’s theorem generalizes straightforwardly to the 6N-dimensional phase space. Each conjugate pair $(q_j, p_j)$, where $j \in [1, 3N]$, gives rise to an independent Liouville continuity equation, i.e.

$$\frac{d\rho_j}{dt} + \rho_j\,\nabla\cdot\mathbf{v}_j = 0, \quad \forall j \in [1, 3N], \tag{11}$$

where $\rho_j \equiv \rho(q_j, p_j)$ is the density of states in the 2-dimensional subset $(q_j, p_j)$ of the 6N-dimensional phase space and $\mathbf{v}_j \equiv (\dot{q}_j, \dot{p}_j)$ is the phase flow velocity along this subset. Thus, information is conserved on the 6N-dimensional phase space if the divergence of each phase flow velocity $\mathbf{v}_j$ vanishes, i.e. if

$$\nabla\cdot\mathbf{v}_j = 0 \quad \forall j \in [1, 3N]. \tag{12}$$

4. Hamilton’s equations

The vanishing divergence of the flow velocity $\mathbf{v}_j$ for all conjugate pairs $(q_j, p_j)$, $j \in [1, 3N]$, written out explicitly in terms of its velocity components $\dot{q}_j$ and $\dot{p}_j$, becomes

$$\frac{\partial \dot{q}_j}{\partial q_j} + \frac{\partial \dot{p}_j}{\partial p_j} = 0 \quad \forall j \in [1, 3N]. \tag{13}$$

Let $H$ be a smooth function on the 6N-dimensional phase space with the property that it contains no terms that mix different conjugate pairs, e.g. $p_i \cdot p_j$, $\forall i \neq j$. In this situation, taking into account that the conjugate pairs $\{(q_j, p_j)\}_{j=1}^{3N}$ are postulated to be independent, the condition of vanishing divergence can equivalently be stated by the set of differential equations known as Hamilton’s equations,

$$\dot{q}_j = \frac{\partial H}{\partial p_j} \quad \forall j \in [1, 3N], \tag{14}$$
$$\dot{p}_j = -\frac{\partial H}{\partial q_j} \quad \forall j \in [1, 3N]. \tag{15}$$

Under these circumstances, Hamilton’s equations are, according to the Liouville–Arnold theorem [4, 5], integrable from a set of known initial conditions. This simply means that they describe an evolution of the system which is unique and deterministic. Thus, given the function $H$, the flow of the system in time is determined by how $H$ changes on the phase space. In this sense, $H$ is said to be the generator of the motion in time of the system on the phase space. The flow of the system on the phase space, described by Hamilton’s equations, is referred to as a Hamiltonian flow.
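That a flow generated by Hamilton’s equations satisfies condition (13) can be verified numerically for a single conjugate pair. The sketch below is only an illustration: the Hamiltonian (a unit-mass pendulum with a small q p term), the friction coefficient of the comparison flow, the evaluation point and the step size h are all hypothetical choices. Derivatives are estimated with central differences.

```python
import math

def hamiltonian(q, p):
    # Hypothetical single-pair Hamiltonian, chosen for illustration only:
    # a unit-mass pendulum plus a small q*p term within the same conjugate pair.
    return 0.5 * p * p - math.cos(q) + 0.1 * q * p

def hamiltonian_flow(q, p, h=1e-4):
    """Flow velocity (q_dot, p_dot) from Hamilton's equations (14) and (15)."""
    q_dot = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2 * h)
    p_dot = -(hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2 * h)
    return q_dot, p_dot

def damped_flow(q, p, gamma=0.5):
    """A non-Hamiltonian comparison flow with a hypothetical friction coefficient."""
    return p, -q - gamma * p

def divergence(flow, q, p, h=1e-4):
    """Central-difference estimate of d(q_dot)/dq + d(p_dot)/dp, cf. Eq. (13)."""
    dqdot_dq = (flow(q + h, p)[0] - flow(q - h, p)[0]) / (2 * h)
    dpdot_dp = (flow(q, p + h)[1] - flow(q, p - h)[1]) / (2 * h)
    return dqdot_dq + dpdot_dp

print("divergence of Hamiltonian flow:", divergence(hamiltonian_flow, 0.7, -0.4))
print("divergence of damped flow:     ", divergence(damped_flow, 0.7, -0.4))
```

The Hamiltonian flow has zero divergence up to rounding error because the two mixed second derivatives of H cancel, whereas the damped flow has divergence -gamma.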

5. The Hamiltonian and Lagrangian

Equation (14), for a specific conjugate pair $(q_j, p_j)$, corresponds to the integral equation

$$H(p_j) = \int dp_j\, \dot{q}_j(p_j). \tag{16}$$

The momentum $p_j$ and the speed $\dot{q}_j$ are assumed to be in one-to-one correspondence. This means that for each value of $\dot{q}_j$ there is a unique value of $p_j$, and vice versa. The function $H(p_j)$ is then geometrically interpreted as the unique area under the $\dot{q}_j(p_j)$-graph, bounded by $(0, p_j)$ and $(0, \dot{q}_j(p_j))$, see Fig. 2.


Fig. 2. The areas under the $\dot{q}_j(p_j)$ and $p_j(\dot{q}_j)$ graphs define the Hamiltonian and Lagrangian, respectively.

Due to the one-to-one correspondence between $p_j$ and $\dot{q}_j$ it is possible to define a related area, $L(\dot{q}_j)$, given by the unique area under the $p_j(\dot{q}_j)$-graph,

$$L(\dot{q}_j) = \int d\dot{q}_j\, p_j(\dot{q}_j). \tag{17}$$

This integral equation corresponds to the differential equation

$$\frac{dL(\dot{q}_j)}{d\dot{q}_j} = p_j. \tag{18}$$

The total area of the rectangle bounded by $(0, p_j)$ and $(0, \dot{q}_j)$ is given by

$$L(\dot{q}_j) + H(p_j) = p_j\,\dot{q}_j. \tag{19}$$

It is possible to include a dependence on the generalized coordinate $q_j$ under the constraint that any $q_j$-dependent terms in the functions $H$ and $L$ cancel, such that the total area is $q_j$-independent. Thus, in general, the functions $H$ and $L$, referred to as the Hamiltonian and Lagrangian, respectively, satisfy the so-called Legendre transformation, i.e.

$$L(q_j, \dot{q}_j) + H(q_j, p_j) = p_j\,\dot{q}_j, \tag{20}$$

where

$$L(q_j, \dot{q}_j) = \int_0^{\dot{q}_j} d\dot{q}_j\, p_j(\dot{q}_j) - U(q_j), \tag{21}$$
$$H(q_j, p_j) = \int_0^{p_j} dp_j\, \dot{q}_j(p_j) + U(q_j). \tag{22}$$

The requirement that the total area is $q_j$-independent causes the Hamiltonian and Lagrangian to have a relative sign difference for the function $U(q_j)$.
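The area construction behind the Legendre transformation can be checked numerically. The following sketch rests on assumptions that are not in the text: a hypothetical nonlinear but one-to-one relation p = v + v^3 between momentum and speed v, and a hypothetical potential U(q) = q^2/2. The two areas are computed with Simpson's rule, and the inverse relation v(p) is found by bisection.

```python
def p_of_v(v):
    # Hypothetical one-to-one relation between momentum and speed (an assumption).
    return v + v ** 3

def v_of_p(p, lo=0.0, hi=10.0):
    """Invert p_of_v by bisection; valid for 0 <= p <= p_of_v(hi)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_of_v(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def integrate(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def potential(q):
    return 0.5 * q * q  # hypothetical potential U(q)

def lagrangian(q, v):
    """Area under the p(v) graph minus U(q), cf. Eq. (21)."""
    return integrate(p_of_v, 0.0, v) - potential(q)

def hamiltonian(q, p):
    """Area under the v(p) graph plus U(q), cf. Eq. (22)."""
    return integrate(v_of_p, 0.0, p) + potential(q)

q, v = 0.8, 1.5
p = p_of_v(v)
print("L + H =", lagrangian(q, v) + hamiltonian(q, p))
print("p * v =", p * v)
```

The two printed numbers agree to within the integration error: the two areas fill the rectangle of Eq. (19), and the potential drops out of the sum because it enters L and H with opposite signs.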

For the 6N-dimensional phase space, the Hamiltonian and Lagrangian are defined by

$$L(q, \dot{q}) = \sum_{j=1}^{3N} \int_0^{\dot{q}_j} d\dot{q}_j\, p_j(\dot{q}_j) - U(q), \tag{23}$$
$$H(q, p) = \sum_{j=1}^{3N} \int_0^{p_j} dp_j\, \dot{q}_j(p_j) + U(q), \tag{24}$$

where the function $U(q)$, defined by

$$U(q) \equiv \sum_{j=1}^{3N} U(q_j), \tag{25}$$

is referred to as the potential energy of the system.
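As a worked special case, assume the Newtonian relation $p_j = m\dot{q}_j$ with a constant mass $m$ (an assumption made here for illustration; the definitions above hold for any one-to-one relation). The integrals in Eqs. (23) and (24) can then be evaluated in closed form:

```latex
L(q,\dot{q}) = \sum_{j=1}^{3N} \int_0^{\dot{q}_j} d\dot{q}_j \, m\dot{q}_j - U(q)
             = \sum_{j=1}^{3N} \tfrac{1}{2} m \dot{q}_j^{2} - U(q),
\qquad
H(q,p) = \sum_{j=1}^{3N} \int_0^{p_j} dp_j \, \frac{p_j}{m} + U(q)
       = \sum_{j=1}^{3N} \frac{p_j^{2}}{2m} + U(q).
```

These are the familiar kinetic-minus-potential and kinetic-plus-potential forms, and their sum is $\sum_j m\dot{q}_j^2 = \sum_j p_j\dot{q}_j$, in agreement with the Legendre transformation.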

6. Principle of stationary action

The pair of Hamilton’s equations

$$-\frac{\partial H}{\partial q_j} - \dot{p}_j = 0, \tag{26}$$
$$\dot{q}_j - \frac{\partial H}{\partial p_j} = 0 \tag{27}$$

is the local differential representation of the principle of information conservation on the phase space. A global, or integral, representation can be obtained by considering the entire evolutionary path from some initial time $t_i$ to some final time $t_f$, where Hamilton’s equations are integrated over time.² For this purpose, multiply Hamilton’s equations by two independent arbitrary functions of time, $\delta q_j(t)$ and $\delta p_j(t)$, representing, respectively, small displacements in $q_j$ and $p_j$ on the phase space, in the following manner:

$$\left(-\frac{\partial H}{\partial q_j} - \dot{p}_j\right)\delta q_j(t) = 0, \tag{28}$$
$$\left(\dot{q}_j - \frac{\partial H}{\partial p_j}\right)\delta p_j(t) = 0. \tag{29}$$

The displacements $\delta q_j(t)$ and $\delta p_j(t)$ are pictured as slight variations of the physical path on the phase space, i.e.

$$q_j(t) \to q_j(t) + \delta q_j(t), \tag{30}$$
$$p_j(t) \to p_j(t) + \delta p_j(t). \tag{31}$$

Equations (28) and (29) are equivalent to Hamilton’s equations since they hold for arbitrary variations. The fact that it is necessary to introduce two displacement functions is due to the independence of the state parameters $q_j$ and $p_j$. The boundary conditions are given by

$$\delta q_j(t_i) = \delta q_j(t_f) = 0, \tag{32}$$
$$\delta p_j(t_i) = \delta p_j(t_f) = 0, \tag{33}$$

i.e. the variations vanish at the initial and final times. Adding Eqs. (28) and (29) and integrating over time from $t_i$ to $t_f$ gives, to leading order in the variations,

$$\int_{t_i}^{t_f} dt \left[\left(-\frac{\partial H}{\partial q_j} - \dot{p}_j\right)\delta q_j(t) + \left(\dot{q}_j - \frac{\partial H}{\partial p_j}\right)\delta p_j(t)\right] = 0. \tag{34}$$

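Integrating the term $\dot{p}_j\,\delta q_j$ by parts, recalling the boundary conditions and identifying $p_j\dot{q}_j - H$ with the Lagrangian through the Legendre transformation (20) gives

$$\delta A[q_j, \dot{q}_j] = 0, \tag{35}$$

where

$$A[q_j, \dot{q}_j] \equiv \int_{t_i}^{t_f} dt\, L(q_j, \dot{q}_j) \tag{36}$$

is the action of the system within the subset $(q_j, p_j)$ of the 6N-dimensional phase space. The action on the entire phase space is given by

$$A[q, \dot{q}] \equiv \sum_{j=1}^{3N} \int_{t_i}^{t_f} dt\, L(q_j, \dot{q}_j) = \int_{t_i}^{t_f} dt\, L(q, \dot{q}). \tag{37}$$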

The condition that this action is stationary, $\delta A = 0$, is Hamilton’s formulation of the principle of stationary action, or briefly, Hamilton’s principle. It is a global representation of information conservation, i.e. a statement on the entire evolutionary path which must be satisfied if the system is to adhere to the principle of information conservation.
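The stationarity of the action can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions rather than the phase space derivation above: it takes the Lagrangian L = q_dot^2/2 - q^2/2 of a unit-mass, unit-frequency oscillator, whose physical path from q(0) = 0 is q(t) = sin(t), and varies only q(t) by eps sin(pi t / t_f), a hypothetical displacement that vanishes at both end times. The action is discretised with a midpoint rule.

```python
import math

def action(path, dt):
    """Discretised action for the assumed Lagrangian L = q_dot^2/2 - q^2/2."""
    total = 0.0
    for q0, q1 in zip(path, path[1:]):
        q_dot = (q1 - q0) / dt      # velocity on the interval
        q_mid = 0.5 * (q0 + q1)     # position at the midpoint
        total += (0.5 * q_dot ** 2 - 0.5 * q_mid ** 2) * dt
    return total

def varied_action(eps, t_f=1.0, n=1000):
    """Action of the physical path sin(t) displaced by eps * sin(pi * t / t_f)."""
    dt = t_f / n
    path = [math.sin(k * dt) + eps * math.sin(math.pi * k * dt / t_f)
            for k in range(n + 1)]
    return action(path, dt)

a0 = varied_action(0.0)
first_order = (varied_action(1e-3) - varied_action(-1e-3)) / 2e-3
second_order = (varied_action(0.1) - a0) / 0.1 ** 2
print("first-order change per unit eps:   ", first_order)
print("second-order change per unit eps^2:", second_order)
```

The first-order change vanishes up to discretisation error, while the second-order coefficient is close to (pi^2 - 1)/4, the value obtained by inserting the displacement into the same Lagrangian. The action is therefore stationary on the physical path, which is all that Hamilton’s principle asserts.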

Since Hamilton’s principle can be derived from Hamilton’s equations, which in turn are an immediate consequence of the requirement that the divergence of the Hamiltonian flow velocity vanishes, it should be possible to obtain Hamilton’s principle directly from the requirement that $\nabla\cdot\mathbf{v}_j = 0$ is invariant under the displacements $\delta q_j(t)$ and $\delta p_j(t)$. Given that the variations are small, the flow velocity $\mathbf{v}_j$ can be expanded as a Taylor series about the state $(q_j, p_j)$, where terms that are of quadratic or higher order in the variations $\delta q_j$ and $\delta p_j$ can be ignored. The infinitesimal change in $\mathbf{v}_j$ thus becomes

$$\delta\mathbf{v}_j = \mathbf{v}_j(q_j + \delta q_j, p_j + \delta p_j) - \mathbf{v}_j(q_j, p_j) = \delta q_j \frac{\partial \mathbf{v}_j}{\partial q_j} + \delta p_j \frac{\partial \mathbf{v}_j}{\partial p_j}. \tag{38}$$

The divergence of the flow velocity transforms as

$$\nabla\cdot\mathbf{v}_j \to \nabla\cdot(\mathbf{v}_j + \delta\mathbf{v}_j) = \nabla\cdot\mathbf{v}_j + \nabla\cdot\delta\mathbf{v}_j. \tag{39}$$

If $\nabla\cdot\delta\mathbf{v}_j \neq 0$, information is not conserved for the deviated path. Therefore, it is required that

$$\nabla\cdot\delta\mathbf{v}_j = 0, \tag{40}$$

which is equivalent to

$$\delta(\nabla\cdot\mathbf{v}_j) = 0. \tag{41}$$

This statement is for a blob of volume $dV$ which encloses the single state $(q_j, p_j)$. Information conservation should hold for all varied states along the evolutionary path of the system, from the initial state $(q_j, p_j)_i$, at time $t_i$, to the final state $(q_j, p_j)_f$, at time $t_f$. Thus, the above statement should be integrated over all the blobs of volume $dV$ along the path, i.e. the integration is over a tube, with the volume $V$, the interior of which defines the region of extended phase space where the principle of information conservation is fulfilled. Thus,

$$\delta \int_{t_i}^{t_f} dt \int_V dV\, \nabla\cdot\mathbf{v}_j = 0. \tag{42}$$

Applying the divergence theorem

$$\int_V dV\, \nabla\cdot\mathbf{v}_j = \oint_{\partial V} d\mathbf{S}\cdot\mathbf{v}_j \tag{43}$$

gives

$$\delta \int_{t_i}^{t_f} dt \oint_{\partial V} d\mathbf{S}\cdot\mathbf{v}_j = 0. \tag{44}$$

The integrand $d\mathbf{S}\cdot\mathbf{v}_j$ represents the density of the net Hamiltonian flow out of the tube. The surface area element $d\mathbf{S}$ is given by

$$d\mathbf{S} = dS\,\mathbf{n}, \tag{45}$$

where $\mathbf{n} = (p_j, q_j)$ is the normal vector to the surface of the tube, i.e. $\mathbf{n}$ gives the direction in the phase space in which the system has to flow if it is to eventually reach a region where the principle of the conservation of information no longer holds. Thus, with $\mathbf{v}_j = (\dot{q}_j, \dot{p}_j)$, the integrand becomes

$$(p_j, q_j)\cdot(\dot{q}_j, \dot{p}_j) = p_j\dot{q}_j + q_j\dot{p}_j. \tag{46}$$

Using that $q_j = \int dq_j$ and the Hamilton equation $\dot{p}_j = -\partial H/\partial q_j$, the integrand can be written as

$$p_j\dot{q}_j - \int dq_j \frac{\partial H}{\partial q_j} = p_j\dot{q}_j - \int dH = p_j\dot{q}_j - H. \tag{47}$$

Equivalently, the integrand could have been written as

$$q_j\dot{p}_j + H, \tag{48}$$

by using that $p_j = \int dp_j$ and the other Hamilton equation, $\dot{q}_j = \partial H/\partial p_j$. However, the form $p_j\dot{q}_j - H$ is the preferred choice due to the fact that it is equal to the Lagrangian $L(q_j, \dot{q}_j)$. Thus, on the 6N-dimensional phase space it is obtained that

$$\delta \int_{t_i}^{t_f} dt \oint dS\, L = 0. \tag{49}$$

The equality must hold independently of the surface area of the tube, i.e. the principle of information conservation should hold true independently of the number of states in which the system can exist. Therefore, the integration over the surface area can be taken outside of the infinitesimal variation, giving that

$$\delta \int_{t_i}^{t_f} dt\, L = 0, \tag{50}$$

which is, again, Hamilton’s principle. Thus, Hamilton’s principle can be derived directly from Liouville’s theorem.

7. Invariance of the Poisson algebra

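Given that the divergence of the Hamiltonian flow velocity vanishes, the Liouville equation can be written as

$$\frac{\partial \rho}{\partial t} + \nabla\rho\cdot\mathbf{v} = 0. \tag{51}$$

The Poisson bracket $\{\rho, H\}$ between the density of states $\rho$ and the Hamiltonian $H$ is defined by

$$\{\rho, H\} \equiv \nabla\rho\cdot\mathbf{v} = \frac{\partial \rho}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial \rho}{\partial p}\frac{\partial H}{\partial q}. \tag{52}$$

In general, the Poisson bracket $\{A, B\}$ between any two arbitrary functions $A$ and $B$ on the phase space is defined by

$$\{A, B\} \equiv \frac{\partial A}{\partial q}\frac{\partial B}{\partial p} - \frac{\partial A}{\partial p}\frac{\partial B}{\partial q}. \tag{53}$$

In this notation, Hamilton’s equations are written as

$$\dot{q} = \{q, H\}, \tag{54}$$
$$\dot{p} = \{p, H\}. \tag{55}$$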

The Poisson bracket satisfies a set of algebraic properties. It is antisymmetric, i.e.

$$\{A, B\} = -\{B, A\}. \tag{56}$$

It satisfies linearity, i.e.

$$\{aA + bB, C\} = a\{A, C\} + b\{B, C\}. \tag{57}$$

Furthermore, it satisfies the product rule and the Jacobi identity, i.e.

$$\{AB, C\} = A\{B, C\} + \{A, C\}B, \tag{58}$$
$$\{A, \{B, C\}\} + \{B, \{C, A\}\} + \{C, \{A, B\}\} = 0. \tag{59}$$

These properties define the Poisson algebra of classical mechanics. Since Liouville’s equation for the incompressible Hamiltonian flow can be expressed in terms of the Poisson bracket, as $\partial\rho/\partial t + \{\rho, H\} = 0$, Liouville’s theorem can equivalently be stated by saying that the evolution in time of any given system conserves information if it leaves the Poisson algebra invariant.
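The algebraic properties listed above can be verified exactly for polynomial functions on a 2-dimensional phase space. The sketch below is an illustration only: the representation of a polynomial as a dictionary mapping exponent pairs (i, j) to the coefficient of q^i p^j, the helper names and the sample functions A, B, C and H are all hypothetical choices.

```python
from collections import defaultdict
from fractions import Fraction

def add(a, b, sa=1, sb=1):
    """Return sa*a + sb*b for polynomials stored as {(i, j): coefficient}."""
    out = defaultdict(int)
    for key, c in a.items():
        out[key] += sa * c
    for key, c in b.items():
        out[key] += sb * c
    return {key: c for key, c in out.items() if c != 0}

def mul(a, b):
    out = defaultdict(int)
    for (i, j), c in a.items():
        for (k, l), d in b.items():
            out[(i + k, j + l)] += c * d
    return {key: c for key, c in out.items() if c != 0}

def d_dq(a):
    return {(i - 1, j): i * c for (i, j), c in a.items() if i > 0}

def d_dp(a):
    return {(i, j - 1): j * c for (i, j), c in a.items() if j > 0}

def bracket(a, b):
    """Poisson bracket {A, B} = dA/dq dB/dp - dA/dp dB/dq, cf. Eq. (53)."""
    return add(mul(d_dq(a), d_dp(b)), mul(d_dp(a), d_dq(b)), 1, -1)

q = {(1, 0): 1}
p = {(0, 1): 1}
H = {(0, 2): Fraction(1, 2), (2, 0): Fraction(1, 2)}  # assumed H = p^2/2 + q^2/2
A = {(2, 1): 3, (0, 1): 1}    # 3 q^2 p + p
B = {(1, 2): 2, (3, 0): -1}   # 2 q p^2 - q^3
C = {(1, 1): 1, (0, 3): 5}    # q p + 5 p^3

print("{q, H} =", bracket(q, H))   # polynomial p, cf. Eq. (54)
print("{p, H} =", bracket(p, H))   # polynomial -q, cf. Eq. (55)
jacobi = add(add(bracket(A, bracket(B, C)), bracket(B, bracket(C, A))),
             bracket(C, bracket(A, B)))
print("Jacobi sum:", jacobi)       # empty dictionary, i.e. the zero polynomial
```

Because the coefficients are integers or exact fractions, the identities hold exactly rather than to rounding error. For the assumed oscillator Hamiltonian, the brackets with q and p reproduce Hamilton’s equations with q_dot = p and p_dot = -q.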

8. Conclusions

Liouville’s theorem is interpreted as the mathematical condition representing the physical conservation of information in classical mechanics. Hamilton’s equations, Hamilton’s principle and the invariance of the Poisson algebra are distinct, but equivalent, manifestations of the theorem.


[1] F. Bloch, Fundamentals of Statistical Mechanics, Manuscript and Notes of Felix Bloch, 3rd ed. (Imperial College Press and World Scientific Publishing, London, 2000).

[2] J.W. Gibbs, Elementary Principles in Statistical Mechanics (Charles Scribner’s Sons, New York, 1902).

[3] J. Liouville, Note sur la Théorie de la Variation des constantes arbitraires, J. Math. Pures Appl. 3(1), 342–349 (1838).

[4] J. Liouville, Note sur l’intégration des équations différentielles de la Dynamique, J. Math. Pures Appl. 20(1), 137–138 (1855).

[5] V.I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed. (Springer-Verlag, New York, 1989).

[6] H. Jeffreys and B.S. Jeffreys, Methods of Mathematical Physics, 3rd ed. (Cambridge University Press, 1956).

¹ To the best of the author’s knowledge, the physical formulation and relevance of Liouville’s theorem were first stated by J.W. Gibbs in 1902 [2]. There it was referred to as the ‘Principle of Conservation of Density-in-phase’ or equivalently as the ‘Principle of Conservation of Extension-in-phase’. However, the mathematical background for the theorem dates back to J. Liouville in 1838 [3].

² For the derivation of an integral representation on the configuration space starting from Newton’s second law of motion, see Chapter 10 in Ref. [6].

