$$ \nonumber \newcommand{\br}{\mathbf{r}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bk}{\mathbf{k}} \newcommand{\bq}{\mathbf{q}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bx}{\mathbf{x}} \newcommand{\bz}{\mathbf{z}} $$
$$ \begin{aligned} \dot \bq &=\frac{\partial H}{\partial \bp}\\ \dot \bp &=-\frac{\partial H}{\partial \bq} \end{aligned} $$

Newton tells us $\ddot x = - \omega^2 x$
General solution: $x(t) = A\cos \omega t + B \sin \omega t$
First-order form with $p=\dot x$ (units where $\omega=1$):
$$ \begin{aligned} \dot x &= p\\ \dot p &= -x \end{aligned} $$
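As a minimal numerical sketch (not part of the original notes), the first-order system above can be integrated with a leapfrog scheme and compared against the general solution for $\omega=1$. The function name `leapfrog`, the step count and the initial values are illustrative assumptions.

```python
import math

def leapfrog(x, p, force, dt, n_steps):
    """Integrate dx/dt = p, dp/dt = force(x) with the leapfrog scheme."""
    for _ in range(n_steps):
        p += 0.5 * dt * force(x)   # half kick
        x += dt * p                # full drift
        p += 0.5 * dt * force(x)   # half kick
    return x, p

# Initial data x(0) = A, p(0) = B (assumed values), with omega = 1.
A, B = 1.0, 0.5
t_final, n_steps = 2.0, 2000
x_num, p_num = leapfrog(A, B, lambda x: -x, t_final / n_steps, n_steps)

# General solution x(t) = A cos t + B sin t and its time derivative p(t).
x_exact = A * math.cos(t_final) + B * math.sin(t_final)
p_exact = -A * math.sin(t_final) + B * math.cos(t_final)
print(abs(x_num - x_exact), abs(p_num - p_exact))
```

Both printed errors should be small and shrink roughly as the square of the step size.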


$$ \ddot \theta = -\frac{g}{l}\sin\theta $$
First-order form with $x=\theta$, $p=\dot\theta$ (units where $g/l=1$):
$$ \begin{aligned} \dot x &= p\\ \dot p &= -\sin x \end{aligned} $$
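The pendulum equations above follow from $H = \tfrac{1}{2}p^2 - \cos x$. The sketch below, under assumed initial conditions and step size, integrates them with leapfrog steps and tracks how far the energy strays from its initial value. The helper names are hypothetical.

```python
import math

def pendulum_energy(x, p):
    """H = p^2/2 - cos(x), which generates dx/dt = p, dp/dt = -sin(x)."""
    return 0.5 * p * p - math.cos(x)

def leapfrog_step(x, p, dt):
    """One leapfrog step for the pendulum."""
    p -= 0.5 * dt * math.sin(x)
    x += dt * p
    p -= 0.5 * dt * math.sin(x)
    return x, p

x, p = 1.0, 0.0      # assumed initial angle and momentum
dt = 0.01            # assumed step size
e_start = pendulum_energy(x, p)
max_drift = 0.0
for _ in range(5000):
    x, p = leapfrog_step(x, p, dt)
    max_drift = max(max_drift, abs(pendulum_energy(x, p) - e_start))
print(max_drift)
```

The energy is not conserved exactly by the discrete scheme, but the deviation stays bounded and small over the whole run.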

 
 
Phase space $(\mathbf{q}, \mathbf{p})\in\mathbb{R}^{2n}$; Hamiltonian $H:\mathbb{R}^{2n}\to \mathbb{R}$ defines dynamics via
$$ \begin{aligned} \dot \bq &= \frac{\partial H}{\partial \bp}\\ \dot \bp &= -\frac{\partial H}{\partial \bq} \end{aligned} $$
$$ H(\mathbf{q},\mathbf{p}) = \frac{1}{2}\mathbf{p}^2 + V(\mathbf{q}) $$
$(\dot\bq,\dot\bp)$ is perpendicular to $\nabla H$, so $H$ is conserved:
$$ (\dot\bq,\dot\bp)\cdot (\nabla_\bq H, \nabla_\bp H) = (\nabla_\bp H,-\nabla_\bq H)\cdot (\nabla_\bq H, \nabla_\bp H) = 0 $$
Flow velocity $\bv=(\dot\bq,\dot\bp)$ is divergence-free (Liouville's theorem):
$$ \nabla\cdot\bv=\frac{\partial \bv_\bq}{\partial \bq}+\frac{\partial \bv_\bp}{\partial \bp}=\frac{\partial^2 H}{\partial \bq\partial \bp}-\frac{\partial^2 H}{\partial \bp\partial \bq} = 0 $$


Write $x=(\mathbf{q}, \mathbf{p})\in \mathbb{R}^{2n}$
Hamiltonian evolution on $t\in[0,T]$ defines a map $f: x=x(0)\mapsto x'=x(T)$

$$ \begin{aligned} \dot \bq &= \frac{\partial H}{\partial \bp}\\ \dot \bp &= -\frac{\partial H}{\partial \bq}\\ \dot{x} &= \Omega \nabla_x H\, ,\quad \Omega = \begin{pmatrix} 0 & \mathbb{1}_n\\ -\mathbb{1}_n & 0 \end{pmatrix} \end{aligned} $$
Jacobian $J_f = \partial x'/\partial x$ satisfies
$$ J_f^T\Omega J_f = \Omega $$
$J_f(x)\in\text{Sp}_{2n}(\mathbb{R})$ is a member of the linear symplectic group
Simplest case $n=1$:
$$ J_f = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\longrightarrow ad-bc=1 $$
General $n$:
$$ J_f^T\Omega J_f = \Omega $$
$\det(J_f) = +1$ $\longrightarrow$ volume is conserved
More: the sum of (signed) areas in each $q_j-p_j$ plane is preserved
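For $n=1$ the symplectic condition reduces to $ad-bc=1$, which can be checked numerically. The sketch below uses a leapfrog approximation of the pendulum flow as a stand-in for the exact flow map (an assumption, chosen because each leapfrog step is itself a symplectic map) and estimates its Jacobian by central finite differences.

```python
import math

def flow_map(x, p, dt=0.1, n_steps=50):
    """Approximate pendulum flow (x, p) -> (x', p') by leapfrog steps."""
    for _ in range(n_steps):
        p -= 0.5 * dt * math.sin(x)
        x += dt * p
        p -= 0.5 * dt * math.sin(x)
    return x, p

def jacobian(x, p, h=1e-5):
    """Central finite-difference Jacobian [[a, b], [c, d]] of the flow map."""
    xp, pp = flow_map(x + h, p)
    xm, pm = flow_map(x - h, p)
    a, c = (xp - xm) / (2 * h), (pp - pm) / (2 * h)   # derivatives w.r.t. x
    xp, pp = flow_map(x, p + h)
    xm, pm = flow_map(x, p - h)
    b, d = (xp - xm) / (2 * h), (pp - pm) / (2 * h)   # derivatives w.r.t. p
    return a, b, c, d

a, b, c, d = jacobian(0.7, 0.3)   # assumed base point
det = a * d - b * c
print(det)
```

The printed determinant should equal 1 up to finite-difference error, even though the individual entries $a,b,c,d$ are far from those of the identity.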

Sample from a multivariate distribution

$$ P(\{\sigma_i\}) = Z^{-1} \exp\left(-\beta\sum_{i,j}J_{ij}\sigma_i\sigma_j\right) $$
Use MCMC to sample from distribution, calculate expectations, etc.
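A minimal single-spin-flip Metropolis sketch for the distribution above, assuming spins $\sigma_i=\pm 1$. The two-spin coupling matrix, the inverse temperature and the function names are illustrative assumptions, chosen so that the exact answer $\langle\sigma_0\sigma_1\rangle=\tanh\beta$ is available for comparison.

```python
import math
import random

def energy(spins, J):
    """E = sum_{i,j} J_ij s_i s_j, so that P(s) is proportional to exp(-beta * E)."""
    n = len(spins)
    return sum(J[i][j] * spins[i] * spins[j] for i in range(n) for j in range(n))

def metropolis(J, beta, n_steps, rng):
    """Single-spin-flip Metropolis sampling; yields the state after every step."""
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(spins, J)
    for _ in range(n_steps):
        k = rng.randrange(n)
        spins[k] = -spins[k]                     # propose a flip
        e_new = energy(spins, J)
        if rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new                            # accept
        else:
            spins[k] = -spins[k]                 # reject: undo the flip
        yield list(spins)

# Two spins with an assumed coupling: E = -s0 * s1, so <s0 s1> = tanh(beta).
J = [[0.0, -0.5], [-0.5, 0.0]]
beta = 0.5
rng = random.Random(0)
samples = list(metropolis(J, beta, 20000, rng))
corr = sum(s[0] * s[1] for s in samples) / len(samples)
print(corr, math.tanh(beta))
```

Recomputing the full energy at every step costs $O(n^2)$ and is only sensible for tiny systems; a practical sampler would compute the local energy change instead.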

$$ p_\mathbf{X}(\mathbf{x}) = p_{\mathbf{Z}}(\mathbf{f}(\mathbf{x}))\left|\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right| $$
Build $p_\mathbf{X}$ from a simple $p_{\mathbf{Z}}$ (e.g. Gaussian) via a bijection $\mathbf{f}$; $\left|\partial \mathbf{f}/\partial \mathbf{x}\right|$ is the Jacobian determinant
Compose bijections: $\mathbf{f}=\mathbf{f}_L\circ \mathbf{f}_{L-1}\circ\cdots \circ \mathbf{f}_1$
$$ \mathbf{f}(\mathbf{x}) = \mathbf{x}+\epsilon \mathbf{g}(\mathbf{x}) $$
$$ \left|\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|\sim 1 +\epsilon \operatorname{tr}\left[\frac{\partial \mathbf{g}}{\partial \mathbf{x}}\right] $$
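The first-order formula for the Jacobian determinant can be checked directly in two dimensions, where the remainder is exactly $\epsilon^2\det(\partial\mathbf{g}/\partial\mathbf{x})$. The matrix `G` below is an assumed stand-in for $\partial\mathbf{g}/\partial\mathbf{x}$ at some point.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Assumed 2x2 Jacobian G = dg/dx at some point, and a small step eps.
G = [[0.3, -1.2], [0.8, 0.5]]
eps = 1e-3

# The Jacobian of f(x) = x + eps * g(x) is the identity plus eps * G.
jac_f = [[1 + eps * G[0][0], eps * G[0][1]],
         [eps * G[1][0], 1 + eps * G[1][1]]]

exact = det2(jac_f)
first_order = 1 + eps * (G[0][0] + G[1][1])
# In 2D the remainder is exactly eps^2 * det(G).
print(exact - first_order, eps**2 * det2(G))
```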
Split $\mathbf{x}$ into $x_{1:d}$ and $x_{d+1:D}$
$$ \begin{aligned} z_j &= x_j e^{\alpha_j(x_{d+1:D})} + \mu_j(x_{d+1:D}), \qquad j=1,\ldots, d \\ z_j &= x_j, \qquad j=d+1,\ldots, D\\ \left|\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right| &= \prod_{j=1}^d e^{\alpha_j(x_{d+1:D})} \end{aligned} $$
Parameterize scale $e^{\alpha_j(x_{d+1:D})}$ and shift $\mu_j(x_{d+1:D})$ with neural networks
Compose many bijections
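A minimal sketch of one affine coupling layer, with its inverse and log-determinant. The functions `alpha` and `mu` are toy closed-form stand-ins for the neural networks (an assumption for illustration only), and all names are hypothetical.

```python
import math

def alpha(j, rest):
    """Toy log-scale alpha_j(x_{d+1:D}); a stand-in for a neural network."""
    return math.tanh(sum(rest) + j)

def mu(j, rest):
    """Toy shift mu_j(x_{d+1:D}); a stand-in for a neural network."""
    return math.sin((j + 1) * rest[0])

def forward(x, d):
    """Map x -> z, transforming x[:d] conditioned on x[d:]; returns (z, log|det|)."""
    rest = x[d:]
    scales = [alpha(j, rest) for j in range(d)]
    z = [x[j] * math.exp(scales[j]) + mu(j, rest) for j in range(d)] + list(rest)
    return z, sum(scales)

def inverse(z, d):
    """Invert the coupling: the conditioning coordinates z[d:] equal x[d:]."""
    rest = z[d:]
    x = [(z[j] - mu(j, rest)) * math.exp(-alpha(j, rest)) for j in range(d)]
    return x + list(rest)

x = [0.5, -1.0, 2.0, 0.3]          # assumed input with D = 4, d = 2
z, log_det = forward(x, 2)
x_back = inverse(z, 2)
print(z, log_det, x_back)
```

Inversion never requires inverting `alpha` or `mu` themselves, which is why arbitrary networks can be used there. Successive layers would typically swap which coordinates are transformed.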
$p(\mathbf{x}) = \prod_j p(x_j|x_{1:j-1})$
$$ x_j = z_j e^{\alpha_j(x_{1:j-1})} + \mu_j(x_{1:j-1}) $$
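The autoregressive map above can be sketched as follows: generating $\mathbf{x}$ from $\mathbf{z}$ is sequential, while recovering $\mathbf{z}$ from $\mathbf{x}$ needs no recursion. The functions `alpha` and `mu` are toy stand-ins for learned networks (an assumption), shared across $j$ for brevity.

```python
import math
import random

def alpha(prefix):
    """Toy log-scale depending on earlier coordinates (stand-in for a network)."""
    return 0.5 * math.tanh(sum(prefix))

def mu(prefix):
    """Toy shift depending on earlier coordinates (stand-in for a network)."""
    return 0.1 * sum(prefix)

def sample(z):
    """Generate x sequentially: x_j = z_j * exp(alpha(x_{1:j-1})) + mu(x_{1:j-1})."""
    x = []
    for zj in z:
        x.append(zj * math.exp(alpha(x)) + mu(x))
    return x

def invert(x):
    """Recover z from x; each z_j uses only the already known x_{1:j}."""
    return [(x[j] - mu(x[:j])) * math.exp(-alpha(x[:j])) for j in range(len(x))]

rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(5)]   # Gaussian base samples
x = sample(z)
z_back = invert(x)
print(x, z_back)
```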


$$ \operatorname*{argmin}_\theta \bigg \Vert \frac{d\mathbf{q}}{dt} - \frac{\partial \mathcal{H}_{\theta}}{\partial \mathbf{p}} \bigg \Vert^2 + \bigg \Vert \frac{d\mathbf{p}}{dt} + \frac{\partial \mathcal{H}_{\theta}}{\partial \mathbf{q}} \bigg \Vert^2 $$
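To make the objective above concrete, the sketch below evaluates it for a two-parameter toy family $\mathcal{H}_\theta = \tfrac{1}{2}\theta_0 p^2 + \tfrac{1}{2}\theta_1 q^2$ in place of a neural network, with hand-written gradients instead of automatic differentiation. Both substitutions, the synthetic data, and the averaging over samples are assumptions made for illustration.

```python
import math

def grad_H(theta, q, p):
    """Return (dH/dq, dH/dp) for the toy model H = theta[0]*p^2/2 + theta[1]*q^2/2."""
    return theta[1] * q, theta[0] * p

def hnn_loss(theta, data):
    """Mean of |dq/dt - dH/dp|^2 + |dp/dt + dH/dq|^2 over (q, p, dq/dt, dp/dt) samples."""
    total = 0.0
    for q, p, q_dot, p_dot in data:
        dH_dq, dH_dp = grad_H(theta, q, p)
        total += (q_dot - dH_dp) ** 2 + (p_dot + dH_dq) ** 2
    return total / len(data)

# Synthetic data from the unit oscillator: q = cos t, p = -sin t.
data = []
for k in range(50):
    t = 0.1 * k
    q, p = math.cos(t), -math.sin(t)
    data.append((q, p, p, -q))   # dq/dt = p, dp/dt = -q

loss_true = hnn_loss((1.0, 1.0), data)    # parameters that generated the data
loss_wrong = hnn_loss((2.0, 1.0), data)   # wrong kinetic coefficient
print(loss_true, loss_wrong)
```

The loss vanishes at the parameters that generated the data and is strictly positive elsewhere, which is what gradient-based minimization over $\theta$ would exploit.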

Map consecutive pairs $\left[\bx_{t}, \bx_{t+1} \right]$ to a “latent” phase space $\bz=\left[q_t,p_t\right]$

$$ \mathbf{q}'=\mathbf{q}, \qquad \mathbf{p}'=\mathbf{p}-\nabla F(\mathbf{q}) $$
$$ z_j = x_j e^{\alpha_j(x_{d+1:D})} + \mu_j(x_{d+1:D}), \qquad j=1,\ldots, d $$
__Problem__: $O(m^2)$ cost when training a network of $m$ layers.
$$ \mu(\mathbf{q}) = W_1^T\sigma(W_2 \sigma(W_1\mathbf{q})), \qquad W_2 \text{ diagonal} $$
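For $\mu$ to be usable as $\nabla F$ in the shear $\mathbf{p}'=\mathbf{p}-\nabla F(\mathbf{q})$, its Jacobian must be symmetric. With $W_2$ diagonal the Jacobian has the form $W_1^T D W_1$ with $D$ diagonal, and the sketch below checks this symmetry by finite differences. The weights, the choice $\sigma=\tanh$ and the helper names are assumptions.

```python
import math

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def mu(q, W1, w2):
    """mu(q) = W1^T sigma(W2 sigma(W1 q)) with W2 = diag(w2) and sigma = tanh."""
    h = [math.tanh(a) for a in matvec(W1, q)]
    g = [math.tanh(w2[i] * h[i]) for i in range(len(h))]
    W1_T = [list(col) for col in zip(*W1)]
    return matvec(W1_T, g)

def jacobian(q, W1, w2, h=1e-5):
    """Central finite-difference Jacobian with entries d mu_i / d q_j."""
    n = len(q)
    cols = []
    for j in range(n):
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        up, um = mu(qp, W1, w2), mu(qm, W1, w2)
        cols.append([(up[i] - um[i]) / (2 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# Assumed small weights: 3 hidden units acting on 2 coordinates.
W1 = [[0.5, -0.3], [0.2, 0.8], [-0.7, 0.1]]
w2 = [1.5, -0.4, 0.9]
J = jacobian([0.3, -0.6], W1, w2)
asymmetry = abs(J[0][1] - J[1][0])
print(J, asymmetry)
```

With a non-diagonal $W_2$ the same check would generally fail, since the Jacobian would no longer be of the form $W_1^T D W_1$.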
Linear layer to mix $p,q$ so that deeper additive couplings act on all phase space coordinates (cf. Glow)
To parametrize $S\in \text{Sp}_{2n}(\mathbb{R})$, write
$$ \begin{aligned} S = NAK= \begin{pmatrix} \mathbf{1} & 0 \\ M & \mathbf{1} \end{pmatrix} \begin{pmatrix} L^\top & 0 \\ 0 & L^{-1} \end{pmatrix} \begin{pmatrix} X & -Y \\ Y & X \end{pmatrix} \, , \end{aligned} $$
with
$$ \begin{aligned} &M = M^\top \,,\quad &X^\top Y = Y^\top X \, ,\quad X^\top X + Y^\top Y = \mathbf{1}\,. \end{aligned} $$
$$ K = \begin{pmatrix} X & -Y \\ Y & X \end{pmatrix} \,, $$
$$ U = \text{diag}(e^{i\phi_i})\,. $$
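For $n=1$ the factorization $S=NAK$ can be written out explicitly and checked against $S^T\Omega S=\Omega$. The sketch assumes scalar $M=m$, nonzero scalar $L=l$, and $X=\cos\phi$, $Y=\sin\phi$ (so that $X^2+Y^2=1$). The parameter values and function names are illustrative.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def symplectic_nak(m, l, phi):
    """S = N A K for n = 1, with M = m, L = l (nonzero), X = cos(phi), Y = sin(phi)."""
    N = [[1.0, 0.0], [m, 1.0]]
    A = [[l, 0.0], [0.0, 1.0 / l]]
    K = [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]
    return matmul(matmul(N, A), K)

OMEGA = [[0.0, 1.0], [-1.0, 0.0]]
S = symplectic_nak(m=0.7, l=1.3, phi=0.4)        # assumed parameter values
check = matmul(matmul(transpose(S), OMEGA), S)   # should reproduce OMEGA
print(S, check)
```

Any real `m`, nonzero `l` and angle `phi` give a symplectic matrix, so the three parameters can be optimized without constraints.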
$$ \begin{aligned} \begin{cases} Q = q - \mu^q + \alpha\\ P = p - \mu^p + \beta \end{cases}\,, \end{aligned} $$
Training: $\mu$ is the batch mean
Testing: $\mu$ is a weighted moving average accumulated during training
Full batch norm (including rescaling by the batch variance) is not canonical

Integrable means $N$ independent conserved phase-space functions $I_j$:
Canonical transformations generated by each $I_j$ (as Hamiltonian) commute
Submanifold of phase space at fixed $\left\{I_{j}\right\}$ is an $N$-torus $\mathbb{T}^{N}$

$$ \begin{aligned} \dot{\varphi} = \partial_IK = \text{const.}\,,\quad \dot{I} = -\partial_\varphi K = 0 \end{aligned} $$
$$ H_{\text{K}} = \tfrac{1}{2}\sum p_i^2 + \frac{k}{r}\,, \quad r = \sqrt{\sum q_i^2}. $$
$H_\text{K}$ and angular momentum $\mathbf{L}=\mathbf{q}\times\mathbf{p}$ conserved
Additional conserved quantity: the Laplace–Runge–Lenz vector
$$ \mathbf{A} = \mathbf{p}\times\mathbf{L} + k \frac{\mathbf{q}}{r}. $$
Total of 7 conserved quantities. But phase space only 6D!
$$ \mathbf{A}\cdot \mathbf{L}=0\,,\qquad \mathbf{A}^2 = k^2 + 2H_\text{K} \mathbf{L}^2 $$
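The two relations above hold identically at every phase-space point, and $\mathbf{A}$ itself is conserved along trajectories. The sketch below checks both numerically for $H_\text{K}$ with an assumed attractive coupling $k=-1$, an assumed bound-orbit initial condition, and a leapfrog integrator. All names are hypothetical.

```python
import math

K = -1.0   # assumed coupling; k < 0 makes the potential k / r attractive

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def invariants(q, p):
    """Return (H_K, L, A) at the phase-space point (q, p)."""
    r = math.sqrt(dot(q, q))
    energy = 0.5 * dot(p, p) + K / r
    ang = cross(q, p)
    p_cross_l = cross(p, ang)
    lrl = [p_cross_l[i] + K * q[i] / r for i in range(3)]
    return energy, ang, lrl

def force(q):
    """-grad(k / r) = k q / r^3."""
    r = math.sqrt(dot(q, q))
    return [K * qi / r**3 for qi in q]

def leapfrog_step(q, p, dt):
    p = [pi + 0.5 * dt * fi for pi, fi in zip(p, force(q))]
    q = [qi + dt * pi for qi, pi in zip(q, p)]
    p = [pi + 0.5 * dt * fi for pi, fi in zip(p, force(q))]
    return q, p

q, p = [1.0, 0.0, 0.0], [0.0, 0.9, 0.1]   # assumed initial condition
h0, l0, a0 = invariants(q, p)
for _ in range(2000):
    q, p = leapfrog_step(q, p, 1e-3)
h1, l1, a1 = invariants(q, p)

drift_a = max(abs(x - y) for x, y in zip(a0, a1))          # conservation of A
relation_1 = dot(a1, l1)                                   # A . L, expected 0
relation_2 = dot(a1, a1) - (K**2 + 2 * h1 * dot(l1, l1))   # expected 0
print(drift_a, relation_1, relation_2)
```

The two relations remove two of the seven quantities, leaving five independent conserved functions on the 6D phase space.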

$$ T : (\hat{q}, \hat{p}) \mapsto (q, p) \,. $$
$$ \begin{aligned} \ell = \frac{1}{n \tau} \sum_{k=1}^{\tau} || r_{k} - r_{k+1} ||^2\,,\quad r_{k} = \hat{q}(t_k)^2 + \hat{p}(t_k)^2 \,. \end{aligned} $$
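A direct transcription of the loss above, assuming $n$ is the number of degrees of freedom, that $r_k$ is computed per degree of freedom, and that $\tau+1$ time samples are supplied. The function name and test trajectories are illustrative. The loss vanishes when the transformed trajectory moves on circles of fixed radius in each $\hat{q}_j$-$\hat{p}_j$ plane.

```python
import math

def radius_loss(q_hat, p_hat):
    """q_hat, p_hat: lists of tau + 1 time samples, each a list of n coordinates."""
    tau = len(q_hat) - 1
    n = len(q_hat[0])
    # r_k = q_hat(t_k)^2 + p_hat(t_k)^2, computed per degree of freedom.
    r = [[q * q + p * p for q, p in zip(qs, ps)] for qs, ps in zip(q_hat, p_hat)]
    total = sum((r[k][i] - r[k + 1][i]) ** 2 for k in range(tau) for i in range(n))
    return total / (n * tau)

times = [0.1 * k for k in range(21)]

# Circular trajectory: the radius is constant, so the loss vanishes.
loss_circle = radius_loss([[math.cos(t)] for t in times],
                          [[math.sin(t)] for t in times])

# Elliptical trajectory: the radius varies, so the loss is positive.
loss_ellipse = radius_loss([[2 * math.cos(t)] for t in times],
                           [[math.sin(t)] for t in times])
print(loss_circle, loss_ellipse)
```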
Find trajectories from equations of motion (Runge–Kutta)
SGD (Adam)
Shuffle the trajectories at every epoch

Continuous canonical transformations (cf. neural ODE)
Convolutional bijectors, identical particles, etc.
See Neural Canonical Transformation with Symplectic Flows, Li et al. arXiv:1910.00024 for recent developments
$$ H = \sum_{i=1}^N \left[ \frac{1}{2} p_i^2 + \frac{1}{2}(q_{i} - q_{i+1})^2 + \frac{\alpha}{3} (q_{i} - q_{i+1})^3 + \frac{\beta}{4} (q_{i} - q_{i+1})^4 \right] $$

