notes - ode, index notation, etc

DRAFT under construction 7/10/23-...

Table of Contents

  1. Main notions
  2. Order reduction
  3. Reduction of a non-autonomous equation to an autonomous one
  4. What evolutionary (i.e., depending on time \(t\)) process can be described by ODEs?
  5. IVP (Initial Value Problems)
  6. Now about the proof: contraction mappings and such
  7. An existence theorem
  8. Geometry of ODEs: vector fields
  9. Dependence of solutions on parameters and initial data.
  10. Extendability of local solutions.
  11. Boundary value problems (BVPs)

As with most of these posts, what I have included here closely follows and often directly copies or quotes a text written by someone more erudite than me. In this case, thank you to Prof. Peter Kuchment for A brief sketch of the main ODE theorems, written for Texas A&M University's Math 611, Fall 2017.

For those with a lot of time on their hands, you may notice that the headings for this page are h3 while the other posts so far use h2, and the a links are boldface!

Main notions

Definition 1: An ODE of order \( k \) is an equation relating the values of one or more unknown functions of a single variable \( t \) (which we will call "time"), their derivatives up to the order \( k \), and the independent variable itself:

\[ \Phi\left(t, x_1, \ldots, x_n, x_1', \ldots, x_n', \ldots, x_1^{(k)}, \ldots, x_n^{(k)}\right) = 0. \quad (1) \]

If more than one unknown function is involved, a system of such equations is usually needed. A system can be neatly written in vector form so that it looks like a single equation, for example:

\[ \Phi\left(t, \mathbf{x}, \mathbf{x}', \ldots, \mathbf{x}^{(k)}\right) = 0, \quad (2) \]

where boldface font is used to denote vectors.

Example 2:

  1. \( x'(t)\,x(t)^2 - 3t \sin(x''(t)) = 8 \) is an ODE (of what order? linear or non-linear?)
  2. \( x'(t) = -5x(t + 7) \) is NOT an ODE!
  3. \( x''(t) = -x(t) + \int \cos(x(\tau)) \, d\tau \) is NOT an ODE.

Question:

  • Why aren't the latter two examples ODEs? If you read the definition, it looks at first glance like there is nothing wrong with these examples.
  • What was missing in the wording of the definition? How should it be changed to make sure we exclude such cases?
  • Do you know what equations of the type shown in the last two examples are called?

I skipped section 2 because it's like three lines listing ODE vs PDE, Order, and Linear vs Non-linear.

Order reduction

TLDR; (Prof. Peter Kuchment's) idea: just use vectors to represent systems of equations; then it's one equation of vectors.

Introducing new unknown functions \( x_1 = x,\ x_2 = x',\ \ldots,\ x_k = x^{(k-1)} \), an ODE or a system \(\Phi(t, \mathbf{x}, \mathbf{x}', \ldots, \mathbf{x}^{(k)}) = 0, \quad (2)\) can be reduced to a first-order system:

\[ \Phi(t, x_1, x_2, \ldots, x_k, x_k') = 0 \quad (3) \]

where

\[ \begin{align} x_1' &= x_2 \\ x_2' &= x_3 \\ &\ \vdots \\ x_{k-1}' &= x_k \\ \end{align} \]

So now we can always deal with the first-order systems:

\[ \Phi(t, \mathbf{x}, \mathbf{x}') = 0 \quad (4) \]
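For a concrete instance (a standard example of my own, not part of the quoted notes): the second-order equation \( x'' + x = 0 \) becomes, after setting \( x_1 = x \) and \( x_2 = x' \), the first-order system

\[ x_1' = x_2, \qquad x_2' = -x_1, \]

i.e., \( \mathbf{x}' = F(\mathbf{x}) \) with \( F(x_1, x_2) = (x_2, -x_1) \).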

Definition 3:

TLDR; with all due respect to Prof. Peter Kuchment, this is just about whether the right-hand side of the ODE depends explicitly on time \(t\) (non-autonomous) or only on the state \(\mathbf{x}\) (autonomous).

  • Normal (i.e., resolved with respect to the derivative):

\[ \mathbf{x}' = F(t, \mathbf{x}) \quad (5) \]

  • Autonomous:

\[ \mathbf{x}' = F(\mathbf{x}) \quad (6) \]

One distinguishes between autonomous (6) and non-autonomous (5) equations.

Reduction of a non-autonomous equation to an autonomous one:

Introduce a new time \( \tau \) and consider the autonomous system:

\[ \begin{align} x'(\tau) &= F(t(\tau), x(\tau)) \\ t'(\tau) &= 1 \end{align} \]

This autonomous system is equivalent to the non-autonomous (5). In these notes, we will assume that \( x(t) \) is a differentiable function of \( t \in (a, b) \subseteq \mathbb{R} \) taking values in \( \mathbb{R}^n \), and \( F : \mathbb{R}^{n+1} \to \mathbb{R}^n \). (The complex case is possible, but we will not consider it here.)
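For instance (my own quick example): the non-autonomous scalar equation \( x' = t x \) becomes the autonomous planar system \( x'(\tau) = t(\tau)\, x(\tau) \), \( t'(\tau) = 1 \), whose second equation simply recovers \( t = \tau + \text{const} \).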

What evolutionary (i.e., depending on time \(t\)) process can be described by ODEs?

When we have a process, whether it is mechanical, biological, or another type, where the instantaneous state can be described by a set of parameters \(x\), we can represent the evolution of this process with time \(t\) using ordinary differential equations (ODEs). The parameters \(x\) become functions of time \(x(t)\), and the space of these parameters is called the phase space.

For a process to be described by ODEs, three conditions must be satisfied:

  1. Finite-dimensional system: The system can be described by a finite number of parameters \(x_1, \ldots, x_n\). This means that the system has a finite number of variables that characterize its state. However, there are some processes, like fluid dynamics, heat conduction, and quantum mechanics, that are not finite-dimensional and cannot be described by ODEs.
  2. Smoothness: The parameters change in a differentiable manner with time. This implies that the functions representing the parameters \(x(t)\) are smooth and can be differentiated. However, there are cases, such as shock waves, where the parameters change abruptly and are not differentiable, making ODEs inadequate for describing them.
  3. Determinism: The process is deterministic, meaning that the state of the system at a particular moment determines its entire future behavior. Given the initial conditions \(x(t)\) at a certain moment \(t\), the future values of \(x(\tau)\) for all \(\tau\) can be determined. The fact that the system is finite-dimensional and the parameters change smoothly allows us to represent the evolution of the system using a differentiable vector function \(x(t)\). This function captures the deterministic nature of the process.

If a process satisfies these three conditions - finite-dimensionality, smoothness, and determinism - it can be described by ODEs, which provide a mathematical framework for understanding and predicting its behavior over time.

Hence, \( x'(t) \) is determined by \( t \) and \( x(t) \). In mathematical notation, we write that \( x'(t) \) is a function of \( t \) and \( x(t) \): \( x'(t) = F(t, x(t)) \), which is an ODE (a system of ODEs).

IVP (Initial Value Problems):

\[ \begin{align} \frac{dx}{dt} &= F(t, x) \\ x(t_0) &= x_0 \quad \quad (8) \end{align} \]

For those familiar with index notation (i.e., Kronecker delta, Levi-Civita, and generally "linear algebra without matrices"), skip the following four cards and jump to Definition 4.

Index Notation Aside

A note on index notation:

Thank you to Prof. David Roylance for his brief Matrix and Index Notation from MIT September 18, 2000

A vector can be represented by its components along the Cartesian axes, denoted as \(u_x, u_y, u_z\) for the displacement vector \(\mathbf{u}\). The components can also be indicated with numerical subscripts, such as \(u_1, u_2, u_3\), corresponding to the \(x, y\), and \(z\) directions. In a shorthand notation, the displacement vector can be written as \(u_i\), where the subscript \(i\) ranges over 1, 2, 3 (or 1 and 2 in two-dimensional problems). This is known as the range convention for index notation. With this convention, the vector equation \(u_i = a\) yields three scalar equations:

\[ \begin{aligned} & u_{1}=a \\ & u_{2}=a \\ & u_{3}=a \end{aligned} \]

We will often find it convenient to denote a vector by listing its components in a vertical list enclosed in braces, and this form will help us keep track of matrix-vector multiplications a bit more easily. We therefore have the following equivalent forms of vector notation:

\[ \mathbf{u}=u_{i}=\left\{\begin{array}{l} u_{1} \\ u_{2} \\ u_{3} \end{array}\right\}=\left\{\begin{array}{l} u_{x} \\ u_{y} \\ u_{z} \end{array}\right\} \]

Second-rank quantities such as stress, strain, moment of inertia, and curvature can be denoted as \(3 \times 3\) matrix arrays; for instance the stress can be written using numerical indices as

\[ [\sigma]=\left[\begin{array}{lll} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{array}\right] \]

Here the first subscript index denotes the row and the second the column. The indices also have a physical meaning; for instance \(\sigma_{23}\) indicates the stress on the 2 face (the plane whose normal is in the 2, or \(y\), direction) and acting in the 3, or \(z\), direction. To help distinguish them, we'll use brackets for second-rank tensors and braces for vectors.

Using the range convention for index notation, the stress can also be written as \(\sigma_{i j}\), where both the \(i\) and the \(j\) range from 1 to 3; this gives the nine components listed explicitly above. (Since the stress matrix is symmetric, i.e. \(\sigma_{i j}=\sigma_{j i}\), only six of these nine components are independent.)

A subscript that is repeated in a given term is understood to imply summation over the range of the repeated subscript; this is the summation convention for index notation. For instance, to indicate the sum of the diagonal elements of the stress matrix we can write:

\[ \sigma_{k k}=\sum_{k=1}^{3} \sigma_{k k}=\sigma_{11}+\sigma_{22}+\sigma_{33} \]

The multiplication rule for matrices can be stated formally by taking \(\mathbf{A}=\left(a_{i j}\right)\) to be an \((M \times N)\) matrix and \(\mathbf{B}=\left(b_{i j}\right)\) to be an \((R \times P)\) matrix. The matrix product \(\mathbf{A B}\) is defined only when \(R=N\), and is the \((M \times P)\) matrix \(\mathbf{C}=\left(c_{i j}\right)\) given by

\[ c_{i j}=\sum_{k=1}^{N} a_{i k} b_{k j}=a_{i 1} b_{1 j}+a_{i 2} b_{2 j}+\cdots+a_{i N} b_{N j} \]

Using the summation convention, this can be written simply

\[ c_{i j}=a_{i k} b_{k j} \]

where the summation is understood to be over the repeated index \(k\). In the case of a \(3 \times 3\) matrix multiplying a \(3 \times 1\) column vector we have

\[ \left[\begin{array}{lll} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right]\left\{\begin{array}{l} b_{1} \\ b_{2} \\ b_{3} \end{array}\right\}=\left\{\begin{array}{c} a_{11} b_{1}+a_{12} b_{2}+a_{13} b_{3} \\ a_{21} b_{1}+a_{22} b_{2}+a_{23} b_{3} \\ a_{31} b_{1}+a_{32} b_{2}+a_{33} b_{3} \end{array}\right\}=a_{i j} b_{j} \]
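To make the range and summation conventions concrete, here is a small numerical sketch of my own (not from Prof. Roylance's notes, and assuming NumPy is available); np.einsum implements exactly this repeated-index summation:

```python
import numpy as np

# A symmetric 3x3 "stress-like" matrix and a 3-vector, used only as placeholders.
sigma = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 5.0],
                  [3.0, 5.0, 6.0]])
b = np.array([1.0, -1.0, 2.0])

# sigma_kk: the repeated index k implies summation (the trace).
assert np.isclose(np.einsum('kk->', sigma), np.trace(sigma))

# c_ij = a_ik b_kj: summation over the repeated index k (the matrix product).
a = np.arange(9.0).reshape(3, 3)
assert np.allclose(np.einsum('ik,kj->ij', a, sigma), a @ sigma)

# a_ij b_j: a 3x3 matrix times a 3x1 column vector.
assert np.allclose(np.einsum('ij,j->i', sigma, b), sigma @ b)
```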

The comma convention uses a subscript comma to imply differentiation with respect to the variable following, so \(f_{, 2}=\partial f / \partial y\) and \(u_{i, j}=\partial u_{i} / \partial x_{j}\). For instance, the expression \(\sigma_{i j, j}=0\) uses all of the three previously defined index conventions: range on \(\mathrm{i}\), sum on \(\mathrm{j}\), and differentiate:

\[ \begin{aligned} & \frac{\partial \sigma_{x x}}{\partial x}+\frac{\partial \sigma_{x y}}{\partial y}+\frac{\partial \sigma_{x z}}{\partial z}=0 \\ & \frac{\partial \sigma_{y x}}{\partial x}+\frac{\partial \sigma_{y y}}{\partial y}+\frac{\partial \sigma_{y z}}{\partial z}=0 \\ & \frac{\partial \sigma_{z x}}{\partial x}+\frac{\partial \sigma_{z y}}{\partial y}+\frac{\partial \sigma_{z z}}{\partial z}=0 \end{aligned} \]

The Kronecker delta is a useful entity defined as

\[ \delta_{i j}= \begin{cases}0, & i \neq j \\ 1, & i=j\end{cases} \]

This is the index form of the unit matrix \(\mathbf{I}\) :

\[ \delta_{i j}=\mathbf{I}=\left[\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right] \]

So, for instance

\[ \sigma_{k k} \delta_{i j}=\left[\begin{array}{ccc} \sigma_{k k} & 0 & 0 \\ 0 & \sigma_{k k} & 0 \\ 0 & 0 & \sigma_{k k} \end{array}\right] \]

where \(\sigma_{k k}=\sigma_{11}+\sigma_{22}+\sigma_{33}\).


Now, before returning to Prof. Peter Kuchment's lovely review of the ODE, and for one last note beyond Prof. David Roylance's notes on index notation (this time from the faceless physics department at UC Berkeley, Fall 2002): I want to address the Levi-Civita symbol, which is closely tied to index notation and the Kronecker delta.


The Levi-Civita symbol is useful for converting cross products and curls into the language of tensor analysis, and for many other purposes. The following is a summary of its most useful properties in three-dimensional Euclidean space.

The Levi-Civita symbol is defined by

\[\epsilon_{ijk}= \begin{cases}1, & \text{ if }(ijk) \text{ is an even permutation of }(123) ; \\ -1, & \text{ if }(ijk) \text{ is an odd permutation of }(123) ; \\ 0, & \text{ otherwise. }\end{cases}\]

It has 27 components, of which only 6 are nonzero. It follows directly from this definition that \(\epsilon_{ijk}\) changes sign if any two of its indices are exchanged,

\[\epsilon_{ijk}=\epsilon_{jki}=\epsilon_{kij}=-\epsilon_{jik}=-\epsilon_{ikj}=-\epsilon_{kji} .\]

The Levi-Civita symbol is convenient for expressing cross products and curls in tensor notation. For example, if \(\mathbf{A}\) and \(\mathbf{B}\) are two vectors, then

\[(\mathbf{A} \times \mathbf{B})_{i}=\epsilon_{ijk} A_{j} B_{k}\]

and

\[(\nabla \times \mathbf{B})_{i}=\epsilon_{ijk} \frac{\partial B_{k}}{\partial x_{j}}\]
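As a quick sanity check (my own sketch, assuming NumPy, and not part of the Berkeley notes), one can build \(\epsilon_{ijk}\) explicitly and verify the cross-product formula against NumPy's built-in np.cross:

```python
import numpy as np

# Build the 3x3x3 Levi-Civita symbol directly from the permutation definition.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0   # even permutations of (123)
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0  # odd permutations of (123)

A = np.array([1.0, 2.0, 3.0])
B = np.array([-1.0, 0.5, 4.0])

# (A x B)_i = eps_ijk A_j B_k, with summation over the repeated j and k.
cross = np.einsum('ijk,j,k->i', eps, A, B)
assert np.allclose(cross, np.cross(A, B))
```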

Any combination of an even number of Levi-Civita symbols (or an even number of cross products and curls) can be reduced to dot products with the following system of identities. Similarly, any combination of an odd number of Levi-Civita symbols (or an odd number of cross products and curls) can be reduced to a single Levi-Civita symbol (or a cross product or a curl) plus dot products. The first is the most general:

\[\epsilon_{ijk} \epsilon_{\ell mn}=\left|\begin{array}{ccc} \delta_{i\ell} & \delta_{im} & \delta_{in} \\ \delta_{j\ell} & \delta_{jm} & \delta_{jn} \\ \delta_{k\ell} & \delta_{km} & \delta_{kn} \end{array}\right|\]

Notice that the indices \((i j k)\) label the rows, while ( \(\ell m n)\) label the columns. If this is contracted on \(i\) and \(l\), we obtain

\[\epsilon_{ijk} \epsilon_{imn}=\left|\begin{array}{cc} \delta_{jm} & \delta_{jn} \\ \delta_{km} & \delta_{kn} \end{array}\right|=\delta_{jm} \delta_{kn}-\delta_{jn} \delta_{km} .\]

This identity is the one used most often, for boiling down two cross products that have one index in common, such as \(\nabla \times(\mathbf{A} \times \mathbf{B})\). By contracting \(\epsilon_{ijk} \epsilon_{imn}\) on \(j\) and \(m\) we obtain

\[\epsilon_{ijk} \epsilon_{ijn}=2 \delta_{kn} \text {. }\]

Finally, contracting on \(k\) and \(n\) we obtain

\[\epsilon_{ijk} \epsilon_{ijk}=6\]

It should be clear how to generalize these identities to higher dimensions.

If \(A_{ij}=-A_{ji}\) is an antisymmetric \(3 \times 3\) tensor, it has 3 independent components that we can associate with a 3-vector \(\mathbf{A}\), as follows:

\[A_{ij}=\left(\begin{array}{ccc} 0 & A_{3} & -A_{2} \\ -A_{3} & 0 & A_{1} \\ A_{2} & -A_{1} & 0 \end{array}\right)=\epsilon_{ijk} A_{k} .\]

The inverse of this is

\[A_{i}=\frac{1}{2} \epsilon_{ijk} A_{jk}\]

Using these identities, the multiplication of an antisymmetric matrix times a vector can be reexpressed in terms of a cross product. That is, if

\[X_{i}=A_{ij} Y_{j}\]

then

\[\mathbf{X}=\mathbf{Y} \times \mathbf{A}\]

Similarly, if \(\mathbf{A}\) and \(\mathbf{B}\) are two vectors, then

\[A_{i} B_{j}-A_{j} B_{i}=\epsilon_{ijk}(\mathbf{A} \times \mathbf{B})_{k},\]

and

\[\frac{\partial B_{j}}{\partial x_{i}}-\frac{\partial B_{i}}{\partial x_{j}}=\epsilon_{ijk}(\nabla \times \mathbf{B})_{k}\]

Finally, if \(M_{ij}\) is a \(3 \times 3\) matrix (or tensor), then

\[\operatorname{det} M=\epsilon_{ijk} M_{1i} M_{2j} M_{3k}=\frac{1}{6} \epsilon_{ijk} \epsilon_{\ell mn} M_{i\ell} M_{jm} M_{kn} .\]
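Continuing the same hedged numerical sketch (my own, rebuilding the eps array from the previous snippet), the contraction identities and the determinant formula can be checked directly:

```python
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
delta = np.eye(3)

# eps_ijk eps_imn = delta_jm delta_kn - delta_jn delta_km  (summation over i)
lhs = np.einsum('ijk,imn->jkmn', eps, eps)
rhs = np.einsum('jm,kn->jkmn', delta, delta) - np.einsum('jn,km->jkmn', delta, delta)
assert np.allclose(lhs, rhs)

# eps_ijk eps_ijn = 2 delta_kn   and   eps_ijk eps_ijk = 6
assert np.allclose(np.einsum('ijk,ijn->kn', eps, eps), 2 * delta)
assert np.isclose(np.einsum('ijk,ijk->', eps, eps), 6.0)

# det M = eps_ijk M_1i M_2j M_3k
M = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
assert np.isclose(np.einsum('ijk,i,j,k->', eps, M[0], M[1], M[2]), np.linalg.det(M))
```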

The Levi-Civita symbol has been defined here only on \(\mathbb{R}^{3}\), but most of the properties above are easily generalized to \(\mathbb{R}^{n}\) (including the case \(n=2\) ). It only transforms as a tensor under proper orthogonal changes of coordinates, which is why we are calling it a "symbol" instead of a "tensor." It can, however, be used to create so-called tensor densities on arbitrary manifolds with a metric, and has fascinating applications in Hodge-de Rham theory in differential geometry.


As a postscript to that last note on the Levi-Civita symbol (you should be glad I have not touched on the dozens of notations for differential geometry, & all that), I just wanted to give the definition and properties of the \(n\)-dimensional case for Levi-Civita from Wikipedia: Levi-Civita symbol

\(n\)-Levi-Civita Definition:

More generally, in \(n\) dimensions, the Levi-Civita symbol is defined by:

\[ \varepsilon_{a_1 a_2 a_3 \ldots a_n} = \begin{cases} +1 & \text{if }(a_1, a_2, a_3, \ldots, a_n) \text{ is an even permutation of } (1, 2, 3, \dots, n) \\ -1 & \text{if }(a_1, a_2, a_3, \ldots, a_n) \text{ is an odd permutation of } (1, 2, 3, \dots, n) \\ 0 & \text{otherwise} \end{cases} \]

Thus, it is the sign of the permutation in the case of a permutation, and zero otherwise.

Using the capital pi notation \(\Pi\) for ordinary multiplication of numbers, an explicit expression for the symbol is:

\[ \begin{align*} \varepsilon_{a_1 a_2 a_3 \ldots a_n} & = \prod_{1 \leq i < j \leq n} \text{sgn} (a_j - a_i) \\ &=\text{sgn}(a_2 - a_1) \text{sgn}(a_3 - a_1) \dotsm \text{sgn}(a_n - a_1) \text{sgn}(a_3 - a_2) \text{sgn}(a_4 - a_2) \dotsm \text{sgn}(a_n - a_2) \dotsm \text{sgn}(a_n - a_{n-1}) \end{align*} \]

where the signum function (denoted \(\text{sgn}\)) returns the sign of its argument while discarding the absolute value if nonzero. The formula is valid for all index values, and for any \(n\) (when \(n = 0\) or \(n = 1\), this is the empty product). However, computing the formula above naively has a time complexity of \(O(n^2)\), whereas the sign can be computed from the parity of the permutation from its disjoint cycles in only \(O(n \log(n))\) cost.
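Here is a small, hedged illustration of my own (not from the Wikipedia article) putting the \(O(n^2)\) product-of-signs formula next to a direct parity count over the permutation:

```python
from math import prod

def levi_civita_sgn_product(indices):
    """epsilon_{a_1...a_n} via the product formula: prod over i<j of sgn(a_j - a_i)."""
    sgn = lambda v: (v > 0) - (v < 0)
    n = len(indices)
    return prod(sgn(indices[j] - indices[i]) for i in range(n) for j in range(i + 1, n))

def levi_civita_parity(indices):
    """epsilon_{a_1...a_n} via the parity of the permutation (0 if any index repeats)."""
    a = list(indices)
    n = len(a)
    if sorted(a) != list(range(1, n + 1)):
        return 0  # repeated index: not a permutation of (1, ..., n)
    swaps = 0
    for i in range(n):            # count the transpositions needed to sort
        while a[i] != i + 1:
            j = a[i] - 1
            a[i], a[j] = a[j], a[i]
            swaps += 1
    return -1 if swaps % 2 else 1

assert levi_civita_sgn_product((1, 2, 3)) == levi_civita_parity((1, 2, 3)) == 1
assert levi_civita_sgn_product((2, 1, 3)) == levi_civita_parity((2, 1, 3)) == -1
assert levi_civita_sgn_product((1, 1, 3)) == levi_civita_parity((1, 1, 3)) == 0
assert levi_civita_sgn_product((4, 3, 2, 1)) == levi_civita_parity((4, 3, 2, 1)) == 1
```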


Now, the properties of \(n\)-Levi-Civita are harder to rattle off because the \(n\)-case is more like the top rung of the Levi-Civita ladder. Where low-dimensional intuition is merely helpful for the definition and general understanding of Levi-Civita, such low-dimensional intuition is almost necessary for familiarity with the \(n\)-dimensional properties.

TLDR; this first list of bullets describes the tensor properties of Levi-Civita, but that might warrant a whole different post, so I paraphrased it, plus reduced the font size and colored it, to skip or read as you please.

  • A permutation tensor has components given by the Levi-Civita symbol in an orthonormal basis; it is a tensor of covariant rank \(n\).
  • The Levi-Civita symbol remains unchanged under pure rotations and in coordinate systems related by proper orthogonal transformations. However, it is classified as a pseudotensor: under an orthogonal transformation of determinant \(-1\) (e.g., a reflection) a true tensor of this type would change sign, while the Levi-Civita symbol stays the same.
  • Taking a cross product using the Levi-Civita symbol yields a pseudovector, not a vector.
  • Under a general coordinate change, the components of the permutation tensor are scaled by the Jacobian of the transformation matrix. This means that in different coordinate frames, the components can differ from those of the Levi-Civita symbol by an overall factor. In an orthonormal frame, the factor will be ±1 depending on the orientation.
  • In index-free tensor notation, the Hodge dual replaces the Levi-Civita symbol.
  • Einstein notation allows the elimination of summation symbols, where a repeated index implies summation over that index.

\[\varepsilon_{ijk} \varepsilon^{imn} \equiv \sum_{i=1,2,3} \varepsilon_{ijk} \varepsilon^{imn}\] In the following examples, Einstein notation is used.

Two dimensions

In two dimensions, when all \(i, \ j, \ m, \ n\) each take the values 1 and 2:

\[\varepsilon_{ij} \varepsilon^{mn} = {\delta_i}^m {\delta_j}^n - {\delta_i}^n {\delta_j}^m\]

\[\varepsilon_{ij} \varepsilon^{in} = {\delta_j}^n\]

\[\varepsilon_{ij} \varepsilon^{ij} = 2.\]

Three dimensions

In three dimensions, when all \(i, \ j, \ m, \ n\) each take values 1, 2, and 3:

\[\varepsilon_{ijk} \varepsilon^{pqk}=\delta_i{}^{p}\delta_j{}^q - \delta_i{}^q\delta_j{}^p\]

\[\varepsilon_{jmn} \varepsilon^{imn}=2{\delta_j}^i\]

\[\varepsilon_{ijk} \varepsilon^{ijk}=6.\]

Product

The Levi-Civita symbol is related to the Kronecker delta. In three dimensions, the relationship is given by the following equations (vertical lines denote the determinant):

\[\begin{align} \varepsilon_{ijk}\varepsilon_{lmn} &= \begin{vmatrix} \delta_{il} & \delta_{im} & \delta_{in} \\ \delta_{jl} & \delta_{jm} & \delta_{jn} \\ \delta_{kl} & \delta_{km} & \delta_{kn} \\ \end{vmatrix} \\[6pt] &= \delta_{il}\left( \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}\right) - \delta_{im}\left( \delta_{jl}\delta_{kn} - \delta_{jn}\delta_{kl} \right) + \delta_{in} \left( \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl} \right). \end{align}\]

A special case of this result is

\[\sum_{i=1}^3 \varepsilon_{ijk}\varepsilon_{imn} = \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}\]

sometimes called the "contracted epsilon identity".

In Einstein notation, the duplication of the \(i\) index implies the sum on \(i\). The previous is then denoted \( \varepsilon_{ijk}\varepsilon_{imn} = \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km} \).

\[\sum_{i=1}^3 \sum_{j=1}^3 \varepsilon_{ijk}\varepsilon_{ijn} = 2\delta_{kn}\]


Hopefully, that looked quite similar to the UC Berkeley physics exposition because here are the properties for the \(n\)-Levi-Civita case:

\(n\)-Levi-Civita Properties:

In \(n\) dimensions, when all \(i_1, \ldots, i_n, j_1, \ldots, j_n\) take values from \(1, 2, \ldots, n\):

  • \((A) \quad \varepsilon_{i_1 \dots i_n} \varepsilon^{j_1 \dots j_n} = \delta^{j_1 \dots j_n}_{i_1 \dots i_n} \)
  • \((B) \quad \varepsilon_{i_1 \dots i_k~i_{k+1} \dots i_n} \varepsilon^{i_1 \dots i_k~j_{k+1} \dots j_n} = \delta_{ i_1 \ldots i_k~i_{k+1} \ldots i_n}^{i_1 \dots i_k~j_{k+1}\ldots j_n} = k!~\delta^{j_{k+1} \dots j_n}_{i_{k+1} \dots i_n}\)
  • \((C) \quad \varepsilon_{i_1 \dots i_n}\varepsilon^{i_1 \dots i_n} = n!\)

where the exclamation mark (\(!\)) denotes the factorial, and \(\delta^{\alpha \ldots}_{\beta \ldots}\) is the generalized Kronecker delta. For any \(n\), the property:

\[\sum_{i, j, k, \ldots = 1}^n \varepsilon_{ijk\ldots}\varepsilon_{ijk\ldots} = n!\]

follows from the facts that:

  1. every permutation is either even or odd,
  2. \(1 = (+1)^2 = (-1)^2 = 1\), and
  3. the number of permutations of any \(n\)-element set is exactly \(n!\).

The particular case of property (B) above with \(k = n-2\) is:

\[\varepsilon_{i_1\ldots i_{n-2}jk}\varepsilon^{i_1\ldots i_{n-2}lm} = (n-2)!\left(\delta_j^l\delta_k^m - \delta_j^m\delta_k^l\right)\]

Product

In general, for \(n\) dimensions, the product of two Levi-Civita symbols can be written as:

\[\varepsilon_{i_1 i_2 \ldots i_n} \varepsilon_{j_1 j_2 \ldots j_n} = \begin{vmatrix} \delta_{i_1 j_1} & \delta_{i_1 j_2} & \ldots & \delta_{i_1 j_n} \\ \delta_{i_2 j_1} & \delta_{i_2 j_2} & \ldots & \delta_{i_2 j_n} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{i_n j_1} & \delta_{i_n j_2} & \ldots & \delta_{i_n j_n} \\ \end{vmatrix}\]

Proof: Both sides change signs upon switching two indices, so without loss of generality, assume \(i_1 \leq \ldots \leq i_n\) and \(j_1 \leq \ldots \leq j_n\). If some \(i_c = i_{c+1}\), then the left side is zero, and the right side is also zero since two of its columns are equal. The same applies if \(j_c = j_{c+1}\). Finally, if \(i_1 < \ldots < i_n\) and \(j_1 < \ldots < j_n\), then both sides equal 1.


Now, time to put that index notation to use!

Ha! Just kidding! What is a norm & why is it outside the scope of these notes?

To begin, a well-phrased question from Math Stack Exchange:

So if \(X\) is a vector space, and you define a norm, \(x \mapsto \| x \|\), on it, then the bounded subset, \(V = \{ x \in X: \|x\| < \infty \}\), is automatically a subspace.

This follows from the definition of a norm, so for all \(x,y \in V\), \(c \in \mathbb{C}\), \(\|x + c y\| \le \|x\| + \|cy\| = \|x\|+ |c| \|y\| < \infty\), so \(x+cy \in V\).

Is this the reasoning behind the definition of a norm?

Now, the top answer to this MSE question is from QC in 2011. While I love the guy, it is basically just links to some functional analysis resources. So, no.

There is a better answer from another MSE question:

Norms are inspired by the Euclidean distance function and refer to a generalized class of metrics \(d\) which, for a normed linear space \(V\), satisfy the properties:

  • \(d(a,b) = d(a-b,0) = d(0,b-a) \quad \forall \, a,b \in V\)
  • \(d(\lambda u,0) = |\lambda| d(u,0) \quad \forall \, u \in V\)
  • \(d(a,b) \le d(a,c)+d(c,b)\)
  • \(d(a,b) \ge 0 \quad \) with equality \( \Leftrightarrow a=b.\)


Ok, I encourage you to read more about norms despite the looming shadow of functional analysis just like I encourage you to learn more about measures despite the looming shadow of real analysis.

A NOTE FROM THE FUTURE: (this is intertextuality, not grift) I discuss this more in notes - cont. measure & intro prob

Definition 4:

TLDR; Prof. Peter Kuchment says differentials are matrices (linear transforms), & for increments of small norm the differential is just the first-order Taylor polynomial, oh my.

Even though Prof. Peter Kuchment does alright, I tried to clarify this a bit...

A differentiable function \(F: \mathbb{R}^m \rightarrow \mathbb{R}^n\) maps points from an \(m\)-dimensional space to an \(n\)-dimensional space. For example, \(F\) could be a mapping from a 3D space to a 2D space.

The differential \(DF(y)\) of the function \(F\) at a point \(y\) is a linear mapping from \(\mathbb{R}^m\) to \(\mathbb{R}^n\). In other words, it is a linear transformation that takes vectors in the \(m\)-dimensional space as input and produces vectors in the \(n\)-dimensional space as output.

The matrix representation of \(DF(y)\) is given by the expression \(\{DF\}_{ij}(y) = \frac{\partial F_i}{\partial x_j}(y)\), where \(\{DF\}_{ij}(y)\) represents the element in the \(i\)-th row and \(j\)-th column of the matrix. This element is calculated by taking the partial derivative of the \(i\)-th component of \(F\) with respect to the \(j\)-th variable evaluated at point \(y\).

For a vector \(x\) with small norm, \(DF(y)x\) represents the linear approximation of the change in the function \(F\) between the points \(y + x\) and \(y\). This approximation is given by the formula \(F(y + x) = F(y) + DF(y)x + o(|x|)\), where \(o(|x|)\) denotes a term that is "smaller" than \(|x|\) as \(x\) approaches the zero vector. This formula is known as the Taylor formula of the first order, or the linearization formula.
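Here is a small numerical sketch of my own (assuming NumPy, and with a made-up map \(F:\mathbb{R}^3\to\mathbb{R}^2\)) of the differential as the matrix of partial derivatives and of the first-order Taylor formula:

```python
import numpy as np

def F(p):
    """A made-up differentiable map from R^3 to R^2, used only for illustration."""
    x, y, z = p
    return np.array([x * y + np.sin(z), x**2 - z])

def jacobian(F, y, h=1e-6):
    """Central finite-difference approximation of DF(y): {DF}_ij = dF_i / dx_j."""
    y = np.asarray(y, dtype=float)
    n, m = len(F(y)), len(y)
    J = np.zeros((n, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = h
        J[:, j] = (F(y + e) - F(y - e)) / (2 * h)
    return J

y0 = np.array([1.0, 2.0, 0.5])
DF = jacobian(F, y0)                     # a 2x3 matrix: a linear map R^3 -> R^2
x = 1e-3 * np.array([1.0, -2.0, 0.5])    # an increment of small norm
# F(y0 + x) ~ F(y0) + DF(y0) x, with an error that is o(|x|)
print(np.linalg.norm(F(y0 + x) - (F(y0) + DF @ x)))  # much smaller than |x|
```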

Theorem 5: Existence and Uniqueness Theorem.

This theorem states that if we have an open domain \(\Omega\) in \(\mathbb{R}^n\), an open segment \((a, b)\) in the real line \(\mathbb{R}\), and continuous functions \(F(t, x)\) and \(D_xF(t, x)\) defined in \((a, b) \times \Omega\), then for any initial point \((t_0, x_0)\) in \((a, b) \times \Omega\), there exists a unique solution \(x(t)\) to the initial value problem (IVP) \(\frac{dx}{dt} = F(t, x)\) with the initial condition \(x(t_0) = x_0\). This solution \(x(t)\) is defined in a neighborhood of \(t_0\).

Remark 6:

  1. The theorem is local: it guarantees the existence and uniqueness of the solution only in a neighborhood of \(t_0\), not on the entire interval \((a, b)\).
  2. The proof of the theorem shows that Lipschitz continuity of \(F\) in \(x\) is sufficient for guaranteeing uniqueness. Lipschitz continuity means that there exists a constant \(K\) such that \(|F(t, x) - F(t, y)| \leq K|x - y|\) for all \((t, x)\) and \((t, y)\) in the domain. This condition is weaker than the continuous differentiability of \(F\) with respect to \(x\) required in the theorem.

Now about the proof: contraction mappings and such

Some notions and notations:

For a continuous function \( x: [a, b] \rightarrow \mathbb{R}^n \) on a finite segment \([a, b]\), we denote by

\( \| x \| = \max_{t \in [a, b]} \| x(t) \| \quad (10) \)

its norm in the space of such continuous functions, where \( \| x(t) \| \) is the Euclidean norm of a vector in \( \mathbb{R}^n \).

In the given context, we have a continuous function \(x: [a, b] \rightarrow \mathbb{R}^n\), where \([a, b]\) represents a finite segment. The norm of this function, denoted as \(\| x \|\), is defined as the maximum value of the norm of \(x(t)\) for \(t\) in the interval \([a, b]\) (equation 10). The norm of a vector \(x(t)\) in \(\mathbb{R}^n\) is typically calculated using the Euclidean norm, which measures the length of the vector in a geometric sense.

The notation \(C^r\) refers to a class of functions that are \(r\) times continuously differentiable. This notation can be applied to both scalar-valued and vector-valued functions, and the number of variables can vary. In other words, a function belonging to the \(C^r\) class can be differentiated up to \(r\) times while remaining continuous. The value of \(r\) indicates the degree of smoothness or differentiability of the function. Therefore, when encountering the notation \(C^r\) in the context of a specific function, it signifies that the function possesses a certain level of continuity and differentiability.

Definition 7:

TLDR; a real-valued function \( A(x) \) is considered a contraction if it fulfills the inequality below, which says that applying \( A \) shrinks the distance between any two points by at least the factor \( K \), where the constant \( K \) must be less than 1.

A real-valued function \( A(x) \) on \( \mathbb{R} \) is a contraction if it satisfies the inequality

\[ | A(x) - A(y) | \leq K | x - y | \]

for some \( K < 1 \) and all real \( x \) and \( y \).

Remark 8:

  1. The condition of continuous differentiability of \( A \) together with the estimate \( | A'(x) | \leq K < 1 \) guarantees that \( A \) is a contraction.
  2. The definition of a contraction can be naturally extended to any metric space \( M \) with a metric (distance function) \( \rho \) instead of the real line, replacing \( | A(x) - A(y) | \), \( | x - y | \) above with \( \rho(A(x), A(y)) \), \( \rho(x, y) \).

A simple instance of the contraction mapping principle:

Theorem 9: Let \( A(x) \) be a contraction on \( \mathbb{R} \). Then:

  1. The equation \( x = A(x) \) has a unique solution \( x^* \) (called the fixed point of \( A(x) \)).
  2. This fixed point can be found as \( x^* = \lim_{i \to \infty} x_i \), where \( x_0 \) is arbitrary and \( x_{i+1} = A(x_i) \).

The general contraction mapping principle:

In the given context, let \(X\) be a metric space, which means it is a set equipped with a metric or "distance" function \(\rho(x, y)\). The metric satisfies the following properties: \(\rho\) is non-negative (\(\rho \geq 0\)), symmetric (\(\rho(x, y) = \rho(y, x)\)), \(\rho(x, y) = 0\) only if \(x = y\), and it satisfies the triangle inequality (\(\rho(x, y) \leq \rho(x, z) + \rho(y, z)\)).

Furthermore, assume that \(X\) is a complete metric space. This means that if a sequence \((x_n)\) is such that \(\rho(x_n, x_m) \to 0\) as \(n, m \to \infty\), then there exists a limit \(x\) such that \(\rho(x_n, x) \to 0\). In other words, all Cauchy sequences in \(X\) converge to a point in \(X\).

Now, consider a mapping \(A: X \to X\). This mapping is called a contraction if it satisfies the inequality \(\rho(A(x), A(y)) \leq K \rho(x, y)\) for some constant \(K < 1\) and for all \(x, y\) in \(X\). This inequality essentially states that applying the mapping \(A\) reduces the distance between any two points in \(X\) by a factor of \(K\) or less.

Theorem 10

  1. If \(A(x)\) is a contraction on the complete metric space \(X\), then the equation \(x = A(x)\) has a unique solution \(x^*\) in \(X\). This solution is referred to as the fixed point of \(A(x)\).
  2. The fixed point \(x^*\) can be found by starting with an arbitrary point \(x_0\) and repeatedly applying the mapping \(A\) to obtain a sequence \((x_i)\). The fixed point \(x^*\) is the limit of this sequence, \(\lim_{i \to \infty} x_i\).

Basically, Theorem 10 guarantees the existence and uniqueness of a fixed point for a contraction mapping on a complete metric space. It provides a method for finding the fixed point by iteratively applying the mapping and converging to the fixed point through a sequence of points.
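As a quick, hedged illustration of Theorems 9 and 10 (my own example, not from the notes): \( A(x) = \cos x \) is a contraction on \([0, 1]\), since it maps \([0, 1]\) into itself and \( |A'(x)| = |\sin x| \leq \sin 1 < 1 \), so iterating it converges to the unique fixed point:

```python
import math

def fixed_point(A, x0, tol=1e-12, max_iter=1000):
    """Find x* with x* = A(x*) by iterating x_{i+1} = A(x_i), as in Theorem 10."""
    x = x0
    for _ in range(max_iter):
        x_next = A(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star) - x_star)  # ~0.7390851..., residual ~0
```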

An equivalent integral equation reformulation of the IVP \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\):

\( x(t) = x_0 + \int_{t_0}^t F(\tau, x(\tau)) d\tau \quad (11)\)

Lemma 11: Continuous solutions of \( x(t) = x_0 + \int_{t_0}^t F(\tau, x(\tau)) d\tau \quad (11)\) are exactly the continuously differentiable solutions of \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\).

Now the proof of Theorem 5 would be concluded if we prove the existence and uniqueness of continuous solutions of \( x(t) = x_0 + \int_{t_0}^t F(\tau, x(\tau)) d\tau \quad (11)\) on a small segment around \( t_0 \).

The metric space: Consider the interval \([t_0 - d, t_0 + d] \subset (a, b)\) with a small \( d \) (how small it should be will be determined later). We also consider a ball \( B = \{ x \in \mathbb{R}^n \,|\, \| x - x_0 \| \leq r \} \) that is entirely contained in \( \Omega \). Now define, on the set \( X \) of all continuous functions \( x(t) \) from \([t_0 - d, t_0 + d]\) to \( B \), the max norm \(\| x \| = \max_{t \in [t_0 - d, t_0 + d]} \| x(t) \|\) as in (10) and the corresponding metric \( \rho(x, y) = \| x - y \| \).

Define the following integral operator \( x \to A(x) \):

\[ [A(x)](t) = x_0 + \int_{t_0}^t F(\tau, x(\tau)) d \tau \quad (12) \]

Note that this definition works for functions that map \([t_0 - d, t_0 + d]\) to \( B \).

Lemma 12: For a sufficiently small \( d \), the operator \( A(x) \) maps the above class of functions into itself and is a contraction, i.e., \( \| A(x) - A(y) \| \leq k \| x - y \| \) for some \( k < 1 \).

Corollary 13:

  1. The integral equation \( x(t) = x_0 + \int_{t_0}^t F(\tau, x(\tau)) d\tau \quad (11)\) has a unique continuous solution in a neighborhood of \( t_0 \).
  2. This solution can be found as the limit in the norm \( \| x \| = \max_{t \in [a, b]} \| x(t) \| \quad (10) \) of Picard iterations (sketched numerically below):

\[ y_{i+1}(t) = x_0 + \int_{t_0}^t F(\tau, y_i(\tau)) d\tau \quad (14) \]

where \( y_0 \) can be chosen arbitrarily in such a way that \( y_0(t_0) = x_0 \), e.g., \( y_0 \equiv x_0 \).

  3. The Uniqueness and Existence Theorem 5 is proven.
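Here is a minimal numerical sketch of my own (assuming NumPy) of the Picard iterations (14) for the simple IVP \( x' = x \), \( x(0) = 1 \), whose exact solution is \( e^t \); each iteration integrates the previous approximation on a grid:

```python
import numpy as np

def picard(F, t0, x0, d=0.5, n_grid=2001, n_iter=8):
    """Picard iterations y_{i+1}(t) = x0 + int_{t0}^t F(tau, y_i(tau)) dtau
    on the segment [t0 - d, t0 + d], discretized on a uniform grid."""
    t = np.linspace(t0 - d, t0 + d, n_grid)
    y = np.full_like(t, x0)       # y_0 == x0, as suggested in the notes
    i0 = n_grid // 2              # index of t0 (the grid is symmetric around it)
    for _ in range(n_iter):
        f = F(t, y)
        # cumulative trapezoid integral from t[0], then shifted so it starts at t0
        cumulative = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(t))))
        y = x0 + cumulative - cumulative[i0]
    return t, y

t, y = picard(lambda t, x: x, t0=0.0, x0=1.0)
print(np.max(np.abs(y - np.exp(t))))   # small after just a few iterations
```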

An existence theorem:

Theorem 14: Peano's existence theorem. Continuity of \( F \) alone guarantees local existence of a solution of the IVP \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\).

Remark 15:

  1. In the proof of Peano's theorem, the solution is also found as the limit of a sequence of functions, but rather than Picard's iterations, a sequence of Euler's piecewise-linear functions is constructed (recall Euler's method of numerical solution).
  2. The IVP \(\frac{dx}{dt} = 3x^{2/3}\), \(x(0) = 0\) shows that the conditions of Peano's theorem cannot guarantee uniqueness: both \(x(t) \equiv 0\) and \(x(t) = t^3\) solve it. Both points are illustrated in the sketch below.
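A hedged sketch of my own (assuming NumPy): Euler's piecewise-linear method applied to this IVP happily follows the solution \( x \equiv 0 \), even though \( x = t^3 \) is just as valid; the right-hand side is continuous but not Lipschitz at \( x = 0 \):

```python
import numpy as np

def euler(F, t0, x0, t_end, n_steps=1000):
    """Euler's piecewise-linear approximation of x' = F(t, x), x(t0) = x0."""
    t = np.linspace(t0, t_end, n_steps + 1)
    x = np.empty_like(t)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = x[i] + (t[i + 1] - t[i]) * F(t[i], x[i])
    return t, x

F = lambda t, x: 3 * abs(x) ** (2 / 3)   # continuous, but not Lipschitz at x = 0
t, x = euler(F, 0.0, 0.0, 2.0)
print(x[-1])       # 0.0 -- Euler's polygon picks out the solution x(t) = 0 ...
print(t[-1] ** 3)  # 8.0 -- ... but x(t) = t^3 also solves the same IVP
```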

Geometry of ODEs: vector fields

Consider the autonomous case:

\[ \frac{dx}{dt} = F(x), \quad x(t) \in \mathbb{R}^n \quad (15) \]

We can think that \( F(x) \) assigns to each point \( x \) a vector \( F(x) \) (a vector "grows" out of any point). Then we call \( F(x) \) a vector field. We will consider at least continuous (or smoother) functions \( F(x) \) and corresponding vector fields. If \( F \) is of some class \( C^r \), we will also say that the field is of this class.

Lemma 16:

Trajectories of solutions of (15) are exactly the curves that are tangent at each point to the vector field corresponding to this equation. Such curves are called phase curves of the field.

Note that vector fields are NOT defined for non-autonomous systems. The field \( F(x) \) is said to be non-singular at a point \( x_0 \), if \( F(x_0) \neq 0 \).

Exercise 17:

Show that the fields arising from turning a non-autonomous system into autonomous ones are always non-singular at all points. Example of a non-singular vector field: a constant vector field, where \( F(x) \) is a constant non-zero vector.

Question 18: Is the existence and uniqueness theorem obvious for a constant vector field?

A diffeomorphism of class \( C^r \) is a mapping \( G \) from one domain onto another that is one-to-one and such that both \( G \) and its inverse \( G^{-1} \) are mappings of class \( C^r \). In other words, a diffeomorphism smoothly deforms the domain. At each point \( x \), the differential \( (DG)(x) \) is an invertible linear mapping of vectors in \( \mathbb{R}^n \). One can act by diffeomorphisms on vector fields as well. One can come up with the right definition using the following heuristics: let \( x(t) \) be a solution of the equation defined by our vector field, \( x' = F(x) \). We can act on this solution by our diffeomorphism to get a new function \( x_G(t) = G(x(t)) \). Then the chain rule gives

\[ x_G' = (DG)(x)\, x' = (DG)(x) F(x) = (DG)(G^{-1}(x_G)) F(G^{-1}(x_G)) \]

In other words, the \( G \)-modified function \( x_G \) satisfies the ODE \( y' = F_G(y) \) with the vector field \( F_G(y) = (DG)(G^{-1}(y)) F(G^{-1}(y)) \).

Definition 19:

TLDR; this is about creating a new vector field called \( F_G(x) \). This is done by taking an existing vector field \( F(x) \) and a special transformation called a diffeomorphism \( G \). We take the point \( x \), pull it back with the inverse map to get \( G^{-1}(x) \), use \( G^{-1}(x) \) as an input to \( F \) to get the vector \( F(G^{-1}(x)) \), and finally apply the differential of the diffeomorphism, \( DG \), to that vector to get the new vector \( F_G(x) \).

Let \( F(x) \) be a vector field and \( G \) be a diffeomorphism. Then one defines a new vector field as follows: \( F_G(x) = (DG)(G^{-1}(x)) F(G^{-1}(x)) \).
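A quick worked example of Definition 19 (my own, one-dimensional): take \( F(x) = x \) (the ODE \( x' = x \)) and the diffeomorphism \( G(x) = e^x \) of \( \mathbb{R} \) onto \( (0, \infty) \). Then \( G^{-1}(y) = \ln y \), \( (DG)(x) = e^x \), and

\[ F_G(y) = (DG)(G^{-1}(y))\, F(G^{-1}(y)) = e^{\ln y} \ln y = y \ln y, \]

which agrees with the chain rule: if \( y(t) = e^{x(t)} \) and \( x' = x \), then \( y' = e^x x' = y \ln y \).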

Theorem 20: Vector Field Rectification Theorem.

Any vector field of class \( C^r \), in a neighborhood of any of its non-singular points \( x_0 \), can be reduced to a constant field ("rectified") by applying a diffeomorphism of class \( C^r \). One of the exercises is to show that the rectification theorem implies the existence and uniqueness theorem.

Question: Can one do the converse, i.e., get an idea of the local construction of the rectifying diffeomorphism from a known solution?

Dependence of solutions on parameters and initial data.

The solution of the IVP \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\) depends on the values of \( t_0 \) and \( x_0 \). How smooth is this dependence? Another important question: Assume that the right hand side (the vector field) also depends on some parameter(s) \( \mu \):

\[ \frac{dx}{dt} = F(t, x, \mu), \quad x(t_0) = x_0. \quad (16) \]

How smoothly does the solution depend on the parameter? In fact, it can be seen that dependence on the initial data reduces to dependence on parameters. Indeed, introducing a new time variable \( \tau = t - t_0 \) and a new spatial variable \( y = x - x_0 \), one reduces \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\) to:

\[ \frac{dy}{d\tau} = F(\tau + t_0, y + x_0), \quad y(0) = 0. \quad (17) \]

Now all variable parameters are in the right hand side rather than in the initial data (which become constant). So, this is the only case to handle.

Theorem 21: Let the vector field \( F(x, \mu) \) (where \( \mu \) belongs to an open domain of a space \( \mathbb{R}^m \)) be of class \( C^r \). Let also \( F(x_0, \mu_0) \neq 0 \). Then the (unique) solution \( x(t, t_0, y_0, \mu) \) of the IVP:

\[ \frac{dx}{dt} = F(x, \mu), \quad x(t_0) = y_0. \quad (18) \]

depends differentiably (of class \( C^r \)) on \( (t, t_0, y_0, \mu) \) for sufficiently small \( |t - t_0| \), \( |y_0 - x_0| \), \( |\mu - \mu_0| \).

Extendability of local solutions.

Our theorems guaranteed the existence of a local solution only, with no guarantee of how long it will survive. Simple examples show the disappearance of solutions into a singular point. Even without singular points, a solution curve can grow fast and disappear in finite time. An example is the IVP \( \frac{dx}{dt} = x^2 \), \( x(0) = 1 \), which has the solution \( x(t) = \frac{1}{1 - t} \); it escapes to infinity as \( t \) approaches 1. Are there any other options? Answer: no.
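A hedged numerical sketch of my own of this blow-up (assuming SciPy is available): the adaptive solver stalls just short of \( t = 1 \), because \( x(t) = 1/(1-t) \) leaves every bounded set.

```python
from scipy.integrate import solve_ivp

# dx/dt = x^2, x(0) = 1 has the exact solution x(t) = 1 / (1 - t), blowing up at t = 1.
sol = solve_ivp(lambda t, x: x**2, t_span=(0.0, 2.0), y0=[1.0], rtol=1e-10, atol=1e-12)

print(sol.status, sol.message)   # expect a failure status: the step size collapses near t = 1
print(sol.t[-1], sol.y[0, -1])   # t stalls just below 1 while x(t) explodes
```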

Theorem 22: Extendability Theorem.

Let \( N \) be a compact (bounded closed) subset in \( \Omega \) (the domain where the smooth field is defined). Let also \( F \) have no singular points in \( N \). Then any local solution of \( \frac{dx}{dt} = F(t, x),\ x(t_0) = x_0 \quad (8)\) in \( (a, b) \times \Omega \) can be extended forward (for \( t > t_0 \)) and backward (for \( t < t_0 \)) either indefinitely or until it reaches the boundary of \( N \).

Boundary value problems (BVPs)

  • Here the conditions are imposed at both ends of a time interval, rather than at one end only, as in the IVP case.
  • Important applications.
  • The number of conditions should still be correct (depending on the order of the system and the number of unknown functions).
  • No such nice existence and uniqueness theorem (a quick illustration follows below).
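A standard illustration of that last point (my own addition, not from Prof. Kuchment's sketch): for \( x'' + x = 0 \) with \( x(0) = 0 \), every solution satisfying the left-hand condition has the form \( x(t) = c \sin t \). The boundary condition \( x(\pi) = 0 \) is then satisfied for every \( c \) (infinitely many solutions), while \( x(\pi) = 1 \) is satisfied by none, even though the equation and the number of conditions look perfectly reasonable. The corresponding IVP, by contrast, has exactly one solution by Theorem 5.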