1 Introduction

This review discusses fundamental tools from the analytical and numerical theory underlying the Einstein field equations as an evolution problem on a finite computational domain. Reaching the current status of numerical relativity after decades of effort has driven the community not only to use state-of-the-art techniques, but also to extend them and to develop new approaches and methodologies of its own. This review discusses some of the theory involved in setting up the problem and numerical approaches for solving it. Its scope is rather broad: it ranges from analytical aspects related to the well-posedness of the Cauchy problem to numerical discretization schemes guaranteeing stability and convergence to the exact solution.

At the continuum, emphasis is placed on setting up the initial-boundary value problem (IBVP) for Einstein’s equations properly, by which we mean obtaining a well-posed formulation that is flexible enough to incorporate coordinate conditions allowing for long-term, accurate and stable numerical evolutions. Here, the well-posedness property is essential, in that it guarantees the existence of a unique solution, which depends continuously on the initial and boundary data. In particular, this ensures that small perturbations in the data do not get arbitrarily amplified. Since such small perturbations do appear in numerical simulations because of discretization errors or finite machine precision, if such unbounded growth were allowed, the numerical solution would not converge to the exact one as resolution is increased. This picture is at the core of Lax’s celebrated theorem, which implies that consistency of a numerical scheme is not sufficient for its solution to converge to the exact one. Instead, the scheme also needs to be numerically stable, a property that is the discrete counterpart of well-posedness of the continuum problem.

While the well-posedness of the Cauchy problem in general relativity in the absence of boundaries was established a long time ago, only relatively recently has the IBVP been addressed and well-posed problems formulated. This is mainly due to the fact that the IBVP presents several new challenges, related to constraint preservation, the minimization of spurious reflections, and well-posedness. In fact, it is only very recently that such a well-posed problem has been found for a metric-based formulation used in numerical relativity, and there are still open issues that need to be sorted out. It is interesting to point out that the IBVP in general relativity has driven research leading to well-posedness results for second-order systems with a large new class of boundary conditions, which, in addition to Einstein’s equations, are also applicable to Maxwell’s equations in their potential formulation.

At the discrete level, the focus of this review is mainly on designing numerical schemes for which fast convergence to the exact solution is guaranteed. Unfortunately, few general results, if any, are known for nonlinear equations and, therefore, we concentrate on schemes for which stability and convergence can be shown at least at the linear level. If the exact solution is smooth, as expected for vacuum solutions of Einstein’s field equations with smooth initial data and appropriate gauge conditions, at least as long as no curvature singularities form, it is not unreasonable to expect that schemes guaranteeing stability at the linearized level, perhaps with some additional filtering, are also stable for the nonlinear problem. Furthermore, since the solutions are expected to be smooth, emphasis is placed here on using fast-converging space discretizations, such as high-order finite-difference or spectral methods, especially those which can be applied to multi-domain implementations.

The organization of this review is as follows. Section 3 starts with a discussion of well-posedness for initial-value problems for evolution problems in general, with special emphasis on hyperbolic ones, including their algebraic characterization. Next, in Section 4 we review some formulations of Einstein’s equations, which yield a well-posed initial-value problem. Here, we mainly focus on the harmonic and BSSN formulations, which are the two most widely used ones in numerical relativity, as well as the ADM formulation with different gauge conditions. Actual numerical simulations always involve the presence of computational boundaries, which raises the need of analyzing well-posedness of the IBVP. For this reason, the theory of IBVP for hyperbolic problems is reviewed in Section 5, followed by a presentation of the state of the art of boundary conditions for the harmonic and BSSN formulations of Einstein’s equations in Section 6, where open problems related with gauge uniqueness are also described.

Section 7 reviews some of the numerical stability theory, including necessary eigenvalue conditions, which are quite useful in practice for analyzing complicated systems or discretizations. We also discuss necessary and sufficient conditions for stability within the method of lines, and Runge-Kutta methods. Sections 8 and 9 are devoted to two classes of spatial approximations: finite differences and spectral methods. Finite differences are rather standard and widespread, so in Section 8 we mostly focus on the construction of optimized operators of arbitrarily high order satisfying the summation-by-parts property, which is useful in stability analyses. We also briefly mention classical polynomial interpolation and how to systematically construct finite-difference operators from it. In Section 9 we present the main elements and theory of spectral methods, including spectral convergence from solutions to Sturm-Liouville problems, expansions in orthogonal polynomials, Gauss quadratures, spectral differentiation, and spectral viscosity. We present several explicit formulae for the families of polynomials most widely used: Legendre and Chebyshev. Section 10 describes boundary closures, which in the present context refer to procedures for imposing boundary conditions leading to stability results. We emphasize the penalty technique, which applies both to finite-difference methods of arbitrarily high order and to spectral ones, as well as to outer and interface boundaries, such as those appearing when there are multiple grids in domain decompositions for complex geometries. We also discuss absorbing boundary conditions for Einstein’s equations. Finally, Section 11 presents a sample of approaches in numerical relativity using multiple, semi-structured grids, and/or curvilinear coordinates. In particular, some of these examples illustrate many of the methods discussed in this review in realistic simulations.

There are many topics related to numerical relativity that are not covered by this review. It does not include discussions of physical results in general relativity obtained through numerical simulations, such as critical phenomena or gravitational waveforms computed from binary black-hole mergers. For reviews on these topics we refer the reader to [223] and [337, 122], respectively. See also [9, 45] for recent books on numerical relativity. We also do not discuss setting up initial data and solving the Einstein constraints, and refer to [133] instead. For reviews on the characteristic and conformal approaches, which are only briefly mentioned in Section 6.4, we refer the reader to [432] and [172], respectively. Most of the results specific to Einstein’s field equations in Sections 4 and 6 apply to four-dimensional gravity only, though it should be possible to generalize some of them to higher-dimensional theories. Also, as we have already mentioned, the results described here mostly apply to the vacuum field equations, in which case the solutions are expected to be smooth. For aspects involving the presence of shocks, such as those arising in relativistic hydrodynamics, we refer the reader to [165, 295]. Finally, see [352] for a more detailed review on hyperbolic formulations of Einstein’s equations, and [351] for one on global existence theorems in general relativity. Spectral methods in numerical relativity are discussed in detail in [215]. The 3+1 approach to general relativity is thoroughly reviewed in [214]. We also refer the reader to [126] for a recent book on general relativity and the Einstein equations, which, among many other topics, discusses local and global aspects of the Cauchy problem, the constraint equations, and self-gravitating matter fields such as relativistic fluids and the relativistic kinetic theory of gases.

Except for a few historical remarks, this review does not discuss much of the historical path to the techniques and tools presented, but rather describes the state of the art of a subset of those which appear to be useful. Our choice of topics is mostly influenced by those for which some analysis is available or possible.

We have tried to make each section as self-contained as possible within the scope of a manageable review, so that they can be read separately, though each of them builds on the previous ones. Numerous examples are included.

2 Notation and Conventions

Throughout this article, we use the following notation and conventions. For a complex vector u ∈ ℂm, we denote by u* its transposed, complex conjugate, such that u · v := u*v is the standard scalar product for two vectors u, v ∈ ℂm. The corresponding norm is defined by \(\vert u\vert := \sqrt {{u^\ast}u}\). The norm of a complex, m × k matrix A is

$$\vert A\vert : = \sup\limits_{u \in {{\mathbb {C}}^k}\backslash \{0\}} {{\vert Au\vert} \over {\vert u\vert}}.$$

The transposed, complex conjugate of A is denoted by A*, such that v · (Au) = (A*v) · u for all u ∈ ℂk and v ∈ ℂm. For two Hermitian m × m matrices A = A* and B = B*, the inequality A ≤ B means u · Au ≤ u · Bu for all u ∈ ℂm. The identity matrix is denoted by I.

The spectrum of a complex, m × m matrix A is the set of all eigenvalues of A,

$$\sigma (A): = \{\lambda \in {\mathbb C}:\lambda I - A\;{\rm{is\;not\;invertible}}\} ,$$

which is real for Hermitian matrices. The spectral radius of A is defined as

$$\rho (A): = \max \{\vert \lambda \vert :\lambda \in \sigma (A)\}.$$

Then, the matrix norm |B| of a complex m × k matrix B can also be computed as \(\vert B\vert = \sqrt {\rho ({B^\ast}B)}\).
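As a quick numerical illustration (our own sketch in Python/NumPy; the matrix B below is an arbitrary example), this identity can be checked against the built-in matrix 2-norm:

```python
import numpy as np

# Arbitrary complex 3 x 2 example matrix (hypothetical data).
B = np.array([[1.0 + 2.0j, 0.5],
              [0.0, -1.0j],
              [2.0, 1.0 + 1.0j]])

# Spectral radius of B*B, where B* is the conjugate transpose.
rho = max(abs(np.linalg.eigvals(B.conj().T @ B)))

print(np.sqrt(rho))          # |B| computed via the spectral radius
print(np.linalg.norm(B, 2))  # NumPy's matrix 2-norm; the two values agree
```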

Next, we denote by L2(U) the class of measurable functions ƒ: U ⊂ ℝn → ℂm on the open subset U of ℝn, which are square-integrable. Two functions ƒ, g ∈ L2(U), which differ from each other only on a set of measure zero, are identified. The scalar product on L2(U) is defined as

$$\left\langle {f,\,g} \right\rangle : = \int\limits_U f {(x)^{\ast}}g(x){d^n}x,\quad f,g \in {L^2}(U),$$

and the corresponding norm is \(\Vert f\Vert := \sqrt {\langle f,f\rangle}\). According to the Cauchy-Schwarz inequality we have

$$\vert \left\langle {f,g} \right\rangle \vert \leq \left\Vert f \right\Vert \left\Vert g \right\Vert ,\quad f,g \in {L^2}(U).$$

The Fourier transform of a function ƒ, belonging to the class \(C_0^\infty ({{\rm{\mathbb R}}^n})\) of infinitely-differentiable functions with compact support, is defined as

$$\hat f(k): = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{- ik \cdot x}}} f(x){d^n}x,\quad k \in {{\mathbb R}^n}.$$

According to Parseval’s identities, \(\langle \hat f,\hat g\rangle = \langle f,g\rangle\) for all \(f,g \in C_0^\infty ({{\rm{\mathbb R}}^n})\), and the map \(C_0^\infty ({{\rm{{\mathbb R}}}^n}) \rightarrow {L^2}({{\rm{{\mathbb R}}}^n}),f \mapsto \hat f\) can be extended to a linear, unitary map Ƒ : L2(ℝn) → L2(ℝn) called the Fourier-Plancherel operator; see, for example, [346]. Its inverse is given by \({{\mathcal F}^{- 1}}(f)(x) = \hat f(- x)\) for ƒ ∈ L2(ℝn) and x ∈ ℝn.
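The discrete counterpart of Parseval’s identity can be verified with the fast Fourier transform. The following sketch (our own illustration; the grid parameters and the Gaussian are arbitrary choices) approximates the transform above, with its 1/(2π)^{n/2} convention, on a periodic grid and compares ‖ƒ‖² computed in physical and in Fourier space:

```python
import numpy as np

N, L = 1024, 40.0                  # number of grid points and domain length
dx = L / N
x = -L / 2 + dx * np.arange(N)     # uniform grid on [-L/2, L/2)
f = np.exp(-x**2 / 2)              # Gaussian; effectively compactly supported

# Approximate continuous transform: f_hat(k_m) ~ dx/sqrt(2 pi) * sum_j e^{-i k_m x_j} f_j
k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
f_hat = dx / np.sqrt(2.0 * np.pi) * np.exp(-1j * k * x[0]) * np.fft.fft(f)
dk = 2.0 * np.pi / L               # spacing of the discrete wave numbers

print(np.sum(abs(f)**2) * dx)      # ||f||^2 in physical space
print(np.sum(abs(f_hat)**2) * dk)  # ||f_hat||^2 in Fourier space; agrees to grid accuracy
```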

For a differentiable function u, we denote by ut, ux, uy, uz its partial derivatives with respect to t, x, y, z.

Indices labeling gridpoints and number of basis functions range from 0 to N. Superscripts and subscripts are used to denote the numerical solution at some discrete timestep and gridpoint, as in

$$v_j^k: = v({t_k},{x_j}).$$

We use boldface fonts for gridfunctions, as in

$${{\mathbf {v}}^k}: = \{v({t_k},{x_j})\} _{j = 0}^N.$$

3 The Initial-Value Problem

We start here with a discussion of hyperbolic evolution problems on the infinite domain ℝn. This is usually the situation one encounters in the mathematical description of isolated systems, where some strong-field phenomena take place “near the origin” and generate waves, which are emitted toward “infinity”. Therefore, the goal of this section is to analyze the well-posedness of the Cauchy problem for quasilinear hyperbolic evolution equations without boundaries. The case with boundaries is the subject of Section 5. As mentioned in the introduction (Section 1), the well-posedness results are fundamental in the sense that they give existence (at least local in time if the problem is nonlinear) and uniqueness of solutions, and show that these depend continuously on the initial data. Of course, how the solution actually appears in detail needs to be established by more sophisticated mathematical tools or by numerical experiments, but it is clear that it does not make sense to speak about “the solution” if the problem is not well posed.

Our presentation starts with the simplest case of linear constant coefficient problems in Section 3.1, where solutions can be constructed explicitly using Fourier transform. Then, we consider in Section 3.2 linear problems with variable coefficients, which we reduce to the constant coefficient case using the localization principle. Next, in Section 3.3, we treat first-order quasilinear equations, which we reduce to the previous case by the principle of linearization. Finally, in Section 3.4 we summarize some basic results about abstract evolution operators, which give the general framework for treating evolution problems including not only those described by local partial differential operators, but also more general ones.

Much of the material from the first three subsections is taken from the book by Kreiss and Lorenz [259]. However, our summary also includes recent results concerning second-order equations, examples of wave systems on curved spacetimes, and a very brief review of semigroup theory.

3.1 Linear, constant coefficient problems

We consider an evolution equation on n-dimensional space of the following form:

$${u_t} = P(\partial /\partial x)u \equiv \sum\limits_{\vert \nu \vert \leq p} {{A_\nu}} {D_\nu}u,\quad x \in {{\mathbb R}^n},\quad t \geq 0.$$
(3.1)

Here, u = u(t, x) ∈ ℂm is the state vector, and ut its partial derivative with respect to t. Next, the Aν’s denote complex, m × m matrices, where ν = (ν1, ν2, …, νn) denotes a multi-index with components νj ∈ {0, 1, 2, 3, …} and |ν| := ν1 + … + νn. Finally, Dν denotes the partial derivative operator

$${D_\nu}: = {{{\partial ^{\vert \nu \vert}}} \over {\partial x_1^{{\nu _1}} \cdot \cdot \cdot \partial x_n^{{\nu _n}}}}$$

of order |ν|, where D0 := I. Here are a few representative examples:

Example 1. The advection equation ut(t, x) = λux(t, x) with speed λ ∈ ℝ in the negative x direction.

Example 2. The heat equation ut(t, x) = Δu(t, x), where

$$\Delta : = {{{\partial ^2}} \over {\partial x_1^2}} + {{{\partial ^2}} \over {\partial x_2^2}} + \ldots + {{{\partial ^2}} \over {\partial x_n^2}}$$

denotes the Laplace operator.

Example 3. The Schrödinger equation ut(t, x) = iΔu(t, x).

Example 4. The wave equation Utt = ΔU, which can be cast into the form of Eq. (3.1),

$${u_t} = \left({\begin{array}{*{20}c} 0 & 1 \\ \Delta & 0 \\ \end{array}} \right)u,\quad u = \left({\begin{array}{*{20}c} U \\ V \\ \end{array}} \right).$$
(3.2)

We can find solutions of Eq. (3.1) by Fourier transformation in space,

$$\hat u(t,k) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{- ik \cdot x}}} u(t,x){d^n}x,\quad k \in {{\mathbb R}^n},\quad t \geq 0.$$
(3.3)

Applied to Eq. (3.1) this yields the system of linear ordinary differential equations

$${\hat u_t} = P(ik)\hat u,\quad t \geq 0,$$
(3.4)

for each wave vector k ∈ ℝn, where P(ik), called the symbol of the differential operator P(∂/∂x), is defined as

$$P(ik): = \sum\limits_{\vert \nu \vert \leq p} {{A_\nu}} {(i{k_1})^{{\nu _1}}} \cdot \cdot \cdot {(i{k_n})^{{\nu _n}}},\quad k \in {{\mathbb R}^n}.$$
(3.5)

The solution of Eq. (3.4) is given by

$$\hat u(t,k) = {e^{P(ik)t}}\hat u(0,k),\quad t \geq 0,$$
(3.6)

where û(0, k) is determined by the initial data for u at t = 0. Therefore, the formal solution of the Cauchy problem

$${u_t}(t\,,x) = P(\partial /\partial x)u(t\,,x)\,,x \in {{\mathbb {R}}^n},\quad t \geq 0,$$
(3.7)
$$u(0\,,x) = f(x)\,, x\in{\mathbb R}^n,$$
(3.8)

with given initial data ƒ for u at t = 0 is

$$u(t,x) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{ik \cdot x}}} {e^{P(ik)t}}\hat f(k){d^n}k,\quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.9)

where \(\hat f(k) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{- ik \cdot x}}f(x){d^n}x}\).
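To make the representation formula concrete, the following sketch (our own illustration, using a periodic grid and the FFT as a stand-in for the continuous transform) evolves the one-dimensional heat equation, for which P(ik) = −k², by multiplying each discrete Fourier coefficient with e^{P(ik)t}:

```python
import numpy as np

N, t = 256, 0.1
x = 2.0 * np.pi / N * np.arange(N)   # periodic grid on [0, 2 pi)
k = np.fft.fftfreq(N, d=1.0 / N)     # integer wave numbers
f = np.exp(np.cos(x))                # smooth periodic initial data

# u(t, .) = F^{-1}( e^{P(ik) t} F(f) ) with P(ik) = -k^2 for the heat equation
u = np.fft.ifft(np.exp(-k**2 * t) * np.fft.fft(f)).real

# Each mode is damped, |e^{P(ik)t}| <= 1, so the discrete L2 norm cannot grow.
print(np.linalg.norm(u) <= np.linalg.norm(f))   # True
```

The same driver solves any constant-coefficient example by replacing the symbol, e.g., -1j * k**2 for the Schrödinger equation; for the backwards heat equation, k**2, the high-frequency amplification discussed in Example 6 below shows up immediately as floating-point overflow.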

3.1.1 Well-posedness

At this point we have to ask ourselves if expression (3.9) makes sense. In fact, we do not expect the integral to converge in general. Even if \({\hat f}\) is smooth and decays rapidly to zero as |k| → ∞ we could still have problems if |eP(ik)t| diverges as |k| → ∞. One simple, but very restrictive, possibility to control this problem is to limit ourselves to initial data ƒ in the class \({{\mathcal S}^\omega}\) of functions, which are the Fourier transform of a C-function with compact support, i.e., \(f \in {{\mathcal S}^\omega}\), where

$${{\mathcal S}^\omega}: = \left\{{v(\cdot) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{ik \cdot (\cdot)}}} \hat v(k){d^n}k:\hat v \in C_0^\infty ({{\mathbb R}^n})} \right\}.$$
(3.10)

A function in this space is real analytic and decays faster than any polynomial as |x| → ∞. If \(f \in {{\mathcal S}^\omega}\) the integral in Eq. (3.9) is well-defined and we obtain a solution of the Cauchy problem (3.7, 3.8), which, for each t ≥ 0, lies in this space. However, this possibility suffers from several unwanted features:

  • The space of admissible initial data is very restrictive. Indeed, since \(f \in {{\mathcal S}^\omega}\) is necessarily analytic it is not possible to consider nontrivial data with, say, compact support, and study the propagation of the support for such data.

  • For fixed t > 0, the solution may grow without bound when perturbations with arbitrarily small amplitude but higher and higher frequency components are considered. Such an effect is illustrated in Example 6 below.

  • The function space \({{\mathcal S}^\omega}\) does not seem to be useful as a solution space when considering linear variable coefficient or quasilinear problems, since, for such problems, the different k modes do not decouple from each other. Hence, mode coupling can lead to components with arbitrarily high frequencies.

For these reasons, it is desirable to consider initial data of a more general class than \({{\mathcal S}^\omega}\). For this, we need to control the growth of eP(ik)t. This is captured in the following

Definition 1. The Cauchy problem ( 3.7 , 3.8 ) is called well posed if there are constants K ≥ 1 and α ∈ ℝ such that

$$\vert {e^{P(ik)t}}\vert \leq K{e^{\alpha t}}\quad {\rm{for}}\,{\rm{all}}\,t \geq 0\,{\rm{and}}\,{\rm{all}}\,k \in {{\mathbb R}^n}.$$
(3.11)

The importance of this definition relies on the property that for each fixed time t > 0 the norm |eP(ik)t| of the propagator is bounded by the constant C(t) := Keαt, which is independent of the wave vector k. The definition does not state anything about the growth of the solution with time other than that this growth is bounded by an exponential. In this sense, unless one can choose α ≤ 0 or α > 0 arbitrarily small, well-posedness is not a statement about stability in time, but rather about stability with respect to mode fluctuations.

Let us illustrate the meaning of Definition 1 with a few examples:

Example 5. The heat equation ut(t, x) = Δu(t, x).

Fourier transformation converts this equation into ût(t, k) = −|k|2û(t, k). Hence, the symbol is \(P(ik) = - \vert k{\vert ^2}\) and \(\vert {e^{P(ik)t}}\vert = {e^{- \vert k{\vert ^2}t}} \leq 1\). The problem is well posed.

Example 6. The backwards heat equation ut(t, x) = −Δu(t, x).

In this case the symbol is \(P(ik) = + \vert k{\vert ^2}\) and \(\vert {e^{P(ik)t}}\vert = {e^{\vert k{\vert ^2}t}}\). In contrast to the previous case, eP(ik)t exhibits exponential frequency-dependent growth for each fixed t > 0 and the problem is not well posed. Notice that small initial perturbations with large |k| are amplified by a factor that becomes larger and larger as |k| increases. Therefore, after an arbitrarily small time, the solution is contaminated by high-frequency modes.

Example 7. The Schrödinger equation ut(t, x) = iΔu(t, x).

In this case we have P(ik) = i|k|2 and |eP(ik)t| = 1. The problem is well posed. Furthermore, the evolution is unitary, and we can evolve forward and backwards in time. When compared to the previous example, it is the factor i in front of the Laplace operator that saves the situation and allows the evolution backwards in time.

Example 8. The one-dimensional wave equation written in first-order form,

$${u_t}(t,x) = A{u_x}(t,x),\quad A = \left({\begin{array}{*{20}c} 0 & 1 \\ 1 & 0 \\ \end{array}} \right).$$
(3.12)

The symbol is P(ik) = ikA. Since the matrix A is symmetric and has eigenvalues ±1, there exists an orthogonal transformation U such that

$$A = U\left({\begin{array}{*{20}c} 1 & 0 \\ 0 & {- 1} \\ \end{array}} \right){U^{- 1}},\quad {e^{ikAt}} = U\left({\begin{array}{*{20}c} {{e^{ikt}}} & 0 \\ 0 & {{e^{- ikt}}} \\ \end{array}} \right){U^{- 1}}.$$
(3.13)

Therefore, |eP(ik)t| = 1, and the problem is well posed.

Example 9. Perturb the previous problem by a lower-order term,

$${u_t}(t,\,x) = A{u_x}(t,\,x) + \lambda u(t,\,x),\quad A = \left({\begin{array}{*{20}c} 0 & 1 \\ 1 & 0 \\ \end{array}} \right),\quad \lambda \in {\mathbb R}.$$
(3.14)

The symbol is P(ik) = ikA + λI, and |eP(ik)t| = eλt. The problem is well posed, even though the solution grows exponentially in time if λ > 0.

More generally one can show (see Theorem 2.1.2 in [259]):

Lemma 1. The Cauchy problem for the first-order equation ut = Aux + Bu with complex m × m matrices A and B is well posed if and only if A is diagonalizable and has only real eigenvalues.

By considering the eigenvalues of the symbol P(ik) we obtain the following simple necessary condition for well-posedness:

Lemma 2 (Petrovskii condition). Suppose the Cauchy problem ( 3.7 , 3.8 ) is well posed. Then, there is a constant α ∈ ℝ such that

$$Re(\lambda) \leq \alpha$$
(3.15)

for all eigenvalues λ of P(ik).

Proof. Suppose λ is an eigenvalue of P(ik) with corresponding eigenvector v, P(ik)v = λv. Then, if the problem is well posed,

$$K{e^{\alpha t}}\vert v\vert \geq \vert {e^{P(ik)t}}v\vert = \vert {e^{\lambda t}}v\vert = {e^{{\rm{Re}}(\lambda)t}}\vert v\vert ,$$
(3.16)

for all t ≥ 0, which implies that eRe(λ)tKeαt for all t ≥ 0, and hence Re(λ) ≤ α. □

Although the Petrovskii condition is a very simple necessary condition, we stress that it is not sufficient in general. Counterexamples are first-order systems, which are weakly, but not strongly, hyperbolic; see Example 10 below.

3.1.2 Extension of solutions

Now that we have defined and illustrated the notion of well-posedness, let us see how it can be used to solve the Cauchy problem (3.7, 3.8) for initial data more general than in \({{\mathcal S}^\omega}\). Suppose first that \(f \in {{\mathcal S}^\omega}\), as before. Then, if the problem is well posed, Parseval’s identities imply that the solution (3.9) must satisfy

$$\Vert {u(t,.)} \Vert = \Vert {\hat u(t,.)} \Vert = \Vert {{e^{P(i \cdot)t}}\hat f} \Vert \leq K{e^{\alpha t}}\Vert {\hat f} \Vert = K{e^{\alpha t}}\Vert f \Vert ,\quad t \geq 0.$$
(3.17)

Therefore, the \({{\mathcal S}^\omega}\)-solution satisfies the following estimate

$$\Vert u(t,.)\Vert \leq K{e^{\alpha t}}\Vert f\Vert ,\quad t \geq 0,$$
(3.18)

for all \(f \in {{\mathcal S}^\omega}\). This estimate is important because it allows us to extend the solution to the much larger space L2(ℝn). This extension is defined in the following way: let ƒL2(ℝn). Since \({{\mathcal S}^\omega}\) is dense in L2(ℝn) there exists a sequence {ƒj} in \({{\mathcal S}^\omega}\) such that ‖ƒj − ƒ‖ → 0. Therefore, if the problem is well posed, it follows from the estimate (3.18) that the corresponding solutions uj defined by Eq. (3.9) form a Cauchy-sequence in L2(ℝn), and we can define

$$U(t)f(x): = \underset {j \rightarrow \infty} {\lim} {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{ik \cdot x}}} {e^{P(ik)t}}{\hat f_j}(k){d^n}k,\quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.19)

where the limit exists in the L2(ℝn) sense. The linear map U(t) : L2(ℝn) → L2(ℝn) satisfies the following properties:

  (i) U(0) = I is the identity map.

  (ii) U(t + s) = U(t)U(s) for all t, s ≥ 0.

  (iii) For \(f \in {{\mathcal S}^\omega}\), u(t, ·) = U(t)ƒ is the unique solution to the Cauchy problem (3.7, 3.8).

  (iv) ‖U(t)ƒ‖ ≤ Keαt‖ƒ‖ for all ƒ ∈ L2(ℝn) and all t ≥ 0.

The family {U(t) : t ≥ 0} is called a semi-group on L2(ℝn). In general, U(t) cannot be extended to negative t as the example of the backwards heat equation, Example 6, shows.

For ƒL2(ℝn) the function u(t, x) := U(t)ƒ(x) is called a weak solution of the Cauchy problem (3.7, 3.8). It can also be constructed in an abstract way by using the Fourier-Plancharel operator Ƒ : L2(ℝn) → L2(ℝn). If the problem is well posed, then for each ƒ ∈ L2(ℝn) and t ≥ 0 the map keP(ik)tƑ(ƒ)(k) defines an L2(ℝn)-function, and, hence, we can define

$$u(t, \cdot): = {{\mathcal F}^{- 1}}\left({{e^{P(i \cdot)t}}{\mathcal F}f} \right),\quad t \geq 0.$$
(3.20)

According to Duhamel’s principle, the semi-group U(t) can also be used to construct weak solutions of the inhomogeneous problem,

$${u_t}(t,x) = P(\partial /\partial x)u(t,\,x) + F(t,\,x),\;x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.21)
$$u(0,\,x) = f(x),\;x \in {{\mathbb R}^n},$$
(3.22)

where F : [0, ∞) → L2(ℝn), tF(t, ·) is continuous:

$$u(t, \cdot) = U(t)f + \int\limits_0^t U (t - s)F(s, \cdot)ds.$$
(3.23)

For a discussion of semi-groups in a more general context, see Section 3.4.
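For a single Fourier mode, Duhamel’s formula can be checked directly against a standard ODE integrator (our own sketch; the scalar symbol p = P(ik) and the forcing below are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

p = -4.0                           # scalar symbol P(ik), e.g., heat equation with k = 2
u0, t = 1.0, 0.5
F = lambda s: np.cos(3.0 * s)      # forcing term of this mode

# Duhamel: u(t) = e^{p t} u0 + int_0^t e^{p (t - s)} F(s) ds
duhamel = np.exp(p * t) * u0 + quad(lambda s: np.exp(p * (t - s)) * F(s), 0.0, t)[0]

# Direct integration of u_t = p u + F(t)
sol = solve_ivp(lambda s, u: p * u + F(s), (0.0, t), [u0], rtol=1e-10, atol=1e-12)

print(duhamel, sol.y[0, -1])       # the two results agree to integrator tolerance
```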

3.1.3 Algebraic characterization

In order to extend the solution concept to initial data more general than analytic, we have introduced the concept of well-posedness in Definition 1. However, given a symbol P(ik), it is not always a simple task to determine whether or not constants K ≥ 0 and α ∈ ℝ exist such that |eP(ik)t| ≤ Keαt for all t ≥ 0 and k ∈ ℝn. Fortunately, the matrix theorem by Kreiss [257] provides necessary and sufficient conditions on the symbol P(ik) for well-posedness.

Theorem 1. Let P(ik), k ∈ ℝn, be the symbol of a constant coefficient linear problem, see Eq. (3.5) , and let α ∈ ℝ. Then, the following conditions are equivalent:

  (i) There exists a constant K ≥ 0 such that

    $$\vert{e^{P(ik)t}}\vert \leq K{e^{\alpha t}}$$
    (3.24)

    for all t ≥ 0 and k ∈ ℝn.

  (ii) There exists a constant M > 0 and a family H(k) of m × m Hermitian matrices such that

    $${M^{- 1}}I \leq H(k) \leq MI,\quad H(k)P(ik) + P{(ik)^{\ast}}H(k) \leq 2\alpha H(k)$$
    (3.25)

    for all k ∈ ℝn.

A generalization and complete proof of this theorem can be found in [259]. However, let us show here the implication (ii) ⇒ (i) since it illustrates the concept of energy estimates, which will be used quite often throughout this review (see Section 3.2.3 below for a more general discussion of these estimates). Hence, let H(k) be a family of m × m Hermitian matrices satisfying the condition (3.25). Let k ∈ ℝn and v0 ∈ ℂm be fixed, and define v(t) := eP(ik)tv0 for t ≥ 0. Then we have the following estimate for the “energy” density v(t)*H(k)v(t),

$$\begin{array}{*{20}c} {{d \over {dt}}v{{(t)}^{\ast}}H(k)v(t) = {{[P(ik)v(t)]}^{\ast}}H(k)v(t) + v{{(t)}^{\ast}}H(k)P(ik)v(t)\quad \quad \quad \quad \,} \\ {= v{{(t)}^{\ast}}\left[ {P{{(ik)}^{\ast}}H(k) + H(k)P(ik)} \right]v(t)} \\ {\leq 2\alpha \,v{{(t)}^{\ast}}H(k)v(t),\quad \quad \quad \quad \quad \quad \quad \;} \\ \end{array}$$

which implies the differential inequality

$${d \over {dt}}\left[ {{e^{- 2\alpha t}} v{{(t)}^{\ast}}H(k)v(t)} \right] \leq 0,\quad t \geq 0,\quad k \in {{\mathbb R}^n}.$$
(3.26)

Integrating, we find

$${M^{- 1}}\vert v(t){\vert ^2} \leq v{(t)^{\ast}}H(k)v(t) \leq {e^{2\alpha t}}v_0^{\ast}H(k){v_0} \leq M{e^{2\alpha t}}\vert {v_0}{\vert ^2},$$
(3.27)

which implies the inequality (3.24) with K = M.
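For Example 9, where P(ik) = ikA + λI, the choice H(k) = I satisfies condition (3.25) with α = λ, and the resulting bound (3.24) with K = 1 can be verified numerically (our own sketch using SciPy’s matrix exponential; the sample values of λ, k and t are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [1.0, 0.0]])
lam = 0.3                            # lower-order coefficient lambda
alpha = lam                          # H(k) = I satisfies (3.25) with this alpha

for k in [0.5, 10.0, 1000.0]:        # the bound is uniform in the wave number k
    for t in [0.1, 1.0, 10.0]:
        P = 1j * k * A + lam * np.eye(2)
        assert np.linalg.norm(expm(P * t), 2) <= np.exp(alpha * t) * (1 + 1e-6)
print("|e^{P(ik)t}| <= e^{alpha t} verified on all samples")
```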

3.1.4 First-order systems

Many systems in physics, like Maxwell’s equations, the Dirac equation, and certain formulations of Einstein’s equations, are described by first-order partial differential equations (PDEs). In fact, even systems given by a higher-order PDE can be reduced to first order at the cost of introducing new variables, and possibly also new constraints. Therefore, let us specialize the above results to a first-order linear problem of the form

$${u_t} = P(\partial /\partial x)u \equiv \sum\limits_{j = 1}^n {{A^j}} {\partial \over {\partial {x^j}}}u + Bu,\quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.28)

where A1, …, An, B are complex m × m matrices. We split P(ik) = P0(ik) + B into its principal symbol, \({P_0}(ik) = i\sum\limits_{j = 1}^n {{k_j}{A^j}}\), and the lower-order term B. The principal part is the one that dominates for large |k| and hence the one that turns out to be important for well-posedness. Notice that P0(ik) depends linearly on k. With these observations in mind we note:

  • A necessary condition for the problem to be well posed is that for each k ∈ ℝn with |k| = 1 the symbol P0(ik) is diagonalizable and has only purely imaginary eigenvalues. To see this, we require the inequality

    $$\vert {e^{\vert k\vert {P_0}(ik\prime)t + Bt}}\vert \leq K{e^{\alpha t}},\quad k\prime: = {k \over {\vert k\vert}},$$
    (3.29)

    for all t ≥ 0 and k ∈ ℝn, k ≠ 0, replace t by t/|k|, and take the limit |k| → ∞, which yields \(\vert{e^{{P_0}(i{k\prime})t}}\vert\, \leq K\) for all k′ ∈ ℝn with |k′| = 1. Therefore, there must exist for each such k′ a complex m × m matrix S(k′) such that S(k′)−1P0(ik′)S(k′) = iΛ(k′), where Λ(k′) is a diagonal real matrix (cf. Lemma 1).

  • In this case the family of Hermitian m × m matrices H(k′) := (S(k′)−1)*S(k′)−1 satisfies

    $$H(k\prime){P_0}(ik\prime) + {P_0}{(ik\prime)^{\ast}}H(k\prime) = 0$$
    (3.30)

    for all k′ ∈ ℝn with |k′| = 1.

  • However, in order to obtain the energy estimate, one also needs the condition M−1I ≤ H(k′) ≤ MI, that is, H(k′) must be uniformly bounded and positive. This follows automatically if H(k′) depends continuously on k′, since k′ varies over the (n − 1)-dimensional unit sphere, which is compact. In turn, it follows that H(k′) depends continuously on k′ if S(k′) does. However, although this may hold in many situations, continuous dependence of S(k′) on k′ cannot always be established; see Example 12 for a counterexample.

These observations motivate the following three notions of hyperbolicity, each of them being a stronger condition than the previous one:

Definition 2. The first-order system (3.28) is called

  (i) weakly hyperbolic, if all the eigenvalues of its principal symbol P0(ik) are purely imaginary.

  (ii) strongly hyperbolic, if there exists a constant M > 0 and a family of Hermitian m × m matrices H(k), k ∈ Sn−1, satisfying

    $${M^{- 1}}I \leq H(k) \leq MI,\quad H(k){P_0}(ik) + {P_0}{(ik)^{\ast}}H(k) = 0,$$
    (3.31)

    for all k ∈ Sn−1, where Sn−1 := {k ∈ ℝn : |k| = 1} denotes the unit sphere.

  (iii) symmetric hyperbolic, if there exists a Hermitian, positive definite m × m matrix H (which is independent of k) such that

    $$H{P_0}(ik) + {P_0}{(ik)^{\ast}}H = 0,$$
    (3.32)

    for all k ∈ Sn−1.

The matrix theorem implies the following statements:

  • Strongly and symmetric hyperbolic systems give rise to a well-posed Cauchy problem. According to Theorem 1, their principal symbol satisfies

    $$\vert {e^{{P_0}(ik)t}}\vert \leq K,\quad k \in {{\mathbb R}^n},\quad t \in {\mathbb R},$$
    (3.33)

    and this property is stable with respect to lower-order perturbations,

    $$\vert {e^{P(ik)t}}\vert = \vert {e^{{P_0}(ik)t + Bt}}\vert \leq K{e^{K\vert B\vert t}},\quad k \in {{\mathbb R}^n},\quad t \in {\mathbb R}.$$
    (3.34)

    The last inequality can be proven by applying Duhamel’s formula (3.23) to the function \(\hat u(t): = {e^{P(ik)t}}\hat f\), which satisfies ût(t) = P0(ik)û(t) + F(t) with F(t) = Bû(t). The solution formula (3.23) then gives \(\vert \hat u(t)\vert \, \leq K(\vert \hat f\vert + \vert B\vert \int\nolimits_0^t {\vert \hat u(s)\vert ds)}\), which yields \(\vert \hat u(t)\vert \, \leq K{e^{K\vert B\vert t}}\vert \hat f\vert\) by Gronwall’s lemma.

  • As we have anticipated above, a necessary condition for well-posedness is the existence of a complex m × m matrix S(k) for each kSn−1 on the unit sphere, which brings the principal symbol P0(ik) into diagonal, purely imaginary form. If, furthermore, S(k) can be chosen such that |S(k)| and |S(k)−1|are uniformly bounded for all kSn−1, then H(k) := (S(k)−1)*S(k)−1 satisfies the conditions (3.31) for strong hyperbolicity. If the system is well posed, Theorem 2.4.1 in [259] shows that it is always possible to construct a symmetrizer H(k) satisfying the conditions (3.31) in this manner, and hence, strong hyperbolicity is also a necessary condition for well-posedness. The symmetrizer construction H(k) := (S(k)−1)*S(k)−1 is useful for applications, since S(k) is easily constructed from the eigenvectors and S(k)−1 from the eigenfields of the principal symbol; see Example 15.

  • Weakly hyperbolic systems are not well posed in general because \(\vert {e^{{P_0}(ik)t}}\vert\) might exhibit polynomial growth in |k|t. Although one might consider such polynomial growth as acceptable, such systems are unstable with respect to lower-order perturbations. As the next example shows, it is possible that |eP(ik)t| grows exponentially in |k| if the system is weakly hyperbolic.

Example 10. Consider the weakly hyperbolic system [259]

$${u_t} = \left({\begin{array}{*{20}c} 1 & 1 \\ 0 & 1 \\ \end{array}} \right){u_x} + a\left({\begin{array}{*{20}c} {- 1} & {+ 1} \\ {- 1} & {- 1} \\ \end{array}} \right)u,$$
(3.35)

with a ∈ ℝ a parameter. The principal symbol is \({P_0}(ik) = ik\left({\begin{array}{*{20}c} 1 & 1 \\ 0 & 1 \\ \end{array}} \right)\) and

$${e^{{P_0}(ik)t}} = {e^{ikt}}\left({\begin{array}{*{20}c} 1 & {ikt} \\ 0 & 1 \\ \end{array}} \right).$$
(3.36)

Using the tools described in Section 2 we find for the norm

$$\vert {e^{{P_0}(ik)t}}\vert = \sqrt {1 + {{{k^2}{t^2}} \over 2} + \sqrt {{{\left({1 + {{{k^2}{t^2}} \over 2}} \right)}^2} - 1}} ,$$
(3.37)

which is approximately equal to |k|t for large |k|t. Hence, the solutions to Eq. (3.35) contain modes, which grow linearly in |k|t for large |k|t when a = 0, i.e., when there are no lower-order terms.

However, when a ≠ 0, the eigenvalues of P(ik) are

$${\lambda _ \pm} = ik - a \pm i\sqrt {a(a + ik)} ,$$
(3.38)

which, for large k has real part \({\rm{Re}}({\lambda _ \pm}) \approx \pm \sqrt {\vert a\vert \vert k\vert/2}\). The eigenvalue with positive real part gives rise to solutions, which, for fixed t, grow exponentially in |k|.
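Both growth regimes can be observed numerically with the matrix exponential (our own sketch; a = 1 and t = 1 are arbitrary sample values):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[-1.0, 1.0], [-1.0, -1.0]])   # lower-order term with a = 1
t = 1.0

for k in [10.0, 100.0, 1000.0]:
    P0 = 1j * k * A                          # no lower-order term
    print(k,
          np.linalg.norm(expm(P0 * t), 2),         # grows like |k| t
          np.linalg.norm(expm((P0 + B) * t), 2))   # grows like e^{sqrt(|k|/2) t}
```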

Example 11. For the system [353],

$${u_t} = {A^1}{u_x} + {A^2}{u_y} ,\quad {A^1} = \left({\begin{array}{*{20}c} 1 & 1 \\ 0 & 2 \\ \end{array}} \right),\quad {A^2} = \left({\begin{array}{*{20}c} 1 & 0 \\ 0 & 2 \\ \end{array}} \right),$$
(3.39)

the principal symbol, \({P_0}(ik) = i\left({\begin{array}{*{20}c} {{k_1} + {k_2}} & {{k_1}} \\ {0} & {2({k_1} + {k_2})} \\ \end{array}} \right)\), is diagonalizable for all vectors k = (k1, k2) ∈ S1 except for those with k1 + k2 = 0. In particular, P0(ik) is diagonalizable for k = (1, 0) and k = (0, 1). This shows that in general, it is not sufficient to check that the n matrices A1, A2, …, An alone are diagonalizable and have real eigenvalues; one has to consider all possible linear combinations \(\sum\limits_{j = 1}^n {{A^j}{k_j}}\) with kSn−1.
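This can be made quantitative by scanning the unit circle and monitoring the conditioning of the eigenvector matrix of the principal symbol; near a direction in which P0(ik) fails to be diagonalizable, the eigenvector matrix becomes singular (our own sketch):

```python
import numpy as np

A1 = np.array([[1.0, 1.0], [0.0, 2.0]])
A2 = np.array([[1.0, 0.0], [0.0, 2.0]])

for phi in np.linspace(0.0, np.pi, 9):       # phi = 3 pi / 4 hits k1 + k2 = 0
    k1, k2 = np.cos(phi), np.sin(phi)
    lam, S = np.linalg.eig(1j * (k1 * A1 + k2 * A2))
    print(round(phi, 3), np.linalg.cond(S))  # blows up at the defective direction
```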

Example 12. Next, we present a system for which the eigenvectors of the principal symbol cannot be chosen to be continuous functions of k:

$${u_t} = {A^1}{u_x} + {A^2}{u_y} + {A^3}{u_z},\quad {A^1} = \left({\begin{array}{*{20}c} 1 & 0 \\ 0 & {- 1} \\ \end{array}} \right),\quad {A^2} = \left({\begin{array}{*{20}c} 0 & 1 \\ 1 & 0 \\ \end{array}} \right),\quad {A^3} = \left({\begin{array}{*{20}c} 0 & 0 \\ 0 & 0 \\ \end{array}} \right).$$
(3.40)

The principal symbol \({P_0}(ik) = i\left({\begin{array}{*{20}c} {{k_1}} & {{k_2}} \\ {{k_2}} & {- {k_1}} \\ \end{array}} \right)\) has eigenvalues \({\lambda _ \pm}(k) = \pm i\sqrt {k_1^2 + k_2^2}\) and for (k1, k2) ≠ (0, 0) the corresponding eigenprojectors are

$${P_ \pm}({k_1},{k_2}) = {1 \over {2{\lambda _ \pm}(k)}}\left({\begin{array}{*{20}c} {{\lambda _ \pm}(k) + i{k_1}} & {i{k_2}} \\ {i{k_2}} & {{\lambda _ \pm}(k) - i{k_1}} \\ \end{array}} \right).$$
(3.41)

When (k1, k2) → (0, 0) the two eigenvalues fall together, and the matrix A(k) := k1A1 + k2A2 + k3A3 converges to the zero matrix. However, it is not possible to continuously extend P±(k1, k2) to (k1, k2) = (0, 0). For example,

$${P_ +}(h,0) = \left({\begin{array}{*{20}c} 1 & 0 \\ 0 & 0 \\ \end{array}} \right),\quad {P_ +}(- h,0) = \left({\begin{array}{*{20}c} 0 & 0 \\ 0 & 1 \\ \end{array}} \right),$$
(3.42)

for positive h > 0. Therefore, any choice for the matrix S(k), which diagonalizes A(k), must be discontinuous at k = (0, 0, ±1) since the columns of S(k) are the eigenvectors of A(k).

Of course, A(k) is symmetric and so S(k) can be chosen to be unitary, which yields the trivial symmetrizer H(k) = I. Therefore, the system is symmetric hyperbolic and yields a well-posed Cauchy problem; however, this example shows that it is not always possible to choose S(k) as a continuous function of k.

Example 13. Consider the Klein-Gordon equation

$${\Phi _{tt}} = \Delta \Phi - {m^2}\Phi ,$$
(3.43)

in two spatial dimensions, where m ∈ ℝ is a parameter proportional to the mass of the field Φ. Introducing the variables u = (Φ, Φt, Φx, Φy), we obtain the first-order system

$${u_t} = \left({\begin{array}{*{20}c} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{array}} \right){u_x} + \left({\begin{array}{*{20}c} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{array}} \right){u_y} + \left({\begin{array}{*{20}c} {\;\;\;\;0} & 1 & 0 & 0 \\ {- {m^2}} & 0 & 0 & 0 \\ {\;\;\;\;0} & 0 & 0 & 0 \\ {\;\;\;\;0} & 0 & 0 & 0 \\ \end{array}} \right)u.$$
(3.44)

The matrix coefficients in front of ux and uy are symmetric; hence the system is symmetric hyperbolic with trivial symmetrizer H = diag(m2, 1, 1, 1). The corresponding Cauchy problem is well posed. However, a problem with this first-order system is that it is only equivalent to the original, second-order equation (3.43) if the constraints (u1)x = u3 and (u1)y = u4 are satisfied.

An alternative symmetric hyperbolic first-order reduction of the Klein-Gordon equation, which does not require the introduction of constraints, is the Dirac equation in two spatial dimensions,

$${v_t} = \left({\begin{array}{*{20}c} 1 & 0 \\ 0 & {- 1} \\ \end{array}} \right){v_x} + \left({\begin{array}{*{20}c} 0 & 1 \\ 1 & 0 \\ \end{array}} \right){v_y} + m\left({\begin{array}{*{20}c} 0 & 1 \\ {- 1} & 0 \\ \end{array}} \right)v,\quad v = \left({\begin{array}{*{20}c} {{v_1}} \\ {{v_2}} \\ \end{array}} \right).$$
(3.45)

This system implies the Klein-Gordon equation (3.43) for either of the two components of v.
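Indeed, denoting the three matrix coefficients in Eq. (3.45) by A, B and C, one checks that A2 = B2 = I, C2 = −I, and that A, B and C pairwise anticommute. Applying ∂t twice and using that the coefficients are constant therefore gives

$${v_{tt}} = {\left({A{\partial _x} + B{\partial _y} + mC} \right)^2}v = \left({\partial _x^2 + \partial _y^2 - {m^2}} \right)v = \Delta v - {m^2}v,$$

since all mixed terms cancel.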

Yet another way of reducing second-order equations to first-order ones without introducing constraints will be discussed in Section 3.1.5.

Example 14. In terms of the electric and magnetic fields u = (E, B), Maxwell’s evolution equations,

$${E_t} = + \nabla \wedge B - J,$$
(3.46)
$${B_t} = - \nabla \wedge E,$$
(3.47)

constitute a symmetric hyperbolic system. Here, J is the current density, and ∇ and ∧ denote the nabla operator and the vector product, respectively. The principal symbol is

$${P_0}(ik)\left({\begin{array}{*{20}c} E \\ B \\ \end{array}} \right) = i\left({\begin{array}{*{20}c} {+ k \wedge B} \\ {- k \wedge E} \\ \end{array}} \right)$$
(3.48)

and a symmetrizer is given by the physical energy density,

$${u^{\ast}}Hu = {1 \over 2}\left({\vert E{\vert ^2} + \vert B{\vert ^2}} \right),$$
(3.49)

in other words, H = (1/2)I is trivial. The constraints ∇ · E = ρ and ∇ · B = 0 propagate as a consequence of Eqs. (3.46, 3.47), provided that the continuity equation holds: (∇ · E − ρ)t = −∇ · J − ρt = 0, (∇ · B)t = 0.

Example 15. There are many alternative ways to write Maxwell’s equations. The following system [353, 287] was originally motivated by an analogy with certain parametrized first-order hyperbolic formulations of the Einstein equations, and provides an example of a system that can be symmetric, strongly, weakly or not hyperbolic at all, depending on the parameter values. Using the Einstein summation convention, the evolution system in vacuum has the form

$${\partial _t}{E_i} = {\partial ^j}({W_{ij}} - {W_{ji}}) - \alpha ({\partial _i}{W^j}_j - {\partial ^j}{W_{ij}}),$$
(3.50)
$${\partial _t}{W_{ij}} = - {\partial _i}{E_j} - {\beta \over 2}{\delta _{ij}}{\partial ^k}{E_k},$$
(3.51)

where Ei and Wij = ∂iAj, i, j = 1, 2, 3, represent the Cartesian components of the electric field and the gradient of the magnetic potential Aj, respectively, and where the real parameters α and β determine the dynamics of the constraint hypersurface defined by ∂kEk = 0 and ∂kWij − ∂iWkj = 0.

In order to analyze under which conditions on α and β the system (3.50, 3.51) is strongly hyperbolic we consider the corresponding symbol,

$${P_0}(ik)u = i\left({\begin{array}{*{20}c} {(1 + \alpha){k^j}{W_{ij}} - {k^j}{W_{ji}} - \alpha {k_i}{W^j}_j} \\ {- {k_i}{E_j} - {\beta \over 2}{\delta _{ij}}{k^l}{E_l}} \\ \end{array}} \right),\quad u = \left({\begin{array}{*{20}c} {{E_i}} \\ {{W_{ij}}} \\ \end{array}} \right),\quad k \in {S^2}.$$
(3.52)

Decomposing Ei and Wij into components parallel and orthogonal to ki,

$${E_i} = \bar E{k_i} + {\bar E_i},\quad {W_{ij}} = \bar W{k_i}{k_j} + {\bar W_i}{k_j} + {k_i}{\bar V_j} + {\bar W_{ij}} + {1 \over 2}{\gamma _{ij}}\bar U,$$
(3.53)

where in terms of the projector \({\gamma _i}^j: = {\delta _i}^j - {k_i}{k^j}\) orthogonal to k we have defined \(\bar E: = {k^l}{E_l},{\bar E_i}: = {\gamma _i}^j{E_j}\) and \(\bar W: = {k^i}{k^j}{W_{ij}},{\bar W_i}: = {\gamma _i}^k{W_{kj}}{k^j},{\bar V_j}: = {k^i}{W_{ik}}{\gamma ^k}_j,\bar U: = {\gamma ^{ij}}{W_{ij}}\) and \({\bar W_{ij}}: = ({\gamma _i}^k{\gamma _j}^l - {2^{- 1}}{\gamma _{ij}}{\gamma ^{kl}}){W_{kl}}\), we can write the eigenvalue problem P0(ik)u = iλu as

$$\begin{array}{*{20}c} {\lambda \bar E = - \alpha \bar U,} \\ {\lambda \bar U = - \beta \bar E,} \\ {\lambda \bar W = - \left({1 + {\beta \over 2}} \right)\bar E,} \\ {\lambda {{\bar E}_i} = (1 + \alpha){{\bar W}_i} - {{\bar V}_i},} \\ {\lambda {{\bar V}_i} = - {{\bar E}_i},} \\ {\lambda {{\bar W}_i} = 0,} \\ {\lambda {{\bar W}_{ij}} = 0.} \\ \end{array}$$

It follows that P0(ik) is diagonalizable with purely imaginary eigenvalues if and only if αβ > 0: the first two equations give λ2 = αβ in the scalar sector, while the transverse sector yields λ = ±1 or λ = 0. However, in order to show that in this case the system is strongly hyperbolic, one still needs to construct a bounded symmetrizer H(k). For this, we set \(\mu := \sqrt {\alpha \beta}\) and diagonalize P0(ik) = iS(k)Λ(k)S(k)−1 with Λ(k) = diag(µ, −µ, 0, 1, −1, 0, 0) and

$$S{(k)^{- 1}}u = \left({\begin{array}{*{20}c} {\bar E - {\mu \over \beta}\bar U} \\ {\bar E + {\mu \over \beta}\bar U} \\ {\beta \bar W - \left({1 + {\beta \over 2}} \right)\bar U} \\ {{{\bar E}_i} - {{\bar V}_i} + (1 + \alpha){{\bar W}_i}} \\ {{{\bar E}_i} + {{\bar V}_i} - (1 + \alpha){{\bar W}_i}} \\ {{{\bar W}_i}} \\ {{{\bar W}_{ij}}} \\ \end{array}} \right).$$
(3.54)

Then, the quadratic form associated with the symmetrizer is

$$\begin{array}{*{20}c} {{u^{\ast}}H(k)u = {u^{\ast}}{{(S{{(k)}^{- 1}})}^{\ast}}S{{(k)}^{- 1}}u\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {\; = 2\vert \bar E{\vert ^2} + 2{\alpha \over \beta}\vert \bar U{\vert ^2} + {{\left\vert {\beta \bar W - \left({1 + {\beta \over 2}} \right)\bar U} \right\vert}^2} + 2{{\bar E}^i}{{\bar E}_i}} \\ {\quad \; + 2\left[ {{{\bar V}^i} - (1 + \alpha){{\bar W}^i}} \right]\left[ {{{\bar V}_i} - (1 + \alpha){{\bar W}_i}} \right] + {{\bar W}^i}{{\bar W}_i} + {{\bar W}^{ij}}{{\bar W}_{ij}},} \\ \end{array}$$

and H(k) is smooth in kS2. Therefore, the system is indeed strongly hyperbolic for αβ > 0.
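This conclusion can be spot-checked numerically by assembling the 12 × 12 matrix P0(ik) acting on u = (Ei, Wij) and inspecting its spectrum and eigenvectors (our own sketch; the helper W(i, j) is mere index bookkeeping, and the sample values of k, α, β are arbitrary):

```python
import numpy as np

def symbol(k, alpha, beta):
    """Principal symbol P0(ik) of the system (3.50)-(3.51), acting on
    u = (E_1, E_2, E_3, W_11, W_12, ..., W_33)."""
    P = np.zeros((12, 12), dtype=complex)
    W = lambda i, j: 3 + 3 * i + j                     # position of W_ij in u
    for i in range(3):
        for j in range(3):
            P[i, W(i, j)] += 1j * (1 + alpha) * k[j]   # (1 + alpha) k^j W_ij
            P[i, W(j, i)] -= 1j * k[j]                 # - k^j W_ji
            P[i, W(j, j)] -= 1j * alpha * k[i]         # - alpha k_i W^j_j
            P[W(i, j), j] -= 1j * k[i]                 # - k_i E_j
            if i == j:
                for l in range(3):
                    P[W(i, i), l] -= 1j * beta / 2 * k[l]  # -(beta/2) delta_ij k^l E_l
    return P

k = np.array([0.3, -0.5, 0.81]); k /= np.linalg.norm(k)
for alpha, beta in [(1.0, 2.0), (1.0, -2.0), (0.0, 1.0)]:
    lam, S = np.linalg.eig(symbol(k, alpha, beta))
    print(alpha, beta, np.max(np.abs(lam.real)), np.linalg.cond(S))
```

For αβ > 0 the spectrum is purely imaginary and the eigenvector matrix is well conditioned; for αβ < 0 a pair of real eigenvalues appears; and for α = 0 the symbol is defective, which shows up as a huge condition number.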

In order to analyze under which conditions the system is symmetric hyperbolic we notice that because of rotational and parity invariance the most general k-independent symmetrizer must have the form

$${u^{\ast}}Hu = a{({E^i})^{\ast}}{E_i} + b{({W^{[ij]}})^{\ast}}{W_{[ij]}} + c{({\hat W^{ij}})^{\ast}}{\hat W_{ij}} + d {W^{\ast}}W,$$
(3.55)

with strictly positive constants a, b, c and d, where Ŵij := W(ij)δijW/3 denotes the symmetric, trace-free part of Wij and \(W: = {W^j}_j\) its trace. Then,

$$\begin{array}{*{20}c} {{u^{\ast}}H{P_0}(ik)u = ia{{({E^i})}^{\ast}}\left[ {(\alpha + 2){k^j}{W_{[ij]}} + \alpha {k^j}{{\hat W}_{ij}} - {{2\alpha} \over 3}{k_i}W} \right]\quad \quad \quad \quad \quad \quad \quad \quad} \\ {+ ib{{({W^{[ij]}})}^{\ast}}{E_i}{k_j} - ic{{({{\hat W}^{ij}})}^{\ast}}{E_i}{k_j} - id\left({1 + {{3\beta} \over 2}} \right){W^{\ast}}{k^i}{E_i}.} \\ \end{array}$$

For H to be a symmetrizer, the expression on the right-hand side must be purely imaginary. This is the case if and only if a(α + 2) = b, −aα = c and 2aα/3 = d(1 + 3β/2). Since a, b, c and d are positive, these equalities can be satisfied if and only if −2 < α < 0 and β < −2/3. Therefore, if either α and β are both positive, or α and β are both negative and α ≤ −2 or β ≥ −2/3, then the system (3.50, 3.51) is strongly but not symmetric hyperbolic.

3.1.5 Second-order systems

An important class of systems in physics are wave problems. In the linear, constant coefficient case, they are described by an equation of the form

$${v_{tt}} = \sum\limits_{j,k = 1}^n {{A^{jk}}} {{{\partial ^2}} \over {\partial {x^j}\partial {x^k}}}v + \sum\limits_{j = 1}^n 2 {B^j}{\partial \over {\partial {x^j}}}{v_t} + \sum\limits_{j = 1}^n {{C^j}} {\partial \over {\partial {x^j}}}v + D{v_t} + Ev,\quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.56)

where v = v(t, x) ∈ ℂm is the state vector, and Aij = Aji, Bj, Cj, D, E denote complex m × m matrices. In order to apply the theory described so far, we reduce this equation to a system that is first order in time. This is achieved by introducing the new variable \(w: = {\upsilon _t} - \sum\limits_{j = 1}^n {{B^j}} {\partial \over {\partial {x^j}}}\upsilon\). With this redefinition one obtains a system of the form (3.1) with u = (v, w)T and

$$P(\partial /\partial x) = \sum\limits_{j = 1}^n {{B^j}} {\partial \over {\partial {x^j}}} + \left({\begin{array}{*{20}c} 0 & I \\ {\sum\limits_{j,k = 1}^n {({A^{jk}} + {B^j}{B^k})} {{{\partial ^2}} \over {\partial {x^j}\partial {x^k}}} + \sum\limits_{j = 1}^n {({C^j} + D{B^j})} {\partial \over {\partial {x^j}}} + E} & D \\ \end{array}} \right).$$
(3.57)

Now we could apply the matrix theorem, Theorem 1, to the corresponding symbol P(ik) and analyze under which conditions on the matrix coefficients Aij, Bj, Cj, D, E the Cauchy problem is well posed. However, since our problem originates from a second-order equation, it is convenient to rewrite the symbol in a slightly different way: instead of taking the Fourier transform of v and w directly, we multiply v̂ by |k| and write the symbol in terms of the variable \(\hat U: = {(\vert k\vert \hat \upsilon, \hat w)^T}\). Then, the L2-norm of Û controls, through Parseval’s identity, the L2-norms of the first partial derivatives of v, as is the case for the usual energies for second-order systems. In terms of Û the system reads

$${\hat U_t} = Q(ik)\hat U,\quad t \geq 0,\quad k \in {{\mathbb R}^n},$$
(3.58)

in Fourier space, where

$$Q(ik) = i\vert k\vert \sum\limits_{j = 1}^n {{B^j}} {\hat k_j} + \left({\begin{array}{*{20}c} 0 & {\vert k\vert I} \\ {- \vert k\vert \sum\limits_{j,k = 1}^n {({A^{jk}} + {B^j}{B^k})} {{\hat k}_j}{{\hat k}_k} + i\sum\limits_{j = 1}^n {({C^j} + D{B^j})} {{\hat k}_j} + {1 \over {\vert k\vert}}E} & D \\ \end{array}} \right)$$
(3.59)

with \({\hat k_j}: = {k_j}/\vert k\vert\). As for first-order systems, we can split Q(ik) into its principal part,

$${Q_0}(ik): = i\vert k\vert \sum\limits_{j = 1}^n {{B^j}} {\hat k_j} + \vert k\vert \left({\begin{array}{*{20}c} 0 & I \\ {- \sum\limits_{j,k = 1}^n {({A^{jk}} + {B^j}{B^k})} {{\hat k}_j}{{\hat k}_k}} & 0 \\ \end{array}} \right),$$
(3.60)

which dominates for |k| → ∞, and the remaining, lower-order terms. Because of the homogeneity of Q0(ik) in k we can restrict ourselves to values of kSn−1 on the unit sphere, like for first-order systems. Then, it follows as a consequence of the matrix theorem that the problem is well posed if and only if there exists a symmetrizer H(k) and a constant M > 0 satisfying

$${M^{- 1}}I \leq H(k) \leq MI,\quad H(k){Q_0}(ik) + {Q_0}{(ik)^{\ast}}H(k) = 0$$
(3.61)

for all such k. Necessary and sufficient conditions under which such a symmetrizer exists have been given in [261] for the particular case in which the mixed-second-order derivative term in Eq. (3.56) vanishes; that is, when Bj = 0. This result can be generalized in a straightforward manner to the case where the matrices Bj = βjI are proportional to the identity:

Theorem 2. Suppose Bj = βjI, j = 1, 2, …, n. (Note that this condition is trivially satisfied if m = 1.) Then, the Cauchy problem for Eq. (3.56) is well posed if and only if the symbol

$$R(k): = \sum\limits_{i,j = 1}^n {({A^{ij}} + {B^i}{B^j})} {k_i}{k_j},\quad k \in {S^{n - 1}},$$
(3.62)

has the following properties: there exist constants M > 0 and δ > 0 and a family h(k) of Hermitian m × m matrices such that

$${M^{- 1}}I \leq h(k) \leq MI,\quad h(k)R(k) = R{(k)^{\ast}}h(k) \geq \delta I$$
(3.63)

for all kSn−1.

Proof. Since for Bj = βjI the advection term \(i|k|\sum\limits_{j = 1}^n {{B^j}{{\hat k}_j}}\) commutes with any Hermitian matrix H(k), it is sufficient to prove the theorem for Bj = 0, in which case the principal symbol reduces to

$${Q_0}(ik): = \left({\begin{array}{*{20}c} 0 & I \\ {- R(k)} & 0 \\ \end{array}} \right),\quad k \in {S^{n - 1}}.$$
(3.64)

We write the symmetrizer H(k) in the following block form,

$$H(k) = \left({\begin{array}{*{20}c} {{H_{11}}(k)} & {{H_{12}}(k)} \\ {{H_{12}}{{(k)}^{\ast}}} & {{H_{22}}(k)} \\ \end{array}} \right),$$
(3.65)

where H11(k), H22(k) and H12(k) are complex m × m matrices, the first two being Hermitian. Then,

$$H(k){Q_0}(ik) + {Q_0}{(ik)^{\ast}}H(k) = \left({\begin{array}{*{20}c} {- {H_{12}}(k)R(k) - R{{(k)}^{\ast}}{H_{12}}{{(k)}^{\ast}}} & {{H_{11}}(k) - R{{(k)}^{\ast}}{H_{22}}(k)} \\ {{H_{11}}(k) - {H_{22}}(k)R(k)} & {{H_{12}}(k) + {H_{12}}{{(k)}^{\ast}}} \\ \end{array}} \right).$$
(3.66)

Now, suppose h(k) satisfies the conditions (3.63). Then, choosing H12(k) := 0, H22(k) := h(k) and H11(k) := h(k)R(k) we find that H(k)Q0(ik) + Q0(ik)*H(k) = 0. Furthermore, M−1IH22(k) ≤ MI and δIH11(k) = h(k)R(k) ≤ MCI where

$$C: = \sup \{\vert R(k)u\vert :k \in {S^{n - 1}},u \in {{\mathbb C}^m},\vert u\vert = 1\}$$
(3.67)

is finite because R(k)u is continuous in k and u. Therefore, H(k) is a symmetrizer for Q0(ik), and the problem is well posed.

Conversely, suppose that the problem is well posed with symmetrizer H(k). Then, the vanishing of H(k)Q0(ik) + Q0(ik)*H(k) yields the conditions H11(k) = H22(k)R(k) = R(k)*H22(k), and the conditions (3.63) are satisfied for h(k) := H22(k). □

Remark: The conditions (3.63) imply that R(k) is symmetric and positive with respect to the scalar product defined by h(k). Hence it is diagonalizable, and all its eigenvalues are positive. A practical way of finding h(k) is to construct T(k), which diagonalizes R(k), T(k)−1 R(k)T(k) = P(k) with P(k) diagonal and positive. Then, h(k) := (T(k)−1)*T(k)−1 is the candidate for satisfying the conditions (3.63).

Let us give some examples and applications:

Example 16. The Klein-Gordon equation vtt = Δvm2v on flat spacetime. In this case, Aij = δij and Bj = 0, and R(k) = |k|2 trivially satisfies the conditions of Theorem 2.

Example 17. In anticipation of the following Section 3.2, where linear problems with variable coefficients are treated, let us generalize the previous example on a curved spacetime (M, g). We assume that (M, g) is globally hyperbolic such that it can be foliated by space-like hypersurfaces Σt. In the ADM decomposition, the metric in adapted coordinates assumes the form

$$g = - {\alpha ^2}dt \otimes dt + {\gamma _{ij}}(d{x^i} + {\beta ^i}dt) \otimes (d{x^j} + {\beta ^j}dt),$$
(3.68)

with α > 0 the lapse, βi the shift vector, which is tangent to Σt, and γijdxidxj the induced three-metric on the spacelike hypersurfaces Σt. The inverse of the metric is given by

$${g^{- 1}} = - {1 \over {{\alpha ^2}}}\left({{\partial \over {\partial t}} - {\beta ^i}{\partial \over {\partial {x^i}}}} \right) \otimes \left({{\partial \over {\partial t}} - {\beta ^j}{\partial \over {\partial {x^j}}}} \right) + {\gamma ^{ij}}{\partial \over {\partial {x^i}}} \otimes {\partial \over {\partial {x^j}}},$$
(3.69)

where γij are the components of the inverse three-metric. The Klein-Gordon equation on (M, g) is

$${g^{\mu \nu}}{\nabla _\mu}{\nabla _\nu}v = {1 \over {\sqrt {- \det (g)}}}{\partial _\mu}\left({\sqrt {- \det (g)} {g^{\mu \nu}}{\partial _\nu}v} \right) = {m^2}v,$$
(3.70)

which, in the constant coefficient case, has the form of Eq. (3.56) with

$${A^{jk}} = {\alpha ^2}{\gamma ^{jk}} - {\beta ^j}{\beta ^k},\quad {B^j} = {\beta ^j}.$$
(3.71)

Hence, R(k) = α2γijkikj, and the conditions of Theorem 2 are satisfied with h(k) = 1 since α > 0 and γij is symmetric positive definite.
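Following the Remark after Theorem 2, the conditions (3.63) can be spot-checked numerically. For the Klein-Gordon example the field is a single scalar, so h(k) = 1 and one only needs R(k) > 0 uniformly on the unit sphere (our own sketch; the lapse, shift, and inverse three-metric below are arbitrary sample values):

```python
import numpy as np

alpha = 1.3                                # sample lapse
beta = np.array([0.2, -0.1, 0.4])          # sample shift
gamma_inv = np.array([[1.5, 0.1, 0.0],     # sample inverse three-metric (pos. def.)
                      [0.1, 1.2, 0.3],
                      [0.0, 0.3, 2.0]])

def R(k):
    # R(k) = (A^{jk} + B^j B^k) k_j k_k with the coefficients (3.71);
    # the shift terms cancel, leaving alpha^2 gamma^{ij} k_i k_j.
    A = alpha**2 * gamma_inv - np.outer(beta, beta)
    return k @ (A + np.outer(beta, beta)) @ k

rng = np.random.default_rng(0)
ks = rng.normal(size=(1000, 3))
ks /= np.linalg.norm(ks, axis=1, keepdims=True)   # random points on S^2
print(min(R(k) for k in ks) > 0)                  # True: condition (3.63) holds
```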

3.2 Linear problems with variable coefficients

Next, we generalize the theory to linear evolution problems with variable coefficients. That is, we consider equations of the following form:

$${u_t} = P(t,x,\partial /\partial x)u \equiv \sum\limits_{\vert \nu \vert \leq p} {{A_\nu}} (t,x){D_\nu}u,\quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.72)

where now the complex m × m matrices Aν(t, x) may depend on t and x. For simplicity, we assume that each matrix coefficient of Aν belongs to the class \(C_b^\infty ([0,\infty) \times {{\rm{\mathbb R}}^n})\) of bounded, C∞-functions with bounded derivatives. Unlike the constant coefficient case, the different k-modes couple when performing a Fourier transformation, and there is no simple explicit representation of the solutions through the exponential of the symbol. Therefore, Definition 1 of well-posedness needs to be altered. Instead of giving an operator-based definition, let us define well-posedness by the basic requirements a Cauchy problem should satisfy:

Definition 3. The Cauchy problem

$${u_t}(t,x) = P(t,x,\partial /\partial x)u(t,x),\;\;x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.73)
$$u(0,x) = f(x),\,x \in {{\mathbb R}^n},$$
(3.74)

is well posed if any \(f \in C_0^\infty ({{\rm{\mathbb R}}^n})\) gives rise to a unique C-solution u(t, x), and if there are constants K ≥ 1 and α ∈ ℝ such that

$$\vert \vert u(t, \cdot)\vert \vert \leq K{e^{\alpha t}}\vert \vert f\vert \vert$$
(3.75)

for all \(f \in C_0^\infty ({{\rm{\mathbb R}}^n})\) and all t ≥ 0.

Before we proceed and analyze under which conditions on the operator P(t, x, ∂/∂x) the Cauchy problem (3.73, 3.74) is well posed, let us make the following observations:

  • In the constant coefficient case, inequality (3.75) is equivalent to inequality (3.11), and in this sense Definition 3 is a generalization of Definition 1.

  • If u1 and u2 are the solutions corresponding to the initial data \({f_1},{f_2} \in C_0^\infty ({{\rm{\mathbb R}}^n})\), then the difference u = u2u1 satisfies the Cauchy problem (3.73, 3.74) with ƒ = ƒ2ƒ1 and the estimate (3.75) implies that

    $$\vert \vert {u_2}(t, \cdot) - {u_1}(t, \cdot)\vert \vert \leq K{e^{\alpha t}}\vert \vert {f_2} - {f_1}\vert \vert ,\quad \quad t \geq 0.$$
    (3.76)

    In particular, this implies that u2(t, ·) converges to u1(t, ·) if ƒ2 converges to ƒ1 in the L2-sense. In this sense, the solution depends continuously on the initial data. This property is important for the convergence of a numerical approximation, as discussed in Section 7.

  • Estimate (3.75) also implies uniqueness of the solution, because for two solutions u1 and u2 with the same initial data \({f_1} = {f_2} \in C_0^\infty ({{\rm{\mathbb R}}^n})\) the inequality (3.76) implies u1 = u2.

  • As in the constant coefficient case, it is possible to extend the solution concept to weak ones by taking sequences of C∞-elements. This defines a propagator U(t, s) : L2(ℝn) → L2(ℝn), which maps the solution at time s ≥ 0 to the solution at time t ≥ s and satisfies similar properties to the ones described in Section 3.1.2: (i) U(t, t) = I for all t ≥ 0, (ii) U(t, s)U(s, r) = U(t, r) for all t ≥ s ≥ r ≥ 0, (iii) for \(f \in C_0^\infty ({{\mathbb R}^n})\), U(t, 0)ƒ is the unique solution of the Cauchy problem (3.73, 3.74), (iv) ‖U(t, s)ƒ‖ ≤ Keα(t−s)‖ƒ‖ for all ƒ ∈ L2(ℝn) and all t ≥ s ≥ 0. Furthermore, the Duhamel formula (3.23) holds with the replacement U(t − s) ↦ U(t, s).

3.2.1 The localization principle

Like in the constant coefficient case, we would like to have a criterion for well-posedness that is based on the coefficients Aν(t, x) of the differential operator alone. As we have seen in the constant coefficient case, well-posedness is essentially a statement about high frequencies. Therefore, we are led to consider solutions with very high frequency or, equivalently, with very short wavelength. In this regime we can consider small neighborhoods, and since the coefficients Aν(t, x) are smooth, they are approximately constant in such neighborhoods. Therefore, intuitively, the question of well-posedness for the variable coefficient problem can be reduced to a frozen coefficient problem, where the values of the matrix coefficients Aν(t, x) are frozen to their values at a given point.

In order to analyze this more carefully, and for the sake of illustration, let us consider a first-order linear system with variable coefficients

$${u_t} = P(t,x,\partial /\partial x)u \equiv \sum\limits_{j = 1}^n {{A^j}} (t,x){\partial \over {\partial {x^j}}}u + B(t,x)u,\quad \quad x \in {{\mathbb R}^n},\quad t \geq 0,$$
(3.77)

where A1, …, An, B are complex m × m matrices, whose coefficients belong to the class \(C_b^\infty ([0,\infty) \times {{\rm{\mathbb R}}^n})\) of bounded, C∞-functions with bounded derivatives. As mentioned above, the Fourier transform of this operator does not yield a simple, algebraic symbol like in the constant coefficient case.Footnote 7 However, given a specific point p0 = (t0, x0) ∈ [0, ∞) × ℝn, we may zoom into a very small neighborhood of p0. Since the coefficients Aj(t, x) and B(t, x) are smooth, they will be approximately constant in this neighborhood and we may freeze the coefficients of Aj(t, x) and B(t, x) to their values at the point p0. More precisely, let u(t, x) be a smooth solution of Eq. (3.77). Then, we consider the formal expansion

$$u({t_0} + \varepsilon t,{x_0} + \varepsilon x) = u({t_0},{x_0}) + \varepsilon {u^{(1)}}(t,x) + {\varepsilon ^2}{u^{(2)}}(t,x) + \ldots ,\quad \quad \varepsilon > 0.$$
(3.78)

As a consequence of Eq. (3.77) one obtains

$$\begin{array}{*{20}c} {u_t^{(1)}(t,x) + \varepsilon u_t^{(2)}(t,x) + \ldots = \sum\limits_{j = 1}^n {{A^j}} ({t_0} + \varepsilon t,{x_0} + \varepsilon x)\left[ {{{\partial {u^{(1)}}} \over {\partial {x^j}}}(t,x) + \varepsilon {{\partial {u^{(2)}}} \over {\partial {x^j}}}(t,x) + \ldots} \right]\quad \quad \quad \quad \quad \quad \,\,} \\ {+ B({t_0} + \varepsilon t,{x_0} + \varepsilon x)\left[ {u({t_0},{x_0}) + \varepsilon {u^{(1)}}(t,x) + \ldots} \right].} \end{array}$$

Taking the pointwise limit ε → 0 on both sides of this equation we obtain

$$u_t^{(1)}(t,x) = \sum\limits_{j = 1}^n {{A^j}} ({t_0},{x_0}){{\partial {u^{(1)}}} \over {\partial {x^j}}}(t,x) + {F_0} = {P_0}({t_0},{x_0},\partial /\partial x){u^{(1)}}(t,x) + {F_0},$$
(3.79)

where F0 := B(t0, x0)u(t0, x0). Therefore, if u is a solution of the variable coefficient equation ut = P(t, x, ∂/∂x)u, then, u(1) satisfies the linear constant coefficient problem \(u_t^{(1)}(t,x) = {P_0}({t_0},{x_0},\partial/\partial x){u^{(1)}} + {F_0}\) obtained by freezing the coefficients in the principal part of P(t, x, ∂/∂x)u to their values at the point p0 and by replacing the lower-order term B(t, x) by the forcing term F0. By adjusting the scaling of t, a similar conclusion can be obtained when P(t, x, ∂/∂x) is a higher-derivative operator.

This leads us to the following statement: a necessary condition for the linear, variable coefficient Cauchy problem for the equation ut = P(t, x, ∂/∂x)u to be well posed is that all the corresponding problems for the frozen coefficient equations vt = P0(t0, x0, ∂/∂x)v are well posed. For a rigorous proof of this statement for the case in which P(t, x, ∂/∂x) is time-independent, see [397]. We stress that it is important to replace P(t, x, ∂/∂x) by its principal part P0(t, x, ∂/∂x) when freezing the coefficients. The statement is false if lower-order terms are retained; see [259, 397] for counterexamples.

Now it is natural to ask whether or not the converse statement is true: suppose that the Cauchy problems for all frozen coefficient equations vt = P0(t0, x0, ∂/∂x)v are well posed; is the original, variable coefficient problem also well posed? It turns out this localization principle is valid in many cases under additional smoothness requirements. In order to formulate the latter, let us go back to the first-order equation (3.77). We define its principal symbol as

$${P_0}(t,x,ik): = i\sum\limits_{j = 1}^n {{A^j}} (t,x){k_j}.$$
(3.80)

In analogy to the constant coefficient case we define:

Definition 4. The first-order system (3.77) is called

  1. (i)

    weakly hyperbolic if all the eigenvalues of its principal symbol P0(t, x, ik) are purely imaginary.

  2. (ii)

    strongly hyperbolic if there exist M > 0 and a family of positive definite, Hermitian m × m matrices H(t, x, k), (t, x, k) ∈ Ω × Sn−1, whose coefficients belong to the class \(C_b^\infty (\Omega \times {S^{n - 1}})\), such that

    $${M^{- 1}}I \leq H(t,x,k) \leq MI,\quad \quad H(t,x,k){P_0}(t,x,ik) + {P_0}{(t,x,ik)^{\ast}}H(t,x,k) = 0,$$
    (3.81)

    for all (t, x, k) ∈ Ω × Sn−1, where Ω := [0, ∞) × ℝn.

  3. (iii)

    symmetric hyperbolic if it is strongly hyperbolic and the symmetrizer H(t, x, k) can be chosen independent of k.

We see that these definitions are straightforward extensions of the corresponding definitions (see Definition 2) in the constant coefficient case, except for the smoothness requirements on the symmetrizer H(t, x, k).Footnote 8 There are examples of ill-posed Cauchy problems for which a Hermitian, positive-definite symmetrizer H(t, x, k) exists but is not smooth [397], showing that these requirements are necessary in general.
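
The role of such a symmetrizer can be made concrete in a small numerical experiment. The sketch below (in Python; the helper name symmetrizer_from_eigenvectors and the sample values of α and β are illustrative assumptions) builds the candidate symmetrizer H = (T−1)*T−1 from the eigenvector matrix T of the principal part of a first-order reduction of the 1+1-dimensional Klein-Gordon equation, and checks both conditions in Eq. (3.81) at a frozen point:

```python
import numpy as np

# First-order reduction of v_tt = 2*beta*v_tx + (alpha^2 - beta^2)*v_xx
# in the variables u = (v_x, v_t - beta*v_x): u_t = A u_x with
# A = [[beta, 1], [alpha**2, beta]], so P0(ik) = i*k*A.  Sample frozen values below.

def symmetrizer_from_eigenvectors(A):
    """Candidate symmetrizer H = (T^{-1})^* T^{-1}, with T the eigenvector matrix of A.
    If A has real eigenvalues and a complete set of eigenvectors, then H is
    Hermitian positive definite and H P0 + P0^* H = 0 for P0 = i*A."""
    lam, T = np.linalg.eig(A)
    assert np.allclose(lam.imag, 0.0), "eigenvalues must be real"
    Tinv = np.linalg.inv(T)
    return Tinv.conj().T @ Tinv

alpha, beta = 2.0, 0.3                 # frozen lapse and shift (sample numbers)
A = np.array([[beta, 1.0], [alpha**2, beta]])
H = symmetrizer_from_eigenvectors(A)
P0 = 1j * A                            # principal symbol in the direction k = +1

print(np.allclose(H @ P0 + P0.conj().T @ H, 0))   # the balance condition in (3.81)
print(np.all(np.linalg.eigvalsh(H) > 0))          # positive definiteness
```

Since the symbol here is ikA with a single fixed matrix A, the same H works for both directions k = ±1; in several space dimensions, H generally depends on the direction k ∈ Sn−1, which is where the smoothness requirement of Definition 4 enters.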

The smooth symmetrizer is used in order to construct a pseudo-differential operator

$$[H(t)v](x): = {1 \over {{{(2\pi)}^{n/2}}}}\int {H(t,x,k/\vert k\vert){e^{ik \cdot x}}\hat v(k){d^n}k,} \quad \quad \hat v(k) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{- ik \cdot x}}v(x){d^n}x,}$$
(3.82)

from which one defines a scalar product (·, ·)H(t), which, for each t, is equivalent to the L2 product. This scalar product has the property that a solution u to the equation (3.77) satisfies an inequality of the form

$${d \over {dt}}{(u,u)_{H(t)}} \leq b(T){(u,u)_{H(t)}},\quad \quad 0 \leq t \leq T,$$
(3.83)

see, for instance, [411]. Upon integration this yields an estimate of the form of Eq. (3.75). In the symmetric hyperbolic case, we have simply [H(t)v](x) = H(t, x)v(x) and the scalar product is given by

$${(u,v)_{H(t)}}: = \int {u{{(x)}^{\ast}}H(t,x)v(x){d^n}x,} \quad \quad u,v \in {L^2}({\mathbb R^n}).$$
(3.84)

We will return to the application of this scalar product for deriving energy estimates below. Let us state the important result:

Theorem 3. If the first-order system (3.77) is strongly or symmetric hyperbolic in the sense of Definition 4, then the Cauchy problem ( 3.73 , 3.74 ) is well posed in the sense of Definition 3.

For a proof of this theorem, see, for instance, Proposition 7.1 and the comments following its formulation in Chapter 7 of [411]. Let us look at some examples:

Example 18. For a given, stationary fluid field, the non-relativistic, ideal magnetohydrodynamic equations reduce to the simple system [120]

$${B_t} = \nabla \wedge (v \wedge B)$$
(3.85)

for the magnetic field B, where v is the fluid velocity. The principal symbol for this equation is given by

$${P_0}(x,ik)B = ik \wedge (v(x) \wedge B) = (ik \cdot B)v(x) - (ik \cdot v(x))B.$$
(3.86)

In order to analyze it, it is convenient to introduce an orthonormal frame e1, e2, e3 such that e1 is parallel to k. With respect to this frame, the matrix corresponding to P0(x, ik) is

$$i\vert k\vert \left(\begin{array}{*{20}c} 0 & 0 & 0 \\ {{v_2}(x)} & {- {v_1}(x)} & 0 \\ {{v_3}(x)} & 0 & {- {v_1}(x)} \\ \end{array}\right),$$
(3.87)

with purely imaginary eigenvalues 0 and −i|k|v1(x), the latter with algebraic multiplicity two. However, the symbol is not diagonalizable when k is orthogonal to the fluid velocity, v1(x) = 0, and so the system is only weakly hyperbolic.

One can still show that the system is well posed if one takes into account the constraint ∇ · B = 0, which is preserved by the evolution equation (3.85). In Fourier space, this constraint forces B1 = 0, which eliminates the first row and column in the principal symbol and yields a strongly hyperbolic symbol. However, at the numerical level, this means that special care needs to be taken when discretizing the system (3.85), since any discretization that does not preserve ∇ · B = 0 will push the solution away from the constraint manifold, in which case the system is weakly hyperbolic. For numerical schemes, which explicitly preserve (divergence-transport) or enforce (divergence-cleaning) the constraints, see [159] and [136], respectively. For alternative formulations, which are strongly hyperbolic without imposing the constraint, see [120].
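
The dichotomy described in this example can be checked numerically. The following Python sketch (the sample fluid velocities and the tolerance TOL are illustrative choices) builds the matrix (3.87) and tests diagonalizability by summing geometric multiplicities:

```python
import numpy as np

# Principal symbol (3.87) of Example 18 in an orthonormal frame with e1 parallel to k.
# Diagonalizability is tested by summing geometric multiplicities; v is a sample
# fluid velocity and TOL a numerical tolerance (both illustrative choices).

TOL = 1e-6

def symbol(v, k_norm=1.0):
    v1, v2, v3 = v
    return 1j * k_norm * np.array([[0.0, 0.0, 0.0],
                                   [v2, -v1, 0.0],
                                   [v3, 0.0, -v1]])

def sum_of_geometric_multiplicities(M):
    distinct = []
    for lam in np.linalg.eigvals(M):
        if all(abs(lam - mu) > TOL for mu in distinct):
            distinct.append(lam)
    n = M.shape[0]
    return sum(n - np.linalg.matrix_rank(M - lam * np.eye(n), tol=TOL)
               for lam in distinct)

print(sum_of_geometric_multiplicities(symbol([0.7, 0.2, 0.1])))  # 3: diagonalizable
print(sum_of_geometric_multiplicities(symbol([0.0, 0.2, 0.1])))  # 2: defective (v1 = 0)
```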

Example 19. The localization principle can be generalized to a certain class of second-order systems [261, 308]: For example, we may consider a second-order linear equation of the form

$${v_{tt}} = \sum\limits_{j,k = 1}^n {{A^{jk}}} (t,x){{{\partial ^2}} \over {\partial {x^j}\partial {x^k}}}v + \sum\limits_{j = 1}^n 2 {B^j}(t,x){\partial \over {\partial {x^j}}}{v_t} + \sum\limits_{j = 1}^n {{C^j}} (t,x){\partial \over {\partial {x^j}}}v + D(t,x){v_t} + E(t,x)v,$$
(3.88)

x ∈ ℝn, t ≥ 0, where now the m × m matrices Ajk, Bj, Cj, D and E belong to the class \(C_b^\infty ([0,\infty) \times {{\rm{\mathbb R}}^n})\) of bounded, C∞-functions with bounded derivatives. Zooming into a very small neighborhood of a given point p0 = (t0, x0) by applying the expansion in Eq. (3.78) to v, one obtains, in the limit ε → 0, the constant coefficient equation

$$v_{tt}^{(2)}(t,x) = \sum\limits_{j,k = 1}^n {{A^{jk}}} ({t_0},{x_0}){{{\partial ^2}{v^{(2)}}} \over {\partial {x^j}\partial {x^k}}}(t,x) + \sum\limits_{j = 1}^n 2 {B^j}({t_0},{x_0}){{\partial v_t^{(2)}} \over {\partial {x^j}}}(t,x) + {F_0},$$
(3.89)

with

$${F_0}: = \sum\limits_{j = 1}^n {{C^j}} ({t_0},{x_0}){{\partial v} \over {\partial {x^j}}}({t_0},{x_0}) + D({t_0},{x_0}){v_t}({t_0},{x_0}) + E({t_0},{x_0})v({t_0},{x_0}),$$
(3.90)

where we have used the fact that \({\upsilon ^{(1)}}(t,x) = t{\upsilon _t}({t_0},{x_0}) + \sum\limits_{j = 1}^n {{x^j}{{\partial \upsilon} \over {\partial {x^j}}}({t_0},{x_0})}\). Eq. (3.89) can be rewritten as a first-order system in Fourier space for the variable

$$\hat U = \left(\begin{array}{*{20}c} {\vert k\vert \hat v\quad \quad \quad \quad \quad \quad} \\ {{{\hat v}_t} - i\sum\limits_{j = 1}^n {{B^j}} ({t_0},{x_0}){k_j}\hat v} \\ \end{array} \right),$$
(3.91)

see Section 3.1.5. Now Theorem 2 implies that the problem is well posed, if there exist constants M > 0 and δ > 0 and a family of positive definite m × m Hermitian matrices h(t, x, k), (t, x, k) ∈ Ω × Sn−1, which is C∞-smooth in all its arguments, such that M−1I ≤ h(t, x, k) ≤ MI and h(t, x, k)R(t, x, k) = R(t, x, k)*h(t, x, k) ≥ δI for all (t, x, k) ∈ Ω × Sn−1, where \(R(t,x,k): = \sum\limits_{i,j = 1}^n {\left({{A^{ij}}(t,x) + {B^i}(t,x){B^j}(t,x)} \right){k_i}{k_j}}\).

In particular, it follows that the Cauchy problem for the Klein-Gordon equation on a globally-hyperbolic spacetime M = [0, ∞) × ℝn with \(\alpha, {\beta ^i},{\gamma _{ij}} \in C_b^\infty ([0,\infty) \times {{\rm{\mathbb R}}^n})\), is well posed provided that α2γij is uniformly positive definite; see Example 17.

3.2.2 Characteristic speeds and fields

Consider a first-order linear system of the form (3.77), which is strongly hyperbolic. Then, for each t ≥ 0, x ∈ ℝn and k ∈ Sn−1 the principal symbol P0(t, x, ik) is diagonalizable and has purely imaginary eigenvalues. In the constant coefficient case with no lower-order terms (B = 0) an eigenvalue iμ(k) of P0(ik) with corresponding eigenvector a(k) gives rise to the plane-wave solution

$$u(t,x) = a(k){e^{i\mu (k)t + ik \cdot x}},\quad \quad t \geq 0,x \in {\mathbb R^n}.$$
(3.92)

If lower-order terms are present and the matrix coefficients Aj(t, x) are not constant, one can look for approximate plane-wave solutions, which have the form

$$u(t,x) = {a_\varepsilon}(t,x){e^{i{\varepsilon ^{- 1}}\psi (t,x)}},\quad \quad t \geq 0,x \in {\mathbb R^n},$$
(3.93)

where ε > 0 is a small parameter, ψ(t, x) a smooth phase function, and aε(t, x) = a0(t, x) + εa1(t, x) + ε2a2(t, x) + … a slowly varying amplitude. Introducing the ansatz (3.93) into Eq. (3.77) and taking the limit ε → 0 yields the problem

$$i{\psi _t}{a_0} = {P_0}(t,x,i\nabla \psi){a_0} = i\sum\limits_{j = 1}^n {{A^j}} (t,x){{\partial \psi} \over {\partial {x^j}}}{a_0}.$$
(3.94)

Setting ω(t, x) := ψt(t, x) and k(t, x) := ∇ψ(t, x), a nontrivial solution exists if and only if the eikonal equation

$$\det \left[ {i\omega I - {P_0}(t,x,ik)} \right] = 0$$
(3.95)

is satisfied. Its solutions provide the phase function ψ(t, x) whose level sets have co-normal ωdt + k · dx. The phase function and a0 determine approximate plane-wave solutions of the form (3.93). For this reason we call ω(k) the characteristic speed in the direction kSn−1, and a0 a corresponding characteristic mode. For a strongly hyperbolic system, the solution at each point (t, x) can be expanded in terms of the characteristic modes ej(t, x, k) with respect to a given direction kSn−1,

$$u(t,x) = \sum\limits_{j = 1}^m {{u^{(j)}}} (t,x,k){e_j}(t,x,k).$$
(3.96)

The corresponding coefficients u(j)(t, x, k) are called the characteristic fields.

Example 20. Consider the Klein-Gordon equation on a hyperbolic spacetime, as in Example 17. In this case the eikonal equation is

$$0 = \det \left[ {i\omega I - {Q_0}(ik)} \right] = \det \left(\begin{array}{*{20}c} {i(\omega - {\beta ^j}{k_j})} & {\vert k\vert} \\ {- {\alpha ^2}{\gamma ^{ij}}{k_i}{k_j}/\vert k\vert} & {i(\omega - {\beta ^j}{k_j})} \\ \end{array} \right) = - {(\omega - {\beta ^j}{k_j})^2} + {\alpha ^2}{\gamma ^{ij}}{k_i}{k_j},$$
(3.97)

which yields \({\omega _ \pm}(k) = {\beta ^j}{k_j} \pm \alpha \sqrt {{\gamma ^{ij}}{k_i}{k_j}}\). The corresponding co-normals ω±(k)dt + kjdxj are null; hence the surfaces of constant phase are null surfaces. The characteristic modes and fields are

$${e_ \pm}(k) = \left(\begin{array}{*{20}c} {i\vert k\vert} \\ {\mp \alpha \sqrt {{\gamma ^{ij}}{k_i}{k_j}}} \\ \end{array} \right),\quad \quad {u^{(\pm)}}(k) = {1 \over 2}\left({{{{U_1}} \over {i\vert k\vert}} \mp {{{U_2}} \over {\alpha \sqrt {{\gamma ^{ij}}{k_i}{k_j}}}}} \right),$$
(3.98)

where U = (U1, U2) = (|k|v, vt − iβjkjv) and v is the Klein-Gordon field.
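
The eikonal computation of this example is easily reproduced numerically: the sketch below (with sample frozen values of α, βi and γij, all illustrative) compares the imaginary parts of the eigenvalues of the symbol in Eq. (3.97) against the formula for ω±(k):

```python
import numpy as np

# Characteristic speeds of the Klein-Gordon symbol (3.97): the eigenvalues of Q0(ik)
# are i*omega_pm(k) with omega_pm = beta.k +/- alpha*sqrt(gamma^{ij} k_i k_j).
# alpha, beta, gamma_inv and k below are sample frozen values, not from the text.

alpha = 1.5
beta = np.array([0.2, -0.1, 0.4])
gamma_inv = np.diag([1.0, 2.0, 0.5])          # inverse three-metric gamma^{ij}
k = np.array([0.3, 1.0, -0.7])

knorm = np.linalg.norm(k)
bk = beta @ k
q = k @ gamma_inv @ k
Q0 = np.array([[1j * bk, knorm],
               [-alpha**2 * q / knorm, 1j * bk]])

omega_numeric = sorted(np.linalg.eigvals(Q0).imag)
omega_formula = sorted([bk - alpha * np.sqrt(q), bk + alpha * np.sqrt(q)])
print(np.allclose(omega_numeric, omega_formula))   # True
```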

Example 21. In the formulation of Maxwell’s equations discussed in Example 15, the characteristic speeds are 0, \(\pm \sqrt {\alpha \beta}\) and ±1, and the corresponding characteristic fields are the components of the vector on the right-hand side of Eq. (3.54).

3.2.3 Energy estimates and finite speed of propagation

Here we focus our attention on first-order linear systems, which are symmetric hyperbolic. In this case it is not difficult to derive a priori energy estimates based on integration by parts. Such estimates assume the existence of a sufficiently smooth solution and bound an appropriate norm of the solution at some time t > 0 in terms of the same norm of the solution at the initial time t = 0. As we will illustrate here, such estimates already yield quite a lot of information on the qualitative behavior of the solutions. In particular, they give uniqueness, continuous dependence on the initial data and finite speed of propagation.

The word “energy” stems from the fact that for many problems the squared norm satisfying the estimate is directly or indirectly related to the physical energy of the system, although for many other problems the squared norm does not have a physical interpretation of any kind.

For first-order symmetric hyperbolic linear systems, an a priori energy estimate can be constructed from the symmetrizer H(t, x) in the following way. For a given smooth solution u(t, x) of Eq. (3.77), define the vector field J on Ω = [0, ∞) × ℝn by its components

$${J^\mu}(t,x): = - u{(t,x)^{\ast}}H(t,x){A^\mu}(t,x)u(t,x),\quad \quad \mu = 0,1,2, \ldots ,n,$$
(3.99)

where A0(t, x) := −I. By virtue of the evolution equation, J satisfies

$${\partial _\mu}{J^\mu}(t,x) \equiv {\partial \over {\partial t}}{J^0}(t,x) + \sum\limits_{k = 1}^n {{\partial \over {\partial {x^k}}}} {J^k}(t,x) = u{(t,x)^{\ast}}K(t,x)u(t,x),$$
(3.100)

where the Hermitian m × m matrix K(t, x) is defined as

$$K(t,x): = H(t,x)B(t,x) + B{(t,x)^{\ast}}H(t,x) + {H_t}(t,x) - \sum\limits_{k = 1}^n {{\partial \over {\partial {x^k}}}} \left[ {H(t,x){A^k}(t,x)} \right].$$
(3.101)

If K = 0, Eq. (3.100) formally looks like a conservation law for the current density J. If K ≠ 0, we obtain, instead of a conserved quantity, an energy-like expression whose growth can be controlled by its initial value. For this, we first notice that our assumptions on the matrices H(t, x), B(t, x) and Ak(t, x) imply that K(t, x) is bounded on Ω. In particular, since H(t, x) is uniformly positive, there is a constant α > 0 such that

$$K(t,x) \leq 2\alpha H(t,x),\quad \quad (t,x) \in \Omega .$$
(3.102)

Let ΩT = ∪0≤t≤T Σt be a tubular region obtained by piling up open subsets Σt of t = const hypersurfaces. This region is enclosed by the initial surface Σ0, the final surface ΣT and the boundary surface \({\mathcal T}: = {\cup _{0 \leq t \leq T}}\partial {\Sigma _t}\), which is assumed to be smooth. Integrating Eq. (3.100) over ΩT and using Gauss’ theorem, one obtains

$$\int\limits_{{\Sigma _T}} {{J^0}(t,x){d^n}x =} \int\limits_{{\Sigma _0}} {{J^0}(t,x){d^n}x -} \int\limits_{\mathcal T} {{e_\mu}{J^\mu}(t,x)dS +} \int\limits_{{\Omega _T}} {u{{(t,x)}^\ast}K(t,x)u(t,x)dt{d^n}x} ,$$
(3.103)

where eµ is the unit outward normal covector to \({\mathcal T}\) and dS the volume element on that surface. Defining the “energy” contained in the surface Σt by

$$E({\Sigma _t}): = \int\limits_{{\Sigma _t}} {{J^0}} (t,x){d^n}x = \int\limits_{{\Sigma _t}} u {(t,x)^\ast}H(t,x)u(t,x){d^n}x$$
(3.104)

and assuming for the moment that the “flux” integral over \({\mathcal T}\) is positive or zero, one obtains the estimate

$$\begin{array}{*{20}c} {E({\Sigma _T}) \leq E({\Sigma _0}) + \int\limits_0^T {\left({\int\limits_{{\Sigma _t}} u {{(t,x)}^{\ast}} K(t, x)u(t, x){d^n}x} \right)}\,\,\,dt} \\{\leq E({\Sigma _0}) + 2\alpha \int\limits_0^T E ({\Sigma _t})dt,\quad \quad \quad \quad \quad} \end{array}$$
(3.105)

where we have used the inequality (3.102) and the definition of E(Σt) in the last step. Defining the function \(h(T): = \int\nolimits_0^T {E({\Sigma _t})\,dt}\), whose derivative satisfies h′(t) = E(Σt) ≤ E(Σ0) + 2αh(t), this inequality can be rewritten as

$${d \over {dt}}\left({h(t){e^{- 2\alpha t}}} \right) \leq E({\Sigma _0}){e^{- 2\alpha t}},\quad \quad 0 \leq t \leq T,$$
(3.106)

which yields 2αh(T) ≤ E(Σ0)(e2αT − 1) upon integration. This together with (3.105) gives

$$E({\Sigma _t}) \leq {e^{2\alpha t}}E({\Sigma _0}),\quad \quad 0 \leq t \leq T,$$
(3.107)

which bounds the energy at any time t ∈ [0, T] in terms of the initial energy.

In order to analyze the conditions under which the flux integral is positive or zero, we examine the sign of the integrand eµJµ(t, x). Decomposing eµdxµ = N[a dt + s1dx1 + … + sndxn], where s = (s1, …, sn) is a unit vector and N > 0 a positive normalization constant, we have

$${e_\mu}{J^\mu}(t,\,x) = N(t,\,x)u{(t,\,x)^{\ast}}[a(t,\,x)H(t,\,x) - H(t,\,x){P_0}(t,\,x,\,s)]u(t,\,x),$$
(3.108)

where \({P_0}(t,x,s) = \sum\limits_{j = 1}^n {{A^j}(t,x){s_j}}\) is the principal symbol in the direction of the unit vector s. This is guaranteed to be nonnegative if the boundary surface \({\mathcal T}\) is such that a(t, x) is greater than or equal to all the eigenvalues of the boundary matrix P0(t, x, s), for each \((t,x) \in {\mathcal T}\). This is equivalent to the condition

$$a(t,x) \geq {\sup}_{u \in {\mathbb {C}}^{m},\,u \neq 0} {{{u^{\ast}}H(t,\,x){P_0}(t,\,x,\,s)u} \over {{u^{\ast}}H(t,\,x)u}}\quad \quad {\rm{for all}}(t,\,x) \in {\mathcal T}.$$
(3.109)

Since H(t, x)P0(t, x, s) is symmetric, the supremum is equal to the maximum eigenvalue of P0(t, x, s). Therefore, condition (3.109) is equivalent to the requirement that a(t, x) be greater than or equal to the maximum characteristic speed in the direction of the unit outward normal s.

With these arguments, we arrive at the following conclusions and remarks:

  • Finite speed of propagation. Let p0 = (t0, x0) ∈ Ω be a given event, and set

    $$v({t_0}): = \sup \left\{{{{{u^\ast}H(t,\,x){P_0}(t,\,x,\,s)u} \over {{u^\ast}H(t,\,x)u}}:0 \leq t \leq {t_0},\,x \in {{\mathbb R}^n},\,s \in {S^{n - 1}},\,u \in {{\mathbb C}^m},\,u \neq 0} \right\}.$$
    (3.110)

    Define the past cone at p0 asFootnote 9

    $${C^ -}({p_0}): = \{(t,\,x) \in \Omega :\vert x - {x_0}\vert\,\leq v({t_0})({t_0} - t)\} .$$
    (3.111)

    The unit outward normal to its boundary is eµdxµ = N[v(t0)dt + (x − x0) · dx/|x − x0|], which satisfies the condition (3.109). It follows from the estimate (3.107) applied to the domain ΩT = C−(p0) that the solution is zero on C−(p0) if the initial data is zero on the intersection of the cone C−(p0) with the initial surface t = 0. In other words, a perturbation in the initial data outside the ball |x − x0| ≤ v(t0)t0 does not alter the solution inside the cone C−(p0). Using this argument, it also follows that if ƒ has compact support, the corresponding solution u(t, ·) also has compact support for all t > 0. (A numerical illustration of this cone argument is sketched after this list.)

  • Continuous dependence on the initial data. Let \(f \in C_0^\infty ({{\rm{R}}^n})\) be smooth initial data with compact support. As we have seen above, the corresponding smooth solution u(t, ·) also has compact support for each t ≥ 0. Therefore, applying the estimate (3.107) to the case Σt := {t} × ℝn, the boundary integral vanishes and we obtain

    $$E({\Sigma _t}) \leq {e^{2\alpha t}}E({\Sigma _0}),\quad \quad t \geq 0.$$
    (3.112)

    In view of the definition of E(Σt), see Eq. (3.104), and the properties (3.81) of the symmetrizer, it follows that

    $$\Vert u(t, \cdot)\Vert \, \leq M{e^{\alpha t}}\Vert f \Vert ,\quad \quad t \geq 0,$$
    (3.113)

    which is of the required form; see Definition 3. In particular, we have uniqueness and continuous dependence on the initial data.

  • The statements about finite speed of propagation and continuous dependence on the data can easily be generalized to the case of a first-order symmetric hyperbolic inhomogeneous equation ut = P(t, x, ∂/∂x)u + F(t, x), with F : Ω → ℂm a bounded, C∞-function with bounded derivatives. In this case, the inequality (3.113) is replaced by

    $$\Vert u(t, \cdot)\Vert \, \leq \,M{e^{\alpha t}}\left[ {\Vert f \Vert + \int\limits_0^t {{e^{- \alpha s}}}\Vert F(s, \cdot)\Vert ds} \right],\quad \quad t \geq 0.$$
    (3.114)
  • If the boundary surface \({\mathcal T}\) does not satisfy the condition (3.109) for the boundary integral to be positive, then suitable boundary conditions need to be specified in order to control the sign of this term. This will be discussed in Section 5.2.

  • Although different techniques have to be used to prove them, very similar results hold for strongly hyperbolic systems [353].

  • For definitions of hyperbolicity of a geometric PDE on a manifold, which do not require a 3+1 decomposition of spacetime, see, for instance, [205, 353], for first-order systems and [47] for second-order ones.
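
As announced above, the finite-speed-of-propagation statement can be illustrated with a simple numerical experiment. The following Python sketch (all grid parameters are arbitrary choices) evolves the 1+1-dimensional wave equation utt = c2uxx, whose first-order reduction is symmetric hyperbolic with characteristic speeds ±c, and measures the amplitude left outside the cone |x| ≤ ct + r0:

```python
import numpy as np

# Finite propagation speed for u_tt = c^2 u_xx: data supported near x = 0 stays
# inside |x| <= c*t + r0 up to tiny discretization tails.  Grid sizes, the bump
# width r0 ~ 1 and the final time are all illustrative sample choices.

c, L, N = 1.0, 10.0, 2001
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
dt = 0.5 * dx / c                            # CFL-stable time step
u_old = np.exp(-20.0 * x**2)                 # bump of width r0 ~ 1 around x = 0
u = u_old.copy()                             # zero initial velocity
t = 0.0
while t < 4.0:                               # standard leapfrog update
    u_new = np.zeros_like(u)
    u_new[1:-1] = (2.0 * u[1:-1] - u_old[1:-1]
                   + (c * dt / dx)**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2]))
    u_old, u, t = u, u_new, t + dt
outside = np.abs(u[np.abs(x) > c * t + 1.0]) # amplitude outside the cone
print(t, outside.max())                      # the second number is tiny
```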

Example 22. We have seen that for the Klein-Gordon equation propagating on a globally-hyperbolic spacetime, the characteristic speeds are equal to the speed of light. Therefore, in the case of a constant metric (i.e., Minkowski space), the past cone C−(p0) defined in Eq. (3.111) coincides with the past light cone at the event p0. A slight refinement of the above argument shows that the statement remains true for a Klein-Gordon field propagating on any globally-hyperbolic spacetime.

Example 23. In Example 21 we have seen that the characteristic speeds of the system given in Example 15 are 0, \(\pm \sqrt {\alpha \beta}\) and ±1, where αβ > 0 is assumed for strong hyperbolicity. Therefore, the past cone C−(p0) corresponds to the past light cone provided that 0 < αβ ≤ 1. For αβ > 1, the formulation has superluminal constraint-violating modes, and an initial perturbation emanating from a region outside the past light cone at p0 could affect the solution at p0. In this case, the past light cone at p0 is a proper subset of C−(p0).

3.3 Quasilinear equations

Next, we generalize the theory one more step and consider evolution systems, which are described by quasilinear partial differential equations, that is, by nonlinear partial differential equations, which are linear in their highest-order derivatives. This already covers most of the interesting physical systems, including the Yang-Mills and the Einstein equations. Restricting ourselves to the first-order case, such equations have the form

$${u_t} = \sum\limits_{j = 1}^n {{A^j}} (t,\,x,\,u){\partial \over {\partial {x^j}}}u + F(t,\,x,\,u),\quad \quad 0 \leq t \leq T,\,\quad x \in {{\mathbb R}^n},$$
(3.115)

where all the coefficients of the complex m × m matrices A1(t, x, u), …, An(t, x, u) and the nonlinear source term F(t, x, u) ∈ ℂm belong to the class \(C_b^\infty ([0,T] \times {{\rm{\mathbb R}}^n} \times {{\rm{\mathbb C}}^m})\) of bounded, C∞-functions with bounded derivatives. Compared to the linear case, there are two new features the solutions may exhibit:

  • The nonlinear term F(t, x, u) may induce blowup of the solutions in finite time. This is already the case for the simple example where m = 1, all the matrices Aj vanish identically and F(t, x, u) = u2, in which case Eq. (3.115) reduces to ut = u2. In the context of Einstein’s equations such a blowup is expected when a curvature singularity forms, or it could also occur in the presence of a coordinate singularity due to a “bad” gauge condition.

  • In contrast to the linear case, the matrix functions Aj in front of the derivative operator now depend pointwise on the state vector itself, which implies, in particular, that the characteristic speeds and fields depend on u. This can lead to the formation of shocks where characteristics cross each other, like in the simple example of Burgers’ equation ut = uux, corresponding to the case m = n = 1, A1(t, x, u) = u and F(t, x, u) = 0 (see the sketch below). In general, shocks may form when the characteristic fields are genuinely nonlinear rather than linearly degenerate [250]. The Einstein vacuum equations, on the other hand, can be written in linearly degenerate form (see, for example, [6, 7, 348, 8]) and are therefore expected to be free of physical shocks.

For these reasons, one cannot expect global existence of smooth solutions from smooth initial data with compact support in general, and the best one can hope for is existence of a smooth solution on some finite time interval [0, T], where T might depend on the initial data.
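
The shock-formation mechanism mentioned in the second item above can be made explicit for Burgers’ equation. In the sketch below (the initial datum u0 = sin is an illustrative sample), u is transported along the characteristics x(t) = x0 − u0(x0)t, and the map x0 ↦ x(t) fails to be monotone precisely after the crossing time t* = 1/max u0′:

```python
import numpy as np

# Characteristics of Burgers' equation u_t = u u_x (the sign convention used above):
# u is constant along dx/dt = -u, so x(t) = x0 - u0(x0)*t.  Characteristics first
# cross at t* = 1/max(u0'), after which no smooth solution exists.

x0 = np.linspace(-np.pi, np.pi, 4001)
u0 = np.sin(x0)                        # max u0' = 1  =>  expected breakdown at t* = 1

for t in (0.5, 0.99, 1.01, 1.5):
    x_t = x0 - u0 * t                  # characteristic foot points at time t
    crossed = np.any(np.diff(x_t) < 0) # map x0 -> x(t) no longer monotone => crossing
    print(f"t = {t:4.2f}: characteristics crossed: {crossed}")
```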

Under such restrictions, it is possible to prove well-posedness of the Cauchy problem. The idea is to linearize the problem and to apply Banach’s fixed-point theorem. This is discussed next.

3.3.1 The principle of linearization

Suppose u(0)(t, x) is a C∞ (reference) solution of Eq. (3.115), corresponding to initial data ƒ(x) = u(0)(0, x). Assuming this solution to be uniquely determined by the initial data ƒ, we may ask if a unique solution u also exists for the perturbed problem

$${u_t}(t,{\mkern 1mu} x) = \sum\limits_{j = 1}^n {{A^j}} (t,{\mkern 1mu} x,{\mkern 1mu} u){\partial \over {\partial {x^j}}}u(t,{\mkern 1mu} x) + F(t,{\mkern 1mu} x,{\mkern 1mu} u) + \delta F(t,{\mkern 1mu} x),\;\;x \in {{\mathbb R^n}},\quad 0 \leq t \leq T,$$
(3.116)
$$u(0,\,x) = f(x) + \delta f(x),\quad x \in {{\mathbb R}^n},$$
(3.117)

where the perturbations δF(t, x) and δƒ(x) belong to the class of bounded, C∞-functions with bounded derivatives. This leads to the following definition:

Definition 5. Consider the nonlinear Cauchy problem given by Eq. (3.115) and prescribed initial data for u at t = 0. Let u(0) be a C∞-solution to this problem, which is uniquely determined by its initial data ƒ. Then, the problem is called well posed at u(0), if there are normed vector spaces X, Y, and Z and constants K > 0, ε > 0 such that for all sufficiently-smooth perturbations δƒ and δF lying in Y and Z, respectively, with

$$\Vert \delta f\Vert_{Y} + \Vert \delta F\Vert_{Z} < \varepsilon ,$$
(3.118)

the perturbed problem ( 3.116 , 3.117 ) is also uniquely solvable and the corresponding solution u satisfies u − u(0) ∈ X and the estimate

$$\Vert u - {u^{(0)}}\Vert_{X} \leq K\left(\Vert {\delta f \Vert_{Y} + \Vert \delta F \Vert_{Z}} \right).$$
(3.119)

Here, the norms ‖·‖X and ‖·‖Y appearing on both sides of Eq. (3.119) are different from each other, because ‖u − u(0)‖X controls the function u − u(0) over the spacetime region [0, T] × ℝn, while ‖δƒ‖Y is a norm controlling the function δƒ on ℝn.

If the problem is well posed at u(0), we may consider a one-parameter curve ƒε of initial data lying in \(C_0^\infty ({{\rm{\mathbb R}}^n})\) that goes through ƒ and assume that there is a corresponding solution uε(t, x) for each small enough |ε|, which lies close to u(0) in the sense of inequality (3.119). Expanding

$${u_\varepsilon}(t,\,x) = {u^{(0)}}(t,\,x) + \varepsilon {v^{(1)}}(t,\,x) + {\varepsilon ^2}{v^{(2)}}(t,\,x) + \ldots$$
(3.120)

and plugging into the Eq. (3.115) we find, to first order in ε,

$$v_t^{(1)} = \sum\limits_{j = 1}^n {A_0^j} (t,\,x){\partial \over {\partial {x^j}}}{v^{(1)}} + {B_0}(t,\,x){v^{(1)}},$$
(3.121)

with

$$A_0^j(t,\,x) = {A^j}(t,\,x,\,{u^{(0)}}(t,\,x)),\quad \quad {B_0}(t,\,x) = {{\partial {A^j}} \over {\partial u}}(t,\,x,\,{u^{(0)}}(t,\,x)){{\partial {u^{(0)}}} \over {\partial {x^j}}} + {{\partial F} \over {\partial u}}(t,\,x,\,{u^{(0)}}(t,\,x)).$$
(3.122)

Eq. (3.121) is a first-order linear equation with variable coefficients for the first variation, v(1), to which we can apply the theory described in Section 3.2. Therefore, it is reasonable to require that the linearized problem be strongly hyperbolic for any smooth function u(0)(t, x). In particular, if we generalize the definitions of strong and symmetric hyperbolicity given in Definition 4 to the quasilinear case by requiring that the symmetrizer H(t, x, k, u) has coefficients in \(C_b^\infty (\Omega \times {S^{n - 1}} \times {{\mathbb C}^m})\), it follows that the linearized problem is well posed provided that the quasilinear problem is strongly or symmetric hyperbolic.

The linearization principle states that the converse is also true: the nonlinear problem is well posed at u(0) if all the linear problems, which are obtained by linearizing Eq. (3.115) at functions in a suitable neighborhood of u(0), are well posed. To prove that this principle holds, one sets up the following iteration: we define the sequence u(k) of functions by iteratively solving the linear problems

$$u_t^{(k + 1)} = \sum\limits_{j = 1}^n {{A^j}} (t,\,x,\,{u^{(k)}}){\partial \over {\partial {x^j}}}{u^{(k + 1)}} + F(t,\,x,\,{u^{(k)}}) + \delta F(t,x),\;\;x \in {{\mathbb R}^n},\quad 0 \leq t \leq T,$$
(3.123)
$${u^{(k + 1)}}(0,x) = f(x) + \delta f(x),\;\;x \in {{\mathbb R}^n},$$
(3.124)

for k = 0, 1, 2, …, starting with the reference solution u(0). If the linearized problems are well posed in the sense of Definition 3 for functions lying in a neighborhood of u(0), one can solve each Cauchy problem (3.123, 3.124), at least for small enough time Tk. The key point, then, is to prove that Tk does not shrink to zero when k → ∞ and to show that the sequence u(k) of functions converges to a solution of the perturbed problem (3.116, 3.117). This is, of course, a nontrivial task, which requires controlling u(k) and its derivatives in an appropriate way. For particular examples where this program is carried through, see [259]. For general results on quasilinear symmetric hyperbolic systems, see [251, 164, 412, 51].
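
In the ODE case (all Aj = 0) the iteration (3.123, 3.124) reduces to a Picard iteration, which can be carried out explicitly. The following Python sketch (grid size and final time are arbitrary choices) applies it to the blow-up example ut = u2 of Section 3.3 and shows convergence on a time interval safely below the blow-up time:

```python
import numpy as np

# Sketch of the iteration (3.123, 3.124) for u_t = u^2, u(0) = 1, where all A^j = 0:
# each step solves the "linearized" problem u^{(k+1)}_t = F(u^{(k)}) by quadrature
# (trapezoidal rule here).  Exact solution: u = 1/(1 - t), which blows up at t = 1.

T, N = 0.5, 1000                      # final time well below the blow-up time
t = np.linspace(0.0, T, N)
dt = t[1] - t[0]

u = np.ones_like(t)                   # u^{(0)}: the reference guess (constant)
for k in range(30):
    rhs = u**2                        # right-hand side frozen at the previous iterate
    u = 1.0 + np.concatenate([[0.0], np.cumsum(0.5 * (rhs[1:] + rhs[:-1]) * dt)])

print(np.max(np.abs(u - 1.0 / (1.0 - t))))   # small: the iterates converge on [0, T]
```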

3.4 Abstract evolution operators

A general framework for treating evolution problems is based on methods from functional analysis. Here, one considers a linear operator A : D(A) ⊂ X → X with dense domain, \(\overline {D(A)} = X\), in a Banach space X and asks under which conditions the Cauchy problem

$${u_t}(t) = Au(t),\quad \quad t \geq 0,$$
(3.125)
$$u(0) = f,$$
(3.126)

possesses a unique solution curve, i.e., a continuously differentiable map u : [0, ∞) → D(A) ⊂ X satisfying Eqs. (3.125, 3.126) for each ƒ ∈ D(A). Under a mild assumption on A, this turns out to be the case if and only ifFootnote 10 the operator A is the infinitesimal generator of a strongly continuous semigroup P(t), that is, a map \(P:[0,\infty) \rightarrow {\mathcal L}(X)\), with \({\mathcal L}(X)\) denoting the space of bounded, linear operators on X, with the properties that

  1. (i)

    P(0) = I,

  2. (ii)

    P(t + s) = P(t)P(s) for all t, s ≥ 0,

  3. (iii)

    \(\underset {t \rightarrow 0} {\lim} P(t)u = u\) for all \(u \in X\),

  4. (iv)

    \(D(A) = \left\{{u \in X:\underset {t \rightarrow 0} {\lim} {1 \over t}[P(t)u - u]{\rm{exists in}}X} \right\}\) and \(Au = \underset {t \rightarrow 0} {\lim} {1 \over t}[P(t)u - u],u \in D(A)\).

In this case, the solution curve of the Cauchy problem (3.125, 3.126) is given by u(t) = P(t)ƒ, t ≥ 0, ƒ ∈ D(A). One can show [327, 51] that for any strongly continuous semigroup P(t) there exist constants K ≥ 1 and α ∈ ℝ such that

$$\Vert P(t)\Vert \leq K{e^{\alpha t}},\quad \quad t \geq 0,$$
(3.127)

which implies that ‖u(t)‖ ≤ Keαt‖ƒ‖ for all ƒ ∈ D(A) and all t ≥ 0. Therefore, the semigroup P(t) gives existence, uniqueness and continuous dependence on the initial data.

There are several results giving necessary and sufficient conditions for the linear operator A to generate a strongly continuous semigroup; see, for instance, [327, 51]. One useful result, which we formulate for Hilbert spaces, is the following:

Theorem 4 (Lumer-Phillips). Let X be a complex Hilbert space with scalar product (·, ·), and let A : D(A) ⊂ XX be a linear operator. Let α ∈ ℝ. Then, the following statements are equivalent:

  1. (i)

    A is the infinitesimal generator of a strongly continuous semigroup P(t) such thatP(t)‖ ≤ eαt for all t ≥ 0.

  2. (ii)

    A − αI is dissipative, that is, Re(u, Au − αu) ≤ 0 for all u ∈ D(A), and the range of A − λI is equal to X for some λ > α.

Example 24. As a simple example consider the Hilbert space X = L2(ℝn) with the linear operator A : D(A) ⊂ X → X defined by

$$\begin{array}{*{20}c} {D(A): = \{u \in X:(1 + \vert k\vert ^{2}){\mathcal F}u \in {L^2}({{\mathbb R}^n})\} ,} \\{Au: = \Delta u = - {{\mathcal F}^{- 1}}(\vert k\vert ^{2}{\mathcal F}u),\quad \quad u \in D(A),} \end{array}$$

where Ƒ denotes the Fourier-Plancherel operator; see Section 2. Using Parseval’s identity, we find

$${\rm{Re}}(u,\,Au) = {\rm{Re}}({\mathcal F}u, - \vert k \vert ^{2}{\mathcal F}u) = -\Vert \vert k\vert {\mathcal F}u \Vert^{2} \leq 0,$$
(3.128)

hence A is dissipative. Furthermore, let v ∈ L2(ℝn); then

$$u: = {{\mathcal F}^{- 1}}\left({{{{\mathcal F}v} \over {1 + \vert k\vert ^{2}}}} \right)$$
(3.129)

defines an element in D(A) satisfying (I − A)u = u − Δu = v. Therefore, the range of A − I is equal to X, and Theorem 4 implies that A = Δ generates a strongly continuous semigroup P(t) on X such that ‖P(t)‖ ≤ 1 for all t ≥ 0. The curves u(t) := P(t)ƒ, t ≥ 0, ƒ ∈ L2(ℝn), are the weak solutions to the heat equation on ℝn; see Section 3.1.2.
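
A discrete (periodic, FFT-based) analogue of this example can be written in a few lines of Python; it applies the Fourier multiplier e−|k|2t and confirms the contraction property ‖P(t)ƒ‖ ≤ ‖ƒ‖ (the grid and the initial data below are arbitrary sample choices):

```python
import numpy as np

# The semigroup of Example 24 acts in Fourier space as multiplication by
# exp(-|k|^2 t); a discrete periodic analogue shows ||P(t)f|| <= ||f||.

N, L = 256, 2 * np.pi
x = np.linspace(0, L, N, endpoint=False)
k = np.fft.fftfreq(N, d=L / N) * 2 * np.pi
f = np.sign(np.sin(3 * x)) + 0.5 * np.random.randn(N)   # rough initial data

def P(t, f):
    """Heat semigroup applied to f on the periodic grid."""
    return np.real(np.fft.ifft(np.exp(-k**2 * t) * np.fft.fft(f)))

norm = lambda u: np.sqrt(np.mean(u**2))
print(all(norm(P(t, f)) <= norm(f) + 1e-12 for t in (0.01, 0.1, 1.0)))  # True
```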

In general, the requirement for A − αI to be dissipative is equivalent to finding an energy estimate for the squared norm E := ‖u‖2 of u. Indeed, setting u(t) := P(t)ƒ and using ut = AP(t)ƒ we find

$${d \over {dt}}E(t) = {d \over {dt}}\Vert u(t)\Vert ^{2} = 2{\rm{Re}}(u(t),\,Au(t)) \leq 2\alpha \Vert u(t)\Vert ^{2} = 2\alpha E(t)$$
(3.130)

for all t ≥ 0 and ƒD(A), which yields the estimate

$$\Vert u(t) \Vert \leq {e^{\alpha t}}\Vert f\Vert ,\quad \quad t \geq 0,$$
(3.131)

for all ƒD(A). Given the dissipativity of AαI, the second requirement, that the range of AλI is X for some λ > α, is equivalent to demanding that the linear operator AλI : D(A) → X be invertible. Therefore, proving this condition requires solving the linear equation

$$Au - \lambda u = v$$
(3.132)

for given v ∈ X. This condition is important for the existence of solutions, and shows that for general evolution problems, requiring an energy estimate is not sufficient. This statement is rather obvious, because given that A − αI is dissipative on D(A), one could just make D(A) smaller, and still have an energy estimate. However, if D(A) is too small, the Cauchy problem is over-determined and a solution might not exist. We will encounter explicit examples of this phenomenon in Section 5, when discussing boundary conditions.

Finding the correct domain D(A) for the infinitesimal generator A is not always a trivial task, especially for equations involving singular coefficients. Fortunately, there are weaker versions of the Lumer-Phillips theorem, which only require checking conditions on a subspace D ⊂ D(A), which is dense in X. It is also possible to formulate the Lumer-Phillips theorem on Banach spaces. See [327, 152, 51] for more details.

The semigroup theory can be generalized to time-dependent operators A(t), and to quasilinear equations where A(u) depends on the solution u itself. We refer the reader to [51] for these generalizations and for applications to examples from mathematical physics including general relativity. The theory of strongly continuous semigroups has also been used for formulating well-posed initial-boundary value formulations for the Maxwell equations [354] and the linearized Einstein equations [309] with elliptic gauge conditions.

4 Initial-Value Formulations for Einstein’s Equations

In this section, we apply the theory discussed in Section 3 to well-posed Cauchy formulations of Einstein’s vacuum equations. The first such formulation dates back to the 1950s [169] and will be discussed in Section 4.1. Since then, there has been a plethora of new formulations, which distinguish themselves by the choice of variables (metric vs. tetrad, Christoffel symbols vs. connection coefficients, inclusion or not of curvature components as independent variables, etc.), the choice of gauges and the use of the constraint equations in order to modify the evolution equations off the constraint surface. Many of these new formulations have been motivated by numerical calculations, which try to solve a given physical problem in a stable way.

By far the most successful formulations for numerically-evolving compact-object binaries have been the harmonic system, which is based on the original work of [169], and that of Baumgarte-Shapiro-Shibata-Nakamura (BSSN) [390, 44]. For this reason, we review these two formulations in detail in Sections 4.1 and 4.3, respectively. In Section 4.2 we also review the Arnowitt-Deser-Misner (ADM) formulation [30], which is based on a Hamiltonian approach to general relativity and serves as a starting point for many hyperbolic systems, including the BSSN one. A list of references for hyperbolic reductions of Einstein’s equations not discussed here is given in Section 4.4.

4.1 The harmonic formulation

We start by discussing the harmonic formulation of Einstein’s field equations. Like in the potential formulation of electromagnetism, where the Lorentz gauge ∇µAµ = 0 allows one to cast Maxwell’s equations into a system of wave equations, it was observed early on [134, 269] that Einstein’s equations reduce to a system of wave equations when harmonic coordinates,

$${\nabla ^\mu}{\nabla _\mu}{x^\nu} = 0,\qquad \nu = 0,1,2,3,$$
(4.1)

are used. There are many straightforward generalizations of these gauge conditions; one of them is to replace the right-hand side of Eq. (4.1) by given source functions Hv [178, 182, 202].

In order to keep general covariance, we follow [232] and choose a fixed smooth background metric \({\overset \circ g _{\alpha \beta}}\) with corresponding Levi-Civita connection \(\overset \circ \nabla\), Christoffel symbols \(\overset \circ \Gamma {\,^\mu}_{\alpha \beta}\), and curvature tensor \(\overset \circ R {\,^\alpha}_{\beta \mu \nu}\). Then, the generalized harmonic gauge condition can be rewritten asFootnote 11

$${C^\mu}: = {g^{\alpha \beta}}\left({{\Gamma ^\mu}_{\alpha \beta} - {{\overset \circ \Gamma}\,^\mu}_{\alpha \beta}} \right) + {H^\mu} = 0.$$
(4.3)

In the particular case where Hµ = 0 and where the background metric is Minkowski in standard Cartesian coordinates, \(\overset \circ \Gamma {\,^\mu}_{\alpha \beta}\) vanishes, and the condition Cµ = 0 reduces to the harmonic coordinate expression (4.1). However, unlike condition (4.1), Eq. (4.3) yields a coordinate-independent condition for any given vector field Hµ on spacetime since the difference \({C^\mu}_{\alpha \beta}: = {\Gamma ^\mu}_{\alpha \beta} - \overset \circ \Gamma {\,^\mu}_{\alpha \beta}\) between two connections is a tensor field. In terms of the difference, \(\,{h_{\alpha \beta}}: = {g_{\alpha \beta}} - {\overset \circ g _{\alpha \beta}}\), between the dynamical and background metric, this tensor field can be expressed as

$${C^\mu}_{\alpha \beta} = {1 \over 2}{g^{\mu \nu}}\left({{{\overset \circ \nabla}_\alpha}{h_{\beta \nu}} + {{\overset \circ \nabla}_\beta}{h_{\alpha \nu}} - {{\overset \circ \nabla}_\nu}{h_{\alpha \beta}}} \right).$$
(4.4)

Of course, the coordinate-independence is now traded for the introduction of a background metric \({\overset \circ g _{\alpha \beta}}\), and the question remains of how to choose \({\overset \circ g _{\alpha \beta}}\) and the vector field Hµ. A simple possibility is to choose Hµ = 0 and \({\overset \circ g _{\alpha \beta}}\) equal to the initial data for the metric, such that hµv = 0 initially.

Einstein’s field equations in the gauge Cµ = 0 are equivalent to the wave system

$$\begin{array}{*{20}c} {{g^{\mu \nu}}{{\overset \circ \nabla}_\mu}{{\overset \circ \nabla}_\nu}{h_{\alpha \beta}} = 2\,{g_{\sigma \tau}}{g^{\mu \nu}}{C^\sigma}_{\alpha \mu}{C^\tau}_{\beta \nu} + 4\,{C^\mu}_{\nu (\alpha}{g_{\beta)\sigma}}{C^\sigma}_{\mu \tau}{g^{\nu \tau}} - 2\,{g^{\mu \nu}}\,\overset \circ R {\,^\sigma}_{\mu \nu (\alpha}{g_{\beta)\sigma}}} \\ {+ 16\pi {G_N}\left({{T_{\alpha \beta}} - {1 \over 2}{g_{\alpha \beta}}{g^{\mu \nu}}{T_{\mu \nu}}} \right) - 2\,{\nabla _{(\alpha}}{H_{\beta)}}\,,} \\ \end{array}$$
(4.5)

where Tαβ is the stress-energy tensor and GN denotes Newton’s constant. This system is subject to the harmonic constraint

$$0 = {C^\mu} = {g^{\mu \nu}}{g^{\alpha \beta}}\left({{{\overset \circ \nabla}_\alpha}{h_{\beta \nu}} - {1 \over 2}{{\overset \circ \nabla}_\nu}{h_{\alpha \beta}}} \right) + {H^\mu}.$$
(4.6)

4.1.1 Hyperbolicity

For any given smooth stress-energy tensor Tαβ, the equations (4.5) constitute a quasilinear system of ten coupled wave equations for the ten coefficients of the difference metric hαβ (or equivalently, for the ten components of the dynamical metric gαβ) and, therefore, we can apply the results of Section 3 to formulate a (local in time) well-posed Cauchy problem for the wave system (4.5) with initial conditions

$${h_{\alpha \beta}}(0,x) = h_{\alpha \beta}^{(0)}(x),\qquad {{\partial {h_{\alpha \beta}}} \over {\partial t}}(0,x) = k_{\alpha \beta}^{(0)}(x),$$
(4.7)

where \(h_{\alpha \beta}^{(0)}\) and \(k_{\alpha \beta}^{(0)}\) are two sufficiently-smooth symmetric tensor fields defined on the initial slice t = 0, satisfying the requirement that \({g_{\alpha \beta}}(0,x) = {\overset \circ g _{\alpha \beta}}(0,x) + h_{\alpha \beta}^{(0)}\) has Lorentz signature such that g00(0, x) < 0 and the induced metric gij(0, x), i, j = 1, 2, 3, on t = 0 is positive definiteFootnote 12. For detailed well-posed Cauchy formulations we refer the reader to the original work in [169]; see also [85], [164], and [246], which presents an improvement on the results in the previous references due to weaker smoothness assumptions on the initial data.

An alternative way of establishing the hyperbolicity of the system (4.5) is to cast it into first-order symmetric hyperbolic form [164, 18, 286]. There are several ways of constructing such a system; the simplest one is obtained [164] by introducing the first partial derivatives of gαβ as new variables,

$${k_{\alpha \beta}}: = {{\partial {g_{\alpha \beta}}} \over {\partial t}},\qquad {D_{j\alpha \beta}}: = {{\partial {g_{\alpha \beta}}} \over {\partial {x^j}}},\qquad j = 1,2,3.$$
(4.8)

Then, the second-order wave system (4.5) can be rewritten in the form

$${{\partial {g_{\alpha \beta}}} \over {\partial t}} = {k_{\alpha \beta}},$$
(4.9)
$${{\partial {D_{j\alpha \beta}}} \over {\partial t}} = {{\partial {k_{\alpha \beta}}} \over {\partial {x^j}}},$$
(4.10)
$${{\partial {k_{\alpha \beta}}} \over {\partial t}} = - 2{{{g^{0j}}} \over {{g^{00}}}}{{\partial {k_{\alpha \beta}}} \over {\partial {x^j}}} - {{{g^{ij}}} \over {{g^{00}}}}{{\partial {D_{i\alpha \beta}}} \over {\partial {x^j}}} + {\rm{l}}{\rm{.o}}.,$$
(4.11)

where l.o. are lower-order terms not depending on any derivatives of the state vector u = (gαβ, kαβ, Djαβ). The system of equations (4.9, 4.10, 4.11) constitutes a quasilinear first-order symmetric hyperbolic system for u with symmetrizer given by the quadratic form

$${u^{\ast}}H(u)u = \sum\limits_{\alpha ,\beta = 0}^3 {\left({g_{\alpha \beta}^{\ast}{g_{\alpha \beta}} + \vert {g^{00}}\vert k_{\alpha \beta}^{\ast}{k_{\alpha \beta}} + {g^{ij}}D_{i\alpha \beta}^{\ast}{D_{j\alpha \beta}}} \right)} .$$
(4.12)

However, it should be noted that the symmetrizer is only positive definite if gij is; that is, only if the time evolution vector field t is time-like. In many situations, this requirement might be too restrictive. Inside a Schwarzschild black hole, for example, the asymptotically time-like Killing field t is space-like.
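
The symmetry property encoded in Eq. (4.12) can be verified directly. In the sketch below (in Python; the sample inverse-metric values are arbitrary, chosen with g00 < 0 and gij positive definite), H denotes the block-diagonal matrix read off from (4.12) per metric component, and HAj is checked to be symmetric for the principal matrices Aj of the system (4.9, 4.10, 4.11):

```python
import numpy as np

# Check that the quadratic form (4.12) symmetrizes the principal part of the
# first-order system (4.9)-(4.11): per metric component, u = (g, k, D_1, D_2, D_3)
# and H A^j must be symmetric, with H = diag(1, |g^{00}|, g^{ij}).  The inverse
# metric below is an arbitrary sample with g^{00} < 0 and g^{ij} positive definite.

g_inv = np.array([[-2.0, 0.2, 0.0, 0.1],
                  [ 0.2, 1.1, 0.1, 0.0],
                  [ 0.0, 0.1, 0.9, 0.2],
                  [ 0.1, 0.0, 0.2, 1.2]])            # candidate g^{mu nu}
assert g_inv[0, 0] < 0 and np.all(np.linalg.eigvalsh(g_inv[1:, 1:]) > 0)

H = np.zeros((5, 5))
H[0, 0] = 1.0
H[1, 1] = abs(g_inv[0, 0])
H[2:, 2:] = g_inv[1:, 1:]                            # block acting on (D_1, D_2, D_3)

for j in (1, 2, 3):
    A = np.zeros((5, 5))                             # coefficient of d/dx^j
    A[1, 1] = -2.0 * g_inv[0, j] / g_inv[0, 0]       # from Eq. (4.11), k-equation
    A[1, 2:] = -g_inv[j, 1:] / g_inv[0, 0]
    A[1 + j, 1] = 1.0                                # from Eq. (4.10): d_t D_j = d_j k
    print(j, np.allclose(H @ A, (H @ A).T))          # True, True, True
```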

However, as indicated above, the first-order symmetric hyperbolic reduction (4.9, 4.10, 4.11) is not unique. A different reduction is based on the variables ũ = (hαβ, Παβ, Φjαβ), where \({\Pi _{\alpha \beta}}: = {n^\mu}{\overset \circ \nabla _\mu}{h_{\alpha \beta}}\) is the derivative of gαβ in the direction of the future-directed unit normal nµ to the time-slices t = const, and \({\Phi _{j\alpha \beta}}: = {\overset \circ \nabla _j}{h_{\alpha \beta}}\). This yields a first-order system, which is symmetric hyperbolic as long as the t = const slices are space-like, independent of whether or not t is time-like [18, 286].

4.1.2 Constraint propagation and damping

The hyperbolicity results described above guarantee that unique solutions of the nonlinear wave system (4.5) exist, at least for short times, and that they depend continuously on the initial data \(h_{\alpha \beta}^{(0)},k_{\alpha \beta}^{(0)}\). However, in order to obtain a solution of Einstein’s field equations one has to ensure that the harmonic constraint (4.3) is identically satisfied.

The system (4.5) is equivalent to the modified Einstein equations

$${R^{\alpha \beta}} + {\nabla ^{(\alpha}}{C^{\beta)}} = 8\pi {G_N}\left({{T^{\alpha \beta}} - {1 \over 2}{g^{\alpha \beta}}{g_{\mu \nu}}{T^{\mu \nu}}} \right),$$
(4.13)

where Rαβ denotes the Ricci tensor, and which reduces to Einstein’s field equations when the harmonic constraint Cµ = 0 holds. From the twice contracted Bianchi identities 2∇βRαβ − ∇α(gµνRµν) = 0 one obtains the following equation for the constraint variable Cα,

$${g^{\mu \nu}}{\nabla _\mu}{\nabla _\nu}{C^\alpha} + {R^\alpha}_\beta {C^\beta} = - 16\pi {G_N}{\nabla _\beta}{T^{\alpha \beta}}.$$
(4.14)

This system describes the propagation of constraint violations, which are present if Cα is nonzero. For this reason, we call it the constraint propagation system, or subsidiary system. Provided the stress-energy tensor is divergence free, ∇βTαβ = 0, this is a linear, second-order hyperbolic equation for Cα.Footnote 13 Therefore, it follows from the uniqueness properties of such hyperbolic problems that Cα = 0 provided the initial data \(h_{\alpha \beta}^{(0)},k_{\alpha \beta}^{(0)}\) satisfies the initial constraints

$${C^\alpha}(0,x) = 0,\qquad {{\partial {C^\alpha}} \over {\partial t}}(0,x) = 0.$$
(4.15)

This turns out to be equivalent to solving Cα(0, x) = 0 plus the usual Hamiltonian and momentum constraints; see [169, 286]. Summarizing, if one specifies initial data \(h_{\alpha \beta}^{(0)},k_{\alpha \beta}^{(0)}\) satisfying the constraints (4.15), the corresponding unique solution to the nonlinear wave system (4.5) yields a solution to the Einstein equations.

However, in numerical calculations, one cannot assume that the initial constraints (4.15) are satisfied exactly, due to truncation and roundoff errors. The propagation of these errors is described by the constraint propagation system (4.14), and hyperbolicity guarantees that for each fixed time t > 0 of existence, these errors converge to zero if the initial constraint violation converges to zero, which is usually the case when resolution is increased. On the other hand, due to limited computer resources, one cannot reach the limit of infinite resolution, and from a practical point of view one does not want the constraint errors to grow rapidly in time for fixed resolution. Therefore, one would like to design an evolution scheme in which the constraint violations are damped in time, such that the constraint hypersurface is an attractor set in phase space. A general method for damping constraint violations in the context of first-order symmetric hyperbolic formulations of Einstein’s field equations was given in [74]. This method was then adapted to the harmonic formulation in [224]. The procedure proposed in [224] consists of adding lower-order friction terms to Eq. (4.13), which damp the constraint violations. Explicitly, the modified system reads

$${R^{\alpha \beta}} + {\nabla ^{(\alpha}}{C^{\beta)}} - \kappa \left({{n^{(\alpha}}{C^{\beta)}} - {1 \over 2}(1 + \rho){g^{\alpha \beta}}{n_\mu}{C^\mu}} \right) = 8\pi {G_N}\left({{T^{\alpha \beta}} - {1 \over 2}{g^{\alpha \beta}}{g_{\mu \nu}}{T^{\mu \nu}}} \right),$$
(4.16)

with nµ the future-directed unit normal to the t = const surfaces, and κ and ρ real constants, where κ > 0 determines the timescale on which the constraint violations Cµ are damped.

With this modification the constraint propagation system reads

$${g^{\mu \nu}}{\nabla _\mu}{\nabla _\nu}{C^\alpha} + {R^\alpha}_\beta {C^\beta} - \kappa {\nabla _\beta}\left({2{n^{(\alpha}}{C^{\beta)}} + \rho {g^{\alpha \beta}}{n_\mu}{C^\mu}} \right) = - 16\pi {G_N}{\nabla _\beta}{T^{\alpha \beta}}.$$
(4.17)

A mode analysis for linear vacuum perturbations of the Minkowski metric reveals [224] that for κ > 0 and ρ > −1 all modes, except those that are constant in space, are damped. Numerical codes based on the modified system (4.16) or similar systems have been used in the context of binary black-hole evolutions [335, 336, 286, 384, 36, 403, 320], the head-on collision of boson stars [323] and the evolution of black strings in five-dimensional gravity [279], among other references.

For a discussion on possible effects due to nonlinearities in the constraint propagation system; see [185].

4.1.3 Geometric issues

The results described so far guarantee the local-in-time unique existence of solutions to Einstein’s equations in harmonic coordinates, given a sufficiently-smooth initial data set (h(0), k(0)). However, since general relativity is a diffeomorphism invariant theory, some questions remain. The first issue is whether or not the harmonic gauge is sufficiently general such that any solution of the field equations can be obtained by this method, at least for short enough time. The answer is affirmative [169, 164]. Namely, let (M, g), M = (−ε, ε) × ℝ3, be a smooth spacetime satisfying Einstein’s field equations such that the initial surface t = 0 is spacelike with respect to g. Then, we can find a diffeomorphism ϕ : MM in a neighborhood of the initial surface, which leaves it invariant and casts the metric into the harmonic gauge. For this, one solves the harmonic wave map equation (4.2) with initial data

$${\phi ^0}(0,x) = 0,\qquad {{\partial {\phi ^0}} \over {\partial t}}(0,x) = 1,\qquad {\phi ^i}(0,x) = {x^i},\qquad {{\partial {\phi ^i}} \over {\partial t}}(0,x) = 0{.}$$
(4.18)

Since equation (4.2) is a second-order hyperbolic one, a unique solution exists, at least on some sufficiently-small time interval (−ε′, ε′). Furthermore, choosing ε′ > 0 small enough, ϕ : (−ε′, ε′) × ℝ3 → M describes a diffeomorphism when restricted to its image. By construction, ḡ := (ϕ−1)*g satisfies the harmonic gauge condition (4.3).

The next issue is the question of geometric uniqueness. Let g(1) and g(2) be two solutions of Einstein’s equations with the same initial data on t = 0, i.e., \(g_{\alpha \beta}^{(1)}(0,x) = g_{\alpha \beta}^{(2)}(0,x),{\partial _t}g_{\alpha \beta}^{(1)}(0,x) = {\partial _t}g_{\alpha \beta}^{(2)}(0,x)\). Are these solutions related, at least for small time, by a diffeomorphism? Again, the answer is affirmative [169, 164] because one can transform both solutions to harmonic coordinates using the above diffeomorphism ϕ without changing their initial data. It then follows by the uniqueness property of the nonlinear wave system (4.5) that the transformed solutions must be identical, at least on some sufficiently-small time interval. Note that this geometric uniqueness property also implies that the solutions are, at least locally, independent of the background metric. For further results on geometric uniqueness involving only the first and second fundamental forms of the initial surface; see [127], where it is shown that every such initial-data set satisfying the Hamiltonian and momentum constraints possesses a unique maximal Cauchy development.

Finally, we mention that results about the nonlinear stability of Minkowski spacetime with respect to vacuum and vacuum-scalar perturbations have been established based on the harmonic system [283, 284], offering an alternative proof to the one of [129].

4.2 The ADM formulation

In the usual 3+1 decomposition of Einstein’s field equations (see, for example, [214], for a thorough discussion of it) one evolves the three metric and the extrinsic curvature (the first and second fundamental forms) relative to a foliation Σt of spacetime by spacelike hypersurfaces. The motivation for this formulation stems from the Hamiltonian description of general relativity (see, for instance, Appendix E in [429]) where the “q” variables are the three metric γij and the associated canonical momenta πij (the “p” variables) are related to the extrinsic curvature Kij according to

$${\pi ^{ij}} = - \sqrt \gamma \left({{K^{ij}} - {\gamma ^{ij}}K} \right),$$
(4.19)

where γ = det(γij) denotes the determinant of the three-metric and K = γijKij the trace of the extrinsic curvature.

In York’s formulation [444] of the 3+1 decomposed Einstein equations, the evolution equations are

$${\partial _0}{\gamma _{ij}} = - 2{K_{ij}},$$
(4.20)
$${\partial _0}{K_{ij}} = R_{ij}^{(3)} - {1 \over \alpha}{D_i}{D_j}\alpha + K{K_{ij}} - 2{K_i}^l{K_{lj}} - 8\pi {G_N}\left[ {{\sigma _{ij}} + {1 \over 2}{\gamma _{ij}}(\rho - \sigma)} \right].$$
(4.21)

Here, the operator ∂0 is defined as ∂0 := α−1(∂t − £β), with α and βi denoting lapse and shift, respectively. It is equal to the Lie derivative along the future-directed unit normal n to the time slices when acting on covariant tensor fields orthogonal to n. Next, \(R_{ij}^{(3)}\) and Dj are the Ricci tensor and covariant derivative operator belonging to the three metric γij, and ρ := nαnβTαβ and σij := Tij are the energy density and the stress tensor as measured by observers moving along the future-directed unit normal n to the time slices. Finally, σ := γijTij denotes the trace of the stress tensor. The evolution system (4.20, 4.21) is subject to the Hamiltonian and momentum constraints,

$$H: = {1 \over 2}\left({{\gamma ^{ij}}R_{ij}^{(3)} + {K^2} - {K^{ij}}{K_{ij}}} \right) = 8\pi {G_N}\rho ,$$
(4.22)
$${M_i}: = {D^j}{K_{ij}} - {D_i}K = 8\pi {G_N}{j_i},$$
(4.23)

where ji := −nβTiβ is the flux density.

4.2.1 Algebraic gauge conditions

One issue with the evolution equations (4.20, 4.21) is the principal part of the Ricci tensor belonging to the three-metric,

$$R_{ij}^{(3)} = {1 \over 2}{\gamma ^{kl}}\left({- {\partial _k}{\partial _l}{\gamma _{ij}} - {\partial _i}{\partial _j}{\gamma _{kl}} + {\partial _i}{\partial _k}{\gamma _{lj}} + {\partial _j}{\partial _k}{\gamma _{li}}} \right) + {\rm{l}}{\rm{.o}}{.},$$
(4.24)

which does not define a positive-definite operator. This is due to the fact that the linearized Ricci tensor is invariant with respect to infinitesimal coordinate transformations γij ↦ γij + 2∂(iξj) generated by a vector field ξ = ξi∂i. This has the following implications for the evolution equations (4.20, 4.21), assuming for the moment that lapse and shift are fixed, a priori specified functions, in which case the system is equivalent to the second-order system \(\partial _0^2{\gamma _{ij}} = - 2R_{ij}^{(3)} + {\rm{l}}{\rm{.o}}.\) for the three metric. Linearizing and localizing as described in Section 3 one obtains a linear, constant coefficient problem of the form (3.56), which can be brought into first-order form via the reduction in Fourier space described in Section 3.1.5. The resulting first-order system has the form of Eq. (3.58) with the symbol

$$Q(ik) = i\vert k\vert \sum\limits_{j = 1}^n {\overset \circ \beta} {\,^j}{\hat k_j} + \vert k\vert \left({\begin{array}{*{20}c} 0 & I \\ {- \overset \circ \alpha {\,^2}R(\hat k)} & 0 \\ \end{array}} \right)\,,$$
(4.25)

where \(R(\hat k)\) is, up to a factor 2, the principal symbol of the Ricci operator,

$$R(\hat k){\gamma _{ij}} = \overset \circ \gamma {\,^{lm}}\,\left({{{\hat k}_l}{{\hat k}_m}{\gamma _{ij}} + {{\hat k}_i}{{\hat k}_j}{\gamma _{lm}} - {{\hat k}_i}{{\hat k}_l}{\gamma _{mj}} - {{\hat k}_j}{{\hat k}_l}{\gamma _{mi}}} \right).$$
(4.26)

Here, \(\overset \circ \alpha, \overset \circ \beta {\,^i}\) and \({\overset \circ \gamma _{ij}}\) refer to the frozen lapse, shift and three-metric, respectively. According to Theorem 2, the problem is well posed if and only if there is a uniformly positive and bounded symmetrizer \(h(\hat k)\) such that \(h(\hat k)R(\hat k)\) is symmetric and uniformly positive for \(\hat k \in {S^2}\). Although \(R(\hat k)\) is diagonalizable and its eigenvalues are not negative, some of them are zero since \(R(\hat k){\gamma _{ij}} = 0\) for γij of the form \({\gamma _{ij}} = 2{{\hat k}_{(i}}{\xi _{j)}}\) with an arbitrary one-form ξj, so \(h(\hat k)R(\hat k)\) cannot be positive.

These arguments were used in [308] to show that the evolution system (4.20, 4.21) with fixed lapse and shift is weakly but not strongly hyperbolic. The results in [308] also analyze modifications of the equations for which the lapse is densitized and the Hamiltonian constraint is used to modify the trace of Eq. (4.21). The conclusion is that such changes cannot make the evolution equations (4.20, 4.21) strongly hyperbolic. Therefore, these equations, with given shift and densitized lapse, are not suited for numerical evolutions (see Footnote 14).
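
The degeneracy described above is easy to exhibit numerically. The following Python sketch (ours, for illustration only) represents the symbol (4.26) as a 6 × 6 matrix acting on symmetric tensors, with the frozen metric set to the identity; its spectrum consists of three zeros and three ones, and its kernel contains precisely the pure-gauge modes \({\gamma _{ij}} = 2{{\hat k}_{(i}}{\xi _{j)}}\), so no symmetrizer can render \(h(\hat k)R(\hat k)\) positive:

```python
import numpy as np

# The principal symbol (4.26) of the Ricci operator as a 6x6 matrix acting on
# symmetric 3x3 tensors; the frozen three-metric is taken to be delta_ij.
pairs = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

def ricci_symbol(khat):
    R = np.zeros((6, 6))
    for col, (i, j) in enumerate(pairs):
        g = np.zeros((3, 3))
        g[i, j] = g[j, i] = 1.0                  # basis tensor
        gk = g @ khat
        out = (np.dot(khat, khat) * g + np.outer(khat, khat) * np.trace(g)
               - np.outer(khat, gk) - np.outer(gk, khat))
        R[:, col] = [out[a, b] for (a, b) in pairs]
    return R

khat = np.array([1.0, 2.0, 2.0]) / 3.0           # an arbitrary unit vector
R = ricci_symbol(khat)
print(np.round(np.sort(np.linalg.eigvals(R).real), 8))   # -> [0 0 0 1 1 1]

# Pure-gauge modes gamma_ij = 2 khat_(i xi_j) lie in the kernel:
xi = np.array([0.3, -1.0, 0.7])
gauge = np.outer(khat, xi) + np.outer(xi, khat)
print(np.allclose(R @ np.array([gauge[a, b] for (a, b) in pairs]), 0.0))  # True
```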

4.2.2 Dynamical gauge conditions leading to a well-posed formulation

The results obtained so far often lead to the popular statement “The ADM equations are not strongly hyperbolic.” However, consider the possibility of determining the lapse and shift through evolution equations. A natural choice, motivated by the discussion in Section 4.1, is to impose the harmonic gauge constraint (4.3). Assuming that the background metric \({\overset \circ g _{\alpha \beta}}\) is Minkowski in Cartesian coordinates for simplicity, this yields the following equations for the 3+1 decomposed variables,

$$({\partial _t} - {\beta ^j}{\partial _j})\alpha = - {\alpha ^2}fK + {\alpha ^3}{H^t},$$
(4.27)
$$({\partial _t} - {\beta ^j}{\partial _j}){\beta ^i} = - \alpha {\gamma ^{ij}}{\partial _j}\alpha + {\alpha ^2}{\gamma ^{ij}}{\gamma ^{kl}}\left({{\partial _k}{\gamma _{jl}} - {1 \over 2}{\partial _j}{\gamma _{kl}}} \right) + {\alpha ^2}({H^i} + {\beta ^i}{H^t}),$$
(4.28)

with ƒ a constant, which is equal to one for the harmonic time coordinate t. Let us analyze the hyperbolicity of the evolution system (4.27, 4.28, 4.20, 4.21) for the fields u = (α, βi, γij, Kij), where for generality and later use, we do not necessarily assume ƒ = 1 in Eq. (4.27). Since this is a mixed first/second-order system, we base our analysis on the first-order pseudodifferential reduction discussed in Section 3.1.5. After linearizing and localizing, we obtain the constant coefficient linear problem

$$({\partial _t} - \overset \circ \beta {\,^k}{\partial _k})\alpha = - \overset \circ \alpha {\,^2}fK,$$
(4.29)
$$({\partial _t} - \overset \circ \beta {\,^k}{\partial _k}){\beta ^i} = - \overset \circ \alpha \overset \circ \gamma {\,^{ij}}{\partial _j}\alpha + \overset \circ \alpha {\,^2}\overset \circ \gamma {\,^{ij}}\overset \circ \gamma {\,^{kl}}\left({{\partial _k}{\gamma _{jl}} - {1 \over 2}{\partial _j}{\gamma _{kl}}} \right),$$
(4.30)
$$({\partial _t} - \overset \circ \beta {\,^k}{\partial _k}){\gamma _{ij}} = 2{\overset \circ \gamma _{k(i}}{\partial _{j)}}{\beta ^k} - 2\overset \circ \alpha {K_{ij}},$$
(4.31)
$$({\partial _t} - \overset \circ \beta {\,^k}{\partial _k}){K_{ij}} = - {\partial _i}{\partial _j}\alpha + {{\overset \circ \alpha} \over 2}{\overset \circ \gamma ^{kl}}\left({- {\partial _k}{\partial _l}{\gamma _{ij}} - {\partial _i}{\partial _j}{\gamma _{kl}} + {\partial _i}{\partial _k}{\gamma _{lj}} + {\partial _j}{\partial _k}{\gamma _{li}}} \right),$$
(4.32)

where \(\overset \circ \alpha, \overset \circ \beta {\,^k}\) and \({\overset \circ \gamma _{ij}}\) refer to the quantities corresponding to α, βk, γij of the background metric when frozen at a given point. In order to rewrite this in first-order form, we perform a Fourier transformation in space and introduce the variables Û = (a, bi, lij, pij) with

$$a: = \vert k\vert \hat \alpha /\overset \circ \alpha ,\qquad {b_i}: = \vert k\vert {\overset \circ \gamma _{ij}}{\hat \beta ^j}/\overset \circ \alpha ,\qquad {l_{ij}}: = \vert k\vert {\hat \gamma _{ij}},\qquad {p_{ij}}: = 2i{\hat K_{ij}},$$
(4.33)

where \(\vert k\vert := \sqrt {\overset \circ \gamma {\,^{ij}}{k_i}{k_j}}\) and the hatted quantities refer to their Fourier transform. With this, we obtain the first-order system Ût = P(ik)Û where the symbol has the form \(P(ik) = i\overset \circ \beta {\,^s}{k_s}I + \overset \circ \alpha Q(ik)\) with

$$Q(ik)\left({\begin{array}{*{20}c} a \\ {{b_i}} \\ {{l_{ij}}} \\ {{p_{ij}}} \\ \end{array}} \right) = i\vert k\vert \left({\begin{array}{*{20}c} {{f \over 2}p} \\ {- {{\hat k}_i}a + {{\hat k}^j}{l_{ij}} - {1 \over 2}{{\hat k}_i}l} \\ {2{{\hat k}_{(i}}{b_{j)}} + {p_{ij}}} \\ {2{{\hat k}_i}{{\hat k}_j}a + {l_{ij}} + {{\hat k}_i}{{\hat k}_j}l - 2{{\hat k}^s}{{\hat k}_{(i}}{l_{j)s}}} \\ \end{array}} \right),$$
(4.34)

where \({{\hat k}_i}: = {k_i}/\vert k\vert, \;{{\hat k}^i}: = \overset \circ \gamma {\,^{ij}}{{\hat k}_j},\;l: = \overset \circ \gamma {\,^{ij}}{l_{ij}}\), and \(p: = \overset \circ \gamma {\,^{ij}}{p_{ij}}\). In order to determine the eigenfields S(k)−1Û such that S(k)−1P(ik)S(k) is diagonal, we decompose

$${b_i} = \bar b{{\hat k}_i} + {{\bar b}_i},\qquad {l_{ij}} = \bar l{{\hat k}_i}{{\hat k}_j} + 2{{\hat k}_{(i}}{{\bar l}_{j)}} + {{\hat l}_{ij}} + {1 \over 2}({{\overset \circ \gamma}_{ij}} - {{\hat k}_i}{{\hat k}_j})\bar l{\prime},\qquad {p_{ij}} = \bar p{{\hat k}_i}{{\hat k}_j} + 2{{\hat k}_{(i}}{{\bar p}_{j)}} + {{\hat p}_{ij}} + {1 \over 2}({{\overset \circ \gamma}_{ij}} - {{\hat k}_i}{{\hat k}_j})\bar p{\prime}$$

into pieces parallel and orthogonal to \({{\hat k}_i}\), similar to Example 15. Then, the problem decouples into a tensor sector, involving \(({{\hat l}_{ij}},{{\hat p}_{ij}})\), a vector sector, involving \(({{\bar b}_i},{{\bar l}_i},{{\bar p}_i})\), and a scalar sector, involving \((a,\bar b,\bar l,\bar p,{{\bar l}\prime},{{\bar p}\prime})\). In the tensor sector, we have

$${Q^{({\rm{tensor}})}}(ik)\left({\begin{array}{*{20}c} {{{\hat l}_{ij}}} \\ {{{\hat p}_{ij}}} \\ \end{array}} \right) = i\vert k\vert \left({\begin{array}{*{20}c} {{{\hat p}_{ij}}} \\ {{{\hat l}_{ij}}} \\ \end{array}} \right),$$
(4.35)

which has the eigenvalues ±i|k| with corresponding eigenfields \({{\hat l}_{ij}} \pm {{\hat p}_{ij}}\). In the vector sector, we have

$${Q^{({\rm{vector}})}}(ik)\left({\begin{array}{*{20}c} {{{\bar b}_j}} \\ {{{\bar l}_j}} \\ {{{\bar p}_j}} \\ \end{array}} \right) = i\vert k\vert \left({\begin{array}{*{20}c} {{{\bar l}_j}} \\ {{{\bar b}_j} + {{\bar p}_j}} \\ 0 \\ \end{array}} \right),$$
(4.36)

which is also diagonalizable with eigenvalues 0, ±i|k| and corresponding eigenfields \({{\bar p}_j}\) and \({{\bar l}_j} \pm ({{\bar b}_j} + {{\bar p}_j})\). Finally, in the scalar sector we have

$${Q^{({\rm{scalar}})}}(ik)\left({\begin{array}{*{20}c} a \\ {\bar b} \\ {\bar l} \\ {\bar p} \\ {\bar l\prime} \\ {\bar p\prime} \\ \end{array}} \right) = i\vert k\vert \left({\begin{array}{*{20}c} {{f \over 2}(\bar p + \bar p\prime)} \\ {- a + {1 \over 2}(\bar l - \bar l\prime)} \\ {2\bar b + \bar p} \\ {2a + \bar l\prime} \\ {\bar p\prime} \\ {\bar l\prime} \\ \end{array}} \right).$$
(4.37)

It turns out that Q(scalar)(ik) is diagonalizable with purely imaginary eigenvalues if and only if ƒ > 0 and ƒ ≠ 1. In this case, the eigenvalues and corresponding eigenfields are \(\pm i\vert k\vert, \, \pm i\vert k\vert, \, \pm i\sqrt f \vert k\vert\) and \({{\bar l}{\prime}} \pm {{\bar p}{\prime}},\;\bar l \pm (2\bar b + \bar p),\;a + f{{\bar l}{\prime}}/(f - 1) \pm \sqrt f \left[ {\bar p + (f + 1)/(f - 1)\,{{\bar p}{\prime}}} \right]/2\), respectively. A symmetrizer for P(ik), which is smooth in \(k \in {S^2},\overset \circ \alpha, \overset \circ \beta {\,^k}\) and \({\overset \circ \gamma _{ij}}\), can be constructed from the eigenfields as in Example 15.
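
The criterion ƒ > 0, ƒ ≠ 1 can be verified symbolically. Below is a small sympy sketch (ours, not from the references) that encodes the scalar-sector symbol (4.37), stripped of the overall factor i|k|, and tests diagonalizability for sample values of ƒ; the characteristic polynomial factors as \((\lambda^2 - 1)^2(\lambda^2 - f)\), and a Jordan block appears exactly at ƒ = 1:

```python
import sympy as sp

# Scalar-sector symbol (4.37) without the overall factor i|k|, acting on the
# variables (a, bbar, lbar, pbar, lbar', pbar'), in that order.
def scalar_symbol(f):
    half = sp.Rational(1, 2)
    return sp.Matrix([
        [0,  0, 0,    half*f, 0,     half*f],
        [-1, 0, half, 0,      -half, 0],
        [0,  2, 0,    1,      0,     0],
        [2,  0, 0,    0,      1,     0],
        [0,  0, 0,    0,      0,     1],
        [0,  0, 0,    0,      1,     0]])

for f in [sp.Rational(1, 2), 1, 2]:
    M = scalar_symbol(f)
    print(f, sp.factor(M.charpoly().as_expr()), M.is_diagonalizable())
# f = 1/2, 2: diagonalizable, matching strong hyperbolicity in this sector;
# f = 1:     not diagonalizable (a Jordan block appears).
```

The tensor and vector sectors, with symbols (4.35) and (4.36), can be checked in the same way and are diagonalizable by inspection.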

Remarks:

  • If instead of imposing the dynamical shift condition (4.28), the shift βi is a priori specified, then the resulting evolution system, consisting of Eqs. (4.27, 4.20, 4.21), is only weakly hyperbolic for any choice of ƒ. Indeed, in that case the symbol (4.36) in the vector sector reduces to the Jordan block

    $${Q^{(vector)}}(ik)\left({\begin{array}{*{20}c} {{{\bar l}_j}} \\ {{{\bar p}_j}} \\ \end{array}} \right) = i\vert k\vert \left({\begin{array}{*{20}c} 0 & 1 \\ 0 & 0 \\ \end{array}} \right)\left({\begin{array}{*{20}c} {{{\bar l}_j}} \\ {{{\bar p}_j}} \\ \end{array}} \right),$$
    (4.38)

    which cannot be diagonalized.

  • When linearized about Minkowski spacetime, it is possible to classify the characteristic fields into physical, constraint-violating and gauge fields; see [106]. For the system (4.29–4.32) the physical fields are the ones in the tensor sector, \({{\hat l}_{ij}} \pm {{\hat p}_{ij}}\), the constraint-violating ones are \({{\bar p}_j}\) and \({{\bar l}{\prime}} \pm {{\bar p}{\prime}}\), and the gauge fields are the remaining characteristic variables. Observe that the constraint-violating fields are governed by a strongly-hyperbolic system (see also Section 4.2.4 below), and that in this particular formulation of the ADM equations the gauge fields are coupled to the constraint-violating ones. This coupling is one of the properties that make it possible to cast the system as a strongly hyperbolic one.

We conclude that the evolution system (4.27, 4.28, 4.20, 4.21) is strongly hyperbolic if and only if ƒ > 0 and ƒ ≠ 1. Although the full harmonic gauge condition (4.3) is excluded from these restrictions (see Footnote 15), there is still a large family of evolution equations for the lapse and shift that give rise to a strongly hyperbolic problem together with the standard evolution equations (4.20, 4.21) from the 3+1 decomposition.

4.2.3 Elliptic gauge conditions leading to a well-posed formulation

Rather than fixing the lapse and shift algebraically or dynamically, an alternative, which has been considered in the literature, is to fix them according to elliptic equations. A natural restriction on the extrinsic geometry of the time slices Σt is to require that their mean curvature, c = −K/3, vanishes or is constant [391]. Taking the trace of Eq. (4.21) and using the Hamiltonian constraint to eliminate the trace of \(R_{ij}^{(3)}\) yields the following equation for the lapse,

$$\left[ {- {D^j}{D_j} + {K^{ij}}{K_{ij}} + 4\pi {G_N}(\rho + \sigma)} \right]\alpha = {\partial _t}K,$$
(4.39)

which is a second-order linear elliptic equation. The operator inside the square brackets is formally positive if the strong energy condition, ρ + σ ≥ 0, holds, and so it is invertible when defined on appropriate function spaces. See also [203] for generalizations of this condition. Concerning the shift, one choice, which is motivated by eliminating the “bad” terms in the expression for the Ricci tensor, Eq. (4.24), is the spatial harmonic gauge [25]. In terms of a fixed (possibly time-dependent) background metric \({\overset \circ \gamma _{ij}}\) on Σt, this gauge is defined as (cf. Eq. (4.3))

$$0 = {V^k}: = {\gamma ^{ij}}\left({{\Gamma ^k}_{ij} - {{\overset \circ \Gamma}\,^k}_{ij}} \right) = {\gamma ^{ij}}{\gamma ^{kl}}\left({{{\overset \circ D}_k}{\gamma _{lj}} - {1 \over 2}{{\overset \circ D}_j}{\gamma _{kl}}} \right),$$
(4.40)

where \(\overset \circ D\) is the Levi-Civita connection with respect to \(\overset \circ \gamma\) and \(\overset \circ \Gamma {\,^k}_{ij}\) denote the corresponding Christoffel symbols. The main importance of this gauge is that it permits one to rewrite the Ricci tensor belonging to the three metric in the form

$$R_{ij}^{(3)} = - {1 \over 2}{\gamma ^{kl}}{\overset \circ D _k}{\overset \circ D _l}{\gamma _{ij}} + {D_{(i}}{V_{j)}} + {\rm{l}}{.}{\rm{o}}{.},$$
(4.41)

where \({\overset \circ D _k}\) denotes the covariant derivative with respect to the background metric \(\overset \circ \gamma\) and where the lower-order terms “l.o.” depend only on γij and its first derivatives \({\overset \circ D _k}{\gamma _{ij}}\). When Vk = 0 the operator on the right-hand side is second-order quasilinear elliptic, and with this, the evolution system (4.20, 4.21) has the form of a nonlinear wave equation for the three-metric γij. However, the coefficients and source terms in this equation still depend on the lapse and shift. For constant mean curvature slices the lapse satisfies the elliptic scalar equation (4.39), and with the spatial harmonic gauge the shift is determined by the requirement that Eq. (4.40) is preserved throughout evolution, which yields an elliptic vector equation for it. In [25] it was shown that the coupled hyperbolic-elliptic system consisting of the evolution equations (4.20, 4.21) with the Ricci tensor rewritten in elliptic form using the condition Vk = 0, the constant mean curvature condition (4.39), and this elliptic equation for βi, gives rise to a well-posed Cauchy problem in vacuum. Besides eliminating the “bad” terms in the Ricci tensor, the spatial harmonic gauge also has other nice properties, which were exploited in the well-posed formulation of [25]. For example, the covariant Laplacian of a function ƒ is

$${D^k}{D_k}f = {\gamma ^{ij}}{\overset \circ D _i}{\overset \circ D _j}f - {V^k}{\overset \circ D _k}f,$$
(4.42)

which does not contain any derivatives of the three metric γij if Vk = 0. For applications of the hyperbolic-elliptic formulation in [25] to the global existence of expanding vacuum cosmologies, see [26, 27].

Other methods for specifying the shift have been proposed in [391], with the idea of minimizing a functional of the type

$$I[\beta ] = \int\limits_{{\Sigma _t}} {{\Theta ^{ij}}} {\Theta _{ij}}\sqrt \gamma {d^3}x,$$
(4.43)

where Θij := ∂tγij/2 = −αKij + D(iβj) is the strain tensor. Therefore, minimizing the functional I[β] minimizes time changes in the three metric in an averaged sense. In particular, I[β] attains its absolute minimum (zero) if ∂t is a Killing vector field. Therefore, one expects the resulting gauge condition to minimize the time dependence of the coordinate components of the three metric. An alternative is to replace the strain by its trace-free part on the right-hand side of Eq. (4.43), giving rise to the minimal distortion gauge. Both conditions yield a second-order elliptic equation for the shift vector, which has unique solutions provided suitable boundary conditions are specified. For generalizations and further results on this type of gauge condition, see [73, 203, 204]. However, it seems to be currently unknown whether or not these elliptic shift conditions, together with the evolution system (4.20, 4.21) and an appropriate condition on the lapse, lead to a well-posed Cauchy problem.
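
To give a feeling for the kind of elliptic problems these gauge conditions lead to, here is a minimal one-dimensional Python sketch (ours; the potential and data are placeholders) of the maximal-slicing lapse equation (4.39), of screened-Poisson type (−∂x2 + V)α = s with V ≥ 0, discretized with second-order finite differences; the positivity of V is what makes the discrete operator invertible:

```python
import numpy as np

# 1D model of the lapse equation (4.39): (-d2/dx2 + V) alpha = s, with V >= 0
# playing the role of K_ij K^ij + 4 pi G_N (rho + sigma), and s = dK/dt = 0.
N, L = 201, 10.0
x = np.linspace(0.0, L, N)
h = x[1] - x[0]
V = 5.0 * np.exp(-(x - L / 2)**2)      # placeholder positive potential
s = np.zeros(N)                        # placeholder source

A = np.zeros((N, N))
rhs = s.copy()
A[0, 0] = A[-1, -1] = 1.0
rhs[0] = rhs[-1] = 1.0                 # Dirichlet data: alpha -> 1 at the ends
for i in range(1, N - 1):
    A[i, i - 1] = A[i, i + 1] = -1.0 / h**2
    A[i, i] = 2.0 / h**2 + V[i]

alpha = np.linalg.solve(A, rhs)
print(alpha.min(), alpha.max())        # the lapse dips where V is concentrated
```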

4.2.4 Constraint propagation

The evolution equations (4.20, 4.21) are equivalent to the components of the Einstein equations corresponding to the spatial part of the Ricci tensor,

$${R_{ij}} = 8\pi {G_N}\left({{T_{ij}} - {1 \over 2}{\gamma _{ij}}{g^{\mu \nu}}{T_{\mu \nu}}} \right),$$
(4.44)

and in order to obtain a solution of the full Einstein equations one also needs to solve the constraints H = 8πGNρ and Mi = 8πGNji. As in Section 4.1.3, the constraint propagation system can be obtained from the twice contracted Bianchi identities, which, in the 3+1 decomposition, read

$${\partial _0}H + {1 \over {{\alpha ^2}}}{D^j}\left({{\alpha ^2}{M_j}} \right) - 2KH - \left({{K^{ij}} - K{\gamma ^{ij}}} \right){R_{ij}} = 0,$$
(4.45)
$${\partial _0}{M_i} + {1 \over {{\alpha ^2}}}{D_i}\left({{\alpha ^2}H} \right) - K{M_i} + {1 \over \alpha}{D^j}\left({\alpha {R_{ij}} - \alpha {\gamma _{ij}}{\gamma ^{kl}}{R_{kl}}} \right) = 0.$$
(4.46)

The condition of the stress-energy tensor being divergence-free leads to similar evolution equations for ρ and ji. Therefore, the equations (4.44) lead to the following symmetric hyperbolic system [190, 445] for the constraint variables \({\mathcal H}: = H - 8\pi {G_N}\rho\) and \({{\mathcal M}_i}: = {M_i} - 8\pi {G_N}{j_i}\),

$${\partial _0}{\mathcal H} = - {1 \over {{\alpha ^2}}}{D^j}\left({{\alpha ^2}{{\mathcal M}_j}} \right) + 2K{\mathcal H},$$
(4.47)
$${\partial _0}{{\mathcal M}_i} = - {1 \over {{\alpha ^2}}}{D_i}\left({{\alpha ^2}{\mathcal H}} \right) + K{{\mathcal M}_i}.$$
(4.48)

As has also been observed in [190], the constraint propagation system associated with the standard ADM equations, where Eq. (4.44) is replaced by its trace-reversed version \({R_{ij}} - {1 \over 2}{\gamma _{ij}}{g^{\mu \nu}}{R_{\mu \nu}} = 8\pi {G_N}{T_{ij}}\), is

$${\partial _0}{\mathcal H} = - {1 \over {{\alpha ^2}}}{D^j}\left({{\alpha ^2}{{\mathcal M}_j}} \right) + K{\mathcal H},\qquad {\partial _0}{{\mathcal M}_i} = - {{{D_i}\alpha} \over \alpha}{\mathcal H} + K{{\mathcal M}_i},$$

which is only weakly hyperbolic. Therefore, it is much more difficult to control the constraint fields in the standard ADM case than in York’s formulation of the 3+1 equations.
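
The difference between the two constraint propagation systems is visible directly in their principal symbols. The following sympy sketch (an illustration under our simplifications: lapse frozen to one and only the principal derivative terms retained) compares the symbol of (4.47, 4.48) with that of the standard ADM system above:

```python
import sympy as sp

k1, k2, k3 = sp.symbols('k1 k2 k3', real=True)

# Principal symbol (over i) of York's constraint system (4.47, 4.48) with
# alpha frozen to 1: d0 H = -D^j M_j + ..., d0 M_i = -D_i H + ...
P_york = -sp.Matrix([[0, k1, k2, k3],
                     [k1, 0, 0, 0],
                     [k2, 0, 0, 0],
                     [k3, 0, 0, 0]])

# Standard ADM: the M_i equation carries no derivatives of the constraints.
P_adm = -sp.Matrix([[0, k1, k2, k3],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])

subs = {k1: 1, k2: 2, k3: -2}                  # an arbitrary nonzero wave vector
print(P_york.subs(subs).is_diagonalizable())   # True: symmetric, real spectrum
print(P_adm.subs(subs).is_diagonalizable())    # False: nontrivial Jordan block
```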

4.3 The BSSN formulation

The BSSN formulation is based on the 3+1 decomposition of Einstein’s field equations. Unlike the harmonic formulation, which has been motivated by the mathematical structure of the equations and the understanding of the Cauchy formulation in general relativity, this system has mainly been developed and improved based on its capability of numerically evolving spacetimes containing compact objects in a stable way. Interestingly, despite this entirely different motivation, mathematical questions like the well-posedness of the Cauchy problem can also be answered for the BSSN formulation, at least for most gauge conditions.

In the BSSN formulation, the three metric γij and the extrinsic curvature Kij are decomposed according to

$${\gamma _{ij}} = {e^{4\phi}}{\tilde \gamma _{ij}}\,,$$
(4.49)
$${K_{ij}} = {e^{4\phi}}\left({{{\tilde A}_{ij}} + {1 \over 3}{{\tilde \gamma}_{ij}}K} \right).$$
(4.50)

Here, K = γij Kij is the trace of the extrinsic curvature and Ãij is its conformally-rescaled trace-less part. The conformal factor e4ϕ is determined by the requirement that the conformal metric have unit determinant. Aside from these variables one also evolves the lapse (α), the shift (βi) and its time derivative (Bi), and the variable

$${\tilde \Gamma ^i}: = - {\partial _j}{\tilde \gamma ^{ij}}.$$
(4.51)

In terms of the operator \({\hat \partial _0} = {\partial _t} - {\beta ^j}{\partial _j}\) the BSSN evolution equations are

$${\hat \partial _0}\alpha = - {\alpha ^2}f(\alpha ,\phi ,{x^\mu})(K - {K_0}({x^\mu})),$$
(4.52)
$${\hat \partial _0}K = - {e^{- 4\phi}}\left[ {{{\tilde D}^i}{{\tilde D}_i}\alpha + 2{\partial _i}\phi \cdot{{\tilde D}^i}\alpha} \right] + \alpha \left({{{\tilde A}^{ij}}{{\tilde A}_{ij}} + {1 \over 3}{K^2}} \right) - \alpha S,$$
(4.53)
$${\hat \partial _0}{\beta ^i} = {\alpha ^2}G(\alpha ,\phi ,{x^\mu}){B^i},$$
(4.54)
$${\hat \partial _0}{B^i} = {e^{- 4\phi}}H(\alpha ,\phi ,{x^\mu}){\hat \partial _0}{\tilde \Gamma ^i} - {\eta ^i}({B^j},\alpha ,{x^\mu}),$$
(4.55)
$${\hat \partial _0}\phi = - {\alpha \over 6}\,K + {1 \over 6}{\partial _k}{\beta ^k},$$
(4.56)
$${\hat \partial _0}{\tilde \gamma _{ij}} = - 2\alpha {\tilde A_{ij}} + 2{\tilde \gamma _{k(i}}{\partial _{j)}}{\beta ^k} - {2 \over 3}{\tilde \gamma _{ij}}{\partial _k}{\beta ^k},$$
(4.57)
$$\begin{array}{*{20}c} {{{\hat \partial}_0}{{\tilde A}_{ij}} = {e^{- 4\phi}}{{\left[ {\alpha {{\tilde R}_{ij}} + \alpha R_{ij}^\phi - {{\tilde D}_i}{{\tilde D}_j}\alpha + 4{\partial _{(i}}\phi \cdot{{\tilde D}_{j)}}\alpha} \right]}^{TF}}\quad \quad \quad \quad \quad \quad \quad \quad} \\ {+ \alpha K{{\tilde A}_{ij}} - 2\alpha {{\tilde A}_{ik}}\tilde A_{\,j}^k + 2{{\tilde A}_{k(i}}{\partial _{j)}}{\beta ^k} - {2 \over 3}{{\tilde A}_{ij}}{\partial _k}{\beta ^k} - \alpha {e^{- 4\phi}}{{\hat S}_{ij}},} \\ \end{array}$$
(4.58)
$$\begin{array}{*{20}c} {{{\hat \partial}_0}{{\tilde \Gamma}^i} = {{\tilde \gamma}^{kl}}{\partial _k}{\partial _l}{\beta ^i} + {1 \over 3}{{\tilde \gamma}^{ij}}{\partial _j}{\partial _k}{\beta ^k} + {\partial _k}{{\tilde \gamma}^{kj}}\cdot{\partial _j}{\beta ^i} - {2 \over 3}{\partial _k}{{\tilde \gamma}^{ki}}\cdot{\partial _j}{\beta ^j}\quad \quad \quad \quad \quad \quad \quad \,} \\ {- 2{{\tilde A}^{ij}}{\partial _j}\alpha + 2\alpha \left[ {(m - 1){\partial _k}{{\tilde A}^{ki}} - {{2m} \over 3}{{\tilde D}^i}K + m(\tilde \Gamma _{\,kl}^i{{\tilde A}^{kl}} + 6{{\tilde A}^{ij}}{\partial _j}\phi)} \right] - {S^i}.} \\ \end{array}$$
(4.59)

Here, quantities with a tilde refer to the conformal three metric \({{\tilde \gamma}_{ij}}\), which is also used in order to raise and lower indices. In particular, \({{\tilde D}_i}\) and \({{\tilde \Gamma}^k}_{ij}\) denote the covariant derivative and the Christoffel symbols, respectively, with respect to \({{\tilde \gamma}_{ij}}\). Expressions with a superscript TF refer to their trace-less part with respect to the conformal metric. Next, the sum \({{\tilde R}_{ij}} + R_{ij}^\phi\) represents the Ricci tensor associated with the physical three metric γij, where

$${\tilde R_{ij}} = - {1 \over 2}{\tilde \gamma ^{kl}}{\partial _k}{\partial _l}{\tilde \gamma _{ij}} + {\tilde \gamma _{k(i}}{\partial _{j)}}{\tilde \Gamma ^k} - {\tilde \Gamma _{(ij)k}}{\partial _l}{\tilde \gamma ^{lk}} + {\tilde \gamma ^{ls}}\left({2{{\tilde \Gamma}^k}_{l(i}{{\tilde \Gamma}_{j)ks}} + {{\tilde \Gamma}^k}_{is}{{\tilde \Gamma}_{klj}}} \right),$$
(4.60)
$$R_{ij}^\phi = - 2{\tilde D_i}{\tilde D_j}\phi - 2{\tilde \gamma _{ij}}{\tilde D^k}{\tilde D_k}\phi + 4{\tilde D_i}\phi \,{\tilde D_j}\phi - 4{\tilde \gamma _{ij}}{\tilde D^k}\phi \,{\tilde D_k}\phi .$$
(4.61)

The term \({{\hat \partial}_0}{{\tilde \Gamma}^i}\) in Eq. (4.55) is set equal to the right-hand side of Eq. (4.59). The parameter m in the latter equation modifies the evolution flow off the constraint surface by adding the momentum constraint to the evolution equation for the variable \({{\tilde \Gamma}^i}\). This parameter was first introduced in [10] in order to compare the stability properties of the BSSN evolution equations with those of the ADM formulation.

The gauge conditions, which are imposed on the lapse and shift in Eqs. (4.52, 4.54, 4.55), were introduced in [52] and generalize the Bona-Massó condition [62] and the hyperbolic Gamma driver condition [11]. It is assumed that the functions ƒ (α, ϕ, xµ), G(α, ϕ, xµ) and H(α, ϕ, xµ) are strictly positive and smooth in their arguments, and that K0(xµ) and ηi(Bj, α, xµ) are smooth functions of their arguments. The choice

$$m = 1,\qquad f(\alpha ,\phi ,{x^\mu}) = {2 \over \alpha}\,,\qquad {K_0}({x^\mu}) = 0,$$
(4.62)
$$G(\alpha ,\phi ,{x^\mu}) = {3 \over {4{\alpha ^2}}}\,,\qquad H(\alpha ,\phi ,{x^\mu}) = {e^{4\phi}}\,,\qquad {\eta ^i}({B^j},\alpha ,{x^\mu}) = \eta {B^i},$$
(4.63)

with η a positive constant, corresponds to the evolution system used in many black-hole simulations based on 1 + log slicing and the moving puncture technique (see, for instance, [423] and references therein). Finally, the source terms S, Ŝij and Si are defined in the following way: denoting by \(R_{ij}^{(3)}\) and \(R_{ij}^{(4)}\) the Ricci tensors belonging to the three-metric γij and the spacetime metric, respectively, and introducing the constraint variables

$$H: = {1 \over 2}\left({{\gamma ^{ij}}\;R_{ij}^{(3)} + {2 \over 3}{K^2} - {{\tilde A}^{ij}}{{\tilde A}_{ij}}} \right),$$
(4.64)
$${M_i}: = {\tilde D^j}{\tilde A_{ij}} - {2 \over 3}{\tilde D_i}K + 6{\tilde A_{ij}}{\tilde D^j}\phi ,$$
(4.65)
$${C^i}: = {\tilde \Gamma ^i} + {\partial _j}{\tilde \gamma ^{ij}},$$
(4.66)

the source terms are defined as

$$S: = {\gamma ^{ij}}R_{ij}^{(4)} - 2H,\qquad {\hat S_{ij}}: = {\left[ {R_{ij}^{(4)} + {{\tilde \gamma}_{k(i}}{\partial _{j)}}{C^k}} \right]^{TF}},\quad {S^i}: = 2\alpha \,m\,{\tilde \gamma ^{ij}}{M_j} - {\hat \partial _0}{C^i}.$$
(4.67)

For vacuum evolutions one sets S = 0, Ŝij = 0 and Si = 0. When matter fields are present, the Einstein field equations are equivalent to the evolution equations (4.52–4.59) setting \(S = - 4\pi {G_N}(\rho + \sigma),{{\hat S}_{ij}} = 8\pi {G_N}\sigma _{ij}^{TF},{S^i} = 16\pi {G_N}m\alpha {{\tilde \gamma}^{ik}}{j_k}\) and the constraints H = 8πGNρ, Mi = 8πGNji and Ci = 0.

When comparing Cauchy evolutions in different spatial coordinates, it is very convenient to reformulate the BSSN system such that it is covariant with respect to spatial coordinate transformations. This is indeed possible; see [77, 82]. One way of achieving this is to fix a smooth background three-metric \({\overset \circ \gamma _{ij}}\), as in Section 4.1, and to replace the fields ϕ and \({{\tilde \Gamma}^i}\) by the scalar and vector fields

$$\phi : = {1 \over {12}}\log \left({{\gamma \over {\overset \circ \gamma}}} \right),\qquad {\tilde \Gamma ^i}: = - {\overset \circ D _j}{\tilde \gamma ^{ij}},$$
(4.68)

where γ and \(\overset \circ \gamma\) denote the determinants of γij and \({\overset \circ \gamma_{ij}}\), and \({\overset \circ D_j}\) is the covariant derivative associated to the latter. If \({\overset \circ \gamma _{ij}}\) is flat (see Footnote 16) and time-independent, the corresponding BSSN equations are obtained by replacing \({\partial _k} \mapsto {\overset \circ D _k}\) and \({{\tilde \Gamma}^k}_{ij} \mapsto {{\tilde \Gamma}^k}_{ij} - \overset \circ \Gamma {\,^k}_{ij}\) in Eqs. (4.52–4.59, 4.60, 4.61, 4.64–4.66).
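
The definitions (4.49) and (4.68) are easily exercised numerically. The following Python sketch (ours, with an arbitrary metric and a flat background in Cartesian coordinates, so that \(\overset \circ \gamma = 1\)) checks that the resulting conformal metric has unit determinant relative to the background:

```python
import numpy as np

# Exercise Eqs. (4.49) and (4.68): with a flat Cartesian background, the
# conformal metric e^{-4 phi} gamma_ij must have unit determinant.
gamma = np.array([[1.5, 0.2, 0.0],
                  [0.2, 1.1, 0.3],
                  [0.0, 0.3, 0.8]])           # arbitrary three-metric gamma_ij
det_background = 1.0                          # det(gammacirc_ij)

phi = np.log(np.linalg.det(gamma) / det_background) / 12.0   # Eq. (4.68)
gamma_tilde = np.exp(-4.0 * phi) * gamma                     # Eq. (4.49)

print(np.isclose(np.linalg.det(gamma_tilde), det_background))   # -> True
```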

4.3.1 The hyperbolicity of the BSSN evolution equations

In fact, the ADM formulation in the spatial harmonic gauge described in Section 4.2.3 and the BSSN formulation are based on some common ideas. In the covariant reformulation of BSSN just mentioned, the variable \({{\tilde \Gamma}^i}\) is just the quantity Vi defined in Eq. (4.40), where γij is replaced by the conformal metric \({{\tilde \gamma}_{ij}}\). Instead of requiring \({{\tilde \Gamma}^i}\) to vanish, which would convert the operator on the right-hand side of Eq. (4.60) into a quasilinear elliptic operator, one promotes this quantity to an independent field satisfying the evolution equation (4.59) (see also the discussion below Equation (2.18) in [390]). In this way, the \({{\tilde \gamma}_{ij}} - {{\tilde A}_{ij}}\)-block of the evolution equations forms a wave system. However, this system is coupled through its principal terms to the evolution equations of the remaining variables, and so one needs to analyze the complete system. As follows from the discussion below, it is crucial to add the momentum constraint to Eq. (4.59) with an appropriate factor m in order to obtain a hyperbolic system.

The hyperbolicity of the BSSN evolution equations was first analyzed in a systematic way in [373], where it was established that for fixed shift and densitized lapse,

$$\alpha = {e^{12\sigma \phi}}$$
(4.69)

the evolution system (4.53, 4.56–4.59) is strongly hyperbolic for σ > 0 and m > 1/4 and symmetric hyperbolic for m > 1 and 6σ = 4m − 1. This was shown by introducing new variables and enlarging the system to a strongly or symmetric hyperbolic first-order one. In fact, similar first-order reductions were already obtained in [196, 188]. However, in [373] it was shown that the first-order enlargements are equivalent to the original system if the extra constraints associated to the definition of the new variables are satisfied, and that these extra constraints propagate independently of the BSSN constraints H = 0, Mi = 0 and Ci = 0. This establishes the well-posedness of the Cauchy problem for the system (4.69, 4.53, 4.56–4.59) under the aforementioned conditions on σ and m. Based on the same method, a symmetric hyperbolic first-order enlargement of the evolution equations (4.52, 4.53, 4.56–4.59) and fixed shift was obtained in [52] under the conditions ƒ > 0 and 4m = 3ƒ + 1 and used to construct boundary conditions for BSSN. First-order strongly-hyperbolic reductions for the full system (4.52–4.59) have also been recently analyzed in [82].

An alternative and efficient method for analyzing the system consists in reducing it to a first-order pseudodifferential system, as described in Section 3.1.5. This method has been applied in [308] to derive a strongly hyperbolic system very similar to BSSN with fixed, densitized lapse and fixed shift. This system is then shown to yield a well-posed Cauchy problem. In [52] the same method was applied to the evolution system (4.52–4.59). Linearizing and localizing, one obtains a first-order system of the form \({{\hat U}_t} = P(ik)\hat U = i\overset \circ \beta {\,^s}{k_s}\hat U + \overset \circ \alpha Q(ik)\hat U\). The eigenvalues of Q(ik) are 0, \(\pm i, \pm i\sqrt m, \pm i\sqrt \mu, \pm i\sqrt f, \pm i\sqrt {GH}, \pm i\sqrt \kappa\), where we have defined µ := (4m − 1)/3 and κ := 4GH/3. The system is weakly hyperbolic provided that

$$f > 0,\qquad \mu > 0,\qquad \kappa > 0,$$
(4.70)

and it is strongly hyperbolic if, in addition, the parameter m and the functions ƒ, G, and H can be chosen such that the functions

$${\kappa \over {f - \kappa}}\,,\qquad {{m - 1} \over {\mu - \kappa}}\,,\qquad {{6(m - 1)\kappa} \over {4m - 3\kappa}}$$
(4.71)

are bounded and smooth. In particular, this requires that the numerators converge to zero at least as fast as the denominators when ƒ → κ, μ → κ or 3κ → 4m, respectively. Since κ > 0, the boundedness of κ/(ƒ − κ) requires that ƒ ≠ κ. For the standard choice m = 1, the conditions on the gauge parameters leading to strong hyperbolicity are, therefore, ƒ > 0, κ > 0 and ƒ ≠ κ. Unfortunately, for the choice (4.62, 4.63) used in binary black-hole simulations these conditions reduce to

$${e^{4\phi}} \neq 2\alpha ,$$
(4.72)

which is typically violated at some two-surface, since asymptotically, α → 1 and ϕ → 0, while near black holes α is small and ϕ positive. It is currently not known whether or not the Cauchy problem is well posed if the system is strongly hyperbolic everywhere except at points belonging to a set of zero measure, such as a two-surface. Although numerical simulations based on finite-difference discretizations with the standard choice (4.62, 4.63) show no apparent sign of instabilities near such surfaces, the well-posedness of the Cauchy problem for the BSSN system (4.52–4.59) with the choice (4.62, 4.63) for the gauge source functions remains an open problem when the condition (4.72) is violated. However, a well-posed problem could be formulated by modifying the choice for the functions G and H such that ƒ ≠ κ and ƒ, κ > 0 are guaranteed to hold everywhere.
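
For the standard choice (4.62, 4.63) the obstruction is a one-line computation: ƒ = 2/α and κ = 4GH/3 = e4ϕ/α2, so ƒ = κ precisely where e4ϕ = 2α, which is Eq. (4.72). A small sketch of ours making this check explicit:

```python
import numpy as np

# Where does strong hyperbolicity fail for the standard BSSN gauge (4.62, 4.63)?
# f = 2/alpha, kappa = 4GH/3 = e^{4 phi}/alpha^2; f != kappa <=> e^{4 phi} != 2 alpha.
def strongly_hyperbolic(alpha, phi, tol=1e-12):
    f = 2.0 / alpha
    kappa = np.exp(4.0 * phi) / alpha**2
    return f > 0 and kappa > 0 and abs(f - kappa) > tol

print(strongly_hyperbolic(alpha=1.0, phi=0.0))   # True: asymptotic values
print(strongly_hyperbolic(alpha=0.5, phi=0.0))   # False: e^{4 phi} = 2 alpha here
```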

Yet a different approach to analyzing the hyperbolicity of BSSN has been given in [219, 220], based on a new definition of strongly and symmetric hyperbolicity for evolution systems, which are first order in time and second order in space. Based on this definition, it has been verified that the BSSN system (4.69, 4.53, 4.56–4.59) is strongly hyperbolic for σ > 0 and m > 1/4 and symmetric hyperbolic for 6σ = 4m − 1 > 0. (Note that this generalizes the original result in [373] where, in addition, m > 1 was required.) The results in [220] also discuss more general 3+1 formulations, including the one in [308], and construct constraint-preserving boundary conditions. The relation between the different approaches to analyzing hyperbolicity of evolution systems, which are first order in time and second order in space, has been analyzed in [221].

Strong hyperbolicity for different versions of the gauge evolution equations (4.52, 4.54, 4.55), where the normal operator \({{\hat \partial}_0}\) is sometimes replaced by ∂t, has been analyzed in [222]. See Table I in that reference for a comparison between the different versions and the conditions they are subject to in order to satisfy strong hyperbolicity. It should be noted that when m = 1 and \({{\hat \partial}_0}\) is replaced by ∂t, conditions restricting the magnitude of the shift appear in addition to ƒ > 0 and ƒ ≠ κ.


4.3.2 Constraint propagation

As mentioned above, the BSSN evolution equations (4.52–4.59) are only equivalent to Einstein’s field equations if the constraints

$${\mathcal H}: = H - 8\pi {G_N}\rho = 0,\qquad {{\mathcal M}_i}: = {M_i} - 8\pi {G_N}{j_i} = 0,\qquad {C^i} = 0$$
(4.73)

are satisfied. Using the twice contracted Bianchi identities in their 3+1 decomposed form, Eqs. (4.45, 4.46), and assuming that the stress-energy tensor is divergence free, it is not difficult to show that the equations (4.52–4.59) imply the following evolution system for the constraint fields [52, 220]:

$${\hat \partial _0}{\mathcal H} = - {1 \over \alpha}\,{D^j}({\alpha ^2}{{\mathcal M}_j}) - \alpha {e^{- 4\phi}}{\tilde A^{ij}}{\tilde \gamma _{ki}}{\partial _j}{C^k} + {{2\alpha} \over 3}\,K{\mathcal H},$$
(4.74)
$${\hat \partial _0}{{\mathcal M}_j} = {{{\alpha ^3}} \over 3}{D_j}({\alpha ^{- 2}}{\mathcal H}) + \alpha K{{\mathcal M}_j} + {{\mathcal M}_i}{\partial _j}{\beta ^i} + {D^i}\left({\alpha {{\left[ {{{\tilde \gamma}_{k(i}}{\partial _{j)}}{C^k}} \right]}^{TF}}} \right),$$
(4.75)
$${\hat \partial _0}{C^i} = 2\alpha \,m\,{\tilde \gamma ^{ij}}{{\mathcal M}_j}.$$
(4.76)

This is the constraint propagation system for BSSN; it describes the propagation of the constraint violations that are usually present in numerical simulations due to truncation and roundoff errors. There are at least three reasons for establishing the well-posedness of its Cauchy problem. The first reason is to show that the unique solution of the system (4.74–4.76) with zero initial data is the trivial solution. This implies that it is sufficient to solve the constraints at the initial time t = 0. Then, any smooth enough solution of the BSSN evolution equations with such data satisfies the constraint propagation system with \({\mathcal H} = 0,{{\mathcal M}_j} = 0\) and Ci = 0, and it follows from the uniqueness property of this system that the constraints must hold everywhere and at each time. In this way, one obtains a solution to Einstein’s field equations. However, in numerical calculations, the initial constraints are not exactly satisfied due to numerical errors. This brings us to the second reason for having a well-posed problem at the level of the constraint propagation system; namely, the continuous dependence on the initial data. Indeed, the initial constraint violations give rise to constraint-violating solutions; but, if these violations are governed by a well-posed evolution system, the norm of the constraint violations is controlled by that of the initial violations for each fixed time t > 0. In particular, the constraint violations must converge to zero if the initial constraint violations do. Since the initial constraint errors go to zero when resolution is increased (provided a stable numerical scheme is used to solve the constraints), this guarantees convergence to a constraint-satisfying solution (see Footnote 17). Finally, the third reason for establishing well-posedness for the constraint propagation system is the construction of constraint-preserving boundary conditions, which will be explained in detail in Section 6.

The hyperbolicity of the constraint propagation system (4.74–4.76) has been analyzed in [220, 52, 81, 80, 315] and shown to be reducible to a symmetric hyperbolic first-order system for m > 1/4. Furthermore, there are no superluminal characteristic fields if 1/4 < m ≤ 1. Because of finite speed of propagation, this means that BSSN with 1/4 < m ≤ 1 (which includes the standard choice m = 1) does not possess superluminal constraint-violating modes. This is an important property, for it shows that constraint violations that originate inside black-hole regions (which usually dominate the constraint errors due to high gradients at the punctures or stuffing of the black-hole singularities in the turducken approach [156, 81, 80]) cannot propagate to the exterior region.

In [353] a general result is derived, showing that under a mild assumption on the form of the constraints, strong hyperbolicity of the main evolution system implies strong hyperbolicity of the constraint propagation system, with the characteristic speeds of the latter being a subset of those of the former. The result does not hold in general if “strong” is replaced by “symmetric”, since there are known examples for which the main evolution system is symmetric hyperbolic, while the constraint propagation system is only strongly hyperbolic [108].

4.4 Other hyperbolic formulations

There exist many other hyperbolic reductions of Einstein’s field equations. In particular, there has been a large amount of work on casting the evolution equations into first-order symmetric [2, 182, 195, 3, 21, 155, 248, 443, 22, 74, 234, 254, 383, 377, 18, 285, 86] and strongly hyperbolic [62, 63, 12, 59, 60, 13, 64, 367, 222, 78, 58, 82] form; see [182, 352, 188, 353] for reviews. For systems involving wave equations for the extrinsic curvature, see [128, 2]; see also [424] and [20, 75, 374, 379, 436] for applications to perturbation theory and the linear stability of solitons and hairy black holes.

Recently, there has also been work deriving strongly or symmetric hyperbolic formulations from an action principle [79, 58, 243].

5 Boundary Conditions: The Initial-Boundary Value Problem

In Section 3 we discussed the general Cauchy problem for quasilinear hyperbolic evolution equations on the unbounded domain ℝn. However, in the numerical modeling of such problems one is faced with the finiteness of computer resources. A common approach for dealing with this problem is to truncate the domain via an artificial boundary, thus forming a finite computational domain with outer boundary. Absorbing boundary conditions must then be specified at the boundary such that the resulting IBVP is well posed and such that the amount of spurious reflection is minimized.

Therefore, we examine in this section quasilinear hyperbolic evolution equations on a finite, open domain Σ ⊂ ℝn with C∞-smooth boundary ∂Σ. Let T > 0. We are considering an IBVP of the following form,

$${u_t} = \sum\limits_{j = 1}^n {{A^j}} (t,x,u){\partial \over {\partial {x^j}}}u + F(t,x,u),x \in \Sigma ,\quad t \in [0,T],$$
(5.1)
$$u(0,x) = f(x),\quad \quad \quad x \in \Sigma ,$$
(5.2)
$$b(t,x,u)u = g(t,x),\quad \quad x \in \partial \Sigma ,\quad t \in [0,T],$$
(5.3)

where u(t, x) ∈ ℂm is the state vector, A1(t, x, u), …, An(t, x, u) are complex m×m matrices, F(t, x, u) ∈ ℂm, and b(t, x, u) is a complex r × m matrix. As before, we assume for simplicity that all coefficients belong to the class \(C_b^\infty ([0,T] \times \Sigma \times {{\rm{\mathbb C}}^m})\) of bounded, smooth functions with bounded derivatives. The data consists of the initial data \(f \in C_b^\infty (\Sigma, {{\rm{\mathbb C}}^m})\) and the boundary data \(g \in C_b^\infty ([0,T] \times \partial \Sigma, {{\rm{\mathbb C}}^r})\).

Compared to the initial-value problem discussed in Section 3 the following new issues and difficulties appear when boundaries are present:

  • For a smooth solution to exist, the data f and g must satisfy appropriate compatibility conditions at the intersection S := {0} × ∂Σ between the initial and boundary surface [344]. Assuming that u is continuous, for instance, Eqs. (5.2, 5.3) imply that g(0, x) = b(0, x, f(x))f(x) for all x ∈ ∂Σ. If u is continuously differentiable, then taking a time derivative of Eq. (5.3) and using Eqs. (5.1, 5.2) leads to

    $${g_t}(0,x) = c(x)\;\left[ {\sum\limits_{j = 1}^n {{A^j}} (0,x,f(x)){{\partial f} \over {\partial {x^j}}}(x) + F(0,x,f(x))} \right] + {b_t}(0,x,f(x))f(x),\qquad x \in \partial \Sigma ,$$
    (5.4)

    where c(x) is the complex r × m matrix with coefficients

    $$c{(x)^A}_{\;B} = b{(0,x,f(x))^A}_{\;B} + \sum\limits_{C = 1}^m {{{\partial {b^A}_C} \over {\partial {u^B}}}} (0,x,f(x))f{(x)^C},\qquad A = 1, \ldots ,r,\quad B = 1, \ldots ,m.$$
    (5.5)

    Assuming higher regularity of u, one obtains additional compatibility conditions by taking further time derivatives of Eq. (5.3). In particular, for an infinitely-differentiable solution u, one has an infinite family of such compatibility conditions at S, and one must make sure that the data f, g satisfies each of them if the solution u is to be reproduced by the IBVP. If an exact solution u(0) of the partial-differential equation (5.1) is known, a convenient way of satisfying these conditions is to choose the data such that in a neighborhood of S, f and g agree with the corresponding values for u(0), i.e., such that f(x) = u(0)(0, x) and g(t, x) = b(t, x, u(0)(t, x))u(0)(t, x) for (t, x) in a neighborhood of S. However, depending on the problem at hand, this might be too restrictive.

  • The next issue is the question of what class of boundary conditions (5.3) leads to a well-posed problem. In particular, one would like to know, which are the restrictions on the matrix b(t, x, u) implying existence of a unique solution, provided the compatibility conditions hold. In order to illustrate this issue on a very simple example, consider the advection equation ut = ux on the interval [−1, 1]. The most general solution has the form u(t, x) = h(t + x) for some differentiable function h: (−1, ∞) → ℂ. The function h is determined on the interval [−1, 1] by the initial data alone, and so the initial data alone fixes the solution on the strip −1 − t ≤ x ≤ 1 − t. Therefore, one is not allowed to specify any boundary conditions at x = −1, whereas data must be specified for u at x = 1 in order to uniquely determine the function h on the interval (1, ∞) (a small numerical illustration of this example follows this list).

  • Additional difficulties appear when the system has constraints, like in the case of electromagnetism and general relativity. In the previous Section 4, we saw in the case of Einstein’s equations that it is usually sufficient to solve these constraints on an initial Cauchy surface, since the Bianchi identities and the evolution equations imply that the constraints propagate. However, in the presence of boundaries one can only guarantee that the constraints remain satisfied inside the future domain of dependence of the initial surface Σ0:= {0} × Σ unless the boundary conditions are chosen with care. Methods for constructing constraint-preserving boundary conditions, which make sure that the constraints propagate correctly on the whole spacetime domain [0, T] × Σ will be discussed in Section 6.
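
The advection example from the second item above can be illustrated with a few lines of Python (ours; grid parameters and data are placeholders): an upwind discretization of ut = ux only uses information from the right, so a boundary value can, and must, be imposed at x = 1 only:

```python
import numpy as np

# u_t = u_x on [-1, 1]: characteristics move to the left, so boundary data is
# prescribed at x = 1 only; nothing may be imposed at x = -1.
N, T = 201, 2.0
x = np.linspace(-1.0, 1.0, N)
h = x[1] - x[0]
dt = 0.5 * h                              # satisfies the CFL condition dt <= h
u = np.exp(-20.0 * x**2)                  # initial data f(x)

t = 0.0
while t < T:
    u[:-1] += dt * (u[1:] - u[:-1]) / h   # upwind difference toward x = 1
    u[-1] = 0.0                           # boundary data g(t) = 0 at x = 1
    t += dt

print(u.max())   # ~0: the pulse has advected out of the domain through x = -1
```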

There are two common techniques for analyzing an IBVP. The first, discussed in Section 5.1, is based on the linearization and localization principles, and reduces the problem to linear, constant coefficient IBVPs which can be explicitly solved using Fourier transformations, similar to the case without boundaries. This approach, called the Laplace method, is very useful for finding necessary conditions for the well-posedness of linear, constant coefficient IBVPs. Likely, these conditions are also necessary for the quasilinear IBVP, since small-amplitude high-frequency perturbations are essentially governed by the corresponding linearized, frozen coefficient problem. Based on the Kreiss symmetrizer construction [258] and the theory of pseudo-differential operators, the Laplace method also gives sufficient conditions for the linear, variable coefficient problem to be well posed; however, the general theory is rather technical. For a discussion and interpretation of this approach in terms of wave propagation we refer to [241].

The second method, which is discussed in Section 5.2, is based on energy inequalities obtained from integration by parts and does not require the use of pseudo-differential operators. It provides a class of boundary conditions, called maximal dissipative, which leads to a well-posed IBVP. Essentially, these boundary conditions specify data to the incoming normal characteristic fields, or to an appropriate linear combination of the in- and outgoing normal characteristic fields. Although technically less involved than the Laplace one, this method requires the evolution equations (5.1) to be symmetric hyperbolic in order to be applicable, and it gives sufficient, but not necessary, conditions for well-posedness.

In Section 5.3 we also discuss absorbing boundary conditions, which are designed to minimize spurious reflections from the boundary surface.

5.1 The Laplace method

Upon linearization and localization, the IBVP (5.1, 5.2, 5.3) reduces to a linear, constant-coefficient problem of the following form,

$${u_t} = \sum\limits_{j = 1}^n {{A^j}} {\partial \over {\partial {x^j}}}u + {F_0}(t,x),x \in \Sigma ,\quad t \geq 0,$$
(5.6)
$$u(0,x) = f(x),\quad \quad x \in \Sigma ,$$
(5.7)
$$bu = g(t,x),\quad \quad x \in \partial \Sigma ,\quad t \geq 0,$$
(5.8)

where Aj = Aj(t0, x0, u(0) (t0, x0)), b = b(t0, x0,u(0) (t0, x0)) denote the matrix coefficients corresponding to Aj(t, x, u) and b(t, x, u) linearized about a solution u(0) and frozen at the point p0 = (t0, x0), and where, for generality, we include the forcing term F0(t, x) with components in the class \(C_b^\infty ([0,\infty) \times \Sigma)\). Since the freezing process involves a zoom into a very small neighborhood of p0, we may replace Σ by ℝn for all points p0 lying inside the domain Σ. We are then back into the case of Section 3, and we conclude that a necessary condition for the IBVP (5.1, 5.2, 5.3) to be well posed at u(0) is that all linearized, frozen coefficient Cauchy problems corresponding to p0 Σ are well posed. In particular, the equation (5.6) must be strongly hyperbolic.

Now let us consider a point p0 ∈ ∂Σ at the boundary. Since ∂Σ is assumed to be smooth, it will be mapped to a plane during the freezing process. Therefore, taking points p0 ∈ ∂Σ, it is sufficient to consider the linear, constant coefficient IBVP (5.6, 5.7, 5.8) on the half space

$$\Sigma : = \{({x_1},{x_2}, \ldots ,{x_n}) \in {{\mathbb R}^n}:{x_1} > 0\} ,$$
(5.9)

say. This is the subject of this subsection. Because we are dealing with a constant coefficient problem on the half-space, we can reduce the problem to an ordinary differential boundary problem on the interval [0, ∞) by employing Fourier transformation in the directions t and y:= (x2, …, xn) tangential to the boundary. More precisely, we first exponentially damp the function u(t, x) in time by defining for η > 0 the function

$${u_\eta}(t,x): = \left\{{\begin{array}{*{20}c} {{e^{- \eta t}}\,u(t,x)} & {{\rm{for}}\;t \geq 0,x \in \Sigma ,} \\ {0\quad \quad \;\,\quad} & {{\rm{for}}\;t < 0,x \in \Sigma .} \\ \end{array}} \right.$$
(5.10)

We denote by ûη(ξ, x1, k) the Fourier transformation of uη(t, x1, y) with respect to the directions t and y tangential to the boundary and define the Laplace-Fourier transformation of u by

$$\tilde u(s,{x_1},k): = {\hat u_\eta}(\xi ,{x_1},k) = {1 \over {{{(2\pi)}^{n/2}}}}\int {{e^{- st - ik\cdot y}}} u(t,{x_1},y)dt{d^{n - 1}}y,\qquad s: = \eta + i\xi ,$$
(5.11)

then, ũ satisfies the following boundary value problem,

$$A{\partial \over {\partial {x_1}}}\tilde u = B(s,k)\tilde u + \tilde F(s,{x_1},k),{x_1} > 0,$$
(5.12)
$$b\tilde u = \tilde g(s,k)\quad \quad \quad {x_1} = 0,$$
(5.13)

where, for notational simplicity, we set A := A1 and Bj:= Aj, j = 2, …, n, and where B(s, k):= sIiB2k2 − … − iBnkn. Here, \(\tilde F(s,{x_1},k) = {{\tilde F}_0}(s,{x_1},k) + \hat f({x_1},k)\) with \({{\tilde F}_0}\) and \({\hat f}\) denoting the Laplace-Fourier and Fourier transform, respectively, of F0 and f, and \(\tilde g(s,k)\) is the Laplace-Fourier transform of the boundary data g.

In the following, we assume for simplicity that the boundary matrix A is invertible, and that the equation (5.6) is strongly hyperbolic. An interesting example with a singular boundary matrix is mentioned in Example 26 below. If A can be inverted, then we rewrite Eq. (5.12) as the linear ordinary differential equation

$${\partial \over {\partial {x_1}}}\tilde u = M(s,k)\tilde u + {A^{- 1}}\tilde F(s,{x_1},k),\qquad {x_1} > 0,$$
(5.14)

where M(s,k):= A−1B(s,k). We solve this equation subject to the boundary conditions (5.13) and the requirement that ũ vanishes as x1 → ∞. For this, it is useful to have information about the eigenvalues of M(s, k).

Lemma 3 ([258, 259, 228]). Suppose the equation (5.6) is strongly hyperbolic and the boundary matrix A has q negative and mq positive eigenvalues. Then, M(s, k) has precisely q eigenvalues with negative real part and mq eigenvalues with positive real part. (The eigenvalues are counted according to their algebraic multiplicity.) Furthermore, there is a constant δ > 0 such that the eigenvalues κ of M(s,k) satisfy the estimate

$$\vert Re(\kappa)\vert \;\, \geq \delta Re(s),$$
(5.15)

for all Re(s) > 0 and k ∈ ℝn−1.

Proof. Let Re(s) > 0, β ∈ ℝ and k ∈ ℝn−1. Then

$$M(s,k) - i\beta I = {A^{- 1}}\;\left[ {sI - i\beta A - i{k_j}{B^j}} \right] = {A^{- 1}}\;\left[ {sI - {P_0}(i\beta ,ik)} \right].$$
(5.16)

Since the equation (5.6) is strongly hyperbolic there is a constant K and matrices S(β, k) such that (see the comments below Definition 2)

$$\vert S(\beta ,k)\vert + \vert S{(\beta ,k)^{- 1}}\vert \; \leq K,\qquad S{(\beta ,k)^{- 1}}{P_0}(i\beta ,ik)S(\beta ,k) = i\Lambda (\beta ,k),$$
(5.17)

for all (β, k) ∈ ℝn, where Λ(β, k) is a real, diagonal matrix. Hence,

$$M(s,k) - i\beta I = {A^{- 1}}S(\beta ,k)\left[ {sI - i\Lambda (\beta ,k)} \right]S{(\beta ,k)^{- 1}},$$
(5.18)

and since sIiΛ(β, k) is diagonal and its diagonal entries have real part greater than or equal to Re(s), it follows that

$$\vert {[M(s,k) - i\beta I]^{- 1}}\vert \; \leq \;\vert A\vert \vert S(\beta ,k)\vert \vert S{(\beta ,k)^{- 1}}\vert \vert {[sI - i\Lambda (\beta ,k)]^{- 1}}\vert \leq {1 \over {\delta {\rm Re} (s)}},$$
(5.19)

with δ := (K2∣A∣)−1. Therefore, the eigenvalues κ of M(s, k) must satisfy

$$\vert \kappa - i\beta \vert \; \geq \delta {\rm Re} (s)$$
(5.20)

for all β ∈ ℝ. Choosing β:= Im(κ) proves the inequality (5.15). Furthermore, since the eigenvalues κ = κ(s, k) can be chosen to be continuous functions of (s, k) [252], and since for k = 0, M(s, 0) = sA−1, the number of eigenvalues κ with positive real part is equal to the number of positive eigenvalues of A. □
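
Lemma 3 can also be probed numerically. The sketch below (ours) uses the 2 × 2 system revisited in Example 25 below, with A = diag(1, −1) and B = [[0, 1], [1, 0]]: for random samples with Re(s) > 0, the matrix M(s, k) = A−1(sI − ikB) always has exactly one eigenvalue in each half plane (here q = 1), and the ratio |Re(κ)|/Re(s) stays bounded away from zero, as the estimate (5.15) requires:

```python
import numpy as np

# Probe Lemma 3 for a 2x2 strongly hyperbolic system: A = diag(1, -1) has
# q = 1 negative eigenvalue; M(s, k) = A^{-1}(s I - i k B).
rng = np.random.default_rng(1)
A = np.diag([1.0, -1.0])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
Ainv = np.linalg.inv(A)

ratios = []
for _ in range(2000):
    s = rng.uniform(0.01, 5.0) + 1j * rng.uniform(-50.0, 50.0)  # Re(s) > 0
    k = rng.uniform(-50.0, 50.0)
    kappa = np.linalg.eigvals(Ainv @ (s * np.eye(2) - 1j * k * B))
    # exactly one eigenvalue in each half plane, as the lemma asserts:
    assert (kappa.real < 0).sum() == 1 and (kappa.real > 0).sum() == 1
    ratios.append(np.abs(kappa.real).min() / s.real)

print(min(ratios))  # bounded away from zero, cf. the estimate (5.15)
```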

According to this lemma, the Jordan normal form of the matrix M(s, k) has the following form:

$$M(s,k) = T(s,k)\left[ {D(s,k) + N(s,k)} \right]T{(s,k)^{- 1}},$$
(5.21)

with T(s, k) a regular matrix, N(s, k) nilpotent (N(s, k)m = 0), and

$$D(s,k) = {\rm{diag}}({\kappa _1}, \ldots ,{\kappa _q},{\kappa _{q + 1}}, \ldots ,{\kappa _m})$$
(5.22)

is the diagonal matrix with the eigenvalues of M(s, k), where κ1, …, κq have negative real part. Furthermore, N(s, k) commutes with D(s, k). Transforming to the variable ṽ(s, x1, k) := T(s, k)−1ũ(s, x1, k), the boundary value problem (5.12, 5.13) simplifies to

$${\partial \over {\partial {x_1}}}\tilde v = \left[ {D(s,k) + N(s,k)} \right]\tilde v + T{(s,k)^{- 1}}{A^{- 1}}\tilde F(s,{x_1},k),{x_1} > 0,$$
(5.23)
$$bT(s,k)\tilde v = \tilde g(s,k)\quad \quad \quad \quad \quad {x_1} = 0.$$
(5.24)

5.1.1 Necessary conditions for well-posedness and the Lopatinsky condition

Having cast the IBVP into the ordinary differential system (5.23, 5.24), we are ready to obtain a simple necessary condition for well-posedness. For this, we consider the problem for \(\tilde F = 0\) and split ṽ = (ṽ−, ṽ+), where ṽ− := (ṽ1, …, ṽq) and ṽ+ := (ṽq+1, …, ṽm) are the variables corresponding to the eigenvalues of M(s, k) with negative and positive real parts, respectively. Accordingly, we split

$$D(s,k) = \left({\begin{array}{*{20}c} {{D_ -}(s,k)} & 0 \\ 0 & {{D_ +}(s,k)} \\ \end{array}} \right),\qquad N(s,k) = \left({\begin{array}{*{20}c} {{N_ -}(s,k)} & 0 \\ 0 & {{N_ +}(s,k)} \\ \end{array}} \right)\;,$$
(5.25)

and bT(s, k) = (b−(s, k), b+(s, k)). When \(\tilde F = 0\) the most general solution of Eq. (5.23) is

$$\begin{array}{*{20}c} {{{\tilde v}_ -}(s,{x_1},k) = {e^{{D_ -}(s,k){x_1}}}{e^{{N_ -}(s,k){x_1}}}{\sigma _ -}(s,k),} \\ {{{\tilde v}_ +}(s,{x_1},k) = {e^{{D_ +}(s,k){x_1}}}{e^{{N_ +}(s,k){x_1}}}{\sigma _ +}(s,k),} \\ \end{array}$$

with constant vectors σ−(s, k) ∈ ℂq and σ+(s, k) ∈ ℂm−q. The expression for ṽ+ describes modes that grow exponentially in x1 and do not satisfy the required boundary condition at x1 → ∞ unless σ+(s, k) = 0; hence, we set σ+(s, k) = 0. In view of the boundary conditions (5.24), we then obtain the algebraic equation

$${b_ -}(s,k){\sigma _ -}(s,k) = \tilde g.$$
(5.26)

Therefore, a necessary condition for existence and uniqueness is that the r × q matrix b(s, k) be a square matrix, i.e., r = q, and that

$$\det ({b_ -}(s,k)) \neq 0$$
(5.27)

for all Re(s) > 0 and k ∈ ℝn−1. Let us make the following observations:

  • The condition (5.27) implies that we must specify exactly as many linearly-independent boundary conditions as there are incoming characteristic fields, since q is the number of negative eigenvalues of the boundary matrix A = A1.

  • The violation of condition (5.27) at some (s0, k0) with Re(s0) > 0 and k0 ∈ ℝn−1 gives rise to the simple wave solutions

    $$u(t,{x_1},y) = {e^{{s_0}t + i{k_0}\cdot y}}\tilde u({s_0},{x_1},{k_0}),\qquad t \geq 0,\quad ({x_1},y) \in \Sigma ,$$
    (5.28)

    where ũ(s0, ·, k0) = T(s0, k0)ṽ(s0, ·, k0) ∈ L2(0, ∞) is a nontrivial solution of the problem (5.23, 5.24) with homogeneous data \(\tilde F = 0\) and \(\tilde g = 0\). Therefore, an equivalent necessary condition for well-posedness is that no such simple wave solutions exist. This is known as the Lopatinsky condition.

  • If such a simple wave solution exists for some (s0, k0), then the homogeneity of the problem implies the existence of a whole family,

    $${u_\alpha}(t,{x_1},y) = {e^{\alpha ({s_0}t + i{k_0}\cdot y)}}\tilde u(\alpha {s_0},\alpha {x_1},\alpha {k_0}),\qquad t \geq 0,\quad ({x_1},y) \in \Sigma ,$$
    (5.29)

    of such solutions parametrized by α > 0. In particular, it follows that

    $$\vert {u_\alpha}(t,{x_1},y)\vert \; = {e^{\alpha {\rm{Re}}({s_0})t}}\vert \tilde u(\alpha {s_0},\alpha {x_1},\alpha {k_0})\vert \; = {e^{\alpha {\rm{Re}}({s_0})t}}\vert {u_\alpha}(0,{x_1},y)\vert ,$$
    (5.30)

    such that

    $${{\vert {u_\alpha}(t,{x_1},y)\vert} \over {\vert {u_\alpha}(0,{x_1},y)\vert}} = {e^{\alpha {\rm{Re}}({s_0})t}} \rightarrow \infty$$
    (5.31)

    for all t > 0, as α → ∞. Therefore, one has solutions growing exponentially in time at an arbitrarily large rate (see Footnote 18).

Example 25. Consider the IBVP for the massless Dirac equation in two spatial dimensions (cf. Section 8.4.1 in [259]),

$${u_t} = \left({\begin{array}{*{20}c} 1 & {\,\;0} \\ 0 & {- 1} \\ \end{array}} \right){u_x} + \left({\begin{array}{*{20}c} 0 & 1 \\ 1 & 0 \\ \end{array}} \right){u_y},t \geq 0,\quad x \geq 0,\quad y \in {\mathbb R},\qquad u = \left({\begin{array}{*{20}c} {{u_1}} \\ {{u_2}} \\ \end{array}} \right)$$
(5.32)
$$u(0,x,y) = f(x,y),\quad \quad x \geq 0,\quad y \in {\mathbb R},$$
(5.33)
$$a{u_1} + b{u_2} = g(t,y),\quad \quad t \geq 0,\quad y \in {\mathbb R},$$
(5.34)

where a and b are two complex constants to be determined. Assuming f = 0, Laplace-Fourier transformation leads to the boundary-value problem

$${\tilde u_x} = M(s,k)\tilde u,\quad x > 0,\qquad M(s,k) = \left({\begin{array}{*{20}c} {\;s} & {- ik} \\ {ik} & {- s} \\ \end{array}} \right)$$
(5.35)
$$a{\tilde u_1} + b{\tilde u_2} = \tilde g(s,k),x = 0.$$
(5.36)

The eigenvalues and corresponding eigenvectors of the matrix M(s, k) are κ± = ±λ and e± = (ik, s ∓ λ)T, with \(\lambda := \sqrt {{s^2} + {k^2}}\), where the root is chosen such that Re(λ) > 0 for Re(s) > 0. The solution, which is square integrable on [0, ∞), is the one associated with κ−; that is,

$$\tilde u(s,x,k) = \sigma {e^{- \lambda x}}{e_ -},$$
(5.37)

with σ a constant. Inserting this into the boundary condition (5.36) leads to the condition

$$\left[ {ika + (s + \lambda)b} \right]\sigma = \tilde g(s,k),$$
(5.38)

and the Lopatinsky condition is satisfied if and only if the expression inside the square brackets on the left-hand side is different from zero for all Re(s) > 0 and k ∈ ℝ. Clearly, this implies b ≠ 0, since otherwise this expression is zero for k = 0. Assuming b ≠ 0 and k ≠ 0, we then obtain the condition

$$z + \sqrt {{z^2} + 1} \pm i{a \over b} \neq 0,$$
(5.39)

for all z:= s/∣k∣ with Re(z) > 0, which is the case if and only if ∣a/b∣ ≤ 1 or a/b ∈ ℝ; see Figure 1. The particular case a = 0, b = 1 corresponds to fixing the incoming normal characteristic field u2 to g at the boundary.

Figure 1: Image of the lines Re(z) = const > 0 under the map \({\mathbb C} \rightarrow {\mathbb C},\,z \mapsto z + \sqrt {{z^2} + 1}\).
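The Lopatinsky analysis of this example is simple enough to probe numerically. The following minimal sketch (Python; the grid ranges and the sample values of a and b are ours) scans the left-hand side of condition (5.39) in the form ψ(z)b ± ia over a grid with Re(z) > 0 and reports the smallest modulus found; the condition requires this quantity to stay away from zero (together with b ≠ 0 for the k = 0 mode):

```python
import numpy as np

def lopatinsky_min(a, b, n=400):
    """Scan |psi(z)*b + i*a| and |psi(z)*b - i*a| over a grid with Re(z) > 0,
    where psi(z) = z + sqrt(z**2 + 1) (principal branch, so Re(psi) > 0 here);
    the two signs account for k > 0 and k < 0 in condition (5.39)."""
    z = (np.linspace(1e-3, 10.0, n)[:, None]
         + 1j*np.linspace(-10.0, 10.0, 2*n)[None, :])
    psi = z + np.sqrt(z**2 + 1)
    return min(np.abs(psi*b + 1j*a).min(), np.abs(psi*b - 1j*a).min())

print(lopatinsky_min(0.0, 1.0))   # a = 0, b = 1 (Sommerfeld): stays above 1
print(lopatinsky_min(2.0j, 1.0))  # |a/b| > 1, a/b not real: collapses to ~0
```

In the second case, refining the grid drives the minimum to zero (the root sits at the real point z = 3/4, where ψ(z) = 2), corresponding to the simple wave solutions discussed above.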

Example 26. We consider the Maxwell evolution equations of Example 15 on the half-space x1 > 0, and freeze the incoming normal characteristic fields to zero at the boundary. These fields are the ones defined in Eq. (3.54), which correspond to negative eigenvalues and k = −e1;Footnote 19 hence

$${E_1} + {\mu \over \beta}({W_{22}} + {W_{33}}) = 0,\qquad {E_A} + {W_{1A}} - (1 + \alpha){W_{A1}} = 0,\qquad {x_1} = 0,\quad {x_A} \in {\mathbb R},\quad t \geq 0,$$
(5.40)

where A = 2, 3 label the coordinates tangential to the boundary, and where we recall that \(\mu = \sqrt {\alpha \beta}\), assuming that α and β have the same sign such that the evolution system (3.50, 3.51) is strongly hyperbolic. In this example, we apply the Lopatinsky condition in order to find necessary conditions for the resulting IBVP to be well posed. For simplicity, we assume that \(\mu = \sqrt {\alpha \beta} = 1\), which implies that the system is strongly hyperbolic for all values of α ≠ 0, but symmetric hyperbolic only if −3/2 < α < 0; see Example 15.

In order to analyze the system, it is convenient to introduce the variables U1 := W22 + W33, UA := W1A − (1 + α)WA1, Z := βW11 − (1 + β/2)U1, and \({{\bar W}_{AB}}: = {W_{AB}} - {\delta _{AB}}{U_1}/2\), which are motivated by the form of the characteristic fields with respect to the direction k = −e1 normal to the boundary x1 = 0; see Example 15. With these assumptions and definitions, Laplace-Fourier transformation of the system (3.50, 3.51) yields

$$\begin{array}{*{20}c} {s{{\tilde E}_1} = - \alpha {\partial _1}{{\tilde U}_1} + i{k^A}\;\left[ {(1 + \alpha){{\tilde U}_A} + \alpha (2 + \alpha){{\tilde W}_{A1}}} \right]\;,\quad \quad \quad \quad \quad \quad \;} \\ {s{{\tilde E}_A} = - {\partial _1}{{\tilde U}_A} - i{k^B}\;\left[ {{{\tilde \bar W}_{BA}} - (1 + \alpha){{\tilde \bar W}_{AB}}} \right] - \alpha i{k_A}\left[ {\alpha \tilde Z + (1 + \alpha){{\tilde U}_1}} \right]\;,} \\ {s{{\tilde U}_1} = - {1 \over \alpha}\;\left[ {{\partial _1}{{\tilde E}_1} + (1 + \alpha)i{k^A}{{\tilde E}_A}} \right]\;,\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;} \\ {s{{\tilde U}_A} = - {\partial _1}{{\tilde E}_A} + (1 + \alpha)i{k_A}{{\tilde E}_1},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {s\tilde Z = {{3 + 2\alpha} \over {2\alpha}}i{k^A}{{\tilde E}_A},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\,} \\ {s{{\tilde W}_{A1}} = - i{k_A}{{\tilde E}_1},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \,} \\ {s{{\tilde \bar W}_{AB}} = - i{k_A}{{\tilde E}_B} + {i \over 2}{\delta _{AB}}{k^C}{{\tilde E}_C},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;\;} \\ \end{array}$$

where we have used β = 1/α since μ = 1. The last three equations are purely algebraic and can be used to eliminate the zero speed fields \(\tilde Z,{{\tilde W}_{A1}}\) and \({{\tilde \bar W}_{AB}}\) from the remaining equations. The result is the ordinary differential system

$$\begin{array}{*{20}c} {{\partial _1}{{\tilde E}_1} = - \alpha s{{\tilde U}_1} - (1 + \alpha)i{k^A}{{\tilde E}_A},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;\,} \\ {{\partial _1}{{\tilde U}_1} = - \;\left[ {{s \over \alpha} - (2 + \alpha){{\vert k{\vert ^2}} \over s}} \right]\;{{\tilde E}_1} + {{1 + \alpha} \over \alpha}i{k^A}{{\tilde U}_A},\quad \quad \quad \quad \quad \;} \\ {{\partial _1}{{\tilde E}_A} = - s{{\tilde U}_A} + (1 + \alpha)i{k_A}{{\tilde E}_1},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;} \\ {{\partial _1}{{\tilde U}_A} = - \;\left[ {s + {{\vert k{\vert ^2}} \over s}} \right]\;{{\tilde E}_A} + {{(1 + \alpha)}^2}{{{k_A}{k^B}} \over s}{{\tilde E}_B} - \alpha (1 + \alpha)i{k_A}{{\tilde U}_1}.} \\ \end{array}$$

In order to diagonalize this system, we decompose ẼA and ŨA into their components parallel and orthogonal to k; if \(\hat k: = k/\vert k\vert\) and \({\hat l}\) form an orthonormal basis of the boundary x1 = 0,Footnote 20 then these are defined as

$${\tilde E_{\Vert}}: = {\hat k^A}{\tilde E_A},\qquad {\tilde E_ \bot}: = {\hat l^A}{\tilde E_A},\qquad {\tilde U_{\Vert}}: = {\hat k^A}{\tilde U_A},\qquad {\tilde U_ \bot}: = {\hat l^A}{\tilde U_A}.$$
(5.41)

Then, the system decouples into two blocks, one comprising the transverse quantities \(({{\tilde E}_ \bot},{{\tilde U}_ \bot})\) and the other the quantities \(({{\tilde E}_1},{{\tilde U}_1},{{\tilde E}_{\Vert}},{{\tilde U}_{\Vert}})\). The first block gives

$${\partial _1}\left({\begin{array}{*{20}c} {{{\tilde E}_ \bot}} \\ {{{\tilde U}_ \bot}} \\ \end{array}} \right) = \left({\begin{array}{*{20}c} 0 & {- s} \\ {- \left[ {s + {{\vert k{\vert ^2}} \over s}} \right]} & 0 \\ \end{array}} \right)\;\,\left({\begin{array}{*{20}c} {{{\tilde E}_ \bot}} \\ {{{\tilde U}_ \bot}} \\ \end{array}} \right)\;,$$
(5.42)

and the corresponding solutions with exponential decay at x1 → ∞ have the form

$$\left({\begin{array}{*{20}c} {{{\tilde E}_ \bot}(s,{x_1},k)} \\ {{{\tilde U}_ \bot}(s,{x_1},k)} \\ \end{array}} \right) = {\sigma _0}{e^{- \lambda {x_1}}}\left({\begin{array}{*{20}c} s \\ \lambda \\ \end{array}} \right),$$
(5.43)

where σ0 is a complex constant, and where we have defined \(\lambda := \sqrt {{s^2} + \vert k{\vert ^2}}\) with the root chosen such that Re(λ) > 0 for Re(s) > 0. The second block is

$${\partial _1}\left({\begin{array}{*{20}c} {{{\tilde E}_1}} \\ {{{\tilde U}_1}} \\ {{{\tilde E}_{\Vert}}} \\ {{{\tilde U}_{\Vert}}} \\ \end{array}} \right) = \left({\begin{array}{*{20}c} 0 & {- \alpha s} & {- i(1 + \alpha)\vert k\vert} & 0 \\ {- {s \over \alpha} + (2 + \alpha){{\vert k{\vert ^2}} \over s}} & 0 & 0 & {i{{1 + \alpha} \over \alpha}\vert k\vert} \\ {i(1 + \alpha)\vert k\vert} & 0 & 0 & {- s} \\ 0 & {- i\alpha (1 + \alpha)\vert k\vert} & {- s + \alpha (2 + \alpha){{\vert k{\vert ^2}} \over s}} & 0 \\ \end{array}} \right)\;\,\left({\begin{array}{*{20}c} {{{\tilde E}_1}} \\ {{{\tilde U}_1}} \\ {{{\tilde E}_{\Vert}}} \\ {{{\tilde U}_{\Vert}}} \\ \end{array}} \right)\;\,,$$
(5.44)

with corresponding decaying solutions

$$\left({\begin{array}{*{20}c} {{{\tilde E}_1}(s,{x_1},k)} \\ {{{\tilde U}_1}(s,{x_1},k)} \\ {{{\tilde E}_{\Vert}}(s,{x_1},k)} \\ {{{\tilde U}_{\Vert}}(s,{x_1},k)} \\ \end{array}} \right) = {\sigma _1}{e^{- \lambda {x_1}}}\left({\begin{array}{*{20}c} {i\vert k\vert s} \\ {- i\vert k\vert \lambda} \\ {s\lambda} \\ {{s^2} - \alpha \vert k{\vert ^2}} \\ \end{array}} \right) + {\sigma _2}{e^{- \lambda {x_1}}}\left({\begin{array}{*{20}c} {is\lambda} \\ {i({s^2}/\alpha - \vert k{\vert ^2})} \\ {\vert k\vert s} \\ {- \alpha \vert k\vert \lambda} \\ \end{array}} \right)\;\,,$$
(5.45)

with complex constants σ1 and σ2.

On the other hand, Laplace-Fourier transformation of the boundary conditions (5.40) leads to

$${\tilde E_1} + \alpha {\tilde U_1} = 0,\quad {\tilde E_A} + {\tilde U_A} = 0,\qquad {x_1} = 0.$$
(5.46)

Inserting the solutions (5.43, 5.45) into these conditions gives

$$(s + \lambda){\sigma _0} = 0$$
(5.47)

and

$$\left({\begin{array}{*{20}c} {\vert k\vert (s - \alpha \lambda)} & {s\lambda + {s^2} - \alpha \vert k{\vert ^2}} \\ {s\lambda + {s^2} - \alpha \vert k{\vert ^2}} & {\vert k\vert (s - \alpha \lambda)} \\ \end{array}} \right)\;\,\left({\begin{array}{*{20}c} {{\sigma _1}} \\ {{\sigma _2}} \\ \end{array}} \right) = 0.$$
(5.48)

In the first case, since Re(s + λ) ≥ Re(s) > 0, we obtain σ0 = 0 and there are no simple wave solutions in the transverse sector. In the second case, the determinant of the system is

$$- {s^2}\left[ {{{(s + \lambda)}^2} - {{(1 + \alpha)}^2}\vert k{\vert ^2}} \right]\;\,,$$
(5.49)

which is different from zero if and only if \(z + \sqrt {{z^2} + 1} \neq \pm (1 + \alpha)\) for all Re(z) > 0, where z:= s/∣k∣. Since α is real, this is the case if and only if −2 ≤ α ≤ 0; see Figure 1.

We conclude that the strongly hyperbolic evolution system (3.50, 3.51) with αβ = 1 and incoming normal characteristic fields set to zero at the boundary does not give rise to a well-posed IBVP when α > 0 or α < −2. Note that this ill-posed range excludes the values −3/2 < α < 0 for which the system is symmetric hyperbolic; that case is covered by the results in Section 5.2, which utilize energy estimates and show that symmetric hyperbolic problems with zero incoming normal characteristic fields are well posed.
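The determinant leading to Eq. (5.49) is straightforward but tedious to compute by hand; a short symbolic check (a sketch using sympy, with symbol names ours) confirms the factorization:

```python
import sympy as sp

s, k, alpha = sp.symbols('s k alpha')
lam = sp.sqrt(s**2 + k**2)

# boundary system (5.48) in the (sigma_1, sigma_2) sector
A = sp.Matrix([[k*(s - alpha*lam), s*lam + s**2 - alpha*k**2],
               [s*lam + s**2 - alpha*k**2, k*(s - alpha*lam)]])

claimed = -s**2*((s + lam)**2 - (1 + alpha)**2*k**2)   # Eq. (5.49)
print(sp.simplify(sp.expand(A.det() - claimed)))       # -> 0
```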

5.1.2 Sufficient conditions for well-posedness and boundary stability

Next, let us discuss sufficient conditions for the linear, constant coefficient IBVP (5.6, 5.7, 5.8) to be well posed. For this, we first transform the problem to trivial initial data by replacing u(t, x) with u(t, x) − e−tf(x). Then, we obtain the IBVP

$${u_t} = \sum\limits_{j = 1}^n {{A^j}} {\partial \over {\partial {x^j}}}u + F(t,x),\qquad x \in \Sigma ,\quad t \geq 0,$$
(5.50)
$$u(0,x) = 0,\quad \quad x \in \Sigma ,$$
(5.51)
$$bu = g(t,x),\quad \quad x \in \partial \Sigma ,\quad t \geq 0,$$
(5.52)

with \(F(t,x) = {F_0}(t,x) + {e^{- t}}[f(x) + \sum\limits_{j = 1}^n {{A^j}} {\partial \over {\partial {x^j}}}f(x)]\) and g(t, x) replaced by g(t, x) − e−tbf(x). By applying the Laplace-Fourier transformation to it, one obtains the boundary-value problem (5.12, 5.13), which could be solved explicitly, provided the Lopatinsky condition holds. However, in view of the generalization to variable coefficients, one would like to have a method that does not rely on the explicit representation of the solution in Fourier space.

In order to formulate the next definition, let \(\Omega := [0,\infty) \times \bar \Sigma\) be the bulk and \({\mathcal T}: = [0,\infty) \times \partial \Sigma\) the boundary surface, and introduce the associated norms ∥ · ∥η,0,Ω and \(\Vert \cdot \Vert_{\eta, 0, {\mathcal T}}\) defined by

$$\begin{array}{*{20}c} {\Vert u\Vert _{\eta ,0,\Omega}^2\;: = \int\limits_\Omega {{e^{- 2\eta t}}} \vert u(t,{x_1},y){\vert ^2}dt\,d{x_1}{d^{n - 1}}y = \int\limits_{{{\mathbb R}^{n + 1}}} \vert {u_\eta}(t,x){\vert ^2}dt\,{d^n}x,} \\ {\Vert u\Vert _{\eta ,0,{\mathcal T}}^2: = \int\limits_{\mathcal T} {{e^{- 2\eta t}}} \vert u(t,0,y){\vert ^2}dt\,{d^{n - 1}}y = \int\limits_{{{\mathbb R}^n}} \vert {u_\eta}(t,0,y){\vert ^2}dt\,{d^{n - 1}}y,\;\,} \\ \end{array}$$

where we have used the definition of uη as in Eq. (5.10). Using Parseval’s identities we may also rewrite these norms as

$$\Vert u\Vert _{\eta ,0,\Omega}^2 = \int\limits_{\mathbb R} {\left[ {\int\limits_0^\infty {\left({\;\int\limits_{{{\mathbb R}^{n - 1}}} \vert \tilde u(\eta + i\xi ,{x_1},k){\vert ^2}{d^{n - 1}}k} \right)\;} \,d{x_1}} \right]\;\,} d\xi ,$$
(5.53)
$$\Vert u\Vert _{\eta ,0,{\mathcal T}}^2 = \int\limits_{\mathbb R} {\left({\;\int\limits_{{{\mathbb R}^{n - 1}}} \vert \tilde u(\eta + i\xi ,0,k){\vert ^2}{d^{n - 1}}k} \right)} \;\,d\xi .$$
(5.54)

The relevant concept of well-posedness is the following one.

Definition 6. [258] The IBVP ( 5.50 , 5.51 , 5.52 ) is called strongly well posed in the generalized sense if there is a constant K > 0 such that each compatible data \(F \in C_0^\infty (\Omega)\) and \(g \in C_0^\infty ({\mathcal T})\) gives rise to a unique solution u satisfying the estimate

$$\eta \,\Vert u\Vert _{\eta ,0,\Omega}^2 + \Vert u\Vert _{\eta ,0,{\mathcal T}}^2\; \leq {K^2}\left({{1 \over \eta}\Vert F\Vert _{\eta ,0,\Omega}^2 + \Vert g\Vert _{\eta ,0,{\mathcal T}}^2} \right)\;,$$
(5.55)

for all η > 0.

The inequality (5.55) implies that both the bulk norm \(\Vert u \Vert_{\eta, 0, \Omega}\) and the boundary norm \(\Vert u \Vert_{\eta, 0, {\mathcal T}}\) of u are bounded by the corresponding norms of F and g. For a trivial source term, F = 0, the inequality (5.55) implies, in particular,

$$\Vert u\Vert _{\eta ,0,{\mathcal T}} \;\leq K\Vert g\Vert _{\eta ,0,{\mathcal T}},\qquad \eta > 0,$$
(5.56)

which is an estimate for the solution at the boundary in terms of the norm of the boundary data g. In view of Eq. (5.54) this is equivalent to the following requirement.

Definition 7. [259, 267] The boundary problem ( 5.50 , 5.51 , 5.52 ) is called boundary stable if there is a constant K > 0 such that all solutions ũ(s, ·, k) ∈ L2(0, ∞) of Eqs. ( 5.12 , 5.13 ) with \(\tilde F = 0\) satisfy

$$\vert \tilde u(s,0,k)\vert \; \leq K\vert \tilde g(s,k)\vert$$
(5.57)

for all Re(s) > 0 and k ∈ ℝn−1.

Since boundary stability only requires considering solutions for trivial source terms, F = 0, it is a much simpler condition than Eq. (5.55). Clearly, strong well-posedness in the generalized sense implies boundary stability. The main result is that, modulo technical assumptions, the converse is also true: boundary stability implies strong well-posedness in the generalized sense.

Theorem 5. [258, 340] Consider the linear, constant coefficient IBVP ( 5.50 , 5.51 , 5.52 ) on the half space Σ = {(x1, x2, …, xn) ∈ ℝn: x1 > 0}. Assume that equation (5.50) is strictly hyperbolic, meaning that the eigenvalues of the principal symbol P0(ik) are distinct for all k ∈ Sn−1. Assume that the boundary matrix A = A1 is invertible. Then, the problem is strongly well posed in the generalized sense if and only if it is boundary stable.

Maybe the importance of Theorem 5 is not so much its statement, which concerns only the linear, constant coefficient case for which the solutions can also be constructed explicitly, but rather the method for its proof, which is based on the construction of a smooth symmetrizer symbol, and which is amenable to generalizations to the variable coefficient case using pseudo-differential operators.

In order to formulate the result of this construction, define \(\rho := \sqrt {\vert s{\vert ^2} + \vert k{\vert ^2}}, {s{\prime}}: = s/\rho, {k{\prime}}: = k/\rho\), such that \(({s{\prime}},{k{\prime}}) \in S_ + ^n\) lies on the half sphere \(S_ + ^n: = \{({s{\prime}},{k{\prime}}) \in {\mathbb C} \times {{\mathbb R}^{n - 1}}:\vert {s{\prime}}{\vert ^2} + \vert {k{\prime}}{\vert ^2} = 1,{\rm Re} ({s{\prime}}) > 0\}\) for Re(s) > 0 and k ∈ ℝn−1. Then, we have,

Theorem 6. [258] Consider the linear, constant coefficient IBVP ( 5.50 , 5.51 , 5.52 ) on the half space Σ. Assume that equation (5.50) is strictly hyperbolic, that the boundary matrix A = A1 is invertible, and that the problem is boundary stable. Then, there exists a family of complex m × m matrices \(H({s{\prime}},{k{\prime}}),({s{\prime}},{k{\prime}}) \in S_ + ^n\), whose coefficients belong to the class \({C^\infty}(S_ + ^n)\), with the following properties:

  1. (i)

    H(s′, k′) = H(s′, k′)* is Hermitian.

  2. (ii)

    H(s′, k′)M(s′, k′) + M(s′, k′)*H(s′, k′) ≥ 2Re(s′)I for all \(({s{\prime}},{k{\prime}}) \in S_ + ^n\).

  3. (iii)

    There is a constant C > 0 such that

    $${\tilde u^{\ast}}H(s\prime ,k\prime)\tilde u + C\vert b\tilde u{\vert ^2}\; \geq \;\vert \tilde u{\vert ^2}$$
    (5.58)

    for all ũ ∈ ℂm and all \(({s{\prime}},{k{\prime}}) \in S_ + ^n\).

Furthermore, H can be chosen to be a smooth function of the matrix coefficients of Aj and b.

Let us show how the existence of the symmetrizer H(s′, k′) implies the estimate (5.55). First, using Eq. (5.14) and properties (i) and (ii) we have

$$\begin{array}{*{20}c} {{\partial \over {\partial {x_1}}}\;\left[ {{{\tilde u}^{\ast}}H(s\prime ,k\prime)\tilde u} \right] = {{\left({{{\partial \tilde u} \over {\partial {x_1}}}} \right)}^{\ast}}H(s\prime ,k\prime)\tilde u + {{\tilde u}^{\ast}}H(s\prime ,k\prime){{\partial \tilde u} \over {\partial {x_1}}}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;} \\ {= \rho {{\tilde u}^{\ast}}\;\left[ {H(s\prime ,k\prime)M(s\prime ,k\prime) + M{{(s\prime ,k\prime)}^{\ast}}H(s\prime ,k\prime)} \right]\tilde u + 2{\rm{Re}}\left({{{\tilde u}^{\ast}}H(s\prime ,k\prime){A^{- 1}}\tilde F} \right)} \\ {\geq 2{\rm{Re}}(s)\vert \tilde u{\vert ^2} - {C_1}\vert \tilde u{\vert ^2} - {1 \over {{C_1}}}\vert H(s\prime ,k\prime){A^{- 1}}\tilde F{\vert ^2},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ \end{array}$$

where we have used the fact that M(s, k) = ρM(s′, k′) in the second step, and the inequality \(2{{\rm Re}} ({a^\ast}b) \leq 2\vert a\vert \vert b\vert \leq {C_1}\vert a{\vert ^2} + C_1^{- 1}\vert b{\vert ^2}\) for complex numbers a and b and any positive constant C1 > 0 in the third step. Integrating both sides from x1 = 0 to ∞ and choosing C1 = Re(s), we obtain, using (iii),

$$\begin{array}{*{20}c} {{\rm{Re}}(s)\int\limits_0^\infty \vert \tilde u{\vert ^2}d{x_1} \leq - {{\left[ {{{\tilde u}^{\ast}}H(s\prime ,k\prime)\tilde u} \right]}_{{x_1} = 0}} + {1 \over {{\rm{Re}}(s)}}\int\limits_0^\infty \vert H{A^{- 1}}\tilde F{\vert ^2}d{x_1}\quad \quad \quad \quad \quad \;\;} \\ {\leq - {{\left. {\vert \tilde u{\vert ^2}} \right\vert}_{{x_1} = 0}} + C\vert \tilde g{\vert ^2} + {1 \over {{\rm{Re}}(s)}}\int\limits_0^\infty \vert H{A^{- 1}}\tilde F{\vert ^2}d{x_1}.} \\ \end{array}$$
(5.59)

Since H is bounded, there exists a constant C2 > 0 such that \(\vert H{A^{- 1}}\tilde F\vert \leq {C_2}\vert \tilde F\vert\) for all \(({s{\prime}},{k{\prime}}) \in S_ + ^n\). Integrating over ξ = Im(s) ∈ ℝ and k ∈ ℝn−1 and using Parseval’s identity, we obtain from this

$$\eta \,\Vert u\Vert _{\eta ,0,\Omega}^2 + \Vert u\Vert _{\eta ,0,{\mathcal T}}^2 \leq {{C_2^2} \over \eta}\Vert F\Vert _{\eta ,0,\Omega}^2 + C\Vert g\Vert _{\eta ,0,{\mathcal T}}^2,$$
(5.60)

and the estimate (5.55) follows with \({K^2}: = \max \{C_2^2,C\}\).

Example 27. Let us go back to Example 25 of the 2D Dirac equation on the halfspace with boundary condition (5.34) at x = 0. The solution of Eqs. (5.35, 5.36) at the boundary is given by ũ(s, 0, k) = σ(ik, s +λ)T, where \(\lambda = \sqrt {{s^2} + {k^2}}\), and

$$\sigma = {{\tilde g(s,k)} \over {ika + (s + \lambda)b}}.$$
(5.61)

Therefore, the IBVP is boundary stable if and only if there exists a constant K > 0 such that

$${{\sqrt {{k^2} + \vert s + \lambda {\vert ^2}}} \over {\vert ika + (s + \lambda)b\vert}} \leq K$$
(5.62)

for all Re(s) > 0 and k ∈ ℝ. We may assume b ≠ 0, otherwise the Lopatinsky condition is violated. For k = 0 the left-hand side is 1/∣b∣. For k ≠ 0 we can rewrite the condition as

$${1 \over {\vert b\vert}}{{\sqrt {1 + \vert \psi (z){\vert ^2}}} \over {\vert \psi (z) \pm i{a \over b}\vert}} \leq K,$$
(5.63)

for all Re(z) > 0, where \(\psi (z): = z + \sqrt {{z^2} + 1}\) and z:= s/∣k∣. This is satisfied if and only if the function \(\vert \psi (z) \pm i{a \over b}\vert\) is bounded away from zero, which is the case if and only if ∣a/b∣ < 1; see Figure 1.

This, together with the results obtained in Example 25, yields the following conclusions: the IBVP (5.32, 5.33, 5.34) gives rise to an ill-posed problem if b = 0 or if ∣a/b∣ > 1 and a/b ∉ ℝ, and to a problem, which is strongly well posed in the generalized sense, if b ≠ 0 and ∣a/b∣ < 1. The case ∣a∣ = ∣b∣ ≠ 0 is covered by the energy method discussed in Section 5.2. For the case ∣a/b∣ > 1 with a/b ∈ ℝ, see Section 10.5 in [228].
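The dichotomy between ∣a/b∣ < 1 and the marginal cases can also be seen numerically by sampling the ratio in Eq. (5.63); in the sketch below (Python; the grid ranges are illustrative), the supremum stays bounded in the boundary-stable case and grows without bound under grid refinement when ∣a/b∣ = 1:

```python
import numpy as np

def kreiss_sup(a, b, n=500):
    """Estimate the sup of the ratio (5.63) over a grid with Re(z) > 0,
    taking the worse of the two signs in |psi(z) * b +- i a|."""
    z = (np.linspace(1e-4, 10.0, n)[:, None]
         + 1j*np.linspace(-10.0, 10.0, 2*n)[None, :])
    psi = z + np.sqrt(z**2 + 1)   # principal branch, Re(psi) > 0 here
    denom = np.minimum(np.abs(psi*b + 1j*a), np.abs(psi*b - 1j*a))
    return (np.sqrt(1 + np.abs(psi)**2)/denom).max()

print(kreiss_sup(0.0, 1.0))  # |a/b| < 1: bounded, approaches sqrt(2)
print(kreiss_sup(1.0, 1.0))  # |a/b| = 1: increases as n is increased
```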

Before discussing second-order systems, let us make a few remarks concerning Theorem 5:

  • The boundary stability condition (5.57) is often called the Kreiss condition. Provided the eigenvalues of the matrix M(s, k) are suitably normalized, it can be shown [258, 228, 241] that the determinant det(b−(s, k)) in Eq. (5.27) can be extended to a continuous function defined for all Re(s) ≥ 0 and k ∈ ℝn−1, and condition (5.57) can be restated as the following algebraic condition:

    $$\det ({b_ -}(s,k)) \neq 0$$
    (5.64)

    for all Re(s) ≥ 0 and k ∈ ℝn−1. This is a strengthened version of the Lopatinsky condition, since it requires the determinant to be different from zero also for s on the imaginary axis.

  • As anticipated above, the importance of the symmetrizer construction in Theorem 6 relies on the fact that, based on the theory of pseudo-differential operators, it can be used to treat the linear, variable coefficient IBVP [258]. Therefore, the localization principle holds: if all the frozen coefficient IBVPs are boundary stable and satisfy the assumptions of Theorem 5, then the variable coefficient problem is strongly well posed in the generalized sense.

  • If the problem is boundary stable, it is also possible to estimate higher-order derivatives of the solutions. For example, if we multiply both sides of the inequality (5.59) by ∣k∣2, integrate over ξ = Im(s) and k and use Parseval’s identity as before, we obtain the estimate (5.55) with u, F and g replaced by their tangential derivatives uy, Fy and gy, respectively. Similarly, one obtains the estimate (5.55) with u, F and g replaced by their time derivatives ut, Ft and gt if we multiply both sides of the inequality (5.59) by ∣s∣2 and assume that ut(0, x) = 0 for all x ∈ Σ.Footnote 21 Then, a similar estimate follows for the partial derivative ∂1u in the x1-direction, using the evolution equation (5.6) and the fact that the boundary matrix A1 is invertible. Estimates for higher-order derivatives of u follow by an analogous process.

  • Theorem 5 assumes that the initial data f is trivial, which is not an important restriction since one can always achieve f = 0 by transforming the source term F and the boundary data g, as described below Eq. (5.52). Since the transformed F involves derivatives of f, this means that derivatives of f would appear on the right-hand side of the inequality (5.55), and at first sight it looks like one “loses a derivative” in the sense that one needs to control the derivatives of f to one degree higher than the ones of u. However, the results in [341, 342] improve the statement of Theorem 5 by allowing nontrivial initial data and by showing that the same hypotheses lead to a stronger concept of well-posedness (strong well-posedness, defined below in Definition 9 as opposed to strong well-posedness in the generalized sense).

  • The results mentioned so far assume strict hyperbolicity and an invertible boundary matrix, which are too-restrictive conditions for many applications. Unfortunately, there does not seem to exist a general theory, which removes these two assumptions. Partial results include [5], which treats strongly hyperbolic problems with an invertible boundary matrix that are not necessarily strictly hyperbolic, and [293], which discusses symmetric hyperbolic problems with a singular boundary matrix.

5.1.3 Second-order systems

It has been shown in [267] that certain systems of wave equations can be reformulated in such a way that they satisfy the hypotheses of Theorem 6. In order to illustrate this, we consider the IBVP for the wave equation on the half-space Σ := {(x1, x2, …, xn) ∈ ℝn: x1 > 0}, n ≥ 1,

$${v_{tt}} = \Delta v + F(t,x),\;\;x \in \Sigma ,\quad t \geq 0,$$
(5.65)
$$v(0,x) = 0,\quad {v_t}(0,x) = 0,\;\;x \in \Sigma ,$$
(5.66)
$$Lv = g(t,x),\;\;x \in \partial \Sigma ,\quad t \geq 0,$$
(5.67)

where \(F \in C_0^\infty ([0,\infty) \times \Sigma)\) and \(g \in C_0^\infty ([0,\infty) \times \partial \Sigma)\), and where L is a first-order linear differential operator of the form

$$L: = a{\partial \over {\partial t}} - b{\partial \over {\partial {x_1}}} - \sum\limits_{j = 2}^n {{c_j}} {\partial \over {\partial {x_j}}},$$
(5.68)

where a, b, c2, …, cn are real constants. We ask under which conditions on these constants the IBVP (5.65, 5.66, 5.67) is strongly well posed in the generalized sense. Since we are dealing with a second-order system, the estimate (5.55) in Definition 6 has to be replaced with

$$\eta \Vert v\Vert _{\eta ,1,\Omega}^2 + \Vert v\Vert _{\eta ,1,{\mathcal T}}^2 \leq {K^2}\left({{1 \over \eta}\Vert F\Vert _{\eta ,0,\Omega}^2 + \Vert g\Vert _{\eta ,0,{\mathcal T}}^2} \right)\;,$$
(5.69)

where the norms \(\Vert \cdot \Vert _{\eta, 1, \Omega}^2\) and \(\Vert \cdot \Vert _{\eta, 1, {\mathcal T}}^2\) control the first partial derivatives of v,

$$\begin{array}{*{20}c} {\Vert v\Vert _{\eta ,1,\Omega}^2: = \int\limits_\Omega {{e^{- 2\eta t}}} \sum\limits_{\mu = 0}^n {{{\left\vert {{{\partial v} \over {\partial {x^\mu}}}(t,{x_1},y)} \right\vert}^2}} \,dt\,d{x_1}{d^{n - 1}}y,} \\ {\Vert v\Vert _{\eta ,1,{\mathcal T}}^2: = \int\limits_{\mathcal T} {{e^{- 2\eta t}}} \sum\limits_{\mu = 0}^n {{{\left\vert {{{\partial v} \over {\partial {x^\mu}}}(t,0,y)} \right\vert}^2}} dt\,{d^{n - 1}}y,\quad \;\;\;} \\ \end{array}$$

with (xμ) = (t, x1, x2, …, xn). Likewise, the inequality (5.57) in the definition of boundary stability needs to be replaced by

$$\vert \tilde u(s,0,k)\vert \; \leq K{{\vert \tilde g(s,k)\vert} \over {\sqrt {\vert s{\vert ^2} + \vert k{\vert ^2}}}}.$$
(5.70)

Laplace-Fourier transformation of Eqs. (5.65, 5.67) leads to the second-order differential problem

$${{{\partial ^2}} \over {\partial x_1^2}}\tilde v = ({s^2} + \vert k{\vert ^2})\tilde v - \tilde F,\,\,{x_1} > 0,$$
(5.71)
$$b{\partial \over {\partial {x_1}}}\tilde v = (as - ic(k))\tilde v - \tilde g,\,\,{x_1} = 0,$$
(5.72)

where we have defined \(c(k): = \sum\limits_{j = 2}^n {{c_j}{k_j}}\) and where \({\tilde F}\) and \({\tilde g}\) denote the Laplace-Fourier transformations of F and g, respectively. In order to apply the theory described in Section 5.1.2, we rewrite this system in first-order pseudo-differential form. Defining

$$\tilde u: = \left({\begin{array}{*{20}c} {\rho \tilde v} \\ {{{\partial \tilde v} \over {\partial {x_1}}}} \\ \end{array}} \right),\qquad \tilde f: = - \left({\begin{array}{*{20}c} 0 \\ {\tilde F} \\ \end{array}} \right),$$
(5.73)

where \(\rho := \sqrt {\vert s{\vert ^2} + \vert k{\vert ^2}}\), we find

$${\partial \over {\partial {x_1}}}\tilde u = M(s,k)\tilde u + \tilde f,\,\,{x_1} > 0,$$
(5.74)
$$L(s,k)\tilde u = \tilde g,\,\,{x_1} = 0,$$
(5.75)

where we have defined

$$M(s,\,k): = \rho \left({\begin{array}{*{20}c} 0 & 1 \\ {{{s\prime}^2} + \vert k\prime {\vert ^2}} & 0 \\ \end{array}} \right),\qquad L(s,k): = \left({as\prime - ic(k\prime), - b} \right),$$
(5.76)

with s′ := s/ρ, k′ := k/ρ. This system has the same form as the one described by Eqs. (5.14, 5.13), and the eigenvalues of the matrix M(s, k) are distinct for Re(s) > 0 and k ∈ ℝn−1. Therefore, we can construct a symmetrizer H(s′, k′) according to Theorem 6 provided that the problem is boundary stable. In order to check boundary stability, we diagonalize M(s, k) and consider the solution of Eq. (5.74) for \(\tilde f = 0\), which decays exponentially as x1 → ∞,

$$\tilde u(s,{x_1},k) = {\sigma _ -}{e^{- \lambda {x_1}}}\left({\begin{array}{*{20}c} \rho \\ {- \lambda} \\ \end{array}} \right),$$
(5.77)

where σ− is a complex constant and \(\lambda := \sqrt {{s^2} + \vert k{\vert ^2}}\) with the root chosen such that Re(λ) > 0 for Re(s) > 0. Inserting this into the boundary condition (5.75) gives

$$\left[ {as\prime + b\lambda \prime - ic(k\prime)} \right]{\sigma _ -} = {{\tilde g} \over \rho},$$
(5.78)

and the system is boundary stable if and only if the expression inside the square brackets is different from zero for all Re(s′) ≥ 0 and k′ ∈ ℝn−1 with ∣s′∣2 + ∣k′∣2 = 1. In the one-dimensional case, n = 1, this condition reduces to (a + b)s′ ≠ 0 with ∣s′∣ = 1, and the system is boundary stable if and only if a + b ≠ 0; that is, if and only if the boundary vector field L is not proportional to the ingoing null vector at the boundary surface,

$${\partial \over {\partial t}} + {\partial \over {\partial {x_1}}}.$$
(5.79)

Indeed, if a + b = 0, then Lv = a(vt + vx1) is proportional to the outgoing characteristic field, for which it is not permitted to specify boundary data, since it is completely determined by the initial data.

When n ≥ 2, it follows that b must be different from zero, since otherwise the square brackets are zero for purely imaginary s′ satisfying as′ = ic(k′). Therefore, one can choose b = 1 without loss of generality. It can then be shown that the system is boundary stable if and only if a > 0 and \(\sum\limits_{j = 2}^n {\vert{c_j}{\vert^2} < {a^2}}\) (see [267]), which is equivalent to the condition that the boundary vector field L is pointing outside the domain, and that its orthogonal projection onto the boundary surface \({\mathcal T}\),

$$T: = a{\partial \over {\partial t}} - \sum\limits_{j = 2}^n {{c_j}} {\partial \over {\partial {x_j}}},$$
(5.80)

is future-directed time-like. This includes as a particular case the “Sommerfeld” boundary condition vt − vx1 = 0, for which L is the null vector obtained from the sum of the time evolution vector field ∂t and the normal derivative \(N = - {\partial _{{x_1}}}\). While N is uniquely determined by the boundary surface \({\mathcal T}\), ∂t is not unique, since one can transform it to an arbitrary future-directed time-like vector field T, which is tangent to \({\mathcal T}\), by means of an appropriate Lorentz transformation. Since the wave equation is Lorentz-invariant, it is clear that the new boundary vector field \(\hat L = T + N\) must also give rise to a well-posed IBVP, which explains why there is so much freedom in the choice of L.
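These statements can be probed by scanning the symbol in Eq. (5.78) directly. The sketch below (Python; it takes n = 2, so that c(k′) = ck′ with a single real constant c, and sets b = 1) samples ∣as′ + bλ′ − ic(k′)∣ over the half sphere with Re(s′) bounded away from zero; boundary stability requires the minimum to stay bounded away from zero as the margin and the grid spacing shrink:

```python
import numpy as np

def min_symbol(a, b, c, n=400, eps=1e-3):
    """Min of |a s' + b lam' - i c k'| over |s'|^2 + k'^2 = 1 with Re(s') > 0
    (n = 2 case: a single tangential wave number k' of either sign)."""
    r = np.linspace(eps, 1.0, n)[:, None]                  # r = |s'|
    ph = np.linspace(-np.pi/2 + eps, np.pi/2 - eps, n)     # arg(s')
    sprime = r*np.exp(1j*ph[None, :])
    best = np.inf
    for sign in (1.0, -1.0):
        kprime = sign*np.sqrt(1.0 - r**2)
        lam = np.sqrt(sprime**2 + kprime**2)               # principal branch
        best = min(best, np.abs(a*sprime + b*lam - 1j*c*kprime).min())
    return best

print(min_symbol(1.0, 1.0, 0.5))  # c^2 < a^2: bounded away from zero
print(min_symbol(1.0, 1.0, 2.0))  # c^2 > a^2: tends to zero under refinement
```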

For a more geometric derivation of these results, based on estimates derived from the stress-energy tensor associated with the scalar field v, which shows that the above construction for L is sufficient for strong well-posedness, see Appendix B in [263]. For a generalization to the shifted wave equation, see [369].

As pointed out in [267], the advantage of obtaining a strong well-posedness estimate (5.69) for the scalar-wave problem is the fact that it allows the treatment of systems of wave equations where the boundary conditions can be coupled in a certain way through terms involving first derivatives of the fields. In order to illustrate this with a simple example, consider a system of two wave equations,

$${\left({\begin{array}{*{20}c} {{v_1}} \\ {{v_2}} \\ \end{array}} \right)_{tt}} = \Delta \left({\begin{array}{*{20}c} {{v_1}} \\ {{v_2}} \\ \end{array}} \right) + \left({\begin{array}{*{20}c} {{F_1}(t,x)} \\ {{F_2}(t,x)} \\ \end{array}} \right),\qquad x \in \Sigma ,\quad t \geq 0,$$
(5.81)

which is coupled through the boundary conditions

$$\left({{\partial \over {\partial t}} - {\partial \over {\partial {x_1}}}} \right)\,\,\left({\begin{array}{*{20}c} {{v_1}} \\ {{v_2}} \\ \end{array}} \right) = N\left({\begin{array}{*{20}c} {{v_1}} \\ {{v_2}} \\ \end{array}} \right) + \left({\begin{array}{*{20}c} {{g_1}(t,x)} \\ {{g_2}(t,x)} \\ \end{array}} \right),\qquad x \in \partial \Sigma ,\quad t \geq 0,$$
(5.82)

where N has the form

$$N = \left({\begin{array}{*{20}c} 0 & 0 \\ 0 & X \\ \end{array}} \right),\qquad X = {X^0}{\partial \over {\partial t}} + {X^1}{\partial \over {\partial {x_1}}} + \ldots + {X^n}{\partial \over {\partial {x_n}}},$$
(5.83)

with (X0, X1, …, Xn) ∈ ℝn+1 any vector. Since the wave equation and boundary condition for v1 decouple from those for v2, we can apply the estimate (5.69) to v1, obtaining

$$\eta \,\Vert {v_1}\Vert _{\eta ,1,\Omega}^2 + \Vert {v_1}\Vert _{\eta ,1,{\mathcal T}}^2 \leq {K^2}\left({{1 \over \eta}\Vert {F_1}\Vert _{\eta ,0,\Omega}^2 + \Vert {g_1}\Vert _{\eta ,0,{\mathcal T}}^2} \right).$$
(5.84)

If we set \({g_3}(t,x): = {g_2}(t,x) + X{v_1}(t,x),\;t \geq 0,\;x \in \partial \Sigma\), we have a similar estimate for v2,

$$\eta \,\Vert {v_2}\Vert _{\eta ,1,\Omega}^2 + \Vert {v_2}\Vert _{\eta ,1,{\mathcal T}}^2 \leq {K^2}\left({{1 \over \eta}\Vert {F_2}\Vert _{\eta ,0,\Omega}^2 + \Vert {g_3}\Vert _{\eta ,0,{\mathcal T}}^2} \right).$$
(5.85)

However, since the boundary norm of v1 is controlled by the estimate (5.84), one also controls

$$\Vert {g_3}\Vert _{\eta ,0,{\mathcal T}}^2 \leq 2\Vert {g_2}\Vert _{\eta ,0,{\mathcal T}}^2 + {C^2}\Vert {v_1}\Vert _{\eta ,1,{\mathcal T}}^2 \leq {{{{(CK)}^2}} \over \eta}\Vert {F_1}\Vert _{\eta ,0,\Omega}^2 + {(CK)^2}\Vert {g_1}\Vert _{\eta ,0,{\mathcal T}}^2 + 2\Vert {g_2}\Vert _{\eta ,0,{\mathcal T}}^2$$
(5.86)

with some constant C > 0 depending only on the vector field X. Therefore, the inequalities (5.84,5.85) together yield an estimate of the form (5.69) for v = (v1, v2), F = (F1,F2) and g = (g1, g2), which shows strong well-posedness in the generalized sense for the coupled system. Notice that the key point, which allows the coupling of v1 and v2 through the boundary matrix operator N, is the fact that one controls the boundary norm of v1 in the estimate (5.84). The result can be generalized to larger systems of wave equations, where the matrix operator N is in triangular form with zero on the diagonal, or where it can be brought into this form by an appropriate transformation [267, 264].

Example 28. As an application of the theory for systems of wave equations, which are coupled through the boundary conditions, we discuss Maxwell’s equations in their potential formulation on the half space Σ [267]. In the Lorentz gauge and the absence of sources, this system is described by four wave equations ∂μ∂μAν = 0 for the components (At, Ax, Ay, Az) of the vector potential Aμ, which are subject to the constraint C := ∂μAμ = 0, where we use the Einstein summation convention.

As a consequence of the wave equation for Aν, the constraint variable C also satisfies the wave equation, ∂μ∂μC = 0. Therefore, the constraint is correctly propagated if the initial data is chosen such that C and its first time derivative vanish, and if C is set to zero at the boundary. Setting C = 0 at the boundary amounts to the following condition for Aν at x = 0:

$${{\partial {A_t}} \over {\partial t}} = {{\partial {A_x}} \over {\partial x}} + {{\partial {A_y}} \over {\partial y}} + {{\partial {A_z}} \over {\partial z}},$$
(5.87)

which can be rewritten as

$$\left({{\partial \over {\partial t}} - {\partial \over {\partial x}}} \right)({A_t} + {A_x}) = - \left({{\partial \over {\partial t}} + {\partial \over {\partial x}}} \right)({A_t} - {A_x}) + 2{\partial \over {\partial y}}{A_y} + 2{\partial \over {\partial z}}{A_z}.$$
(5.88)

Together with the boundary conditions

$$\begin{array}{*{20}c} {\left({{\partial \over {\partial t}} - {\partial \over {\partial x}}} \right)({A_t} - {A_x}) = 0,\quad \quad \quad \quad \quad \quad \quad} \\ {\left({{\partial \over {\partial t}} - {\partial \over {\partial x}}} \right){A_y} = {\partial \over {\partial y}}({A_t} - {A_x}),} \\ {\left({{\partial \over {\partial t}} - {\partial \over {\partial x}}} \right){A_z} = {\partial \over {\partial z}}({A_t} - {A_x}),} \\ \end{array}$$

this yields a system of the form of Eq. (5.82) with N having the required triangular form, where v is the four-component vector function v = (At − Ax, Ay, Az, At + Ax). Notice that the Sommerfeld-like boundary conditions on Ay and Az set the gauge-invariant quantities Ey + Bz and Ez − By to zero, where E and B are the electric and magnetic fields, which is compatible with an outgoing plane wave traveling in the normal direction to the boundary.
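The equivalence between the constraint condition (5.87) and the boundary condition (5.88) is a one-line rearrangement; the following sympy sketch (symbol names are ours) verifies that their difference is exactly twice the constraint:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
At, Ax, Ay, Az = [sp.Function(n)(t, x, y, z) for n in ('A_t', 'A_x', 'A_y', 'A_z')]
d = sp.diff

lhs = d(At + Ax, t) - d(At + Ax, x)                     # left side of (5.88)
rhs = -(d(At - Ax, t) + d(At - Ax, x)) + 2*d(Ay, y) + 2*d(Az, z)
constraint = d(At, t) - d(Ax, x) - d(Ay, y) - d(Az, z)  # Eq. (5.87), i.e., C = 0

print(sp.simplify(lhs - rhs - 2*constraint))            # -> 0
```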

For a recent development based on the Laplace method, which allows the treatment of second-order IBVPs with more general classes of boundary conditions, including those admitting boundary phenomena like glancing and surface waves, see [262].

5.2 Maximal dissipative boundary conditions

An alternative technique for specifying boundary conditions, which does not require Laplace-Fourier transformation and the use of pseudo-differential operators when generalizing to variable coefficients, is based on energy estimates. In order to understand this, we go back to Section 3.2.3, where we discussed such estimates for linear, first-order symmetric hyperbolic evolution equations with symmetrizer H(t,x). We obtained the estimate (3.107), bounding the energy \(E({\Sigma _t}) = \int\nolimits_{{\Sigma _t}} {{J^0}(t,x){d^n}x}\) at any time t ∈ [0, T] in terms of the initial energy E(Σ0), provided that the flux integral

$$\int\limits_{\mathcal T} {{e_\mu}} {J^\mu}(t,x)dS,\qquad {J^\mu}(t,x): = - u{(t,x)^{\ast}}H(t,x){A^\mu}(t,x)u(t,x)$$
(5.89)

was nonnegative. Here, the boundary surface is \({\mathcal T} = [0,T] \times \partial \Sigma\), and its unit outward normal eμ = (0, s1, …, sn) is determined by the unit outward normal s to ∂Σ. Therefore, the integral is nonnegative if

$$u{(t,x)^{\ast}}H(t,x){P_0}(t,x,s)u(t,x) \leq 0,\qquad (t,x) \in {\mathcal T},$$
(5.90)

where \({P_0}(t,x,s) = \sum\limits_{j = 1}^n {{A^j}(t,x){s_j}}\) is the principal symbol in the direction of the unit normal s. Hence, the idea is to specify homogeneous boundary conditions, b(t, x)u = 0 at \({\mathcal T}\), such that the condition (5.90) is satisfied.Footnote 22 In this case, one obtains an a priori energy estimate as in Section 3.2.3. Of course, there are many possible choices for b(t, x), which fulfill the condition (5.90); however, an additional requirement is that one should not overdetermine the IBVP. For example, setting all the components of u to zero at the boundary does not lead to a well-posed problem if there are outgoing modes, as discussed in Section 5.1.1 for the constant coefficient case. The correct boundary conditions turn out to impose a minimal set of conditions on u for which the inequality (5.90) holds. In other words, at the boundary surface, u has to be restricted to a space for which Eq. (5.90) holds and which cannot be extended. The precise definition, which captures this idea, is:

Definition 8. Denote for each boundary point \(p = (t,x) \in {\mathcal T}\) the boundary space

$${V_p}: = \{u \in {\mathbb C^m}:b(t,x)u = 0\} \subset {\mathbb C^m}$$
(5.91)

of state vectors satisfying the homogeneous boundary condition. Vp is called maximal nonpositive if

  1. (i)

    u*H(t, x)P0(t, x, s)u ≤ 0 for all u ∈ Vp,

  2. (ii)

    Vp is maximal with respect to condition (i); that is, if Wp ⊇ Vp is a linear subspace of ℂm containing Vp, which satisfies (i), then Wp = Vp.

The boundary condition b(t, x)u = g(t, x) is called maximal dissipative if the associated boundary spaces Vp are maximal nonpositive for all \(p \in {\mathcal T}\).

Maximal dissipative boundary conditions were proposed in [189, 275] in the context of symmetric positive operators, which include symmetric hyperbolic operators as a special case. With such boundary conditions, the IBVP is well posed in the following sense:

Definition 9. Consider the linearized version of the IBVP ( 5.1 , 5.2 , 5.3 ), where the matrix functions Aj(t, x) and b(t, x) and the vector function F(t, x) do not depend on u. It is called well posed if there are constants K = K(T) and ε = ε(T) ≥ 0 such that each compatible data \(f \in C_b^\infty (\Sigma, {{\mathbb C}^m})\) and \(g \in C_b^\infty ([0,T) \times \partial \Sigma, {{\mathbb C}^r})\) gives rise to a unique C∞-solution u satisfying the estimate

$$\Vert u(t,\cdot)\Vert _{{L^2}(\Sigma)}^2 + \varepsilon \int\limits_0^t {\Vert u(s,\cdot)\Vert _{{L^2}(\partial \Sigma)}^2ds} \leq {K^2}\left[ {\Vert f\Vert _{{L^2}(\Sigma)}^2 + \int\limits_0^t {\left({\Vert F(s,\cdot)\Vert _{{L^2}(\Sigma)}^2 + \Vert g(s,\cdot)\Vert _{{L^2}(\partial \Sigma)}^2} \right)} ds} \right],$$
(5.92)

for all t ∈ [0, T]. If, in addition, the constant ε can be chosen strictly positive, the problem is called strongly well posed.

This definition strengthens the corresponding definition in the Laplace analysis, where trivial initial data was assumed and only a time-integral of the L2(Σ)-norm of the solution could be estimated (see Definition 6). The main result of the theory of maximal dissipative boundary conditions is:

Theorem 7. Consider the linearized version of the IBVP ( 5.1 , 5.2 , 5.3 ), where the matrix functions Aj(t, x) and b(t, x) and the vector function F(t, x) do not depend on u. Suppose the system is symmetric hyperbolic, and that the boundary conditions (5.3) are maximal dissipative. Suppose, furthermore, that the rank of the boundary matrix P0(t, x, s) is constant in \((t,x) \in {\mathcal T}\).

Then, the problem is well posed in the sense of Definition 9. Furthermore, it is strongly well posed if the boundary matrix P0(t, x, s) is invertible.

This theorem was first proven in [189, 275, 344] for the case where the boundary surface \({\mathcal T}\) is non-characteristic, that is, the boundary matrix P0(t, x, s) is invertible for all \((t,x) \in {\mathcal T}\). A difficulty with the characteristic case is the loss of derivatives of u in the normal direction to the boundary (see [422]). This case was studied in [293, 343, 387], culminating with the regularity theorem in [387], which is based on special function spaces, which control the L2-norms of 2k tangential derivatives and k normal derivatives at the boundary (see also [389]). For generalizations of Theorem 7 to the quasilinear case; see [218, 388].

A more practical way of characterizing maximal dissipative boundary conditions is the following. Fix a boundary point \(p = (t,x) \in {\mathcal T}\), and define the scalar product (·,·) by (u, v) := u*H(t, x)v, u, v ∈ ℂm. Since the boundary matrix P0(t, x, s) is Hermitian with respect to this scalar product, there exists a basis e1, e2, …, em of eigenvectors of P0(t, x, s), which are orthonormal with respect to (·, ·). Let λ1, λ2, …, λm be the corresponding eigenvalues, where we might assume that the first r of these eigenvalues are strictly positive, and the last s are strictly negative. We can expand any vector u ∈ ℂm as \(u = \sum\limits_{j = 1}^m {{u^{(j)}}} {e_j}\), the coefficients u(j) being the characteristic fields with associated speeds λj. Then, the condition (5.90) at the point p can be written as

$$0 \geq (u,{P_0}(t,x,s)u) = \sum\limits_{j = 1}^m {{\lambda _j}} \vert {u^{(j)}}{\vert ^2} = \sum\limits_{j = 1}^r {{\lambda _j}} \vert {u^{(j)}}{\vert ^2} - \sum\limits_{j = m - s + 1}^m {\vert {\lambda _j}\vert \vert {u^{(j)}}{\vert ^2}} ,$$
(5.93)

where we have used the fact that λ1, …, λr > 0, λms+1, …, λm < 0 and the remaining λj’s are zero. Therefore, a maximal dissipative boundary condition must have the form

$${u_ +} = q{u_ -},\qquad {u_ +}: = \left({\begin{array}{*{20}c} {{u^{(1)}}} \\ \ldots \\ {{u^{(r)}}} \\ \end{array}} \right),\qquad {u_ -}: = \left({\begin{array}{*{20}c} {{u^{(m - s + 1)}}} \\ \ldots \\ {{u^{(m)}}} \\ \end{array}} \right),$$
(5.94)

with q a complex r × s matrix, since u− = 0 must imply u+ = 0. Furthermore, the matrix q has to be small enough such that the inequality (5.93) holds. There can be no further conditions, since an additional, independent condition on u would violate the maximality of the boundary space Vp.

In conclusion, a maximal dissipative boundary condition must have the form of Eq. (5.94), which describes a linear coupling of the outgoing characteristic fields u− to the incoming ones, u+. In particular, there are exactly as many independent boundary conditions as there are incoming fields, in agreement with the Laplace analysis in Section 5.1.1. Furthermore, the boundary conditions must not involve the zero speed fields. The simplest choice for q is the trivial one, q = 0, in which case data for the incoming fields is specified. A nonzero value of q would be chosen if the boundary is to incorporate some reflecting properties, like the case of a perfectly conducting surface in electromagnetism, for example.
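The smallness requirement on q can be tested directly: by Eq. (5.93), maximal dissipativity of u+ = qu− amounts to \(\sum\nolimits_{j = 1}^r {{\lambda _j}\vert {{(q{u_ -})}^{(j)}}{\vert ^2}} \leq \sum\nolimits_j {\vert {\lambda _j}\vert \vert u_ - ^{(j)}{\vert ^2}}\) for all u−. A minimal randomized check (Python; the speeds and coupling values are sample choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_dissipative(q, lam_in, lam_out, trials=2000):
    """Test sum_in lam_in |(q u_-)|^2 <= sum_out |lam_out| |u_-|^2 on random data."""
    for _ in range(trials):
        um = rng.standard_normal(len(lam_out)) + 1j*rng.standard_normal(len(lam_out))
        up = q @ um
        if (lam_in*np.abs(up)**2).sum() > (np.abs(lam_out)*np.abs(um)**2).sum() + 1e-12:
            return False
    return True

# one incoming field (speed +1) coupled to one outgoing field (speed -1),
# as in Example 29 below
print(is_dissipative(np.array([[0.5]]), np.array([1.0]), np.array([-1.0])))  # True
print(is_dissipative(np.array([[2.0]]), np.array([1.0]), np.array([-1.0])))  # False
```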

Example 29. Consider the first-order reformulation of the Klein-Gordon equation for the variables u = (Φ, Φt, Φx, Φy); see Example 13. Suppose the spatial domain is x > 0, with the boundary located at x = 0. Then, s = (−1, 0) and the boundary matrix is

$${P_0}(s) =- \left( {\begin{array}{*{20}{c}} {0\;0\;0\;0} \\ {0\;0\;1\;0} \\ {0\;1\;0\;0} \\ {0\;0\;0\;0} \end{array}} \right).$$
(5.95)

Therefore, the characteristic fields and speeds are Φ, Φy (zero speed fields, λ = 0), Φt − Φx (incoming field with speed λ = 1) and Φt + Φx (outgoing field with speed λ = −1). It follows from Eqs. (5.93, 5.94) that the class of maximal dissipative boundary conditions is

$$({\Phi _t} - {\Phi _x}) = q(t,y)({\Phi _t} + {\Phi _x}) + g(t,y),\qquad t \geq 0,\quad y \in {\mathbb {R}},$$
(5.96)

where the function q satisfies ∣q(t, y)∣ ≤ 1 and g is smooth boundary data. Particular cases are:

  • q = 0: Sommerfeld boundary condition,

  • q = −1: Dirichlet boundary condition,

  • q = 1: Neumann boundary condition.

Example 30. For Maxwell’s equations on a domain Σ ⊂ ℝ3 with C∞-boundary ∂Σ, the boundary matrix is given by

$${P_0}(s)\left({\begin{array}{*{20}c} E \\ B \\ \end{array}} \right) = \left({\begin{array}{*{20}c} {+ s \wedge B} \\ {- s \wedge E} \\ \end{array}} \right);$$
(5.97)

see Example 14. In terms of the components E∥ of E parallel to the boundary surface ∂Σ, and the ones E⊥, which are orthogonal to it (and, hence, parallel to s), the characteristic speeds and fields are

$$\begin{array}{*{20}c} {0:\,\,{E_ \bot},\quad {B_ \bot},} \\ {\pm 1:\,\,{E_{\Vert}} \pm s \wedge {B_{\Vert}}.} \\ \end{array}$$

Therefore, maximal dissipative boundary conditions have the form

$$({E_{\Vert}} + s \wedge {B_{\Vert}}) = q({E_{\Vert}} - s \wedge {B_{\Vert}}) + {g_{\Vert}},$$
(5.98)

with g∥ some smooth vector-valued function at the boundary, which is parallel to ∂Σ, and q a matrix-valued function satisfying the condition ∣q∣ ≤ 1. Particular cases are:

  • q = −1, g∥ = 0: The boundary condition E∥ = 0 describes a perfectly conducting boundary surface.

  • q = 0, g∥ = 0: This is a Sommerfeld-type boundary condition, which, locally, is transparent to outgoing plane waves traveling in the normal direction s,

    $$E(t,x) = {\mathcal E}{e^{i(\omega t - k\cdot x)}},\qquad B(t,x) = s \wedge E(t,x),$$
    (5.99)

    where ω is the frequency, k = ωs is the wave vector, and \({\mathcal E}\) is the polarization vector, which is orthogonal to k. The generalization of this boundary condition to inhomogeneous data g∥ ≠ 0 allows one to specify data on the incoming field E∥ + s ∧ B∥ at the boundary surface, which is equal to \(2{\mathcal E}{e^{i\omega t}}\) for the plane waves traveling in the normal inward direction −s.
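The characteristic structure quoted above follows from the boundary matrix (5.97) and can be cross-checked numerically; a minimal sketch (Python, with s taken along the x-axis):

```python
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0,1,2] = eps[1,2,0] = eps[2,0,1] = 1.0
eps[0,2,1] = eps[2,1,0] = eps[1,0,2] = -1.0

s = np.array([1.0, 0.0, 0.0])
cross = np.einsum('ijk,j->ik', eps, s)   # matrix representing v -> s ^ v

# P0(s)(E, B) = (s ^ B, -s ^ E), cf. Eq. (5.97)
P0 = np.block([[np.zeros((3, 3)), cross],
               [-cross, np.zeros((3, 3))]])
print(np.round(np.sort(np.linalg.eigvals(P0).real), 3))
# -> [-1 -1 0 0 1 1]: one in- and one outgoing pair of tangential fields,
#    plus the zero speed fields E_perp, B_perp
```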

Recall that the constraints ∇ · E = ρ and ∇ · B = 0 propagate along the time evolution vector field ∂t, (∇ · E − ρ)t = 0, (∇ · B)t = 0, provided the continuity equation holds. Since ∂t is tangent to the boundary, no additional conditions controlling the constraints must be specified at the boundary; the constraints are automatically satisfied everywhere provided they are satisfied on the initial surface.

Example 31. Commonly, one writes Maxwell’s equations as a system of wave equations for the electromagnetic potential Aμ in the Lorentz gauge, as discussed in Example 28. By reducing the problem to a first-order symmetric hyperbolic system, one may wonder if it is possible to apply the theory of maximal dissipative boundary conditions and obtain a well-posed IBVP, as in the previous example. As we shall see in Section 5.2.1, the answer is affirmative, but the correct application of the theory is not completely straightforward. In order to illustrate why this is the case, introduce the new independent fields Dμν := ∂μAν. Then, the set of wave equations can be rewritten as the first-order system for the 20-component vector (Aν, Dtν, Djν), j = x, y, z,

$${\partial _t}{A_\nu} = {D_{t\nu}},\qquad {\partial _t}{D_{t\nu}} = {\partial ^j}{D_{j\nu}},\qquad {\partial _t}{D_{j\nu}} = {\partial _j}{D_{t\nu}},$$
(5.100)

which is symmetric hyperbolic. The characteristic fields with respect to the unit outward normal s = (−1, 0, 0) at the boundary are

$$\begin{array}{*{20}c} {{D_{t\nu}} - {D_{x\nu}} = ({\partial _t} - {\partial _x}){A_\nu}\,\,\,{\rm{(incoming field)}},\quad} \\ {{D_{t\nu}} + {D_{x\nu}} = ({\partial _t} + {\partial _x}){A_\nu}\,\,\,{\rm{(outgoing field)}},\quad} \\ {\quad \quad \quad \quad \quad {D_{y\nu}} = {\partial _y}{A_\nu}\,\,\,{\rm{(zero speed field)}},\,\,\,} \\ {\quad \quad \quad \quad \quad {D_{z\nu}} = {\partial _z}{A_\nu}\,\,\,{\rm{(zero speed field)}}.\,} \\ \end{array}$$

According to Eq. (5.88) we can rewrite the Lorentz constraint in the following way:

$$({D_{tt}} - {D_{xt}}) + ({D_{tx}} - {D_{xx}}) = - ({D_{tt}} + {D_{xt}}) + ({D_{tx}} + {D_{xx}}) + 2{D_{yy}} + 2{D_{zz}}.$$
(5.101)

The problem is that, when written in terms of the characteristic fields, the Lorentz constraint not only depends on the in- and outgoing fields, but also on the zero speed fields Dyy and Dzz. Therefore, imposing the constraint on the boundary in order to guarantee constraint preservation leads to a boundary condition, which couples the incoming fields to outgoing and zero speed fields,Footnote 23 and which does not fall into the class of admissible boundary conditions.

At this point, one might ask why we were able to formulate a well-posed IBVP based on the second-order formulation in Example 28, while the first-order reduction discussed here fails. As we shall see, the reason for this is that there exist many first-order reductions, which are inequivalent to each other, and a slightly more sophisticated reduction works, while the simplest choice adopted here does not. See also [354, 14] for well-posed formulations of the IBVP in electromagnetism based on the potential formulation in a different gauge.

Example 32. A generalization of Maxwell’s equations is the evolution system

$${\partial _t}{E_{ij}} = - {\varepsilon _{kl(i}}{\partial ^k}{B^l}_{j)},$$
(5.102)
$${\partial _t}{B_{ij}} = + {\varepsilon _{kl(i}}{\partial ^k}{E^l}_{j)},$$
(5.103)

for the symmetric, trace-free tensor fields Eij and Bij, where here we use the Einstein summation convention, the indices i,j,k,l run over 1,2,3, (ij) denotes symmetrization over ij, and εijk is the totally antisymmetric tensor with ε123 = 1. Notice that the right-hand sides of Eqs. (5.102, 5.103) are symmetric and trace-free, such that one can consistently assume that \({E^i}_i = {B^i}_i = 0\). The evolution system (5.102, 5.103), which is symmetric hyperbolic with respect to the trivial symmetrizer, describes the propagation of the electric and magnetic parts of the Weyl tensor for linearized gravity on a Minkowski background; see, for instance, [182].

Decomposing Eij into its parts parallel and orthogonal to the unit outward normal s,

$${E_{ij}} = \bar E\left({{s_i}{s_j} - {1 \over 2}{\gamma _{ij}}} \right) + 2{s_{(i}}{\bar E_{j)}} + {\hat E_{ij}},$$
(5.104)

where \({\gamma _{ij}}: = {\delta _{ij}} - {s_i}{s_j}\), \(\bar E: = {s^i}{s^j}{E_{ij}}\), \({{\bar E}_i}: = \gamma _i^k{E_{kj}}{s^j}\), \({{\hat E}_{ij}}: = (\gamma _i^k\gamma _j^l - {\gamma _{ij}}{\gamma ^{kl}}/2){E_{kl}}\), and similarly for Bij, the eigenvalue problem λu = P0(s)u for the boundary matrix is

$$\begin{array}{*{20}c} {\lambda \bar E = 0,} \\ {\lambda \bar B = 0,} \\ {\lambda {{\bar E}_i} = - {1 \over 2}{\varepsilon _{kli}}{s^k}{{\bar B}^l},} \\ {\lambda {{\bar B}_i} = + {1 \over 2}{\varepsilon _{kli}}{s^k}{{\bar E}^l},} \\ {\lambda {{\hat E}_{ij}} = - {\varepsilon _{kl(i}}{s^k}{{\hat B}^l}_{j)},} \\ {\lambda {{\hat B}_{ij}} = + {\varepsilon _{kl(i}}{s^k}{{\hat E}^l}_{j)},} \\ \end{array}$$

from which one obtains the following characteristic speeds and fields,

$$\begin{array}{*{20}c} {0:\bar E,\quad \bar B,\quad \,\,} \\ {\pm {1 \over 2}:{{\bar E}_i} \mp {\varepsilon _{kli}}{s^k}{{\bar B}^l},} \\ {\quad \pm 1:{{\hat E}_{ij}} \mp {\varepsilon _{kl(i}}{s^k}{{\hat B}^l}_{j)}.} \\ \end{array}$$
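As a cross-check, these speeds can be recovered numerically by assembling the 10 × 10 boundary symbol on a basis of symmetric trace-free tensors; the sketch below (Python; the basis and its normalization are ours) prints the eigenvalues 0, ±1/2 and ±1, each with the expected multiplicity two:

```python
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0,1,2] = eps[1,2,0] = eps[2,0,1] = 1.0
eps[0,2,1] = eps[2,1,0] = eps[1,0,2] = -1.0

s = np.array([1.0, 0.0, 0.0])   # unit normal

# a basis of symmetric trace-free 3x3 matrices
basis = []
M = np.zeros((3, 3)); M[0,0], M[1,1] = 1, -1; basis.append(M)
M = np.zeros((3, 3)); M[0,0], M[1,1], M[2,2] = 1, 1, -2; basis.append(M)
for (i, j) in [(0, 1), (0, 2), (1, 2)]:
    M = np.zeros((3, 3)); M[i,j] = M[j,i] = 1; basis.append(M)
B9 = np.array([b.flatten() for b in basis]).T   # 9 x 5
proj = np.linalg.pinv(B9)                       # coefficients from a matrix

def curl_sym(T):
    """eps_{kl(i} s^k T^l_{j)}: the symmetrized curl appearing in the symbol."""
    C = np.einsum('kli,k,lj->ij', eps, s, T)
    return 0.5*(C + C.T)

# assemble the symbol: u = (E coefficients, B coefficients)
P = np.zeros((10, 10))
for a in range(5):
    P[5:, a] = proj @ curl_sym(basis[a]).flatten()        # lambda B = +curl E
    P[:5, 5 + a] = -proj @ curl_sym(basis[a]).flatten()   # lambda E = -curl B
print(np.round(np.sort(np.linalg.eigvals(P).real), 3))
# -> [-1 -1 -0.5 -0.5 0 0 0.5 0.5 1 1]
```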

Similar to the Maxwell case, the boundary condition \({{\hat E}_{ij}} - {\varepsilon _{kl(i}}{s^k}{{\hat B}^l}_{j)} = 0\) on the incoming, symmetric trace-free characteristic field is, locally, transparent to outgoing linear gravitational plane waves traveling in the normal direction s. In fact, this condition is equivalent to setting the complex Weyl scalar Ψ0, computed from the adapted, complex null tetrad K := ∂t + s, L := ∂t − s, Q, \({\bar Q}\), to zero at the boundary surface.Footnote 24 Variants of this condition have been proposed in the literature in the context of the IBVP for Einstein’s field equations in order to approximately control the incoming gravitational radiation; see [187, 40, 253, 378, 363, 309, 286, 384, 366].

However, one also needs to control the incoming field \({{\bar E}_i} - {\varepsilon _{kli}}{s^k}{{\bar B}^l}\) at the boundary. This field, which propagates with speed 1/2, is related to the constraints in the theory. Like in electromagnetism, the fields Eij and Bij are subject to the divergence constraints Pj:= iEij = 0, Qj:= iBij = 0. However, unlike the Maxwell case, these constraints do not propagate trivially. As a consequence of the evolution equations (5.102, 5.103), the constraint fields Pj and Qj obey

$${\partial _t}{P_j} = - {1 \over 2}{\varepsilon _{jkl}}{\partial ^k}{Q^l},\qquad {\partial _t}{Q_j} = + {1 \over 2}{\varepsilon _{jkl}}{\partial ^k}{P^l},$$
(5.105)

which is equivalent to Maxwell’s equations except that the propagation speed for the transverse modes is 1/2 instead of 1. Therefore, guaranteeing constraint propagation requires specifying homogeneous maximal dissipative boundary conditions for this system, which have the form of Eq. (5.98) with E ↦ P, B ↦ −Q and g∥ = 0. A problem is that this yields conditions involving first derivatives of the fields Eij and Bij, when rewritten as a boundary condition for the main system (5.102, 5.103). Except in some particular cases involving totally-reflecting boundaries, it is not possible to cast these conditions into maximal dissipative form.

A solution to this problem has been presented in [181] and [187], where a similar system appears in the context of the IBVP for Einstein’s field equations for solutions with anti-de Sitter asymptotics, or for solutions with an artificial boundary, respectively. The method consists in modifying the evolution system (5.102, 5.103) by using the constraint equations Pj = Qj = 0 in such a way that the constraint fields for the resulting boundary-adapted system propagate along ∂t at the boundary surface. In order to describe this system, extend s to a smooth vector field on Σ with the property that ∣s∣ ≤ 1. Then, the boundary-adapted system reads:

$${\partial _t}{E_{ij}} = - {\varepsilon _{kl(i}}{\partial ^k}{B^l}_{j)} + {s_{(i}}{\varepsilon _{j)kl}}{s^k}{Q^l},$$
(5.106)
$${\partial _t}{B_{ij}} = + {\varepsilon _{kl(i}}{\partial ^k}{E^l}_{j)} - {s_{(i}}{\varepsilon _{j)kl}}{s^k}{P^l}.$$
(5.107)

This system is symmetric hyperbolic, and the characteristic fields in the normal direction are identical to the unmodified system with the important difference that the fields \({{\bar E}_i} \mp {\varepsilon _{kli}}{s^k}{{\bar B}^l}\) now propagate with zero speed. The induced evolution system for the constraint fields is symmetric hyperbolic, and has a trivial boundary matrix. As a consequence, the constraints propagate tangentially to the boundary surface, and no extra boundary conditions for controlling the constraints must be specified.

5.2.1 Application to systems of wave equations

As anticipated in Example 31, the theory of symmetric hyperbolic first-order equations with maximal dissipative boundary conditions can also be used to formulate well-posed IBVP for systems of wave equations, which are coupled through the boundary conditions, as already discussed in Section 5.1.3 based on the Laplace method. Again, the key idea is to show strong well-posedness; that is, an a priori estimate, which controls the first derivatives of the fields in the bulk and at the boundary.

In order to explain how this is performed, we consider the simple case of the Klein-Gordon equation Φtt = ΔΦ − m2Φ on the half plane Σ := {(x, y) ∈ ℝ2: x > 0}. In Example 13 we reduced the problem to a first-order symmetric hyperbolic system for the variables u = (Φ, Φt, Φx, Φy) with symmetrizer H = diag(m2, 1, 1, 1), and in Example 29 we determined the class of maximal dissipative boundary conditions for this first-order reduction. Consider the particular case of Sommerfeld boundary conditions, where Φt = Φx is specified at x = 0. Then, Eq. (3.103) gives the following conservation law,

$$E({\Sigma _T}) = E({\Sigma _0}) + \int\limits_0^T {\int\limits_{\mathbb {R}} {{{\left. {{u^{\ast}}H{P_0}(s)u} \right\vert}_{x = 0}}}} dy\,dt,$$
(5.108)

where \(E({\Sigma _t}) = \int\nolimits_{{\Sigma _t}} {{u^{\ast}}H\,u\,dxdy} = \int\nolimits_{{\Sigma _t}} {({m^2}\vert \Phi {\vert ^2} + \vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2} + \vert {\Phi _y}{\vert ^2})dxdy}\), and \({u^{\ast}}H{P_0}(s)u = - 2{\rm Re}(\Phi _t^{\ast}{\Phi _x})\); see Example 29. Using the Sommerfeld boundary condition, we may rewrite \(- 2{\rm Re}(\Phi _t^{\ast}{\Phi _x}) = - (\vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2})\), and obtain the energy equality

$$E({\Sigma _T}) + \int\limits_0^T {\int\limits_{\mathbb {R}} {{{\left[ {\vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2}} \right]}_{x = 0}}}} dy\,dt = E({\Sigma _0}),$$
(5.109)

controlling the derivatives Φt and Φx of Φ at the boundary surface. However, a weakness of this estimate is that it does not control the zero speed fields Φ and Φy at the boundary, and so one does not obtain strong well-posedness.

On the other hand, the first-order reduction is not unique, and as we show now, different reductions may lead to stronger estimates. For this, we choose a real constant b such that 0 < b ≤ 1/2 and define the new fields ū := (Φ, ΦtbΦx, Φx, Φy), which yield the symmetric hyperbolic system

$${\bar u_t} = \left({\begin{array}{*{20}c} b & 0 & 0 & 0 \\ 0 & {- b} & {1 - {b^2}} & 0 \\ 0 & 1 & b & 0 \\ 0 & 0 & 0 & b \\ \end{array}} \right){\bar u_x} + \left({\begin{array}{*{20}c} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{array}} \right){\bar u_y} + \left({\begin{array}{*{20}c} 0 & 1 & 0 & 0 \\ {- {m^2}} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{array}} \right)\bar u,$$
(5.110)

with symmetrizer \(\bar H = {\rm{diag}}({m^2},1,1 - {b^2},1)\). The characteristic fields in terms of Φ and its derivatives are Φ, Φy, Φt + Φx, and Φt − Φx, as before. However, the fields now have characteristic speeds −b, −b, −1, +1, respectively, whereas in the previous reduction they were 0, 0, −1, +1. Therefore, the effect of the new reduction versus the old one is to shift the speeds of the zero speed fields, and to convert them to outgoing fields with speed −b. Notice that the Sommerfeld boundary condition Φt = Φx is still maximal dissipative with respect to the new reduction. Repeating the energy estimates again leads to a conservation law of the form (5.108), but where now the energy and flux quantities are \(E({\Sigma _t}) = \int\nolimits_{{\Sigma _t}} {{{\bar u}^{\ast}}\bar H\,\bar u\,dx\,dy} = \int\nolimits_{{\Sigma _t}} {({m^2}\vert \Phi {\vert ^2} + \vert {\Phi _t} - b{\Phi _x}{\vert ^2} + (1 - {b^2})\vert {\Phi _x}{\vert ^2} + \vert {\Phi _y}{\vert ^2})dx\,dy}\) and

$${\bar u^{\ast}}\bar H{P_0}(s)\bar u = - b\left[ {{m^2}\vert \Phi {\vert ^2} + \vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2} + \vert {\Phi _y}{\vert ^2}} \right] + 2b\left[ {\vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2}} \right] - 2{\rm Re}(\Phi _t^{\ast}{\Phi _x}).$$
(5.111)

Imposing the boundary condition Φt = Φx at x = 0 and using 2b ≤ 1 leads to the energy estimate

$$E({\Sigma _T}) + b\int\limits_0^T {\int\limits_{\mathbb {R}} {{{\left[ {{m^2}\vert \Phi {\vert ^2} + \vert {\Phi _t}{\vert ^2} + \vert {\Phi _x}{\vert ^2} + \vert {\Phi _y}{\vert ^2}} \right]}_{x = 0}}}} dy\,dt \leq E({\Sigma _0}),$$
(5.112)

controlling Φ and all its first derivatives at the boundary surface.

Summarizing, we have seen that the most straightforward first-order reduction of the Klein-Gordon equation does not lead to strong well-posedness. However, strong well-posedness can be obtained by choosing a more sophisticated reduction, in which the time-derivative of Φ is replaced by its derivative ΦtbΦx along the time-like vector (1, −b), which is pointing outside the domain at the boundary surface. In fact, it is possible to obtain a symmetric hyperbolic reduction leading to strong well-posedness for any future-directed time-like vector field u, which is pointing outside the domain at the boundary. Based on the geometric definition of first-order symmetric hyperbolic systems in [205], it is possible to generalize this result to systems of quasilinear wave equations on curved backgrounds [264].
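Since the claims about the new reduction are purely algebraic, they are easy to check mechanically. The following Python sketch (an illustration, not part of the original analysis; the fields are taken to be real, so that Re(Φt*Φx) becomes ΦtΦx) verifies that \(\bar H{\bar A^x}\) is symmetric, that the eigenvalues of \({\bar A^x}\) are b, b, 1, −1 (minus the characteristic speeds quoted above), and that the boundary flux \({\bar u^{\ast}}\bar H{P_0}(s)\bar u\) with P0(s) = −Āx reproduces Eq. (5.111):

    import sympy as sp

    b, m = sp.symbols('b m', positive=True)
    Phi, Pt, Px, Py = sp.symbols('Phi Phi_t Phi_x Phi_y', real=True)

    # Matrix in front of the x-derivative in Eq. (5.110) and the symmetrizer.
    Ax = sp.Matrix([[b, 0, 0, 0],
                    [0, -b, 1 - b**2, 0],
                    [0, 1, b, 0],
                    [0, 0, 0, b]])
    H = sp.diag(m**2, 1, 1 - b**2, 1)

    # Symmetrizer condition: H*Ax must be symmetric.
    assert (H * Ax - (H * Ax).T) == sp.zeros(4, 4)

    # Eigenvalues {b: 2, 1: 1, -1: 1}; the characteristic speeds quoted in
    # the text are their negatives: -b, -b, -1, +1.
    print(Ax.eigenvals())

    # Boundary flux at x = 0: the unit outward normal is s = -e_x, so
    # P0(s) = -Ax.
    ubar = sp.Matrix([Phi, Pt - b*Px, Px, Py])
    flux = sp.expand(-(ubar.T * H * Ax * ubar)[0])
    target = (-b*(m**2*Phi**2 + Pt**2 + Px**2 + Py**2)
              + 2*b*(Pt**2 + Px**2) - 2*Pt*Px)
    assert sp.simplify(flux - target) == 0
    print("boundary flux agrees with Eq. (5.111)")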

In order to describe the result in [264], let π: E → M be a vector bundle over \(M = [0,T] \times \bar \Sigma\) with fiber ℝN; let ∇μ be a fixed, given connection on E and let \({g_{\mu \nu}} = {g_{\mu \nu}}(\Phi)\) be a Lorentz metric on M with inverse \({g^{\mu \nu}}(\Phi)\), which depends pointwise and smoothly on a vector-valued function Φ = {ΦA}A=1,2, …,N, parameterizing a local section of E. Assume that each time-slice Σt = {t} × Σ is space-like and that the boundary \({\mathcal T} = [0,T] \times \partial \Sigma\) is time-like with respect to gμν(Φ). We consider a system of quasilinear wave equations of the form

$${g^{\mu \nu}}(\Phi){\nabla _\mu}{\nabla _\nu}{\Phi ^A} = {F^A}(\Phi ,\nabla \Phi),$$
(5.113)

where FA(Φ, ∇Φ) is a vector-valued function, which depends pointwise and smoothly on its arguments. The wave system (5.113) is subject to the initial conditions

$${\left. {{\Phi ^A}} \right\vert _{{\Sigma _0}}} = \Phi _0^A\,,\qquad {\left. {{n^\mu}{\nabla _\mu}{\Phi ^A}} \right\vert _{{\Sigma _0}}} = \Pi _0^A\,,$$
(5.114)

where \(\Phi _0^A\) and \(\Pi _0^A\) are given vector-valued functions on Σ0, and where nμ = nμ(Φ) denotes the future-directed unit normal to Σ0 with respect to gμν. In order to describe the boundary conditions, let \({T^\mu} = {T^\mu}(p,\Phi),p \in {\mathcal T}\), be a future-directed vector field on \({\mathcal T}\), which is normalized with respect to gμν, and let Nμ = Nμ(p, Φ) be the unit outward normal to \({\mathcal T}\) with respect to the metric gμν. We consider boundary conditions on \({\mathcal T}\) of the following form

$${\left. {\left[ {{T^\mu} + \alpha {N^\mu}} \right]{\nabla _\mu}{\Phi ^A}} \right\vert _{\mathcal T}} = {c^{\mu \,A}}_B{\left. {{\nabla _\mu}{\Phi ^B}} \right\vert _{\mathcal T}} + {d^A}_B{\left. {{\Phi ^B}} \right\vert _{\mathcal T}} + {G^A},$$
(5.115)

where α = α(p, Φ) > 0 is a strictly positive, smooth function, GA = GA(p) is a given, vector-valued function on \({\mathcal T}\) and the matrix coefficients \({c^{\mu A}}_B = {c^{\mu A}}_B(p,\Phi)\) and \({d^A}_B = {d^A}_B(p,\Phi)\) are smooth functions of their arguments. Furthermore, we assume that \({c^\mu}{^A_B}\) satisfies the following property. Given a local trivialization φ: U × ℝN → \({\pi ^{- 1}}(U)\) of E such that Ū ⊂ M is compact and contains a portion \({\mathcal U}\) of the boundary \({\mathcal T}\), there exists a smooth map J: U → GL(N, ℝ), \(p \mapsto ({J^A}_B(p))\), such that the transformed matrix coefficients

$${\tilde c^{\mu \,A}}_B: = {J^A}_C{c^{\mu \,C}}_D{\left({{J^{- 1}}} \right)^D}_B$$
(5.116)

are in upper triangular form with zeroes on the diagonal, that is

$${\tilde c^{\mu \,A}}_B = 0,\qquad B \leq A.$$
(5.117)

Theorem 8. [264] The IBVP ( 5.113 , 5.114 , 5.115 ) is well posed. Given T > 0 and sufficiently small and smooth initial and boundary data \(\Phi _0^A,\Pi _0^A\) and GA satisfying the compatibility conditions at the edge S = {0} × ∂Σ, there exists a unique smooth solution on M satisfying the evolution equation (5.113) , the initial condition (5.114) and the boundary condition (5.115) . Furthermore, the solution depends continuously on the initial and boundary data.

Theorem 8 provides the general framework for treating wave systems with constraints, such as Maxwell’s equations in the Lorentz gauge and, as we will see in Section 6.1, Einstein’s field equations with artificial outer boundaries.

5.2.2 Existence of weak solutions and the adjoint problem

Here, we show how to prove the existence of weak solutions for linear, symmetric hyperbolic equations with variable coefficients and maximal dissipative boundary conditions. The method can also be applied to a more general class of linear symmetric operators with maximal dissipative boundary conditions; see [189, 275]. The proof below will shed some light on the maximality condition for the boundary space Vp.

Our starting point is an IBVP of the form (5.1, 5.2, 5.3), where the matrix functions Aj(t, x) and b(t, x) do not depend on u, and where F(t, x, u) is replaced by B(t, x)u + F(t, x), such that the system is linear. Furthermore, we can assume that the initial and boundary data is trivial, f = 0, g = 0. We require the system to be symmetric hyperbolic with symmetrizer H(t, x) satisfying the conditions in Definition 4(iii), and assume the boundary conditions (5.3) are maximal dissipative. We rewrite the IBVP on ΩT:= [0, T] × Σ as the abstract linear problem

$$- Lu = F,$$
(5.118)

where L: D(L) ⊂ X → X is the linear operator on the Hilbert space X:= L2(ΩT) defined by the evolution equation and the initial and boundary conditions:

$$D(L): = \{u \in C_b^\infty ({\Omega _T}):u(p) = 0\,{\rm{for}}\,{\rm{all}}\,p \in {\Sigma _0}\,{\rm{and}}\,u(p) \in {V_p}\,{\rm{for}}\,{\rm{all}}\,p \in {\mathcal T}\},$$
$$Lu: = \sum\limits_{\mu = 0}^n {{A^\mu}} (t,x){{\partial u} \over {\partial {x^\mu}}} + B(t,x)u,\qquad u \in D(L),$$

where we have defined A0:= − I and x0:= t, where Vp = {u ∈ ℂm : b(t, x)u = 0} is the boundary space, and where Σ0:= {0} × Σ, ΣT:= {T} × Σ and \({\mathcal T}: = [0,T] \times \partial \Sigma\) denote the initial, the final and the boundary surface, respectively.

For the following, the adjoint IBVP plays an important role. This problem is defined as follows. First, the symmetrizer defines a natural scalar product on X,

$${\langle v,\,u\rangle _H}: = \int\limits_{{\Omega _T}} {{v^{\ast}}} (t,\,x)H(t,\,x)u(t,\,x)\,dt\,{d^n}x,\qquad u,\,v \in X,$$
(5.119)

which, because of the properties of H, is equivalent to the standard scalar product on L2T). In order to obtain the adjoint problem, we take u ∈ D(L) and \(\upsilon \in C_b^\infty ({\Omega _T})\), and use Gauss’s theorem to find

$${\langle v,Lu\rangle _H} = {\langle {L^{\ast}}v,u\rangle _H} + \int\limits_{{\Sigma _0}} {{v^{\ast}}} H(t,x)u\,{d^n}x - \int\limits_{{\Sigma _T}} {{v^{\ast}}} H(t,x)u\,{d^n}x + \int\limits_{\mathcal T} {{v^{\ast}}} H(t,x){P_0}(t,x,s)u\,dS,$$
(5.120)

where we have defined the formal adjoint L*: D(L*) ⊂ X → X of L by

$${L^{\ast}}v: = - \sum\limits_{\mu = 0}^n {{A^\mu}} (t,x){{\partial v} \over {\partial {x^\mu}}} - H{(t,x)^{- 1}}\sum\limits_{\mu = 0}^n {{{\partial [H(t,x){A^\mu}(t,x)]} \over {\partial {x^\mu}}}} v + H{(t,x)^{- 1}}B{(t,x)^{\ast}}H(t,x)v{.}$$
(5.121)

In order for the integrals on the right-hand side of Eq. (5.120) to vanish, such that 〈v, LuH = 〈L*v, uH, we first notice that the integral over Σ0 vanishes, because u = 0 on Σ0. The integral over ΣT also vanishes if we require v = 0 on ΣT. The last term also vanishes if we require v to lie in the dual boundary space

$$V_p^{\ast}: = \{v \in {{\mathbb {C}}^m}:{v^{\ast}}H(t,x){P_0}(t,x,s)u = 0\,{\rm{for}}\,{\rm{all}}\,u \in {V_p}\} ,$$
(5.122)

for each \(p \in {\mathcal T}\). Therefore, if we define

$$D({L^{\ast}}): = \{v \in C_b^\infty ({\Omega _T}):v(p) = 0\,{\rm{for}}\,{\rm{all}}\,p \in {\Sigma _T}\,{\rm{and}}\,v(p) \in V_p^{\ast}\,{\rm{for}}\,{\rm{all}}\,p \in {\mathcal T}\} ,$$
(5.123)

we have 〈v, Lu〉H = 〈L*v, u〉H for all u ∈ D(L) and v ∈ D(L*); that is, the operator L* is adjoint to L. There is the following nice relation between the boundary spaces Vp and V*p:

Lemma 4. Let \(p \in {\mathcal T}\) be a boundary point. Then, Vp is maximal nonpositive if and only if V*p is maximal nonnegative.

Proof. Fix a boundary point \(p = (t,x) \in {\mathcal T}\) and define the matrix \({\mathcal B}: = H(t,x){P_0}(t,x,s)\) with s the unit outward normal to ∂Σ at x. Since the system is symmetric hyperbolic, \({\mathcal B}\) is Hermitian. We decompose ℂm = E+ ⊕ E− ⊕ E0 into orthogonal subspaces E+, E−, E0 on which \({\mathcal B}\) is positive, negative and zero, respectively. We equip E± with the scalar products (·,·)±, which are defined by

$${({u_ \pm},{v_ \pm})_ \pm}: = \pm u_ \pm ^{\ast}{\mathcal B}{v_ \pm},\qquad {u_ \pm},{v_ \pm} \in {E_ \pm}.$$
(5.124)

In particular, we have \({u^{\ast}}{\mathcal B}u = {({u_ +},{u_ +})_ +} - {({u_ -},{u_ -})_ -}\) for all u ∈ ℂm. Therefore, if Vp is maximal nonpositive, there exists a linear transformation q: E− → E+ satisfying \(\vert q{u_ -}{\vert _ +} \leq \vert {u_ -}{\vert _ -}\) for all u− ∈ E−, such that (cf. Eq. (5.94))

$${V_p} = \{u \in {{\mathbb {C}}^m}:{u_ +} = q{u_ -}\} .$$
(5.125)

Let v ∈ V*p. Then,

$$0 = {v^{\ast}}{\mathcal B}u = {({v_ +},{u_ +})_ +} - {({v_ -},{u_ -})_ -} = {({v_ +},q{u_ -})_ +} - {({v_ -},{u_ -})_ -} = {({q^\dagger}{v_ +},{u_ -})_ -} - {({v_ -},{u_ -})_ -}$$
(5.126)

for all u ∈ Vp, where q†: E+ → E− is the adjoint of q with respect to the scalar products (·,·)± defined on E±. Therefore, v− = q†v+, and

$$V_p^{\ast} = \{v \in {{\mathbb C}^m}:{v_ -} = {q^\dagger}{v_ +}\} .$$
(5.127)

Since q† has the same norm as q, which is bounded by one, it follows that V*p is maximal nonnegative. The converse statement follows in an analogous way. □
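Lemma 4 is also easy to illustrate numerically in a concrete example. In the following Python sketch (illustrative; the boundary matrix \({\mathcal B}\) and the map q are arbitrary choices, not taken from the text), the space Vp is built in the form (5.125) for a Hermitian matrix with two positive, two negative and one zero eigenvalue, the dual space (5.122) is computed as a null space, and its maximal nonnegativity is confirmed:

    import numpy as np
    from scipy.linalg import null_space

    # Boundary matrix with eigenvalues 2, 1 (E_+), -1, -3 (E_-) and 0 (E_0).
    B = np.diag([2.0, 1.0, -1.0, -3.0, 0.0]).astype(complex)

    # A map q: E_- -> E_+ with small norm, so |q u_-|_+ <= |u_-|_- holds.
    q = 0.1 * np.array([[1.0, 2.0], [0.5, -1.0]], dtype=complex)

    # Basis of V_p: u_+ = q u_- (first two columns) plus the zero-speed
    # direction e_5; dim V_p = 3, the maximal nonpositive dimension.
    V = np.zeros((5, 3), dtype=complex)
    V[:2, :2] = q           # E_+ components
    V[2:4, :2] = np.eye(2)  # E_- components
    V[4, 2] = 1.0           # E_0 component

    # V_p is nonpositive: u* B u <= 0 on V_p.
    G = V.conj().T @ B @ V
    assert np.all(np.linalg.eigvalsh(G) <= 1e-12)

    # Dual space: all v with v* B u = 0 for every u in V_p.
    W = null_space(V.conj().T @ B)
    print("dim V_p* =", W.shape[1])  # 3 = dim E_+ + dim E_0 (maximal)
    Gdual = W.conj().T @ B @ W
    assert np.all(np.linalg.eigvalsh(Gdual) >= -1e-12)
    print("V_p* is nonnegative, as asserted by Lemma 4")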

The lemma implies that solving the original problem −Lu = F with u ∈ D(L) is equivalent to solving the adjoint problem L*v = F with v ∈ D(L*), which, since v(T, x) = 0 is held fixed at ΣT, corresponds to the time-reversed problem with the adjoint boundary conditions. From the a priori energy estimates we obtain:

Lemma 5. There is a constant δ = δ(T) such that

$$\Vert Lu\Vert _{H}\; \geq \delta \Vert u\Vert _{H},\qquad \Vert {L^{\ast}}v\Vert _{H}\; \geq \delta \Vert v\Vert _{H}$$
(5.128)

for all u ∈ D(L) and v ∈ D(L*), where ∥·∥H is the norm induced by the scalar product 〈·,·〉H.

Proof. Let u ∈ D(L) and set F := −Lu. From the energy estimates in Section 3.2.3 one easily obtains

$$E({\Sigma _t}) \leq C\Vert F\Vert _H^2,\qquad 0 \leq t \leq T,$$
(5.129)

for some positive constant C depending on T. Integrating both sides from t = 0 to t = T gives

$$\Vert u\Vert _H^2\; \leq CT\Vert F\Vert _H^2\; = CT\Vert Lu\Vert _H^2,$$
(5.130)

which yields the statement for L setting δ:= (CT)−1/2. The estimate for L* follows from a similar energy estimate for the adjoint problem. □

In particular, Lemma 5 implies that (strong) solutions to the IBVP and its adjoint are unique. Since L and L* are closable operators [345], their closures \(\overline L\) and \(\overline {{L^\ast}}\) satisfy the same inequalities as in Eq. (5.128). Now we are ready to define weak solutions and to prove their existence:

Definition 10. uX is called a weak solution of the problem (5.118) if

$${\langle {L^{\ast}}v,u\rangle _H} = - {\langle v,F\rangle _H}$$
(5.131)

for all v ∈ D(L*).

In order to prove the existence of such u ∈ X, we introduce the linear space \(Y = D(\overline {{L^{\ast}}})\) and equip it with the scalar product 〈·, ·〉Y defined by

$${\langle v,w\rangle _Y}: = {\langle \overline {{L^{\ast}}} v,\overline {{L^{\ast}}} w\rangle _H},\qquad v,w \in Y.$$
(5.132)

The positivity of this product is a direct consequence of Lemma 5, and since \(\overline {L^\ast}\) is closed, Y defines a Hilbert space. Next, we define the linear form J: Y → ℂ on Y by

$$J(v): = - {\langle F,v\rangle _H}.$$
(5.133)

This form is bounded, according to Lemma 5,

$$\vert J(v)\vert \; \leq \;\Vert F{\Vert _H}\Vert v{\Vert _H}\; \leq {\delta ^{- 1}}\Vert F{\Vert _H}\Vert \overline {{L^{\ast}}} v{\Vert _H} = {\delta ^{- 1}}\Vert F{\Vert _H}\Vert v{\Vert _Y}$$
(5.134)

for all v ∈ Y. Therefore, according to the Riesz representation lemma there exists a unique w ∈ Y such that 〈w, v〉Y = J(v) for all v ∈ Y. Setting \(u: = \overline {{L^{\ast}}} w \in X\) gives a weak solution of the problem.

If u ∈ X is a weak solution, which is sufficiently smooth, it follows from the Green type identity (5.120) that u has vanishing initial data and that it satisfies the required boundary conditions, and hence is a solution to the original IBVP (5.118). The difficult part is to show that a weak solution is indeed sufficiently regular for this conclusion to be made. See [189, 275, 344, 343, 387] for such “weak=strong” results.

5.3 Absorbing boundary conditions

When modeling isolated systems, the boundary conditions have to be chosen such that they minimize spurious reflections from the boundary surface. This means that inside the computational domain, the solution of the IBVP should lie as close as possible to the true solution of the Cauchy problem on the unbounded domain. In this sense, the dynamics outside the computational domain is replaced by appropriate conditions on a finite, artificial boundary. Clearly, this can only work in particular situations, where the solutions outside the domain are sufficiently simple so that they can be computed and used to construct boundary conditions, which are, at least, approximately compatible with them. Boundary conditions, which give rise to a well-posed IBVP and achieve this goal are called absorbing, non-reflecting or radiation boundary conditions in the literature, and there has been a substantial amount of work on the construction of such conditions for wave problems in acoustics, electromagnetism, meteorology, and solid geophysics (see [206] for a review). Some recent applications to general relativity are mentioned in Sections 6 and 10.3.1.

One approach in the construction of absorbing boundary conditions is based on suitable series or Fourier expansions of the solution, and derives a hierarchy of local boundary conditions with increasing order of accuracy [153, 46, 240]. Typically, such higher-order local boundary conditions involve solving differential equations at the boundary surface, where the order of the differential equation is increasing with the order of the accuracy. This problem can be dealt with by introducing auxiliary variables at the boundary surface [207, 208].

The starting point for a slightly different approach is an exact nonlocal boundary condition, which involves the convolution with an appropriate integral kernel. A method based on an efficient approximation of this integral kernel is then implemented; see, for instance, [16, 17] for the case of the 2D and 3D flat wave equations and [271, 270, 272] for the Regge-Wheeler [347] and Zerilli [453] equations describing linear gravitational waves on a Schwarzschild background. Although this method is robust, very accurate and stable, it is based on detailed knowledge of the solutions, which might not always be available in more general situations.

In the following, we illustrate some aspects of the problem of constructing absorbing boundary conditions on some simple examples [372]. Specifically, we construct local absorbing boundary conditions for the wave equation with a spherical outer boundary at radius R > 0.

5.3.1 The one-dimensional wave equation

Consider first the one-dimensional case,

$${u_{tt}} - {u_{xx}} = 0,\qquad \vert x\vert < R,\quad t > 0.$$
(5.135)

The general solution is a superposition of a left- and a right-moving solution,

$$u(t,x) = {f_ \nwarrow}(x + t) + {f_ \nearrow}(x - t).$$
(5.136)

Therefore, the boundary conditions

$$({b_ -}u)(t, - R) = 0,\qquad ({b_ +}u)(t, + R) = 0,\qquad {b_ \pm}: = {\partial \over {\partial t}} \pm {\partial \over {\partial x}},\qquad t > 0,$$
(5.137)

are perfectly absorbing according to our terminology. Indeed, the operator b+ has as its kernel the right-moving solutions f↗(x − t); hence, the boundary condition (b+u)(t, R) = 0 at x = R is transparent to these solutions. On the other hand, b+f↖(x + t) = 2f↖′(x + t), which implies that at x = R, the boundary condition requires that f↖(v) = f↖(R) be constant for advanced time v = t + x > R. A similar argument shows that the left boundary condition (b−u)(t, − R) = 0 implies that f↗(−u) = f↗(−R) is constant for retarded time u = t − x > R. Together with initial conditions for u and its time derivative at t = 0 satisfying the compatibility conditions, Eqs. (5.135, 5.137) give rise to a well-posed IBVP. In particular, the solution is identically zero after one crossing time t ≥ 2R for initial data, which are compactly supported inside the interval (−R, R).
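This example is simple enough to discretize directly. The following Python sketch (a minimal illustration, not from the source) evolves the characteristic variables p = ut + ux and r = ut − ux with first-order upwind differences, imposes the absorbing conditions (5.137) by setting the incoming field to zero at each boundary, and confirms that, up to the scheme’s numerical dissipation, the energy vanishes after one crossing time:

    import numpy as np

    R, N = 1.0, 400
    x = np.linspace(-R, R, N + 1)
    dx = x[1] - x[0]
    dt = 0.5 * dx                  # CFL factor 1/2
    lam = dt / dx

    # Compactly supported initial data: u a Gaussian bump, u_t = 0.
    u0 = np.exp(-100.0 * x**2)
    ux = np.gradient(u0, dx)
    p, r = ux.copy(), -ux.copy()   # p = u_t + u_x, r = u_t - u_x at t = 0

    t = 0.0
    while t < 2.5 * R:
        # Upwind update: p propagates with speed -1, r with speed +1.
        p[:-1] += lam * (p[1:] - p[:-1])
        r[1:] -= lam * (r[1:] - r[:-1])
        # Absorbing conditions (5.137): the incoming fields vanish.
        p[-1] = 0.0    # (b_+ u)(t, +R) = 0
        r[0] = 0.0     # (b_- u)(t, -R) = 0
        t += dt

    # Energy int (u_t^2 + u_x^2) dx, with u_t = (p+r)/2 and u_x = (p-r)/2.
    energy = 0.5 * np.sum(p**2 + r**2) * dx
    print(f"energy at t = {t:.2f}: {energy:.2e}")  # ~0 after t = 2R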

5.3.2 The three-dimensional wave equation

Generalizing the previous example to higher dimensions is a nontrivial task. This is due to the fact that there are infinitely many propagation directions for outgoing waves, and not just two as in the one-dimensional case. Ideally, one would like to control all the propagation directions k, which are outgoing at the boundary (k · n > 0, where n is the unit outward normal to the boundary), but this is obviously difficult. Instead, one can try to control specific directions (starting with the one that is normal to the outer boundary). Here, we illustrate the method of [46] on the three-dimensional wave equation,

$${u_{tt}} - \Delta u = 0,\qquad \vert x\vert < R,\quad t > 0.$$
(5.138)

The general solution can be decomposed into spherical harmonics Yℓm according to

$$u(t,r,\vartheta ,\varphi) = {1 \over r}\sum\limits_{\ell = 0}^\infty {\sum\limits_{m = - \ell}^\ell {{u_{\ell m}}}} (t,r){Y^{\ell m}}(\vartheta ,\varphi),$$
(5.139)

which yields the family of reduced equations

$$\left[ {{{{\partial ^2}} \over {\partial {t^2}}} - {{{\partial ^2}} \over {\partial {r^2}}} + {{\ell (\ell + 1)} \over {{r^2}}}} \right]\;{u_{\ell m}}(t,r) = 0,\qquad 0 < r < R,\quad t > 0.$$
(5.140)

For ℓ = 0 this equation reduces to the one-dimensional wave equation, for which the general solution is u00(t,r) = U00↗(r − t) + U00↖(r + t) with U00↗ and U00↖ two arbitrary functions. Therefore, the boundary condition

$${{\mathcal B}_0}:\qquad b(ru){\vert _{r = R}} = 0,\qquad b: = {r^2}\;\left({{\partial \over {\partial t}} + {\partial \over {\partial r}}} \right)\;,\qquad t > 0,$$
(5.141)

is perfectly absorbing for spherical waves. For ℓ ≥ 1, exact solutions can be generated from the solutions for ℓ = 0 by applying suitable differential operators to u00(t,r). For this, we define the operators [92]

$${a_\ell} \equiv {\partial \over {\partial r}} + {\ell \over r}\,,\qquad a_\ell ^\dagger \equiv - {\partial \over {\partial r}} + {\ell \over r},$$
(5.142)

which satisfy the operator identities

$${a_{\ell + 1}}a_{\ell + 1}^\dagger = a_\ell ^\dagger {a_\ell} = - {{{\partial ^2}} \over {\partial {r^2}}} + {{\ell (\ell + 1)} \over {{r^2}}}\;.$$
(5.143)

As a consequence, for each ℓ = 1, 2, 3, …, we have

$$\begin{array}{*{20}c} {\left[ {{{{\partial ^2}} \over {\partial {t^2}}} - {{{\partial ^2}} \over {\partial {r^2}}} + {{\ell (\ell + 1)} \over {{r^2}}}} \right]a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger = \left[ {{{{\partial ^2}} \over {\partial {t^2}}} + a_\ell ^\dagger {a_\ell}} \right]a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;} \\ {= a_\ell ^\dagger \left[ {{{{\partial ^2}} \over {\partial {t^2}}} + a_{\ell - 1}^\dagger {a_{\ell - 1}}} \right]a_{\ell - 1}^\dagger \ldots a_1^\dagger} \\ {= a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger \left[ {{{{\partial ^2}} \over {\partial {t^2}}} - {{{\partial ^2}} \over {\partial {r^2}}}} \right]\;.\quad \;\,} \\ \end{array}$$

Therefore, we have the explicit in- and outgoing solutions

$$\begin{array}{*{20}c} {{u_{\ell m \nwarrow}}(t,r) = a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger {V_{\ell m}}(r + t) = \sum\limits_{j = 0}^\ell {{{(- 1)}^j}} {{(2\ell - j)!} \over {(\ell - j)!\,j!}}{{(2r)}^{j - \ell}}V_{\ell m}^{(j)}(r + t),\;} \\ {{u_{\ell m \nearrow}}(t,r) = a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger {U_{\ell m}}(r - t) = \sum\limits_{j = 0}^\ell {{{(- 1)}^j}} {{(2\ell - j)!} \over {(\ell - j)!\,j!}}{{(2r)}^{j - \ell}}U_{\ell m}^{(j)}(r - t),} \\ \end{array}$$
(5.144)

where Vℓm and Uℓm are arbitrary smooth functions with j’th derivatives \(V_{\ell m}^{(j)}\) and \(U_{\ell m}^{(j)}\), respectively. In order to construct boundary conditions, which are perfectly absorbing for uℓm, one first notices the following identity:

$${b^{\ell + 1}}a_\ell ^\dagger a_{\ell - 1}^\dagger \ldots a_1^\dagger U(r - t) = 0$$
(5.145)

for all ℓ = 0, 1, 2, … and all sufficiently smooth functions U. This identity follows easily from Eq. (5.144) and the fact that \({b^{\ell + 1}}({r^k}) = k(k + 1) \cdots (k + \ell)\,{r^{k + \ell + 1}} = 0\) if k ∈ {0, −1, −2, …, −ℓ}. Therefore, given L ∈ {1, 2, 3, …}, the boundary condition

$${{\mathcal B}_L}:\qquad {b^{L + 1}}(ru){\vert _{r = R}} = 0$$
(5.146)

leaves the outgoing solutions with ℓ ≤ L unaltered. Notice that this condition is local in the sense that its formulation does not require the decomposition of u into spherical harmonics. Based on the Laplace method, it was proven in [46] (see also [369]) that each boundary condition \({{\mathcal B}_L}\) yields a well-posed IBVP. By uniqueness this implies that initial data corresponding to a purely outgoing solution with ℓ ≤ L yields a purely outgoing solution (without reflections). In this sense, the condition \({{\mathcal B}_L}\) is perfectly absorbing for waves with ℓ ≤ L. For waves with ℓ > L, one obtains spurious reflections; however, for monochromatic radiation with wave number k, the corresponding amplitude reflection coefficients can be calculated to decay as \((kR)^{- 2(L + 1)}\) in the wave zone kR ≫ 1 [88]. Furthermore, in most scenarios with smooth solutions, the amplitudes corresponding to the lower few ℓ’s will dominate over the ones with high ℓ, so that reflections from high ℓ’s are unimportant. For a numerical implementation of the boundary condition \({{\mathcal B}_2}\) via spectral methods and a possible application to general relativity see [314].
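The annihilation property (5.145) underlying the hierarchy \({{\mathcal B}_L}\) can also be verified symbolically. The following Python sketch (illustrative) applies the operators of Eqs. (5.141) and (5.142) to an arbitrary outgoing profile U(r − t) and checks that bℓ+1 annihilates the result for the first few values of ℓ:

    import sympy as sp

    r, t = sp.symbols('r t', positive=True)
    U = sp.Function('U')

    def a_dagger(l, f):
        # The operator a_l^dagger of Eq. (5.142).
        return -sp.diff(f, r) + l * f / r

    def b_op(f):
        # The operator b = r^2 (d/dt + d/dr) of Eq. (5.141).
        return r**2 * (sp.diff(f, t) + sp.diff(f, r))

    for l in range(3):
        sol = U(r - t)
        for k in range(1, l + 1):   # build a_l^dagger ... a_1^dagger U(r - t)
            sol = a_dagger(k, sol)
        expr = sol
        for _ in range(l + 1):      # apply b^(l+1)
            expr = b_op(expr)
        assert sp.simplify(sp.expand(expr)) == 0
        print(f"l = {l}: b^{l + 1} annihilates the outgoing solution")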

5.3.3 The wave equation on a curved background

When the background is curved, it is not always possible to construct in- and outgoing solutions explicitly, as in the previous example. Therefore, it is not even clear how a hierarchy of absorbing boundary conditions should be formulated. However, in many applications the spacetime is asymptotically flat, and if the boundary surface is placed sufficiently far from the strong field region, one can assume that the metric is a small deformation of the flat, Minkowski metric. To first order in M/R with M the ADM mass and R the areal radius of the outer boundary, these correction terms are given by those of the Schwarzschild metric, and approximate in- and outgoing solutions for all (ℓ,m) modes can again be computed [372]. The M/R terms in the background metric induce two kinds of corrections in the in- and outgoing solutions uℓm. The first is a curvature correction term, which just adds M/R terms to the coefficients in the sum of Eq. (5.144). This term is local and still obeys Huygens’ principle. The second term is fast decaying (it decays as \((R/r)^{\ell + 1}\)) and describes the backscatter off the curvature of the background. As a consequence, it is nonlocal (it depends on the past history of the unperturbed solution) and violates Huygens’ principle.

By construction, the boundary conditions \({{\mathcal B}_L}\) are perfectly absorbing for outgoing waves with angular momentum number ℓ ≤ L, including their curvature corrections to first order in M/R. If the first-order correction terms responsible for the backscatter are taken into account, then the conditions \({{\mathcal B}_L}\) are no longer perfectly absorbing, but the spurious reflections arising from these correction terms have been estimated in [372] to decay at least as fast as (M/R)(kR)−2 for monochromatic waves with wave number k satisfying M ≪ k−1 ≪ R.

The well-posedness of higher-order absorbing boundary conditions for wave equations on a curved background can be established based on the localization principle and the Laplace method [369]. Some applications to general relativity are discussed in Sections 6 and 10.3.1.

6 Boundary Conditions for Einstein’s Equations

The subject of this section is the discussion of the IBVP for Einstein’s field equations. There are at least three difficulties when formulating Einstein’s equations on a finite domain with artificial outer boundaries. First, as we have seen in Section 4, the evolution equations are subject to constraints, which, in general, propagate with nontrivial characteristic speeds. As a consequence, in general there are incoming constraint fields at the boundary that need to be controlled in order to make sure that the constraints propagate correctly, i.e., that constraint-satisfying initial data yields a solution of the evolution equations and the constraints on the complete computational domain, and not just on its domain of dependence. The control of these incoming constraint fields leads to constraint-preserving boundary conditions, and a nontrivial task is to fit these conditions into one of the admissible boundary conditions discussed in the previous Section 5, for which well-posedness can be shown.

A second issue is the construction of absorbing boundary conditions. Unlike the simple examples considered in Section 5.3, for which the fields evolve on a fixed background and in- and outgoing solutions can be represented explicitly, or at least characterized precisely, in general relativity it is not even clear how to define in- and outgoing gravitational radiation since there are no local expressions for the gravitational energy density and flux. Therefore, the best one can hope for is to construct boundary conditions, which approximately control the incoming gravitational radiation in certain regimes, like, for example, in the weak field limit where the field equations can be linearized around, say, a Schwarzschild or Minkowski spacetime.

Finally, the third issue is related to the diffeomorphism invariance of the theory. Ideally, one would like to formulate a geometric version of the IBVP, for which the data given on the initial and boundary surfaces Σ0 and \({\mathcal T}\) can be characterized in terms of geometric quantities such as the first and second fundamental forms of these surfaces as embedded in the yet unknown spacetime (M, g). In particular, this means that one should be able to identify equivalent data sets, i.e., those which are related to each other by a diffeomorphism of M, leaving Σ0 and \({\mathcal T}\) invariant, by local transformations on Σ0 and \({\mathcal T}\), without knowing the solution (M, g). It is currently not even clear if such a geometric uniqueness property does exist; see [186, 355] for further discussions on these points.

A well-posed IBVP for Einstein’s vacuum field equations was first formulated by Friedrich and Nagy [187] based on a tetrad formalism, which incorporates the Weyl curvature tensor as an independent field. This formulation exploits the freedom of choosing local coordinates and the tetrad orientation in order to impose very precise gauge conditions, which are adapted to the boundary surface \({\mathcal T}\) and tailored to the IBVP. These gauge conditions, together with a suitable modification of the evolution equations for the Weyl curvature tensor using the constraints (cf. Example 32), lead to a first-order symmetric hyperbolic system in which all the constraint fields propagate tangentially to \({\mathcal T}\) at the boundary. As a consequence, no constraint-preserving boundary conditions need to be specified, and the only incoming fields are related to the gravitational radiation, at least in the context of the approximations mentioned above. With this, the problem can be shown to be well posed using the techniques described in Section 5.2.

After the pioneering work of [187], there has been much effort in formulating a well-posed IBVP for metric formulations of general relativity, on which most numerical calculations are based. However, with the exception of particular cases in spherical symmetry [249], the linearized field equations [309] or the restriction to flat, totally reflecting boundaries [404, 405, 106, 98, 219, 220, 410, 29, 15], not much progress had been made towards obtaining a manifestly well-posed IBVP with nonreflecting, constraint-preserving boundary conditions. The difficulties encountered were similar to those described in Examples 31 and 32. Namely, controlling the incoming constraint fields usually resulted in boundary conditions for the main system involving either derivatives of its characteristic fields or fields propagating with zero speed, when it was written in first-order symmetric hyperbolic form. Therefore, the theory of maximal dissipative boundary conditions could not be applied in these attempts. Instead, boundary conditions controlling the incoming characteristic constraint fields were specified and combined with more or less ad hoc conditions controlling the gauge and gravitational degrees of freedom and verified to satisfy the Lopatinsky condition (5.27) using the Laplace method; see [395, 108, 378, 220, 363, 368].

The breakthrough in the metric case came with the work by Kreiss and Winicour [267], who formulated a well-posed IBVP for the linearized Einstein vacuum field equations with harmonic coordinates. Their method is based on the pseudo-differential first-order reduction of the wave equation described in Section 5.1.3, which, when combined with Sommerfeld boundary conditions, yields a problem, which is strongly well posed in the generalized sense and, when applied to systems of equations, allows a certain hierarchical coupling in the boundary conditions. This work was then generalized to shifted wave equations and higher-order absorbing boundary conditions in [369]. Later, it was recognized that the results in [267] could also be established based on the usual a priori energy estimates obtained from integration by parts [263]. Finally, it was found that the boundary conditions imposed were actually maximal dissipative for a specific nonstandard class of first-order symmetric hyperbolic reductions of the wave system; see Section 5.2.1. Unlike the reductions considered in earlier work, this nonstandard class has the property that the boundary surface is noncharacteristic, which implies that no zero speed fields are present, and yields a strongly well-posed system. Based on this reduction and the theory of quasilinear symmetric hyperbolic formulations with maximal dissipative boundary conditions [218, 388], it was possible to extend the results in [267, 263] and formulate a well-posed IBVP for quasilinear systems of wave equations [264] with a certain class of boundary conditions (see Theorem 8), which was sufficiently flexible to treat the Einstein equations. Furthermore, the new reduction also offers the interesting possibility of extending the proof to the discretized case using finite difference operators satisfying the summation by parts property, discussed in Sections 8.3 and 9.4.

In order to parallel the presentation in Section 4, here we focus on the IBVP for Einstein’s equations in generalized harmonic coordinates and the IBVP for the BSSN system. The first case, which is discussed in Section 6.1, is an application of Theorem 8. In the BSSN case, only partial results have been obtained so far, but since the BSSN system is widely used, we nevertheless present some of these results in Section 6.2. In Section 6.3 we discuss some of the problems encountered when trying to formulate a geometric uniqueness theorem and, finally, in Section 6.4 we briefly mention alternative approaches to the IBVP, which do not require an artificial boundary.

For an alternative approach to treating the IBVP, which is based on the imposition of the Gauss-Codazzi equations at \({\mathcal T}\), see [191, 192, 194, 193]. For numerical studies, see [249, 104, 40, 404, 405, 98, 287, 244, 378, 253, 61, 362, 35, 33, 368, 57, 56], especially [366] and [369] for a comparison between different boundary conditions used in numerical relativity and [365] for a numerical implementation of higher-order absorbing boundary conditions. For review articles on the IBVP in general relativity, see [372, 355, 435].

At present, there are no numerical simulations that are based directly on the well-posed IBVP for the tetrad formulation [187] or the well-posed IBVP for the harmonic formulation [267, 263, 264] described in Section 6.1, nor is there a numerical implementation of the constraint-preserving boundary conditions for the BSSN system presented in Section 6.2. The closest example is the harmonic approach described in [286, 363, 366], which has been shown to be well posed in the generalized sense in the high-frequency limit [369]. However, as mentioned above, the well-posed IBVP in [264] opens the door for a numerical discretization based on the energy method, which can be proven to be stable, at least in the linearized case.

6.1 The harmonic formulation

Here, we discuss the IBVP formulated in [264] for the Einstein vacuum equations in generalized harmonic coordinates. The starting point is a manifold of the form M = [0, T] × Σ, with Σ a three-dimensional compact manifold with C∞-boundary ∂Σ, and a given, fixed smooth background metric \({\overset \circ g _{\alpha \beta}}\) with corresponding Levi-Civita connection \(\overset \circ \nabla\), as in Section 4.1. We assume that the time slices Σt:= {t} × Σ are space-like and that the boundary surface \({\mathcal T}: = [0,T] \times \partial \Sigma\) is time-like with respect to \({\overset \circ g_{\alpha \beta}}\).

In order to formulate the boundary conditions, we first construct a null tetrad \(\{{K^\mu},{L^\mu},{Q^\mu},{{\bar Q}^\mu}\}\), which is adapted to the boundary. This null tetrad is based on the choice of a future-directed time-like vector field Tμ tangent to \({\mathcal T}\), which is normalized such that gμνTμTν = −1. One possible choice is to tie Tμ to the foliation Σt, and then define it in the direction orthogonal to the cross sections {t} × ∂Σ of the boundary surface. A more geometric choice has been proposed in [186], where instead Tμ is chosen as a distinguished future-directed time-like eigenvector of the second fundamental form of \({\mathcal T}\), as embedded in (M, g). Next, we denote by Nμ the unit outward normal to \({\mathcal T}\) with respect to the metric gμν and complete Tμ and Nμ to an orthonormal basis {Tμ, Nμ, Vμ, Wμ} of TpM at each point \(p \in {\mathcal T}\). Then, we define the complex null tetrad by

$${K^\mu}: = {T^\mu} + {N^\mu},\qquad {L^\mu}: = {T^\mu} - {N^\mu},\qquad {Q^\mu}: = {V^\mu} + i\,{W^\mu},\qquad {\bar Q^\mu}: = {V^\mu} - i\,{W^\mu},$$
(6.1)

where \(i = \sqrt {- 1}\). Notice that the construction of these vectors is implicit, since it depends on the dynamical metric gαβ, which is yet unknown. However, the dependency is algebraic, and does not involve any derivatives of gαβ. We also note that the complex null vector Qμ is not unique, since it can be rotated by an angle φ ∈ ℝ, \({Q^\mu} \mapsto {e^{i\varphi}}{Q^\mu}\). Finally, we define a radial function r on \({\mathcal T}\) as the areal radius of the cross sections {t} × ∂Σ with respect to the background metric.
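The algebra of this tetrad is straightforward to make explicit. The following Python sketch (a flat-space illustration with one particular choice of orthonormal frame, used here only for concreteness) constructs {Kμ, Lμ, Qμ, Q̄μ} as in Eq. (6.1) and checks that the only nonvanishing contractions are g(K, L) = −2 and g(Q, Q̄) = 2:

    import numpy as np

    g = np.diag([-1.0, 1.0, 1.0, 1.0])   # background metric (flat example)
    T = np.array([1.0, 0.0, 0.0, 0.0])   # future-directed, g(T,T) = -1
    N = np.array([0.0, 1.0, 0.0, 0.0])   # unit outward normal
    V = np.array([0.0, 0.0, 1.0, 0.0])
    W = np.array([0.0, 0.0, 0.0, 1.0])

    K, L = T + N, T - N
    Q, Qbar = V + 1j * W, V - 1j * W

    dot = lambda a, b: a @ g @ b
    checks = [("K.K", dot(K, K), 0), ("L.L", dot(L, L), 0),
              ("Q.Q", dot(Q, Q), 0), ("K.L", dot(K, L), -2),
              ("Q.Qbar", dot(Q, Qbar), 2), ("K.Q", dot(K, Q), 0),
              ("L.Q", dot(L, Q), 0)]
    for name, val, expected in checks:
        assert abs(val - expected) < 1e-14, name
    print("null tetrad relations of Eq. (6.1) verified")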

Then, the boundary conditions, which were proposed in [264] for the harmonic system (4.5), are:

$${\left. {{{\overset \circ \nabla}_K}{h_{KK}} + {2 \over r}{h_{KK}}} \right\vert _{\mathcal T}} = {q_K},$$
(6.2)
$${\left. {{{\overset \circ \nabla}_K}{h_{KL}} + {1 \over r}({h_{KL}} + {h_{Q\bar Q}})} \right\vert _{\mathcal T}} = {q_L},$$
(6.3)
$${\left. {{{\overset \circ \nabla}_K}{h_{KQ}} + {2 \over r}{h_{KQ}}} \right\vert _{\mathcal T}} = {q_Q},$$
(6.4)
$${\left. {{{\overset \circ \nabla}_{K}}{{h}_{QQ}} - {{\overset \circ \nabla}_{Q}}{{h}_{QK}}} \right\vert _{\mathcal T}} = {q_{QQ}},$$
(6.5)
$${\left. {{{\overset \circ \nabla}_K}{h_{Q\bar Q}} + {{\overset \circ \nabla}_L}{h_{KK}} - {{\overset \circ \nabla}_Q}{h_{K\bar Q}} - {{\overset \circ \nabla}_{\bar Q}}{h_{KQ}}} \right\vert _{\mathcal T}} = {\left. {2{H_K}} \right\vert _{\mathcal T}},$$
(6.6)
$${\left. {{{\overset \circ \nabla}_K}{h_{LQ}} + {{\overset \circ \nabla}_L}{h_{KQ}} - {{\overset \circ \nabla}_Q}{h_{KL}} - {{\overset \circ \nabla}_{\bar Q}}{h_{QQ}}} \right\vert _{\mathcal T}} = {\left. {2{H_Q}} \right\vert _{\mathcal T}},$$
(6.7)
$${\left. {{{\overset \circ \nabla}_K}{h_{LL}} + {{\overset \circ \nabla}_L}{h_{Q\bar Q}} - {{\overset \circ \nabla}_Q}{h_{L\bar Q}} - {{\overset \circ \nabla}_{\bar Q}}{h_{LQ}}} \right\vert _{\mathcal T}} = {\left. {2{H_L}} \right\vert _{\mathcal T}},$$
(6.8)

where \({\overset \circ \nabla _K}{h_{LQ}}: = {K^\mu}{L^\alpha}{Q^\beta}{\overset \circ \nabla _\mu}{h_{\alpha \beta}},{h_{KL}}: = {K^\alpha}{L^\beta}{h_{\alpha \beta}},{H_K}: = {K^\mu}{H_\mu}\), etc., and where qK and qL are real-valued given smooth functions on \({\mathcal T}\) and qQ and qQQ are complex-valued given smooth functions on \({\mathcal T}\). Since Q is complex, these constitute ten real boundary conditions for the metric coefficients hαβ. The content of the boundary conditions (6.2, 6.3, 6.4, 6.5) can be clarified by considering linearized gravitational waves on a Minkowski background with a spherical boundary. The analysis in [264] shows that in this context the four real conditions (6.2, 6.3, 6.4) are related to the gauge freedom, and the two conditions (6.5) control the gravitational radiation. The remaining conditions (6.6, 6.7, 6.8) enforce the constraint Cμ = 0 on the boundary, see Eq. (4.6), and so together with the constraint propagation system (4.14) and the initial constraints (4.15) they guarantee that the constraints are correctly propagated. Based on these observations, it is expected that these boundary conditions yield small spurious reflections in the case of a nearly-spherical boundary in the wave zone of an asymptotically-flat curved spacetime.

6.1.1 Well-posedness of the IBVP

The IBVP consisting of the harmonic Einstein equations (4.5), initial data (4.7) and the boundary conditions (6.26.8) can be shown to be well posed as an application of Theorem 8. For this, we first notice that the evolution equations (4.5) have the required form of Eq. (5.113), where E is the vector bundle of symmetric, covariant tensor fields hμν on M. Next, the boundary conditions can be written in the form of Eq. (5.115) with α = 1. In order to compute the matrix coefficients \({c^\mu}{^A_B}\), it is convenient to decompose hμν = hAeAμν in terms of the basis vectors

$$\begin{array}{*{20}c} {{e_{1\,\alpha \beta}}: = {K_\alpha}{K_\beta},\quad {e_{2\,\alpha \beta}}: = - 2{K_{(\alpha}}{{\bar Q}_{\beta)}},\quad {e_{3\,\alpha \beta}}: = - 2{K_{(\alpha}}{Q_{\beta)}},\quad {e_{4\,\alpha \beta}}: = 2{Q_{(\alpha}}{{\bar Q}_{\beta)}},} \\ {{e_{5\,\alpha \beta}}: = {{\bar Q}_\alpha}{{\bar Q}_\beta},\quad {e_{6\,\alpha \beta}}: = {Q_\alpha}{Q_\beta}\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {{e_{7\,\alpha \beta}}: = - 2{L_{(\alpha}}{{\bar Q}_{\beta)}},\quad {e_{8\,\alpha \beta}}: = - 2{L_{(\alpha}}{Q_{\beta)}},\quad {e_{9\,\alpha \beta}}: = 2{K_{(\alpha}}{L_{\beta)}},\quad {e_{10\,\alpha \beta}}: = {L_\alpha}{L_\beta},} \\ \end{array}$$

with \({h^1} = {h_{LL}}/4,\,{h^2} = {{\bar h}^3} = {h_{LQ}}/4,\,{h^4} = {h_{Q\bar Q}}/4,\,{h^5} = {{\bar h}^6} = {h_{QQ}}/4,\,{h^7} = {{\bar h}^8} = {h_{KQ}}/4,\,{h^9} = {h_{KL}}/4,\,{h^{10}} = {h_{KK}}/4\). With respect to this basis, the only nonzero matrix coefficients are

$$\begin{array}{*{20}c} {{c^{\mu \,1}}_2 = {{\bar Q}^\mu},} & {{c^{\mu \,1}}_3 = {Q^\mu},} & {{c^{\mu \,1}}_4 = - {L^\mu},} \\ {{c^{\mu \,2}}_5 = {{\bar Q}^\mu},} & {{c^{\mu \,2}}_7 = - {L^\mu},} & {{c^{\mu \,2}}_9 = {Q^\mu},} \\ {{c^{\mu \,3}}_6 = {Q^\mu},} & {{c^{\mu \,3}}_8 = - {L^\mu},} & {{c^{\mu \,3}}_9 = {{\bar Q}^\mu},} \\ {{c^{\mu \,4}}_7 = {{\bar Q}^\mu},} & {{c^{\mu \,4}}_8 = {Q^\mu},} & {{c^{\mu \,4}}_{10} = - {L^\mu},} \\ {{c^{\mu \,5}}_7 = {Q^\mu},} & {{c^{\mu \,6}}_8 = {{\bar Q}^\mu},} & {} \\ \end{array}$$

which has the required upper triangular form with zeros on the diagonal. Therefore, the hypotheses of Theorem 8 are verified, and one obtains a well-posed IBVP for Einstein’s equations in harmonic coordinates.
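As a quick sanity check (illustrative), the following Python lines confirm that every nonzero coefficient listed above has B > A, which is exactly the triangularity condition (5.117) required by Theorem 8:

    # Nonzero entries (A, B) of c^{mu A}_B read off from the table above.
    nonzero = [(1, 2), (1, 3), (1, 4),
               (2, 5), (2, 7), (2, 9),
               (3, 6), (3, 8), (3, 9),
               (4, 7), (4, 8), (4, 10),
               (5, 7), (6, 8)]
    assert all(B > A for A, B in nonzero)
    print("c^{mu A}_B = 0 for B <= A: hierarchy condition satisfied")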

This result also applies to the modified system (4.16), since the constraint damping terms, which are added, modify neither the principal part of the main evolution system nor that of the constraint propagation system.

6.2 Boundary conditions for BSSN

Here we discuss boundary conditions for the BSSN system (4.52–4.59), which is used extensively in numerical calculations of spacetimes describing dynamic black holes and neutron stars. Unfortunately, to date, this system lacks an initial-boundary value formulation for which well-posedness in the full nonlinear case has been proven. Without doubt, the reason for this lies in the structure of the evolution equations, which are mixed first/second order in space, and whose principal part is much more complicated than in the harmonic case, where one deals with a system of wave equations.

A first step towards formulating a well-posed IBVP for the BSSN system was performed in [52], where the evolution equations (4.52, 4.53, 4.56–4.59) with a fixed shift and the relation f = μ ≡ (4m − 1)/3 were reduced to a first-order symmetric hyperbolic system. Then, a set of six boundary conditions consistent with this system could be formulated based on the theory of maximal dissipative boundary conditions. Although this gives rise to a well-posed IBVP, the boundary conditions specified in [52] are not compatible with the constraints, and therefore, one does not necessarily obtain a solution to the full set of Einstein’s equations beyond the domain of dependence of the initial data surface. In a second step, constraint-preserving boundary conditions for BSSN with a fixed shift were formulated in [220], and cast into maximal dissipative form for the linearized system (see also [15]). However, even at the linearized level, these boundary conditions are too restrictive because they constitute a combination of Dirichlet and Neumann boundary conditions on the metric components, and in this sense they are totally reflecting instead of absorbing. More general constraint-preserving boundary conditions were also considered in [220] and, based on the Laplace method, they were shown to satisfy the Lopatinsky condition (5.27).

Radiative-type constraint-preserving boundary conditions for the BSSN system (4.52–4.59) with dynamical lapse and shift were formulated in [315] and shown to yield a well-posed IBVP in the linearized case. The assumptions on the parameters in this formulation are m = 1, f > 0, κ = 4GH/3 > 0, f ≠ κ, which guarantee that the BSSN system is strongly hyperbolic, and as long as e ≠ 2α, they allow for the gauge conditions (4.62, 4.63) used in recent numerical calculations, where f = 2/α and κ = e/α2; see Section 4.3.1. In the following, we describe this IBVP in more detail. First, we notice that the analysis in Section 4.3.1 reveals that for the standard choice m = 1 the characteristic speeds with respect to the unit outward normal si to the boundary are

$${\beta ^s},\qquad {\beta ^s} \pm \alpha ,\qquad {\beta ^s} \pm \alpha \,\sqrt f ,\qquad {\beta ^s} \pm \alpha \,\sqrt {GH} ,\qquad {\beta ^s} \pm \alpha \,\sqrt \kappa ,$$
(6.9)

where βs = βisi is the normal component of the shift. According to the theory described in Section 5, it is the sign of these speeds which determines the number of incoming fields and boundary conditions that must be specified. Namely, the number of boundary conditions is equal to the number of characteristic fields with positive speed. Assuming ∣βs∣ is small enough such that \(\vert {\beta ^s}/\alpha \vert < \min \{1,\sqrt f, \sqrt {GH}, \sqrt \kappa \}\), which is satisfied asymptotically if βs → 0 and α → 1, it is the sign of the normal component of the shift which determines the number of boundary conditions. Therefore, in order to keep the number of boundary conditions fixed throughout evolution, one has to ensure that either βs > 0 or βs ≤ 0 at the boundary surface. If the condition βs → 0 is imposed asymptotically, the most natural choice is to set the normal component of the shift to zero at the boundary, βs = 0 at \({\mathcal T}\). The analysis in [52] then reveals that there are precisely nine incoming characteristic fields at the boundary, and thus, nine conditions have to be imposed at the boundary. These nine boundary conditions are as follows:

  • Boundary conditions on the gauge variables

    There are four conditions that must be imposed on the gauge functions, namely the lapse and shift. These conditions are motivated by the linearized analysis, where the gauge propagation system, consisting of the evolution equations for lapse and shift obtained from the BSSN equations (4.52–4.55, 4.59), decouples from the remaining evolution equations. Surprisingly, this gauge propagation system can be cast into symmetric hyperbolic form [315], for which maximal dissipative boundary conditions can be specified, as described in Section 5.2. It is remarkable that the gauge propagation system has such a nice mathematical structure, since the equations (4.52, 4.54, 4.55) have been specified by hand and mostly motivated by numerical experiments instead of mathematical analysis.

    In terms of the operator \({\Pi ^i}_j = {\delta ^i}_j - {s^i}{s_j}\) projecting onto vectors tangential to the boundary, the four conditions on the gauge variables can be written as

    $${s^i}{\partial _i}\alpha = 0,$$
    (6.10)
    $${\beta ^s} = 0,$$
    (6.11)
    $${\Pi ^i}_j\,\left({{\partial _t} + {{\sqrt {3\kappa}} \over 2}{s^k}{\partial _k}} \right){\beta ^j} = {\kappa \over {f - \kappa}}{\Pi ^i}_j\,{\tilde \gamma ^{jk}}{\partial _k}\alpha .$$
    (6.12)

    Eq. (6.10) is a Neumann boundary condition on the lapse, and Eq. (6.11) sets the normal component of the shift to zero, as explained above. Geometrically, this implies that the boundary surface \({\mathcal T}\) is orthogonal to the time slices Σt. The other two conditions in Eq. (6.12) are Sommerfeld-like boundary conditions involving the tangential components of the shift and the tangential derivatives of the lapse; they arise from the analysis of the characteristic structure of the gauge propagation system. An alternative to Eq. (6.12) also described in [315] is to set the tangential components of the shift to zero, which, together with Eq. (6.11) is equivalent to setting βi = 0 at the boundary. This alternative may be better suited for IBVP with non-smooth boundaries, such as cubes, where additional compatibility conditions must be enforced at the edges.

  • Constraint-preserving boundary conditions

    Next, there are three conditions requiring that the momentum constraint be satisfied at the boundary. In terms of the BSSN variables this implies

    $${\tilde D^j}{\tilde A_{ij}} - {2 \over 3}{\tilde D_i}K + 6{\tilde A_{ij}}{\tilde D^j}\phi = 8\pi {G_N}{j_i}.$$
    (6.13)

    As shown in [315], Eq. (6.13) yields homogeneous maximal dissipative boundary conditions for a symmetric hyperbolic first-order reduction of the constraint propagation system (4.74, 4.75, 4.76). Since this system is also linear and its boundary matrix has constant rank if βs = 0, it follows from Theorem 7 that the propagation of constraint violations is governed by a well-posed IBVP. This implies, in particular, that solutions whose initial data satisfy the constraints exactly automatically satisfy the constraints on each time slice Σt. Furthermore, small initial constraint violations, which are usually present in numerical applications, yield solutions for which the growth of the constraint violations can be bounded in terms of the initial violations.

  • Radiation controlling boundary conditions

    Finally, the last two boundary conditions are intended to control the incoming gravitational radiation, at least approximately, and specify the complex Weyl scalar Ψ0, cf. Example 32. In order to describe this boundary condition we first define the quantities \({{\bar {\mathcal E}}_{ij}}: = {{\tilde R}_{ij}} + R_{ij}^\phi + {e^{4\phi}}({1 \over 3}K{{\tilde A}_{ij}} - {{\tilde A}_{il}}\tilde A_j^l) - 4\pi {G_N}{\sigma _{ij}}\) and \({{\bar {\mathcal B}}_{kij}}: = {e^{4\phi}}\left[ {{{\tilde D}_k}{{\tilde A}_{ij}} - 4\left({{{\tilde D}_{(i}}\phi} \right){{\tilde A}_{j)k}}} \right]\), which determine the electric and magnetic parts of the Weyl tensor through \({E_{ij}} = {{\bar {\mathcal E}}_{ij}} - {1 \over 3}{\gamma _{ij}}{\gamma ^{kl}}{{\bar {\mathcal E}}_{kl}}\) and \({B_{ij}} = {\varepsilon _{kl(i}}{{\bar {\mathcal B}}^{kl}}{\,_{j)}}\), respectively. Here, εkij denotes the volume form with respect to the three metric γij. In terms of the operator \({P^{ij}}_{lm} = {\Pi ^i}_{(l}{\Pi ^j}_{m)} - {1 \over 2}{\Pi ^{ij}}{\Pi _{lm}}\) projecting onto symmetric trace-less tangential tensors to the boundary, the boundary condition reads

    $${P^{ij}}_{lm}{\bar {\mathcal E} _{ij}} + \left({{s^k}{P^{ij}}_{lm} - {s^i}{P^{kj}}_{lm}} \right){\bar {\mathcal B} _{kij}} = {P^{ij}}_{lm}{G_{ij}},$$
    (6.14)

    with Gij a given smooth tensor field on the boundary surface \({\mathcal T}\). The relation between Gij and Ψ0 is the following: if \(n = {\alpha ^{- 1}}({\partial _t} - {\beta ^i}{\partial _i})\) denotes the future-directed unit normal to the time slices, we may construct an adapted Newman-Penrose null tetrad \(\{K,L,Q,\bar Q\}\) at the boundary by defining K := n + s, L := n − s, and by choosing Q to be a complex null vector orthogonal to K and L, normalized such that \({Q^\mu}{{\bar Q}_\mu} = 2\). Then, we have Ψ0 = (Ekl − iBkl)QkQl = GklQkQl. For typical applications involving the modeling of isolated systems one may set Gij to zero. However, this is in general not compatible with the initial data (see the discussion in Section 10.3); an alternative is then to freeze the value of Gij to the one computed from the initial data.

    The boundary condition (6.14) can be partially motivated by considering an isolated system, which, globally, is described by an asymptotically-flat spacetime. Therefore, if the outer boundary is placed far enough away from the strong field region, one may linearize the field equations on a Minkowski background to a first approximation. In this case, one is in the same situation as in Example 32, where the Weyl scalar Ψ0 is an outgoing characteristic field when constructed from the adapted null tetrad. Furthermore, one can also appeal to the peeling behavior of the Weyl tensor [328], in which Ψ0 is the fastest decaying component along outgoing null geodesics and describes the incoming radiation at past null infinity. While Ψ0 can only be defined in an unambiguous way at null infinity, where a preferred null tetrad exists, the boundary condition (6.14) has been successfully numerically implemented and tested for truncated domains with artificial boundaries in the context of the harmonic formulation; see, for example, [366]. Estimates on the amount of spurious reflection introduced by this condition have also been derived in [88, 89]; see also [135].
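To make the tetrad dependence of this prescription concrete, the following Python sketch (illustrative; the electric and magnetic parts are random symmetric trace-free tensors and the frame is an arbitrary flat-space choice) evaluates Ψ0 = (Ekl − iBkl)QkQl and confirms that the spin rotation Q ↦ eiφQ, discussed in Section 6.3 below, only changes Ψ0 by a phase e2iφ:

    import numpy as np

    s = np.array([1.0, 0.0, 0.0])   # unit outward normal
    V = np.array([0.0, 1.0, 0.0])   # tangential frame completing s
    W = np.array([0.0, 0.0, 1.0])
    Q = V + 1j * W
    assert abs(Q @ s) == 0          # Q is tangential to the boundary

    rng = np.random.default_rng(0)
    def tracefree_symmetric():
        A = rng.standard_normal((3, 3))
        A = 0.5 * (A + A.T)
        return A - np.trace(A) / 3.0 * np.eye(3)

    E, B = tracefree_symmetric(), tracefree_symmetric()

    Psi0 = np.einsum('kl,k,l->', E - 1j * B, Q, Q)
    print("Psi_0 =", Psi0)

    # Q -> exp(i*phi) Q changes Psi_0 only by the phase exp(2i*phi).
    phi = 0.7
    Qrot = np.exp(1j * phi) * Q
    Psi0_rot = np.einsum('kl,k,l->', E - 1j * B, Qrot, Qrot)
    assert np.isclose(Psi0_rot, np.exp(2j * phi) * Psi0)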

6.3 Geometric existence and uniqueness

The results mentioned so far concerning the well-posed IBVP for Einstein’s field equations in the tetrad formulation of [187], in the metric formulation with harmonic coordinates described in Section 6.1, or in the linearized BSSN formulation described in Section 6.2 allow one, from the PDE point of view, to construct unique solutions on a manifold of the form M = [0, T] × Σ, given appropriate initial and boundary data. However, since general relativity is a diffeomorphism invariant theory, one needs to pose the IBVP from a geometric perspective. In particular, the following questions arise, which, for simplicity, we only formulate for the vacuum case:

  • Geometric existence. Let (M, g) be any smooth solution of Einstein’s vacuum field equations on the manifold M = [0, T] × Σ corresponding to initial data (h, k) on Σ0 and boundary data ψ on \({\mathcal T}\), where h and k represent, respectively, the first and second fundamental forms of the initial surface Σ0 as embedded in (M, g). Is it possible to reproduce this solution with any of the well-posed IBVP mentioned so far, at least on a submanifold M′ = [0, T′] × Σ with 0 < T′ ≤ T? That is, does there exist initial data f and boundary data q for this IBVP and a diffeomorphism ϕ: M′ → ϕ(M′) ⊂ M, which leaves Σ0 and \({{\mathcal T}\prime}\) invariant, such that the metric constructed from this IBVP is equal to ϕ*g on M′?

  • Geometric uniqueness. Is the solution (M, g) uniquely determined by the data (h, k, ψ)? Given a well-posed IBVP for which geometric existence holds, the question about geometric uniqueness can be reduced to the analysis of this particular IBVP in the following way: let u1 and u2 be two solutions of the IBVP on the manifold M = [0, T] × Σ with corresponding data (f1, q1) and (f2, q2). Suppose the two solutions induce the same data (h, k) on Σ0 and ψ on \({\mathcal T}\). Does there exist a diffeomorphism ϕ: M′ = [0, T′] × Σ → ϕ(M′) ⊂ M, which leaves Σ0 and \({{\mathcal T}\prime}\) invariant, such that the metrics g1 and g2 corresponding to u1 and u2 are related to each other by g2 = ϕ*g1 on M′?

These geometric existence and uniqueness problems have been solved in the context of the Cauchy problem without boundaries; see [127] and Section 4.1.3. However, when boundaries are present, several new difficulties appear as pointed out in [186]; see also [187, 184]:

  1. (i)

    It is a priori not clear what the boundary data ψ should represent geometrically. Unlike the case of the initial surface, where the data represents the first and second fundamental forms of Σ0 as a spatial surface embedded in the constructed spacetime (M, g), it is less clear what the geometric meaning of ψ should be since it is restricted by the characteristic structure of the evolution equations, as discussed in Section 5.

  (ii) The boundary data (qK, qL, qQ, qQQ) in the boundary conditions (6.2, 6.3, 6.4, 6.5) for the harmonic formulation and the boundary data Gij in the boundary condition (6.14) for the BSSN formulation ultimately depend on the specific choice of a future-directed time-like vector field T at the boundary surface \({\mathcal T}\). Together with the unit outward normal N to \({\mathcal T}\), this vector defines the preferred null directions K = T + N and L = T − N, which are used to construct the boundary-adapted null tetrad in the harmonic case and the projection operators \({\Pi ^\mu}_\nu = {\delta ^\mu}_\nu + {T^\mu}{T_\nu} - {N^\mu}{N_\nu}\) and \({P^{\mu \nu}}_{\alpha \beta} = {\Pi ^\mu}_\alpha {\Pi ^\nu}_\beta - {1 \over 2}{\Pi ^{\mu \nu}}{\Pi _{\alpha \beta}}\) in the BSSN one. Although it is tempting to define T as the unit, future-directed time-like vector tangent to \({\mathcal T}\), which is orthogonal to the cross sections Σt, this definition would depend on the particular foliation Σt the formulation is based on, and so the resulting vector T would be gauge-dependent. A similar issue arises in the tetrad formulation of [187].

  (iii) When addressing the geometric uniqueness issue, an interesting question is whether or not it is possible to determine from the data sets (f1, q1) and (f2, q2) alone if they are equivalent in the sense that their solutions u1 and u2 induce the same geometric data (h, k, ψ). Therefore, the question is whether or not one can identify equivalent data sets by considering only transformations on the initial and boundary surfaces Σ0 and \({\mathcal T}\), without knowing the solutions u1 and u2.

Although a complete answer to these questions remains a difficult task, there has been some recent progress towards their understanding. In [186] a method was proposed to geometrically single out a preferred time direction T at the boundary surface \({\mathcal T}\). This is done by considering the trace-free part of the second fundamental form, and proving that under certain conditions, which are stable under perturbations, the corresponding linear map on the tangent space possesses a unique time-like eigenvector. Together with the unit outward normal vector N, the vector field T defines a distinguished adapted null tetrad at the boundary, from which geometrically meaningful boundary data could be defined. For instance, the complex Weyl scalar Ψ0 can then be defined as the contraction Ψ0 = CαβγδKαQβKγQδ of the Weyl tensor Cαβγδ associated with the metric gμν along the null vectors K and Q. This definition is unique up to the usual spin rotational freedom \(Q \mapsto {e^{i\varphi}}Q\), and therefore the Weyl scalar Ψ0 is a good candidate for forming part of the boundary data ψ.

In [355] it was suggested that the unique specification of a vector field T may not be a fundamental problem, but rather the manifestation of the inability to specify a non-incoming radiation condition correctly. In the linearized case, for example, setting to zero the Weyl scalar Ψ0 computed from the boundary-adapted tetrad is transparent to gravitational plane waves traveling along the specific null direction K = T + N, see Example 32, but it induces spurious reflections for outgoing plane waves traveling in other null directions. Therefore, a genuine non-incoming radiation condition should, in fact, be independent of any specific null or time-like direction at the boundary, and can only depend on the normal vector N. This is indeed the case for much simpler systems like the scalar wave equation on a Minkowski background [153], where perfectly absorbing boundary conditions are formulated as a nonlocal condition, which is independent of a preferred time direction at the boundary.

Aside from controlling the incoming gravitational degrees of freedom, the boundary data ψ should also comprise information related to the geometric evolution of the boundary surface. In [187] this was achieved by specifying the mean curvature of \({\mathcal T}\) as part of the boundary data. In the harmonic formulation described in Section 6.1 this information is presumably contained in the functions qK, qL and qQ, but their geometric interpretation is not clear.

In order to illustrate some of the issues related to the geometric existence and uniqueness problem in a simpler context, in what follows we analyze the IBVP for linearized gravitational waves propagating on a Minkowski background. Before analyzing this case, however, we make two remarks. First, it should be noted [186] that the geometric uniqueness problem, especially an understanding of point (iii), also has practical interest, since in long-term evolutions the gauge may threaten to break down at some point, requiring a redefinition. The second remark concerns the formulation of the Einstein IBVP in generalized harmonic coordinates, described in Sections 4.1 and 6.1, where general covariance was maintained by introducing a background metric \({\overset \circ g _{\mu \nu}}\) on the manifold M. IBVPs based on this approach have been formulated in [369] and [264] and further developed in [434] and [433]. However, one has to emphasize that this approach does not automatically solve the geometric existence and uniqueness problems described here: although it is true that the IBVP is invariant with respect to any diffeomorphism ϕ: M → M, which acts on the dynamical and the background metric at the same time, the question of the dependency of the solution on the background metric remains.

6.3.1 Geometric existence and uniqueness in the linearized case

Here we analyze some of the geometric existence and uniqueness issues of the IBVP for Einstein’s field equations in the much simpler setting of linearized gravity on Minkowski space, where the vacuum field equations reduce to

$$- {\nabla ^\mu}{\nabla _\mu}{h_{\alpha \beta}} - {\nabla _\alpha}{\nabla _\beta}h + 2{\nabla ^\mu}{\nabla _{(\alpha}}{h_{\beta)\mu}} = 0,$$
(6.15)

where hαβ denotes the first variation of the metric, h:= ηαβhαβ its trace with respect to the Minkowski background metric ηαβ, and ∇μ is the covariant derivative with respect to ηαβ. An infinitesimal coordinate transformation parametrized by a vector field ξμ induces the transformation

$${h_{\alpha \beta}} \mapsto {\tilde h_{\alpha \beta}} = {h_{\alpha \beta}} + 2{\nabla _{(\alpha}}{\xi _{\beta)}},$$
(6.16)

where ξα := ηαβξβ.

Let us consider the linearized Cauchy problem without boundaries first, where initial data is specified at the initial surface Σ0 = {0} × ℝ3. The initial data is specified geometrically by the first and second fundamental forms of Σ0, which, in the linearized case, are represented by a pair \((h_{ij}^{(0)},k_{ij}^{(0)})\) of covariant symmetric tensor fields on Σ0. We assume \((h_{ij}^{(0)},k_{ij}^{(0)})\) to be smooth and to satisfy the linearized Hamiltonian and momentum constraints

$${G^{ijrs}}{\partial _i}{\partial _j}h_{rs}^{(0)} = 0,\qquad {G^{ijrs}}{\partial _j}k_{rs}^{(0)} = 0,$$
(6.17)

where \({G^{ijrs}}: = {\delta ^{i(r}}{\delta ^{s)j}} - {\delta ^{ij}}{\delta ^{rs}}\). A solution hαβ of Eq. (6.15) with the induced data corresponding to \((h_{ij}^{(0)},k_{ij}^{(0)})\) up to a gauge transformation (6.16) satisfies

$${\left. {{h_{ij}}} \right\vert _{{\Sigma _0}}} = h_{ij}^{(0)} + 2{\partial _{(i}}{X_{j)}},\qquad {\left. {{\partial _t}{h_{ij}} - 2{\partial _{(i}}{h_{j)0}}} \right\vert _{{\Sigma _0}}} = - 2(k_{ij}^{(0)} + {\partial _i}{\partial _j}f),$$
(6.18)

where Xj = ξj and f = ξ0 are smooth and represent the initial gauge freedom. Then, one has:

Theorem 9. The initial-value problem (6.15, 6.18) possesses a smooth solution hαβ, which is unique up to an infinitesimal coordinate transformation \({\tilde h_{\alpha \beta}} = {h_{\alpha \beta}} + 2{\nabla _{(\alpha}}{\xi _{\beta)}}\) generated by a vector field ξα.

Proof. We first show the existence of a solution in the linearized harmonic gauge \({C_\beta} = {\nabla ^\mu}{h_{\beta \mu}} - {1 \over 2}{\nabla _\beta}h = 0\), for which Eq. (6.15) reduces to the system of wave equations \({\nabla ^\mu}{\nabla _\mu}{h_{\alpha \beta}} = 0\). The initial data, \(({h_{\alpha \beta}}{\vert _{{\Sigma _0}}},{\partial _t}{h_{\alpha \beta}}{\vert _{{\Sigma _0}}})\), for this system is chosen such that \({h_{ij}}{\vert _{{\Sigma _0}}} = h_{ij}^{(0)}\), \({\partial _t}{h_{ij}}{\vert _{{\Sigma _0}}} = 2{\partial _{(i}}{h_{j)0}}{\vert _{{\Sigma _0}}} - 2k_{ij}^{(0)}\), \({\partial _t}{h_{00}}{\vert _{{\Sigma _0}}} = 2{\delta ^{ij}}k_{ij}^{(0)}\) and \({\partial _t}{h_{0j}}{\vert _{{\Sigma _0}}} = {\partial ^i}(h_{ij}^{(0)} - {1 \over 2}{\delta _{ij}}{\delta ^{kl}}h_{kl}^{(0)}) + {1 \over 2}{\partial _j}{h_{00}}{\vert _{{\Sigma _0}}}\), where \((h_{ij}^{(0)},k_{ij}^{(0)})\) satisfy the constraint equations (6.17) and where the initial data for h00 and h0j is chosen smooth but otherwise arbitrary. This choice implies that Eq. (6.18) holds with Xj = 0 and f = 0, and that the constraint fields Cβ satisfy the initial conditions \({C_\beta}{\vert _{{\Sigma _0}}} = 0\) and \({\partial _t}{C_\beta}{\vert _{{\Sigma _0}}} = 0\). Therefore, solving the wave equation \({\nabla ^\mu}{\nabla _\mu}{h_{\alpha \beta}} = 0\) with such data, we obtain a solution of the linearized Einstein equations (6.15) in the harmonic gauge with initial data satisfying (6.18) with Xj = 0 and f = 0. This shows geometric existence for the linearized harmonic formulation.

As for uniqueness, suppose we had two smooth solutions of Eqs. (6.15, 6.18). Then, since the equations are linear, the difference hαβ between these two solutions also satisfies Eqs. (6.15, 6.18) with trivial data \(h_{ij}^{(0)} = 0,\,\,k_{ij}^{(0)} = 0\). We show that hαβ can be transformed away by means of an infinitesimal gauge transformation (6.16). For this, define \({\tilde h_{\alpha \beta}}: = {h_{\alpha \beta}} + 2{\nabla _{(\alpha}}{\xi _{\beta)}}\) where ξβ is required to satisfy the inhomogeneous wave equation

$$0 = {\nabla ^\alpha}{\tilde h_{\alpha \beta}} - {1 \over 2}{\nabla _\beta}\tilde h = {\nabla ^\alpha}{h_{\alpha \beta}} - {1 \over 2}{\nabla _\beta}h + {\nabla ^\alpha}{\nabla _\alpha}{\xi _\beta}$$
(6.19)

with initial data for ξβ defined by \({\xi _0}{\vert _{{\Sigma _0}}} = - f,\,\,{\xi _i}{\vert _{{\Sigma _0}}} = - {X_i},\,\,{\partial _t}{\xi _0}{\vert _{{\Sigma _0}}} = - {h_{00}}/2,\,\,{\partial _t}{\xi _i}{\vert _{{\Sigma _0}}} = - {h_{0i}} + {\partial _i}f\). Then, by construction, \({\tilde h_{\alpha \beta}}\) satisfies the harmonic gauge, and it can be verified that \({\tilde h_{\alpha \beta}}{\vert _{{\Sigma _0}}} = {\partial _t}{\tilde h_{\alpha \beta}}{\vert _{{\Sigma _0}}} = 0\). Therefore, \({\tilde h_{\alpha \beta}}\) is a solution of the wave equation \({\nabla ^\mu}{\nabla _\mu}{\tilde h_{\alpha \beta}} = 0\) with trivial initial data, and it follows that \({\tilde h_{\alpha \beta}} = 0\) and that hαβ = −2∇(αξβ) is a pure gauge mode. □

It follows from the existence part of the proof that the quantities \({h_{00}}{\vert _{{\Sigma _0}}}\) and \({h_{0j}}{\vert_{{\Sigma _0}}}\), corresponding to linearized lapse and shift, parametrize pure gauge modes in the linearized harmonic formulation.

Next, we turn to the IBVP on the manifold M = [0, T] × Σ. Let us first look at the boundary conditions (6.2–6.5), which, in the linearized case, reduce to

$${\left. {{\nabla _K}{h_{KK}}} \right\vert _{\mathcal T}} = {q_K},\qquad {\left. {{\nabla _K}{h_{KL}}} \right\vert _{\mathcal T}} = {q_L},\qquad {\left. {{\nabla _K}{h_{KQ}}} \right\vert _{\mathcal T}} = {q_Q},\qquad {\left. {{\nabla _K}{h_{QQ}} - {\nabla _Q}{h_{QK}}} \right\vert _{\mathcal T}} = {q_{QQ}}.$$
(6.20)

There is no problem in repeating the geometric existence part of the proof on M imposing these boundary conditions, and using the IBVP described in Section 6.1. However, there is a problem when trying to prove the uniqueness part. This is because a gauge transformation (6.16) induces the following transformations on the boundary data,

$$\begin{array}{*{20}c} {{{\tilde q}_K} = {q_K} + 2\nabla _K^2{\xi _K},\qquad {{\tilde q}_L} = {q_L} + \nabla _K^2{\xi _L} + {\nabla _K}{\nabla _L}{\xi _K},\qquad {{\tilde q}_Q} = {q_Q} + \nabla _K^2{\xi _Q} + {\nabla _K}{\nabla _Q}{\xi _K},} \\ {{{\tilde q}_{QQ}} = {q_{QQ}} + {\nabla _Q}({\nabla _K}{\xi _Q} - {\nabla _Q}{\xi _K}),} \\ \end{array}$$

which overdetermines the vector field ξβ at the boundary. On the other hand, replacing the boundary condition (6.5) by the specification of the Weyl scalar Ψ0, leads to [286, 369]

$${\left. {\nabla _K^2{h_{QQ}} + {\nabla _Q}({\nabla _Q}{h_{KK}} - 2{\nabla _K}{h_{KQ}})} \right\vert _{\mathcal T}} = {\Psi _0}.$$
(6.21)

Since the left-hand side is gauge-invariant, there is no over-determination of ξβ at the boundary any more, and the transformation properties of the remaining boundary data qK, qL and qQ provide a complete set of boundary data for ξK, ξL and ξQ, which may be used in conjunction with the wave equation ∇μ∇μξβ = 0 in order to formulate a well-posed IBVP [369]. Provided Ψ0 is smooth and the compatibility conditions are satisfied at the edge \(S = {\Sigma _0} \cap {\mathcal T}\), it follows:

Theorem 10. [355] The IBVP ( 6.15 , 6.18 , 6.21 ) possesses a smooth solution hαβ, which is unique up to an infinitesimal coordinate transformation \({{\tilde h}_{\alpha \beta}} = {h_{\alpha \beta}} + 2{\nabla _{(\alpha}}{\xi _{\beta)}}\) generated by a vector field ξα.

In conclusion, we can say that, in the simple case of linear gravitational waves propagating on a Minkowski background, we have resolved the issues (i–iii). The correct boundary data is the linearized Weyl scalar Ψ0 computed from the boundary-adapted tetrad. To linear order, Ψ0 is invariant with respect to coordinate transformations, and the time-like vector field T appearing in its definition can be defined geometrically by taking the future-directed unit normal to the initial surface Σ0 and parallel transporting it along the geodesics orthogonal to Σ0.

Whether or not this result can be generalized to the full nonlinear case is not immediately clear. In our linearized analysis we have imposed no restrictions on the normal component ξN of the vector field generating the infinitesimal coordinate transformation. However, such a restriction is necessary in order to keep the boundary surface fixed under a diffeomorphism. Unfortunately, it does not seem possible to restrict ξN in a natural way with the boundary conditions constructed so far.

6.4 Alternative approaches

Although the formulation of Einstein’s equations on a finite space domain with an artificial time-like boundary is currently the most used approach in numerical simulations, there are a number of difficulties associated with it. First, as discussed above, spurious reflections from the boundary surface may contaminate the solution unless the boundary conditions are chosen with great care. Second, in principle there is a problem with wave extraction, since gravitational waves can only be defined in an unambiguous (gauge-invariant) way at future null infinity. Third, there is an efficiency problem, since in the far zone the waves propagate along outgoing null geodesics so that hyperboloidal surfaces, which are asymptotically null, should be better adapted to the problem. These issues have become more apparent as numerical simulations have achieved higher accuracy to the point that boundary and wave extraction artifacts are noticeable, and have driven a number of other approaches.

One of them is that of compactification schemes, which include spacelike or null infinity in the computational domain. For schemes compactifying spacelike infinity, see [335, 336]. Conformal compactifications are reviewed in [172, 183], and a partial list of references to date includes [328, 176, 177, 180, 179, 170, 245, 172, 247, 100, 446, 447, 316, 87, 451, 452, 448, 449, 450, 305, 364, 42].

Another approach is Cauchy-characteristic matching (CCM) [99, 392, 401, 143, 148, 53], which combines a Cauchy approach in the strong field regime (thereby avoiding the problems that the presence of caustics would cause in characteristic evolutions) with a characteristic one in the wave zone. Data from the Cauchy evolution is used as inner boundary conditions for the characteristic one and, vice versa, the latter provides outer boundary conditions for the Cauchy IBVP. An understanding of the Cauchy IBVP is still a prerequisite. CCM is reviewed in [432]. A related idea is Cauchy-perturbative matching [455, 356, 4, 370], where the Cauchy code is instead coupled to one solving gauge-invariant perturbations of Schwarzschild black holes or flat spacetime. The multipole decomposition in the Regge-Wheeler-Zerilli equations [347, 453, 376, 294, 307] implies that the resulting equations are 1+1 dimensional, so that the region of integration can be extended to very large distances from the source. As in CCM, an understanding of the IBVP for the Cauchy sector is still a prerequisite.

One way of dealing with the ambiguity of extracting gravitational waves from Cauchy evolutions at finite radii is through extrapolation procedures; see, for example, [72, 331] for some approaches and quantifications of their accuracy. Another approach is Cauchy-characteristic extraction (CCE) [350, 37, 349, 32, 34, 54]. In CCE a Cauchy IBVP is solved, and the numerical data on a world tube is used to provide inner boundary conditions for a characteristic evolution that “transports” the data to null infinity. The difference from CCM is that in CCE there is no “feedback” from the characteristic evolution to the Cauchy one, and the extraction is done as a post-processing step.

7 Numerical Stability

In the previous sections we have discussed continuum initial-value and initial-boundary value problems. In this section we start with the study of the discretization of such problems. In the same way that a PDE can have a unique solution yet be ill posed, a numerical scheme can be consistent yet not convergent due to the unbounded growth of small perturbations as resolution is increased. The definition of numerical stability is the discrete version of well-posedness. One wants to ensure that small initial perturbations in the numerical solution, which naturally appear due to discretization errors and finite precision, remain bounded for all resolutions at any given time t > 0. By the classical Lax-Richtmyer theorem [276], this property, combined with consistency of the scheme, is in the linear case equivalent to convergence of the numerical solution, meaning that the latter approaches the continuum one as resolution is increased (at least in exact arithmetic). Convergence of a scheme is in general difficult to prove directly, especially because the exact solution is in general not known. Instead, one shows stability.

The different definitions of numerical stability follow those of well-posedness, with the L2 norm in space replaced by a discrete one, which is usually motivated by the spatial approximation. For example, discrete norms under which the summation by parts property holds are natural in the context of some finite difference approximations and collocation spectral methods (see Sections 8 and 9).

We start with a general discussion of some aspects of stability, and explicit analyses of simple, low-order schemes for test models. There follows a discussion of different variations of the von Neumann condition, including an eigenvalue version, which can be used to analyze in practice necessary conditions for IBVPs. Next, we discuss a rather general stability approach for the method of lines, the notion of time-stability, Runge-Kutta methods, and we close the section with some references to other approaches not covered here, as well as some discussion in the context of numerical relativity.

7.1 Definitions and examples

Consider a well-posed linear initial-value problem (see Definition 3)

$${u_t}(t,x) = P(t,x,\partial /\partial x)u(t,x),\quad x \in {{\mathbb{R}}^n},\quad t \geq 0,$$
(7.1)
$$u(0,x) = f(x),\quad x \in {{\mathbb{R}}^n}\,.$$
(7.2)

Definition 11. An approximation-discretization to the Cauchy problem ( 7.1 , 7.2 ) is numerically stable if there is some discrete norm in space ∥ · ∥d and constants Kd, αd such that the corresponding approximation v satisfies

$${\Vert {v(t,\cdot)} \Vert_{\rm{d}}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}}{\Vert f \Vert_{\rm{d}}},$$
(7.3)

for high enough resolutions, smooth initial data f, and t ≥ 0.

Note:

  • The previous definition applies both to the semi-discrete case (where space but not time is discretized) as well as the fully-discrete one. In the latter case, Eq. (7.3) is to be interpreted at fixed time. For example, if the timestep discretization is constant,

    $${t_k} = k\Delta t,\qquad k = 0,1,2 \ldots$$
    (7.4)

    then Eq. (7.3) needs to hold for fixed tk and arbitrarily large k. In other words, the solution is allowed to grow with time, but not with the number of timesteps at fixed time when resolution is increased.

  • The norm ∥ · ∥d in general depends on the spatial approximation, and in Sections 8 and 9 we discuss some definitions for the finite difference and spectral cases.

  • From Definition 11, one can see that an ill-posed problem cannot have a stable discretization, since otherwise one could take the continuum limit in (7.3) and reach the contradiction that the original system was well posed.

  • As in the continuum, Eq. (7.3) implies uniqueness of the numerical solution v.

  • In Section 3 we discussed that if, in a well-posed homogeneous Cauchy problem, a forcing term is added to Eq. (7.1),

    $${u_t}(t,x) = P(t,x,\partial /\partial x)u(t,x)\qquad \mapsto \qquad {u_t}(t,x) = P(t,x,\partial /\partial x)u(t,x) + F(t,x),$$
    (7.5)

    then the new problem admits another estimate, related to the original one via Duhamel’s formula, Eq. (3.23). A similar concept holds at the semi-discrete level, and the discrete estimates change accordingly (in the fully-discrete case the integral in time is replaced by a discrete sum),

    $$\Vert v(t,\cdot) \Vert_{\rm{d}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}}\Vert f \Vert_{\rm{d}}\qquad \mapsto \qquad \Vert v(t,\cdot) \Vert_{\rm{d}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}} \left(\Vert f \Vert_{\rm{d}} + \int\limits_0^t \Vert F(s, \cdot) \Vert_{\rm{d}} ds \right).$$
    (7.6)

    In other words, the addition of a lower-order term does not affect numerical stability, and without loss of generality one can restrict stability analyses to the homogeneous case.

  • The difference w:= uv between the exact solution and its numerical approximation satisfies an equation analogous to (7.5), where F is related to the truncation error of the approximation. If the scheme is numerically stable, then in the linear and semi-discrete cases Eq. (7.6) implies

    $${\Vert {w(t,\cdot)} \Vert_{\rm{d}}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}}\int\limits_0^t {{{\Vert {F(s,\cdot)} \Vert}_{\rm{d}}}} ds\,.$$
    (7.7)

    If the approximation is consistent, the truncation error converges to zero as resolution is increased, and Equation (7.7) implies that so does the norm of the error ∥w(t, ·)∥d. That is, stability implies convergence. The converse is also true, and this equivalence between convergence and stability is the celebrated Lax-Richtmyer theorem. The equivalence also holds in the fully-discrete case.

  • In the quasi-linear case, one follows the principle of linearization, as described in Section 3.3. One linearizes the problem and constructs a stable numerical scheme for the linearization. The expectation, then, is that the scheme also converges for the nonlinear problem. For particular problems and discretizations this expectation can be rigorously proven (see, for example, [259]).

From here on {xj, tk} denotes some discretization of space and time. This includes both finite difference and spectral collocation methods, which are the ones discussed in Sections 8 and 9, respectively. In addition, we use the shorthand notation

$$v_j^k := v({t_k},{x_j}).$$
(7.8)

In order to gain some intuition into the general problem of numerical stability we start with some examples of simple, low-order approximations for a test problem. Consider uniform grids both in space and time

$${t_k} = k\Delta t,\quad {x_j} = j\Delta x,\qquad k = 0,1,2, \ldots ,\quad j = 0,1,2, \ldots N,$$
(7.9)

and the advection equation,

$${u_t} = a{u_x}\,,\qquad x \in [0,2\pi ],\quad t \geq 0,$$
(7.10)

on a periodic domain with 2π = N Δx, and smooth periodic initial data. Then the solution u can be represented by a Fourier series:

$$u(t,x) = {1 \over {\sqrt {2\pi}}}\sum\limits_{\omega \in {\mathbb{Z}}} {\hat u} (t,\omega){e^{i\omega x}},$$
(7.11)

where

$$\hat u(t,\omega) = {1 \over {\sqrt {2\pi}}}\int\nolimits_0^{2\pi} {{e^{- i\omega x}}} u(t,x)dx,$$
(7.12)

and the stability of the following schemes can be analyzed in Fourier space.

Example 33. The one-sided Euler scheme.

Eq. (7.10) is discretized with a one-sided FD approximation for the spatial derivative and evolved in time with the forward Euler scheme,

$${{v_j^{k + 1} - v_j^k} \over {\Delta t}} = a{{v_{j + 1}^k - v_j^k} \over {\Delta x}}.$$
(7.13)

In Fourier space the approximation becomes

$${\hat v^{k + 1}}(\omega) = \hat q(\omega){\hat v^k}(\omega) = {\left[ {\hat q(\omega)} \right]^{k + 1}}{\hat v^0}(\omega)\,,$$
(7.14)

where

$$\hat q(\omega) = 1 + a\lambda \left({{e^{i\omega \Delta x}} - 1} \right)$$
(7.15)

is called the amplification factor and

$$\lambda = {{\Delta t} \over {\Delta x}}$$
(7.16)

the Courant-Friedrichs-Lewy (CFL) factor.

Using Parseval’s identity, we find

$${\Vert {v({t_k},\cdot)} \Vert^2} = \sum\limits_{\omega \in {\mathbb {Z}}} \vert \hat q(\omega){\vert ^{2k}}\vert {\hat v^0}(\omega){\vert ^2},$$
(7.17)

and therefore, we see that the inequality (7.3) can only hold for all k if

$$\vert \hat q(\omega)\vert \leq 1\quad {\rm{for\;all}}\;\omega \in {\mathbb{Z}}.$$
(7.18)

For a > 0, this is the case if and only if the CFL factor satisfies

$$0 < \lambda \leq{1 \over a},$$
(7.19)

and in this case the stability estimate (7.3) holds with Kd = 1 and αd = 0. The upper bound in condition (7.19) for this example is known as the CFL limit, and (7.18) as the von Neumann condition. If a = 0, then \(\hat q(\omega) = 1\), while for a < 0 the scheme is unconditionally unstable even though the underlying continuum problem is well posed.
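The stability condition just derived is easy to verify numerically. The following Python sketch (our illustration, not part of the standard presentation) samples the amplification factor (7.15) over all frequencies and confirms the CFL limit (7.19):

    import numpy as np

    def max_amplification(a, lam, ntheta=10_000):
        # Sample |q| over theta = omega*dx in [0, 2*pi); q depends on omega
        # and the grid spacing only through this combination.
        theta = np.linspace(0.0, 2.0 * np.pi, ntheta, endpoint=False)
        q = 1.0 + a * lam * (np.exp(1j * theta) - 1.0)
        return np.abs(q).max()

    for a, lam in [(1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (-1.0, 0.5)]:
        print(f"a = {a:+.1f}, lambda = {lam:.2f}: max|q| = {max_amplification(a, lam):.6f}")

For a > 0 one finds max∣q̂∣ = 1 precisely when 0 < λ ≤ 1/a, while any λ > 1/a, or any λ > 0 with a < 0, yields max∣q̂∣ > 1, in agreement with the von Neumann condition (7.18).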

Next we consider a scheme very similar to the previous one, but which turns out to be unconditionally unstable for a ≠ 0, regardless of the direction of propagation.

Example 34. A centered Euler scheme.

Consider first the semi-discrete approximation to Eq. (7.10),

$${d \over {dt}}{v_j} = a{{{v_{j + 1}} - {v_{j - 1}}} \over {2\Delta x}}\,;$$
(7.20)

it is easy to check that it is stable for all values of Δx. Next discretize time through an Euler scheme, leading to

$${{v_j^{k + 1} - v_j^k} \over {\Delta t}} = a{{v_{j + 1}^k - v_{j - 1}^k} \over {2\Delta x}}\,.$$
(7.21)

The solution again has the form given by Eq. (7.14), now with

$$\vert \hat q(\omega)\vert \,=\, \vert 1 + ia\lambda \sin (\omega \Delta x)\vert \, \geq 1.$$
(7.22)

At fixed time tk, the norm of the solution to the fully-discrete approximation (7.21) grows without bound as the timestep decreases, for arbitrarily small initial data containing a Fourier mode with ωΔx ∉ πℤ.

The semi-discrete centered approximation (7.20) and the fully-discrete centered Euler scheme (7.21) constitute the simplest example of an approximation, which is not fully-discrete stable, even though its semi-discrete version is. This is related to the fact that the Euler time integration is not locally stable, as discussed in Section 7.3.2.
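To make the unbounded growth at fixed time explicit, the following sketch (an illustration under the same assumptions as Example 34) evaluates the worst-mode growth factor \({\vert \hat q\vert ^k} = {(1 + {(a\lambda)^2}{\sin ^2}(\omega \Delta x))^{k/2}}\) at the fixed time t = 1 while the timestep is refined with λ held fixed:

    import numpy as np

    a, lam, t = 1.0, 0.5, 1.0
    for dt in [0.1, 0.05, 0.025, 0.0125]:
        k = int(round(t / dt))                  # number of steps to reach t
        qmax = np.sqrt(1.0 + (a * lam) ** 2)    # |q| at sin(omega*dx) = 1
        print(f"dt = {dt:.4f}, k = {k:4d}: |q|^k = {qmax ** k:.3e}")

The growth factor at t = 1 increases without bound as Δt (and Δx = Δt/λ) decrease, so no constants Kd, αd in Eq. (7.3) can exist.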

The previous two examples were one-step methods, where vk+1 can be computed in terms of vk. The following is an example of a two-step method.

Example 35. Leap-frog.

A way to stabilize the centered Euler scheme is by approximating the time derivative by a centered difference instead of a forward, one-sided operator:

$$v_j^{k + 1} = v_j^{k - 1} + a\lambda \left({v_{j + 1}^k - v_{j - 1}^k} \right)\,.$$
(7.23)

Enlarging the system by introducing

$$w_j^k: = \left({\begin{array}{*{20}c} {v_j^k} \\ {v_j^{k - 1}} \\ \end{array}} \right)$$
(7.24)

it can be cast into the one-step method

$${\hat w^{k + 1}} = \hat Q(\omega){\hat w^k} = \hat Q{(\omega)^{k + 1}}{\hat w^0},\qquad {\rm{with}}\quad \hat Q(\omega) = \left({\begin{array}{*{20}c} {2ia\lambda \sin (\omega \Delta x)} & 1 \\ 1 & 0 \\ \end{array}} \right).$$
(7.25)

By a similar procedure, a general multi-step method can always be reduced to a one-step one. Therefore, in the stability results below we can assume without loss of generality that the schemes are one-step.

In the above example the amplification matrix \(\hat Q(\omega)\) can be diagonalized through a transformation that is uniformly bounded:

$$\hat Q(\omega) = \hat T(\omega)\left({\begin{array}{*{20}c} {{\mu _ +}} & 0 \\ 0 & {{\mu _ -}} \\ \end{array}} \right){\hat T^{- 1}}(\omega),\qquad \hat Q{(\omega)^k} = \hat T(\omega)\left({\begin{array}{*{20}c} {\mu _ + ^k} & 0 \\ 0 & {\mu _ - ^k} \\ \end{array}} \right){\hat T^{- 1}}(\omega),$$
(7.26)

with μ± = z ± (1 + z2)1/2, z:= iaλ sin (ωΔx), and

$$\hat T(\omega) = \left({\begin{array}{*{20}c} {{\mu _ +}} & {{\mu _ -}} \\ 1 & 1 \\ \end{array}} \right).$$
(7.27)

The eigenvalues μ± are of unit modulus, ∣μ±∣ = 1. In addition, the norms of \(\hat T(\omega)\) and its inverse are

$$\vert \hat T(\omega)\vert = \sqrt {2\left({1 + \vert z\vert} \right)} \,,\quad \vert {\hat T^{- 1}}(\omega)\vert = {1 \over {\sqrt {2\left({1 - \vert z\vert} \right)}}}\,.$$
(7.28)

Therefore, the condition number of \(\hat T(\omega)\) can be bounded for all ω:

$$\vert \hat T(\omega)\vert \cdot\vert {\hat T^{- 1}}(\omega)\vert = {\left({{{1 + \vert z\vert} \over {1 - \vert z\vert}}} \right)^{1/2}} \leq{\left({{{1 + \vert a\vert \lambda} \over {1 - \vert a\vert \lambda}}} \right)^{1/2}} < \infty \,,$$
(7.29)

provided that

$$\lambda < {1 \over {\vert a\vert}}\,,$$
(7.30)

and it follows that the Leap-frog scheme is stable under the condition (7.30).
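The two ingredients of the argument, unit-modulus eigenvalues and a uniformly bounded condition number, can be checked directly; the following sketch (ours) does so for ∣a∣λ < 1:

    import numpy as np

    a, lam = 1.0, 0.9
    worst = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False):
        z = 1j * a * lam * np.sin(theta)
        Q = np.array([[2.0 * z, 1.0], [1.0, 0.0]])        # Eq. (7.25)
        mu = np.linalg.eigvals(Q)
        assert np.allclose(np.abs(mu), 1.0)               # |mu_+| = |mu_-| = 1
        T = np.array([[mu[0], mu[1]], [1.0, 1.0]])        # Eq. (7.27)
        worst = max(worst, np.linalg.cond(T))             # 2-norm condition number
    bound = np.sqrt((1.0 + a * lam) / (1.0 - a * lam))    # Eq. (7.29)
    print(f"max condition number = {worst:.3f}, bound = {bound:.3f}")

As λ approaches 1/∣a∣ the bound (7.29) degenerates, reflecting the marginal character of the leap-frog CFL limit.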

The previous examples were explicit methods, where the solution \(\upsilon _j^{k + 1}\) (or \(w_j^{k + 1}\)) can be explicitly computed from the one at the previous timestep, without inverting any matrices.

Example 36. Crank-Nicholson.

Approximating Eq. (7.10) by

$$\left({1 - a{{\Delta t} \over 2}{D_0}} \right)v_j^{k + 1} = \left({1 + a{{\Delta t} \over 2}{D_0}} \right)v_j^k,$$
(7.31)

with

$${D_0}{v_j}: = {1 \over {2\Delta x}}\left({{v_{j + 1}} - {v_{j - 1}}} \right),$$
(7.32)

defines an implicit method. Fourier transform leads to

$$\left[ {1 - ia{\lambda \over 2}\sin (\omega \Delta x)} \right]{\hat v^{k + 1}}(\omega) = \left[ {1 + ia{\lambda \over 2}\sin (\omega \Delta x)} \right]{\hat v^k}(\omega).$$
(7.33)

The expressions inside the square brackets on both sides are different from zero and have equal magnitude. As a consequence, the amplification factor in this case satisfies

$$\vert \hat q(\omega)\vert = 1\quad {\rm{for\;all}}\;\omega \in {\mathbb{Z}}\;{\rm{and}}\;\lambda > 0,$$
(7.34)

and the scheme is unconditionally stable at the expense of having to invert a matrix to advance the solution in time.
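The following sketch (our illustration) assembles the update explicitly on a periodic grid and verifies the unconditional stability for a deliberately large CFL factor; the price of the implicit scheme, the linear solve, is visible in the call to np.linalg.solve:

    import numpy as np

    N, a, lam = 50, 1.0, 10.0                  # CFL factor far beyond any explicit limit
    dx = 2.0 * np.pi / N
    dt = lam * dx
    # Periodic centered difference D0, Eq. (7.32): (D0 v)_j = (v_{j+1} - v_{j-1})/(2 dx)
    up = np.roll(np.eye(N), 1, axis=1)         # picks out v_{j+1}
    down = np.roll(np.eye(N), -1, axis=1)      # picks out v_{j-1}
    D0 = (up - down) / (2.0 * dx)
    A = np.eye(N) - 0.5 * a * dt * D0          # left-hand side of (7.31)
    B = np.eye(N) + 0.5 * a * dt * D0          # right-hand side of (7.31)
    Q = np.linalg.solve(A, B)                  # amplification matrix
    print(np.abs(np.linalg.eigvals(Q)).max())  # = 1 up to roundoff

Since D0 is skew-symmetric, Q is the Cayley transform of a skew-symmetric matrix and hence orthogonal, which is the matrix version of the statement ∣q̂(ω)∣ = 1.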

Example 37. Iterated Crank-Nicholson.

Approximating the Crank-Nicholson scheme through an iterative scheme with a fixed number of iterations is usually referred to as the Iterated Crank-Nicholson (ICN) method. For Eq. (7.10) it proceeds as follows [414]:

  • First iteration: an intermediate variable \({}^{(1)}\tilde v\) is calculated using a second-order-in-space centered difference (7.32) and an Euler, first-order forward-time approximation,

    $${1 \over {\Delta t}}\left({{}^{(1)}\tilde v_j^{n + 1} - v_j^n} \right) = {D_0}\,v_j^n\,.$$
    (7.35)

    Next, a second intermediate variable is computed through averaging,

    $$^{(1)}\bar v_j^{n + 1/2} = {1 \over 2}\left({{}^{(1)}\tilde v_j^{n + 1} + v_j^n} \right).$$
    (7.36)

    The full time step for this first iteration is

    $${1 \over {\Delta t}}\left({v_j^{n + 1} - v_j^n} \right) = {D_0}{\,^{(1)}}\bar v_j^{n + 1/2}.$$
    (7.37)
  • Second iteration: it follows the same steps. Namely, the intermediate variables

    $$\begin{array}{*{20}c} {{1 \over {\Delta t}}\left({{}^{(2)}\tilde v_j^{n + 1} - v_j^n} \right) = {D_0}{\,^{(1)}}\bar v_j^{n + 1/2},\quad \quad \quad \quad \quad \quad} \\ {{}^{(2)}\bar v_j^{n + 1/2} = {1 \over 2}\left({{}^{(2)}\tilde v_j^{n + 1} + v_j^n} \right),} \\ \end{array}$$

    are computed, and the full step is obtained from

    $${1 \over {\Delta t}}\left({v_j^{n + 1} - v_j^n} \right) = {D_0}{\,^{(2)}}\bar v_j^{n + 1/2}\,.$$
    (7.38)
  • Further iterations proceed in the same way.

The resulting discretization is numerically stable for λ ≤ 2/a and p = 2, 3, 6, 7, 10, 11, … iterations, and unconditionally unstable otherwise. In the limit p → ∞ the ICN scheme becomes the implicit, unconditionally-stable Crank-Nicholson scheme of the previous example. For any fixed number of iterations, though, the method is explicit and stability is contingent on the CFL condition λ ≤ 2/a. The method is unconditionally unstable for p = 4, 5, 8, 9, 12, 13, … because the convergence of the amplification factor to unit modulus [cf. Eq. (7.34)] as p increases is not monotonic. See [414] for details and [380] for a similar analysis for “theta” schemes.
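This pattern can be reproduced with a few lines of code. The following sketch (ours; the Fourier-space recursion simply mirrors the steps of Example 37) computes the ICN amplification factor after p iterations, where μ = iaλ sin(ωΔx):

    import numpy as np

    def icn_amplification(mu, p):
        qbar = 1.0 + 0.0j                    # symbol of the averaged field; qbar_0 = 1
        for _ in range(p):
            qtilde = 1.0 + mu * qbar         # intermediate Euler step
            qbar = 0.5 * (qtilde + 1.0)      # averaging step
        return 1.0 + mu * qbar               # full step: q_p = 1 + mu * qbar_p

    a, lam = 1.0, 1.9                        # just below the limit lambda <= 2/a
    y = np.linspace(-a * lam, a * lam, 801)  # range of a*lam*sin(omega*dx)
    for p in range(1, 9):
        qmax = max(abs(icn_amplification(1j * yy, p)) for yy in y)
        print(f"p = {p}: max|q| = {qmax:.6f}")

One finds max∣q∣ ≤ 1 for p = 2, 3, 6, 7 and max∣q∣ > 1 for p = 1, 4, 5, 8, in agreement with the pattern quoted above; repeating the run with λ > 2 makes every p unstable.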

Definitions similar to Definition 11 are introduced for the IBVP. For simplicity we explicitly discuss the semi-discrete case. In analogy with the definition of a strongly-well-posed IBVP (Definition 9) one has

Definition 12. A semi-discrete approximation to the linearized version of the IBVP (5.1, 5.2, 5.3) is numerically stable if there are discrete norms ∥ · ∥d on Σ and ∥ · ∥∂,d on ∂Σ and constants Kd = Kd(T) and εd = εd(T) ≥ 0 such that for high-enough resolution the corresponding approximation v satisfies

$$\Vert v(t,\cdot)\Vert _{\rm{d}}^2 + {\varepsilon _{\rm{d}}}\int\limits_0^t {\Vert v(s,\cdot)\Vert _{\partial ,{\rm{d}}}^2\,ds} \leq K_{\rm{d}}^2\left[ {\Vert f\Vert _{\rm{d}}^2 + \int\limits_0^t {\left({\Vert F(s,\cdot)\Vert _{\rm{d}}^2 + \Vert g(s,\cdot)\Vert _{\partial ,{\rm{d}}}^2} \right)ds}} \right],$$
(7.39)

for all t ∈ [0, T]. If the constant εd can be chosen strictly positive, the problem is called strongly stable.

In addition, the semi-discrete version of Definitions 6 and 7 lead to the concepts of strong stability in the generalized sense and boundary stability, respectively, which we do not write down explicitly here. The definitions for the fully-discrete case are similar, with time integrals such as those in Eq. (7.39) replaced by discrete sums.

7.2 The von Neumann condition

Consider a discretization for a linear system with variable, time-independent coefficients such that

$${{\bf{v}}^{k + 1}} = {\bf{Q}}{{\bf{v}}^k}\,,$$
(7.40)
$${{\bf{v}}^0} = {\bf{f}}\,,$$
(7.41)

where vk denotes the gridfunction \({{\rm{v}}^k} = \{\upsilon _j^k:j = 0,1, \ldots, N\}\) and Q is called the amplification matrix. We assume that Q is also time-independent. Then

$${{\bf{v}}^k} = {{\bf{Q}}^k}{\bf{f}}$$
(7.42)

and the approximation (7.40, 7.41) is stable if and only if there are constants Kd and αd such that

$${\Vert {{{\bf{Q}}^k}} \Vert_{\rm{d}}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}{t_k}}}$$
(7.43)

for all k = 0, 1, 2, … and high enough resolutions.

In practice, condition (7.43) is not very manageable as a way of determining if a given scheme is stable, since it involves computing the norm of the power of a matrix. A simpler condition, based on the eigenvalues {qi} of Q as opposed to the norm of Qk, is von Neumann’s:

$$\vert {q_i}\vert \leq{e^{{\alpha _{\rm{d}}}\Delta t}}\quad {\rm{for\,\, all\,\, eigenvalues}}\,\,{q_i}\,\,{\rm{of}}\,\,{\bf{Q}}\,\,{\rm{and\,\, all}}\,\,\Delta t > 0.$$
(7.44)

This condition is necessary for numerical stability: if qi is an eigenvalue of Q, \(q_i^k\) is an eigenvalue of Qk and

$$\vert q_i^k\vert \,\, \leq\,\,\Vert {{{\bf{Q}}^k}} \Vert\,\, \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}{t_k}}} = {K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}k\Delta t}}.$$
(7.45)

That is,

$$|{q_i}|{\text{ }} \leqslant K_{\text{d}}^{1/k}{e^{{\alpha _{\text{d}}}\Delta t}},$$
(7.46)

which, in order to be valid for all k, implies Eq. (7.44).

As already mentioned, in order to analyze numerical stability, one can drop lower-order terms. Doing so typically leads to Q depending on Δt and Δx only through a quotient (the CFL factor) of the form (with p = 1 for hyperbolic equations)

$$\lambda = {{\Delta t} \over {{{(\Delta x)}^p}}}\,,$$
(7.47)
$${\bf{Q}}(\Delta t,\Delta x) = {\bf{Q}}(\lambda)\,.$$
(7.48)

Then, for Eq. (7.44) to hold for all Δt > 0 while keeping the CFL factor fixed (in particular, for small Δt > 0), the following condition has to be satisfied:

$$\vert {q_i}\vert \leq1\quad {\rm{for\,\,all\,\,eigenvalues}}\,\,{q_i}\,\,{\rm{of}}\,\,{\bf{Q}}\,,$$
(7.49)

and one has a stronger version of the von Neumann condition, which is the one encountered in Example 33; see Eq. (7.18).

7.2.1 The periodic, scalar case

We return to the periodic scalar case, such as the schemes discussed in Examples 33, 34, 35, and 36, with some more generality. Suppose then, in addition to the linearity and time-independence assumptions on the continuum problem, that the initial data and discretization (7.40, 7.41) are periodic on the interval [0, 2π]. Through a Fourier expansion we can write the grid function f = (f(x0), f(x1), …, f(xN)) corresponding to the initial data as

$${\bf{f}} = {1 \over {\sqrt {2\pi}}}\sum\limits_{\omega \in {\mathbb{Z}}} {\hat f} (\omega){{\bf{e}}^{i\omega}},$$
(7.50)

where \({{\rm{e}}^{i\omega}} = ({e^{i\omega {x_0}}},{e^{i\omega {x_1}}}, \ldots, {e^{i\omega {x_N}}})\). The approximation becomes

$${{\bf{v}}^k} = {1 \over {\sqrt {2\pi}}}\sum\limits_{\omega \in {\mathbb{Z}}} {\hat f} (\omega){{\bf{Q}}^k}{{\bf{e}}^{i\omega}}.$$
(7.51)

Assuming that Q is diagonal in the Fourier basis \({{\bf{e}}^{i\omega}}\), such that

$${\bf{Q}}{{\bf{e}}^{i\omega}} = \hat q(\omega){{\bf{e}}^{i\omega}},$$
(7.52)

as is often the case, we obtain, using Parseval’s identity,

$$\Vert {{{\bf{v}}^k}} \Vert = {\left({\sum\limits_{\omega \in {\mathbb{Z}}} \vert \hat f(\omega){\vert ^2}\vert \hat q{{(\omega)}^k}{\vert ^2}} \right)^{1/2}}.$$
(7.53)

If

$$\vert \hat q(\omega)\vert \leq{e^{{\alpha _{\rm{d}}}\Delta t}}\quad {\rm{for\;all}}\;\omega \in {\mathbb{Z}}\;{\rm{and\;all}}\;\Delta t > 0,$$
(7.54)

for some constant αd, then

$$\Vert {{{\bf{v}}^k}} \Vert \leq{e^{{\alpha _{\rm{d}}}k\Delta t}}{\left({\sum\limits_\omega \vert \hat f(\omega){\vert ^2}} \right)^{1/2}} = {e^{{\alpha _{\rm{d}}}k\Delta t}}\Vert {\bf{f}} \Vert = {e^{{\alpha _{\rm{d}}}{t_k}}}\Vert {\bf{f}} \Vert$$
(7.55)

and stability follows. Conversely, if the scheme is stable and (7.52) holds, (7.54) has to be satisfied. Take

$${\bf{f}} = {{\bf{e}}^{i\omega}}\,,$$
(7.56)

then

$$\vert {\hat q^k}(\omega)\vert \Vert {\bf{f}} \Vert = \Vert {{{\bf{v}}^k}} \Vert \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}{t_k}}}\Vert {\bf{f}} \Vert,$$
(7.57)

or

$$|\hat q(\omega )|{\mkern 1mu} {\mkern 1mu} {\text{ }} \leqslant K_{\text{d}}^{1/k}{e^{{\alpha _{\text{d}}}\Delta t}}$$
(7.58)

for arbitrary k, which implies (7.54). Therefore, provided the condition (7.52) holds, stability is equivalent to the requirement (7.54) on the eigenvalues of Q.

7.2.2 The general, linear, time-independent case

However, as mentioned, the von Neumann condition is not sufficient for stability, neither in its original form (7.44) nor in its strong one (7.49), unless, for example, Q can be uniformly diagonalized. This means that there exists a matrix T such that

$$\Lambda = {{\bf{T}}^{- 1}}{\bf{QT}} = {\rm{diag}}({q_0}, \ldots ,{q_N})$$
(7.59)

is diagonal and the condition number of T with respect to the same norm,

$${\kappa _{\rm{d}}}({\bf{T}}): = {\Vert {\bf{T}} \Vert_{\rm{d}}}{\Vert {{{\bf{T}}^{- 1}}} \Vert_{\rm{d}}}$$
(7.60)

is bounded

$${\kappa _{\rm{d}}}({\bf{T}}) \leq{K_{\rm{d}}}$$
(7.61)

for some constant Kd independent of resolution (an example is that of Q being normal, QQ* = Q*Q). In that case

$${{\bf{v}}^k} = {\bf{T}}{\Lambda ^k}{{\bf{T}}^{- 1}}{\bf{f}}$$
(7.62)

and

$${\Vert {{{\bf{v}}^k}} \Vert_{\rm{d}}} \leq{\kappa _{\rm{d}}}({\bf{T}})\,{\underset {i}{\max}} \vert {q_i}{\vert ^k}{\Vert {\bf{f}} \Vert_{\rm{d}}} \leq{K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}k\Delta t}}{\Vert {\bf{f}} \Vert_{\rm{d}}} = {K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}{t_k}}}{\Vert {\bf{f}} \Vert_{\rm{d}}}\,.$$
(7.63)

Next, we discuss two examples where the von Neumann condition is satisfied but the resulting scheme is unconditionally unstable. The first one is for a well-posed underlying continuum problem and the second one for an ill-posed one.

Example 38. An unstable discretization, which satisfies the von Neumann condition for a trivially-well-posed problem [228].

Consider the following system on a periodic domain with periodic initial data

$${u_t} = 0\,,\qquad u = \left({\begin{array}{*{20}c} {{u_1}} \\ {{u_2}} \\ \end{array}} \right)\,,$$
(7.64)

discretized as

$${{{{\bf{v}}^{k + 1}} - {{\bf{v}}^k}} \over {\Delta t}} = - \Delta x\left({\begin{array}{*{20}c} 0 & 1 \\ 0 & 0 \\ \end{array}} \right)D_0^2{{\bf{v}}^k}$$
(7.65)

with D0 given by Eq. (7.32). The Fourier transform of the amplification matrix and its k-th power are

$$\hat{\bf{Q}} = \left({\begin{array}{*{20}c} 1 & {\lambda {{\sin}^2}(\omega \Delta x)} \\ 0 & 1 \\ \end{array}} \right)\,,\qquad {\hat{\bf{Q}}^k} = \left({\begin{array}{*{20}c} 1 & {k\lambda {{\sin}^2}(\omega \Delta x)} \\ 0 & 1 \\ \end{array}} \right)\,.$$
(7.66)

The von Neumann condition is satisfied, since the eigenvalues are 1. However, the discretization is unstable for any value of λ > 0. For the unit vector e = (0, 1)T, for instance, we have

$$\vert {\hat{\bf{Q}}^k}{\bf{e}}\vert = \sqrt {1 + {{\left({k\lambda} \right)}^2}{{\sin}^4}\left({\omega \Delta x} \right)} \,,$$
(7.67)

which grows without bound as k is increased for sin (ωΔx) ≠ 0.

The von Neumann condition is clearly not sufficient for stability in this example because the amplification matrix not only cannot be uniformly diagonalized, but cannot be diagonalized at all, because of the Jordan block structure in (7.66).
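The polynomial growth hiding behind the unit eigenvalues is easily exhibited; the following sketch (ours) evaluates the norm of the powers of the amplification matrix (7.66):

    import numpy as np

    lam, theta = 1.0, np.pi / 2              # CFL factor and omega*dx
    Q = np.array([[1.0, lam * np.sin(theta) ** 2],
                  [0.0, 1.0]])               # Jordan block, both eigenvalues 1
    for k in [1, 10, 100, 1000]:
        nrm = np.linalg.norm(np.linalg.matrix_power(Q, k), 2)
        print(f"k = {k:5d}: ||Q^k|| = {nrm:.1f}")

The norm grows linearly with k, \(\Vert {\hat{\bf{Q}}^k}\Vert \approx k\lambda {\sin ^2}(\omega \Delta x)\), even though every eigenvalue has modulus one.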

Example 39. Ill-posed problems are unconditionally unstable, even if they satisfy the von Neumann condition. The following example is drawn from [107].

Consider the periodic Cauchy problem

$${u_t} = A{u_x},$$
(7.68)

where u = (u1, u2)T, A is a 2 × 2 constant matrix, and the following discretization. The right-hand side of the equation is approximated by a second-order centered derivative plus higher (third) order numerical dissipation (see Section 8.5)

$$A{u_x} \rightarrow A{D_0}v - \epsilon I{(\Delta x)^3}D_ + ^2D_ - ^2v\,,$$
(7.69)

where I is the 2 × 2 identity matrix, ϵ ≥ 0 an arbitrary parameter regulating the strength of the numerical dissipation, and D+, D− are first-order forward and backward approximations of d/dx,

$${D_ +}{v_j}: = {{{v_{j + 1}} - {v_j}} \over {\Delta x}},\quad {D_ -}{v_j}: = {{{v_j} - {v_{j - 1}}} \over {\Delta x}}.$$
(7.70)

The resulting system of ordinary differential equations is marched in time (method of lines, discussed in Section 7.3) through an explicit method: the iterated Crank-Nicholson (ICN) one with an arbitrary but fixed number of iterations p (see Example 37).

If the matrix A is diagonalizable, as in the scalar case of Example 37, the resulting discretization is numerically stable for λ ≤ 2/a and p = 2, 3, 6, 7, 10, 11,…, even without dissipation. On the other hand, if the system (7.68) is weakly hyperbolic, as when the principal part has a Jordan block,

$$A = \left({\begin{array}{*{20}c} {a\;\;1} \\ {0\;\;a} \\ \end{array}} \right),$$
(7.71)

one can expect on general grounds that any discretization will be unconditionally unstable. As an illustration, this was explicitly shown in [107] for the above scheme and variations of it. In Fourier space the amplification matrix and its k-th power take the form

$$\hat Q = \left({\begin{array}{*{20}c} {c\;b} \\ {0\;c} \\ \end{array}} \right)\,,\qquad {\hat Q^k} = \left({\begin{array}{*{20}c} {{c^k}} & {k{c^{k - 1}}b} \\ 0 & {{c^k}} \\ \end{array}} \right),$$
(7.72)

with coefficients c, b depending on {a, λ, ωΔx, ϵ} such that for an arbitrary small initial perturbation at just one gridpoint,

$$v_0^0 = {(0,2\pi \epsilon)^{\rm{T}}},\qquad v_j^0 = {(0,0)^{\rm{T}}}\quad {\rm{otherwise}},$$
(7.73)

the solution satisfies

$$\Vert {{\bf{v}}^k}\Vert _{\rm{d}}^2 \geq C{k^{5/4}}\Vert {{\bf{v}}^0}\Vert _{\rm{d}}^2\quad {\rm{for\;some\;constant}}\;\;C,$$
(7.74)

and is therefore unstable regardless of the values of λ and ϵ. On the other hand, the von Neumann condition ∣c∣ ≤ 1 is satisfied if and only if

$$0 \leq \epsilon\lambda \leq 1/8.$$
(7.75)

Notice that, as expected, the addition of numerical dissipation cannot stabilize the scheme, regardless of its amount. Furthermore, adding dissipation with a strength parameter ϵ > 1/(8λ) violates the von Neumann condition (7.75) and the growth rate of the numerical instability worsens.

7.3 The method of lines

A convenient approach both from an implementation point of view as well as for analyzing numerical stability or constructing numerically-stable schemes is to decouple spatial and time discretizations. That is, one first analyzes stability under some spatial approximation assuming time to be continuous (semi-discrete stability) and then finds conditions for time integrators to preserve stability in the fully-discrete case.

In general, this method provides only a subclass of numerically-stable approximations. However, it is a very practical one, since spatial and time stability are analyzed separately and stable semi-discrete approximations and appropriate time integrators can then be combined at will, leading to modularity in implementations.

7.3.1 Semi-discrete stability

Consider the approximation

$${{\bf{v}}_t}(t) = {\bf{Lv}}\,,\quad t > 0$$
(7.76)
$${\bf{v}}(0) = {\bf{f}}$$
(7.77)

for the initial value problem (7.1, 7.2). The scheme is semi-discrete stable if the solution to Eqs. (7.76, 7.77) satisfies the estimate (7.3).

In the time-independent case, the solution to (7.76, 7.77) is

$${\bf{v}}(t) = {e^{{\bf{L}}t}}{\bf{f}}$$
(7.78)

and stability holds if and only if there are constants Kd and αd such that

$$\Vert {e^{{\bf{L}}t}}{\Vert _{\rm{d}}} \leq {K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}}\quad {\rm{for\,\,all}}\;t \geq 0.$$
(7.79)

The von Neumann condition now states that there exists a constant αd, independent of spatial resolution (i.e., of the size of the matrix L), such that the eigenvalues ℓi of L satisfy

$$\underset {i} {\max} \,{\rm Re} ({\ell _i}) \leq {\alpha _{\rm{d}}}.$$
(7.80)

This is the discrete-in-space version of the Petrovskii condition; see Lemma 2. As already pointed out, it is not always a sufficient condition for stability, unless L can be uniformly diagonalized. Also, if the lower-order terms are dropped from the analysis then

$${\bf{L}} = {1 \over {{{(\Delta x)}^p}}}\tilde{\bf{L}}$$
(7.81)

with \({{\rm{\tilde L}}}\) independent of Δx, and in order for (7.80) to hold for all Δx (in particular small Δx),

$$\underset {i} {\max} \,{\rm Re} ({\ell _i}) \leq 0,$$
(7.82)

which is a stronger version of the semi-discrete von Neumann condition.

Semi-discrete stability also follows if L is semi-bounded, that is, there is a constant αd independent of resolution such that (cf. Eq. (3.25) in Theorem 1)

$${\langle {\bf{v}},{\bf{Lv}}\rangle _{\rm{d}}} + {\langle {\bf{Lv}},{\bf{v}}\rangle _{\rm{d}}} \leq 2{\alpha _{\rm{d}}}\Vert {\bf{v}}\Vert _{\rm{d}}^2\quad {\rm{for\;all}}\;\;{\bf{v}}.$$
(7.83)

In that case, the semi-discrete approximation (7.76, 7.77) is numerically stable, as follows immediately from the following energy estimate arguments,

$${d \over {dt}}\Vert {\bf{v}}\Vert _{\rm{d}}^2 = {d \over {dt}}{\langle {\bf{v}},\;{\bf{v}}\rangle _{\rm{d}}} = {\langle {\bf{Lv}},\;{\bf{v}}\rangle _{\rm{d}}} + {\langle {\bf{v}},\;{\bf{Lv}}\rangle _{\rm{d}}} \leq 2{\alpha _{\rm{d}}}\Vert {\bf{v}}\Vert _{\rm{d}}^2.$$
(7.84)

For a large class of problems, which can be shown to be well posed using the energy estimate, one can construct semi-bounded operators L by satisfying the discrete counterpart of the properties of the differential operator P in Eq. (7.1) that were used to show well-posedness. This leads to the construction of spatial differential approximations satisfying the summation by parts property, discussed in Sections 8.3 and 9.4.
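As a minimal illustration of such a construction (ours, for the periodic advection problem of Example 34), the centered difference operator D0 of Eq. (7.32) is skew-symmetric with respect to the discrete scalar product \({\langle u,v\rangle _{\rm{d}}} = \Delta x\sum\nolimits_j {{u_j}{v_j}}\), so L = aD0 is semi-bounded with αd = 0 at every resolution:

    import numpy as np

    N = 64
    dx = 2.0 * np.pi / N
    up = np.roll(np.eye(N), 1, axis=1)       # picks out v_{j+1}
    down = np.roll(np.eye(N), -1, axis=1)    # picks out v_{j-1}
    D0 = (up - down) / (2.0 * dx)
    print(np.abs(D0 + D0.T).max())           # = 0: D0 is exactly skew-symmetric

Hence ⟨v, Lv⟩d + ⟨Lv, v⟩d = a⟨v, (D0 + D0T)v⟩d = 0 and, by the estimate (7.84), d/dt ∥v∥d² = 0 along solutions of the semi-discrete scheme (7.20).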

7.3.2 Fully-discrete stability

Now we consider explicit time integration for systems of the form (7.76, 7.77) with time-independent coefficients. That is, if there are N points in space we consider the system of ordinary differential equations (ODEs)

$${{\bf{v}}_t} = \;{\bf{Lv}},$$
(7.85)

where L is an N × N matrix.

In the previous Section 7.3.1 we derived necessary conditions for semi-discrete stability of such systems. Namely, the von Neumann one in its weak (7.80) and strong (7.82) forms. Below we shall derive necessary conditions for fully-discrete stability for a large class of time integration methods, including Runge-Kutta ones. Upon time discretization, stability analyses of (7.85) require the introduction of the notion of the region of absolute stability of ODE solvers. Part of the subtlety in the stability analysis of fully-discrete systems is that the size N of the system of ODEs is not fixed; instead, it depends on the spatial resolution. However, the obtained necessary conditions for fully-discrete stability will also turn out to be sufficient when combined with additional assumptions. We will also discuss sufficient conditions for fully-discrete stability using the energy method.

Necessary conditions. Recall the von Neumann condition for the semi-discrete system (7.85): if ℓi is an eigenvalue of L, a necessary condition for semi-discrete stability is [cf. Eq. (7.80)]

$$\underset {i} {\max} \,{\rm Re} ({\ell _i}) \leq {\alpha _{\rm{d}}}.$$
(7.86)

for some αd independent of N.

Suppose now that the system of ODEs (7.85) is evolved in time using a one-step explicit scheme,

$${{\bf{v}}^{k + 1}} = \;{\bf{Q}}{{\bf{v}}^k},$$
(7.87)

and recall (cf. Eq. (7.49)) that a necessary condition for the stability of the fully-discrete system (7.87) under the assumption (7.48) that Q depends only on the CFL factor is that its eigenvalues {qi} satisfy

$$\underset i {\max} \,\vert {q_i}\vert \; \leq \;1$$
(7.88)

for all spatial resolutions N. Next, assume that the ODE solver is such that

$${\bf{Q}} = {\rm{R}}(\Delta t{\bf{L}})\,,\;\;{\rm{where}}\;{\rm{R}}\;{\rm{is}}\;{\rm{a}}\;{\rm{polynomial}}\;{\rm{in}}\;\Delta t{\bf{L}},$$
(7.89)

and notice that if ℓi is an eigenvalue of L, then

$${r_i}: = {\rm{R}}(\Delta t{\ell _i})$$
(7.90)

is an eigenvalue of Q,

$$\{{r_i}\} \subset \{{q_i}\} \,,$$
(7.91)

and (7.88) implies that a necessary condition for fully-discrete stability is

$$\underset i {\max} \vert {r_i}\vert \; \leq \;1\,.$$
(7.92)

Definition 13. The region of absolute stability of a one-step time integration scheme R is defined as

$${{\mathcal S}_{\rm{R}}}: = \{z \in \mathbb{C}:\vert {\rm{R}}(z)\vert \; \leq \;1\} {.}$$
(7.93)

The necessary condition (7.92) can then be restated as:

Lemma 6 (Fully-discrete von Neumann condition for the method of lines.). Consider the semi-discrete system (7.85) and a one-step explicit time discretization (7.87) satisfying the assumptions (7.89) . Then, a necessary condition for fully-discrete stability is that the spectrum of the scaled spatial approximation ΔtL is contained in the region of absolute stability of the ODE solver R,

$$\sigma (\Delta t{\bf{L}}) \subset {{\mathcal S}_{\rm{R}}},$$
(7.94)

for all spatial resolutions N.

In the absence of lower-order terms and under the already assumed conditions (7.89) the strong von Neumann condition (7.82) then implies that \({{\mathcal S}_{\rm{R}}}\) must overlap the half complex plane {z ∈ ℂ: Re(z) ≤ 0}. In particular, this is guaranteed by locally-stable schemes, defined as follows.

Definition 14. An ODE solver R is said to be locally stable if its region of absolute stability \({{\mathcal S}_{\rm{R}}}\) contains an open half disc D−(r) := {μ ∈ ℂ : ∣μ∣ < r, Re(μ) < 0} for some r > 0 such that

$${D_ -}(r) \subset {{\mathcal S}_{\rm{R}}}.$$
(7.95)

As usual, the von Neumann condition is not sufficient for numerical stability, and we now discuss an example, drawn from [268], showing that the condition of Lemma 6 is not sufficient either.

Example 40. Consider the following advection problem with boundaries:

$$\begin{array}{*{20}c} {{u_t} = - {u_x},\;\;x \in [0,\;1],\quad t \geq 0,} \\ {u(t,\;0) = 0,\;\;x = 0,\quad t \geq 0,\quad \,\,\,\,\,} \\ {u(0,\;x) = f(x),\;\;x \in [0,\;1],\quad t = 0,\quad \quad \,\,} \\ \end{array}$$

where f is smooth and compactly supported on [0, 1], and the one-sided forward Euler discretization, cf. Example 33, with injection of the boundary condition (see Section 10):

$$\begin{array}{*{20}c} {{{v_j^{k + 1} - v_j^k} \over {\Delta t}} = - {{v_j^k - v_{j - 1}^k} \over {\Delta x}}\,,\quad {\rm{for}}\;j = 1,2, \ldots N,} \\ {\quad \quad \quad \;\;v_0^k = 0\,,\;\;\,{\rm{for}}\;j = 0.} \\ \end{array}$$

The corresponding semi-discrete scheme can be written in the form

$${d \over {dt}}{\bf{v}} = {\bf{Lv}}$$
(7.96)

for (notice that in the following expression the boundary point is excluded and not evolved)

$${\bf{v}} = {\left({{v_1},{v_2}, \ldots ,{v_N}} \right)^T},$$
(7.97)

with L the banded matrix

$${\bf{L}} = {1 \over {\Delta x}}\left({\begin{array}{*{20}c} {- 1} & 0 & 0 & \cdots & 0 \\ 1 & {- 1} & 0 & \cdots & 0 \\ 0 & 1 & {- 1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 1 & {- 1} \\ \end{array}} \right),$$
(7.98)

followed by integration in time through the Euler method,

$${{\bf{v}}^{k + 1}} = \left({{\bf{1}} + \Delta t{\bf{L}}} \right){{\bf{v}}^k}.$$
(7.99)

Since L is triangular, its eigenvalues are the elements of the diagonal; namely, {ℓi} = {−1/Δx}, i.e., there is a single, degenerate eigenvalue ℓ = −1/Δx.

The region of absolute stability of the Euler method is

$${S_E} = \{z \in \mathbb{C}:\vert 1 + z\vert \; \leq \;1\} ,$$
(7.100)

which is a closed disk of radius 1 in the complex plane centered at z0 = −1; see Figure 2. The von Neumann condition then is

$$\vert 1 + \ell \Delta t\vert \; = \left\vert {1 - {{\Delta t} \over {\Delta x}}} \right\vert \leq 1$$
(7.101)

or

$${{\Delta t} \over {\Delta x}} \leq 2.$$
(7.102)

On the other hand, if the initial data has compact support, the numerical solution at early times is that of the periodic problem discussed in Example 33, for which the Fourier analysis gave a stability condition of

$${{\Delta t} \over {\Delta x}} \leq 1.$$
(7.103)
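The discrepancy between the two bounds can be seen directly from the powers of the fully-discrete operator. The following sketch (ours) evaluates ∥(I + ΔtL)^k∥ for λ = 1.5, which satisfies the von Neumann bound (7.102) but violates (7.103):

    import numpy as np

    def growth(N, lam, k):
        # L of Eq. (7.98) up to the overall factor 1/dx, which combines
        # with dt into the CFL factor lam in Q = I + dt*L.
        Ldx = -np.eye(N) + np.diag(np.ones(N - 1), -1)
        Q = np.eye(N) + lam * Ldx
        return np.linalg.norm(np.linalg.matrix_power(Q, k), 2)

    for N in [20, 40, 80]:
        print(f"N = {N:3d}: ||Q^N|| = {growth(N, 1.5, N):.3e}")

The norm of the k-th power grows rapidly with the resolution N even though the single (degenerate) eigenvalue of Q, 1 − λ = −0.5, has modulus smaller than one: the matrix (7.98) is far from normal, and the eigenvalue-based condition misses the transient growth that destroys convergence.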
Figure 2: Regions of absolute stability of RK time integrators with s = n. Second-order RK (RK2) and the Euler method are not locally stable.
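The local stability statements in the caption can be checked by sampling ∣R(z)∣ on small left half discs; the following sketch (ours) does so for the Taylor polynomials \({\rm{R}}_n(z) = \sum\nolimits_{j = 0}^n {{z^j}/j!}\) with s = n stages:

    import numpy as np
    from math import factorial

    def R(n, z):
        return sum(z ** j / factorial(j) for j in range(n + 1))

    r = 0.5                                                  # radius of the half disc D_-(r)
    phi = np.linspace(np.pi / 2, 3 * np.pi / 2, 400)[1:-1]   # angles with Re(z) < 0
    rad = np.linspace(1e-3, r, 200)
    zs = np.outer(rad, np.exp(1j * phi)).ravel()
    for n in [1, 2, 3, 4]:
        print(f"RK{n}: max |R| on D_-({r}) = {np.abs(R(n, zs)).max():.6f}")

For n = 1, 2 the sampled maximum exceeds one, so no half disc D−(r) fits inside \({{\mathcal S}_{\rm{R}}}\) (Euler and RK2 are not locally stable), whereas for n = 3, 4 it stays below one, consistent with Figure 2.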

Sufficient conditions. Under additional assumptions, fully-discrete stability does follow from semi-discrete stability if the time integration is locally stable:

Theorem 11 (Kreiss-Wu [268]). Assume that

  (i) A consistent semi-discrete approximation to a constant-coefficient, first-order IBVP is stable in the generalized sense (see Definition 12 and the following remarks).

  (ii) The resulting system of ODEs is integrated with a locally-stable method of the form (7.89), with stability radius r > 0.

  (iii) If α ∈ ℝ is such that ∣α∣ < r and

    $${\rm{R}}(i\alpha) = {e^{i\phi}}\,,\quad \phi \in {\mathbb{R}},$$
    (7.104)

    then there is no β ∈ ℝ such that ∣β∣ < r, \({\rm{R}}(i\beta) = {e^{i\phi}}\), and β ≠ α.

Then the fully-discrete system is numerically stable, also in the generalized sense, under the CFL condition

$$\Delta t\Vert {\bf{L}}{\Vert _{\rm{d}}}\; \leq \;\lambda \quad for\;any\;\lambda < r.$$
(7.105)

Remarks

  • Condition (iii) can be shown to hold for any consistent approximation, if r is sufficiently small [268].

  • Explicit, one-step Runge-Kutta (RK) methods, which will be discussed in Section 7.5, are in particular of the form (7.89) when applied to linear, time-independent problems. In fact, consider an arbitrary, consistent, one-step, explicit ODE solver (7.87) of the form given in Eq. (7.89),

    $${\bf{Q}} = {\rm{R}}(\Delta t{\bf{L}}) = \sum\limits_{j = 0}^s {{\alpha _j}} {{{{\left({\Delta t{\bf{L}}} \right)}^j}} \over {j!}}\quad {\rm{with}}\;{\alpha _s} \neq 0;$$
    (7.106)

    the integer s is referred to as the number of stages. Since the exact solution to Eq. (7.85) is \({\bf{v}}(t) = {e^{t{\bf{L}}}}{\bf{v}}(0)\) and, in particular,

    $${\bf{v}}({t_{k + 1}}) = {e^{\Delta t{\bf{L}}}}{\bf{v}}({t_k}) = \sum\limits_{j = 0}^\infty {{{{{\left({\Delta t{\bf{L}}} \right)}^j}} \over {j!}}} {\bf{v}}({t_k}),$$
    (7.107)

    Eq. (7.106) must agree with the first n terms of the Taylor expansion of \({e^{\Delta t{\bf{L}}}}\), where n is the order of the global truncation error of the ODE solver, defined through

    $${\rm{R}}(\Delta t{\bf{L}}) - {e^{\Delta t{\bf{L}}}} = {\mathcal O}{\left({\Delta t{\bf{L}}} \right)^{n + 1}}.$$
    (7.108)

    Therefore, we must have

    $${\alpha _j} = 1\;\;{\rm{for}}\;0 \leq j \leq n.$$
    (7.109)

    We then see that a scheme of order n needs at least n stages, sn, and

    $${\rm{R}}(\Delta t{\bf{L}}) = \sum\limits_{j = 0}^n {{{{{\left({\Delta t{\bf{L}}} \right)}^j}} \over {j!}}} + \sum\limits_{j = n + 1}^s {{\alpha _j}} {{{{\left({\Delta t{\bf{L}}} \right)}^j}} \over {j!}}.$$
    (7.110)

    The above expression in particular shows that when s = n (i.e., when the second sum on the right-hand side is zero) the scheme is unique, with coefficients given by Eq. (7.109). In particular, for n = 1 = s, such a scheme corresponds to the Euler one discussed in the example above; see Eq. (7.99).

  • As we will discuss in Section 7.5, in the nonlinear case it is possible to choose RK methods with s = n if and only if n < 5.

  • When s = n, first and second-order RK are not locally stable, while third and fourth order are. The fifth-order Dormand-Prince scheme (also introduced in Section 7.5) is also locally stable.
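These statements are easy to probe numerically. The following sketch (our own illustration in Python/numpy; the scan range and sampling density are arbitrary choices) evaluates the stability function of Eq. (7.110) with s = n on the imaginary axis and estimates the largest interval on which ∣R(iα)∣ ≤ 1, confirming that the Euler method and RK2 are not locally stable, while RK3 and RK4 are:

```python
import numpy as np
from math import factorial

def stability_function(n):
    """R(z) = sum_{j=0}^n z^j / j!, the stability function of the unique
    s = n stage, order n explicit RK scheme [Eqs. (7.109)-(7.110)]."""
    return lambda z: sum(z**j / factorial(j) for j in range(n + 1))

def imaginary_axis_radius(n, alpha_max=4.0, num=4001, tol=1e-12):
    """Largest sampled alpha with |R(i a)| <= 1 for all 0 <= a <= alpha;
    a proxy for the local-stability radius along the imaginary axis."""
    R = stability_function(n)
    alphas = np.linspace(0.0, alpha_max, num)
    unstable = np.abs(R(1j * alphas)) > 1.0 + tol
    idx = np.argmax(unstable)          # first failure (0 if none fail)
    return alphas[idx - 1] if idx > 0 else alpha_max

for n in (1, 2, 3, 4):
    print(f"n = {n}: radius ~ {imaginary_axis_radius(n):.3f}")
# expected: ~0 for n = 1, 2 (not locally stable); sqrt(3) ~ 1.732 for
# n = 3; and 2*sqrt(2) ~ 2.828 for n = 4
```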

Using the energy method, fully-discrete stability can be shown (resulting in a more restrictive CFL limit) for third-order Runge-Kutta integration and arbitrary dissipative operators L [282, 409]:

Theorem 12 (Levermore). Suppose L is dissipative, that is, Eq. (7.83) with αd = 0 holds. Then, the third-order Runge-Kutta approximation v(tk+1) = R3(ΔtL)v(tk), where

$${{\rm{R}}_3}(z) = 1 + z + {{{z^2}} \over 2} + {{{z^3}} \over 6},$$
(7.111)

for the semi-discrete problem (7.85) is strongly stable,

$$\Vert {{\rm{R}}_3}(\Delta t{\bf{L}}){\Vert _{\rm{d}}} \leq 1$$
(7.112)

under the CFL timestep restriction ΔtLd ≤ 1.

Notice that the restriction αd = 0 is not so severe, since one can always achieve it by replacing L with L − αdI. A generalization of Theorem 12 to higher-order Runge-Kutta methods does not seem to be known.
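Theorem 12 can be checked directly in a simple setting. The sketch below is our own construction (a random matrix with L + LT ≤ 0, and the matrix 2-norm standing in for the SBP norm ∥·∥d); it verifies that ∥R3(ΔtL)∥ ≤ 1 at the CFL limit Δt∥L∥ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dissipative(n):
    """A matrix L with L + L^T <= 0, i.e., dissipative in the 2-norm."""
    S = rng.standard_normal((n, n))
    P = rng.standard_normal((n, n))
    return (S - S.T) - P @ P.T   # skew part + negative semi-definite part

def R3(M):
    """Third-order RK amplification R3(M) = I + M + M^2/2 + M^3/6, Eq. (7.111)."""
    I = np.eye(M.shape[0])
    return I + M + M @ M / 2 + M @ M @ M / 6

L = random_dissipative(20)
dt = 1.0 / np.linalg.norm(L, 2)        # timestep at the CFL limit of Theorem 12
print(np.linalg.norm(R3(dt * L), 2))   # <= 1 up to round-off
```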

7.4 Strict or time-stability

Consider a well-posed problem. For the sake of simplicity, we assume it is a linear initial-value one; a similar discussion holds for linear IBVPs. According to Definition 3, there are constants K and α such that smooth solutions satisfy

$$\Vert u(t,\cdot)\Vert \; \leq \;K{e^{\alpha t}}\Vert u(0,\cdot)\Vert \quad \quad {\rm{for}}\;{\rm{all}}\;t \geq 0.$$
(7.113)

Definition 15. For a numerically-stable semi-discrete approximation, there are resolution-independent constants Kd, αd such that for all initial data

$$\Vert v(t,\cdot){\Vert _{\rm{d}}}\, \leq {K_{\rm{d}}}{e^{{\alpha _{\rm{d}}}t}}\Vert v(0,\cdot){\Vert _{\rm{d}}}\quad \quad for\;all\;t \geq 0.$$
(7.114)

The approximation is called strict or time stable if, for finite differences,

$${\alpha _{\rm{d}}} \leq \alpha + {\mathcal O}(\Delta x){.}$$
(7.115)

Similar definitions hold in the fully-discrete case and/or when the spatial approximation is not a finite-difference one. Essentially, (7.115) attempts to capture the notion that, at any fixed resolution, the numerical solution should not exhibit growth in time that is not present at the continuum. However, the definition is not useful if the estimate (7.113) is not sharp, since then neither is the estimate (7.114), and the numerical solution can still exhibit artificial growth.

Example 41. Consider the problem (drawn from [278])

$${u_t} = {u_x} + {u \over x},\;\;\;a \leq x \leq b,\quad t \geq 0,$$
(7.116)
$$u(t,\;b) = {1 \over b},\;\;x = b,\quad t \geq 0,$$
(7.117)
$$u(0,\;x) = {1 \over x},\;\;a \leq x \leq b,\quad t = 0,$$
(7.118)

where b > a > 0, for which the solution is the stationary one

$$u(t,\;x) = {1 \over x}{.}$$
(7.119)

Defining

$$E(t) = \langle u(t,\cdot),\;u(t,\cdot)\rangle \;{\rm{with}}\;\langle u,\;v\rangle = \int\limits_a^b u (x)\,v(x)\,dx,$$
(7.120)

it follows that

$${d \over {dt}}E(t) = {1 \over {{b^2}}} - u{(a)^2} + 2\langle u,\;u/x\rangle \leq {1 \over {{b^2}}} + 2\langle u,\;u/x\rangle \leq {1 \over {{b^2}}} + {2 \over a}\langle u,\;u\rangle = {1 \over {{b^2}}} + {2 \over a}E(t){.}$$
(7.121)

This energy estimate implies that the spatial norm of u(t) cannot grow faster than et/a. However, in principle this need not be a sharp bound. In fact, Eq. (7.116) is, in disguise, the advection equation (xu)t = (xu)x for which the general solution (xu)(t, x) = F(t + x) does not grow in time. Therefore, a numerical scheme for the problem (7.116, 7.117, 7.118) whose solutions are allowed to grow exponentially at the rate et/a is strictly stable according to Definition 15 but the classification of the scheme as such is not of much use if the growth does take place. In order to illustrate this, we show the results for two schemes. In the first one, spurious growth takes place although the scheme is strictly stable, and in the second case one obtains a strictly-stable scheme with respect to a sharp energy estimate, which does not exhibit such growth.

If the system (7.116, 7.117, 7.118) is approximated by

$${v_t} = Dv + {v \over x},$$
(7.122)

where D is a finite difference operator satisfying summation by parts (SBP) (Section 8.3) and the boundary condition is imposed through a projection method (Section 10), it can be shown — as discussed in Section 8 — that the following semi-discrete estimate holds, at least for analytic boundary conditions, with the discrete SBP scalar product 〈u, vΣ defined by Eq. (8.21) in the next section,

$${d \over {dt}}{E_{\rm{d}}}(t) \leq {1 \over {{b^2}}} + {2 \over a}{E_{\rm{d}}}(t)\,,\quad \quad {E_{\rm{d}}}: = {\langle v,\;v\rangle _\Sigma}.$$
(7.123)

Technically, the semi-discrete approximation (7.122) is strictly stable, since the continuum (7.121) and semi-discrete (7.123) estimates agree. However, this does not preclude spurious growth, as the bounds are not sharp. The left panel of Figure 3 shows results for the D2−1 SBP operator (see the definition in Example 51 below), boundary conditions through an orthogonal projection, and third-order Runge-Kutta time integration. At any time, the errors do converge to zero with increasing resolution as expected since the scheme is numerically stable. However, at any fixed resolution there is spurious growth in time.

Figure 3 Comparison [278] between numerical solutions to (7.122) (left, and right labeled as Non-strictly stable) and (7.124) (right, labeled as Strictly stable). The left panel shows that the scheme is numerically stable but not time-stable. The right panel shows that a conservative scheme is time-stable and considerably more accurate. Reprinted with permission from [278]; copyright by IOP.

On the other hand, discretizing the system as

$${v_t} = {1 \over x}D\;(xv)$$
(7.124)

is, in general, not equivalent to (7.122) because difference operators do not satisfy the Leibnitz rule exactly. In fact, defining

$${\tilde E_{\rm{d}}}: = {\langle xv,\;xv\rangle _\Sigma},$$
(7.125)

it follows that

$${d \over {dt}}{\tilde E_{\rm{d}}}(t) = - {{{a^2}v{{(a)}^2}} \over 2} + {{{b^2}v{{(b)}^2}} \over 2},$$
(7.126)

which, being an equality, is as sharp as an estimate can be.

The right panel of Figure 3 shows a comparison between discretizations (7.122) and (7.124), as well as (7.122) with the addition of numerical dissipation (see Section 8.5), in all cases at the same fixed resolution. Even though numerical dissipation does stabilize the spurious growth in time, the strictly-stable discretization (7.124) is considerably more accurate. Technically, according to Definition 15, the approximation (7.122) is also strictly stable, but it is more useful to reserve the term for the cases in which the estimate is sharp. The approximation (7.124), on the other hand, is (modulo the flux at boundaries, discussed in Section 10) energy preserving or conservative.

In order to construct conservative or time-stable semi-discrete schemes, one essentially needs to group the terms of the approximation in such a way that, when deriving at the semi-discrete level what would be the conservation law at the continuum, the use of the Leibnitz rule is avoided. In addition, the numerical imposition of boundary conditions also plays a role (see Section 10).
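As an illustration, the following sketch reproduces the flavor of Example 41, with several simplifications that are our own: the D2−1 operator of Example 51 below, a Shu-Osher form of third-order Runge-Kutta, and simple injection of the boundary value in place of the projection method of Section 10. At a fixed resolution, the error of discretization (7.122) should grow in time, while that of the conservative form (7.124) should not:

```python
import numpy as np

a, b, N = 0.5, 1.0, 100
x = np.linspace(a, b, N + 1)
dx = x[1] - x[0]

def D(u):
    """D_{2-1} SBP operator (Example 51): one-sided at the boundaries."""
    du = np.empty_like(u)
    du[0] = (u[1] - u[0]) / dx
    du[1:-1] = (u[2:] - u[:-2]) / (2 * dx)
    du[-1] = (u[-1] - u[-2]) / dx
    return du

def rhs_plain(v):          # discretization (7.122)
    return D(v) + v / x

def rhs_conservative(v):   # discretization (7.124)
    return D(x * v) / x

def step(v, dt, rhs):
    """Shu-Osher SSP RK3; for linear problems it realizes R3 of Eq. (7.111)."""
    v1 = v + dt * rhs(v)
    v2 = 0.75 * v + 0.25 * (v1 + dt * rhs(v1))
    v = v / 3.0 + (2.0 / 3.0) * (v2 + dt * rhs(v2))
    v[-1] = 1.0 / b        # inject the boundary data (7.117) at x = b
    return v

for rhs in (rhs_plain, rhs_conservative):
    v, dt, t = 1.0 / x, 0.25 * dx, 0.0
    while t < 5.0:
        v, t = step(v, dt, rhs), t + dt
    print(rhs.__name__, np.max(np.abs(v - 1.0 / x)))   # error vs. exact 1/x
```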

In many application areas, conservation or time-stability play an important role in the design of numerical schemes. That is not so much (at least so far) the case for numerical solutions of Einstein’s equations, because in general relativity there is no gauge-invariant local notion of conserved energy unlike many other nonlinear hyperbolic systems (most notably, in Newtonian or special relativistic Computational Fluid Dynamics); see, however, [400]. In addition, there are no generic sharp estimates for the growth of the solution that can guide the design of numerical schemes. However, in simpler settings such as fields propagating on some stationary fixed-background geometry, there is a notion of conserved local energy and accurate conservative schemes are possible. Interestingly, in several cases such as Klein-Gordon or Maxwell fields in stationary background spacetimes the resulting conservation of the semi-discrete approximations follows regardless of the constraints being satisfied (see, for example, [278]). A local conservation law in stationary spacetimes can also guide the construction of schemes to guarantee stability in the presence of coordinate singularities [105, 375, 225, 310], as discussed in Section 7.6.

In addition, there has been work done on variational, symplectic or mimetic integration techniques for Einstein’s equations, which aim at exactly or approximately preserving the discrete constraints, while solving the discrete evolution equations. See, for example, [304, 139, 201, 200, 76, 110, 359, 173, 358, 174, 360, 357].

7.5 Runge-Kutta methods

In the method-of-lines approach one ends up effectively integrating a system of ordinary differential equations, which we now generically denote by

$${d \over {dt}}y(t) = f(t,y){.}$$
(7.127)

The majority of approaches in numerical relativity use one-step, multi-stage, explicit Runge-Kutta (RK) methods, which take the following form.

Definition 16. An explicit, s-stage, Runge-Kutta method is of the form

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {{k_2} = f\left({{t_n} + {c_2}\Delta t,{y_n} + \Delta t{a_{21}}{k_1}} \right),\quad \quad \quad \quad \quad \quad \;\;\;} \\ {{k_3} = f\left({{t_n} + {c_3}\Delta t,{y_n} + \Delta t\left({{a_{31}}{k_1} + {a_{32}}{k_2}} \right)} \right),\quad \quad \quad \;} \\ {\vdots \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {{k_s} = f\left({{t_n} + {c_s}\Delta t,{y_n} + \Delta t\left({{a_{s1}}{k_1} + \ldots + {a_{s,s - 1}}{k_{s - 1}}} \right)} \right),\,\,\,} \\ {{y_{n + 1}} = {y_n} + \Delta t({b_1}{k_1} + \ldots + {b_s}{k_s}).\quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ \end{array}$$

with bl (1 ≤ ls), cj (2 ≤ js), and aji(1 ≤ ij − 1) real numbers.

Next we present a few examples, in increasing order of accuracy. The simplest one is a forward finite-difference scheme [cf. Eq. (7.70)].

Example 42. The Euler method (first-order, one-stage):

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),} \\ {{y_{n + 1}} = {y_n} + \Delta t{k_1}.\quad} \\ \end{array}$$

It corresponds to all the coefficients set to zero except b1 = 1.

Example 43. Second-order, two-stage RK:

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),\quad \quad \quad \quad \quad \quad \;} \\ {{k_2} = f\left({{t_n} + {{\Delta t} \over 2},{y_n} + {{\Delta t} \over 2}{k_1}} \right),} \\ {{y_{n + 1}} = {y_n} + \Delta t{k_2}.\quad \quad \quad \quad \quad \quad \quad \;} \\ \end{array}$$

The non-vanishing coefficients are c2 = 1/2 = a21, b2 = 1.

Example 44. Third-order, four-stage RK:

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),\quad \quad \quad \quad \quad \quad} \\ {{k_2} = f\left({{t_n} + {{\Delta t} \over 2},{y_n} + {{\Delta t} \over 2}{k_1}} \right),\quad} \\ {{k_3} = f\left({{t_n} + \Delta t,{y_n} + \Delta t{k_2}} \right),\quad \quad} \\ {{k_4} = f\left({{t_n} + \Delta t,{y_n} + \Delta t{k_3}} \right),\quad \quad} \\ {{y_{n + 1}} = {y_n} + \Delta t\left({{{{k_1}} \over 6} + {{2{k_2}} \over 3} + {{{k_4}} \over 6}} \right).\quad \quad} \\ \end{array}$$

In the previous case, the number of stages is larger than the order of the scheme. It turns out that it is possible to have a third-order RK scheme with three stages.

Example 45. Third-order, three-stage Heun-RK method

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),\quad \quad \quad \quad \quad \quad} \\ {{k_2} = f\left({{t_n} + {{\Delta t} \over 3},{y_n} + {{\Delta t} \over 3}{k_1}} \right),\quad} \\ {{k_3} = f\left({{t_n} + {{2\Delta t} \over 3},{y_n} + {{2\Delta t} \over 3}{k_2}} \right),} \\ {{y_{n + 1}} = {y_n} + \Delta t\left({{{{k_1}} \over 4} + {{3{k_3}} \over 4}} \right).\quad \quad \quad \quad} \\ \end{array}$$

It is also possible to find a fourth-order, four-stage RK scheme. Actually, there are multiple ones, arranged in one- and two-parameter families. A popular choice is

Example 46. The standard fourth-order, four-stage RK method:

$$\begin{array}{*{20}c} {{k_1} = f\left({{t_n},{y_n}} \right),\quad \quad \quad \quad \quad \quad \;} \\ {{k_2} = f\left({{t_n} + {{\Delta t} \over 2},{y_n} + {{\Delta t} \over 2}{k_1}} \right),\quad} \\ {{k_3} = f\left({{t_n} + {{\Delta t} \over 2},{y_n} + {{\Delta t} \over 2}{k_2}} \right),\quad \;} \\ {{k_4} = f\left({{t_n} + \Delta t,{y_n} + \Delta t{k_3}} \right),\quad \;\quad \;} \\ {{y_{n + 1}} = {y_n} + \Delta t\left({{{{k_1}} \over 6} + {{2{k_2}} \over 6} + {{2{k_3}} \over 6} + {{{k_4}} \over 6}} \right).} \\ \end{array}$$
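In code, Example 46 amounts to a few lines. A minimal sketch (the convergence test on y′ = −y is our own choice of illustration, not from the text):

```python
import numpy as np

def rk4_step(f, t, y, dt):
    """One step of the classical fourth-order, four-stage RK of Example 46."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# convergence test on y' = -y, y(0) = 1, exact solution exp(-t):
f = lambda t, y: -y
for n in (10, 20, 40):
    dt, y = 1.0 / n, 1.0
    for i in range(n):
        y = rk4_step(f, i * dt, y, dt)
    print(n, abs(y - np.exp(-1.0)))   # error drops ~16x per doubling of n
```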

At this point it is clear that, since RK methods share the same structure, it is not very efficient to write down explicitly all the stages and the final step. Butcher’s tables are a common way to represent them; they have the following structure:
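In the standard convention (shown here for an explicit method; blank entries are zero), the nodes cj occupy the first column, the coefficients aji fill a strictly lower-triangular block, and the weights bl form the bottom row:

$$\begin{array}{c|ccccc} 0 & {} & {} & {} & {} & {} \\ {c_2} & {a_{21}} & {} & {} & {} & {} \\ {c_3} & {a_{31}} & {a_{32}} & {} & {} & {} \\ \vdots & \vdots & {} & \ddots & {} & {} \\ {c_s} & {a_{s1}} & {a_{s2}} & \cdots & {a_{s,s - 1}} & {} \\ \hline {} & {b_1} & {b_2} & \cdots & {b_{s - 1}} & {b_s} \\ \end{array}$$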

Table 2 shows representations of Examples 42, 43, 44, 45, while Table 3 shows the classical fourth-order Runge-Kutta method of Example 46.

Table 2 From left to right: The Euler method, second and third-order RK, third-order Heun.
Table 3 The standard fourth-order Runge-Kutta method.

The above examples explicitly show that up to, and including, fourth-order accuracy there are Runge-Kutta methods of order p and s stages with s = p. It is interesting that, even though the first RK methods date back to the end of the 19th century, the question of whether there are higher-order (than four) RK methods with s = p stages remained open until the following result was shown by Butcher in 1963 [93]: s = p cannot be achieved anymore starting with fifth-order accurate schemes, and there are a number of barriers.

Theorem 13. For p ≥ 5 there are no Runge-Kutta methods with s = p stages.

However, there are fifth and sixth-order RK methods with six and seven stages, respectively. Butcher in 1965 [94] and 1985 [95] respectively showed the following barriers.

Theorem 14. For p ≥ 7 there are no Runge-Kutta methods with s = p + 1 stages.

Theorem 15. For p ≥ 8 there are no Runge-Kutta methods with s = p + 2 stages.

Seventh and eighth-order methods with s = 9 and s = 11 stages, respectively, have been constructed, as well as a tenth-order one with s = 17 stages.

7.5.1 Embedded methods

In practice, many approaches in numerical relativity use an adaptive timestep method. One way of doing so is to evolve the system of equations two steps with timestep Δt and, separately, one step with timestep 2Δt. The difference between the two solutions at t + 2Δt can be used, along with Richardson extrapolation, to estimate the new timestep needed to achieve any given error tolerance.

In more detail: if we call y2 the solution at t + 2Δt evolved from time t in two steps of size Δt, and \({{\tilde y}_1}\) the solution at the same time advanced from time t in one step of size 2Δt, then the following holds

$$y(t + 2\Delta t) - {y_2} = {{{y_2} - {{\tilde y}_1}} \over {{2^p} - 1}} + {\mathcal O}\left({{{\left({\Delta t} \right)}^{p + 1}}} \right),$$
(7.128)

where y denotes the exact solution. Therefore, the term

$${{{y_2} - {{\tilde y}_1}} \over {{2^p} - 1}}$$
(7.129)

can be used as an estimate of the error and to choose the next timestep.
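A minimal adaptive controller based on this estimate might look as follows (a sketch for scalar ODEs; the safety factor 0.9 and the floor on the error estimate are conventional choices of ours, not from the text):

```python
def adapt_step(f, t, y, dt, tol, step, p=4):
    """One adaptive cycle via step doubling, Eqs. (7.128)-(7.129).
    `step(f, t, y, dt)` is any one-step method of (global) order p."""
    y_half = step(f, t, y, dt)
    y2 = step(f, t + dt, y_half, dt)     # two steps of size dt
    y1 = step(f, t, y, 2 * dt)           # one step of size 2*dt
    err = abs(y2 - y1) / (2 ** p - 1)    # error estimate (7.129)
    # standard controller: pick dt so the estimated error matches tol
    dt_new = 0.9 * dt * (tol / max(err, 1e-30)) ** (1.0 / (p + 1))
    return y2, t + 2 * dt, dt_new

# usage, e.g., with the rk4_step sketch from Section 7.5 above:
#   y, t, dt = adapt_step(f, t, y, dt, tol=1e-10, step=rk4_step)
```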

Embedded methods also compute two solutions and use their difference to estimate the error and adapt the timestep. However, this is done by reusing the stages. Two Runge-Kutta methods, of order p and p′ (in most cases — but not always — p′ = p + 1 or p = p′ + 1), are constructed, which share the intermediate function values, so that there is no overhead cost. Therefore, their Butcher table looks as follows:

Table 4 The structure of embedded methods.

Embedded methods are denoted by p(p′), where p is the order of the scheme, which advances the solution. For example, a 5(4) method would be of fifth order, with a fourth-order scheme, which shares its function calls used to estimate the error.

Table 5 shows the coefficients for a popular embedded method, the seven-stage Dormand-Prince 5(4) [146]. Dormand-Prince methods are embedded methods, which minimize a quantification of the truncation error for the highest-order component, which is the one used for the evolution.

Table 5 The 5(4) Dormand-Prince method.

7.6 Remarks

The classical reference for the stability theory of finite-difference schemes for time-dependent problems is [361]. A modern account of stability theory for initial-boundary value discretizations is [228]. [227] includes a discussion of some of the main stability definitions and results, with emphasis on multiple aspects of high-order methods, and [415, 416] provide many examples at a more introductory level. We have omitted discussing the discrete version of the Laplace theory for IBVPs, developed by Gustafsson, Kreiss and Sundström (known as GKS theory or GKS stability) [229], since it has been used very little (if at all) in numerical relativity, where most stability analyses instead rely on the energy method.

The simplest stability analysis is that of a periodic, constant-coefficient test problem. An eigenvalue analysis can include boundary conditions and is typically used as a rule of thumb for CFL limits or to remove some instabilities. The eigenvalues are usually numerically computed for a number of different resolutions. See [171, 175] for some examples within numerical relativity.

Our discussion of Runge-Kutta methods follows [96] and [230], which we refer to, along with [231], for the rich area of methods for solving ordinary differential equations, in particular Runge-Kutta ones. We have only mentioned (one-step) explicit methods, which are the ones used the most in numerical relativity, but they are certainly not the only ones. For example, stiff problems in general require implicit integrations. [274, 322, 273] explored implicit-explicit (IMEX) time integration schemes in numerical relativity. Among many of the topics that we have not included is that of dense output. This refers to methods, which allow the evaluation of an approximation to the numerical solution at any time between two consecutive timesteps, at an order comparable or close to that of the integration scheme, and at low computational cost.

8 Spatial Approximations: Finite Differences

As mentioned in Section 7.6, a general stability theory (referred to as GKS) for IBVPs was developed by Gustafsson, Kreiss and Sundström [229], and a simpler approach, when applicable, is the energy method. The latter is considerably simpler than a GKS analysis, particularly for complicated systems such as Einstein’s field equations, high-order schemes, spectral methods, and/or complex geometries. The Einstein vacuum equations can be written in linearly-degenerate form and are therefore expected to be free of physical shocks (see the discussion at the beginning of Section 3.3) and ideally suited for methods, which exploit the smoothness of the solution to achieve fast convergence, such as high-order finite-difference and spectral methods. In addition, an increasing number of approaches in numerical relativity use some kind of multi-domain or grid structure approach (see Section 11). There are multi-domain schemes for which numerical stability can relatively easily be established for a large class of linear symmetric hyperbolic problems and maximal dissipative boundary conditions through the energy method. In particular, such schemes could be applied to the symmetric hyperbolic formulations of Einstein’s equations discussed in Sections 4 and 6.

In this section we discuss spatial finite difference (FD) approximations of arbitrary high order for which the energy method can be applied, and in Section 10 boundary closures for them. We start by reviewing polynomial interpolation, followed by the systematic construction of FD approximations of arbitrary high order and stencils through interpolation. Next, we introduce the concept of operators satisfying SBP, present a semi-discrete stability analysis, and the construction of high-order operators optimized in terms of minimizing their boundary truncation error and their associated timestep (CFL) limits (more specifically, their spectral radius). Finally, we discuss numerical dissipation, with emphasis on the region near boundaries or grid interfaces.

8.1 Polynomial interpolation

Although interpolation is not strictly a finite differencing topic, we briefly present it here because it is used below and in Section 9, when discussing spectral methods.

Given a set of (N + 1) distinct points \(\{{x_j}\} _{j = 0}^N\) (sometimes referred to as nodal points or nodes) and arbitrary associated function values f(xj), the interpolation problem amounts to finding (in this case) a polynomial \({{\mathcal I}_N}[f](x)\) of degree less than or equal to N such that \({{\mathcal I}_N}[f]({x_j}) = f({x_j})\) for j = 0, 1, 2, …, N.

It can be shown that there is one and only one such polynomial. Existence can be shown by explicit construction: suppose one had

$${{\mathcal I}_N}[f](x) = \sum\limits_{j = 0}^N f ({x_j})l_j^{(N)}(x),$$
(8.1)

where, for each \(j = 0,1, \ldots, N, l_j^{(N)}(x)\) is a polynomial of degree less than or equal to N such that

$$l_j^{(N)}({x_i}) = {\delta _{ij}}\qquad {\rm{for}}\, i = 0,1, \ldots ,N.$$
(8.2)

Then \({{\mathcal I}_N}[f](x)\) as given by Eq. (8.1) would interpolate f(x) at the (N + 1) nodal points {xi}. The Lagrange polynomials, defined as

$$l_j^{(N)}(x) = \left({\prod\limits_{k = 0,k \neq j}^N {(x - {x_k})}} \right){\left({\prod\limits_{k = 0,k \neq j}^N {({x_j} - {x_k})}} \right)^{- 1}},$$
(8.3)

indeed do satisfy Eq. (8.2). Uniqueness of the interpolant can be shown by using the property that polynomials of order N can have at most N roots, applied to the difference between any two interpolants.

Defining the interpolation error by

$${E_N}(x) = \vert f(x) - {{\mathcal I}_N}[f](x)\vert$$
(8.4)

and assuming that f is differentiable enough, it can be seen that EN satisfies

$${E_N}(x) = {1 \over {(N + 1)!}}\vert{f^{(N + 1)}}({\xi _x}){\omega _{N + 1}}(x)\vert\, ,$$
(8.5)

where \({\omega _{N + 1}}(x): = \prod\limits_{j = 0}^N {(x - {x_j})}\) is called the nodal polynomial of degree (N + 1), and ξx is in the smallest interval \({{\mathcal I}_x}\) containing {x0, x1,…, xN} and x. In other words, if we assume the ordering x0 < x1 < … < xN, then x can actually be outside [x0, xN]. For example, if x < x0, then \({{\mathcal I}_x} = [x,{x_N}]\). Sometimes, approximating f(x) by \({{\mathcal I}_N}[f](x)\) when x ∉ [x0, xN] is called extrapolation, and interpolation only if x ∈ [x0, xN], even though an interpolating polynomial is used as approximation.
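Eqs. (8.1)-(8.3) translate directly into code. A short numpy-based sketch (the nodes and the test function are arbitrary choices of ours):

```python
import numpy as np

def lagrange_basis(xs, j, x):
    """The Lagrange polynomial l_j^{(N)}(x) of Eq. (8.3) for nodes xs."""
    num = den = 1.0
    for k, xk in enumerate(xs):
        if k != j:
            num *= x - xk
            den *= xs[j] - xk
    return num / den

def interpolate(xs, fs, x):
    """The interpolant I_N[f](x) of Eq. (8.1)."""
    return sum(fj * lagrange_basis(xs, j, x) for j, fj in enumerate(fs))

xs = np.linspace(0.0, 1.0, 5)            # N = 4 (five nodes)
fs = np.sin(np.pi * xs)
print(interpolate(xs, fs, 0.3), np.sin(np.pi * 0.3))   # close agreement
```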

8.2 Finite differences through interpolation

FD approximations of the p-th derivative of a function f at a point x are local linear combinations of function values at node points,

$$\begin{array}{*{20}c} {{D^{(p)}}f(x): = {1 \over {{{(\Delta x)}^p}}}\sum\limits_{i = 0}^N f ({x_i}){a_i}(x)\quad \quad \quad \quad \,\,} \\ {= {{{d^p}} \over {d{x^p}}}f(x) + {\mathcal O}\left({{{\left({\Delta x} \right)}^r}} \right)\, ,} \\ \end{array}$$
(8.6)

where the nodes {xi} do not need to include x (that is, the grid can be staggered). In the FD case, the number of nodes is usually kept fixed as resolution is changed, resulting in a fixed convergence order r. This is in contrast to discrete spectral collocation methods, discussed in Section 9, which can be seen as approximation through global polynomial interpolation at special nodal points.

A way of systematically constructing FD operators with an arbitrary distribution of nodes, any desired convergence order, and which are centered, one-sided or partially off-centered in any way is through interpolation. A local polynomial interpolant is used to approximate the function f, and the FD approximation is defined as the exact derivative of the interpolant. That is,

$$f(x) \approx {\mathcal I}[f](x): = \sum\limits_{i = 0}^N f ({x_i})\ell _i^{(N)}(x)$$
(8.7)

and, for instance, for a first derivative,

$$Df(x): = {d \over {dx}}{\mathcal I}[f](x) = \sum\limits_{i = 0}^N f ({x_i}){d \over {dx}}\ell _i^{(N)}(x)\, .$$
(8.8)

Notice that the expression (8.8) does have the form of Eq. (8.6), where {xi} are the nodal points of the interpolant and

$${a_i}(x) = (\Delta x){d \over {dx}}\ell _i^{(N)}(x).$$
(8.9)

The truncation error for the FD approximation (8.8) to the first derivative can be estimated by differentiating the error formula for the interpolant, Eq. (8.5),

$$\begin{array}{*{20}c} {{E_{f\prime}}(x): = {d \over {dx}}f(x) - {d \over {dx}}{\mathcal I}[f](x) = {d \over {dx}}\left({{1 \over {(N + 1)!}}{f^{(N + 1)}}({\xi _x}){\omega _{N + 1}}(x)} \right)\quad \quad \quad \quad \quad \quad} \\ {= {1 \over {(N + 1)!}}\left({{d \over {dx}}{f^{(N + 1)}}({\xi _x})} \right){\omega _{N + 1}}(x) + {1 \over {(N + 1)!}}{f^{(N + 1)}}({\xi _x}){d \over {dx}}{\omega _{N + 1}}(x).} \\ \end{array}$$
(8.10)

The derivative of the first term in Eq. (8.10) is more complicated to estimate than the second one without analyzing the details of the dependence of ξ on x. But if we restrict x to be a nodal point x = xk, then ωN+1(xk) = 0 and the previous equation simplifies to

$${E_{f\prime}}({x_k}) = {1 \over {(N + 1)!}}{f^{(N + 1)}}({\xi _{{x_k}}})\prod\limits_{i = 0,i \neq k}^N {({x_k} - {x_i})} .$$
(8.11)

Notice that Eq. (8.11) implies that the resulting FD approximation has design convergence order r = N. For the usual case of equally spaced nodes, for instance, where xk = kΔx, we obtain

$${E_{f\prime}}({x_k}) = {{{{(\Delta x)}^N}} \over {(N + 1)!}}{f^{(N + 1)}}\left({{\xi _{{x_k}}}} \right)\prod\limits_{i = 0,i \neq k}^N {(k - i)}$$
(8.12)

and the error is proportional to (Δx)N. If the nodes are not equally spaced, the error can be bounded by a constant proportional to (Δx)N, where in this case Δx is the maximal distance between neighboring nodal points.

Example 47. A first-order one-sided FD approximation for d/dx.

We construct a first-degree interpolant using two nodal points {x0, x1},

$${\mathcal I}[f](x) = f({x_0})\ell _0^{(1)}(x) + f({x_1})\ell _1^{(1)}(x),$$
(8.13)

with

$$\ell _0^{(1)}(x) = {{x - {x_1}} \over {{x_0} - {x_1}}}\, ,\qquad \ell _1^{(1)}(x) = {{x - {x_0}} \over {{x_1} - {x_0}}}\, .$$
(8.14)

Then

$$Df(x): = {d \over {dx}}{\mathcal I}[f](x) = {{f({x_1}) - f({x_0})} \over {{x_1} - {x_0}}}\, .$$
(8.15)

If we evaluate at x = x0 or x = x1, we obtain the standard first-order forward and backward FD approximations D+ and D−, respectively [cf. Eqs. (7.70)]. From Eq. (8.12) we recover the known first-order convergence for these approximations, r = 1, which can also be obtained directly through a Taylor expansion in Δx of Eq. (8.15).

Example 48. A second-order centered finite-difference approximation for d/dx.

Now we construct a second-degree interpolant using three nodal points {x0, x1, x2},

$${\mathcal I}[f](x) = f({x_0})\ell _0^{(2)}(x) + f({x_1})\ell _1^{(2)}(x) + f({x_2})\ell _2^{(2)}(x)\, ,$$
(8.16)

with

$$\ell _0^{(2)}(x) = {{(x - {x_1})(x - {x_2})} \over {({x_0} - {x_1})({x_0} - {x_2})}}\, ,\qquad \ell _1^{(2)}(x) = {{(x - {x_0})(x - {x_2})} \over {({x_1} - {x_0})({x_1} - {x_2})}}\, ,\qquad \ell _2^{(2)}(x) = {{(x - {x_0})(x - {x_1})} \over {({x_2} - {x_0})({x_2} - {x_1})}}\, .$$
(8.17)

If we assume the points to be equally spaced, x2x1 = Δx = x1x0, and evaluate the derivative at the center one, x = x1, we obtain

$$Df({x_1}): = {d \over {dx}}{\mathcal I}[f]({x_1}) = {{f({x_2}) - f({x_0})} \over {2\Delta x}}\, ,$$
(8.18)

the standard second-order centered FD operator D0 [cf. Eq. (7.32)].

One can proceed in this way to systematically construct any FD approximation to any derivative with any desired convergence order and distribution of nodal points. The result for centered-difference approximations to d/dx with even accuracy order r at equally spaced nodes can be written in terms of D0, D+, D as follows [228],

$${D_r} = {D_0}\sum\limits_{\nu = 0}^{r/2 - 1} {{{(- 1)}^\nu}} {\alpha _\nu}{\left({{{(\Delta x)}^2}{D_ +}{D_ -}} \right)^\nu}\, ,$$
(8.19)

with

$$\begin{array}{*{20}c} {{\alpha _0} = 1\, ,\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \,\,\,} \\ {{\alpha _\nu} = {\nu \over {4\nu + 2}}{\alpha _{\nu - 1}}\qquad {\rm{for}}\,\nu = 1,2, \ldots ,(r/2 - 1)\, .} \\ \end{array}$$

Example 49. The fourth, sixth, eighth and tenth-order centered FD approximations to d/dx are:

$$\begin{array}{*{20}c} {{D_4} = {D_0} - {D_0}{{{{(\Delta x)}^2}} \over 6}{D_ +}{D_ -}\, ,} \\ {{D_6} = {D_4} + {D_0}{{{{(\Delta x)}^4}} \over {30}}D_ + ^2D_ - ^2\, ,} \\ {{D_8} = {D_6} - {D_0}{{{{(\Delta x)}^6}} \over {140}}D_ + ^3D_ - ^3\, ,} \\ {{D_{10}} = {D_8} + {D_0}{{{{(\Delta x)}^8}} \over {630}}D_ + ^4D_ - ^4\, .} \\ \end{array}$$
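On a periodic grid, Eq. (8.19) is straightforward to implement by composing D0 and D+D−. A sketch (periodic wrapping through numpy.roll is our choice) that also verifies the fourth-order convergence of D4:

```python
import numpy as np

def D0(u, dx):    # second-order centered difference
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)

def DpDm(u, dx):  # D_+ D_-: the standard second difference
    return (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2

def D4(u, dx):
    """Fourth-order operator of Example 49: D4 = D0 (1 - dx^2/6 D_+ D_-)."""
    return D0(u - dx**2 / 6 * DpDm(u, dx), dx)

for N in (32, 64, 128):
    x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    dx = x[1] - x[0]
    err = np.max(np.abs(D4(np.sin(x), dx) - np.cos(x)))
    print(N, err)   # error drops ~16x per doubling of N
```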

8.3 Summation by parts

Since numerical stability is, by definition, the discrete counterpart of well-posedness, one way to come up with schemes that are, by construction, numerically stable, is by designing them so that they satisfy the same properties used at the continuum when showing well-posedness through an energy estimate. As discussed in Section 3.2.3 one such property is integration by parts or the application of Gauss’ theorem, which leads to its numerical counterpart: SBP [265, 266].

Consider a discrete grid consisting of points \(\{{x_i}\} _{i = 0}^N\) and uniform spacing Δx on some, possibly unbounded, domain [a, b].

Definition 17. A difference operator D approximating ∂/∂x is said to satisfy SBP on the domain [a, b] with respect to a positive definite scalar product Σ,

$$\Sigma = \Delta x\left({{\sigma _{ij}}} \right)\, ,$$
(8.20)
$${\langle u,v\rangle _\Sigma}: = \Delta x\sum\limits_{i,j = 0}^N {{u_i}} {v_j}{\sigma _{ij}}\, ,$$
(8.21)

if the property

$${\langle u,Dv\rangle _\Sigma} + {\langle Du,v\rangle _\Sigma} = u(b)v(b) - u(a)v(a)$$
(8.22)

holds for all grid functions u and v.

This is the discrete counterpart of integration by parts for the \({d \over {dx}}\) operator,

$$\langle f,{d \over {dx}}g\rangle + \langle {d \over {dx}}f,g\rangle = f(b)g(b) - f(a)g(a),$$
(8.23)

for all continuously-differentiable functions f and g and the scalar product

$$\langle f,\, g\rangle : = \int\limits_a^b f (x)g(x)dx.$$
(8.24)

Similar definitions for SBP can be introduced for higher-dimensional domains.

If the interval is infinite, say (−∞, b) or (−∞, ∞), certain fall-off conditions are required and Eq. (8.22) replaced by dropping the corresponding boundary term(s).

Example 50. Standard centered differences as defined by Eq. (8.19) in the domain (−∞, ∞) or for periodic domains and functions satisfy SBP with respect to the trivial scalar product (σij = δij),

$${\langle u,{D_0}v\rangle _\Sigma} + {\langle {D_0}u,v\rangle _\Sigma} = 0,\qquad {\langle u,v\rangle _\Sigma}: = \Delta x\sum\limits_{i \in {\mathbb Z}} {{u_i}} {v_i}\, .$$
(8.25)

The scalar product or associated norm are said to be diagonal if

$${\sigma _{ij}} = {\sigma _{ii}}{\delta _{ij}}\, ,$$
(8.26)

that is, if Σ is diagonal. It is called restricted full if

$${\sigma _{{i_b}j}} = {\sigma _{{i_b}{i_b}}}{\delta _{{i_b}j}}\, ,$$
(8.27)

where ib denote boundary point indices, ib = 0 or ib = N:

$$\Sigma = \Delta x\left({\begin{array}{*{20}c} {{\sigma _{00}}} & 0 & \cdots & 0 \\ 0 & {} & {} & 0 \\ \vdots & {} & {{\Sigma _{{\rm{interior}}}}} & \vdots \\ 0 & \cdots & 0 & {{\sigma _{NN}}} \\ \end{array}} \right)$$
(8.28)

In the case of bounded, non-periodic domains, one possibility is to use centered differences and the trivial scalar product in the interior and modify both of them at and near boundaries. That is, the scalar product has the form

$$\Sigma = \Delta x\left({\begin{array}{*{20}c} {{\Sigma _{\rm{l}}}} & {} & {} \\ {} & I & {} \\ {} & {} & {{\Sigma _{\rm{r}}}} \\ \end{array}} \right)\, ,$$
(8.29)

with Σl and Σr blocks of size independent of Δx.

Accuracy and Efficiency. As mentioned, in the absence of boundaries, standard centered FDs (which have even order of accuracy 2p) satisfy SBP with respect to the trivial (Σ = ΔxI) scalar product. In their presence the operators can be modified at and near boundaries so as to satisfy SBP; examples are given below. It can be seen that the accuracy at those points drops to p in the diagonal case and to 2p − 1 in the restricted full one. Therefore, the latter is more desirable from an accuracy perspective, but less so from a stability one, as we will discuss at the end of this subsection. Depending on the system, numerical dissipation might be enough to stabilize the discretization in the restricted full case. This is discussed below in Section 8.5.

When constructing SBP operators, the discrete scalar product cannot be fixed arbitrarily and the difference operator solved for afterwards so that it satisfies the SBP property (8.22) — in general this leads to no solutions. The coefficients of Σ and those of D have to be solved for simultaneously. The resulting systems of equations leave SBP operators in general non-unique, with the freedom increasing with the accuracy order. In the diagonal case the resulting norm is automatically positive definite, but not so in the restricted full case.

We label the operators by their order of accuracy in the interior and near boundary points. For diagonal norms and restricted full ones this would be D2p−p and D2p−(2p−1), respectively.

Example 51. D2−1: For the simplest case, p = 1, the SBP operator and scalar product are unique:

$${D_{2 - 1}}{u_i} = {1 \over {\Delta x}}\left\{{\begin{array}{*{20}c} {\left({- {u_0} + {u_1}} \right)\quad \quad} & {{\rm{for}}\, i = 0\, ,\quad \quad \quad \, \, \, \, \, \,} \\ {\left({{1 \over 2}{u_{i + 1}} - {1 \over 2}{u_{i - 1}}} \right)} & {{\rm{for}}\, i = 1 \ldots (N - 1)\, ,} \\ {\left({{u_N} - {u_{N - 1}}} \right)\quad \quad} & {{\rm{for}}\, i = N\, ,\quad \quad \quad \, \,} \\ \end{array}} \right.$$
(8.30)

with σ00 = σNN = 1/2 and σij = δij otherwise. That is, the SBP scalar product (8.21) is the simple trapezoidal rule for integration.
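The SBP property (8.22) of D2−1 can be verified directly on random grid functions. A short sketch (grid size and spacing are arbitrary choices):

```python
import numpy as np

N, dx = 50, 0.02

def D21(u):
    """The D_{2-1} operator of Eq. (8.30)."""
    du = np.empty_like(u)
    du[0] = (u[1] - u[0]) / dx
    du[1:-1] = (u[2:] - u[:-2]) / (2 * dx)
    du[-1] = (u[-1] - u[-2]) / dx
    return du

sigma = np.ones(N + 1)
sigma[0] = sigma[-1] = 0.5                      # trapezoidal-rule weights
dot = lambda u, v: dx * np.sum(sigma * u * v)   # scalar product (8.21)

rng = np.random.default_rng(1)
u, v = rng.standard_normal(N + 1), rng.standard_normal(N + 1)
residual = dot(u, D21(v)) + dot(D21(u), v) - (u[-1] * v[-1] - u[0] * v[0])
print(residual)   # ~1e-16: SBP holds exactly, up to round-off
```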

The operator D4−2 and its associated scalar product are also unique in the diagonal norm case:

Example 52. D4−2:

$${D_{4 - 2}}{u_i} = {1 \over {\Delta x}}\left\{{\begin{array}{*{20}c} {\left({- {{24} \over {17}}{u_0} + {{59} \over {34}}{u_1} - {4 \over {17}}{u_2} - {3 \over {34}}{u_3}} \right)\quad \quad \quad} & {{\rm{for}}\, i = 0\, ,\, \quad \quad \quad \quad} \\ {\left({- {1 \over 2}{u_0} + {1 \over 2}{u_2}} \right)\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} & {{\rm{for}}\, i = 1\, ,\quad \quad \quad \quad \,} \\ {\left({{4 \over {43}}{u_0} - {{59} \over {86}}{u_1} + {{59} \over {86}}{u_3} - {4 \over {43}}{u_4}} \right)\quad \quad \quad \quad} & {{\rm{for}}\, i = 2\, ,\quad \quad \quad \quad \,} \\ {\left({{3 \over {98}}{u_0} - {{59} \over {98}}{u_2} + {{32} \over {49}}{u_4} - {4 \over {49}}{u_5}} \right)\quad \quad \quad \quad} & {{\rm{for}}\, i = 3\, ,\quad \quad \quad \quad \,} \\ {\left({{1 \over {12}}{u_{i - 2}} - {2 \over 3}{u_{i - 1}} + {2 \over 3}{u_{i + 1}} - {1 \over {12}}{u_{i + 2}}} \right)\quad \quad \quad} & {{\rm{for}}\, i = 4 \ldots (N - 4)\, ,} \\ {- \left({{3 \over {98}}{u_N} - {{59} \over {98}}{u_{N - 2}} + {{32} \over {49}}{u_{N - 4}} - {4 \over {49}}{u_{N - 5}}} \right)} & {{\rm{for}}\, i = N - 3,\quad \quad \, \,} \\ {- \left({{4 \over {43}}{u_N} - {{59} \over {86}}{u_{N - 1}} + {{59} \over {86}}{u_{N - 3}} - {4 \over {43}}{u_{N - 4}}} \right)} & {{\rm{for}}\, i = N - 2,\quad \quad \, \,} \\ {- \left({- {1 \over 2}{u_N} + {1 \over 2}{u_{N - 2}}} \right)\quad \quad \quad \quad \quad \quad \quad \quad} & {{\rm{for}}\, i = N - 1,\quad \quad \, \,} \\ {- \left({- {{24} \over {17}}{u_N} + {{59} \over {34}}{u_{N - 1}} - {4 \over {17}}{u_{N - 2}} - {3 \over {34}}{u_{N - 3}}} \right)} & {{\rm{for}}\, i = N\, ,\quad \quad \quad \, \, \,} \\ \end{array}} \right.$$
(8.31)

with scalar product

$$\Sigma = \Delta x\, {\rm{diag}}\left\{{{{17} \over {48}},{{59} \over {48}},{{43} \over {48}},{{49} \over {48}},1,1, \ldots ,1,1,{{49} \over {48}},{{43} \over {48}},{{59} \over {48}},{{17} \over {48}}} \right\}\, .$$
(8.32)

On the other hand, the operators D6−3, D8−4 and D10−5 have one, three and ten free parameters, respectively. Up to D8−4 their associated scalar products are unique, while for D10−5 one of the free parameters enters in Σ. For the restricted full case, D4−3, D6−5 and D8−7 have three, four and five free parameters, respectively, all of which appear in the corresponding scalar products.

A possibility [396] is to use the non-uniqueness of SBP operators to minimize the boundary stencil size s. If the difference operator in the interior is a standard centered difference with accuracy-order 2p then there are b points at and near each boundary, where the accuracy is of order q (with q = p in the diagonal case and q = 2p − 1 in the full restricted one). The integer b can be referred to as the boundary width. The boundary stencil size s is the number of gridpoints that the difference operator uses to evaluate its approximation at those b boundary points.

However, minimizing such size, as well as any naive or arbitrary choice of the free parameters, easily leads to a large spectral radius and, as a consequence, a restrictive CFL limit (see Section 7) in the case of explicit evolutions. Sometimes it also leads to rather large boundary truncation errors. Thus, an alternative is to numerically compute the spectral radius for these multi-parameter families of SBP operators and find in each case the parameter choice that leads to a minimum [399, 281]. It turns out that in this way the order of accuracy can be increased from the very low one of D2−1 to higher-order ones such as D10−5 or D8−7 with a very small change in the CFL limit. It involves some work, but since the SBP property (8.22) is independent of the system of equations one wants to solve, it only needs to be done once. In the restricted full case, when marching through parameter space and minimizing the spectral radius, this minimization has to be constrained with the condition that the resulting norm is actually positive definite.

The non-uniqueness of high-order SBP operators can further be used to minimize the average boundary truncation error (ABTE), defined below, without a significant increase in the spectral radius. For definiteness, consider a left boundary. If a Taylor expansion of the FD operator is written as

$$Du{\vert _{{x_i}}} = {\left. {{{du} \over {dx}}} \right\vert _{{x_i}}} + {c_i}{(\Delta x)^q}{\left. {{{{d^{q + 1}}u} \over {d{x^{q + 1}}}}} \right\vert _{{x_i}}}\qquad {\rm{for}}\;i = 0,1, \ldots ,b,$$
(8.33)

then

$${\rm{ABTE}}: = {\left({{1 \over b}\sum\limits_{i = 0}^b {c_i^2}} \right)^{1/2}}.$$
(8.34)

Table 6 illustrates the results of this optimization procedure for the D10−5 operator; see [141] for more details.

Table 6 Comparison, for the D10−5 operator, of both the spectral radius and average boundary truncation error (ABTE) when minimizing the bandwidth or a combination of the spectral radius and ABTE. For comparison, the spectral radius and ABTE for the lowest-accuracy operator, D2−1 (which is unique), are 1.414 and 0.25, respectively. Note: the ABTE, as defined, is larger for this operator, but its convergence rate is faster.

The coefficients for the SBP operators

$${D_{2 - 1}},\,{D_{4 - 2}},\,{D_{4 - 3}},\,{D_{6 - 3}},\,{D_{6 - 5}},\,{D_{8 - 4}},\,{D_{8 - 7}},\,{D_{10 - 5}},$$
(8.35)

and, in particular, for their optimized versions, are available, along with their associated dissipation operators described below in Section 8.5, from the arXiv in [141] and also as complete source code from the Einstein Toolkit [151].

Remarks:

  • The requirement of uniform spacing is not an actual restriction, since a coordinate transformation can always be used so that the computational grid is uniformly spaced even though the physical distance between gridpoints varies. In fact, this is routinely done in the context of multiple-domains or curvilinear coordinates (see Section 11). In that case, though, stability needs to be guaranteed for systems with variable coefficients, since they appear due to the coordinate transformation(s) even if the original system had constant coefficients. This has relevance in terms of the distinction between diagonal and block-diagonal SBP norms, as mentioned below.

  • A similar concept of SBP holds for discrete expansions into Legendre polynomials using Gauss-type quadratures, as discussed in Section 9.4.

  • The definition of SBP depends only on the computational domain, not on the system of equations being solved. This allows the construction of optimized SBP operators once and for all.

  • Difference operators satisfying SBP, which are genuinely multi-dimensional, can be explicitly constructed (see, for example, [103, 102]). However, they become rather complicated even for simple geometries as higher-order accuracy is sought. An easier approach, for the case in which the domain is the cross product of one-dimensional ones (say, topologically a cube in three dimensions), which is usually the case in many domain decompositions for complex geometries (Section 11), is to simply apply a one-dimensional operator satisfying SBP in each direction; this is the approach that we will discuss from now on. The question then is whether SBP holds in several dimensions; the answer is affirmative in the case of diagonal norms, but not necessarily otherwise.

8.4 Stability

If a simple symmetric hyperbolic system with constant coefficients in one dimension of the form

$${u_t} = A{u_x}$$
(8.36)

with A = AT symmetric, is discretized in space by approximating the space derivative with an operator satisfying SBP, a semi-discrete energy estimate can be derived, modulo boundary conditions (discussed in Section 10). For this, we define

$${E_{\rm{d}}}(t): = {\langle u,u\rangle _\Sigma}\, .$$
(8.37)

Taking a time derivative, using the symmetry of A, the fact that D and A commute because the latter has constant coefficients, and the SBP property,

$${{d{E_{\rm{d}}}} \over {dt}} = {\langle {d \over {dt}}u,u\rangle _\Sigma} + {\langle u,{d \over {dt}}u\rangle _\Sigma} = {\langle D\,Au,u\rangle _\Sigma} + {\langle Au,Du\rangle _\Sigma} = \left[ {{{(Au)}^T}u} \right]_a^b\, .$$
(8.38)

Therefore, modulo the boundary terms, an energy estimate and semi-discrete stability follow. As usual, the addition of lower-order undifferentiated terms to the right-hand side of Eq. (8.36) still gives an energy estimate at the semi-discrete level, modulo boundary terms.

When considering variable coefficients, Eq. (8.38) becomes

$${{d{E_{\rm{d}}}} \over {dt}} = [{(Au)^T}u]_a^b - {\langle u,[D,A]u\rangle _\Sigma}$$
(8.39)

and the commutator between A and D needs to be uniformly bounded for all resolutions in order to obtain an energy estimate.

Estimates for the term involving the commutator [D, A] have been given in [317, 319, 318, 407]. In order to discuss them, we first notice that the SBP property of D with respect to Σ and the symmetry of A imply that the operator B := [D, A] is symmetric with respect to Σ,

$${\langle u,[D,A]v\rangle _\Sigma} = {\langle [D,A]u,v\rangle _\Sigma}$$
(8.40)

for all grid functions u and v. Therefore, its norm is equal to its spectral radius and we have the estimate

$${\langle u,[D,A]u\rangle _\Sigma} \leq \rho ([D,A])\Vert u\Vert_\Sigma ^2$$
(8.41)

for all grid functions u. Hence, the problem is reduced to finding an upper bound for the spectral radius of [D, A], which is independent of the scalar product Σ.

Next, in order to find such an upper bound, we write the FD operator as

$${(Du)_j} = {1 \over {\Delta x}}\sum\limits_{k = 0}^N {{d_{jk}}} {u_k},\qquad j = 0,1, \ldots N,$$
(8.42)

where the djk’s are the coefficients of a banded matrix; that is, there exists b > 0, which is independent of N, such that djk = 0 for ∣k − j∣ > b. Then, we have

$$B{u_j} = [D,A]{u_j} = \sum\limits_{k = 0}^N {{d_{jk}}} {{A({x_k}) - A({x_j})} \over {\Delta x}}{u_k},\qquad j = 0,1, \ldots N,$$
(8.43)

from which it follows, under the assumption that A is continuously differentiable and that its derivative is bounded, that \(\vert B{u_j}\vert \, \leq \,\vert {A_x}{\vert _\infty}\sum\limits_{k = 0}^N {\vert k - j\vert \,\vert {d_{jk}}\vert \,\vert {u_k}\vert}\), j = 0, 1, …, N, where \(\vert {A_x}{\vert _\infty}: = \underset {a \leq x \leq b} {\sup} \vert {A_x}(x)\vert\). Now we can easily estimate the spectral radius of B, based on the simple observation that

$$\rho (B) \leq \Vert B\Vert,\qquad \Vert B\Vert: = \underset{u \neq 0}{\sup} {{\Vert Bu\Vert} \over {\Vert u\Vert}},$$
(8.44)

for any norm ∥·∥ on the space of grid functions u. Choosing the 1-norm \(\Vert u\Vert : = \sum\limits_{j = 0}^N {\vert {u_j}\vert}\) we find

$$\Vert Bu\Vert = \sum\limits_{j = 0}^N \vert B{u_j}\vert \; \leq \;\vert {A_x}{\vert _\infty}\sum\limits_{j,k = 0}^N \vert k - j\vert \vert {d_{jk}}\vert \vert \;{u_k}\vert \leq \;\vert {A_x}{\vert _\infty}\left({\underset{k = 0, \ldots ,N}{\max} \sum\limits_{j = 0}^N \vert k - j\vert \vert {d_{jk}}\vert} \right)\Vert u\Vert ,$$
(8.45)

from which it follows that

$$\rho ([D,A]) \leq {C_1}\vert {A_x}{\vert _\infty},\qquad {C_1}: = \underset{k = 0, \ldots ,N}{\max} \sum\limits_{j = 0}^N \vert k - j\vert \vert {d_{jk}}\vert .$$
(8.46)

The important point to notice here is that for each fixed k, the sum in the expression for C1 involves at most 2b + 1 non-vanishing terms, since djk = 0 for ∣kj∣ > b. For the SBP operators and scalar products used in practice, namely those for which the latter has the structure given by Eq. (8.29), C1 can be bounded by a constant, which is independent of resolution. Since the spectral radius of B is equal to that of its transposed, we may interchange the roles of j and k in the definition of the constant C1, and with these observations we arrive at the following result:

Lemma 7 (Discrete commutator estimate). Consider a FD operator D of the form (8.42) , which satisfies the SBP property with respect to a scalar product Σ, and let A = AT be symmetric. Then, the following commutator estimate holds:

$$\vert {\langle u,[D,A]u\rangle _\Sigma}\vert \leq C\vert {A_x}{\vert _\infty}\Vert u\Vert _\Sigma ^2$$
(8.47)

for all grid functions u and resolutions N, where C := min{C1, C2} with

$${C_1}: = \underset{k = 0, \ldots ,N} {\max} \sum\limits_{j = 0}^N \vert k - j\vert \vert {d_{jk}}\vert ,\quad {C_2}: = \underset{j = 0, \ldots ,N}{\max} \sum\limits_{k = 0}^N \vert k - j\vert \vert {d_{jk}}\vert .$$
(8.48)
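For a concrete check, one can assemble D2−1 as a matrix and evaluate the constants (8.48) directly; the result is consistent with the values quoted in the remarks below (a numpy sketch; the grid size is arbitrary):

```python
import numpy as np

N = 20
D = np.zeros((N + 1, N + 1))   # entries d_jk of Eq. (8.42) for D_{2-1}
D[0, 0], D[0, 1] = -1.0, 1.0                # one-sided boundary rows
D[N, N - 1], D[N, N] = -1.0, 1.0
for j in range(1, N):
    D[j, j - 1], D[j, j + 1] = -0.5, 0.5    # centered interior rows

J, K = np.indices(D.shape)
W = np.abs(K - J) * np.abs(D)               # |k - j| |d_jk|
print(W.sum(axis=0).max())   # C1: max over k of sums over j -> 1.5
print(W.sum(axis=1).max())   # C2: max over j of sums over k -> 1.0
```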

Remarks

  • A key ingredient used above to uniformly bound the norm of [D, A] is that the SBP scalar products used in practice have the form (8.29). In those cases, both the boundary width and the boundary stencil size (defined below Example 52) associated with the corresponding difference operators are independent of N. Therefore, the constants C1 and C2 can also be bounded independently of N.

  • For the D2−1 operator defined in Example 51, for instance, Eq. (8.48) gives C1 = 3/2, C2 = 1, and we obtain the optimal estimate corresponding to the one in the continuum limit,

    $$\left\vert {\;\langle u,\left[ {{d \over {dx}},A} \right]u\rangle} \right\vert \leq \;\vert {A_x}{\vert _\infty}\Vert u\Vert ^{2}\,.$$
    (8.49)
  • For the D4−2 operator defined in Example 52, in turn, Eq. (8.48) gives C1 = 1770/731 ≈ 2.421 and C2 = 42/17 ≈ 2.471.

  • For spectral methods the constants C1 and C2 typically grow with N as the coefficients djk do not form a banded matrix anymore. This leads to difficulties when estimating the commutator; see [407] for a discussion on this point.

  • It is also possible to avoid the estimate on the commutator between D and A altogether through skew-symmetric differencing [260], in which the problem is discretized according to

    $${u_t} = {1 \over 2}(AD + DA)\;u - {1 \over 2}{A_x}u.$$
    (8.50)

    A straightforward energy estimate shows that this leads to strict stability, after the imposition of appropriate boundary conditions.

  • SBP by itself is not enough to obtain an energy estimate since the boundary conditions still need to be imposed, and in a way such that the boundary terms in the estimate after SBP are under control. This is the topic of Section 10.

8.5 Numerical dissipation

The use of numerical dissipation, consistently with the underlying system of equations, is a standard way of filtering unresolved modes, stabilizing a scheme, or both, without spoiling the convergence order of the scheme. As an example of unresolved modes: for centered differences, the mode with highest frequency at any given resolution does not propagate at all, while its semi-discrete group velocity points exactly in the direction opposite to the continuum one. In addition, this spurious speed increases with the order of the scheme. See, for example, [281], for more details.

Some schemes, such as those with upwind FDs, are intrinsically dissipative, with a fixed “amount” of dissipation for a given resolution. Another approach is to add to the discretization a dissipative operator Qd with a tunable strength factor ϵ ≥ 0,

$$\dot u = (\ldots) \rightarrow \dot u = (\ldots) + {\epsilon Q_d}u\,.$$
(8.51)

The operator Qd has derivatives of higher order than the ones in the principal part of the equation, mimicking dissipative physical systems and/or parabolic equations, but in such a way that ∥Qd∥ → 0 as Δx → 0. Furthermore, Qd is usually chosen to scale with the gridspacing in the same way as the FD approximation of the principal part (so that the amplification factor depends only on Δt/Δx). For example, for first-order-in-space systems, FDs scale as

$$D\sim{1 \over {\Delta x}}$$
(8.52)

and Qd is usually chosen to scale in the same way. More precisely, in the absence of boundaries the standard way to add numerical dissipation to a first-order-in-space system is through Kreiss-Oliger dissipation

$${Q_d} = {(- 1)^{r - 1}}{(\Delta x)^{2r - 1}}D_ + ^{(r)}D_ - ^{(r)}\,,$$
(8.53)

where \(D_ \pm ^{(r)}\) denotes the application of D± r times, and D+, D− denote forward and backward one-sided FDs, respectively, as defined in Eq. (7.70). Thus, Qd scales with the gridspacing as (Δx)−1, like Eq. (8.52). If the accuracy order of the scheme is not higher than (2r − 1) in the absence of dissipation, it is not decreased by the addition of numerical dissipation of the form (8.53).
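For instance, for r = 2 the operator (8.53) is a scaled fourth difference. A periodic-grid sketch (the strength ε of Eq. (8.51) is left to the user; periodic wrapping is our simplification, since near boundaries the construction discussed below is needed):

```python
import numpy as np

def kreiss_oliger(u, dx, r=2):
    """Q_d u of Eq. (8.53) on a periodic grid:
    (-1)^(r-1) dx^(2r-1) D_+^r D_-^r u, which scales as 1/dx overall."""
    Dp = lambda w: (np.roll(w, -1) - w) / dx
    Dm = lambda w: (w - np.roll(w, 1)) / dx
    w = u
    for _ in range(r):
        w = Dp(w)
    for _ in range(r):
        w = Dm(w)
    return (-1) ** (r - 1) * dx ** (2 * r - 1) * w

# usage: rhs = rhs + eps * kreiss_oliger(u, dx) damps the highest-frequency
# mode without spoiling a scheme of accuracy order up to 2r - 1
```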

The main property that sometimes allows numerical dissipation to stabilize otherwise unstable schemes is that it strictly carries energy away from the system (in the sense of the energy definitions involved in well-posedness or numerical-stability analyses). For example, the operators (8.53) are semi-negative definite,

$${\langle u,{Q_d}u\rangle _\Sigma} \leq 0,$$
(8.54)

with respect to the trivial scalar product \({\bf{\Sigma}} = \Delta x{\rm{{\mathbb I}}}\), under which centered differences satisfy SBP.

In the presence of boundaries, it is standard to simply set the operators (8.53) to zero near them. The result is, in general, not semi-negative definite as in (8.54), which can not only fail to cure instabilities but actually trigger them. Many times this is not the case in practice if the boundary is an outer one, where the solution is weak, but it is for inter-domain boundaries (see Section 10). For example, for a discretization of the standard wave equation on a multi-domain, curvilinear grid setting, using the D6−5 SBP operator with Kreiss-Oliger dissipation set to zero near interpatch boundaries does not lead to stability, while the more elaborate construction below does [141].

For SBP-based schemes, adding artificial dissipation may lead to an unstable scheme unless the dissipation operator is semi-negative definite under the SBP scalar product. In addition, the dissipation operator should ideally be non-vanishing all the way up to the boundary and preserve the accuracy of the scheme everywhere (which is more difficult in the SBP case, as the accuracy is non-uniform). In [303], a prescription for operators satisfying both conditions for arbitrarily-high-order SBP scalar products is presented. A compatible dissipation operator is constructed as

$${Q_d} = - {(\Delta x)^{2p}}\;{\Sigma ^{- 1}}D_p^T{B_p}{D_p},$$
(8.55)

where Σ is the SBP scalar product, Dp is a consistent approximation of dp/dxp with minimal bandwidth (other choices are presumably possible), and Bp is called the boundary operator. The latter has to be positive semi-definite and its role is to allow boundary points to be treated differently from interior points. Bp cannot be chosen freely, but has to follow certain restrictions (which become somewhat involved in the non-diagonal SBP case) based on preserving the accuracy of the schemes near and at boundaries; see [303] for more details.

8.6 Going further

Besides the applications already mentioned, high-order FD operators satisfying SBP have been used, for example, in simulations of black-hole binaries immersed in an external magnetic field in the force-free approximation [321], orbiting binary black holes in vacuum [325], and for the metric sector in binary black-hole-neutron-star evolutions [124] and binary neutron-star evolutions, which include magnetohydrodynamics [23]. Other works are referred to in combination to multi-domain interface numerical methods in Section 10.

In [398], the authors present a numerical spectrum stability analysis for block-diagonal-based SBP operators in the presence of curvilinear coordinates. However, the case of non-diagonal SBP norms and the full Einstein equations in multi-domain scenarios for orders higher than four in the interior needs further development and analysis.

Efficient algorithms for computing the weights for generic FD operators (though not necessarily satisfying SBP or with proven stability) are given in [166].

Discretizing second-order time-dependent problems without reducing them to first order leads to a similar concept of SBP for operators approximating second derivatives. There is steady progress in an effort to combine SBP with penalty interface and outer boundary conditions for high-order multi-domain simulations of second-order-in-space systems. At present, though, these tools have not yet reached the state of those for first-order systems, and they have not been used within numerical relativity except for the test case of a ‘shifted advection equation’ [302]. The difficulties appear in the variable-coefficient case. We discuss some of these difficulties and the state of the art in Section 10. In short, unlike the first-order case, SBP by itself does not imply an energy estimate in the variable-coefficient case, even if using diagonal norms, unless the operators are built taking into account the PDE as well. In [300] the authors explicitly constructed minimal-width diagonal-norm SBP difference operators approximating d2/dx2 up to eighth order in the interior, and in [118] non-minimal-width operators up to sixth order using full norms are given.

[440] presents a stability analysis around flat spacetime for a family of generalized BSSN-type formulations, along with numerical experiments, which include binary black-hole inspirals.

SBP operators have also been constructed to deal with coordinate singularities in specific systems of equations [105, 375, 225]. Since a sharp semi-discrete energy estimate is explicitly derived in these references, (strict) stability is guaranteed. In particular, in [225] schemes for which the truncation error converges pointwise everywhere — including the origin — are derived for wave equations in arbitrary space dimensions decomposed into spherical harmonics. Interestingly enough, popular schemes [158] for dealing with the singularity at the origin, which had not been explicitly designed to satisfy SBP, were found a posteriori to do so at the origin when closed at the outer boundary; see [225] for more details. In these cases the SBP operators are tailored to deal with specific equations and coordinate singularities; therefore, they are problem dependent. For this reason their explicit construction has so far been restricted to second and fourth-order operators (with diagonal scalar products), though the procedure conceptually extends to arbitrary orders. For higher-order operators, optimization of at least the spectral radius might become necessary.

In [166] the authors use SBP operators to design high-order quadratures. The reference also includes a detailed description of many properties of SBP operators.

Superconvergence of some estimates in the case of diagonal SBP operators is discussed in [239].

9 Spatial Approximations: Spectral Methods

In this section, we review some of the theory for spectral spatial approximations, and their applications in numerical relativity. These are global representations, which display very fast convergence for smooth enough functions. They are therefore very well suited for Einstein’s vacuum equations, where physical shocks are not expected since they can be written in linearly-degenerate form, as discussed at the beginning of Section 3.3.

We start in Section 9.1 discussing expansions onto orthogonal polynomials, which are solutions to Sturm-Liouville problems. In those cases it is easy to see that for smooth functions the decay of the error in truncated expansions with respect to the number of polynomials is in general faster than any power law, which is usually referred to as spectral convergence. Next, in Section 9.2 we discuss a few properties of general orthogonal polynomials; most importantly, that they can be generated through a three-term recurrence formula. Section 9.3 follows with a discussion of the most-used families of polynomials in bounded domains, namely Legendre and Chebyshev ones, including the minmax property of Chebyshev points. Approximating integrals through a global interpolation with a careful choice of nodal points makes it possible to maximize the degree with respect to which they are exact for polynomials (Gauss quadratures). When applied to compute discrete truncated expansions, they lead to two remarkable features. One of them is SBP for Legendre polynomials, in analogy with the FD version discussed in Section 8.3. As in that case, SBP can also sometimes be used to show semi-discrete stability when solving time-dependent partial differential equations (PDEs). The second one is an exact equivalence, for general Jacobi polynomials, between the resulting discrete expansions and interpolation at the Gauss points, a very useful property for collocation methods. Gauss quadratures and SBP are discussed in Section 9.4, followed by interpolation at Gauss points in Section 9.5. In Sections 9.6, 9.7 and 9.8 we discuss spectral differentiation, the collocation method for time-dependent PDEs, and applications to numerical relativity.

The results for orthogonal polynomials to be discussed are classical ones, but we present them because spectral methods are less widespread in the relativity community, at least compared to FDs. The proofs and a detailed discussion of many other properties can be found in, for example, [197] and references therein. [237] is a modern approach to the use of spectral methods in time-dependent problems with the latest developments, while [70] discusses many issues, which appear in applications, and [167] presents a very clear practical guide to spectral methods, in particular to the collocation approach. A good fraction of our presentation of this section follows [197] and [237], to which we refer when we do not provide any other references, or for further material.

9.1 Spectral convergence

9.1.1 Periodic functions

An intuition about the expansion of smooth functions into orthogonal polynomials and spectral convergence can be obtained by first considering the periodic case in [0, 2π] and expansion in Fourier modes,

$${p_j}(x) := {1 \over {\sqrt {2\pi}}}\,{e^{ijx}} \qquad {\rm for\;integer}\;j \in {\mathbb Z}.$$
(9.1)

These are orthonormal under the standard complex scalar product in L2([0, 2π]),

$$\langle f,g\rangle : = \int\limits_0^{2\pi} {\overline {f(x)}} g(x)dx,\qquad f,g \in {L^2}([0,2\pi ]),$$
(9.2)
$$\langle {p_j},{p_{j\prime}}\rangle = {\delta _{jj\prime}}\,.$$
(9.3)

Furthermore, they form a complete set of orthonormal functions under the norm induced by the above scalar product. More explicitly, the expansion of a continuous, periodic function in these modes,

$$f(x) = \sum\limits_{j = - \infty}^\infty {{{\hat f}_j}} {p_j}(x)\,,$$
(9.4)

converges to f in the L2 norm if

$$\sum\limits_{j = - \infty}^\infty \vert {\hat f_j}\vert ^{2} < \infty \,.$$
(9.5)

The Fourier coefficients \({{\hat f}_j}\) can be computed from the orthonormality condition (9.3) of the basis elements defined in Eq. (9.1),

$${\hat f_j} = \langle {p_j},f\rangle = {1 \over {\sqrt {2\pi}}}\int\limits_0^{2\pi} f (x){e^{- ijx}}dx\,.$$
(9.6)

The truncated expansion of f is (assuming N to be even)

$${{\mathcal P}_N}[f](x) = \sum\limits_{j = - N/2}^{N/2} {{{\hat f}_j}} {p_j}(x)\,,$$
(9.7)

where the notation is motivated by the fact that the \({{\mathcal P}_N}\) operator can also be seen as the orthogonal projection under the above scalar product to the space spanned by {pj: j = −N/2 … N/2}, see also Section 9.2 below. The error of the truncated expansion, using the orthonormality of the basis functions (Parseval’s property) is

$$\Vert f - {{\mathcal P}_N}[f]\Vert ^{2} = \sum\limits_{\vert j\vert > N/2} \vert {\hat f_j}\vert ^{2},$$
(9.8)

from which it can be seen that a fast decay in the error relies on a fast decay of the high frequency Fourier coefficients \(\vert{{\hat f}_j}\vert\) as \(j \rightarrow \infty\). Using the explicit definition of the basis elements pj in Eq. (9.1) and the scalar product in (9.2), we have

$$\vert {\hat f_j}\vert = \vert \langle {p_j},f\rangle \vert = {1 \over {\sqrt {2\pi}}}\left\vert {\int\limits_0^{2\pi} f (x){e^{- ijx}}dx} \right\vert \,.$$
(9.9)

Integrating by parts multiple times,

$$\vert {\hat f}_j\vert = {1 \over {\sqrt {2\pi}\,\vert j\vert}}\left\vert \int\limits_0^{2\pi} f'(x)\,{e^{-ijx}}dx \right\vert = \ldots = {1 \over {\sqrt {2\pi}\,\vert j\vert^s}}\left\vert \int\limits_0^{2\pi} f^{(s)}(x)\,{e^{-ijx}}dx \right\vert = {1 \over {\vert j\vert^s}}\left\vert \langle f^{(s)}, p_j\rangle \right\vert \leq {1 \over {\vert j\vert^s}}\left\Vert f^{(s)} \right\Vert \Vert p_j\Vert = {1 \over {\vert j\vert^s}}\left\Vert f^{(s)} \right\Vert\,,$$

and the process can be repeated for increasing s as long as the s-th derivative \(f^{(s)}\) remains bounded in the L2 norm. In particular, if \(f \in C^\infty\), then the Fourier coefficients decay to zero

$$\vert {\hat f_j}\vert \rightarrow 0 \qquad {\rm as}\;j \rightarrow \infty$$
(9.10)

faster than any power law, which is usually referred to as spectral convergence. The spectral denomination comes from the property that the decay rate of the error is dominated by the spectrum of an associated Sturm-Liouville problem, as discussed below. The convergence rate for each Fourier mode in the remainder can be extended to the whole sum (9.8). More precisely, the following result can be shown (see, for example, [237]):

Theorem 16. For any \(f \in H_p^s\left[ {0,2\pi} \right]\) (p standing for periodic) there exists a constant C > 0 independent of N such that

$$\Vert f - {{\mathcal P}_N}[f]\Vert \leq C{N^{- s}}\left\Vert {{{{d^s}f} \over {d{x^s}}}} \right\Vert$$
(9.11)

for all N ≥ 1.

In fact, an estimate for the difference between f and its projection similar to (9.11), but in the infinity norm, can also be obtained [237].
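
To see the decay rate of Theorem 16 at work, the truncation error (9.8) can be measured numerically. The following minimal Python sketch is our own illustration (the smooth test function and the FFT-based approximation of the coefficients (9.6) are choices made here):

    import numpy as np

    # Smooth, 2*pi-periodic test function (an arbitrary illustrative choice).
    f = lambda x: np.exp(np.sin(x))

    M = 1024                                  # fine grid for accurate coefficients
    x = 2 * np.pi * np.arange(M) / M
    c = np.fft.fft(f(x)) / M                  # c_j approximates (1/2pi) int f e^{-ijx} dx
    j = np.fft.fftfreq(M, d=1.0 / M)          # integer mode number of each entry

    for N in (4, 8, 16, 32):
        # L2 error of the truncated expansion, Eq. (9.8), via Parseval:
        # || f - P_N[f] ||^2 = 2*pi * sum_{|j| > N/2} |c_j|^2.
        tail = np.abs(c[np.abs(j) > N // 2]) ** 2
        print(N, np.sqrt(2 * np.pi * tail.sum()))

For this test function the printed error drops below roundoff already at moderate N, faster than any fixed power of 1/N, which is the hallmark of spectral convergence.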

In preparation for the discussion below for non-periodic functions, we rephrase and re-derive the previous results in the following way. Integrating by parts twice, the differential operator

$${\mathcal D} = - \partial _x^2$$
(9.12)

is seen to be self-adjoint under the standard scalar product (9.2),

$$\langle f,{\mathcal D}g\rangle =\langle {\mathcal D}f,g\rangle$$
(9.13)

for periodic, twice-continuously-differentiable functions f and g. Therefore, the eigenfunctions pj of the problem

$${\mathcal D}{p_j}(x) = {\lambda _j}{p_j}(x)$$
(9.14)

are orthogonal (and can be chosen orthonormal); they turn out to be the Fourier modes (9.1) and form a complete orthonormal set for periodic functions in L2. The expansion (9.4) then converges, and the error in the truncated expansion (9.7) is given by the decay of high-order coefficients; see Eq. (9.8). Assuming f is smooth enough, the fast decay of such modes is a consequence of \({\mathcal D}\) being self-adjoint, the basis elements pj being solutions to the problem (9.14),

$${p_j} = {1 \over {{\lambda _j}}}{\mathcal D}{p_j}\,,$$
(9.15)

and the eigenvalues satisfying \(\lambda_j \simeq j^2\) for large j (in the Fourier case, \(\lambda_j = j^2\) holds exactly). Combining these properties,

$$\vert {\hat f}_j\vert = \vert\langle f, p_j\rangle\vert = {1 \over {\vert\lambda_j\vert}}\vert\langle f, {\mathcal D}p_j\rangle\vert = {1 \over {\vert\lambda_j\vert}}\vert\langle {\mathcal D}f, p_j\rangle\vert = {1 \over {\vert\lambda_j\vert^2}}\left\vert\langle {{\mathcal D}^{(2)}}f, p_j\rangle\right\vert = \ldots = {1 \over {\vert\lambda_j\vert^s}}\left\vert\langle {{\mathcal D}^{(s)}}f, p_j\rangle\right\vert \leq {1 \over {\vert\lambda_j\vert^s}}\left\Vert {{\mathcal D}^{(s)}}f \right\Vert\,,$$
(9.16)

where \({{\mathcal D}^{(s)}}\) denotes the application of \({\mathcal D}\) s times (in this case, \({{\mathcal D}^{(s)}}\) is equal to \({(- 1)^s}\partial _x^{2s})\).

The main property that leads to spectral convergence is then the fast decay of the Fourier coefficients in Eq. (9.16), provided the norm of \({\mathcal D}^{(s)} f\) remains bounded for large s.

Before moving to the non-periodic case we notice that in either the full or truncated expansions, the integrals (9.6) need to be computed. Numerically approximating the latter leads to discrete expansions and an appropriate choice of quadratures for doing so leads to a powerful connection between the discrete expansion and interpolation. We discuss this in Section 9.4, directly for the non-periodic case.

9.1.2 Singular Sturm-Liouville problems

Next, consider non-periodic domains (a, b) (which can actually be unbounded; for example, (0, ∞) as in the case of Laguerre polynomials) on the real axis. We discuss how bases of orthogonal polynomials with spectral convergence properties arise as solutions to singular Sturm-Liouville problems.

For this we need to consider more general scalar products. For a continuous, strictly-positive weight function ω on the open interval (a, b), we define

$${\langle h,g\rangle _\omega} = \int\limits_a^b h (x)g(x)\omega (x)dx,$$
(9.17)

and its induced norm, \(\Vert h\Vert _\omega := \sqrt {{{\langle h,h\rangle}_\omega}}\), on the Hilbert space \(L_\omega ^2(a,b)\), of all real-valued, measurable functions h, g on the interval (a, b) for which ∥hω and ∥gω are finite.

Consider now the Sturm-Liouville problem

$${\mathcal D}{p_j}(x) = \omega (x){\lambda _j}{p_j}(x),$$
(9.18)

where \({\mathcal D}\) is a second-order linear-differential operator on (a, b) along with appropriate boundary conditions so that it is self-adjoint under the non-weighted scalar product,

$${\langle f,{\mathcal D}g\rangle _{\omega = 1}} = {\langle {\mathcal D}f,g\rangle _{\omega = 1}}\,,$$
(9.19)

for all twice-continuously-differentiable functions f, g on (a, b), which are subject to the boundary conditions. Then, the set of eigenfunctions is also complete, and orthonormal under the weighted scalar product, and there is again a full and truncated expansion as in the Fourier case,

$$f(x) = \sum\limits_{j = 0}^\infty {{{\hat f}_j}} {p_j}(x)\,,$$
(9.20)
$${{\mathcal P}_N}[f](x) = \sum\limits_{j = 0}^N {{{\hat f}_j}} {p_j}(x)\,,$$
(9.21)

with coefficients

$${\hat f_j} = {\langle {p_j},f\rangle _\omega}\,.$$
(9.22)

The truncation error is similarly given by

$$\Vert f - {{\mathcal P}_N}[f]\Vert _\omega ^2 = \sum\limits_{j > N} {\hat f_j^2}$$
(9.23)

and spectral convergence is again obtained if the coefficients \({{\hat f}_j}\) decay to zero as j → ∞ faster than any power law. Consider, then, the singular Sturm-Liouville problem

$${\mathcal D}{p_j}(x) = - {\partial _x}[m(x){\partial _x}{p_j}(x)] + n(x){p_j}(x) = \omega (x){\lambda _j}{p_j}(x)$$
(9.24)

with the functions m, n : (a, b) → ℝ continuous and bounded, such that n(x) ≥ 0 and m(x) > 0 for all x ∈ (a, b), and, constituting the singular part of the problem,

$$m(a) = m(b) = 0.$$
(9.25)

For twice-continuously-differentiable functions with bounded derivatives, the boundary terms arising from integration by parts of the expression \({\langle f,{\mathcal D}g\rangle _{\omega = 1}}\) cancel due to Eq. (9.25), and it follows that the operator \({\mathcal D}\) is self-adjoint; see Eq. (9.19). Therefore, one can proceed as in the Fourier case and arrive at

$$\vert {\hat f_j}\vert \leq {1 \over {\vert {\lambda _j}\vert ^{s}}}{\left\Vert {{{\left({{{\mathcal D} \over \omega}} \right)}^{(s)}}f} \right\Vert_\omega}$$
(9.26)

with spectral convergence if fC and, for example,

$$\vert {\lambda _j}\vert \simeq {j^2}\,,\qquad {\rm for\;large\;enough}\;j > {j_{\min}}\,.$$
(9.27)

Theorem 17. The solutions to the singular Sturm-Liouville problem (9.24) with (a, b) = (−1, 1) and

$$m(x) = {(1 - x)^{\alpha + 1}}{(1 + x)^{1 + \beta}},$$
(9.28)
$$\omega (x) = {(1 - x)^\alpha}{(1 + x)^\beta},$$
(9.29)
$$n(x) = 0,$$
(9.30)

where α,β > −1, are the Jacobi polynomials \(P_j^{(\alpha, \beta)}(x)\). Here, \(P_j^{(\alpha, \beta)}(x)\) has degree j, and it corresponds to the eigenvalue

$${\lambda _j} = j(j + \alpha + \beta + 1)\,.$$
(9.31)

Notice that the eigenvalues satisfy the asymptotic condition (9.27), which, roughly speaking, guarantees spectral convergence. More precisely, the following holds (see, for example, [197]) — in analogy with Theorem 16 for the Fourier case — for the expansion \({{\mathcal P}_N}[f]\) of a function f in Jacobi polynomials:

Theorem 18. For any \(f \in H_\omega ^s(-1,1)\) (ω refers to the weight function) there exists a constant C > 0 independent of N such that

$$\Vert f - {{\mathcal P}_N}[f]\Vert_{\omega} \leq C{N^{- s}}{\left\Vert {{{(1 - {x^2})}^{s/2}}{{{d^s}f} \over {d{x^s}}}} \right\Vert_\omega}$$
(9.32)

for all N > s.

Sturm-Liouville problems are discussed in, for example, [431]. Below we discuss some properties of general orthogonal polynomials.

9.2 Some properties of orthogonal polynomials

Given any weighted scalar product 〈·, · 〉ω as in Eq. (9.17), the expansion of a function into polynomials is optimal in the associated norm if orthogonally projected to the space of polynomials PN of degree at most N. More precisely, given a function \(f \in L_\omega ^2(a,b)\) on some interval (a, b) and an orthonormal basis \(\{{p_j}\} _{j = 0}^N\) of PN, where pj has degree j, its orthogonal projection,

$${{\mathcal P}_N}[f] = \sum\limits_{j = 0}^N {{{\langle {p_j},f\rangle}_\omega}} {p_j}$$
(9.33)

onto the subspace spanned by PN satisfies

$${{\mathcal P}_N}[f] = \left\{ f_N \in {\bf P}_N : f_N \;{\rm minimizes}\; \Vert f - f_N\Vert_\omega \right\}\,,$$
(9.34)

or

$$\Vert {{\mathcal P}_N}[f] - f\Vert_{\omega} \leq \Vert {f_N} - f\Vert_{\omega}$$
(9.35)

for all \(f_N \in {\bf P}_N\); that is, \({\mathcal P}_N[f]\) minimizes the error among all such \(f_N\).

The operator (9.33) is a projection in the sense that

$${\mathcal P}_N^2 = {{\mathcal P}_N}\,,$$
(9.36)

and it is orthogonal with respect to 〈·, ·〉ω: the residual \(r: = f - {{\mathcal P}_N}[f]\) satisfies

$${\langle r,{{\mathcal P}_N}[f]\rangle _\omega} = 0\,.$$
(9.37)

Notice that, unlike in the interpolation problem (discussed below), here the solution of the above least-squares problem (9.34) is not required to agree with f at any prescribed set of points.

In order to obtain an orthonormal basis of PN, a Gram-Schmidt procedure could be applied to the standard basis \(\{{x^j}\} _{j = 0}^N\). However, exploiting properties of polynomials, a more efficient approach can be used, where the first two polynomials p0, p1 are constructed and then a three-term recurrence formula is used.

In the following construction, each orthonormal polynomial is chosen to be monic, meaning that its leading coefficient is one.

  • The zero-th-order polynomial:

    The conditions that p0 has degree zero and that it is monic only leaves the choice

    $${p_0}(x) = 1.$$
    (9.38)
  • The first-order one:

    Writing p1 (x) = x + b1 the condition 〈p0, p1ω = 0 yields

    $${p_1}(x) = x - {{{{\langle 1,x\rangle}_\omega}} \over {{{\langle 1,1\rangle}_\omega}}}\,.$$
    (9.39)
  • The higher-order polynomials:

    Theorem 19 (Three-term recurrence formula for orthogonal polynomials). For monic polynomials \(\{{p_k}\} _{k = 0}^N\), which are orthogonal with respect to the scalar product 〈·, ·〉ω, where each pk is of degree k, the following relation holds

    $${p_{k + 1}} = x{p_k} - {{{{\langle x{p_k},{p_k}\rangle}_\omega}} \over {{{\langle {p_k},{p_k}\rangle}_\omega}}}{p_k} - {{{{\langle x{p_k},{p_{k - 1}}\rangle}_\omega}} \over {{{\langle {p_{k - 1}},{p_{k - 1}}\rangle}_\omega}}}{p_{k - 1}},$$
    (9.40)

    for k = 1, 2, …, N − 1.

    Proof. Let 1 ≤ kN − 1. Since xpk is a polynomial of degree k + 1, it can be expanded as

    $$x{p_k}(x) = \sum\limits_{j = 0}^{k + 1} {{a_j}} {p_j}(x),$$
    (9.41)

    where the orthogonality of the polynomials \(\{{p_k}\} _{k = 0}^N\) implies that aj = 〈xpk, pjω/〈pj, pjω for j = 0, 1, 2, …, k + 1. However, since 〈xpk, pjω = 〈pk, xpjω and xpj can be expanded in terms of the polynomials p0, p1, …, pj+1, it follows again by the orthogonality of \(\{{p_k}\} _{k = 0}^N\) that aj = 0 for jk − 2. Finally, ak+1 = 1 since pk and pk+1 are both monic. This proves Eq. (9.40). □

    Notice that pk+1, as defined in Eq. (9.40), remains monic and can therefore be automatically used for constructing pk+2, without any rescaling.

Eqs. (9.38, 9.39, 9.40) allow one to compute orthogonal polynomials for any weight function ω, without the expense of a Gram-Schmidt procedure. For specific weight cases, there are even more explicit recurrence formulae, such as those in Eqs. (9.43, 9.44) and (9.48, 9.49, 9.50) below for Legendre and Chebyshev polynomials, respectively.
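
As an illustration, Eqs. (9.38, 9.39, 9.40) translate directly into a short algorithm. The Python sketch below is ours, with the scalar products (9.17) approximated by a Gauss-Legendre quadrature mapped to (a, b); the quadrature and its resolution are assumptions of the sketch, adequate for smooth weight functions (singular weights would require a more careful quadrature):

    import numpy as np

    def monic_orthogonal_polynomials(weight, a, b, N, quad_nodes=500):
        # Coefficient arrays (numpy convention, highest degree first) of the
        # monic polynomials p_0, ..., p_N orthogonal in the scalar product
        # (9.17), built from Eqs. (9.38)-(9.40).
        t, wq = np.polynomial.legendre.leggauss(quad_nodes)
        xq = 0.5 * (b - a) * t + 0.5 * (b + a)
        wq = 0.5 * (b - a) * wq * weight(xq)
        dot = lambda u, v: np.sum(wq * np.polyval(u, xq) * np.polyval(v, xq))

        ps = [np.array([1.0])]                                       # Eq. (9.38)
        ps.append(np.array([1.0, -dot(ps[0], [1.0, 0.0]) / dot(ps[0], ps[0])]))  # Eq. (9.39)
        for k in range(1, N):                                        # Eq. (9.40)
            xpk = np.polymul([1.0, 0.0], ps[k])                      # x * p_k, still monic
            ak = dot(xpk, ps[k]) / dot(ps[k], ps[k])
            bk = dot(xpk, ps[k - 1]) / dot(ps[k - 1], ps[k - 1])
            ps.append(np.polysub(xpk, np.polyadd(ak * ps[k], bk * ps[k - 1])))
        return ps

    # Example: the trivial weight on (-1, 1) reproduces the monic multiples of
    # the Legendre polynomials, e.g., p_2(x) = x^2 - 1/3.
    ps = monic_orthogonal_polynomials(lambda x: np.ones_like(x), -1.0, 1.0, 4)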

9.3 Legendre and Chebyshev polynomials

For finite intervals, Legendre and Chebyshev polynomials are the ones most typically used. In the Chebyshev case, the polynomials themselves, their roots and quadrature points in general can be computed in closed form. They also satisfy a minmax property and lend themselves to using a fast Fourier transform (FFT).

9.3.1 Legendre

Legendre polynomials correspond to the trivial weight function ω ≡ 1 and the choice α = β = 0 in Eqs. (9.28, 9.29, 9.30). The eigenvalues are

$${\lambda _j} = j(j + 1)\,,$$
(9.42)

and the first two polynomials

$${P_0}(x) = 1\,,\qquad {P_1}(x) = x\,.$$
(9.43)

A variation (in that it leads to non-monic polynomials) of the three-term recurrence formula (9.40) is

$${P_{j + 1}}(x) = \left({{{2j + 1} \over {j + 1}}} \right)x{P_j}(x) - \left({{j \over {j + 1}}} \right){P_{j - 1}}(x),\qquad j = 1,2,3, \ldots ,$$
(9.44)

leading to the normalization

$${\langle {P_i},{P_j}\rangle _\omega} = \int\limits_{- 1}^1 {{P_i}} (x){P_j}(x)dx = {2 \over {2j + 1}}{\delta _{ij}}\,.$$
(9.45)

9.3.2 Chebyshev

Chebyshev polynomials correspond to the choice \(\omega(x) = (1 - x^2)^{-1/2}\), x ∈ (−1, 1), and α = β = −1/2 in Eqs. (9.28, 9.29, 9.30). In particular, the eigenvalues are

$${\lambda _j} = {j^2}\,.$$
(9.46)

A closed-form expression for Chebyshev polynomials (which lends itself to the use of FFT) is

$${T_j}(x) = \cos \left({j{{\cos}^{- 1}}(x)} \right),\qquad j = 0,1,2, \ldots .$$
(9.47)

At first sight it might appear confusing that, given the above definition through trigonometric functions, Tj(x) are actually polynomials in x (of degree j, in fact). To get an idea of why this is so, we can compute the first few:

$$\begin{array}{l} {T_0}(x) = \cos (0) = 1\,, \\ {T_1}(x) = \cos ({{\cos}^{- 1}}(x)) = x\,, \\ {T_2}(x) = \cos (2{{\cos}^{- 1}}(x)) = - 1 + 2{{\cos}^2}({{\cos}^{- 1}}(x)) = - 1 + 2{x^2}\,. \end{array}$$

Notice from the above expressions that T2(x) = 2xT1(x) − T0. In fact, a slight variation of the three-term recurrence formula (9.40) for Chebyshev polynomials becomes

$${T_0}(x) = 1,$$
(9.48)
$${T_1}(x) = x,$$
(9.49)
$${T_{j + 1}}(x) = 2x{T_j}(x) - {T_{j - 1}},\qquad j = 1,2,3, \ldots .$$
(9.50)

As in the Legendre case, this is a variation of Eqs. (9.38, 9.39, 9.40) in that the resulting polynomials are not monic: the leading coefficient is given by

$${T_j}(x) = {2^{j - 1}}{x^j} + \ldots ,$$
(9.51)

That is, the polynomial \(2^{1-j}T_j(x)\) is monic.

Both Eq. (9.47) and Eq. (9.50) lead to the normalization

$${\langle {T_i},{T_j}\rangle _\omega} = \int\limits_{- 1}^1 {{T_i}} (x){T_j}(x)\;{(1 - {x^2})^{- 1/2}}dx = {\delta _{ij}}\left\{{\begin{array}{*{20}c} \pi & {{\rm{for}}\;i = 0} \\ {{\pi \over 2}} & {{\rm{for}}\;i > 0} \\ \end{array}} \right..$$
(9.52)

From the explicit expression (9.47), it can be noticed that the roots {xj} of Tk are

$${x_j} = - \cos \left({{{2j + 1} \over {2k}}\pi} \right),\qquad j = 0,1,2, \ldots ,k - 1.$$
(9.53)

These points play an important role below in Gauss quadratures and collocation methods (Section 9.4).
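
The agreement between the closed form (9.47), the recurrence (9.48, 9.49, 9.50) and the roots (9.53) is easy to check numerically; a small, purely illustrative Python sketch of our own:

    import numpy as np

    x = np.linspace(-1.0, 1.0, 201)
    T = [np.ones_like(x), x.copy()]               # Eqs. (9.48)-(9.49)
    for j in range(1, 10):
        T.append(2.0 * x * T[j] - T[j - 1])       # Eq. (9.50)
    for j in range(11):
        # the recurrence agrees with the closed form (9.47)
        assert np.allclose(T[j], np.cos(j * np.arccos(x)))

    k = 6                                         # roots of T_k, Eq. (9.53)
    xj = -np.cos((2.0 * np.arange(k) + 1.0) * np.pi / (2.0 * k))
    assert np.allclose(np.cos(k * np.arccos(xj)), 0.0)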

9.3.3 The minmax property of Chebyshev points

Chebyshev polynomials satisfy a rather remarkable property in the context of interpolation. In Section 8.1 we pointed out that the error in polynomial interpolation of a function f at (N +1) nodal points xj satisfies [cf. Eq. (8.5)]

$${E_N}(x) = {1 \over {(N + 1)!}}\left\vert {{f^{(N + 1)}}({\xi _x}){\omega _{N + 1}}(x)} \right\vert \,,$$
(9.54)

where \({\omega _{N + 1}}(x): = \Pi _{j = 0}^N(x - {x_j})\).

When doing global interpolation, that is, keeping the endpoints {x0, xN} fixed and increasing N, it is not true that the error converges to zero even if the function is \(C^\infty\). For example, for each N, \(f^{(N+1)}(x)\) could remain bounded as a function of x,

$$\vert {f^{(N + 1)}}(x)\vert \leq {C_N}\qquad {\rm for\;all}\;x \in [{x_0},{x_N}],$$
(9.55)

but CN could grow with N. A classical example of non-convergence of polynomial interpolation on uniform grids is the following:

Example 53. Runge phenomenon (see, for instance, [154]).

Consider the function

$$f(x) = {1 \over {1 + {x^2}}}\,,\qquad x \in [ - 5,5]$$
(9.56)

and its interpolating polynomial \({{\mathcal I}_N}[f](x)\) [cf. Eq. (8.1)] at equally-spaced points

$${x_i} = - 5 + {i \over N}10\,,\qquad i = 0, \ldots ,N\,.$$
(9.57)

then

$$\vert f(x) - {{\mathcal I}_N}[f](x)\vert \rightarrow \infty \qquad {\rm as}\;N \rightarrow \infty \,,\qquad {\rm for\;all}\;x\;{\rm satisfying}\;{x_c} < \vert x\vert < 5,$$
(9.58)

where \(x_c \approx 3.63\).

The error (8.5) can be decomposed into two terms, one related to the behavior of the derivatives of f, namely \(f^{(N+1)}(\xi_x)/(N+1)!\), and another related to the distribution of the nodal points, ωN+1(x). We assume in what follows that x ∈ [x0, xN] and that [x0, xN] = [−1, 1]. The analogous results for an arbitrary interval can easily be obtained by a shifting and rescaling of coordinates. It can then be shown that, for all choices of nodal points,

$${\max\limits_{x \in [ - 1,1]}}\vert {\omega _{N + 1}}(x)\vert \; \geq {2^{- N}}.$$
(9.59)

Furthermore, the nodes, which minimize this maximum (thus the minmax term), are the roots of the Chebyshev polynomial of order (N + 1), for which the equality is achieved:

$${\max\limits_{x \in [ - 1,1]}} \vert {\omega _{N + 1}}(x)\vert \; = {2^{- N}}$$
(9.60)

when {x0, x1,…, xN} are given by Eq. (9.53); see, for example, [138], for the proof.

In other words, using Chebyshev points, that is, the roots of the Chebyshev polynomials, as interpolating nodes, minimizes the maximum error associated with the nodal polynomial term. Notice that, in this case, the nodal polynomial is given by TN+1(x)/2N.
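
The following short Python sketch (our illustration; the barycentric interpolator from SciPy is an implementation choice) reproduces the Runge phenomenon of Example 53 on equally-spaced nodes and its cure by Chebyshev points:

    import numpy as np
    from scipy.interpolate import BarycentricInterpolator

    f = lambda x: 1.0 / (1.0 + x ** 2)            # Runge's function, Eq. (9.56)
    xf = np.linspace(-5.0, 5.0, 1001)             # fine grid to measure the error

    for N in (10, 20, 40):
        xu = np.linspace(-5.0, 5.0, N + 1)        # equally-spaced nodes, Eq. (9.57)
        # Chebyshev points, Eq. (9.53), mapped from (-1, 1) to (-5, 5)
        xc = -5.0 * np.cos((2.0 * np.arange(N + 1) + 1.0) * np.pi / (2.0 * N + 2.0))
        for name, nodes in (("uniform", xu), ("Chebyshev", xc)):
            p = BarycentricInterpolator(nodes, f(nodes))
            print(N, name, np.abs(p(xf) - f(xf)).max())

The printed maximum error grows with N for the uniform nodes and decays for the Chebyshev ones.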

9.4 Gauss quadratures and summation by parts

When computing a discrete expansion in terms of orthogonal polynomials

$${f_N}(x) = \sum\limits_{j = 0}^N {{{\hat f}_j}} {p_j}(x)$$
(9.61)

one question is how to efficiently numerically approximate the coefficients \(\{{{\hat f}_j} = {\langle {p_j},f\rangle _\omega}\}\) given in Eq. (9.33). This involves computing weighted integrals of the form

$$\int\limits_a^b q (x)\,\omega (x)\,dx.$$
(9.62)

If approximating the weighted integral (9.62) by a quadrature rule,

$$\int\limits_a^b q (x)\,\omega (x)\,dx \approx \sum\limits_{i = 0}^N {{A_i}} q({x_i}),$$
(9.63)

where the points {xi} are given but having the freedom to choose the coefficients {Ai}, by a counting argument one would expect to be able to choose the latter in such a way that Eq. (9.63) is exact for all polynomials of degree at most N. That is indeed the case, and the answer is obtained by approximating q(x) by its polynomial interpolant (8.1) and integrating the latter,

$$\int\limits_a^b q (x)\omega (x)dx \approx \int\limits_a^b {\sum\limits_{i = 0}^N q} ({x_i})\ell _i^{(N)}(x)\,\omega (x)\,dx = \sum\limits_{i = 0}^N {A_i^{(N)}} q({x_i})$$
(9.64)

where the \(\{\ell _i^N\} _{i = 0}^N\) are the Lagrange polynomials (8.3) and the coefficients

$$A_i^{(N)} = \int\limits_a^b {\ell _i^{(N)}} (x)\omega (x)dx\qquad i = 0,1, \ldots ,N,$$
(9.65)

are independent of the integrand q(x). If the weight function is nontrivial, they might not be known in closed form, but since they are independent of the function being integrated they need to be computed only once for each set of nodal points {xi}.

Suppose now that, in addition to having the freedom to choose the coefficients {Ai}, we can choose the nodal points {xi}. Then we have (N + 1) points and (N + 1) {Ai}, i.e., (2N + 2) degrees of freedom. Therefore, we expect that we can make the quadrature exact for all polynomials of degree at most (2N + 1). This is indeed true and is referred to as Gauss quadratures. Furthermore, the optimal choice of Ai remains the same as in Eq. (9.65), and only the nodal points need to be adjusted.

Theorem 20 (Gauss quadratures). Let ω be a weight function on the interval (a, b), as introduced in Eq. (9.17) , and let pN+1 be the associated orthogonal polynomial of degree N + 1. Then, the quadrature rule (9.63) with the choice (9.65) for the discrete weights, and as nodal points {xj} the roots of pN+1 is exact for all polynomials of degree at most (2N + 1).

The following remarks are in order:

  • The roots of pN+1(x) are referred to as Gauss points or nodes.

  • Suppose that \(\omega(x) = (1 - x^2)^{-1/2}\). Then the (N + 1) Gauss points, i.e., the roots of the Chebyshev polynomial TN+1(x) [see Eq. (9.68)], are exactly the points that minimize the infinity norm of the nodal polynomial in the interpolation problem, as discussed in Section 9.3.3.

One can see that the Gauss points actually lie inside the interval (a, b), and do not include the endpoints a or b. Now suppose that for some reason we want the nodes to include the end points of integration,

$${x_0} = a,\qquad {x_N} = b.$$
(9.66)

One reason for including the end points of the interval in the set of nodes is when applying boundary conditions in the collocation approach, as discussed in Section 10. Then we are left with two fewer degrees of freedom compared to Gauss quadratures and therefore expect to be able to make the quadrature exact for polynomials of degree up to (2N + 1) − 2 = (2N − 1). This leads to:

Theorem 21 (Gauss-Lobatto quadratures). If we choose the discrete weights according to Eq. (9.65) as before but as nodal points, the Gauss-Lobatto ones, i.e., the roots of the polynomial

$${m_{N + 1}}(x) = {p_{N + 1}}(x) + \alpha {p_N}(x) + \beta {p_{N - 1}}(x),$$
(9.67)

with α and β chosen so that mN+1(a) = 0 = mN+1(b), then the quadrature rule (9.63) is exact for all polynomials of degree at most (2N − 1).

Note that the coefficients α and β in the previous equation are obtained by solving the simple system

$$\begin{array}{l} {m_{N + 1}}(a) = 0 = {p_{N + 1}}(a) + \alpha {p_N}(a) + \beta {p_{N - 1}}(a), \\ {m_{N + 1}}(b) = 0 = {p_{N + 1}}(b) + \alpha {p_N}(b) + \beta {p_{N - 1}}(b). \end{array}$$

One can similarly enforce that only one of the end points coincides with a quadrature one, leading to Gauss-Radau quadratures. The proofs of Theorems 20 and 21 can be found in most numerical analysis books, in particular [242].

For Chebyshev polynomials there are closed form expressions for the nodes and weights in Eqs. (9.63) and (9.65):

Chebyshev-Gauss quadratures. For j = 0, 1, …, N,

$${x_j} = - \cos \left({{{2j + 1} \over {2N + 2}}\pi} \right),$$
(9.68)
$$A_j^{(N)} = {\pi \over {N + 1}}.$$
(9.69)

Chebyshev-Gauss-Lobatto quadratures.

$${x_j} = - \cos \left({{{\pi j} \over N}} \right),\qquad {\rm for}\;j = 0,1, \ldots ,N,$$
(9.70)
$$A_j^{(N)} = \left\{{\begin{array}{*{20}c} {{\pi \over N}} & {{\rm{for}}\;j = 1,2, \ldots ,(N - 1)} \\ {{\pi \over {2N}}} & {{\rm{for}}\;j = 0\;{\rm{and}}\;N\quad \quad \;\;} \\ \end{array} .} \right.$$
(9.71)
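
With the closed-form nodes and weights (9.68, 9.69), the exactness statement of Theorem 20 can be verified directly. A minimal Python sketch of ours; the exact monomial moments are computed from the standard Wallis recursion:

    import numpy as np

    N = 5
    jj = np.arange(N + 1)
    xj = -np.cos((2 * jj + 1) * np.pi / (2 * N + 2))   # nodes, Eq. (9.68)
    Aj = np.full(N + 1, np.pi / (N + 1))               # weights, Eq. (9.69)

    # Exact Chebyshev moments m_k = int_{-1}^{1} x^k (1 - x^2)^{-1/2} dx:
    # m_0 = pi, m_k = 0 for odd k, and m_k = m_{k-2} (k - 1)/k for even k.
    m = [np.pi]
    for k in range(1, 2 * N + 2):
        m.append(0.0 if k % 2 else m[k - 2] * (k - 1) / k)

    # Theorem 20: the quadrature is exact up to degree 2N + 1 = 11.
    for k in range(2 * N + 2):
        assert abs(np.sum(Aj * xj ** k) - m[k]) < 1e-12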

Summation by parts. For any two polynomials p(x), q(x) of degree N, in the Legendre case SBP follows for Gauss, Gauss-Lobatto or Gauss-Radau quadratures, in analogy with the FD case described in Section 8.3.

Since both products q(x)dp(x)/dx and p(x)dq(x)/dx are polynomials of degree (2N − 1), their quadratures are exact (in fact, the equality holds for each term separately):

$$\left\langle {{d \over {dx}}p,q} \right\rangle _\omega ^{\rm{d}} + \left\langle {p,{d \over {dx}}q} \right\rangle _\omega ^{\rm{d}} = {\left\langle {{d \over {dx}}p,q} \right\rangle _\omega} + {\left\langle {p,{d \over {dx}}q} \right\rangle _\omega}\,,$$
(9.72)

where we have introduced the discrete counterpart of the weighted scalar product (9.17),

$$\langle h,g\rangle _\omega ^{\rm{d}}: = \sum\limits_{i = 0}^N {{A_i}} h({x_i})g({x_i})\,,$$
(9.73)

with the nodes {xi} and discrete weights {Ai} those of the corresponding quadrature.

On the other hand, in the Legendre case,

$${\left\langle {{d \over {dx}}p,q} \right\rangle _{\omega = 1}} + {\left\langle {p,{d \over {dx}}q} \right\rangle _{\omega = 1}} = p(b)q(b) - p(a)q(a)\,,$$
(9.74)

and therefore

$$\left\langle {{d \over {dx}}p,q} \right\rangle _{\omega = 1}^{\rm{d}} + \left\langle {p,{d \over {dx}}q} \right\rangle _{\omega = 1}^{\rm{d}} = p(b)q(b) - p(a)q(a)\,.$$
(9.75)

Property (9.75) will be used in Section 9.7 when discussing stability through the energy method, much as in the FD case.
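
The SBP identity (9.75) is also easy to verify numerically at Legendre-Gauss nodes, which are available in NumPy; a sketch of ours, with randomly chosen degree-N test polynomials:

    import numpy as np

    N = 8
    x, A = np.polynomial.legendre.leggauss(N + 1)   # Legendre-Gauss nodes/weights

    rng = np.random.default_rng(0)
    p = np.polynomial.Polynomial(rng.standard_normal(N + 1))   # random degree-N
    q = np.polynomial.Polynomial(rng.standard_normal(N + 1))   # polynomials

    # Discrete scalar product (9.73) with omega = 1 applied to p'q and pq'
    # (both of degree 2N - 1, hence integrated exactly); SBP identity (9.75):
    lhs = np.sum(A * p.deriv()(x) * q(x)) + np.sum(A * p(x) * q.deriv()(x))
    rhs = p(1.0) * q(1.0) - p(-1.0) * q(-1.0)
    assert abs(lhs - rhs) < 1e-10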

9.5 Discrete expansions and interpolation

Suppose one approximates a function f through its discrete truncated expansion,

$$f(x) \approx {\mathcal P}_N^{\rm{d}}[f](x) = \sum\limits_{j = 0}^N {\hat f_j^{\rm{d}}} {p_j}(x).$$
(9.76)

That is, instead of considering the exact projection coefficients \(\{{{\hat f}_j}\}\), these are approximated by discretizing the corresponding integrals using Gauss, Gauss-Lobatto or Gauss-Radau quadratures,

$${\hat f_j} = {\langle {p_j},f\rangle _\omega} = \int\limits_a^b f (x){p_j}(x)\omega (x)dx \approx \hat f_j^{\rm{d}}: = \sum\limits_{i = 0}^N f ({x_i}){p_j}({x_i}){A_i}$$
(9.77)

with Ai given by Eq. (9.65) and {xi} any of the Gauss-type points. Putting the pieces together,

$${\mathcal P}_N^{\rm{d}}[f](x) = \sum\limits_{i = 0}^N f ({x_i})\left({{A_i}\sum\limits_{j = 0}^N {{p_j}} ({x_i}){p_j}(x)} \right)\,.$$
(9.78)

If f is a polynomial of degree smaller than or equal to N, the discrete truncated expansion is exact, \({\mathcal P}_N^{\rm{d}}[f] = f\), for Gauss or Gauss-Radau quadratures according to the results in Section 9.4. It follows that the term inside the parentheses above is exactly the i-th Lagrange interpolating polynomial,

$${A_i}\sum\limits_{j = 0}^N {{p_j}} ({x_i}){p_j}(x) = \ell _i^N(x).$$
(9.79)

Therefore, we arrive at the remarkable result:

Theorem 22. Let ω be a weight function on the interval (a, b), as introduced in Eq. (9.17) , and denote by \({\mathcal P}_N^{\rm{d}}[f]\) the discrete truncated expansion of f corresponding to Gauss, Gauss-Lobatto or Gauss-Radau quadratures. Then,

$${\mathcal P}_N^{\rm{d}}[f](x) = {\mathcal I}[f](x) = \sum\limits_{i = 0}^N f ({x_i})\ell _i^N(x)\,.$$
(9.80)

That is, the discrete truncated expansion in orthogonal polynomials of f is exactly equivalent to the interpolation of f at the Gauss, Gauss-Lobatto or Gauss-Radau points.

The above simple proof did not assume any special properties of the polynomial basis, but it does not apply to the Gauss-Lobatto case (for which the associated quadrature is only exact for polynomials of degree at most 2N − 1). However, the result still holds (at least for Jacobi polynomials); see, for example, [237].

Examples of Gauss-type nodal points {xi} are those given in Eq. (9.68) or Eq. (9.70). As we will see below, the identity (9.80) is very useful for spectral differentiation and collocation methods, among other things, since one can equivalently operate with the interpolant, which only requires knowledge of the function at the nodes.
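
Theorem 22 can be checked numerically as well. The following Python sketch is ours, using the Chebyshev-Gauss nodes and weights (9.68, 9.69), a test function of our choice, and the normalization (9.52); it confirms that the discrete expansion reproduces f at the nodes:

    import numpy as np

    N = 10
    jj = np.arange(N + 1)
    x = -np.cos((2 * jj + 1) * np.pi / (2 * N + 2))    # Chebyshev-Gauss nodes
    A = np.full(N + 1, np.pi / (N + 1))                # weights, Eq. (9.69)

    # Orthonormalized Chebyshev basis evaluated at the nodes [cf. Eq. (9.52)].
    norms = np.where(jj == 0, np.sqrt(np.pi), np.sqrt(np.pi / 2.0))
    P = np.array([np.cos(k * np.arccos(x)) for k in jj]) / norms[:, None]

    fx = np.exp(x)                                     # sample smooth function
    fhat = P @ (A * fx)                                # discrete coefficients (9.77)

    # Theorem 22: the discrete expansion interpolates f at the Gauss nodes.
    assert np.allclose(P.T @ fhat, fx)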

9.6 Spectral collocation differentiation

The equivalence (9.80) between the discrete truncated expansion and interpolation at Gauss-type points allows the approximation of the derivative of a function in a very simple way,

$${d \over {dx}}f(x) \approx {d \over {dx}}{\mathcal P}_N^{\rm{d}}[f](x) = {d \over {dx}}{\mathcal I}[f](x) = \sum\limits_{j = 0}^N f ({x_j}){d \over {dx}}\ell _j^N(x).$$
(9.81)

Therefore, knowing the values of the function f at the collocation points, i.e., the Gauss-type points, we can construct its interpolant \({\mathcal P}_N^{\rm{d}}[f]\), take an exact derivative thereof, and evaluate the result at the collocation points to obtain the values of the discrete derivative of f at these points. This leads to a matrix-vector multiplication, where the corresponding matrix elements Dij can be computed once and for all:

$${d \over {dx}}f({x_i}) \approx \sum\limits_{j = 0}^N {{D_{ij}}} f({x_j}),\qquad i = 0,1, \ldots ,N\,,$$
(9.82)

with

$${D_{ij}}: = {d \over {dx}}\ell _j^N({x_i})\,.$$
(9.83)

We give the explicit expressions for this differentiation matrix for Chebyshev polynomials both at Gauss and Gauss-Lobatto points (see, for example, [167, 237]).

Chebyshev-Gauss.

$${D_{ij}} = \left\{{\begin{array}{*{20}c} {{{{x_i}} \over {2(1 - x_i^2)}}\quad \quad \quad} & {{\rm{for}}\;i = j,} \\ {{{{{T\prime}_{N + 1}}({x_i})} \over {({x_i} - {x_j}){{T\prime}_{N + 1}}({x_j})}}} & {{\rm{for}}\;i \neq j,} \\ \end{array}} \right.$$
(9.84)

with a prime denoting differentiation.

Chebyshev-Gauss-Lobatto.

$${D_{ij}} = \left\{{\begin{array}{*{20}c} {- {{2{N^2} + 1} \over 6}}\quad \quad& {{\rm{for}}\;i = j = 0,\quad \quad \quad \quad} \\ {{{{c_i}} \over {{c_j}}}{{{{(- 1)}^{i + j}}} \over {({x_i} - {x_j})}}}\quad & {{\rm{for}}\;i \neq j,\quad \;\quad \quad \quad \quad \;} \\ {- {{{x_i}} \over {2(1 - x_i^2)}}}\quad & {{\rm{for}}\;i = j = 1, \ldots ,(N - 1),} \\ {{{2{N^2} + 1} \over 6}}\quad\quad & {{\rm{for}}\;i = j = N,\quad \quad \quad \quad} \\ \end{array}} \right.$$
(9.85)

where ci = 1 for i = 1, …, (N − 1) and ci = 2 for i = 0, N.
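
Eq. (9.85) translates directly into code. The following Python sketch (ours) builds the Chebyshev-Gauss-Lobatto differentiation matrix and checks it on a polynomial, for which spectral differentiation must be exact:

    import numpy as np

    def cheb_gauss_lobatto_D(N):
        # Nodes x_j = -cos(pi j / N), Eq. (9.70), and the matrix of Eq. (9.85).
        x = -np.cos(np.pi * np.arange(N + 1) / N)
        c = np.ones(N + 1); c[0] = c[N] = 2.0
        D = np.zeros((N + 1, N + 1))
        for i in range(N + 1):
            for j in range(N + 1):
                if i != j:
                    D[i, j] = (c[i] / c[j]) * (-1.0) ** (i + j) / (x[i] - x[j])
        for i in range(1, N):
            D[i, i] = -x[i] / (2.0 * (1.0 - x[i] ** 2))
        D[0, 0] = -(2.0 * N ** 2 + 1.0) / 6.0
        D[N, N] = (2.0 * N ** 2 + 1.0) / 6.0
        return x, D

    x, D = cheb_gauss_lobatto_D(8)
    # Spectral differentiation is exact for polynomials of degree <= N:
    assert np.allclose(D @ x ** 5, 5.0 * x ** 4)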

9.7 The collocation approach

When solving a quasilinear evolution equation

$${u_t} = P(t,x,u,\partial /\partial x)u + F(t,x,u)$$
(9.86)

using spectral expansions in space and some time evolution scheme, one could proceed in the following way: Work with the truncated expansion of u(t, x),

$${{\mathcal P}_N}[u](t,x) = \sum\limits_{i = 0}^N {{b_i}} (t){p_i}(x),$$
(9.87)

and write the PDE (9.86) as a system of (N + 1) coupled evolution ordinary differential equations for the bi(t) coefficients, subject to the initial condition

$${b_i}(t = 0) = {\langle u(t = 0,\cdot),{p_i}\rangle _\omega}\,,$$
(9.88)

where the quadratures can be approximated using, say, Gauss-type points. One problem with this approach is that if the equation is nonlinear, or even linear but with variable coefficients, the right-hand side of Eq. (9.86) needs to be re-expressed in terms of truncated expansions at each timestep. Besides the complexity of doing so, accuracy is lost because higher-order modes due to nonlinearities or coupling with variable coefficients are not represented. Or, worse, inaccuracies from those absent modes move to lower frequency ones. This is one of the reasons why collocation methods are usually preferred for this class of problems.

In the collocation approach the differential equation is exactly solved, in physical space, at the collocation points \(\{{x_i}\} _{i = 0}^N\), which are those appearing in Gauss quadratures (Section 9.4). Assume for definiteness that we are dealing with a symmetric hyperbolic system in three dimensions,

$${u_t}(t,x) = \sum\limits_{j = 1}^3 {{A^j}} (t,x,u){\partial \over {\partial {x^j}}}u + F(t,x,u),$$
(9.89)

see Section 3.3. We approximate u by its discrete truncated expansion

$${u_N}: = {\mathcal P}_N^{\rm{d}}[u].$$
(9.90)

Then the system is solved at the collocation points,

$${d \over {dt}}{u_N}(t,{x_i}) = \sum\limits_{j = 1}^3 {{A^j}} (t,{x_i},{u_N}){\partial \over {\partial {x^j}}}{u_N}(t,{x_i}) + F(t,{x_i},{u_N}),$$
(9.91)

where the spatial derivatives are approximated using spectral differentiation as described in Section 9.6. The system can then be evolved in time using the preferred time integration method; see Section 7.3.

From an implementation perspective, there is actually very little difference between a spectral collocation method and a FD one, the only two differences being that the grid points need to be Gauss-type ones and that the derivative is computed using global interpolation at those points. In fact, the projection (9.90) never needs to be computed for actually solving the system (9.91): given initial data u(t = 0, x), by construction the interpolant coincides with it at the nodal points,

$${u_N}(t = 0,{x_i}) = u(t = 0,{x_i}),$$
(9.92)

and the system (9.91) for uN(t, xi) is directly numerically evolved subject to the initial condition (9.92).

Stability. As discussed in Section 9.4, in the Legendre case the discrete truncated expansion using Gauss-type quadratures leads to SBP. In analogy with the FD case (Section 8.3), when the continuum system can be shown to be well posed through the energy method, a semi-discrete energy estimate can be shown by using the SBP property, at least for constant coefficient systems, and modulo boundary conditions (discussed in the following Section 10). Consider the same case discussed in Section 8.4, a constant coefficient symmetric hyperbolic system in one dimension,

$${u_t} = A{u_x},\qquad A = {A^T},$$
(9.93)

and a collocation approach

$${d \over {dt}}{u_N}(t,{x_i}) = A{\partial \over {\partial x}}{u_N}(t,{x_i}),$$
(9.94)

at Legendre-Gauss-type nodes. Then, defining

$${E_{\rm{d}}} = \langle {u_N},{u_N}\rangle _{\omega = 1}^{\rm{d}}\,,$$
(9.95)

and taking a time derivative, as in Section 8.4,

$${{d{E_{\rm{d}}}} \over {dt}} = \left\langle {{d \over {dt}}{u_N},{u_N}} \right\rangle _{\omega = 1}^{\rm{d}} + \left\langle {{u_N},{d \over {dt}}{u_N}} \right\rangle _{\omega = 1}^{\rm{d}} = \left\langle {{\partial \over {\partial x}}A{u_N},{u_N}} \right\rangle _{\omega = 1}^{\rm{d}} + \left\langle {A{u_N},{\partial \over {\partial x}}{u_N}} \right\rangle _{\omega = 1}^{\rm{d}}\,.$$
(9.96)

Now, \(u_N\,\partial u_N/\partial x\) is a polynomial of degree (2N − 1), so the above discrete scalar product is exact for Gauss, Gauss-Lobatto and Gauss-Radau collocation points [cf. Eq. (9.75)]. Therefore, we obtain the energy equality

$${{d{E_{\rm{d}}}} \over {dt}} = [{(A{u_N})^T}{u_N}]_a^b,$$
(9.97)

and numerical stability follows modulo boundary conditions. For the case of a symmetric-hyperbolic system with variable coefficients and lower-order terms, one obtains an energy estimate using skew-symmetric differencing; see Eq. (8.50), and appropriate boundary conditions.

The weighted norm case ω ≠ 1 is more involved. In fact, already the advection equation is not well posed under the Chebyshev norm; see, for example, [237].

Spectral viscosity. In analogy with numerical dissipation (Section 8.5), spectral viscosity (SV) adds a resolution-dependent dissipation term to the evolution equations without sacrificing spectral convergence. SV was introduced by Tadmor in [408]. For simplicity, consider the Fourier case. Then spectral viscosity involves adding to the evolution equations a dissipative term of the form

$${d \over {dt}}{u_N} = (\ldots) - {\epsilon_N}{(- 1)^s}{{{\partial ^s}} \over {\partial {x^s}}}\left[ {{Q_m}(t,x){{{\partial ^s}{u_N}} \over {\partial {x^s}}}} \right],\quad s \geq 1,$$
(9.98)

where s is the (fixed) order of viscosity, the viscosity amplitude scales as

$${\epsilon_N} = {{{C_s}} \over {{N^{2s - 1}}}}\,,\qquad {C_s} > 0,$$
(9.99)

and the smoothing functions Qm effectively apply the viscosity to only the upper portion of the spectrum. In more detail, if \({{\hat Q}_j}(t)\) are the Fourier coefficients [cf. Eq. (9.6)] of Qm(t, x), then the viscosity is only applied to frequencies \(j > m_N\), in such a way that they satisfy

$$1 - {\left({{{{m_N}} \over {\vert j\vert}}} \right)^{(2s - 1)/\theta}} \leq {\hat Q_j}(t) \leq 1$$
(9.100)

with

$${m_N}\sim{N^\theta}\,\qquad \theta < {{2s - 1} \over {2s}}\,.$$
(9.101)

The case s = 1 corresponds to a dissipation term involving a second derivative and s > 1 is referred to as super (or hyper) viscosity. Higher values of s (up to \(s \sim \sqrt N\)) dissipate ‘less aggressively’.
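
In the Fourier case the operator in Eq. (9.98) is diagonal in frequency space, which makes a mode-by-mode implementation natural: each derivative brings down a factor (ij)^s. The following Python sketch is our own illustration, with Cs and θ chosen arbitrarily subject to the constraint (9.101), and with the lower bound in Eq. (9.100) taken as the smoothing factor:

    import numpy as np

    def spectral_viscosity_term(u_hat, j, N, s=1, Cs=1.0, theta=0.25):
        # Damping -eps_N * Qhat_j * |j|^(2s) * u_hat_j, which is the Fourier
        # representation of the term in Eq. (9.98) for mode-diagonal Q_m.
        eps_N = Cs / N ** (2 * s - 1)                 # Eq. (9.99)
        mN = N ** theta                               # Eq. (9.101)
        Q = np.zeros(u_hat.shape)
        high = np.abs(j) > mN                         # only the upper spectrum
        Q[high] = 1.0 - (mN / np.abs(j[high])) ** ((2 * s - 1) / theta)  # Eq. (9.100)
        return -eps_N * Q * np.abs(j) ** (2 * s) * u_hat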

The Legendre and Chebyshev cases are similar and are discussed in [291, 292]. The web-page [406] keeps a selected list of publications on spectral viscosity.

9.8 Going further, applications in numerical relativity

Based on the minimum grid spacing between spectral collocation points, one would naively expect the CFL limit to scale as \(1/N^2\), where N is the number of points. The expectation indeed holds, but the reason is related to the \({\mathcal O}({N^2})\) scaling of the eigenvalues of Jacobi polynomials as solutions to Sturm-Liouville problems (in fact, the result holds for non-collocation spectral methods as well) [212].

There are relatively few rigorous results on convergence and stability of Chebyshev collocation methods for IBVPs; some of them are [211] and [210].

Even though this review is concerned with time-dependent problems, we note in passing that there are a significant number of efforts in relativity using spectral methods for the constraint equations; see [215]. The use of spectral methods in relativistic evolutions can be traced back to pioneering work in the mid-1980s [66] (see also [67, 68, 213]). Over the last decade they have gained popularity, with applications in scenarios as diverse as relativistic hydrodynamics [313, 427, 428], characteristic evolutions [43], absorbing and/or constraint-preserving boundary conditions [314, 369, 365, 363], constraint projection [244], late time “tail” behavior of black-hole perturbations [382, 420], cosmological studies [19, 49, 50], extreme-mass-ratio inspirals within perturbation theory and self-forces [112, 162, 111, 425, 114, 113, 123] and, prominently, binary black-hole simulations (see, for example, [384, 329, 71, 381, 132, 288, 402, 131, 90, 289]) and black-hole-neutron-star ones [150, 168]. The method of lines (Section 7.3) is typically used with a small enough timestep so that the time integration error is smaller than the one due to the spatial approximation and spectral convergence is observed. Spectral collocation methods were first used in spherically-symmetric black-hole evolutions of the Einstein equations in [255] and in three dimensions in [254]. The latter work showed that some constraint violations in the Einstein-Christoffel [22] type of formulations do not go away with resolution but are a feature of the continuum evolution equations (though the point — namely, that time instabilities are in some cases not a product of lack of resolution — applies to many other scenarios).

Most of these references use explicit symmetric hyperbolic first-order formulations. More recently, progress has been made towards using spectral methods for the BSSN formulation of the Einstein equations directly in second-order form in space [419, 163], and, generally, on multi-domain interpatch boundary conditions for second-order systems [413] (numerical boundary conditions are discussed in the next Section 10). A spectral spacetime approach (as opposed to spectral approximation in space and marching in time) for the 1+1 wave equation in compactified Minkowski was proposed in [233]; in higher dimensions and dynamical spacetimes the cost of such approach might be prohibitive though.

[83] presents an implementation of the harmonic formulation of the Einstein equations on a spherical domain using a double Fourier expansion and, in addition, significant speed-ups using Graphics Processing Units (GPUs).

[215] presents a detailed review of spectral methods in numerical relativity.

A promising approach, which, until recently, has been largely unexplored within numerical relativity is the use of discontinuous Galerkin methods [238, 457, 162, 163, 339].

10 Numerical Boundary Conditions

In most practical computations, one inevitably deals with an IBVP and numerical boundary conditions have to be imposed. Usually the boundary is artificial and, as discussed in Section 5.3, absorbing boundary conditions are imposed. In other cases the boundary of the computational domain may actually represent infinity via compactification; see Section 6.4. Here we discuss some approaches for imposing numerical boundary conditions, with emphasis on sufficient conditions for stability based on the energy method, simplicity, and applicability to high order and spectral methods. In addition to outer boundaries, we also discuss interface ones appearing when there are multiple grids.

General stability results through the energy method are available for symmetric hyperbolic first-order linear systems with maximal dissipative boundary conditions. Unfortunately, in many cases of physical interest the boundary conditions are often neither in maximal dissipative form nor is the system linear. In particular, this is true for Einstein’s field equations, which are nonlinear, and, as we have seen in Section 6, require constraint-preserving absorbing boundary conditions, which do not always result in algebraic conditions on the fields at the boundary. Therefore, in many cases one does “the best that one can”, implementing the outer boundary conditions using discretizations, which are known to be stable, at least in the linearized, maximal dissipative case. Fortunately, since the outer boundaries are usually placed in the weak field, wave zone, more often than not this approach works well in practice. At the same time, it should be noted that the IBVPs for general relativity formulated in [187] and [264] (discussed in Section 5) are actually based on a symmetric hyperbolic first-order reduction of Einstein’s field equations with maximal dissipative boundary conditions (including constraint-preserving ones). Therefore, it should be possible to construct numerical schemes, which can be provably stable, at least in the linearized regime, using the techniques described in the last two Sections 8 and 9, and in Section 10.1 below. A numerical implementation of the formulations of [187] and [264] has not yet been pursued.

The situation at interface boundaries between grids, which are at least partially contained in the strong field region, is more subtle. Fortunately, only the characteristic structure of the equations is in principle needed at such boundaries, and not constraint-preserving boundary conditions. Methods for dealing with interfaces are discussed in Section 10.2.

Finally, in Section 10.3 we give an overview of some applications to numerical relativity of the boundary treatments discussed in Sections 10.1 and 10.2. As mentioned above, most of the techniques that we discuss have been mainly developed for first-order symmetric hyperbolic systems with maximal dissipative boundary conditions. In Section 10.3 we also point out ongoing and prospective work for second-order systems, as well as the important topic of absorbing boundary conditions in general relativity.

Most of the methods reviewed below involve decomposition of the principal part, its time derivative, or both, into characteristic variables, imposing the boundary conditions and changing back to the original variables. This can be done a priori, analytically, and the actual online numerical computational cost of these operations is negligible.

10.1 Outer boundary conditions

10.1.1 Injection

Injecting boundary conditions is presumably the simplest way to numerically impose them. It implies simply overwriting, at every or some number of timesteps, the numerical solution for each incoming characteristic variable or its time derivative with the conditions that they should satisfy.

Stability of the injection approach can be analyzed through GKS theory [229], since energy estimates are, in general, not available for it (the reason for this should become clearer when discussing the projection and penalty methods). Stability analyses not only depend on the spatial approximation (and time integration in the fully-discrete case) but are in general also equation-dependent. Therefore, stability is, in general, difficult to establish, especially for high-order schemes and nontrivial systems. For this reason a semi-discrete eigenvalue analysis is many times performed. Even though this only provides necessary conditions (namely, the von Neumann condition (7.80)) for stability, it serves as a rule of thumb and to discard obviously unstable schemes.

As an example, we discuss the advection equation with “freezing” boundary condition,

$${u_t} = {u_x},\;\;x \in [ - 1,1],\quad t \geq 0,$$
(10.1)
$$u(0,x) = f(x),\;\;x \in [ - 1,1],$$
(10.2)
$$u(t,\;1) = g(t),\;\;t \geq 0,$$
(10.3)

where the boundary data is chosen to be constant, g(t) = f(1). As space approximation we use a Chebyshev collocation method at Gauss-Lobatto nodes. The approximation then takes the form

$${d \over {dt}}{u_N}(t,{x_i}) = (Q{u_N})\;(t,{x_i}),\quad \;i = 0,1, \ldots ,N,$$
(10.4)

where Q is the corresponding Chebyshev differentiation matrix; see Section 9.6. The boundary condition is then imposed by replacing the Nth equation by

$${d \over {dt}}{u_N}(t,{x_N}) = {d \over {dt}}g(t) = 0{.}$$
(10.5)

For this problem and approximation an energy estimate can be derived [209], but we present an eigenvalue analysis as a typical example of those done for more complicated systems and injection boundary conditions. Figure 4 shows the semi-discrete spectrum for different numbers of collocation points N. As required by the strong version of the von Neumann condition, no eigenvalue has a positive real part [cf. Eq. (7.82)]. We also note that, as discussed at the beginning of Section 9.8, the spectral radius scales as ∼ N2.

Figure 4: Semi-discrete spectrum for the advection equation and freezing boundary conditions [Eqs. (10.1, 10.2, 10.3)], a Chebyshev collocation method, and numerical injection of the boundary condition. The number of collocation points is (from left to right) N = 20, 40, 60. The scheme passes the von Neumann condition (no positive real component in the spectrum). In fact, as discussed in the body of the text, the method can be shown to actually be numerically stable.
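
The spectra in Figure 4 are straightforward to reproduce: build the Gauss-Lobatto differentiation matrix, overwrite the row corresponding to xN = 1 according to Eq. (10.5), and compute the eigenvalues. A Python sketch of ours, reusing the function cheb_gauss_lobatto_D from the sketch in Section 9.6:

    import numpy as np

    # assumes cheb_gauss_lobatto_D from the sketch in Section 9.6 is in scope
    for N in (20, 40, 60):
        x, D = cheb_gauss_lobatto_D(N)
        L = D.copy()
        L[N, :] = 0.0          # inject d u_N(t, x_N)/dt = g'(t) = 0, Eq. (10.5)
        ev = np.linalg.eigvals(L)
        # von Neumann condition: no eigenvalues with positive real part
        # (up to roundoff, which grows with N for these matrices)
        print(N, ev.real.max())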

There are other difficulties with the injection method, besides the fact that stability results are usually partial and incomplete for realistic systems and/or high-order or spectral methods. One of them is that it sometimes happens that a full GKS analysis can actually be carried out for a simple problem and scheme, and the result turns out to be stable but not time-stable (see Section 7.4 for a discussion on time-stability). Or the scheme is not time-stable when applied to a more complicated system (see, for example, [116, 1]).

Seeking stable numerical boundary conditions for realistic systems, which preserve the accuracy of high-order finite-difference and spectral methods, has been a recurring theme in numerical methods for time-dependent partial differential equations for a long time, especially for nontrivial domains, with substantial progress over the last decade, in particular through the penalty method discussed below. Before doing so we review another method, which improves on the injection one in that stability can be shown for rather general systems and arbitrary high-order FD schemes.

10.1.2 Projections

Assume that a given IBVP is well posed and admits an energy estimate, as discussed in Section 5.2. Furthermore, assume that, up to the control of boundary terms, a semi-discrete approximation to it also admits an energy estimate. The key idea of the projection method [317, 319, 318] is to impose the boundary conditions by projecting at each time the numerical solution to the space of gridfunctions satisfying those conditions, the central aspect being that the projection is chosen to be orthogonal with respect to the scalar product under which a semi-discrete energy estimate in the absence of boundaries can be shown. The orthogonality of the projection then guarantees that the estimate including the control of the boundary term holds.

In more detail, if the spatial approximation prior to the imposition of the boundary conditions is written as

$${u_t} = Qu$$
(10.6)

and [for example, as is many times the case when SBP holds] there is a semi-discrete energy,

$${E_{\rm{d}}} = \langle u,\;u\rangle ,$$
(10.7)

with respect to some scalar product, for which an estimate holds up to boundary terms:

$${d \over {dt}}{E_{\rm{d}}} = \langle Qu,\;u\rangle + \langle u,\;Qu\rangle ,$$
(10.8)

with

$$\langle Qu,\;u\rangle + \langle u,\;Qu\rangle \leq 2\alpha {E_{\rm{d}}} + {\rm{boundary}}\;{\rm{terms}}$$
(10.9)

for some constant α independent of the initial data and resolution. Due to the SBP property, the boundary terms are exactly those present in the continuum energy estimate, except that so far they cannot be bounded because the numerical solution is not yet required to satisfy the boundary conditions.

The latter are then imposed by changing the semi-discrete equation (10.6) to

$${u_t} = {\mathcal P}Qu$$
(10.10)

where \({\mathcal P}\) projects u to the space of gridfunctions satisfying the desired boundary conditions, is time-independent (which in particular requires the assumption that the boundary conditions are time-independent) and symmetric

$$\langle u,{\mathcal P}v\rangle = \langle {\mathcal P}u,v\rangle$$
(10.11)

(and, being a projection, \({{\mathcal P}^2} = {\mathcal P}\)). Since the projection is assumed to be time-independent, projecting both sides of Eq. (10.6) implies that

$${({\mathcal P}u)_t} = {\mathcal P}Qu$$
(10.12)

and for any solution of (10.6) satisfying the boundary conditions, \({\mathcal P}u = u\),

$${u_t} = {\mathcal P}Qu.$$
(10.13)

Then

$${d \over {dt}}{E_{\rm{d}}} = \langle {\mathcal P}Qu,u\rangle + \langle u,{\mathcal P}Qu\rangle = \langle Qu,{\mathcal P}u\rangle + \langle {\mathcal P}u,Qu\rangle = \langle Qu,u\rangle + \langle u,Qu\rangle \leq 2\alpha {E_{\rm{d}}} + {\rm{boundary}}\;{\rm{terms}}.$$

Since the solution now satisfies the boundary conditions, the boundary terms can be bounded as in the continuum and stability of the semi-discrete IBVP follows. In principle, this method only guarantees stability for time-independent, homogeneous boundary conditions. Homogeneity can be assumed by a redefinition of the variables; however, the restriction of a time-independent boundary condition is more severe.

Details on how to explicitly construct the projection can be found in [227]. The orthogonal projection method guarantees stability for a large class of problems admitting a continuum energy estimate. However, its implementation is somewhat involved.

10.1.3 Penalty conditions

A simple and robust method for imposing numerical boundary conditions, either at outer or interpatch boundaries, such as those appearing in domain decomposition approaches, is through penalty terms. The boundary conditions are not imposed strongly but weakly, preserving the convergence order of the spatial approximation and leading to numerical stability for a large class of problems. The method can be applied both to FD and spectral approximations. In fact, its spirit can be traced back to finite-element discontinuous Galerkin methods (see [147] and [28] for more recent results). Terms are added to the evolution equations at the boundaries to consistently penalize the mismatch between the numerical solution and the boundary conditions that the exact solution is subject to.

Finite differences. In the FD context the method is known as the Simultaneous Approximation Term (SAT) method [117]. For semi-discrete approximations of IBVPs of arbitrarily high order admitting an energy estimate, both the order of accuracy and the energy estimate are preserved when the boundary conditions are imposed through it.

As an example, consider the half-space IBVP for the advection equation,

$${u_t} = \lambda {u_x},\;\;x \leq 0,\quad t \geq 0,$$
(10.14)
$$u(0,x) = f(x),\;\;x \leq 0,$$
(10.15)
$$u(t,0) = g(t),\quad t \geq 0,$$
(10.16)

where λ ≥ 0 is the characteristic speed in the −x direction.

We first consider a semi-discrete approximation using some FD operator D satisfying SBP with respect to a scalar product Σ, which we assume to be either diagonal or restricted full (see Section 8.3). As usual, Δx denotes the spacing between gridpoints,

$${x_i} = i\Delta x,\quad \quad i = 0, - 1, - 2, \ldots .$$
(10.17)

In the SAT approach the boundary conditions are imposed through a penalty term, which is applied at the boundary point x0,

$${d \over {dt}}{u_i} = \lambda D{u_i} + {{{\delta _{i,0}}S} \over {{\sigma _{00}}\Delta x}}(g - {u_0}),$$
(10.18)

where δi,0 is the Kronecker delta, S a real parameter to be restricted below, and σ00 is the 00-component of the SBP scalar product Σ. Introducing the semi-discrete SBP energy

$${E_{\rm{d}}} = {\langle u,u\rangle _\Sigma},$$
(10.19)

its time derivative, using the SBP property \({\langle Du,u\rangle _\Sigma} + {\langle u,Du\rangle _\Sigma} = u_0^2\) and the fact that the penalty term contributes \(2S(g - {u_0}){u_0}\), is

$${d \over {dt}}{E_{\rm{d}}} = (\lambda - 2S)u_0^2 + 2g{u_0}S{.}$$
(10.20)

For homogeneous boundary conditions (g = 0) both numerical and time-stability follow if S ≥ λ/2:

$${d \over {dt}}{E_{\rm{d}}} = - (2S - \lambda)u_0^2 \leq 0{.}$$
(10.21)

Strong stability follows if S > λ/2, including the case of inhomogeneous boundary conditions: for any ε such that

$$0 < {\varepsilon ^2} < 2S - \lambda ,$$
(10.22)

Eq. (10.20) implies

$${d \over {dt}}{E_{\rm{d}}} = (\lambda - 2S)u_0^2 + 2(\varepsilon {u_0})\;\left({{{gS} \over \varepsilon}} \right) \leq (\lambda - 2S)u_0^2 + ({\varepsilon ^2}u_0^2) + {\left({{{gS} \over \varepsilon}} \right)^2} = - \tilde \varepsilon u_0^2 + {\left({{S \over \varepsilon}} \right)^2}{g^2},$$
(10.23)

where

$$\tilde \varepsilon : = 2S - \lambda - {\varepsilon ^2} > 0.$$
(10.24)

Integrating Eq. (10.23) in time,

$$\Vert {u(t)} \Vert_\Sigma ^2 + \tilde \varepsilon \int\nolimits_0^t {{u_0}} {(\tau)^2}d\tau \leq \Vert {u(0)} \Vert_\Sigma ^2 + {\left({{S \over \varepsilon}} \right)^2}\int\nolimits_0^t g {(\tau)^2}d\tau ,$$
(10.25)

thereby proving strong stability.
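As a concrete illustration of the scheme (10.17, 10.18), the following Python sketch (our own; the operator, boundary data, and parameter choices are illustrative assumptions, not from any particular code) evolves the half-space problem with the D2−1 operator, whose diagonal SBP scalar product has boundary weight σ00 = 1/2, using a standard fourth-order Runge-Kutta integrator. The grid is truncated at x = −1 for the demonstration; since λ ≥ 0 the characteristic leaves the domain there and no condition is needed at that edge.

    import numpy as np

    # SAT sketch (ours) for u_t = lam*u_x, x <= 0, with boundary data g at
    # x_0 = 0, cf. Eq. (10.18), using the D_{2-1} operator (sigma_00 = 1/2).

    lam = 1.0
    S = lam                          # penalty strength; S >= lam/2 is needed
    N = 400
    dx = 1.0 / N
    x = np.linspace(-1.0, 0.0, N + 1)
    sigma00 = 0.5

    def g(t):                        # boundary data; g(0) = 0 is compatible
        return np.sin(4.0 * np.pi * t)

    def rhs(t, u):
        du = np.empty_like(u)
        du[1:-1] = (u[2:] - u[:-2]) / (2.0 * dx)
        du[0] = (u[1] - u[0]) / dx   # outflow edge of the truncated grid
        du[-1] = (u[-1] - u[-2]) / dx
        du *= lam
        du[-1] += S / (sigma00 * dx) * (g(t) - u[-1])  # penalty at x_0 only
        return du

    def rk4_step(t, u, dt):
        k1 = rhs(t, u)
        k2 = rhs(t + 0.5 * dt, u + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, u + 0.5 * dt * k2)
        k4 = rhs(t + dt, u + dt * k3)
        return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

    u = np.zeros_like(x)             # initial data f = 0
    dt = 0.5 * dx / lam
    for n in range(2 * N):           # evolve to t = 1
        u = rk4_step(n * dt, u, dt)

The exact solution, u(t, x) = g(t + x/λ) for t + x/λ > 0 and zero otherwise, can be used to monitor the error; with S ≥ λ/2 the discrete energy behaves as in Eq. (10.20).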

In the case of diagonal SBP norms it is straightforward to derive similar energy estimates for general linear symmetric hyperbolic systems of equations in several dimensions, simply by working with one characteristic variable at a time at each boundary. A penalty term like the one in Eq. (10.18) is applied to the evolution equation of each incoming characteristic variable, with λ replaced by the corresponding characteristic speed. In particular, edges and corners are dealt with by simply imposing the boundary conditions with respect to the normal of each boundary, and an energy estimate follows.

The global semi-discrete convergence rate can be estimated as follows. Define the error grid-function as the difference between the numerical solution u and the exact one u(e) evaluated at the gridpoints,

$${e_i}(t) = e(t,{x_i}) := u(t,{x_i}) - {u^{(e)}}(t,{x_i}),\quad i = 0, - 1, - 2, \ldots .$$
(10.26)

It satisfies the equation

$${d \over {dt}}{e_i} = \lambda D{e_i} - {{{\delta _{i,0}}S} \over {{\sigma _{00}}\Delta x}}{e_0} - {F_i},$$
(10.27)

where Fi denotes the truncation error, here solely depending on the differentiation approximation:

$${F_i}(t) = F(t,{x_i}) = \lambda \left({u_x^{(e)}(t,{x_i}) - D{u^{(e)}}(t,{x_i})} \right),$$
(10.28)
$$= \left\{{\begin{array}{*{20}c} {{\mathcal O}\left({(\Delta x)^r}\right)\quad {\rm{at}}\;{\rm{and}}\;{\rm{close}}\;{\rm{to}}\;{\rm{boundaries}},} \\ {{\mathcal O}\left({(\Delta x)^{2p}}\right)\quad {\rm{in}}\;{\rm{the}}\;{\rm{interior}},} \\ \end{array}} \right.$$
(10.29)

with r < 2p in general. For example, in the diagonal case r = p, and in the restricted full one r = 2p − 1 [see the discussion below Eq. (8.29)].

Using Eq. (10.26) and the SBP property, the norm of the error satisfies

$${d \over {dt}}\Vert e \Vert_\Sigma ^2 = (\lambda - 2S)e_0^2 - 2{\langle e,F\rangle _\Sigma} \leq 2{\Vert e \Vert_\Sigma}{\Vert F \Vert_\Sigma},$$
(10.30)

where we have used S ≥ λ/2 and the Cauchy-Schwarz inequality in the second step. Dividing both sides of the inequality by \(2{\Vert e \Vert_\Sigma}\) and integrating we obtain

$${\Vert {e(t)} \Vert_\Sigma} \leq \int\limits_0^t {{{\Vert {F(\tau)} \Vert}_\Sigma}d\tau ,\quad \quad t \geq 0.}$$
(10.31)

Since by Eq. (10.29) the Σ-norm of the truncation error is of order \((\Delta x)^{r + 1/2}\), it follows that the Σ-norm of the error converges to zero with rate r + 1/2. This also implies that the error converges pointwise to zero with rate r,

$${e_i}(t) = {\mathcal O}\;({(\Delta x)^r}),\quad \;\;t > 0,\quad i = 0, - 1, - 2, \ldots$$
(10.32)

In particular, this proves that the error at the boundary point x0 = 0 converges to zero at least as fast as (Δx)r, and therefore, the penalty term in Eq. (10.18) is bounded. Two remarks are in order:

  • In special cases it is possible to improve the error estimate exploiting the strong stability of the problem. Consider, for instance, the case of the D2−1 operator defined in Example 51 with the associated diagonal scalar product (σij) = diag(…, 1, 1, 1, 1/2). Then, Eq. (10.30) gives

    $$\begin{aligned} {d \over {dt}}\Vert e \Vert_\Sigma ^2 &= (\lambda - 2S)e_0^2 - 2{\langle e,F\rangle _\Sigma} = (\lambda - 2S)e_0^2 - \Delta x\,{e_0}{F_0} - 2\Delta x\sum\limits_{i = - \infty}^{- 1} {e_i}{F_i} \\ &\leq \left[ {(\lambda - 2S) + {{{\varepsilon ^2}} \over 2}} \right]e_0^2 + {{\Delta {x^2}} \over {2{\varepsilon ^2}}}F_0^2 + \Delta x\sum\limits_{i = - \infty}^{- 1} {\left({e_i^2 + F_i^2} \right)} \\ &\leq \Vert e \Vert_\Sigma ^2 + {\mathcal O}\left({{(\Delta x)}^4} \right), \end{aligned}$$

    where we have chosen 0 < ε²/2 ≤ 2S − λ. This implies that the error converges to zero in the Σ-norm with second order and pointwise with order 3/2; the sketch following these remarks checks the norm rate numerically.

  • At any fixed resolution, the error typically decreases with larger values of the penalty parameter S, but the spectral radius of the discretization grows quickly in the process, the penalty effectively introducing dissipative eigenvalues (in the left half plane) into the spectrum, which leads to demanding CFL limits (see, for example, [281]). Because the method is usually applied along with high-order schemes, decreasing the error at fixed resolution at the expense of a more restrictive CFL limit does not seem worthwhile. In practice, values of S in the range λ/2 < S < λ give reasonable CFL limits.
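The rates just discussed are easy to check numerically. The following self-contained sketch (again ours, reusing the scheme above with smooth, compatible boundary data) estimates the observed Σ-norm convergence rate; for the D2−1 case it should approach second order, consistent with the first remark.

    import numpy as np

    # Convergence check (ours) for the SAT scheme, D_{2-1} case: the
    # Sigma-norm error should converge with rate ~2, cf. the first remark.

    lam, S, T = 1.0, 1.0, 0.5

    def g(t):                               # compatible boundary data
        return np.where(t > 0.0, np.sin(4.0 * np.pi * t) ** 4, 0.0)

    def error_norm(N):
        dx = 1.0 / N
        x = np.linspace(-1.0, 0.0, N + 1)

        def rhs(t, u):
            du = np.empty_like(u)
            du[1:-1] = (u[2:] - u[:-2]) / (2.0 * dx)
            du[0] = (u[1] - u[0]) / dx
            du[-1] = (u[-1] - u[-2]) / dx
            du *= lam
            du[-1] += S / (0.5 * dx) * (g(t) - u[-1])
            return du

        u, t, dt = np.zeros_like(x), 0.0, 0.25 * dx / lam
        while t < T - 1e-12:                # classical RK4 time stepping
            k1 = rhs(t, u)
            k2 = rhs(t + 0.5 * dt, u + 0.5 * dt * k1)
            k3 = rhs(t + 0.5 * dt, u + 0.5 * dt * k2)
            k4 = rhs(t + dt, u + dt * k3)
            u, t = u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4), t + dt

        sigma = np.ones_like(x)             # diagonal SBP weights, with
        sigma[0] = sigma[-1] = 0.5          # weight 1/2 at the two edges
        err = u - g(t + x / lam)            # exact solution: u = g(t + x/lam)
        return np.sqrt(dx * np.sum(sigma * err ** 2))

    e1, e2 = error_norm(100), error_norm(200)
    print("observed rate:", np.log2(e1 / e2))   # should be close to 2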

Spectral methods. The penalty method for imposing boundary conditions was actually introduced for spectral methods prior to FDs in [198, 199]. In fact, as we will see below, the FD and spectral cases follow very similar paths. Here we only discuss its application to the collocation method. Furthermore, as discussed in Section 9.4, when solving an IBVP, Gauss-Lobatto collocation points are natural among Gauss-type nodes, because they include the end points of the interval. We restrict our review to them, but the penalty method applies equally well to the other nodes. We refer to [236] for a thorough analysis of spectral penalty methods.

As in the FD case, we summarize the method through the example of the advection problem (10.14, 10.15, 10.16), except that now we consider the bounded domain x ∈ [−1, 1] and apply the boundary condition at x = 1. Furthermore, we first consider a truncated expansion in Legendre polynomials. A penalty term with strength τ is added to the evolution equation at the last collocation point:

$${d \over {dt}}{u_N}(t,{x_i}) = \lambda u_N^{\prime}(t,{x_i}) + \tau \lambda \left({g(t) - {u_N}(t,{x_N})} \right){\delta _{iN}},\quad \quad i = 0,1, \ldots ,N,$$
(10.33)

where \(\{{x_i}\} _{i = 0}^N\) are now the Gauss-Lobatto nodes; in particular, x0 = −1 and xN = 1.

Using the discrete energy given by Eq. (9.95) and the SBP property associated with Gauss quadratures discussed in Section 9.4, a discrete energy estimate follows exactly as in the FD case if

$$\tau \geq {1 \over {2A_N^N}} = {{N(N + 1)} \over 4},$$
(10.34)

where \(A_N^N\) is the Legendre-Gauss-Lobatto quadrature weight at the right end point xN = 1 [cf. Eqs. (9.63, 9.65)]. Notice that this scaling of the penalty with the weight is the analog of the (σ00Δx) SBP weight term in the FD case [Eq. (10.18)]. The similarity is not accidental: both weights arise from the discrete integration formulae for the energy (a short numerical check of the weight relation in Eq. (10.34) is given after the remark below). Notice:

  • The approach for linear symmetric hyperbolic systems is the same as that we discussed for FDs: the evolution equation for each characteristic variable is penalized as in Eq. (10.33), where λ is replaced by the corresponding characteristic speed.
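As a small numerical check of the weight relation entering Eq. (10.34) (our own illustration, using only numpy's Legendre utilities):

    import numpy as np
    from numpy.polynomial import legendre

    # Check (ours) of the Legendre-Gauss-Lobatto endpoint weight entering
    # Eq. (10.34): A_N^N = 2/(N(N+1)), so 1/(2 A_N^N) = N(N+1)/4.

    N = 8
    cN = np.zeros(N + 1)
    cN[N] = 1.0                                        # coefficients of P_N
    interior = legendre.legroots(legendre.legder(cN))  # roots of P_N'
    x = np.concatenate(([-1.0], np.sort(interior), [1.0]))   # LGL nodes
    w = 2.0 / (N * (N + 1) * legendre.legval(x, cN) ** 2)    # LGL weights

    assert np.isclose(1.0 / (2.0 * w[-1]), N * (N + 1) / 4.0)
    assert np.isclose(np.dot(w, x ** 2), 2.0 / 3.0)    # quadrature exactness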

Devising a Chebyshev penalty scheme that guarantees stability is more convoluted. In particular, the advection operator is not semi-bounded in the Chebyshev norm (see, for example, [209]). Stability in the L2 norm can be established using the Chebyshev-Legendre method [144], where the Chebyshev-Gauss-Lobatto nodes are used for the approximation, but the Legendre ones for satisfying the equation. In this approach, the penalty method is global, because it adds terms to the right-hand side of the equations not only at the endpoint, but at all other collocation points as well.

A simpler approach, where the penalty term is only applied to the boundary collocation point, as in the Legendre and FD cases, is to show stability in a different norm. For example, in [137] it is shown that a penalty term as in Eq. (10.33) is stable for the Chebyshev-Gauss-Lobatto case in the norm defined by the weight [cf. Eq. (9.17)]

$$\tilde \omega (x) = (1 + x){\omega _{{\rm{Chebyshev}}}}(x) = \sqrt {{{1 + x} \over {1 - x}}} \,,\quad \quad - 1 < x < 1,$$
(10.35)

if

$$\tau \geq {{{N^2}} \over 2}.$$
(10.36)

In Figure 5 we show the spectrum of an approximation of the advection equation (with speed λ = 1 and g(t) = 0) using the Chebyshev penalty method, N = 20, 40, 60 collocation points, and S := τ/N² = 0.3. There are no eigenvalues with positive real part. Even though this is only a necessary condition for stability, it suggests that it might be possible to lower the bound (10.36) on the minimum penalty needed for stability. On the other hand, for S = 0.2 (and lower values) real positive values do appear in the spectrum. Compared to the injection case (Figure 4), the spectral radii for the cases shown in Figure 5 are about a factor of two larger. However, in most applications using the method of lines, the timestep is dictated by the condition of keeping the time integration error smaller than the spatial one, so as to maintain spectral convergence, and not by the CFL limit.
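The computation behind Figure 5 can be reproduced in a few lines. The sketch below (ours) assembles the standard Chebyshev-Gauss-Lobatto differentiation matrix and adds the penalty τ = SN² at the boundary point x = 1 with homogeneous data g = 0:

    import numpy as np

    # Spectrum (our reconstruction) of the Chebyshev penalty approximation
    # of u_t = u_x on [-1, 1] with boundary condition at x = 1 and g = 0,
    # cf. Figure 5, using the standard Chebyshev-Gauss-Lobatto
    # differentiation matrix.

    def cheb(N):
        # Nodes x_j = cos(pi*j/N); note that x[0] = 1 is the boundary point.
        x = np.cos(np.pi * np.arange(N + 1) / N)
        c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
        X = np.tile(x, (N + 1, 1)).T
        D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
        D -= np.diag(D.sum(axis=1))          # "negative sum" trick
        return D, x

    lam, S = 1.0, 0.3                        # tau = S*N^2, cf. Eq. (10.36)
    for N in (20, 40, 60):
        D, x = cheb(N)
        M = lam * D
        M[0, 0] -= S * N ** 2 * lam          # penalty at x = 1 (index 0)
        # No positive real parts are expected for S = 0.3, cf. Figure 5.
        print(N, np.linalg.eigvals(M).real.max())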

Figure 5

Spectrum of the Chebyshev penalty method for the advection equation and (from left to right) N = 20, 40, 60 collocation points, with S = 0.3.

10.2 Interface boundary conditions

Interface boundary conditions are needed when there are multiple grids in the computational domain, as discussed in Section 11 below. This applies to complex geometries, to multiple patches used for computational efficiency (for instance, to adapt the domain decomposition to the wave zone), to mesh refinement, or to a combination of these.

A simple approach for exchanging the information between two or more grids is a combination of interpolation and extrapolation of the whole state vector at the intersection of the grids. This is the method of choice in mesh refinement, for example, and works well in practice. In the case of curvilinear grids and very-high-order FD or spectral methods, though, it is in general difficult not only to prove numerical stability, but even to find a scheme that exhibits stability from a practical point of view.

10.2.1 Penalty conditions

The penalty method discussed above for outer boundary conditions can also be used for multi-domain interface ones, including those present in complex geometries [115, 235, 118, 311, 312, 140]. It is simple to implement, robust, leads to stability for a very large class of problems, and preserves the accuracy of arbitrarily high-order FD and spectral methods.

Finite differences. As an example, consider again an advection equation but now on the whole real line

$$\begin{aligned} {u_t} &= \lambda {u_x},\quad - \infty < x < \infty ,\quad t \geq 0, \\ u(0,x) &= f(x),\quad - \infty < x < \infty , \end{aligned}$$

where we assume that the initial data f is \(C^\infty\)-smooth and has compact support. The domain of the real line is chosen just for simplicity, to focus on the interface procedure at x = 0. In the realistic case of compact domains, outer boundaries are also present, and these can be treated by any of the methods discussed in Section 10.1.

Consider two grids, left and right, covering the intervals (−∞, 0], and [0,+∞), respectively:

$$x_i^l = i\Delta {x^l},\quad i = 0, - 1, - 2, \ldots ,\qquad x_i^r = i\Delta {x^r},\quad i = 0,1,2, \ldots ,$$

where the gridspacings need not agree, \(\Delta x^l \neq \Delta x^r\) in general, and the difference operators \(D^l\) and \(D^r\), which need not be equal to each other either, nor even of the same order of accuracy, satisfy SBP with respect to scalar products given by the weights \(\sigma^l, \sigma^r\) on their individual grids:

$${\langle {v^l},{v^l}\rangle _{{\Sigma _l}}} = \Delta {x^l}\sum\limits_{i,j = - \infty}^0 {\sigma _{ij}^l} v_i^lv_j^l\,,\quad \quad {\langle {v^r},{v^r}\rangle _{{\Sigma _r}}} = \Delta {x^r}\sum\limits_{i,j = 0}^{+ \infty} {\sigma _{ij}^r} v_i^rv_j^r.$$
(10.37)

For the time being, assume that both SBP scalar products are diagonal. Then, the SAT semi-discrete approximation to the problem is

$${d \over {dt}}v_i^l = \lambda {D^l}v_i^l + {{{\delta _{i,0}}{S^l}} \over {\Delta {x^l}\sigma _{00}^l}}(v_0^r - v_0^l),\quad \quad i = 0, - 1, - 2, \ldots ,$$
(10.38)
$${d \over {dt}}v_i^r = \lambda {D^r}v_i^r + {{{\delta _{i,0}}{S^r}} \over {\Delta {x^r}\sigma _{00}^r}}(v_0^l - v_0^r),\quad \quad i = 0,1,2, \ldots .$$
(10.39)

Notice that in this approach the numerical solution at any fixed resolution is bi-valued at the interface boundary x = 0 and that, in the same spirit as the penalty approach for outer boundary conditions, continuity of the fields at the interface is not enforced strongly but weakly.

Defining the energy

$${E_{\rm{d}}}: = {\langle {v^l},{v^l}\rangle _{{\Sigma _l}}} + {\langle {v^r},{v^r}\rangle _{{\Sigma _r}}},$$
(10.40)

and using the approximation (10.38, 10.39) and the SBP property of the difference operators, its time derivative is

$${d \over {dt}}{E_{\rm{d}}} = (\lambda - 2{S^l})\,{(v_0^l)^2} + (- \lambda - 2{S^r})\,{(v_0^r)^2} + 2({S^l} + {S^r})v_0^lv_0^r.$$
(10.41)

Then an estimate follows if two conditions are satisfied. One of them is \(\lambda + S^r - S^l = 0\). The other one imposes an additional constraint on the values of \(S^l\) and \(S^r\):

  • Positive λ:

    $${S^l} = \lambda + \delta ,\quad {S^r} = \delta ,\quad {\rm{with}}\;\delta \geq - {\lambda \over 2}.$$
    (10.42)

    The estimate is

    $${d \over {dt}}{E_{\rm{d}}} = - {(v_0^l - v_0^r)^2}(\lambda + 2\delta) \leq 0.$$
    (10.43)
  • Negative λ: this is obtained from the previous case after the transformation λ ↦ −λ

    $${S^r} = - \lambda + \delta ,\quad {S^l} = \delta ,\quad {\rm{with}}\;\delta \geq {\lambda \over 2}$$
    (10.44)

    and

    $${d \over {dt}}{E_{\rm{d}}} = {(v_0^l - v_0^r)^2}(\lambda - 2\delta) \leq 0.$$
    (10.45)
  • Vanishing λ: this can be seen as the limiting case of any of the above two, with

    $${d \over {dt}}{E_{\rm{d}}} = - 2\delta \,{(v_0^l - v_0^r)^2} \leq 0.$$
    (10.46)

The following remarks are in order:

  • For the minimum values of δ allowed by the above inequalities, the energy estimate is the same as for the single grid case with outer boundary conditions, see Section 10.1.3, and the discretization is time-stable (see Section 7.4), while for larger values of δ there is damping in the energy, which is proportional to the mismatch at the interface.

  • Except for the case of the most natural choice δ = 0, the evolution equations for outgoing modes also need to be penalized in a consistent way in order to derive an energy estimate. However, as is always the case, the lack of an energy-type estimate does not mean that the scheme is unstable, since the energy method provides sufficient but not always necessary conditions for stability.

  • The general case of symmetric hyperbolic systems follows along the same lines: a decomposition into characteristic variables is performed and the evolution equation for each of them is penalized as in the advection equation example; see the sketch below. At least for diagonal norms, stability then follows for general linear symmetric hyperbolic systems in several dimensions. With the standard caveats for non-diagonal norms, the procedure is similar, except that penalty terms are added to the evolution equations not only at the interface on each grid, but also near it. In practice, though, applying penalties just at the interfaces appears to work well in many situations.
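As an illustration, the following sketch (our own toy setup, not from any particular code) implements the two-grid scheme (10.38, 10.39) for the positive-speed case with the penalties of Eq. (10.42); the outer boundary at x = +1, where the mode is incoming, is closed with the outer-boundary SAT of Section 10.1.3.

    import numpy as np

    # Two-grid interface SAT sketch (ours) for u_t = lam*u_x, lam > 0, on
    # grids truncated to [-1, 0] and [0, 1], with the D_{2-1} operator on
    # each side (diagonal SBP norm, boundary weight sigma_00 = 1/2). The
    # penalties follow Eq. (10.42): S^l = lam + delta, S^r = delta; the
    # "natural" choice delta = 0 penalizes only the mode that is incoming
    # with respect to the left grid.

    lam, delta = 1.0, 0.0
    Sl, Sr = lam + delta, delta
    Nl, Nr = 150, 100                    # the resolutions need not agree
    dxl, dxr = 1.0 / Nl, 1.0 / Nr
    g = lambda t: 0.0                    # outer data at x = +1 (inflow)

    def D(u, dx):
        # D_{2-1}: centered interior, first-order one-sided at the edges.
        du = np.empty_like(u)
        du[1:-1] = (u[2:] - u[:-2]) / (2.0 * dx)
        du[0] = (u[1] - u[0]) / dx
        du[-1] = (u[-1] - u[-2]) / dx
        return du

    def rhs(t, vl, vr):
        # vl[-1] and vr[0] both live at x = 0: the solution is bi-valued at
        # the interface and continuity is only enforced weakly.
        dvl = lam * D(vl, dxl)
        dvr = lam * D(vr, dxr)
        dvl[-1] += Sl / (0.5 * dxl) * (vr[0] - vl[-1])
        dvr[0] += Sr / (0.5 * dxr) * (vl[-1] - vr[0])
        # For lam > 0 the left edge x = -1 is outflow and needs no condition,
        # while x = +1 is inflow: close it with the outer SAT, taking S = lam.
        dvr[-1] += lam / (0.5 * dxr) * (g(t) - vr[-1])
        return dvl, dvr

Coupling the two right-hand sides through a single method-of-lines integrator then yields, together with the outer-boundary term, an estimate for the semi-discrete energy (10.40) analogous to Eq. (10.43).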

Spectral methods. The standard procedure for interface spectral methods is to penalize each incoming characteristic variable, exactly as in the outer boundary condition case; namely, as in Eq. (10.33), with lower bounds for the penalty strengths given by Eqs. (10.34) and (10.36) for Legendre and Chebyshev polynomials, respectively. We know from the FD analysis above, though, that in general this does not imply an energy estimate; in order to obtain one, outgoing modes also need to be penalized, with strengths that are coupled to the penalty for incoming modes. However, the procedure of penalizing just the incoming modes at interfaces appears to work well in practice, so we analyze it in some detail.

Figure 6 shows the spectrum of the Chebyshev penalty method as described, for an advection equation,

$${u_t} = {u_x}\,,\quad \quad - 1 \leq x \leq 1,\quad t \geq 0,$$
(10.47)

where there is an interface boundary at x = 0. In more detail: the figure shows the maximum real component in the spectrum for N = 20 collocation points as a function of the penalty strength S. There are no real positive values, and this remains true for different values of N, in agreement with the fact that penalizing just the incoming mode works well in practice. The figure also supports the possibility that S ≳ 0.3 might actually be enough for stability, as in the outer boundary case.

Figure 6

Maximum real component (left) and spectral radii (right) versus penalty strength S, for the Chebyshev penalty method for the advection equation with two domains (N = 20 collocation points).

The figure also shows the spectral radius as a function of the penalty strength. Beyond S = 1 it grows very quickly and, even though, as mentioned, the timestep is usually determined by keeping the time integration error below that due to the spatial discretization, this might no longer be the case if the spectral radius becomes too large. Thus, it is probably good to keep S ≲ 1.

10.3 Going further, applications in numerical relativity

In numerical relativity, the projection method for outer boundary conditions has been used in [102, 103, 421, 278, 225], the FD penalty method for multi-domain boundary conditions in [281, 141, 385, 145, 324, 325, 425, 256, 454, 426], and its spectral counterpart in — among many others — [130, 289, 90, 168, 306, 450, 402, 131, 149, 290, 97, 91, 31, 381, 150, 384].

Reference [296] presents a comparison, for the FD case, of numerical boundary conditions imposed through injection, orthogonal projections, and penalty terms. One additional advantage of the penalty approach is that, for advection-diffusion problems, it damps away in time violations of the compatibility conditions between the initial and boundary data; see Section 5. Figure 7 shows the error as a function of time for an advection-diffusion equation in which the initial data is perturbed so as to be inconsistent with the boundary condition; the error is defined as the difference between the unperturbed and perturbed solutions. See [296] for more details. One expects this not to be the case for hyperbolic problems, though. For example, at the continuum, an incompatibility between the initial data and the boundary condition for the advection equation would propagate, not dissipate in time, and a consistent and convergent numerical scheme should reflect this behavior.

Figure 7

Comparison of different numerical boundary approaches for an advection-diffusion equation where the initial data is perturbed, introducing an inconsistency with the boundary condition [296]. The SAT approach, besides guaranteeing time stability for general systems, “washes out” this inconsistency in time. “MODIFIED P” corresponds to a modified projection [226] for which this is solved at the expense of losing an energy estimate. Courtesy: Ken Mattsson. Reprinted with permission from [296]; copyright by Springer.

Many systems in numerical relativity, starting with Einstein’s equations themselves, are numerically solved by reducing them to first-order systems, because there is a large pool of advanced and mature analytical and numerical techniques for them. However, this is at the expense of enlarging the system and, in particular, introducing extra constraints (though this seems to be less of a concern; see, for example, [286, 82]). It seems more natural to solve such equations directly in second-order (at least in space) form. It turns out, though, that it is considerably more complicated to ensure stability for such systems. Trying to “integrate back” an algorithm for a first-order reduction to its original second-order form is, in general, not possible; see, for example, [101, 413] for a discussion of this apparent paradox and the difficulties associated with constructing boundary closures for second-order systems such that an energy estimate follows.

The SAT approach was generalized to a class of wave equations in second-order form in space in [300, 302, 299, 121]. Part of the difficulty in obtaining an energy estimate for second-order variable coefficient systems in these approaches is related to the property that the FD operators now depend on the equations being solved. This also complicates their generalization to arbitrary systems. For example, [299] deals with systems of the form

$$a{u_{tt}} = {(b{u_x})_x},\quad \quad 0 \leq x \leq 1,$$
(10.48)

where a, b : [0, 1] → ℝ are strictly positive, smooth functions, and difference and dissipative operators of the desired order, satisfying SBP and approximating \((b{u_x})_x\), are constructed. See also [302] for generalizations, which include shift-type terms.

In [301] the projection method and high-order SBP operators approximating second derivatives were used to provide interface boundary conditions for a wave equation (directly in second-order-in-space form) in discontinuous media, while guaranteeing an energy estimate. The domain in this work is a rectangular multi-block domain with piecewise constant coefficients, where the interfaces coincide with the location of the discontinuities. This approach was generalized, using the SAT for the jump discontinuities instead of projection, to variable coefficients and complex geometries in [298].

The difficulty is not particular to FDs: in [413] a penalty multi-domain method was derived for second-order systems. For the Legendre and constant-coefficient case the method guarantees an energy estimate, but difficulties are reported in guaranteeing one in the variable-coefficient case. Nevertheless, it appears to work well in practice in the variable-coefficient case as well. An interesting aspect of the approach of [413] is that an energy estimate is obtained by applying the penalty in the whole domain (as an analogy, we recall the above discussion about the Chebyshev-Legendre penalty method in Section 10.1.3).

A recent generalization, valid both for FDs and — at least Legendre — collocation methods (as discussed, the underlying tool is the same: SBP), to more general penalty couplings, where the penalty terms are not scalar but matrices (i.e., there is coupling between the penalty for different characteristic variables) can be found in [119].

Energy estimates are in general lost when the different grids are not conforming (different types of domain decompositions are discussed in the next Section 11), and interpolation is needed. This is the case when using overlapping patches with curvilinear coordinates but also mesh refinement with Cartesian, nested boxes (see, for example, [277] and references therein). A recent promising development has been the introduction of a procedure for systematically constructing interpolation operators preserving the SBP property for arbitrary high-order cases; see [297]. Numerical tests are presented with a 2:1 refinement ratio, where the design convergence rate is observed. It is not clear whether reflections at refinement boundaries such as those reported in [39] would still need to be addressed or if they would be taken care of by the high-order accuracy.

10.3.1 Absorbing boundary conditions

Finally, we mention some results in numerical relativity concerning absorbing artificial boundaries. In [314], boundary conditions based on the work of [46], which are perfectly absorbing for quadrupolar solutions of the flat wave equation, were numerically implemented via spectral methods and proposed for use in a constrained evolution scheme of Einstein's field equations [65]. For a different method, which provides exact, nonlocal outer boundary conditions for linearized gravitational waves propagating on a Schwarzschild background, see [271, 270, 272]. A numerical implementation of the well-posed harmonic IBVP with Sommerfeld-type boundary conditions given in [267] was worked out in [33], where the accuracy of the boundary conditions was also tested.

In [366], various boundary treatments for the Einstein equations were compared to each other using the test problem of a Schwarzschild black hole perturbed by an outgoing gravitational wave. The solutions from different boundary algorithms were compared to a reference numerical solution obtained by placing the outer boundary at a distance large enough to be causally disconnected from the interior spacetime region where the comparison was performed. The numerical implementation in [366] was based on the harmonic formulation described in [286].

In Figure 8, a comparison is shown between (a) simple boundary conditions, which just freeze the incoming characteristic fields to their initial value, and (b) constraint-preserving boundary conditions controlling the complex Weyl scalar Ψ0 at the boundary. The boundary surface is an approximate metric sphere of areal radius R = 41.9 M, with M the mass of the black hole. The left side of the figure demonstrates that case (a) leads to significantly larger reflections than case (b). The difference with the reference solution after the first reflection at the boundary is not only large in case (a), but also it does not decrease with increasing resolution. Furthermore, the violations of the constraints shown in the right side of the figure do not converge away in case (a), indicating that one does not obtain a solution to Einstein’s field equations in the continuum limit. In contrast to this, the difference with the reference solution and the constraint violations both decrease with increasing resolution in case (b).

Figure 8

Comparison between boundary conditions in case (a) (solid) and case (b) (dotted); see the body of the text for more details. Four different resolutions are shown: (Nr, L) = (21, 8), (31,10), (41,12) and (51,14), where Nr and L refer to the number of collocation points in the radial and angular directions, where Chebyshev and spherical harmonics are used, respectively. Left panel: the difference \(\Delta {\mathcal U}\) between the solution with outer boundary at R = 41.9 M and the reference solution. Right panel: the constraint violation C (see [366] for precise definitions of these quantities and further details). Courtesy: Oliver Rinne. Reprinted with permission from [366]; copyright by IOP.

A similar comparison was performed for alternative boundary conditions, including spatial compactifications and sponge layers. The errors in the gravitational waves were also estimated in [366], by computing the complex Weyl scalar Ψ4 for the different boundary treatments; see Figure 10 in that reference.

Based on the construction of exact in- and outgoing solutions to the linearized Bianchi identities on a Minkowski background (cf. Example 32), the reflection coefficient γ for spurious reflections at a spherical boundary of areal radius R, for the boundary condition that sets the Weyl scalar Ψ0 to zero, was estimated to be [88]

$$\vert \gamma (kR)\vert \;\approx {3 \over 2}{(kR)^{- 4}},$$
(10.49)

for outgoing quadrupolar gravitational radiation with wave number \(k \gg R^{-1}\). Figure 9 shows the Weyl scalars Ψ0 and Ψ4 computed for the boundary conditions in case (b) and extracted at 1.9 M inside the outer boundary. After computing their Fourier transforms in time, the overall dependence of their ratio agrees very well with the predicted reflection coefficient.
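As a rough orientation (our own evaluation of Eq. (10.49)): a quadrupolar wave with kR = 10 suffers a spurious reflection of only \(\vert \gamma \vert \approx 1.5 \times 10^{-4}\), and doubling kR suppresses the reflection by a further factor of 16.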

Figure 9

Comparison of the time Fourier transform of Ψ0 and Ψ4 for two different radii of the outer boundary. The leveling off of Ψ0 for kM ≳ 3 is due to numerical roundoff effects (note the magnitude of Ψ0 at those frequencies). Courtesy: Oliver Rinne. Reprinted with permission from [366]; copyright by IOP.

For higher-order absorbing boundary conditions, which involve derivatives of the Weyl scalar Ψ0, see [369]; for their numerical implementation, see [365].

11 Domain Decomposition

Most three-dimensional codes solving the Einstein equations currently use several non-uniform grids/numerical domains. Adaptive mesh refinement (AMR) à la Berger & Oliger [48], where the computational domain is covered with a set of nested grids, usually taken to be Cartesian ones, is used by many efforts; see, for instance, [386, 338, 394, 277, 160, 24, 393, 38, 84, 109, 430, 442, 439, 157, 321]. Other approaches use multiple patches with curvilinear coordinates, or a combination of both. Typical simulations of Einstein's equations do not fall into the category of complex geometries and usually require a fairly "simple" domain decomposition (in comparison to fully unstructured approaches in other fields).

Below we give a brief overview of some domain decomposition approaches. Our discussion is far from exhaustive, and only a few representative samples from the rich variety of efforts are mentioned. In the context of Cauchy evolutions, the use of multiple patches in numerical relativity was first advocated and pursued by Thornburg [417, 418].

11.1 The power and need of adaptivity

Physical problems governed by nonlinear equations can develop small-scale structures, which are rarely easy to predict. The canonical example in hydrodynamics is turbulence, which results in short-wavelength features down to the viscous scale. An example in general relativity of arbitrarily small scales is that of critical phenomena [125, 223], where the solution develops a self-similar behavior revealing a universal approach to a singular solution describing a naked singularity. Uncovering these phenomena crucially requires dynamically adjusting the grid structure to respond to the (exponentially) ever-shrinking features of the solution. Recently, this need was also demonstrated quite clearly by the resolution of the final fate of unstable black strings [279, 280]. This work followed the dynamics, in five-dimensional spacetimes, of an unstable black string: a black hole with topology \(S^2 \times S^1\), with the ratio of the asymptotic length of \(S^1\) to the black-hole mass per unit length above the critical value for linearized stability [216, 217]. As the evolution unfolds, pieces of the string shrink while others elongate (so that the area increases), yielding another unstable stage; see Figure 10. This behavior repeats in a self-similar manner, the black string developing a fractal structure of thin black strings joining spherical black holes. The behavior was followed through four generations, with the numerical grid refined in some regions by up to a factor of \(2^{17}\) compared to the initial one. This allowed the authors to extrapolate the observed behavior and conclude that the spacetime will develop naked singularities in finite time from generic initial conditions, thereby providing a counterexample to the cosmic censorship conjecture in five dimensions.

Figure 10

Early (left) and late (right) stages of the apparent horizon describing the evolution of an unstable black string. Courtesy: Luis Lehner and Frans Pretorius.

11.2 Adaptive mesh refinement for BBH in higher dimensional gravity

In [439] the authors numerically evolved black-hole binaries “in a box” by imposing reflecting outer boundary conditions, which mimic the anti-de Sitter spacetime. These conditions are imposed on all the fields and the outer boundary is taken to have spherical shape, which is approximated by a “Lego” sphere. Figure 11 on the left illustrates a Lego-sphere around a black-hole binary, mimicking an asymptotically anti-de Sitter spacetime. A computational domain is schematically displayed using four refinement levels with one or two components each. The individual components are labeled \(G_m^i\), where the indices i and m denote the refinement level and component number, respectively. At the spherical boundary, marked by X, reflecting boundary conditions were imposed. The right figure shows mesh refinement around two black holes, with the apparent horizons represented by the white grid. Further extensions of this research program to numerically study black holes in higher-dimensional gravity can be found in [441, 437].

Figure 11

Left: Lego-sphere around a black-hole binary. Right: Mesh refinement around two black holes. Courtesy: Vitor Cardoso, Ulrich Sperhake and Helvi Witek. Reprinted with permission from [438]; copyright by APS.

11.3 Adaptive mesh refinement and curvilinear grids

In [331, 333] the authors introduced an approach that combines the advantages of adaptive mesh refinement near the "sources" (say, black holes) with curvilinear coordinates adapted to the wave zone; see Figure 12. The patches are communicated using polynomial interpolation in Lagrange form, as explained in Section 8.1, and centered stencils are used both for finite differencing and interpolation. Up to eighth-order finite differencing is used, with an observed convergence rate between six and eight in the (ℓ = m = 2) modes of the computed gravitational waves (parts of the scheme have a lower-order convergence rate, but they do not appear to dominate). Presently, the BSSN formulation of Einstein's equations as described in Section 4.3 is used, directly in its second-order-in-space form, with outgoing boundary conditions for all the fields. The implementation is generic and flexible enough to allow for other systems of equations, though. As in most approaches using curvilinear coordinates in numerical relativity, the field variables are expressed in a global coordinate frame. This might sound unnatural and against the idea of using local patches and coordinates; however, it dramatically simplifies any implementation. It is also particularly important given that most formulations of Einstein's equations and coordinate conditions used in practice are not covariant.

Figure 12

Combining adaptive mesh refinement with curvilinear grids adapted to the wave zone. Courtesy: Denis Pollney. Reprinted with permission: top from [332], bottom from [334]; copyright by APS.

This hybrid approach has been used in several applications, including the validation of procedures for extrapolating gravitational waves computed at finite radii to large distances from the "sources" [331]. Since the outermost grid structure is well adapted to the wave zone, the outer boundary can be placed at large distances at a cost that grows only linearly with its location. Other applications include Cauchy-characteristic extraction (CCE) of gravitational waves [350, 55], the development of hybrid waveforms [371], and studies of the memory effect in gravitational waves [330]. The accuracy necessary to study small memory effects is enabled both by the grid structure (which allows the outer boundary to be located far away) and by CCE.

11.4 Spectral multi-domain binary black-hole evolutions

In current binary black-hole evolutions using spectral collocation methods there are typically three sets of spherical shells: one around each black hole and one in the wave zone. These shells are connected by subdomains of various shapes and sizes. Figure 13 shows the global structure of one such grid structure, emphasizing the spherical patches in the wave zone and how the different domains are connected. Inter-domain boundary conditions are set by the spectral penalty method described in Section 10. The adaptivity provided by domain decomposition, in addition to the spectral convergence rate, has led to the highest-accuracy binary black-hole simulations to date. Currently these evolutions use a first-order symmetric-hyperbolic reduction of the harmonic system with constraint damping, as derived in [286] and summarized in Section 4.1, with constraint-preserving boundary conditions, as designed in [286, 366, 363]. The field variables are expressed in an "inertial" Cartesian coordinate system, which is related to one fixed to the computational domain through a dynamically-defined coordinate transformation tracking the black holes (the "dual frame" method) [384].

Figure 13

Sample domain decomposition used in spectral evolutions of black-hole binaries. The bottom plot illustrates how the coordinate shape of the excision domain is kept proportional to the coordinate shape of the black holes themselves. Courtesy: Bela Szilágyi.

Simulating non-vacuum systems such as relativistic hydrodynamical ones using spectral methods can be problematic, particularly when surfaces, shocks, or other non-smooth behavior appears in the fluid. Without further processing, the fast convergence is lost, and Gibbs’ oscillations can destabilize the simulation. A method that has been successfully used to overcome this in general-relativistic hydrodynamics is evolving the spacetime metric and the fluid on two different grids, each using different numerical techniques. The spacetime is evolved spectrally, while the fluid is evolved using standard finite difference/finite volume shock-capturing techniques on a separate uniform grid. The first code adopting this approach was described in [142], which is a stellar-collapse code assuming a conformally-flat three-metric, with the resulting elliptic equations being solved spectrally. The two-grid approach was adopted for full numerical-relativity simulations of black-hole-neutron-star binaries in [150, 149, 168]. The main advantage of this method when applied to binary systems is that at any given time the fluid evolution grid only needs to extend as far as the neutron-star matter. During the pre-merger phase, then, this grid can be chosen to be a small box around the neutron star, achieving very high resolution for the fluid evolution at low computational cost. More recently, in [168] an automated re-gridder was added, so that the fluid grid automatically adjusts itself at discrete times to accommodate expansion or contraction of the nuclear matter. The main disadvantage of the two-grid method is the large amount of interpolation required for the two grids to communicate with each other. Straightforward spectral interpolation would be prohibitively expensive, but a combination of spectral refinement and polynomial interpolation [69] reduces the cost to about 20–30 percent of the simulation time.

11.5 Multi-domain studies of accretion disks around black holes

The freedom to choose arbitrary (smooth) coordinate transformations allows the design of sophisticated problem-fitted meshes to address a number of practical issues. In [256] the authors used a hybrid multiblock approach for a general-relativistic hydrodynamics code developed in [456] to study instabilities in accretion disks around black holes in the context of gamma-ray-burst central engines. They evolved the spacetime metric using the first-order form of the generalized harmonic formulation of the Einstein equations (see Section 4.1) on conforming grids, while using a high-resolution shock capturing scheme for relativistic fluids on the same grid but with additional overlapping boundary zones (see [456] for details on the method). The metric differentiation was performed using the optimized D8−4 FD operators satisfying the SBP property, as described in Section 8.3. The authors made extensive use of adapted curvilinear coordinates in order to achieve desired resolutions in different parts of the domain and to make the coordinate lines conform to the shape of the solution. Maximal dissipative boundary conditions as defined in Section 5.2 were applied to the incoming fields, and inter-domain boundary conditions for the metric were implemented using the finite-difference version of the penalty method described in Section 10.

Figure 14 shows examples of the type of mesh adaptation used. The top left panel shows the meridional cut of an accretion disk on a uniform multiblock mesh (model C in [256]). The top right panel gives an example of a mesh with adapted radial coordinate lines, which resolves the disk more accurately than the mesh with uniform grid resolution (see [256] for details on the particular coordinate transformations used to obtain such a grid). A 3D view of such multi-domain mesh at large radii is shown on the bottom left panel of Figure 14. In the area near the inner radius where the central black hole is located, the resolution is high enough to accurately resolve the shape and the dynamics of the black-hole horizon. Finally, near the disk, the resolution across the disk in both radial and angular directions is made approximately equal and sufficiently high to resolve the transverse disk dynamics. The bottom right panel of Figure 14 shows the 3D view of the adapted mesh in the vicinity of the disk.

Figure 14

Domain decomposition used in evolutions of accretion disks around black holes [256]; see the text for more details. Courtesy: Oleg Korobkin.

11.6 Finite-difference multi-block orbiting binary black-hole simulations

In [325], orbiting binary black-hole simulations using a high-order FD multi-domain approach were presented. The basic layout of the full domain is shown in Figures 15 and 16, where the centers of the excised black-hole spheres are located along the x axis at x = ±a. The computational domain is based on two types of building blocks, shown in Figure 17, which are essentially variations of cubed spheres. In the simulations of [325] the optimized D8−4 difference operator described in Section 8.3 was used, the patches were communicated using the penalty technique (see Section 10), and the resulting equations were evolved in time using an embedded fourth- and fifth-order Runge-Kutta stepper (see Section 7.5). The formulation of the equations, the boundary conditions, and the dual-frame technique are exactly those of the spectral evolutions discussed above; only the numerical and domain-decomposition approaches differ, building on earlier work [281, 141, 385, 145, 324]. Strong and weak scaling is observed up to at least several thousand processors.

Figure 15

Equatorial cut of the computational domain used in multi-block simulations of orbiting black-hole binaries (left). Schematic figure showing the direction considered as radial (red arrows) for the cuboidal blocks (right). Reprinted with permission from [326]; copyright by APS.

Figure 16

Multi-block domain decomposition for a binary black-hole simulation. Reprinted with permission from [326]; copyright by APS.

Figure 17

Equatorial cross-section of variations of cubed-sphere patches. Reprinted with permission from [326]; copyright by APS.