Forward problem¶
The forward problem describes how the physical properties of an (astrophysical) medium give rise to a spectral line observation. We collectively represent the physical properties of an object by a model vector, \(\boldsymbol{m}\), and we represent an observation by a vector, \(\boldsymbol{o}\). Hence, the forward problem can be thought of as a function, \(f\), that maps a model vector, \(\boldsymbol{m}\), to the corresponding observation vector, \begin{equation*} \boldsymbol{o} \ = \ f \left( \boldsymbol{m} \right) . \end{equation*} We distinguish two stages, such that, \(f\left( \boldsymbol{m} \right) = f_{2}\left( f_{1}\left( \boldsymbol{m} \right) \right)\). The first part, \(f_{1}\), describes the spectral line formation, i.e. the radiative processes that determine the amount of electromagnetic radiation that emanates from the medium and travels towards our detectors. The second part, \(f_{2}\), describes the observation and includes the instrumentation effects on the observed signal.
Spectral line formation¶
Physical model¶
In pomme, we use Cartesian coordinates, \(\boldsymbol{x}=(x,y,z)\), and assume the line of sight to be along the positive \(z\)-axis, such that the plane of the sky corresponds to the \((x,y)\)-plane. Furthermore, we will assume that the observer is located at \(z=0\), and that the model box has a depth, \(L\), along the \(z\)-axis. Then, in the absence of radiation scattering, considering a single ray through the model (resulting in a single pixel in the observed image), the observed intensity at \(\boldsymbol{x}=(x,y,0)\), at frequency, \(\nu\), reads, \begin{equation*} \begin{split} I_{\text{obs}}(\nu; x, y) \ =& \ \, I_{\text{bdy}}(\nu; x, y) \, e^{-\tau_{\text{obs}}(\nu; \, x, y, L)} \\ & \ + \ \int_{0}^{L} \text{d}z \ \eta\big( \nu_{\text{com}}(\nu; \, x, y, z); \, x, y, z \big) \, e^{-\tau_{\text{obs}}(\nu; \, x, y, z)} , \end{split} \end{equation*} where we defined the intensity of the incoming radiation at the boundary, \(I_{\text{bdy}}(\nu; x, y)\), and the optical depth, \(\tau_{\text{obs}}\), in the observer frame between \((x,y,0)\) and \((x,y,z)\), as, \begin{equation*} \tau_{\text{obs}}(\nu; x, y, z) \ \equiv \ \int_{0}^{z} \text{d}z' \ \chi\big( \nu_{\text{com}}(\nu; \, x, y, z'); \, x, y, z' \big) . \end{equation*}
Since we look along the \(z\)-axis, the non-relativistic Doppler-shifted frequency in the co-moving frame is given by, \begin{equation*} \nu_{\text{com}}(\nu, \boldsymbol{x}) \ = \ \left( 1 + \frac{v_{z}(\boldsymbol{x})}{c} \right) \nu , \end{equation*} where, \(v_{z}(\boldsymbol{x})\), is the component of the velocity field along the line of sight, which we earlier already chose to be along the \(z\)-axis, and, \(c\), is the speed of light. The above equations, known as the formal solution to the radiative transfer equation, express the monochromatic intensity, \(I_{\text{obs}}(\nu; x, y)\), observed along the \(z\)-axis, at \(z=0\), in terms of the monochromatic emissivity, \(\eta(\nu; \boldsymbol{x})\), and opacity, \(\chi(\nu; \boldsymbol{x})\), throughout the medium, along the line of sight. The numerical solution of these integrals is discussed below. When evaluating the emissivity and opacity, to account for Doppler shifts caused by the macroscopic motion of the medium along the line of sight, we shift all frequencies to the co-moving frame.
Considering only a single spectral line transition between the quantized energy levels denoted by \(i\) and \(j\), the corresponding local monochromatic line emissivity, \(\eta_{ij}(\nu; \boldsymbol{x})\), and opacity, \(\chi_{ij}(\nu; \boldsymbol{x})\), can be written as, \begin{align*} \eta_{ij}(\nu; \boldsymbol{x}) \ &= \ \eta_{ij}(\boldsymbol{x}) \, n(\boldsymbol{x}) \, \phi_{ij}(\nu; \boldsymbol{x}) , \\ \chi_{ij}(\nu; \boldsymbol{x}) \ &= \ \chi_{ij}(\boldsymbol{x}) \, n(\boldsymbol{x}) \, \phi_{ij}(\nu; \boldsymbol{x}) . \end{align*} Here, \(n(\boldsymbol{x})\), denotes the number density of the chemical species producing the line, and \(\phi_{ij}(\nu; \boldsymbol{x})\) is the line profile function describing the spread of the spectral line in frequency space. In this paper, we will assume a Gaussian line profile function predominantly caused by Doppler shifts due to the thermal and turbulent motions (along the line of sight) in the medium, \begin{equation*} \phi(\nu; \boldsymbol{x}) \ = \ \frac{1}{\delta\nu_{ij}(\boldsymbol{x}) \sqrt{\pi}} \ \exp \left( - \left(\frac{\nu - \nu_{ij}}{\delta \nu_{ij}(\boldsymbol{x})}\right)^{2} \right) , \end{equation*} in which the local line width, \(\delta\nu_{ij}(\boldsymbol{x})\), is defined as, \begin{equation*} \delta\nu_{ij}(\boldsymbol{x}) \ = \ \frac{\nu_{ij}}{c} \sqrt{\frac{2 k_{\text{B}}T(\boldsymbol{x})}{m_{\text{spec}}} \ + \ v_{\text{turb}}^{2}(\boldsymbol{x})} . \end{equation*} The Gaussian line profile is centred around the line frequency, \(\nu_{ij}\). The line width is determined by a thermal component characterised by the local gas temperature, \(T(\boldsymbol{x})\), and the molecular mass, \(m_{\text{spec}}\), of the chemical species producing the line, and a turbulent component characterised by a local turbulent velocity, \(v_{\text{turb}}(\boldsymbol{x})\). The two remaining components are the line emissivity and opacity, \begin{align*} \eta_{ij}(\boldsymbol{x}) \ &= \ \frac{h \nu_{ij}}{4 \pi} \, p_{i}(\boldsymbol{x}) A_{ij}, \\ \chi_{ij}(\boldsymbol{x}) \ &= \ \frac{h \nu_{ij}}{4 \pi} \, \Big( p_{j}(\boldsymbol{x}) B_{ji} \ - \ p_{i}(\boldsymbol{x}) B_{ij} \Big) , \end{align*} which are determined by the Einstein \(A_{ij}\), \(B_{ji}\), and \(B_{ij}\) coefficients quantifying the rates of spontaneous emission, absorption, and stimulated emission respectively. Note that stimulated emission is modelled as negative absorption. The \(p_{i}(\boldsymbol{x})\) denote the relative populations of the energy levels and represent the local quantum mechanical state of the medium. These are normalised such that \(\sum_{i} p_{i}(\boldsymbol{x}) = 1\), in which the sum iterates over all energy levels of the chemical species under consideration.
Assuming that the medium is in local thermodynamic equilibrium (LTE), the local level populations are completely determined by the local gas temperature, \(T(\boldsymbol{x})\), \begin{equation*} p_{i}(\boldsymbol{x}) \ = \ \frac{g_{i}}{Z(\boldsymbol{x})} \, \exp \left(-\frac{E_{i}}{k_{\text{B}} T(\boldsymbol{x})} \right) , \end{equation*} where \(g_{i}\) and \(E_{i}\) denote the statistical weight and the energy of level, \(i\), respectively, and the normalisation factor, \(Z(\boldsymbol{x})\), is defined such that the local level populations are normalised, \(\sum_{i} p_{i}(\boldsymbol{x}) = 1\). Hence, assuming LTE, the spectral line model is determined completely by the local gas temperature, \(T(\boldsymbol{x})\), the number density, \(n(\boldsymbol{x})\), the turbulent velocity, \(v_{\text{turb}}(\boldsymbol{x})\), and the \(z\)-component of the velocity, \(v_{z}(\boldsymbol{x})\). All other parameters, such as the radiative constants, can be found in data bases. In this paper, we use the Leiden Atomic and Molecular Database (LAMDA).
It should be emphasised that, although all these parameters are encoded in the spectral lines, this does not mean that they also can be retrieved unambiguously. Note that the information of all distributions along the line of sight is encoded in the frequency-dependence of the intensity in a single pixel, and thus, without further assumptions, some information will necessarily be lost.
Numerical implementation¶
Line optical depth¶
Velocity gradients play a key role in line radiative transfer. Since spectral lines are narrowly peaked in frequency space, they are very sensitive to Doppler shifts, and thus motion (gradients), along the line of sight. Therefore, when numerically solving a line transfer problem, it is key to properly trace the velocity (gradient) along the line of sight. Since we assume to know the line profile function analytically, we can take care of this sharp frequency dependence by resolving its dependence analytically.
Consider a line-of-sight segment between two consequtive elements, indexed as 0 and 1, parametrized by \(\lambda \in [0, 1]\). The line optical depth in this segment can then be written as, \begin{equation*} \chi(\lambda) \ = \ a(\lambda) \, \exp \left(-b(\lambda)^{2}\right) . \end{equation*} where we defined, \begin{align*} a(\lambda) \ &= \ \frac{\chi_{ij}(\lambda) \, n(\lambda)}{\sqrt{\pi} \, \delta\nu_{ij}(\lambda)}, \\ b(\lambda) \ &= \ \frac{1}{\delta \nu_{ij}(\lambda)} \left\{ \left( 1 + \frac{v_{z}(\lambda)}{c} \right) \nu - \nu_{ij} \right\} . \end{align*} The strongly peaked behaviour is mainly caused by the exponential function. We can resolve this, for instance in the computation of the optical depth, by using linear interpolation functions for \(a\) and \(b\) while explicitly integrating the exponential. This yields the optical depth increment, \begin{equation*} \Delta \tau \ = \ \Delta x \int_{0}^{1} \text{d}\lambda \ \chi (\lambda) . \end{equation*} (Note: in the implementation we included the factor \(\Delta x\) in the definition of \(a\), for efficiency.) Using the linear interpolation scheme, \begin{align*} a(\lambda) \ &= \ (1-\lambda) a_{0} \ + \ \lambda a_{1}, \\ b(\lambda) \ &= \ (1-\lambda) b_{0} \ + \ \lambda b_{1}, \end{align*} for the functions \(a\) and \(b\), this yields, \begin{equation*} \Delta \tau \ = \ \frac{\Delta x}{2\left(b_{1}-b_{0}\right)^{2}} \left\{ \left(a_{1}-a_{0}\right) \left( e^{-b_{0}^{2}} - e^{-b_{1}^{2}}\right) + \sqrt{\pi} \left(b_{0}a_{1}-b_{1}a_{0}\right) \big(\text{Erf}(b_{0}) - \text{Erf}(b_{1})\big) \right\} . \end{equation*} This expression is numerically stable as long as \(b_1\) is not too close to \(b_0\), but will suffer from cancelation errors otherwise.
Therefore, for \(\left|b_{1}-b_{0}\right| < 10^{-3}\), we use the first two terms of the Taylor expansion of \(b_1\) around \(b_0\), \begin{equation*} \Delta \tau \ \approx \ \Delta x \, e^{-b_{0}^{2}} \left(\frac{1}{2}\left(a_{0} + a_{1}\right) \ - \ \frac{1}{3} \, \left( a_{0} + 2 a_{1} \right) b_{0} \left(b_{1}-b_{0} \right) \right) \\ \end{equation*}
The implementation on this line optical depth can be found in src/pomme/lines.py. It turned out that the implementation with two masks (one for case \(\left|b_{1}-b_{0}\right| < 10^{-3}\) and one for its complement) is more expensive than doing both calculations for both cases, and only in the end mergingg the result. This, however, will mean that at seem point some \(\Delta \tau\) will be NaN due to division by zero (coming from \(b_{1}-b_{0}\)), which causes no problem for the forward model (since these values will eventually be overwritten), but which causes gradients to diverge (see this issue). Therefore we add a small number \(10^{-30}\) to the denominator.
Line radiative transfer¶
Consider a line-of-sight segment between two consequtive elements, indexed as 0 and 1, parametrized by \(\lambda \in [0, 1]\). Ignoring any incoming radiation, the accumulated intentisty in this segment can then be written as, \begin{equation*} \Delta I \ = \ \Delta\tau \int_{0}^{1} \text{d}\lambda \ S(\lambda) \, e^{-\lambda}. \end{equation*} where the source function is defined as, \begin{equation*} S \ \equiv \ \frac{\eta}{\chi} . \end{equation*} Using the linear interpolation scheme, \begin{align*} S (\lambda) \ &= \ (1-\lambda) S_{0} \ + \ \lambda S_{1}, \\ \tau(\lambda) \ &= \ (1-\lambda) \tau_{0} \ + \ \lambda \tau_{1}, \end{align*} for the source function, \(S\), and the optical depth, \(tau\), this yields, \begin{equation*} \Delta I \ = \ \frac{1}{\Delta\tau} \Big( S_{0} \, e^{-\tau_{0}} \left( e^{-\Delta\tau} - (1 - \Delta\tau) \right) \ + \ S_{1} \, e^{-\tau_{1}} \left( e^{+\Delta\tau} - (1 + \Delta\tau) \right) \Big), \end{equation*} where \(\Delta\tau \equiv \tau_{1} - \tau_{0}\). This expression is numerically stable as long as \(\Delta \tau\) is not too small, but will suffer from cancellation errors otherwise.
Therefore, for \(\Delta \tau < 10^{-2}\) we use the first three terms in the Taylor expansion, \begin{equation*} \frac{1}{\Delta\tau} \left(e^{-\Delta\tau} - (1 - \Delta\tau) \right) \ \approx \ \frac{1}{2}\Delta\tau \ - \ \frac{1}{6}\Delta\tau^{2} \ + \ \frac{1}{24}\Delta\tau^{3}, \end{equation*}
\begin{equation*} \frac{1}{\Delta\tau} \left(e^{+\Delta\tau} - (1 + \Delta\tau) \right) \ \approx \ \frac{1}{2} \Delta\tau \ + \ \frac{1}{6}\Delta\tau^{2} \ + \ \frac{1}{24}\Delta\tau^{3}, \end{equation*}
in which you can recognize the expansion of the exponential minus the first two terms. The implementation of this intensity increment can be found in src/pomme/forward.py. For the same reaseon as with the line optical depth, we compute both cases for all values, to minimise the use of masks, and we add a small number (\(10^{-30}\)) to the denominator to avoid NaNs in gradients.
Observation¶
In the previous section, we presented a theoretical model for spectral line formation. In practice, however, spectral line observations are affected by instrumentation effects, such as binning and noise, the particular form of which highly depends on the way the object is observed.
High spatial and spectral resolution observations are typically obtained using interferometry, for which the situation is even more complicated, since there are spatial scale-dependent effects which cause the smallest and largest structures to be unresolved in the resulting images. These effects must carefully be taken into account, as they (in part) determine which information about the spatial distribution of the physical properties is encoded in the observations and which is not. At this point, we should clarify what we mean with interferometric data and distinguish between two types. The first type are the visibilities, i.e. the correlations between the coherent electromagnetic waves, measured in several frequency bins for the different pairs of antennas in the array. These visibilities are often further processed into images that map the actual brightness in the plane of the sky for each frequency bin. This is what we distinguish as the second type of interferometric data. The conversion of the sky brightness maps from the visibilities can be achieved, for instance, with the clean algorithm (Hogbom 1974), as implemented, for instance, in CASA. This conversion is in itself already an inverse problem. Therefore, when reconstructing a model from interferometric observations, it makes sense to start directly from the visibilities. However, since visibilities are more difficult to interpret than sky brightness maps, we consider both options.
The conversion of the observed intensity in the plane of the sky, \(I_{\text{obs}}(\nu; x, y)\), to visibilities, \(V(\nu;u,v)\), can be expressed as, \begin{equation*} V(\nu; u, v) \ = \ \frac{1}{d^{2}} \int_{-\infty}^{+\infty} \text{d}x \int_{-\infty}^{+\infty} \text{d}y \ \hat{A}(x,y) \, I_{\text{obs}}(\nu; x, y) \exp\big( -2\pi i \left( u x + v y \right) \big) \end{equation*} in which \(\hat{A}(x,y) \equiv A(x,y) / A_{0}\), is the antenna response function, normalised by, \(A_{0}\), the response at the centre of the beam, \(d\), is the distance between the object and the observer, and \(u\) and \(v\) are the components of the projected distance vectors, often referred to as baselines, between pairs of antennas in the array. Since there are only a finite number of antennas and thus a finite number of antenna pairs, the visibilities are only known at a finite number of baseline samples, \(\{(u_{d},v_{d})\}_{d=1}^{N_{d}}\). This sampling, which depends on the particular antenna configuration, causes a loss of information about the spatial distribution of the source in an intricate way. We can see that the transformation of the intensity in the plane of the sky into visibilities is essentially a 2D Fourier transform. Although the Fourier transform is invertible, the fact that we can only sample the visibilities for a finite number of baselines, makes that the combined transformation from a sky brightness map to a finite set of visibilities is not invertible.
In principle, we could use the exact pipeline to model the instrumentation effects in our forward model, for instance, using CASA. However, later, in our reconstruction algorithm, we want the implementation of the forward function, \(f\), to be automatically differentiable using the autograd functionality in PyTorch. Therefore, we implemented this map ourselves in PyTorch using a fast Fourier transform (FFT), following the Galario package for visibility modelling (Tazzari et al. 2018).
Alternatively, one may want to reconstruct 3D physical models from previously obtained sky brightness maps. In that case, it is important to consider the beam, i.e. a kernel that models the spatial spread of the intensity in the plane of the sky, and the antenna response function. \begin{equation*} \tilde{I}_{\text{obs}}(\nu; x, y) \ = \ \frac{1}{d^{2}} \int_{-\infty}^{+\infty} \text{d}x' \int_{-\infty}^{+\infty} \text{d}y' \ \hat{A}(x,y) \, I_{\text{obs}}(\nu; x', y') \, B(x-x',y-y') , \end{equation*} in which \(B(x, y)\) is the beam kernel, which we assume to be a 2D Gaussian, centred around \((0,0)\), and, \(d\), is again the distance between the object and the observer.
It should be emphasised that in both processes of line formation and observation information is lost. Mathematically, this implies that the function, \(f\), is not invertible. In the next section, we describe how, with a probabilistic approach, we can circumvent this problem and obtain a probabilistic inverse.