Fermi Gamma-ray Space Telescope

Likelihood Overview

To analyze LAT data, we construct the likelihood appropriate to the data and then use it to find the best-fit model parameters. These parameters describe a source's spectrum and its position, and the analysis can even address whether the source exists.

The Likelihood

The likelihood L is the probability of obtaining the data given an input model. In our case, the input model is the distribution of gamma-ray sources on the sky, including their intensities and spectra. We implicitly assume that we understand the response of our detectors (in this case the LAT and the GBM) to the incident flux sufficiently well. In other words, we assume a sufficiently accurate mapping from the input model (the gamma-ray sky) to the data (the list of counts produced by either the LAT or the GBM).

Clearly, we expect a higher probability of obtaining the data from a model that is a better description of the underlying reality than from a model that is a poor description. At the same time we need to consider the plausibility of the models being compared; the data must favor a less plausible model more strongly before we accept that model. For example, few would consider a 30 percent discrepancy between energies calculated in an undergraduate laboratory course to be evidence for a violation of the conservation of energy.

The form of the LAT likelihood function will be discussed in the next section.

Model Fitting

In one of the most common applications, we know that a source is present, and we want to determine the best values of the spectral model parameters. Since we expect the best model to have the highest probability of producing the data, we vary the spectral parameters until the likelihood is maximized. Note that in the limit of a large number of counts in each bin, χ² is -2 times the logarithm of the likelihood; therefore, where χ² is a valid statistic, minimizing χ² is equivalent to maximizing the likelihood.
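The equivalence between maximizing the likelihood and minimizing χ² can be illustrated with a toy example. The sketch below is not part of the LAT tools: the counts, the flat one-parameter model, and the function names are all hypothetical. It scans a normalization over a grid and compares the value that maximizes a Poisson log-likelihood with the value that minimizes a Pearson χ²; with about 1000 counts per bin the two estimates nearly coincide.

```python
import math

def poisson_log_likelihood(counts, predicted):
    """Sum over bins of ln P(n | m) for independent Poisson-distributed counts."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(counts, predicted))

def chi_square(counts, predicted):
    """Pearson chi-square, using the model prediction as the variance."""
    return sum((n - m) ** 2 / m for n, m in zip(counts, predicted))

# Hypothetical data: five bins with a large number of counts each.
counts = [1020, 985, 1013, 990, 1002]

def predict(norm):
    """Toy one-parameter model: the same expected counts in every bin."""
    return [norm for _ in counts]

# Crude parameter scan from 990 to 1010 in steps of 0.1.
trial_norms = [990 + 0.1 * i for i in range(201)]
best_ml = max(trial_norms, key=lambda a: poisson_log_likelihood(counts, predict(a)))
best_chi2 = min(trial_norms, key=lambda a: chi_square(counts, predict(a)))

print("maximum-likelihood normalization:", round(best_ml, 1))
print("minimum chi-square normalization:", round(best_chi2, 1))
```

With only a few counts per bin the two estimates would separate, which is the regime where the likelihood must be used directly.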

A number of steps are necessary to fit a source's spectrum; these are described in detail below.

  1. Select the data. The data from a substantial spatial region around the source(s) being analyzed must be used because of the overlapping of the point spread functions of nearby sources.
  2. Select the model. This model includes the position of the source(s) being analyzed, the positions of nearby sources, a model of the diffuse emission, the functional form of the source spectra, and values of the spectral parameters. In fitting the source(s) of interest, you will let the parameters for these sources vary, but because the region around these sources includes counts from nearby sources in which you are not interested, you might also let the parameters of these nearby sources vary.
  3. Precompute a number of quantities that are part of the likelihood computation. As the parameter values are varied in searching for the best fit, the likelihood is calculated many times. While not strictly necessary, precomputing a number of computation-intensive quantities will greatly speed up the fitting process.
  4. Finally, perform the actual fit. The parameter space can be quite large—the spectral parameters from a number of sources must be fit simultaneously—and therefore the likelihood tools provide a choice of three 'optimizers' (section 7.8) to maximize the likelihood efficiently. Fitting requires repeatedly calculating the likelihood for different trial parameter sets until a value sufficiently near the maximum is found; the optimizers guide the choice of new trial parameter sets to converge efficiently on the best set. The variation of the likelihood in the vicinity of the maximum can be related to the uncertainties on the parameters, and therefore these optimizers estimate the parameter uncertainties.
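The four steps above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the algorithm used by the likelihood tools: the counts, the fixed background, the E^-1 source shape, and the use of Newton's method as the "optimizer" are all hypothetical choices. It shows how a quantity is precomputed once, how the fit iterates over trial parameter values, and how the curvature of the log-likelihood at the maximum gives a parameter uncertainty.

```python
import math

# Step 1 (select the data): hypothetical counts in five energy bins.
energies = [1.0, 2.0, 4.0, 8.0, 16.0]
counts = [118, 70, 43, 22, 17]

# Step 2 (select the model): predicted counts = norm * E^-1 + fixed background.
# Only the source normalization "norm" is left free in this toy model.
background = [20.0, 18.0, 15.0, 12.0, 10.0]

# Step 3 (precompute): the source shape does not depend on "norm",
# so it is evaluated once instead of at every trial parameter value.
source_shape = [e ** -1.0 for e in energies]

def log_likelihood(norm):
    """Poisson log-likelihood, dropping the parameter-independent ln(n!) terms."""
    total = 0.0
    for n, s, b in zip(counts, source_shape, background):
        m = norm * s + b
        total += n * math.log(m) - m
    return total

def gradient(norm):
    """First derivative of the log-likelihood with respect to norm."""
    return sum(n * s / (norm * s + b) - s
               for n, s, b in zip(counts, source_shape, background))

def curvature(norm):
    """Second derivative of the log-likelihood (always negative here)."""
    return -sum(n * s * s / (norm * s + b) ** 2
                for n, s, b in zip(counts, source_shape, background))

# Step 4 (fit): Newton iterations starting from a simple counts-based guess.
norm = (sum(counts) - sum(background)) / sum(source_shape)
for _ in range(50):
    step = gradient(norm) / curvature(norm)
    norm -= step
    if abs(step) < 1e-10:
        break

best_norm = norm
# Uncertainty estimated from the curvature of ln L at the maximum.
sigma_norm = 1.0 / math.sqrt(-curvature(best_norm))
print("best-fit norm:", best_norm, "+/-", sigma_norm)
print("ln L at maximum:", log_likelihood(best_norm))
```

A real analysis fits many parameters at once, so the single curvature value becomes a matrix of second derivatives whose inverse estimates the parameter covariance.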

Thus likelihood spectral fitting provides the best-fit parameter values and their uncertainties. But is this a good fit? When χ² is a valid statistic, we know that the value of χ² is drawn from a known distribution, and we can use the probability of obtaining the observed value as a goodness-of-fit measure. When there are many degrees of freedom (i.e., the number of energy channels minus the number of fitted parameters), we expect the χ² per degree of freedom to be ~1 for a good fit. However, when χ² is not a valid statistic, we usually do not know the distribution from which the maximum likelihood value is drawn, and therefore we do not have a goodness-of-fit measure.
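As a small worked example of the χ² per degree of freedom, the sketch below uses hypothetical counts in five energy channels and a flat model with one fitted parameter, so there are four degrees of freedom. The numbers and names are illustrative only, and with so few degrees of freedom the "~1" expectation holds only loosely.

```python
# Hypothetical counts in five energy channels (large enough for chi-square to apply).
counts = [1040, 975, 1013, 968, 1014]

# One fitted parameter: a constant expected count, set here to the mean of the data.
n_fitted_parameters = 1
fitted_level = sum(counts) / len(counts)
predicted = [fitted_level for _ in counts]

# Pearson chi-square with the model prediction as the variance in each channel.
chi2 = sum((n - m) ** 2 / m for n, m in zip(counts, predicted))

# Degrees of freedom: number of channels minus number of fitted parameters.
dof = len(counts) - n_fitted_parameters
reduced_chi2 = chi2 / dof

print("chi-square:", chi2, "dof:", dof, "chi-square per dof:", reduced_chi2)
```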

Source Localization

The optimizers find the best-fit spectral parameters but not the location; that is, the fitting tool does not fit the source coordinates. However, a tool is provided that performs a grid search, mapping out the maximum likelihood value over a grid of locations. As will be explained below, it is convenient to use a quantity called the 'Test Statistic' (TS), which is maximized when the likelihood is maximized.
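The idea of a grid search can be sketched with simulated events. Everything in this example is an assumption made for illustration: the Gaussian point spread function, the uniform background, the square region, the event numbers, and the coarse scan over the source fraction (which stands in for the spectral fit performed at each location). It is not the interface or algorithm of the actual localization tool. At each grid location the likelihood is maximized over the source fraction, and the mapped quantity is twice the log-likelihood difference relative to a background-only model.

```python
import math
import random

random.seed(0)  # fixed seed so the toy event list is reproducible

PSF_SIGMA = 0.5           # assumed Gaussian PSF width (arbitrary angular units)
HALF_WIDTH = 3.0          # region of interest: [-3, 3] x [-3, 3]
AREA = (2 * HALF_WIDTH) ** 2
TRUE_POSITION = (1.0, -0.5)  # hypothetical true source position

# Toy data: 100 source events blurred by the PSF plus 150 uniform background events.
events = [(random.gauss(TRUE_POSITION[0], PSF_SIGMA),
           random.gauss(TRUE_POSITION[1], PSF_SIGMA)) for _ in range(100)]
events += [(random.uniform(-HALF_WIDTH, HALF_WIDTH),
            random.uniform(-HALF_WIDTH, HALF_WIDTH)) for _ in range(150)]

def psf_density(x, y, x0, y0):
    """Normalized 2D Gaussian density for an event at (x, y) from a source at (x0, y0)."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return math.exp(-0.5 * r2 / PSF_SIGMA ** 2) / (2 * math.pi * PSF_SIGMA ** 2)

def max_log_likelihood(x0, y0, fractions):
    """Maximum over trial source fractions of the unbinned log-likelihood."""
    psf = [psf_density(x, y, x0, y0) for x, y in events]  # computed once per location
    return max(sum(math.log(f * p + (1 - f) / AREA) for p in psf)
               for f in fractions)

fractions = [0.1 * k for k in range(1, 10)]        # trial source fractions 0.1 ... 0.9
log_like_null = len(events) * math.log(1 / AREA)   # background-only model

# Map the statistic over a 17 x 17 grid of candidate source locations.
grid = [-2.0 + 0.25 * i for i in range(17)]
ts_map = {(x0, y0): 2 * (max_log_likelihood(x0, y0, fractions) - log_like_null)
          for x0 in grid for y0 in grid}

best_position = max(ts_map, key=ts_map.get)
best_ts = ts_map[best_position]
print("best grid location:", best_position, "statistic:", best_ts)
```

The grid spacing limits the precision of the location, so in practice a coarse grid is usually followed by a finer one around the peak.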

Source Detection

The Test Statistic is defined as TS = -2 ln(L_max,0 / L_max,1), where L_max,0 is the maximum likelihood value for a model without an additional source (the 'null hypothesis') and L_max,1 is the maximum likelihood value for a model with the additional source at a specified location. Because L_max,0 is fixed for a given data set, TS is a monotonically increasing function of L_max,1, which is why maximizing TS on a grid is equivalent to maximizing the likelihood on a grid. In the limit of a large number of counts, Wilks' Theorem states that the TS for the null hypothesis is asymptotically distributed as χ²_x (here χ² is the distribution, not a value of the statistic), where x is the number of parameters characterizing the additional source. In other words, if no source is present and an apparent source results only from a fluctuation, TS is drawn from this distribution. A larger TS therefore indicates more strongly that the null hypothesis is incorrect (i.e., that a source really is present), and the distribution allows this to be quantified. As a basic rule of thumb, the square root of the TS is approximately equal to the detection significance for a given source.
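A short numerical illustration of the definition follows. The two log-likelihood values are hypothetical, and the tail probability assumes the additional source is characterized by a single parameter (x = 1), for which the χ² tail probability has the closed form erfc(sqrt(TS/2)); this is a sketch of the arithmetic, not a statement about how the tools report significance.

```python
import math

# Hypothetical maximum log-likelihood values from two fits to the same data.
log_like_null = -1250.0     # ln L_max,0: model without the additional source
log_like_source = -1237.5   # ln L_max,1: model with the additional source

# TS = -2 ln(L_max,0 / L_max,1) = 2 (ln L_max,1 - ln L_max,0)
ts = 2.0 * (log_like_source - log_like_null)

# Rule of thumb: detection significance is approximately sqrt(TS).
significance = math.sqrt(ts)

# Chance probability of a TS at least this large under the null hypothesis,
# assuming a chi-square distribution with one degree of freedom.
p_value = math.erfc(math.sqrt(ts / 2.0))

print("TS:", ts, "sqrt(TS):", significance, "p-value (1 dof):", p_value)
```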

