To analyze LAT data, we construct the likelihood appropriate to those data and then use it to find the best fit model parameters. These parameters include the description of a source's spectrum, its position, and even whether it exists.
The likelihood L is the probability of obtaining the data given an input model. In our case, the input model is the distribution of gamma-ray sources on the sky, including their intensities and spectra. There is an implicit assumption that we understand sufficiently well the response of our detectors, in this case the LAT and the GBM, to the incident flux. In other words, we assume that we have a sufficiently accurate mapping of the input model (the gamma-ray sky) to the data (the list of counts produced by either the LAT or the GBM).
Clearly, we expect a higher probability of obtaining the data from a model that is a better description of the underlying reality than from a model that is a poor description. At the same time we need to consider the plausibility of the models being compared; the data must favor a less plausible model more strongly before we accept that model. For example, few would consider a 30 percent discrepancy between energies calculated in an undergraduate laboratory course to be evidence for a violation of the conservation of energy.
The form of the LAT likelihood function will be discussed in the next section.
In one of the most common applications, we know that a source is present, and we want to determine the best values of the spectral model parameters. Since we expect the best model to have the highest probability of resulting in the data, we vary the spectral parameters until the likelihood is maximized. Note that, in the limit of a large number of counts in each bin, χ2 is -2 times the logarithm of the likelihood (up to an additive constant); therefore, where χ2 is a valid statistic, minimizing χ2 is equivalent to maximizing the likelihood.
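The idea of varying spectral parameters until the likelihood is maximized can be illustrated with a minimal sketch. This is not the LAT likelihood or the fitting tool itself: it assumes a simple binned Poisson likelihood, a hypothetical power-law count model, made-up counts, and a brute-force grid scan in place of a real optimizer. All function names are illustrative.

```python
import math

def poisson_log_likelihood(counts, expected):
    """Sum over bins of ln P(n | m) for Poisson-distributed counts."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(counts, expected))

def power_law_counts(norm, index, energies):
    """Hypothetical model: expected counts per bin, norm * E^(-index)."""
    return [norm * e ** (-index) for e in energies]

def fit_power_law(counts, energies, norms, indices):
    """Scan a parameter grid and keep the point with the highest likelihood."""
    best = None
    for norm in norms:
        for index in indices:
            expected = power_law_counts(norm, index, energies)
            log_like = poisson_log_likelihood(counts, expected)
            if best is None or log_like > best[0]:
                best = (log_like, norm, index)
    return best

# Made-up example data: bin energies (arbitrary units) and observed counts.
energies = [1.0, 2.0, 4.0, 8.0]
counts = [100, 25, 6, 2]

norms = range(80, 121)                            # trial normalizations
indices = [1.5 + 0.01 * k for k in range(101)]    # trial spectral indices

best_log_like, best_norm, best_index = fit_power_law(
    counts, energies, norms, indices)
print(best_norm, round(best_index, 2), best_log_like)
```

A real analysis would replace the grid scan with a numerical optimizer and the toy model with the full instrument response, but the quantity being maximized plays the same role.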
A number of steps are necessary to fit a source's spectrum; these are described in detail below.
Thus likelihood spectral fitting provides the best fit parameter values and their uncertainties. But is this a good fit? When χ2 is a valid statistic, then we know that the value of χ2 is drawn from a known distribution, and we can use the probability of obtaining the observed value as a goodness-of-fit measure. When there are many degrees of freedom (i.e., the number of energy channels minus the number of fitted parameters) then we expect the χ2 per degree of freedom to be ~1 for a good fit. However, when χ2 is not a valid statistic, we usually do not know the distribution from which the maximum likelihood value is drawn, and therefore we do not have a goodness-of-fit measure.
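For the case where χ2 is a valid statistic, the goodness-of-fit check described above can be sketched as follows. The sketch assumes Pearson's form of χ2 (the variance in each bin is taken to be the model prediction) and uses made-up counts; the function names are illustrative, not part of any analysis tool.

```python
def pearson_chi2(observed, expected):
    """Pearson chi-squared: sum over bins of (n - m)^2 / m."""
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))

def reduced_chi2(observed, expected, n_fitted_params):
    """Chi-squared per degree of freedom (bins minus fitted parameters)."""
    dof = len(observed) - n_fitted_params
    return pearson_chi2(observed, expected) / dof

# Made-up example: five energy channels, one fitted parameter.
observed = [105, 96, 110, 89, 100]
expected = [100.0] * 5

chi2 = pearson_chi2(observed, expected)
chi2_per_dof = reduced_chi2(observed, expected, n_fitted_params=1)
print(chi2, chi2_per_dof)
```

With only four degrees of freedom the expectation of ~1 per degree of freedom is loose; the rule of thumb applies when there are many degrees of freedom.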
As mentioned above, the optimizers find the best fit spectral parameters, but not the location. In other words, the fitting tool does not fit the source coordinates. However, a tool is provided that performs a grid search, mapping out the maximum likelihood value over a grid of locations. As will be explained below, it is convenient to use a quantity called the 'Test Statistic' (TS), which is maximized when the likelihood is maximized.
The Test Statistic is defined as TS = -2 ln(Lmax,0/Lmax,1), where Lmax,0 is the maximum likelihood value for a model without an additional source (the 'null hypothesis') and Lmax,1 is the maximum likelihood value for a model with the additional source at a specified location. As can be seen, TS is a monotonically increasing function of Lmax,1, which is why maximizing TS on a grid is equivalent to maximizing the likelihood on a grid. In the limit of a large number of counts, Wilks' Theorem states that the TS for the null hypothesis is asymptotically distributed as χ2 with x degrees of freedom (here χ2 is the distribution, not a value of the statistic), where x is the number of parameters characterizing the additional source. This means that if no source is present, TS is drawn from this distribution, and any apparent source is the result of a fluctuation. Thus, a larger TS indicates that the null hypothesis is more likely to be incorrect (i.e., a source really is present), and the significance of this can be quantified. As a basic rule of thumb, the square root of the TS is approximately equal to the detection significance for a given source.
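The TS definition and the grid search can be illustrated with a short sketch. The log-likelihood values and sky positions below are made up for illustration, and the function name is hypothetical; in practice these values would come from fitting the model at each grid location.

```python
import math

def compute_ts(log_like_null, log_like_source):
    """TS = -2 ln(Lmax,0 / Lmax,1), written in terms of log-likelihoods."""
    return -2.0 * (log_like_null - log_like_source)

# Made-up maximum log-likelihood without the additional source.
log_like_null = -1050.0

# Made-up maximum log-likelihoods with the source placed at each
# trial position (RA, Dec in degrees) on the grid.
log_like_grid = {
    (83.5, 22.0): -1041.0,
    (83.6, 22.0): -1037.5,
    (83.7, 22.0): -1044.2,
}

ts_map = {pos: compute_ts(log_like_null, ll)
          for pos, ll in log_like_grid.items()}

# TS increases monotonically with Lmax,1, so the position that maximizes
# TS is the same one that maximizes the likelihood.
best_position = max(ts_map, key=ts_map.get)
best_ts = ts_map[best_position]

# Rule of thumb: detection significance is roughly sqrt(TS).
approx_sigma = math.sqrt(best_ts)
print(best_position, best_ts, approx_sigma)
```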