{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Plugin-like Architecture\n", "\n", "Parts of the GBM Data Tools were designed such that various operations could be performed by one of several different algorithms. Some of those algorithms are provided in the Data Tools, but we also want to leave open the possibility of future development and user-defined algorithms. Specifically, data binning, background fitting, spectral fitting functions, the maximum-likelihood-based spectral fitter, and even some of the datatypes were designed with an eye toward user-customization. We'll briefly describe each of these areas.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Binning\n", "As mentioned previously, the binning algorithms are divided into two types: algorithms for pre-binned data (found in `gbm.binning.binned`) and algorithms for unbinned data (in `gbm.binning.unbinned`). There are a number of algorithms of each type already provided, but a user could create their own algorithms if they find what is provided isn't adequate. In both cases, all you need to do is write up a single function that performs the binning by taking in some standard inputs, and providing a standard output.\n", "\n", "### Pre-binned Algorithms\n", "For prebinned data, let's say we want to develop a Bayesian-block binning algorithm. Our function should look like this:\n", "```python\n", "def bayesblock_rebin(old_counts, old_exposure, old_edges, *args):\n", " some_result_here = some_functionality(*args)\n", " ...\n", " return new_counts, new_exposure, new_edges\n", "```\n", "The standard required input arguments are \n", "`old_counts`: The current array of counts in each bin \n", "`old_exposure`: The current array of exposure for each bin \n", "`old_edges`: The current array of bin edges \n", "\n", "Additionally, you can define any algorithm-specific arguments by replacing `*args` with your additional arguments. For the case of Bayesian Blocks, maybe this would be the P0 parameter. \n", "\n", "Inside the function block, you implement your algorithm, and the standard required output arguments are \n", "`new_counts`: The new array of counts in the rebinned data \n", "`new_exposure`: The new array of exposure for the rebinned data \n", "`new_edges`: The new array of bin edges for the binned data \n", "Essentially, the function should read in the old histogram data and provide updated values for the new histogram. As long as you design your function like this, you can use it to bin PHAII data and energy channels.\n", "\n", "### Unbinned Algorithms\n", "We can also develop a Bayesian-block algorithm for unbinned data. In this case, we would write our function like this:\n", "```python\n", "def bayesblock_bin(times, *args, **kwargs):\n", " some_result_here = some_functionality(*args, **kwargs)\n", " ...\n", " return bin_edges\n", "```\n", "The required input argument is \n", "`times`: The array of event times \n", "\n", "Again, you can include algorithm-specific arguments in place of `*args` and `**kwargs`. The required output is \n", "`bin_edges`: The bin edges \n", "So the algorithm should read in the event times and return the bin edges. 
"\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Background Fitting\n", "Similar to the binning algorithms, these are divided into two types: algorithms for pre-binned data (in `gbm.background.binned`) and algorithms for unbinned data (in `gbm.background.unbinned`). However, background fitting requires more than a simple function: we have to define a class that contains certain required (and optional) functionality. The polynomial-fitting algorithm covers most use cases, but we also include an unbinned fitter, `NaivePoisson`, which is used in the [GBM subthreshold Targeted Search](https://arxiv.org/abs/1903.12597). Both types of algorithms interface with the `gbm.background.BackgroundFitter()` class, so as long as you follow the requirements, you can design and implement your own background fitting algorithm.\n", "\n", "### Pre-binned Background Fitting\n", "Perhaps we want to make a spline background fitter. This is how we would design such a class:\n", "```python\n", "class SplineFit:\n", "    def __init__(self, counts, tstart, tstop, exposure):\n", "        # prepare for the fit here\n", "        ...\n", "    \n", "    def fit(self, **kwargs):\n", "        # the fitting algorithm goes here\n", "        ...\n", "    \n", "    def interpolate(self, tstart, tstop, **kwargs):\n", "        # the interpolation code goes here\n", "        ...\n", "        return rates, rates_err\n", "    \n", "    # optional properties\n", "    @property\n", "    def statistic_name(self):\n", "        return self.the_stat_name\n", "    @property\n", "    def statistic(self):\n", "        return self.the_stat\n", "    @property\n", "    def dof(self):\n", "        return self.the_dof\n", "```\n", "The class must be initialized with the following: \n", "`counts`: An array of counts in each time bin and energy channel; shape (numtimes, numchans) \n", "`tstart`: The time bin start edges \n", "`tstop`: The time bin end edges \n", "`exposure`: The exposure in each time bin \n", "Typically, the initializer should do whatever is needed to *prepare* for a fit, but not actually perform the fit.\n", "\n", "The class is required to have a `fit` method that performs the fit; it is not required to return anything. Any algorithm-specific arguments can be specified in place of `**kwargs`. In short, the `fit` method is intended to let you repeatedly perform a fit with different arguments without having to reinitialize everything (e.g. for `Polynomial`, you can fit with `order=2`, then retry with `order=3`, etc.).\n", "\n", "The last requirement is that the class must have an `interpolate` method with the following arguments: \n", "`tstart`, `tstop`: The time bin start/stop edges over which the fit is being interpolated \n", "And it must return \n", "`rates`: The interpolated rates over each time bin and channel, shape (numtimes, numchans) \n", "`rates_err`: The corresponding rate uncertainties\n", "\n", "Optionally, you can provide the `statistic_name`, `statistic`, and `dof` attributes, which can be accessed by the `BackgroundFitter`.\n",
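"\n", "To make this concrete, here is a minimal runnable sketch of such a spline fitter built on `scipy.interpolate.UnivariateSpline`. The smoothing argument `s` and the residual-based error estimate are our own illustrative choices, not something prescribed by the Data Tools:\n", "```python\n", "import numpy as np\n", "from scipy.interpolate import UnivariateSpline\n", "\n", "class SplineFit:\n", "    def __init__(self, counts, tstart, tstop, exposure):\n", "        # bin centers and rates; rates have shape (numtimes, numchans)\n", "        self._tcent = (np.asarray(tstart) + np.asarray(tstop)) / 2.0\n", "        self._rates = np.asarray(counts) / np.asarray(exposure)[:, None]\n", "        self._splines = None\n", "    \n", "    def fit(self, s=None):\n", "        # fit one smoothing spline per energy channel\n", "        self._splines = [UnivariateSpline(self._tcent, self._rates[:, i], s=s)\n", "                         for i in range(self._rates.shape[1])]\n", "    \n", "    def interpolate(self, tstart, tstop):\n", "        tcent = (np.asarray(tstart) + np.asarray(tstop)) / 2.0\n", "        rates = np.column_stack([spl(tcent) for spl in self._splines])\n", "        # crude uncertainty: RMS of the fit residuals in each channel\n", "        model = np.column_stack([spl(self._tcent) for spl in self._splines])\n", "        err = np.sqrt(np.mean((model - self._rates)**2, axis=0))\n", "        rates_err = np.tile(err, (tcent.size, 1))\n", "        return rates, rates_err\n", "```\n",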
"\n", "### Unbinned Background Fitting\n", "As for unbinned background fitting, maybe we want a more complex fit than `NaivePoisson`; something that incorporates a prior on the presence of a signal:\n", "```python\n", "class BayesBackFit:\n", "    def __init__(self, times):\n", "        # prepare for the fit here\n", "        ...\n", "    \n", "    def fit(self, **kwargs):\n", "        # the fitting algorithm goes here\n", "        ...\n", "    \n", "    def interpolate(self, tstart, tstop, **kwargs):\n", "        # the interpolation code goes here\n", "        ...\n", "        return rates, rates_err\n", "    \n", "    # optional properties\n", "    @property\n", "    def statistic_name(self):\n", "        return self.the_stat_name\n", "    @property\n", "    def statistic(self):\n", "        return self.the_stat\n", "    @property\n", "    def dof(self):\n", "        return self.the_dof\n", "```\n", "\n", "You'll notice that the requirements are quite similar to those for the pre-binned version. The primary difference is that you initialize the class with \n", "`times`: A list of length numchans, each element containing the event times in that energy channel\n", "\n", "All the other rules from the pre-binned section apply, except that you may not need both `tstart` and `tstop` in the `interpolate()` method. If that's the case, you can simply ignore the `tstop` argument in your implementation.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Spectral Fitting Functions\n", "In this section and the remaining sections, instead of relying on an interface class to provide plugin-like capability, we use the concept of *inheritance*. That is, we define a parent class that contains a lot of functionality, and then write short child classes that *inherit* the functionality from the parent. This is how the spectral fitting functions in `gbm.spectra.functions` work.\n", "\n", "In this case, the `Function` class is our parent class, and we never directly instantiate that class. Instead, we want to write our own spectral function that can use all of the functionality built into `Function`. How do we do this? It's pretty easy, actually. Let's say we want to write our own power-law function (one is already included, but this is a good example).\n", "```python\n", "from gbm.spectra.functions import Function\n", "\n", "class powerlaw(Function):\n", "    def eval(self, params, x):\n", "        amp, index, pivot = params\n", "        return amp*(x/pivot)**index\n", "```\n", "\n", "And that's it! There are two requirements: first, declare that the class inherits from `Function` in the class definition; second, define an `eval` method that contains the function evaluation, with arguments \n", "`params`: The list of parameters \n", "`x`: An array of 'x' values (i.e. energy) \n", "And it must return the function evaluation along `x`.\n",
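"\n", "Since `eval` is an ordinary method, you can sanity-check a new function right away. The parameter values below are purely illustrative, and we assume (as described below) that the parent class fills in default parameter metadata when it isn't defined:\n", "```python\n", "import numpy as np\n", "\n", "pl = powerlaw()\n", "energies = np.logspace(1, 3, 50)  # a 10-1000 keV grid\n", "photon_flux = pl.eval([0.01, -2.0, 100.0], energies)\n", "```\n",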
"\n", "That's the simplest way to define a function, but there are a number of other attributes you can specify to fully leverage the function for spectral fitting. Here's what our class looks like if we define everything:\n", "```python\n", "import numpy as np\n", "\n", "class powerlaw(Function):\n", "    # The total number of parameters\n", "    nparams = 3\n", "    # The list of parameters: (parameter name, units, description)\n", "    param_list = [('A', 'ph/s/cm^2/keV', 'Amplitude'),\n", "                  ('index', '', 'Photon index'),\n", "                  ('Epiv', 'keV', 'Pivot energy')]\n", "    \n", "    #-----------------------------------------------------------------------\n", "    # The following attributes are used for fitting and parameter estimation\n", "    \n", "    # The default initialization values for each parameter\n", "    default_values = [0.1, -1.0, 100.0]\n", "    # The maximum absolute fitting step for each parameter\n", "    delta_abs = [0.1, 0.1, 0.1]\n", "    # The maximum relative fitting step for each parameter\n", "    delta_rel = [0.01, 0.01, 0.01]\n", "    # The minimum value that each parameter can take\n", "    min_values = [1e-10, -20.0, 0.01]\n", "    # The maximum value that each parameter can take\n", "    max_values = [np.inf, np.inf, np.inf]\n", "    # If True, the parameter is free; otherwise it is fixed\n", "    free = [True, True, False]\n", "\n", "    def eval(self, params, x):\n", "        amp, index, pivot = params\n", "        return amp*(x/pivot)**index\n", "```\n", "\n", "So there is a lot of stuff here (although half of it is comments). Basically, you can give the function metadata like the parameter names, units, and descriptions, which can be printed out so you don't forget which parameter is which. There are also lists of parameter properties: default initialization values, the min/max allowable range for each parameter, the default step by which to shift each parameter when fitting, and whether each parameter is free to be fit or is fixed at its initialization value. If these aren't set when you write the function, the parent class provides defaults for all of them, and you can update them after your function is instantiated (for example, to change the initialization values). Defining these properties lets you use your function to its fullest capability: you can evaluate it, use it in a fit with a wide range of fitting algorithms, integrate it, and even include it as one component in a multi-component model.\n", "\n", "Multi-component models are possible because there is a special *extended* class of `Function` called `SuperFunction` that *extends* all of the functionality of `Function` but can handle multiple functions. Again, you wouldn't directly call `SuperFunction`, but if you did something like this:\n", "```python\n", "my_model = powerlaw() + Comptonized()\n", "```\n", "it would create a `SuperFunction` that is the sum of the power law you just wrote and the Comptonized function. You can fit that model, and then you can even calculate the contribution of the individual components to the total spectrum.\n",
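"\n", "As a small illustration (we assume here that `Comptonized` can be imported from `gbm.spectra.functions` and that the combined model exposes the same `nparams` metadata as a regular `Function`):\n", "```python\n", "from gbm.spectra.functions import Comptonized\n", "\n", "# sum of our custom power law and the provided Comptonized function\n", "my_model = powerlaw() + Comptonized()\n", "print(my_model.nparams)  # parameters of both components, concatenated\n", "```"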
 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Spectral Fitter\n", "Currently, the only type of spectral fitter we provide is based on maximum likelihood estimation, although we may expand the options in the future. The `SpectralFitter` class provides general fitting capability once it has been given a likelihood to maximize. In our previous examples, we used `SpectralFitterPgstat`, a version of `SpectralFitter` that uses PG-stat as its likelihood. In principle, all you need to do to use a different likelihood is write a function that calculates that likelihood from a set of inputs and pass it to the `SpectralFitter` initializer. In practice, your likelihood function may need additional or different inputs than `SpectralFitter` assumes, so it is much better to *inherit* from `SpectralFitter` and design your class around your likelihood.\n", "\n", "If that is the route you take, you will, at a minimum, need to define an internal `_fold_model` method that takes the following inputs: \n", "`function`: The reference to the `Function` or `SuperFunction` that you're fitting \n", "`params`: The parameter list for your function \n", "And it must return the total log-likelihood. You may also need to define the `__init__` method if you need more than the standard arguments required by `SpectralFitter`.\n", "\n", "Examples can be found in `SpectralFitterPgstat` and `SpectralFitterChisq`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## GBM Datatypes\n", "The GBM datatypes, like CTIME, CSPEC, TTE, and the others we've covered, can be extended or inherited to accommodate data from other instruments. For example, as long as your underlying data are similar to GBM data (binned in time and energy, or unbinned in time and binned in energy), you can make an inherited class that has most, or all, of the functionality provided here. Perhaps you have pre-binned data with FITS headers that don't contain the same information as their GBM counterparts, or you have different extensions, etc. No matter. We can do this:\n", "```python\n", "from gbm.data.phaii import PHAII\n", "\n", "class MyNewDataType(PHAII):\n", "    @classmethod\n", "    def open(cls, filename):\n", "        # open and read your file here\n", "        ...\n", "```\n", "And there you have it! As long as your redefined `open` method reads the relevant data into a `TimeEnergyBins` object (see the `PHAII.open` method for details), you can use all of the basic functionality. Any additional functionality can be added in your class, and you can override any PHAII functionality you don't wish to support by raising Python's `NotImplementedError`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Saving the best for last, learn how to [make and customize pretty plots](./Visualizations.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 2 }