{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to the Data Primitives\n", "\n", "Within the GBM Data Tools, there are data primitive classes that handle the properties and manipulation of the fundamental types of data that GBM produces. Often you won't need to directly worry about these classes and how they work if you are doing sufficiently high-level work using the rest of the data tools (e.g. plotting lightcurves/spectra, standard data binning, background/spectral fitting, etc.), however the API exposes a full array of properties and methods that allow you to work directly with the data primitives.\n", "\n", "In general, there are three primary primitive types that describe GBM Data: `Bins`, `TimeEnergyBins`, and `EventList`.\n", "Specifically, `Bins` refers to 1D binned data (histograms), and there are two derived types with special additional properties: `TimeBins` and `EnergyBins`. While `TimeEnergyBins` is a class representing 2D binned data, where one axis is time and the other is energy. The final of these primitive types, `EventList`, is a class representing a list of events, where each event has some attributes, namely time and energy channel.\n", "\n", "In addition to these primary primitives, there are two more additional \"utility\" primitives called `TimeRange` and `GTI` which represent a single span of time, and a series of disjoint spans of time, respectively.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TimeBins and EnergyBins\n", "\n", "These classes are 1D histograms with special properties. Both can be instantiated with an array of `counts` (number of things in each bin), `lo_edges` and `hi_edges` (the edge boundaries for each bin), and `exposure` (the exposure in each bin). " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from gbm.data.primitives import TimeBins, EnergyBins\n", "\n", "# create a TimeBins object: 4 time bins, with exposure < binwidth due to some deadtime\n", "time_counts = np.array([1,3,5,2])\n", "time_lo_edges = np.array([1.0, 2.0, 3.0, 4.0])\n", "time_hi_edges = np.array([2.0, 3.0, 4.0, 5.0])\n", "time_exposure = np.array([0.99, 0.98, 0.97, 0.96])\n", "timebins = TimeBins(time_counts, time_lo_edges, time_hi_edges, time_exposure)\n", "\n", "# create an EnergyBins object: 4 energy bins, with the same exposure for each bin\n", "energy_counts = np.array([1,3,5,2])\n", "energy_lo_edges = np.array([10.0, 35.0, 100.0, 250.0])\n", "energy_hi_edges = np.array([35.0, 100.0, 250.0, 600.0])\n", "energy_exposure = np.array([9.67]*4)\n", "energybins = EnergyBins(energy_counts, energy_lo_edges, energy_hi_edges, energy_exposure)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a variety of properties that can be retrieved from the objects once they've been created. For example, maybe you want to know what the rate (and rate uncertainty) is for each of your bins:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1.01010101, 3.06122449, 5.15463918, 2.08333333]),\n", " array([1.01010101, 1.76739878, 2.30522472, 1.47313913]))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timebins.rates, \\\n", "timebins.rate_uncertainty" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or maybe you need the bin centers and full domain of the data because you want to make a plot:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1.5, 2.5, 3.5, 4.5]), (1.0, 5.0))" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timebins.centroids, \\\n", "timebins.range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, you can retrieve the same properties for `EnergyBins`, with some important differences. `EnergyBins` is defined to account for a spectrum that spans orders of magnitude, therefore the bin centroids are calculated as the geometric mean, instead of the arithmetic mean. Also, the rate is naturally defined as the differential rate (i.e. divided by the bin width)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 18.70828693, 59.16079783, 158.11388301, 387.29833462]),\n", " array([0.0041365 , 0.00477289, 0.00344709, 0.00059093]))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energybins.centroids, \\\n", " energybins.rates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to these properties (and more), these classes have some important methods. You can add multiple `TimeBins` or `EnergyBins` together to create a new class that is the sum (provided they cover the same time and energy range)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1, 3, 5, 2]), array([ 2., 6., 10., 4.]))" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timebins2 = TimeBins.sum([timebins, timebins])\n", "timebins.counts, \\\n", "timebins2.counts" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1, 3, 5, 2]), array([ 2., 6., 10., 4.]))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energybins2 = EnergyBins.sum([energybins, energybins])\n", "energybins.counts, \\\n", "energybins2.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can merge multiple `TimeBins` or `EnergyBins` together into one (even if they are overlapping)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 1.5, 2.5, 3.5, 4.5, 8.5, 9.5, 10.5]),\n", " array([ 1., 3., 5., 2., 10., 20., 3.]))" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_timebins = TimeBins(np.array([10., 20.0, 3.0]), np.array([8.0, 9.0, 10.0]), np.array([9.0, 10.0, 11.0]), np.array([0.97, 0.96, 0.97]))\n", "merged_timebins = TimeBins.merge([timebins, new_timebins])\n", "merged_timebins.centroids, \\\n", "merged_timebins.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And because data can have gaps in it (and that can complicate analysis if not properly handled), these classes have the ability to split into a list of contiguous bins:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(array([1.5, 2.5, 3.5, 4.5]), array([1., 3., 5., 2.])),\n", " (array([ 8.5, 9.5, 10.5]), array([10., 20., 3.]))]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "split_timebins = merged_timebins.contiguous_bins()\n", "[(tb.centroids, tb.counts) for tb in split_timebins]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maybe we don't want all of our data for an analysis, but only a snippet. We can make a slice:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(35.0, 250.0)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slice energy bins so that we only have data in the 50-200 keV range, inclusive\n", "slice_energybins = energybins.slice(50.0, 200.0)\n", "slice_energybins.range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally we can rebin the data if we want, using some type of binning algorithm:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 31.6227766 , 244.94897428]), array([4, 7]))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from gbm.binning.binned import combine_by_factor\n", "rebinned_energybins = energybins.rebin(combine_by_factor, 2) # rebin by a factor of 2\n", "rebinned_energybins.centroids, \\\n", "rebinned_energybins.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## TimeEnergyBins\n", "This class is a 2D representation of data that have 2 axes: time and energy. To successfully create such an object, you need to provide a 2D `counts` array (the number of counts in each time bin for each energy channel), `tstart` and `tstop`, the edges for the time bins, `exposure` the time exposure in each bin, and `emin` and `emax`, the edges for the energy bins. In general, many of the same properties and methods for `TimeBins` and `EnergyBins` are available here, keeping in mind that there are different methods for each axis." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 2)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from gbm.data.primitives import TimeEnergyBins\n", "\n", "counts = np.array([[1,10], [3, 20], [5,3], [2,7]]) # 4 time bins, 2 energy channels\n", "tstart = np.array([1.0, 2.0, 3.0, 4.0])\n", "tstop = np.array([2.0, 3.0, 4.0, 5.0])\n", "exposure = np.array([0.99, 0.98, 0.97, 0.96])\n", "emin = np.array([35.0, 150.0])\n", "emax = np.array([150.0, 350.0])\n", "\n", "bins = TimeEnergyBins(counts, tstart, tstop, exposure, emin, emax)\n", "\n", "# how many times, how many channels\n", "bins.numtimes, \\\n", "bins.numchans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to the 1D counterparts, we can access various properties like the bin centroids:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1.5, 2.5, 3.5, 4.5]), array([ 72.45688373, 229.12878475]))" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bins.time_centroids, \\\n", "bins.energy_centroids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or the bin widths:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1., 1., 1., 1.]), array([115., 200.]))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bins.time_widths, \\\n", "bins.chan_widths" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can immediately return the rates and uncertainty for each bin:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1.01010101 10.1010101 ]\n", " [ 3.06122449 20.40816327]\n", " [ 5.15463918 3.09278351]\n", " [ 2.08333333 7.29166667]]\n", "[[1.01010101 3.19421986]\n", " [1.76739878 4.56340404]\n", " [2.30522472 1.78561939]\n", " [1.47313913 2.75599095]]\n" ] } ], "source": [ "print(bins.rates)\n", "print(bins.rate_uncertainty)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As for the available methods, you can perform all of following operations similar to the 1D counterparts:\n", "* Retrieve a list of contiguous bins (`contiguous_time_bins()` and `contiguous_energy_bins()`)\n", "* Perform a slice (`slice_time()` and `slice_energy()`)\n", "* Rebin (`rebin_time()` and `rebin_energy()`)\n", "* Merge multiple `TimeEnergyBins` across one axis (`merge_time()` and `merge_energy()`)\n", "\n", "In addition to these methods, you can also integrate along one axis of `TimeEnergyBins` to return the corresponding `TimeBins` or `EnergyBins` object:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(gbm.data.primitives.TimeBins, array([11, 23, 8, 9]))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# integrate over energy\n", "energy_integrated = bins.integrate_energy()\n", "type(energy_integrated), \\\n", "energy_integrated.counts" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(gbm.data.primitives.EnergyBins, array([11, 40]))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# integrate over time\n", "time_integrated = bins.integrate_time()\n", "type(time_integrated), \\\n", "time_integrated.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## EventList\n", "Unlike the previous primitives, `EventList` is design to work with *semi-unbinned* data. Generally `EventList` is a list of events with attributes; specifically, it is a list of event arrival times with a PHA channel attribute. The *semi-unbinned* qualification comes from the fact that the time axis of the data is not binned (down to the precision of the instrument being able to measure the individual arrival times of photons), but the energy axis *is necessarily* binned into energy channels. For this reason, `EventList` not only contains a list of times, but has some metadata that translates the PHA channel attribute to an energy calibration.\n", "\n", "Because of the metadata requirement, there are a couple of ways to create an `EventList` object. One can instantiate with the relevant arrays in numpy.recarray or astropy.io.fits.fitsrec.FITS_rec format (e.g. from reading one of the GBM TTE data files). Or maybe we want to *create* an EventList from a simulation. Then we would use `EventList.from_lists()` which requires a list of arrival times, a list of PHA channels, and lists of the lower and upper edges of the energy channel calibration." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7, (1.0, 8.8), (10.0, 600.0))" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from gbm.data.primitives import EventList\n", "# 7 events\n", "times = [1.0, 1.5, 2.7, 3.2, 4.9, 8.8, 8.7]\n", "phas = [3, 1, 0, 2, 1, 3, 0]\n", "\n", "# 4 channels\n", "emin = [10.0, 35.0, 100.0, 250.0]\n", "emax = [35.0, 100.0, 250.0, 600.0]\n", "\n", "eventlist = EventList.from_lists(times, phas, emin, emax)\n", "eventlist.size, \\\n", "eventlist.time_range, \\\n", "eventlist.energy_range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the nice properties of `EventList` is that you can immediately retrieve the count spectrum as an `EnergyBins` object:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(gbm.data.primitives.EnergyBins, array([2, 2, 1, 2]))" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spectrum = eventlist.count_spectrum\n", "type(spectrum), \\\n", "spectrum.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the binned datatypes, you can slice in time and energy, which returns a new `EventList` containing the slice:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 3, 4)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "time_sliced = eventlist.time_slice(2.0, 5.0) # return events with 2-5 s\n", "chan_sliced = eventlist.channel_slice(1, 2) # return events in channels 1 & 2\n", "energy_sliced = eventlist.energy_slice(10.0, 100.0) # return events in energy channels spanning 10-100 keV\n", "\n", "time_sliced.size, \\\n", "chan_sliced.size, \\\n", "energy_sliced.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with `TimeEnergyBins`, you can rebin the `EventList` in energy using `rebin_energy()`. You can also *bin* in time, given a binning function, which returns a `TimeEnergyBins` object:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(gbm.data.primitives.TimeEnergyBins,\n", " array([[0., 1., 0., 1.],\n", " [1., 0., 0., 0.],\n", " [0., 0., 1., 0.],\n", " [0., 1., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [1., 0., 0., 1.]]))" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from gbm.binning.unbinned import bin_by_time\n", "\n", "bins = eventlist.bin(bin_by_time, 1.0) # bins the data into 1.0 s wide bins\n", "type(bins), \\\n", "bins.counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also retrieve the exposure for all or part of the EventList. By default, this assumes the deadtime profile for GBM, but can be modified through kwargs:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7.7999670000000005, 3.9999796)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eventlist.get_exposure(), \\\n", "eventlist.get_exposure(time_ranges=(1.0, 5.0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, multiple EventLists can be merged into a single `EventList`. If there is a chance that the EventLists may contain duplicate events, you may want to set the *force_unique* keyword." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_eventlist = EventList.merge([time_sliced, chan_sliced], force_unique=True)\n", "merged_eventlist.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## TimeRange and GTI\n", "\n", "These two primitives are designed to represent time ranges (surprise!) and series of time ranges. The `TimeRange` class provides a simple class defining a time range and associated methods. `GTI` (Good Time Intervals) can contain many `TimeRanges`.\n", "\n", "Let's create a `TimeRange`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10.0, 5.0)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from gbm.data.primitives import TimeRange, GTI\n", "\n", "# a time range from 0 s to 10 s\n", "timerange1 = TimeRange(0.0, 10.0)\n", "\n", "# time range duration and center\n", "timerange1.duration, \\\n", "timerange1.center" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can check to see if it contains a time of interest:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, False)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timerange1.contains(3.5), \\\n", "timerange1.contains(-1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also do simple time range \"arithmetic\":" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(0.0, 22.0)\n", "(4.8, 10.0)\n" ] } ], "source": [ "timerange2 = TimeRange(4.8, 22.0)\n", "time_union = TimeRange.union(timerange1, timerange2)\n", "time_intersect = TimeRange.intersection(timerange1, timerange2)\n", "\n", "print(time_union)\n", "print(time_intersect)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As for `GTI`, you can instantiate it a few different ways. Let's start with just one range:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, (0.0, 10.0))" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gti1 = GTI(*timerange1.as_tuple())\n", "\n", "# number of contiguous intervals, full range\n", "gti1.num_intervals, \\\n", "gti1.range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we can insert a new time range into our `GTI`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, (0.0, 55.0))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gti1.insert(23.0, 55.0) # insert new interval from 23 s to 55 s\n", "gti1.num_intervals, \\\n", "gti1.range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can merge two `GTI` into one:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, (0.0, 508.0))" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create another GTI from a list of (tstart, tstop) tuples\n", "gti2 = GTI.from_list([(100.0, 200.0), (250.0, 275.0), (507.0, 508.)])\n", "\n", "gti_merged = GTI.merge(gti1, gti2)\n", "gti_merged.num_intervals, \\\n", "gti_merged.range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the simple `TimeRange`, we can check to see if a time of interest is contained in the `GTI`:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(False, True)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gti_merged.contains(17.0), \\\n", "gti_merged.contains(170.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if you want to see the full `GTI` intervals:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(0.0, 10.0), (23.0, 55.0), (100.0, 200.0), (250.0, 275.0), (507.0, 508.0)]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gti_merged.as_list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Hopefully your eyes haven't glazed over yet. If not, you've survived the deepest dive into code! Onward to the [plugin architecture and capabilities](./Plugins.ipynb)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 2 }