Introduction to the Data Primitives

Within the GBM Data Tools, there are data primitive classes that handle the properties and manipulation of the fundamental types of data that GBM produces. Often you won’t need to directly worry about these classes and how they work if you are doing sufficiently high-level work using the rest of the data tools (e.g. plotting lightcurves/spectra, standard data binning, background/spectral fitting, etc.), however the API exposes a full array of properties and methods that allow you to work directly with the data primitives.

In general, there are three primary primitive types that describe GBM Data: Bins, TimeEnergyBins, and EventList. Specifically, Bins refers to 1D binned data (histograms), and there are two derived types with special additional properties: TimeBins and EnergyBins. While TimeEnergyBins is a class representing 2D binned data, where one axis is time and the other is energy. The final of these primitive types, EventList, is a class representing a list of events, where each event has some attributes, namely time and energy channel.

In addition to these primary primitives, there are two more additional “utility” primitives called TimeRange and GTI which represent a single span of time, and a series of disjoint spans of time, respectively.


TimeBins and EnergyBins

These classes are 1D histograms with special properties. Both can be instantiated with an array of counts (number of things in each bin), lo_edges and hi_edges (the edge boundaries for each bin), and exposure (the exposure in each bin).

[1]:
import numpy as np
from gbm.data.primitives import TimeBins, EnergyBins

# create a TimeBins object: 4 time bins, with exposure < binwidth due to some deadtime
time_counts = np.array([1,3,5,2])
time_lo_edges = np.array([1.0, 2.0, 3.0, 4.0])
time_hi_edges = np.array([2.0, 3.0, 4.0, 5.0])
time_exposure = np.array([0.99, 0.98, 0.97, 0.96])
timebins = TimeBins(time_counts, time_lo_edges, time_hi_edges, time_exposure)

# create an EnergyBins object: 4 energy bins, with the same exposure for each bin
energy_counts = np.array([1,3,5,2])
energy_lo_edges = np.array([10.0, 35.0, 100.0, 250.0])
energy_hi_edges = np.array([35.0, 100.0, 250.0, 600.0])
energy_exposure = np.array([9.67]*4)
energybins = EnergyBins(energy_counts, energy_lo_edges, energy_hi_edges, energy_exposure)

There are a variety of properties that can be retrieved from the objects once they’ve been created. For example, maybe you want to know what the rate (and rate uncertainty) is for each of your bins:

[2]:
timebins.rates, \
timebins.rate_uncertainty
[2]:
(array([1.01010101, 3.06122449, 5.15463918, 2.08333333]),
 array([1.01010101, 1.76739878, 2.30522472, 1.47313913]))

Or maybe you need the bin centers and full domain of the data because you want to make a plot:

[3]:
timebins.centroids, \
timebins.range
[3]:
(array([1.5, 2.5, 3.5, 4.5]), (1.0, 5.0))

Similarly, you can retrieve the same properties for EnergyBins, with some important differences. EnergyBins is defined to account for a spectrum that spans orders of magnitude, therefore the bin centroids are calculated as the geometric mean, instead of the arithmetic mean. Also, the rate is naturally defined as the differential rate (i.e. divided by the bin width).

[4]:
energybins.centroids, \
 energybins.rates
[4]:
(array([ 18.70828693,  59.16079783, 158.11388301, 387.29833462]),
 array([0.0041365 , 0.00477289, 0.00344709, 0.00059093]))

In addition to these properties (and more), these classes have some important methods. You can add multiple TimeBins or EnergyBins together to create a new class that is the sum (provided they cover the same time and energy range).

[5]:
timebins2 = TimeBins.sum([timebins, timebins])
timebins.counts, \
timebins2.counts
[5]:
(array([1, 3, 5, 2]), array([ 2.,  6., 10.,  4.]))
[6]:
energybins2 = EnergyBins.sum([energybins, energybins])
energybins.counts, \
energybins2.counts
[6]:
(array([1, 3, 5, 2]), array([ 2.,  6., 10.,  4.]))

You can merge multiple TimeBins or EnergyBins together into one (even if they are overlapping)

[7]:
new_timebins = TimeBins(np.array([10., 20.0, 3.0]), np.array([8.0, 9.0, 10.0]), np.array([9.0, 10.0, 11.0]), np.array([0.97, 0.96, 0.97]))
merged_timebins = TimeBins.merge([timebins, new_timebins])
merged_timebins.centroids, \
merged_timebins.counts
[7]:
(array([ 1.5,  2.5,  3.5,  4.5,  8.5,  9.5, 10.5]),
 array([ 1.,  3.,  5.,  2., 10., 20.,  3.]))

And because data can have gaps in it (and that can complicate analysis if not properly handled), these classes have the ability to split into a list of contiguous bins:

[8]:
split_timebins = merged_timebins.contiguous_bins()
[(tb.centroids, tb.counts) for tb in split_timebins]
[8]:
[(array([1.5, 2.5, 3.5, 4.5]), array([1., 3., 5., 2.])),
 (array([ 8.5,  9.5, 10.5]), array([10., 20.,  3.]))]

Maybe we don’t want all of our data for an analysis, but only a snippet. We can make a slice:

[9]:
# slice energy bins so that we only have data in the 50-200 keV range, inclusive
slice_energybins = energybins.slice(50.0, 200.0)
slice_energybins.range
[9]:
(35.0, 250.0)

And finally we can rebin the data if we want, using some type of binning algorithm:

[10]:
from gbm.binning.binned import combine_by_factor
rebinned_energybins = energybins.rebin(combine_by_factor, 2) # rebin by a factor of 2
rebinned_energybins.centroids, \
rebinned_energybins.counts
[10]:
(array([ 31.6227766 , 244.94897428]), array([4, 7]))

TimeEnergyBins

This class is a 2D representation of data that have 2 axes: time and energy. To successfully create such an object, you need to provide a 2D counts array (the number of counts in each time bin for each energy channel), tstart and tstop, the edges for the time bins, exposure the time exposure in each bin, and emin and emax, the edges for the energy bins. In general, many of the same properties and methods for TimeBins and EnergyBins are available here, keeping in mind that there are different methods for each axis.

[11]:
from gbm.data.primitives import TimeEnergyBins

counts = np.array([[1,10], [3, 20], [5,3], [2,7]]) # 4 time bins, 2 energy channels
tstart = np.array([1.0, 2.0, 3.0, 4.0])
tstop = np.array([2.0, 3.0, 4.0, 5.0])
exposure = np.array([0.99, 0.98, 0.97, 0.96])
emin = np.array([35.0, 150.0])
emax = np.array([150.0, 350.0])

bins = TimeEnergyBins(counts, tstart, tstop, exposure, emin, emax)

# how many times, how many channels
bins.numtimes, \
bins.numchans
[11]:
(4, 2)

Similar to the 1D counterparts, we can access various properties like the bin centroids:

[12]:
bins.time_centroids, \
bins.energy_centroids
[12]:
(array([1.5, 2.5, 3.5, 4.5]), array([ 72.45688373, 229.12878475]))

Or the bin widths:

[13]:
bins.time_widths, \
bins.chan_widths
[13]:
(array([1., 1., 1., 1.]), array([115., 200.]))

And we can immediately return the rates and uncertainty for each bin:

[14]:
print(bins.rates)
print(bins.rate_uncertainty)
[[ 1.01010101 10.1010101 ]
 [ 3.06122449 20.40816327]
 [ 5.15463918  3.09278351]
 [ 2.08333333  7.29166667]]
[[1.01010101 3.19421986]
 [1.76739878 4.56340404]
 [2.30522472 1.78561939]
 [1.47313913 2.75599095]]

As for the available methods, you can perform all of following operations similar to the 1D counterparts: * Retrieve a list of contiguous bins (contiguous_time_bins() and contiguous_energy_bins()) * Perform a slice (slice_time() and slice_energy()) * Rebin (rebin_time() and rebin_energy()) * Merge multiple TimeEnergyBins across one axis (merge_time() and merge_energy())

In addition to these methods, you can also integrate along one axis of TimeEnergyBins to return the corresponding TimeBins or EnergyBins object:

[15]:
# integrate over energy
energy_integrated = bins.integrate_energy()
type(energy_integrated), \
energy_integrated.counts
[15]:
(gbm.data.primitives.TimeBins, array([11, 23,  8,  9]))
[16]:
# integrate over time
time_integrated = bins.integrate_time()
type(time_integrated), \
time_integrated.counts
[16]:
(gbm.data.primitives.EnergyBins, array([11, 40]))

EventList

Unlike the previous primitives, EventList is design to work with semi-unbinned data. Generally EventList is a list of events with attributes; specifically, it is a list of event arrival times with a PHA channel attribute. The semi-unbinned qualification comes from the fact that the time axis of the data is not binned (down to the precision of the instrument being able to measure the individual arrival times of photons), but the energy axis is necessarily binned into energy channels. For this reason, EventList not only contains a list of times, but has some metadata that translates the PHA channel attribute to an energy calibration.

Because of the metadata requirement, there are a couple of ways to create an EventList object. One can instantiate with the relevant arrays in numpy.recarray or astropy.io.fits.fitsrec.FITS_rec format (e.g. from reading one of the GBM TTE data files). Or maybe we want to create an EventList from a simulation. Then we would use EventList.from_lists() which requires a list of arrival times, a list of PHA channels, and lists of the lower and upper edges of the energy channel calibration.

[17]:
from gbm.data.primitives import EventList
# 7 events
times = [1.0, 1.5, 2.7, 3.2, 4.9, 8.8, 8.7]
phas = [3, 1, 0, 2, 1, 3, 0]

# 4 channels
emin = [10.0, 35.0, 100.0, 250.0]
emax = [35.0, 100.0, 250.0, 600.0]

eventlist = EventList.from_lists(times, phas, emin, emax)
eventlist.size, \
eventlist.time_range, \
eventlist.energy_range
[17]:
(7, (1.0, 8.8), (10.0, 600.0))

One of the nice properties of EventList is that you can immediately retrieve the count spectrum as an EnergyBins object:

[18]:
spectrum = eventlist.count_spectrum
type(spectrum), \
spectrum.counts
[18]:
(gbm.data.primitives.EnergyBins, array([2, 2, 1, 2]))

As with the binned datatypes, you can slice in time and energy, which returns a new EventList containing the slice:

[19]:
time_sliced = eventlist.time_slice(2.0, 5.0) # return events with 2-5 s
chan_sliced = eventlist.channel_slice(1, 2) # return events in channels 1 & 2
energy_sliced = eventlist.energy_slice(10.0, 100.0) # return events in energy channels spanning 10-100 keV

time_sliced.size, \
chan_sliced.size, \
energy_sliced.size
[19]:
(3, 3, 4)

As with TimeEnergyBins, you can rebin the EventList in energy using rebin_energy(). You can also bin in time, given a binning function, which returns a TimeEnergyBins object:

[20]:
from gbm.binning.unbinned import bin_by_time

bins = eventlist.bin(bin_by_time, 1.0) # bins the data into 1.0 s wide bins
type(bins), \
bins.counts
[20]:
(gbm.data.primitives.TimeEnergyBins,
 array([[0., 1., 0., 1.],
        [1., 0., 0., 0.],
        [0., 0., 1., 0.],
        [0., 1., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [1., 0., 0., 1.]]))

You can also retrieve the exposure for all or part of the EventList. By default, this assumes the deadtime profile for GBM, but can be modified through kwargs:

[21]:
eventlist.get_exposure(), \
eventlist.get_exposure(time_ranges=(1.0, 5.0))
[21]:
(7.7999670000000005, 3.9999796)

Finally, multiple EventLists can be merged into a single EventList. If there is a chance that the EventLists may contain duplicate events, you may want to set the force_unique keyword.

[22]:
merged_eventlist = EventList.merge([time_sliced, chan_sliced], force_unique=True)
merged_eventlist.size
[22]:
4

TimeRange and GTI

These two primitives are designed to represent time ranges (surprise!) and series of time ranges. The TimeRange class provides a simple class defining a time range and associated methods. GTI (Good Time Intervals) can contain many TimeRanges.

Let’s create a TimeRange:

[23]:
from gbm.data.primitives import TimeRange, GTI

# a time range from 0 s to 10 s
timerange1 = TimeRange(0.0, 10.0)

# time range duration and center
timerange1.duration, \
timerange1.center
[23]:
(10.0, 5.0)

You can check to see if it contains a time of interest:

[24]:
timerange1.contains(3.5), \
timerange1.contains(-1.0)
[24]:
(True, False)

You can also do simple time range “arithmetic”:

[25]:
timerange2 = TimeRange(4.8, 22.0)
time_union = TimeRange.union(timerange1, timerange2)
time_intersect = TimeRange.intersection(timerange1, timerange2)

print(time_union)
print(time_intersect)
(0.0, 22.0)
(4.8, 10.0)

As for GTI, you can instantiate it a few different ways. Let’s start with just one range:

[26]:
gti1 = GTI(*timerange1.as_tuple())

# number of contiguous intervals, full range
gti1.num_intervals, \
gti1.range
[26]:
(1, (0.0, 10.0))

And now we can insert a new time range into our GTI:

[27]:
gti1.insert(23.0, 55.0) # insert new interval from 23 s to 55 s
gti1.num_intervals, \
gti1.range
[27]:
(2, (0.0, 55.0))

And we can merge two GTI into one:

[28]:
# create another GTI from a list of (tstart, tstop) tuples
gti2 = GTI.from_list([(100.0, 200.0), (250.0, 275.0), (507.0, 508.)])

gti_merged = GTI.merge(gti1, gti2)
gti_merged.num_intervals, \
gti_merged.range
[28]:
(5, (0.0, 508.0))

As with the simple TimeRange, we can check to see if a time of interest is contained in the GTI:

[29]:
gti_merged.contains(17.0), \
gti_merged.contains(170.0)
[29]:
(False, True)

And if you want to see the full GTI intervals:

[30]:
gti_merged.as_list()
[30]:
[(0.0, 10.0), (23.0, 55.0), (100.0, 200.0), (250.0, 275.0), (507.0, 508.0)]

Hopefully your eyes haven’t glazed over yet. If not, you’ve survived the deepest dive into code! Onward to the plugin architecture and capabilities!