Data Preparation¶

range_driver.data_prep.add_rt_group_info(events_df, metadata)[source]¶: Calls get_all_group_info() for events_df and merges the info into metadata.rt_groups

range_driver.data_prep.calc_intervals(grdf, field='datetime')[source]¶: Calculate time intervals between adjacent detection events
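
A minimal sketch of what an interval calculation between adjacent events can look like with pandas. The helper name calc_intervals_sketch and the choice to return seconds are assumptions for illustration; the actual calc_intervals() may sort, group, or store its result differently.

```python
import pandas as pd

def calc_intervals_sketch(grdf, field="datetime"):
    """Hypothetical helper: seconds between each event and the one before it."""
    ordered = grdf.sort_values(field)
    # diff() yields one Timedelta per row; the first row has no predecessor (NaN)
    return ordered[field].diff().dt.total_seconds()

events = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00:00",
        "2020-01-01 00:05:00",
        "2020-01-01 00:07:30",
    ])
})
intervals = calc_intervals_sketch(events)
```

The first entry is missing because the first event has nothing to be compared against; the remaining entries are 300 and 150 seconds.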

range_driver.data_prep.calc_station_dists_m(deploy_lat_lon)[source]¶

Calculate geodesic distances between stations.

Parameters: deploy_lat_lon (pandas.DataFrame) – DataFrame with station latitude and longitude columns (in that order).
Returns: A dataframe containing the distance (in meters) between each pair of stations.
Return type: pandas.DataFrame
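
To illustrate the shape of the result, the sketch below builds a pairwise distance table from a latitude/longitude DataFrame. It uses the haversine (great-circle) formula on a spherical Earth, which only approximates a true geodesic on an ellipsoid; the name station_dists_sketch and the radius constant are assumptions, and calc_station_dists_m() itself may use a different method.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)

def station_dists_sketch(deploy_lat_lon):
    """Hypothetical helper: pairwise great-circle distances in meters."""
    # Column order is latitude, then longitude, as described above
    lat = np.radians(deploy_lat_lon.iloc[:, 0].to_numpy(dtype=float))
    lon = np.radians(deploy_lat_lon.iloc[:, 1].to_numpy(dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    # Rows and columns are both labeled by station
    return pd.DataFrame(dist, index=deploy_lat_lon.index,
                        columns=deploy_lat_lon.index)

stations = pd.DataFrame(
    {"lat": [44.0, 44.0, 45.0], "lon": [-63.0, -63.0, -63.0]},
    index=["A", "B", "C"],
)
dists = station_dists_sketch(stations)
```

The result is a symmetric table with zeros on the diagonal; stations A and C, one degree of latitude apart, come out at roughly 111 km.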

range_driver.data_prep.clean_nsog_raw_detections(df_detections_raw, dates=True, rt_ids=True, select_cols=True)[source]¶

range_driver.data_prep.clean_raw_detections(df_detections_raw, dates=True, rt_ids=True, select_cols=True)[source]¶

range_driver.data_prep.detection_rate_grid(detection_df, time_bin_length, metadata, auto_dr=False)[source]¶

Group detections into timestamp bins and analyze the detections on a group-level. Append aggregated bin data to the end of the detections DF.

Parameters

detection_df (pandas.DataFrame) – Dataframe containing the detection events
time_bin_length (str) – A string that can be coerced/converted into a pandas.Timedelta object (e.g. “60Min”, “1day”)
metadata (sklearn.utils.Bunch) – Metadata associated with the detection events
auto_dr (bool) – Automatically estimate tag rate programming

Returns

detection_df (pandas.DataFrame) - DataFrame containing the detection events, grouped into timestamp bins. New columns have been added that include the detection rate and counts for that bin. New rows have been added, containing aggregated data for the timestamp bins.
event_bin_split (int) - The row number of the first new row.
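
The binning idea can be sketched as follows: count detections per fixed-width time bin and divide by the number of pings expected in that bin. The helper name, the 'datetime' column, and the expected_interval_s parameter are assumptions for illustration; detection_rate_grid() additionally appends the aggregated rows to the detections DataFrame and takes its tag programming from metadata, which this sketch does not attempt.

```python
import pandas as pd

def bin_detection_rate_sketch(detection_df, time_bin_length, expected_interval_s):
    """Hypothetical helper: detection count and rate per time bin."""
    bin_width = pd.Timedelta(time_bin_length)
    # Number of pings a tag would emit in one bin at the assumed interval
    expected_per_bin = bin_width.total_seconds() / expected_interval_s
    ones = pd.Series(1, index=pd.DatetimeIndex(detection_df["datetime"]))
    # Summing ones per bin gives the count; empty bins sum to 0
    counts = ones.sort_index().resample(bin_width).sum()
    return pd.DataFrame({
        "detection_count": counts,
        "detection_rate": counts / expected_per_bin,
    })

detections = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:10", "2020-01-01 00:20",
        "2020-01-01 00:50", "2020-01-01 01:30",
    ])
})
binned = bin_detection_rate_sketch(detections, "60min", expected_interval_s=600)
```

With a 600-second ping interval, six pings are expected per hour, so the first bin (four detections) has a rate of 4/6 and the second (one detection) a rate of 1/6.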

range_driver.data_prep.dist_str_th(dist, dist_th)[source]¶

Determine ‘F’ar, ‘N’ear, or ‘U’nknown string for distances above or below dist_th

Parameters

dist (float) – Distance in meters
dist_th (float) – Distance threshold

Returns

Short string or single character that classifies the distance
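
A minimal sketch of such a threshold classifier. The exact rules are assumptions: here a distance above dist_th is 'F', a distance at or below it is 'N', and a missing distance is 'U'. The real dist_str_th() may draw these boundaries differently.

```python
import math

def dist_str_th_sketch(dist, dist_th):
    """Hypothetical helper: classify a distance as Far, Near, or Unknown."""
    # A missing distance (None or NaN) cannot be compared to the threshold
    if dist is None or math.isnan(dist):
        return "U"
    return "F" if dist > dist_th else "N"

labels = [dist_str_th_sketch(d, 500.0) for d in (120.0, 800.0, float("nan"))]
```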

range_driver.data_prep.dr_estimate_and_cutoff(drs)[source]¶

Estimate unknown detection rate and determine init sequence cutoff point

Parameters

drs – Detection rate sequence over fixed time intervals

Returns

dr_max - maximum detection rate for majority of time
cutoff - date of first valid detection bin
cutoff_loc - raw index of first valid detection bin

range_driver.data_prep.estimate_det_max(drs)[source]¶

Estimate maximum detection rate over majority of time ignoring higher detection rate during init sequence

Use this in case the true tag interval length programming is unknown.
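
One way to approach this kind of estimate is shown below, purely as an illustration. The sketch takes an upper quantile of the detection rate sequence as the steady-state maximum, on the assumption that the high-rate init sequence covers fewer than a quarter of the bins, and then looks for the first bin at or below that level. The helper names and the quantile choice are assumptions; estimate_det_max() and dr_estimate_and_cutoff() may use a different estimator.

```python
import pandas as pd

def estimate_det_max_sketch(drs, quantile=0.75):
    """Hypothetical helper: steady-state maximum detection rate."""
    # An upper quantile ignores a short burst of very high rates at the start
    return float(drs.quantile(quantile))

def cutoff_sketch(drs, dr_max):
    """Hypothetical helper: label and position of the first valid bin."""
    valid = drs <= dr_max
    # argmax returns the first True; it returns 0 if no bin is valid,
    # so callers should check valid.any() on real data
    cutoff_loc = int(valid.to_numpy().argmax())
    return drs.index[cutoff_loc], cutoff_loc

index = pd.date_range("2020-01-01", periods=12, freq="h")
drs = pd.Series([30, 28, 6, 5, 6, 6, 5, 6, 6, 6, 5, 6], index=index)
dr_max = estimate_det_max_sketch(drs)
cutoff, cutoff_loc = cutoff_sketch(drs, dr_max)
```

In this made-up sequence the first two bins belong to the init sequence, so the estimate settles at 6 detections per bin and the cutoff falls on the third bin.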

range_driver.data_prep.get_all_group_info(detections_df, metadata)[source]¶: Create a dataframe with group_info() for each group in detections_df

range_driver.data_prep.group_info(grdf=None, metadata=None)[source]¶

Information about a Receiver/Transmitter group

Parameters

grdf – DataFrame with detections and an ‘interval’ column, or None
metadata – Metadata dict as returned by read_otn_metadata

Returns

Info tuple about the receiver/transmitter group, or the tuple element names if grdf is None

range_driver.data_prep.make_column(tdf, column_name)[source]¶: Modify DataFrame tdf to add a variable named column_name. tdf remains unchanged if the column is already available.

range_driver.data_prep.old_make_detection_rate(tdfok, exp_interval_s=300, num_time_bins=200)[source]¶: Calculate the detection rate

range_driver.data_prep.old_process_detections(gr, params)[source]¶

Add interval length to dataframe

Returns

tdfok – Processed detections with detection rate
cutoff_t – Interval threshold determined to remove invalid lead pings
tdf – Full dataframe with detection interval calculations

range_driver.data_prep.old_resample_detection_rate(tdfok, det_rate)[source]¶: Add detection rate to full dataframe

range_driver.data_prep.process_detections(ev_df, params)[source]¶: Perform some computations on the detection event dataframe

range_driver.data_prep.process_intervals(detection_df, metadata)[source]¶

Calculate detection interval lengths and split out init sequences

Parameters

detection_df (pandas.DataFrame) – Cleaned detection data as returned by clean_raw_detections() or read_otn_data()
metadata (dict) – Metadata dictionary as returned by read_otn_data(). New member rt_groups will be added to metadata.

Returns

df_dets (pandas.DataFrame) - DataFrame containing the detection events.
df_inits (pandas.DataFrame) - DataFrame containing the detections from the short-interval initial sequence
rt_groups (pandas.DataFrame) - DataFrame containing the summary of receiver/transmitter groups

range_driver.data_prep.read_nsog_data(detections_csv, vendor_tag_specs=None, merge=False, bunch=False)[source]¶

All in one function to read NSOG data

Parameters

detections_csv – csv file name for detection data
vendor_tag_specs – xls file name for vendor extracted metadata
merge (bool) – Whether to merge metadata into detections
bunch (bool) – Whether to return metadata as Bunch dict

Returns

df_detections, df_deploy_meta, transmitter if bunch is False; otherwise df_detections, metadata, where metadata is a dict with metadata

range_driver.data_prep.read_otn_data(detections_csv, otn_metadata=None, vendor_tag_specs=None, merge=True, bunch=False)[source]¶

All in one function to read OTN data

Parameters

detections_csv – csv file name for detection data
otn_metadata – xls file name for OTN-style metadata
vendor_tag_specs – xls file name for vendor extracted metadata
merge (bool) – Whether to merge metadata into detections
bunch (bool) – Whether to return metadata as Bunch dict

Returns

df_detections, df_deploy_meta, transmitter if bunch is False; otherwise df_detections, metadata, where metadata is a dict with metadata

range_driver.data_prep.read_via_config(config)[source]¶

Invoke configured data loading & processing.

Parameters

config (sklearn.utils.Bunch) – A Bunch dictionary containing the configuration parameters for data loading & pre-processing. Created via yload() of the YAML config file.

Returns

detection_df (pandas.DataFrame) - DataFrame containing the detection events.
mdb (sklearn.utils.Bunch) - Metadata associated with the detection events.

range_driver.data_prep.rt_name(grdf, metadata, dist_str=None)[source]¶

Construct a name for given receiver/tag combination

Parameters

grdf – DataFrame with ‘Transmitter’, ‘Tag Family’, and ‘Power’ columns
metadata – Bunch dict with ‘transmitter’
dist_str (optional) – Function handle that returns a string representation when given a distance in m

Returns

String that combines info, such as tag/receiver names, power level, and distance

range_driver.data_prep.unpack_column_name(column_name)[source]¶: Returns an unpacked tuple of (1) the column name in the DataFrame and (2) the full column name for reporting. If column_name is a tuple, it is simply unpacked. Otherwise, the column_name string is used for both the DataFrame column id and the print name.
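
The behavior described above can be sketched in a few lines. The helper name unpack_column_name_sketch and the example column names are hypothetical; the sketch follows the description rather than the library source.

```python
def unpack_column_name_sketch(column_name):
    """Hypothetical helper: return (DataFrame column id, reporting name)."""
    if isinstance(column_name, tuple):
        # A tuple already carries both names
        df_column, report_name = column_name
        return df_column, report_name
    # A plain string serves as both the column id and the print name
    return column_name, column_name

pair_from_tuple = unpack_column_name_sketch(("dr", "Detection rate"))
pair_from_str = unpack_column_name_sketch("interval")
```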