Data Preparation
- range_driver.data_prep.add_rt_group_info(events_df, metadata)
Calls get_all_group_info() for events_df and merges the info into metadata.rt_groups
- range_driver.data_prep.calc_intervals(grdf, field='datetime')
Calculate time intervals between adjacent detection events
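As a rough illustration of what an interval calculation of this kind can look like, the sketch below derives the time elapsed since the previous event from a datetime column. The helper name `intervals_sketch`, the output column name `interval`, and the choice of seconds as the unit are assumptions made for this example; they are not taken from the range_driver source.

```python
import pandas as pd


def intervals_sketch(grdf, field="datetime"):
    """Hypothetical stand-in: seconds elapsed since the previous event."""
    # Sort by time so that "adjacent" means consecutive in time.
    ordered = grdf.sort_values(field)
    # diff() gives a Timedelta per row; the first row has no predecessor (NaN).
    return ordered[field].diff().dt.total_seconds()


events = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00:00",
        "2020-01-01 00:05:00",
        "2020-01-01 00:12:30",
    ])
})
events["interval"] = intervals_sketch(events)
```

The first event has no predecessor, so its interval is left as NaN rather than being set to zero.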
- range_driver.data_prep.calc_station_dists_m(deploy_lat_lon)
Calculate geodesic distances between stations.
- Parameters
deploy_lat_lon (pandas.DataFrame) – DataFrame with station latitude and longitude columns (in that order).
- Returns
A dataframe containing the distance (in meters) between each pair of stations.
- Return type
pandas.DataFrame
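To make the idea of a pairwise station distance table concrete, here is a minimal standard-library sketch. It uses the haversine great-circle formula on a spherical Earth, which only approximates a true geodesic on an ellipsoid and is not necessarily what calc_station_dists_m does internally. The names `haversine_m`, `pairwise_dists_m`, the station labels, and the nested-dict result are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pairwise_dists_m(stations):
    """Build a symmetric {station: {station: meters}} table.

    `stations` maps a station name to a (latitude, longitude) tuple,
    mirroring the latitude-then-longitude column order described above.
    """
    names = list(stations)
    return {
        a: {b: haversine_m(*stations[a], *stations[b]) for b in names}
        for a in names
    }


stations = {"STN_A": (0.0, 0.0), "STN_B": (0.0, 1.0)}  # made-up coordinates
dists = pairwise_dists_m(stations)
```

For sub-meter accuracy over long baselines, an ellipsoidal geodesic routine would be the more appropriate choice.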
- range_driver.data_prep.clean_nsog_raw_detections(df_detections_raw, dates=True, rt_ids=True, select_cols=True)
- range_driver.data_prep.clean_raw_detections(df_detections_raw, dates=True, rt_ids=True, select_cols=True)
- range_driver.data_prep.detection_rate_grid(detection_df, time_bin_length, metadata, auto_dr=False)
Group detections into timestamp bins and analyze them at the group level. The aggregated bin data is appended to the end of the detections DataFrame.
- Parameters
detection_df (pandas.DataFrame) – Dataframe containing the detection events
time_bin_length (str) – A string that can be coerced/converted into a pandas.Timedelta object (e.g. “60Min”, “1day”)
metadata (sklearn.utils.Bunch) – Metadata associated with the detection events
auto_dr (bool) – If True, automatically estimate the programmed tag transmission rate
- Returns
detection_df (pandas.DataFrame) - DataFrame containing the detection events, grouped into timestamp bins. New columns hold the detection rate and counts for each bin. New rows hold the aggregated data for the timestamp bins.
event_bin_split (int) - The row number of the first new row.
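The binning step can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: it only counts detections per bin and divides by the number of transmissions expected in that bin, and it does not append rows to the input frame. The helper name `bin_detection_rate`, the `expected_interval_s` parameter, and the output column names are assumptions.

```python
import pandas as pd


def bin_detection_rate(detection_df, time_bin_length, expected_interval_s):
    """Hypothetical sketch: detections per time bin relative to expected pings."""
    # Coerce the bin length string (e.g. "10min", "1day") into a Timedelta.
    bin_td = pd.Timedelta(time_bin_length)
    # Assign each detection to the start of the bin it falls into.
    bin_start = detection_df["datetime"].dt.floor(bin_td)
    counts = detection_df.groupby(bin_start).size()
    # Number of transmissions expected in one bin at the assumed tag interval.
    expected = bin_td.total_seconds() / expected_interval_s
    return pd.DataFrame({"count": counts, "detection_rate": counts / expected})


detections = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 00:00:00",
        "2020-01-01 00:05:00",
        "2020-01-01 00:10:00",
    ])
})
bins = bin_detection_rate(detections, "10min", expected_interval_s=300)
```

With a 300-second tag interval, a 10-minute bin can hold two transmissions, so a bin with a single detection gets a rate of 0.5.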
- range_driver.data_prep.dist_str_th(dist, dist_th)
Determine ‘F’ar, ‘N’ear, or ‘U’nknown string for distances above or below dist_th
- Args:
dist (float): distance in meters
dist_th (float): distance threshold
- Returns:
Short string or single char that classifies distance
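A threshold classifier of this shape might look like the sketch below. The function name `classify_distance` is hypothetical, and two behaviours are assumptions rather than facts from the source: missing or NaN distances map to 'U', and a distance exactly equal to the threshold counts as 'N'.

```python
import math


def classify_distance(dist, dist_th):
    """Hypothetical sketch: 'F'ar, 'N'ear, or 'U'nknown relative to dist_th."""
    # Assumption: a missing or NaN distance cannot be classified.
    if dist is None or math.isnan(dist):
        return "U"
    # Assumption: the threshold itself counts as near.
    return "F" if dist > dist_th else "N"


labels = [classify_distance(d, 300.0) for d in (500.0, 100.0, None, float("nan"))]
```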
- range_driver.data_prep.dr_estimate_and_cutoff(drs)
Estimate unknown detection rate and determine init sequence cutoff point
- Parameters
drs – Detection rate sequence over fixed time intervals
- Returns
dr_max - maximum detection rate for majority of time
cutoff - date of first valid detection bin
cutoff_loc - raw index of first valid detection bin
- range_driver.data_prep.estimate_det_max(drs)
Estimate the maximum detection rate that holds for the majority of the time, ignoring the higher detection rate during the init sequence.
Use this when the programmed tag interval length is unknown.
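One possible heuristic for this kind of estimate is sketched below; it is not claimed to be the algorithm used by estimate_det_max or dr_estimate_and_cutoff. It takes a high quantile of the per-bin rates as a robust maximum, so that a short burst of high rates at the start is ignored, and then locates the first bin that does not exceed it. The helper names, the 0.9 quantile, and the sample data are all assumptions.

```python
def robust_max_rate(drs, quantile=0.9):
    """Hypothetical heuristic: a high quantile as the 'usual' maximum rate."""
    ordered = sorted(drs)
    # Pick the value at the given quantile position (no interpolation).
    return ordered[int(quantile * (len(ordered) - 1))]


def first_valid_bin(drs, dr_max):
    """Index of the first bin whose rate does not exceed dr_max, else None."""
    for loc, rate in enumerate(drs):
        if rate <= dr_max:
            return loc
    return None


# Made-up counts per bin: two init-sequence bins followed by regular operation.
drs = [12, 11] + [4, 3, 4, 4, 2, 4, 3, 4] * 5
dr_max = robust_max_rate(drs)
cutoff_loc = first_valid_bin(drs, dr_max)
```

The quantile approach only works when the init sequence covers a small fraction of the bins; for short recordings a different estimator would be needed.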
- range_driver.data_prep.get_all_group_info(detections_df, metadata)
Create a dataframe with group_info() for each group in detections_df
- range_driver.data_prep.group_info(grdf=None, metadata=None)
Information about a Receiver/Transmitter group
- Args:
grdf - DataFrame with detections and ‘interval’ column, or None
metadata - metadata dict as returned by read_otn_metadata
- Returns:
info tuple about r/t group OR tuple element names, if grdf is None
- range_driver.data_prep.make_column(tdf, column_name)
Modify DataFrame tdf to add a variable named column_name. tdf remains unchanged if the column is already available.
- range_driver.data_prep.old_make_detection_rate(tdfok, exp_interval_s=300, num_time_bins=200)
Calculate detection rate
- range_driver.data_prep.old_process_detections(gr, params)
Add interval length to dataframe
- Returns:
tdfok - processed detections with detection rate
cutoff_t - interval threshold determined to remove invalid lead pings
tdf - full dataframe with detection interval calculations
- range_driver.data_prep.old_resample_detection_rate(tdfok, det_rate)
Add detection rate to full dataframe
- range_driver.data_prep.process_detections(ev_df, params)
Perform some computations on the detection event dataframe
- range_driver.data_prep.process_intervals(detection_df, metadata)
Calculate detection interval lengths and split out init sequences
- Parameters
detection_df (pandas.DataFrame) – Cleaned detection data as returned by clean_raw_detections() or read_otn_data()
metadata (dict) – Metadata dictionary as returned by read_otn_data(). New member rt_groups will be added to metadata.
- Returns
df_dets (pandas.DataFrame) - DataFrame containing the detection events.
df_inits (pandas.DataFrame) - DataFrame containing the detections from the short-interval initial sequence
rt_groups (pandas.DataFrame) - DataFrame containing the summary of receiver/transmitter groups
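The split between the short-interval initial sequence and the regular detections can be illustrated with the sketch below. It assumes an `interval` column in seconds and treats every row before the first interval that reaches a threshold as part of the init sequence. The helper name `split_init_sequence`, the `min_interval_s` threshold, and this particular splitting rule are assumptions and may differ from what process_intervals actually does.

```python
import pandas as pd


def split_init_sequence(df, min_interval_s):
    """Hypothetical sketch: separate leading short-interval rows from the rest."""
    # True where the interval is at least the threshold (NaN compares as False).
    regular = (df["interval"] >= min_interval_s).astype(int)
    # Once a regular interval has been seen, all later rows count as regular.
    started = regular.cummax().astype(bool)
    return df[started], df[~started]


events = pd.DataFrame({
    "interval": [float("nan"), 10.0, 10.0, 300.0, 30.0, 310.0],
})
df_dets, df_inits = split_init_sequence(events, min_interval_s=60)
```

A later short interval, such as the 30-second one above, stays with the regular detections because only the leading run is treated as the init sequence.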
- range_driver.data_prep.read_nsog_data(detections_csv, vendor_tag_specs=None, merge=False, bunch=False)
All in one function to read NSOG data
- Args:
detections_csv - csv file name for detection data
vendor_tag_specs - xls file name for vendor extracted metadata
merge - bool whether to merge metadata into detections
bunch - bool whether to return metadata as Bunch dict
- Returns:
df_detections, df_deploy_meta, transmitter - if bunch == False
df_detections, metadata - otherwise, where metadata is a dict with metadata
- range_driver.data_prep.read_otn_data(detections_csv, otn_metadata=None, vendor_tag_specs=None, merge=True, bunch=False)
All in one function to read OTN data
- Args:
detections_csv - csv file name for detection data
otn_metadata - xls file name for OTN-style metadata
vendor_tag_specs - xls file name for vendor extracted metadata
merge - bool whether to merge metadata into detections
bunch - bool whether to return metadata as Bunch dict
- Returns:
df_detections, df_deploy_meta, transmitter - if bunch == False
df_detections, metadata - otherwise, where metadata is a dict with metadata
- range_driver.data_prep.read_via_config(config)
Invoke configured data loading & processing.
- Parameters
config (sklearn.utils.Bunch) – A Bunch dictionary containing the configuration parameters for data loading & pre-processing. Created via yload() of the YAML config file.
- Returns
detection_df (pandas.DataFrame) - DataFrame containing the detection events.
mdb (sklearn.utils.Bunch) - Metadata associated with the detection events.
- range_driver.data_prep.rt_name(grdf, metadata, dist_str=None)
Construct a name for given receiver/tag combination
- Args:
grdf: dataframe with ‘Transmitter’, ‘Tag Family’, and ‘Power’ columns
metadata: bunch dict with ‘transmitter’
dist_str (optional): function handle that returns a string representation when given a distance in m
- Returns:
String that combines info, such as tag/receiver names, power-level, and distance