Readers

Defines several methods for analyzing, plotting, and exporting wearable data, including a Pandas accessor for wearable dataframes

Overview

The circadian.readers module contains several methods for working with wearable data such as step counts, heart rate, and sleep. It also defines a Pandas accessor called WearableData to standardize and validate wearable dataframes.

Loading wearable data

The circadian.readers module can import files in several formats, including CSV files with raw counts, JSON files, and CSV files exported by Actiwatch readers. For example, to load a CSV file with heart rate data we can do:

from circadian.readers import load_csv
file_path = 'circadian/sample_data/hr_data.csv'
df_hr = load_csv(file_path, timestamp_col='timestamp')

	heartrate	timestamp	datetime
0	79.0	4.688359e+07	1971-06-27 15:13:12.693424232
1	80.0	4.688329e+07	1971-06-27 15:08:09.693448064
2	81.0	4.688306e+07	1971-06-27 15:04:20.692736632
3	80.0	4.688273e+07	1971-06-27 14:58:46.686474800
4	85.0	4.688257e+07	1971-06-27 14:56:08.187120912
...	...	...	...
99995	97.0	3.271680e+07	1971-01-14 15:59:56.779711960
99996	95.0	3.271679e+07	1971-01-14 15:59:49.779711960
99997	95.0	3.271679e+07	1971-01-14 15:59:48.779711960
99998	95.0	3.271678e+07	1971-01-14 15:59:43.779711960
99999	93.0	3.271677e+07	1971-01-14 15:59:34.779711960

100000 rows × 3 columns

By indicating which column contains the unix timestamp information, load_csv automatically generates a new datetime column. If no timestamp column is provided, the file is assumed to contain a column named 'datetime' (or columns named 'start' and 'end'). For data specified via time intervals, such as step counts, no new column is generated and the user can choose how to process the data. For example, to load a CSV file with step counts we can do:

file_path = 'circadian/sample_data/steps_data.csv'
df_steps = load_csv(file_path)

	start	end	steps
0	1970-01-01 00:00:00	1970-01-01 00:01:00	21.000000
1	1970-01-01 00:49:00	1970-01-01 00:50:00	8.183578
2	1970-01-01 00:50:00	1970-01-01 00:51:00	19.816422
3	1970-01-01 01:51:00	1970-01-01 01:52:00	0.571419
4	1970-01-01 01:52:00	1970-01-01 01:53:00	26.499032
...	...	...	...
222765	1971-06-27 14:24:00	1971-06-27 14:25:00	28.006870
222766	1971-06-27 14:25:00	1971-06-27 14:26:00	15.957981
222767	1971-06-27 14:26:00	1971-06-27 14:27:00	14.000000
222768	1971-06-27 14:37:00	1971-06-27 14:38:00	72.642453
222769	1971-06-27 14:38:00	1971-06-27 14:39:00	31.995192

222770 rows × 3 columns
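The timestamp handling described above can be approximated in plain pandas. The sketch below is a hypothetical stand-in (load_csv_sketch is not part of circadian.readers) and assumes the timestamp column holds seconds since the unix epoch, which is consistent with the sample output, where 4.688359e+07 maps to a date on 1971-06-27.

```python
import io

import pandas as pd


def load_csv_sketch(source, timestamp_col=None, **kwargs):
    """Hypothetical stand-in for load_csv: read a CSV and, if a unix
    timestamp column is named, derive a `datetime` column from it."""
    # Any extra keyword arguments are forwarded unchanged to pd.read_csv
    df = pd.read_csv(source, **kwargs)
    if timestamp_col is not None:
        # Assumption: timestamps are seconds since the unix epoch
        df["datetime"] = pd.to_datetime(df[timestamp_col], unit="s")
    return df


# Two rows copied from the heart rate sample shown earlier
csv_text = "heartrate,timestamp\n79.0,46883592.693424232\n80.0,46883289.693448064\n"
df_hr_sketch = load_csv_sketch(io.StringIO(csv_text), timestamp_col="timestamp")
print(df_hr_sketch)
```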

Additionally, we can import data in JSON format. For example, to load a JSON file with multiple streams of wearable data we can do:

from circadian.readers import load_json
file_path = 'circadian/sample_data/sample_data.json'
df_dict = load_json(file_path)
print(df_dict.keys())

dict_keys(['wake', 'steps', 'heartrate'])

where df_dict is a dictionary containing one dataframe per stream, keyed by stream name. For example, to access the dataframe with the wake data we can do:

df_wake = df_dict['wake']

	start	end	wake
0	1970-02-03 04:49:01.000000	1970-02-03 09:01:00.000000	0
1	1970-02-03 09:02:00.000000	1970-02-03 11:25:00.000000	0
2	1970-02-04 04:51:01.000000	1970-02-04 12:35:00.000000	0
3	1970-02-04 12:36:00.000000	1970-02-04 12:37:00.000000	0
4	1970-02-04 12:38:00.000000	1970-02-04 12:39:00.000000	0
...	...	...	...
2750	1971-06-27 07:38:31.105829	1971-06-27 08:01:01.105829	0
2751	1971-06-27 08:03:01.105829	1971-06-27 08:55:31.105829	0
2752	1971-06-27 09:05:31.105829	1971-06-27 09:07:01.105829	0
2753	1971-06-27 09:08:01.105829	1971-06-27 12:06:01.105829	0
2754	1971-06-27 12:08:01.105829	1971-06-27 12:15:31.105829	0

2755 rows × 3 columns
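Because load_json returns one dataframe per stream, it can help to check each stream's size and time span before combining them. The helper below, summarize_streams, is hypothetical (not part of circadian.readers) and assumes interval streams use start/end columns while timestamped streams use a datetime column, as in the outputs shown here; the toy rows are copied from those outputs.

```python
import pandas as pd


def summarize_streams(df_dict):
    """Hypothetical helper: row count and time coverage of each stream."""
    rows = []
    for name, df in df_dict.items():
        # Interval streams use start/end; timestamped streams use datetime
        first_col = "start" if "start" in df.columns else "datetime"
        last_col = "end" if "end" in df.columns else "datetime"
        rows.append({
            "stream": name,
            "rows": len(df),
            "first": df[first_col].min(),
            "last": df[last_col].max(),
        })
    return pd.DataFrame(rows).set_index("stream")


demo = {
    "wake": pd.DataFrame({
        "start": pd.to_datetime(["1970-02-03 04:49:01", "1970-02-03 09:02:00"]),
        "end": pd.to_datetime(["1970-02-03 09:01:00", "1970-02-03 11:25:00"]),
        "wake": [0, 0],
    }),
    "heartrate": pd.DataFrame({
        "datetime": pd.to_datetime(["1971-01-14 15:59:34", "1971-01-14 15:59:43"]),
        "heartrate": [93.0, 95.0],
    }),
}
print(summarize_streams(demo))
```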

The circadian.readers module only accepts specific column names for wearable data. The accepted column names are stored in VALID_WEARABLE_STREAMS:

['steps', 'heartrate', 'wake', 'light_estimate', 'activity']
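A quick way to see which columns of a dataframe match the accepted names is to compare them against this list. The sketch below redefines the list locally so it runs on its own and uses a hypothetical helper, split_stream_columns; treating datetime, start, end, and timestamp as non-stream time columns is an assumption.

```python
import pandas as pd

# Local copy of the list shown above, for a self-contained example
VALID_WEARABLE_STREAMS = ['steps', 'heartrate', 'wake', 'light_estimate', 'activity']


def split_stream_columns(df):
    """Hypothetical helper: separate recognized stream columns from the rest."""
    time_cols = {"datetime", "start", "end", "timestamp"}  # assumed time columns
    recognized = [c for c in df.columns if c in VALID_WEARABLE_STREAMS]
    unrecognized = [
        c for c in df.columns
        if c not in VALID_WEARABLE_STREAMS and c not in time_cols
    ]
    return recognized, unrecognized


example = pd.DataFrame(columns=["datetime", "heartrate", "Heart Rate", "spo2"])
print(split_stream_columns(example))  # (['heartrate'], ['Heart Rate', 'spo2'])
```

Note that "Heart Rate" is not recognized until its name is standardized to lowercase with underscores.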

Finally, we can import data from Actiwatch readers. For example, to load a CSV file with data from an Actiwatch reader we can do:

from circadian.readers import load_actiwatch
file_path = 'circadian/sample_data/sample_actiwatch.csv'
df_actiwatch = load_actiwatch(file_path)

	activity	light_estimate	wake	datetime
0	91.0	318.16	1.0	2019-02-20 12:32:00
1	125.0	285.38	1.0	2019-02-20 12:32:30
2	154.0	312.05	1.0	2019-02-20 12:33:00
3	424.0	294.61	1.0	2019-02-20 12:33:30
4	385.0	285.06	1.0	2019-02-20 12:34:00
...	...	...	...	...
55646	0.0	5.02	0.0	2019-03-11 20:15:00
55647	56.0	4.56	1.0	2019-03-11 20:15:30
55648	30.0	2.85	1.0	2019-03-11 20:16:00
55649	9.0	2.39	0.0	2019-03-11 20:16:30
55650	2.0	2.20	NaN	2019-03-11 20:17:00

55651 rows × 4 columns

Note that load_actiwatch automatically generates a new datetime column and standardizes the column names.

Resampling wearable data

The circadian.readers module can resample data specified either via time intervals or via timestamps. For example, to resample a dataframe with step counts we can do:

from circadian.readers import resample_df
name = 'steps'
resample_freq = '1D'
agg_method = 'sum'
resampled_steps = resample_df(df_steps, name, resample_freq, agg_method)

	datetime	steps
0	1970-01-01	847.000000
1	1970-01-02	1097.000000
2	1970-01-03	1064.000000
3	1970-01-04	2076.000000
4	1970-01-05	2007.000000
...	...	...
538	1971-06-23	9372.098478
539	1971-06-24	10137.802450
540	1971-06-25	14977.306682
541	1971-06-26	5644.161346
542	1971-06-27	3823.642766

543 rows × 2 columns

where resample_freq is the resampling frequency written as a Pandas offset alias (for example, '1D' for one day), name is the column to resample, and agg_method is the aggregation applied within each bin.
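Both kinds of resampling can be approximated directly in pandas. This sketch is not resample_df itself: it assigns each interval to the bin containing its start, whereas the resample_df API documentation notes that interval data is converted to a per-minute density, so results can differ when an interval crosses a bin boundary. The toy rows are illustrative.

```python
import pandas as pd

# Toy interval data in the same layout as df_steps (start, end, steps)
steps = pd.DataFrame({
    "start": pd.to_datetime(["1970-01-01 00:00", "1970-01-01 00:49", "1970-01-02 10:00"]),
    "end": pd.to_datetime(["1970-01-01 00:01", "1970-01-01 00:50", "1970-01-02 10:01"]),
    "steps": [21.0, 8.0, 30.0],
})

# Simplification: each interval counts toward the bin containing its start
daily = (
    steps.set_index("start")["steps"]
    .resample("1D")
    .sum()
    .rename_axis("datetime")
    .reset_index()
)
print(daily)

# Timestamped data: average heart rate per day
hr = pd.DataFrame({
    "datetime": pd.to_datetime(["1971-01-14 15:59:34", "1971-01-14 15:59:43", "1971-01-15 08:00:00"]),
    "heartrate": [93.0, 95.0, 80.0],
})
hr_daily = hr.set_index("datetime")["heartrate"].resample("1D").mean()
print(hr_daily)
```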

Combining wearable data

We can combine wearable data from different streams into a single dataframe with the combine_wearable_dataframes function, which resamples and aggregates each stream to produce a dataframe with a single datetime column and one column per stream. For example, to combine the dataframes loaded in the previous section we would do:

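from circadian.readers import combine_wearable_dataframes
df_dict = {
    'heartrate': df_hr,
    'steps': df_steps,
    'wake': df_wake
}
resample_freq = '1W'  # weekly bins, matching the 7-day spacing of the output below
combined_data = combine_wearable_dataframes(df_dict, resample_freq)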

	datetime	heartrate	steps	wake
0	1970-01-04	0.000000	16188.000000	0.0
1	1970-01-11	0.000000	19199.000000	0.0
2	1970-01-18	0.000000	17888.000000	0.0
3	1970-01-25	0.000000	31879.933210	0.0
4	1970-02-01	0.000000	55148.103393	0.0
...	...	...	...	...
73	1971-05-30	79.914844	63334.731963	0.0
74	1971-06-06	97.080529	96282.157729	0.0
75	1971-06-13	93.772603	58306.829809	0.0
76	1971-06-20	99.018829	75375.797207	0.0
77	1971-06-27	97.370401	3823.642766	0.0

78 rows × 4 columns

For resampling, each wearable stream has a default aggregation method. The defaults are defined in the variable wearable_RESAMPLE_METHOD:

{'steps': 'sum', 'wake': 'max', 'heartrate': 'mean', 'light_estimate': 'mean', 'activity': 'mean'}
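Conceptually, the combination step resamples each stream with its default aggregation and aligns the results on a shared datetime axis. The sketch below uses a hypothetical function, combine_sketch, that works on pandas Series indexed by datetime rather than on the dataframes combine_wearable_dataframes accepts; filling empty bins with 0 is an assumption based on the zeros in the combined output above.

```python
import pandas as pd

# Local copy of the defaults listed above
RESAMPLE_METHOD = {'steps': 'sum', 'wake': 'max', 'heartrate': 'mean',
                   'light_estimate': 'mean', 'activity': 'mean'}


def combine_sketch(series_dict, freq):
    """Hypothetical, simplified combine: resample each stream with its default
    aggregation and align the results on one datetime index."""
    resampled = {
        name: s.resample(freq).agg(RESAMPLE_METHOD[name])
        for name, s in series_dict.items()
    }
    combined = pd.concat(resampled, axis=1)
    # Bins with no data for a stream become NaN after alignment; filling
    # them with 0 is an assumption made for this sketch
    return combined.fillna(0).rename_axis("datetime").reset_index()


idx = pd.to_datetime(["1970-01-01 08:00", "1970-01-01 20:00", "1970-01-02 09:00"])
streams = {
    "steps": pd.Series([100.0, 50.0, 70.0], index=idx),
    "heartrate": pd.Series([60.0, 80.0, 70.0], index=idx),
    "wake": pd.Series([1.0], index=pd.to_datetime(["1970-01-02 03:00"])),
}
print(combine_sketch(streams, "1D"))
```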

API Documentation

WearableData

 WearableData (pandas_obj)

pd.DataFrame accessor implementing wearable-specific methods

WearableData.is_valid

 WearableData.is_valid ()

WearableData.add_metadata

 WearableData.add_metadata (metadata:Dict[str,str], inplace:bool=False)

	Type	Default	Details
metadata	Dict		metadata containing data_id, subject_id, or other_info
inplace	bool	False	whether to return a new dataframe or modify the current one
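WearableData is registered as a pandas DataFrame accessor. The sketch below shows that registration pattern using pandas' public pd.api.extensions.register_dataframe_accessor under a hypothetical accessor name, wearable_sketch. The validity rule in is_valid and storing metadata in DataFrame.attrs are assumptions for illustration, not a description of what WearableData.is_valid or WearableData.add_metadata actually do.

```python
import pandas as pd

VALID_STREAMS = ['steps', 'heartrate', 'wake', 'light_estimate', 'activity']


@pd.api.extensions.register_dataframe_accessor("wearable_sketch")
class WearableSketch:
    """Hypothetical accessor illustrating the pattern WearableData follows."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def is_valid(self):
        # Assumed rule: a datetime column (or start/end) plus at least one stream
        cols = set(self._obj.columns)
        has_time = "datetime" in cols or {"start", "end"} <= cols
        return has_time and any(c in cols for c in VALID_STREAMS)

    def add_metadata(self, metadata, inplace=False):
        # Assumption: metadata is kept in DataFrame.attrs
        target = self._obj if inplace else self._obj.copy()
        target.attrs.update(metadata)
        return None if inplace else target


df = pd.DataFrame({"datetime": pd.to_datetime(["1970-01-01"]), "steps": [10.0]})
tagged = df.wearable_sketch.add_metadata({"subject_id": "S01"})
print(df.wearable_sketch.is_valid(), tagged.attrs)  # True {'subject_id': 'S01'}
```

With inplace=False the original dataframe's attrs stay untouched; with inplace=True the method mutates the dataframe and returns None, mirroring the inplace convention in the table above.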

WearableData.rename_columns

 WearableData.rename_columns (df, inplace:bool=False)

Standardize column names by making them lowercase and replacing spaces with underscores
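The standardization described above amounts to a column rename. A minimal sketch with the hypothetical function standardize_columns; the column names are illustrative, and the inplace option is omitted.

```python
import pandas as pd


def standardize_columns(df):
    """Hypothetical stand-in for rename_columns: lowercase names and
    replace spaces with underscores."""
    return df.rename(columns=lambda c: c.lower().replace(" ", "_"))


raw = pd.DataFrame(columns=["Activity", "Light Estimate", "Wake"])
print(list(standardize_columns(raw).columns))  # ['activity', 'light_estimate', 'wake']
```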

load_json

 load_json (filepath:str, metadata:Dict[str,str]=None)

Create a dataframe from a json containing a single or multiple streams of wearable data

	Type	Default	Details
filepath	str		path to file
metadata	Dict	None	metadata containing data_id, subject_id, or other_info
Returns	Dict		dictionary of wearable dataframes, one key:value pair per wearable data stream

load_csv

 load_csv (filepath:str, metadata:Dict[str,str]=None,
           timestamp_col:str=None, *args, **kwargs)

Create a dataframe from a csv containing wearable data

	Type	Default	Details
filepath	str		full path to csv file to be loaded
metadata	Dict	None	metadata containing data_id, subject_id, or other_info
timestamp_col	str	None	name of the column to be used as timestamp. If None, it is assumed that a `datetime` column exists
args	VAR_POSITIONAL		arguments to pass to pd.read_csv
kwargs	VAR_KEYWORD

load_actiwatch

 load_actiwatch (filepath:str, metadata:Dict[str,str]=None, *args,
                 **kwargs)

Create a dataframe from an actiwatch csv file

	Type	Default	Details
filepath	str		full path to csv file to be loaded
metadata	Dict	None	metadata containing data_id, subject_id, or other_info
args	VAR_POSITIONAL		arguments to pass to pd.read_csv
kwargs	VAR_KEYWORD
Returns	DataFrame		dataframe with the wearable data

resample_df

 resample_df (df:pandas.core.frame.DataFrame, name:str, freq:str,
              agg_method:str,
              initial_datetime:pandas._libs.tslibs.timestamps.Timestamp=None,
              final_datetime:pandas._libs.tslibs.timestamps.Timestamp=None)

Resample a wearable dataframe. If data is specified in intervals, returns the density of the quantity per minute.

	Type	Default	Details
df	DataFrame		dataframe to be resampled
name	str		name of the wearable data to resample (one of steps, heartrate, wake, light_estimate, or activity)
freq	str		frequency to resample to. String must be a valid pandas frequency string (e.g. ‘1min’, ‘5min’, ‘1H’, ‘1D’). See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
agg_method	str		aggregation method to use when resampling
initial_datetime	Timestamp	None	initial datetime to use when resampling. If None, the minimum datetime in the dataframe is used
final_datetime	Timestamp	None	final datetime to use when resampling. If None, the maximum datetime in the dataframe is used
Returns	DataFrame		resampled dataframe
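initial_datetime and final_datetime fix the span of the resampled output. One way to get a similar effect in pandas is to reindex a resampled series onto an explicit date_range, which leaves bins without data as NaN. This is a sketch of the idea under that assumption, not the resample_df implementation; the bounds are hypothetical values.

```python
import pandas as pd

hr = pd.Series(
    [93.0, 95.0],
    index=pd.to_datetime(["1971-01-14 15:59:34", "1971-01-14 15:59:43"]),
)
hourly = hr.resample("1h").mean()

# Hypothetical bounds standing in for initial_datetime / final_datetime
initial = pd.Timestamp("1971-01-14 14:00")
final = pd.Timestamp("1971-01-14 17:00")

# Reindex onto the full range; hours without data become NaN
padded = hourly.reindex(pd.date_range(initial, final, freq="1h"))
print(padded)
```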

combine_wearable_dataframes

 combine_wearable_dataframes (df_dict:Dict[str,pandas.core.frame.DataFrame],
                              resample_freq:str, metadata:Dict[str,str]=None)

Combine a dictionary of wearable dataframes into a single dataframe with resampling

	Type	Default	Details
df_dict	Dict		dictionary of wearable dataframes
resample_freq	str		resampling frequency (e.g. ‘10min’ for 10 minutes, see Pandas Offset aliases: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)
metadata	Dict	None	metadata for the combined dataframe
Returns	DataFrame		combined wearable dataframe