pisces
This package provides a framework for running machine learning experiments in the sleep classification space. The core of the package is the SleepWakeClassifier class, which formalizes an API for pre-processing raw sensor data into model-specific formats and for scoring, together with automated data set and subject/feature discovery based on a lightweight folder structure.
Also included is an example notebook showing pisces being used for analysis in a forthcoming paper from Arcascope. That paper studies the potential impact on scoring accuracy when a sleep classifier is trained on stationary subjects in a sleep study and then deployed for inference on a Naval vessel, where ambient mechanical vibrations affect the accelerometer recordings that many approaches to sleep classification depend on.
The pipeline is designed to be flexible and can be easily extended to include new models, datasets, and evaluation metrics.
Installation
We will soon make Pisces available on PyPI, but for the time being you can clone this repository and install the package locally.
Start by creating a Python or conda environment with Python 3.11 and installing the requirements from the requirements file, replacing {env_name} with the name you’d like to give it, such as pisces_env:
```bash
conda create -n {env_name} python=3.11
conda activate {env_name}
```
In the same terminal (so that your new conda environment is active), navigate to the directory where you’d like to clone the package, then run the following commands to clone it and install the package in editable mode with pip's -e flag:
```bash
git clone https://github.com/Arcascope/pisces.git
cd pisces
pip install -e .
```
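A quick way to confirm the editable install worked is to import the package from the environment you just created. This is only a sanity check; DataSetObject is the entry point described in the Usage section below.

```python
# Sanity check: import the package and its main entry point.
# For an editable install, pisces.__file__ should point into your cloned repository.
import pisces
from pisces.experiments import DataSetObject

print(pisces.__file__)
```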
Common issues

You may end up with a version of Keras that is incompatible with the marshalled data in pisces/cached_models. In that case, re-run the generation notebook <pisces>/analyses/convert_mads_olsen_model_to_keras.ipynb.
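If you prefer not to open that notebook interactively, one way to re-execute it from the command line (assuming Jupyter/nbconvert is installed in your environment) is:

```bash
# Re-execute the model-conversion notebook in place.
jupyter nbconvert --to notebook --execute --inplace analyses/convert_mads_olsen_model_to_keras.ipynb
```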
Usage
The primary module to import is pisces.experiments, which contains the classes used for discovering and providing access to data sets in your chosen folder, as well as (trainable) classifiers.
Pisces is designed to be extended to support new models, datasets, and evaluation metrics. The analyses folder contains example notebooks that demonstrate how to use this code for comparing classifier performance on in-distribution and out-of-distribution accelerometer data.
Data Sets
Pisces automatically discovers data sets that match a simple, flexible format inside a given directory. The notebook analyses/stationary_vs_hybrid.ipynb finds the data contained in the data_sets folder of the Pisces repository. The code is simple:
```python
from pisces.experiments import DataSetObject

sets = DataSetObject.find_data_sets("../data_sets")
walch = sets['walch_et_al']
hybrid = sets['hybrid_motion']
```
Now we have two DataSetObjects, walch and hybrid, that can be queried for their subjects and features. They were discovered because they are folders inside data_sets with a compatible structure.
These two sets were discovered because each contains at least one subdirectory matching the glob expression cleaned_*. Every subdirectory that matches this pattern is considered a feature, so based on the directory listing below, Pisces discovers that hybrid_motion and walch_et_al both have psg, accelerometer, and activity features, in addition to any other matching folders not listed here.
The data_sets directory looks like:
```
data_sets
├── walch_et_al
│   ├── cleaned_accelerometer
│   │   ├── 46343_cleaned_motion.out
│   │   ├── 759667_cleaned_motion.out
│   │   ├── ...
│   ├── cleaned_activity
│   │   ├── 46343_cleaned_counts.out
│   │   ├── 759667_cleaned_counts.out
│   │   ├── ...
│   ├── cleaned_psg
│   │   ├── 46343_cleaned_psg.out
│   │   ├── 759667_cleaned_psg.out
│   │   ├── ...
├── hybrid_motion
│   ├── cleaned_accelerometer
│   │   ├── 46343.csv
│   │   ├── 759667.csv
│   │   ├── ...
│   ├── cleaned_activity
│   │   ├── 46343.csv
│   │   ├── 759667.csv
│   │   ├── ...
│   ├── cleaned_psg
│   │   ├── 46343_labeled_sleep.txt
│   │   ├── 759667_labeled_sleep.txt
│   │   ├── ...
```
Key takeaways for data set discovery:
- A data set is discovered based on the presence of at least one subdirectory matching the glob expression `cleaned_*`.
- Every subdirectory that matches this pattern is considered a feature; features are named after the part matching `*`.
- Subjects are computed per-feature, based on the variable and constant parts of the filenames within each feature directory (a plain-Python sketch of this rule follows this list). Said in a less fancy way: because the `walch_et_al` accelerometer folder contains the files `46343_cleaned_motion.out` and `759667_cleaned_motion.out`, which have `_cleaned_motion.out` in common, Pisces identifies `46343` and `759667` as subject IDs that have accelerometer feature data for `walch_et_al`.
- It is no problem if some subjects are missing a certain feature. When feature data is requested for an existing subject who lacks that feature, the feature will return `None` for that subject.
- The naming scheme can vary greatly between features. However, the subject ID MUST be the prefix of the filenames. For example, `46343_cleaned_motion.out` and `46343_labeled_sleep.txt` are both for the same subject, `46343`. If instead we named those `final_46343_cleaned_motion.out` and `46343_labeled_sleep.txt`, then the subject’s data would be broken into two subjects, `46343` and `final_46343`.
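The following is a minimal, illustrative sketch of the discovery rules above, written in plain Python with pathlib. It is not the actual pisces implementation (the function name here is hypothetical); it simply shows how `cleaned_*` folders map to features and how subject IDs fall out of the filenames' shared suffix.

```python
import os
from pathlib import Path


def discover_features_sketch(data_set_dir: str) -> dict[str, list[str]]:
    """Hypothetical sketch of the discovery rules, NOT the pisces implementation.

    Every `cleaned_*` subdirectory is a feature; subject IDs are whatever is
    left of each filename after removing the suffix shared by all files in
    that feature directory.
    """
    features: dict[str, list[str]] = {}
    for feature_dir in sorted(Path(data_set_dir).glob("cleaned_*")):
        if not feature_dir.is_dir():
            continue
        feature_name = feature_dir.name.removeprefix("cleaned_")
        filenames = sorted(p.name for p in feature_dir.iterdir() if p.is_file())
        if not filenames:
            continue
        # Longest suffix shared by all filenames, e.g. "_cleaned_motion.out" or ".csv".
        shared_suffix = os.path.commonprefix([name[::-1] for name in filenames])[::-1]
        features[feature_name] = [name.removesuffix(shared_suffix) for name in filenames]
    return features


# Example: discover_features_sketch("data_sets/walch_et_al") would yield something like
# {"accelerometer": ["46343", "759667", ...], "activity": [...], "psg": [...]}
```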
Advanced features of data set discovery:
- There is no a priori rule about which features in a data set provide the labels and which are model inputs. This allows you to call the label feature whatever you want, or to use a mixture of features (psg + …) as labels for complex models supporting rich outputs.
- You can have other folders inside data set directories that do NOT match `cleaned_*`; these are totally ignored. This allows you to store other data, such as raw data or metadata, in the same directory as the cleaned data (see the example layout below).
- You can also have folders whose sub-structure does not match the subject/feature structure, and these are likewise ignored.
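For instance, a layout like the following (the `raw_psg` and `metadata` folders are hypothetical, purely for illustration) would expose exactly the same features as before; only the `cleaned_*` directories are treated as features:

```
walch_et_al
├── cleaned_accelerometer     # feature: accelerometer
├── cleaned_activity          # feature: activity
├── cleaned_psg               # feature: psg
├── raw_psg                   # ignored: does not match cleaned_*
└── metadata                  # ignored: does not match cleaned_*
```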