{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# HW3: Hidden Markov Models\n", "\n", "This assignment considers the problem of classifying sleep states based on noisy heart rate sensor data, like what you might obtain from an Apple Watch. Sleep researchers use [polysomnography (PSG)](https://en.wikipedia.org/wiki/Polysomnography) to obtain ground-truth sleep states. PSG combines measurements of your brain (EEG), muscles (EMG), and heart (EKG) to determine which of four sleep states you are in:\n", "1. _Awake_\n", "2. _Core Sleep_: Non-REM stages 1 and 2. Heart rate and body temperature drop. Sleep spindles, which are thought to be important for memory consolidation, appear in the EEG.\n", "3. _Deep Sleep_: Non-REM stage 3. Hard to wake up, and you feel groggy if you do. Lowest heart rate.\n", "4. _Rapid Eye Movement (REM) Sleep_: This is when dreaming occurs. Heart rate increases, similar to the awake state. Not considered restful sleep. \n", "\n", "Of course, measuring EEG, EMG, and EKG while you're sleeping is a big pain. It would be great if we could predict sleep states using less invasive measures, like what you might obtain with an Apple Watch. That's the goal of this assignment!\n", "\n", "> **This Assignment**\n", ">\n", ">**Model:** You will use a **Hidden Markov Model (HMM)** to classify sleep stages based on (synthetic) heart rate measurements. \n", ">\n", ">**Algorithm:** You will use **expectation-maximization (EM)** and the **forward-backward algorithm** to estimate the model parameters and infer the latent states.\n", ">\n", ">**Data:** We will work with a synthetic dataset modeled after a sleep study by Walch et al. (2019). We found it too difficult to make accurate predictions on their real data, so we simulated synthetic data with a stronger relationship between sleep states and heart rate. This way, you should be able to predict sleep states with better accuracy than we were able to achieve with the actual data. 
Though the data is synthetic, it still serves as a good exercise for learning about these models and algorithms.\n", "\n", "\n", "**References**\n", "- Walch, O. (2019). Motion and heart rate from a wrist-worn wearable and labeled sleep from polysomnography (version 1.0.0). PhysioNet. https://doi.org/10.13026/hmhs-py35.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import torch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data consists of 30 time series, each corresponding to one subject's sleep data from a single night. The time series have been concatenated together in the data frame below. The fields are:\n", "- `id`: the index of the time series (0, ..., 29)\n", "- `state`: the \"true\" sleep state\n", " - 0 = \"awake\"\n", " - 1 = \"core sleep\"\n", " - 2 = \"deep sleep\"\n", " - 3 = \"REM\"\n", "- `hr`: the measured heart rate. Missing data is marked by `NaN`.\n", "\n", "Each row corresponds to a 30-second time bin. The time series are variable in length, ranging from just shy of 4 hours to over 8 hours. Again, these are simulated data, but we generated the data to look like heart rate traces measured by Walch et al. (2019)." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "|       | id  | state | hr         |\n",
"|-------|-----|-------|------------|\n",
"| 0     | 0   | 0     | 97.378630  |\n",
"| 1     | 0   | 0     | 97.254448  |\n",
"| 2     | 0   | 0     | 118.884372 |\n",
"| 3     | 0   | 0     | 99.038121  |\n",
"| 4     | 0   | 0     | 98.649347  |\n",
"| ...   | ... | ...   | ...        |\n",
"| 26609 | 29  | 1     | NaN        |\n",
"| 26610 | 29  | 1     | NaN        |\n",
"| 26611 | 29  | 1     | NaN        |\n",
"| 26612 | 29  | 1     | NaN        |\n",
"| 26613 | 29  | 1     | NaN        |\n",
"\n", "26614 rows × 3 columns\n", "