{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Epidemiological Indicators\n", "Most epidemiological analyses start with simple measurements, or indicators, computed from the raw data. The `epigraphhub.analysis.epistats` module provides functions for calculating some of these indicators." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2022-09-20T11:34:41.877918Z", "start_time": "2022-09-20T11:34:41.510470Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "from epigraphhub.analysis import epistats as es\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating the posterior prevalence distribution\n", "The number of cases of a disease in a population can be modeled as a binomial random variable $Bin(n,p)$, where the prevalence (the fraction of the population that is a case) is the parameter $p$ and the population size is the parameter $n$. If we represent our prior guess about the prevalence as a $Beta(a,b)$ distribution, a conjugate Bayesian analysis gives the posterior distribution of the prevalence at a point in time as $Beta(a + x, b + n - x)$, where $x$ is the number of observed cases.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2022-09-20T11:34:49.927138Z", "start_time": "2022-09-20T11:34:49.921296Z" } }, "outputs": [ { "data": { "text/plain": [ "0.010000979998040003" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using a vague prior Beta(1,1)\n", "pdist = es.posterior_prevalence(pop_size=1e6, positives=10000, a=1, b=1)\n", "pdist.mean()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2022-09-20T11:34:51.310228Z", "start_time": "2022-09-20T11:34:51.307610Z" } }, "outputs": [ { "data": { "text/plain": [ "9.950342051571256e-05" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "pdist.std()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Incidence rate\n", "Incidence is defined as the number of new cases in a population over a period of time, typically one year. The incidence rate is usually scaled to 100,000 people to facilitate comparisons between localities with different population sizes.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2022-09-20T11:34:57.325469Z", "start_time": "2022-09-20T11:34:57.317630Z" } }, "outputs": [ { "data": { "text/html": [ "
|   | population | cases | incidence_ratio |
|---|---|---|---|
| 0 | 1000 | 5 | 500.0 |
| 1 | 5000 | 5 | 100.0 |
| 2 | 10000 | 5 | 50.0 |