{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3b3c1874-30f1-41e4-9220-aa3a0b636d20",
   "metadata": {},
   "source": [
    "# Extracting data from the epigraphhub database (Python version)\n",
    "\n",
    "This notebook provides examples of how to use the functions in the `epigraphhub_db.py` module."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37d6f01c-0d82-4344-b5e8-9d585d8b352a",
   "metadata": {},
   "source": [
    "### The function `get_agg_data()`\n",
    "\n",
    "This function queries a table in the epigraphhub database and returns the values of one column aggregated by date and by location.\n",
    "\n",
    "Besides the `schema` and `table_name`, you must provide a list with the names of three columns. The first column should contain dates, which are used as the index. The second should contain the locations to group by. The third should contain the values to aggregate.\n",
    "\n",
    "With this function we can, for example, transform the individual COVID-19 case records from Colombia into a time series of the daily number of cases by `departamento`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1d22d9c5-ce70-4293-8012-657e24ae67ce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>departamento_nom</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fecha_de_notificaci_n</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-12-20</th>\n",
       "      <td>STA MARTA D.E.</td>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-07-30</th>\n",
       "      <td>NORTE SANTANDER</td>\n",
       "      <td>169</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-09-29</th>\n",
       "      <td>AMAZONAS</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-04-19</th>\n",
       "      <td>STA MARTA D.E.</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-04-11</th>\n",
       "      <td>META</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-11-29</th>\n",
       "      <td>CAUCA</td>\n",
       "      <td>27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-03-30</th>\n",
       "      <td>CAUCA</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-04-11</th>\n",
       "      <td>RISARALDA</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-06-20</th>\n",
       "      <td>NARIÑO</td>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-07-18</th>\n",
       "      <td>CORDOBA</td>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>28124 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                      departamento_nom  count\n",
       "fecha_de_notificaci_n                        \n",
       "2021-12-20              STA MARTA D.E.    102\n",
       "2021-07-30             NORTE SANTANDER    169\n",
       "2020-09-29                    AMAZONAS      1\n",
       "2020-04-19              STA MARTA D.E.      2\n",
       "2020-04-11                        META     17\n",
       "...                                ...    ...\n",
       "2021-11-29                       CAUCA     27\n",
       "2020-03-30                       CAUCA      2\n",
       "2022-04-11                   RISARALDA      6\n",
       "2020-06-20                      NARIÑO    103\n",
       "2021-07-18                     CORDOBA    144\n",
       "\n",
       "[28124 rows x 2 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
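   ],
   "source": [
    "from epigraphhub.data.epigraphhub_db import get_agg_data\n",
    "\n",
    "# Count the case records (`id_`) for each notification date and departamento.\n",
    "# Order of `columns`: date column, location column, column to aggregate.\n",
    "df = get_agg_data(\n",
    "    schema='colombia',\n",
    "    table_name='positive_cases_covid_d',\n",
    "    columns=['fecha_de_notificaci_n', 'departamento_nom', 'id_'],\n",
    "    method='COUNT',\n",
    "    ini_date='2020-01-01',\n",
    ")\n",
    "\n",
    "df"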
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25e05533-7171-41fb-a889-e1407df9d39d",
   "metadata": {},
   "source": [
    "### The function `get_data_by_location()`\n",
    "\n",
    "This function queries a table in the epigraphhub database and can restrict the output to a list of locations, given the name of the column that holds them.\n",
    "\n",
    "For example, the `foph_cases_d` table holds the number of COVID-19 cases by canton in Switzerland. With this function, we can retrieve the rows for the cantons `GE` and `BE` only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "de674218-9583-4968-bf21-2b5216695d61",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>datum</th>\n",
       "      <th>georegion</th>\n",
       "      <th>entries</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2022-05-14</td>\n",
       "      <td>GE</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2022-05-15</td>\n",
       "      <td>GE</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2022-05-16</td>\n",
       "      <td>GE</td>\n",
       "      <td>146</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2022-05-17</td>\n",
       "      <td>GE</td>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2022-05-18</td>\n",
       "      <td>GE</td>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1833</th>\n",
       "      <td>2022-05-09</td>\n",
       "      <td>GE</td>\n",
       "      <td>210</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1834</th>\n",
       "      <td>2022-05-10</td>\n",
       "      <td>GE</td>\n",
       "      <td>146</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1835</th>\n",
       "      <td>2022-05-11</td>\n",
       "      <td>GE</td>\n",
       "      <td>124</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1836</th>\n",
       "      <td>2022-05-12</td>\n",
       "      <td>GE</td>\n",
       "      <td>146</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1837</th>\n",
       "      <td>2022-05-13</td>\n",
       "      <td>GE</td>\n",
       "      <td>120</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1838 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           datum georegion  entries\n",
       "0     2022-05-14        GE       68\n",
       "1     2022-05-15        GE       48\n",
       "2     2022-05-16        GE      146\n",
       "3     2022-05-17        GE      118\n",
       "4     2022-05-18        GE      118\n",
       "...          ...       ...      ...\n",
       "1833  2022-05-09        GE      210\n",
       "1834  2022-05-10        GE      146\n",
       "1835  2022-05-11        GE      124\n",
       "1836  2022-05-12        GE      146\n",
       "1837  2022-05-13        GE      120\n",
       "\n",
       "[1838 rows x 3 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.epigraphhub_db import get_data_by_location\n",
    "\n",
    "# Keep only the rows whose `loc_column` value (`georegion`) is listed in `loc`.\n",
    "df = get_data_by_location(\n",
    "    schema='switzerland',\n",
    "    table_name='foph_cases_d',\n",
    "    loc=['GE', 'BE'],\n",
    "    columns=['datum', 'georegion', 'entries'],\n",
    "    loc_column='georegion',\n",
    ")\n",
    "\n",
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "f9e3a44f2f7108c4b7beba943bd42895a37b8963dbda2768ecd4cf1430c6d52e"
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}