{ "cells": [ { "cell_type": "markdown", "id": "3b3c1874-30f1-41e4-9220-aa3a0b636d20", "metadata": {}, "source": [ "# Extracting data from epigraphhub database (Python version)\n", "\n", "This notebook provides some examples of how the functions in the `epigraphhub_db.py` module can be used. " ] }, { "cell_type": "markdown", "id": "37d6f01c-0d82-4344-b5e8-9d585d8b352a", "metadata": {}, "source": [ "### The function `get_agg_data()`\n", "\n", "This function queries a table saved in the epigraphhub database and returns a column's aggregated value related to another column with location names.\n", "\n", "Besides the `schema` and `table_name`it's necessary to provide a list with the name of three columns. The first column should contain dates, which will be used as an index. The second should contain locations to be considered in the aggregation. The third column should contain the values that will be aggregated.\n", "\n", "With this function, we can transform for example, the individual data of covid-19 in Colombia into a time series that represents the daily number of cases by `departamento`." ] }, { "cell_type": "code", "execution_count": 1, "id": "1d22d9c5-ce70-4293-8012-657e24ae67ce", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'dbname': 'epigraphhub', 'host': 'localhost', 'password': 'epigraph', 'port': 5432, 'username': 'epigraph'}\n" ] }, { "data": { "text/html": [ "
\n", " | departamento_nom | \n", "count | \n", "
---|---|---|
fecha_de_notificaci_n | \n", "\n", " | \n", " |
2021-12-20 | \n", "STA MARTA D.E. | \n", "102 | \n", "
2021-07-30 | \n", "NORTE SANTANDER | \n", "169 | \n", "
2020-09-29 | \n", "AMAZONAS | \n", "1 | \n", "
2020-04-19 | \n", "STA MARTA D.E. | \n", "2 | \n", "
2020-04-11 | \n", "META | \n", "17 | \n", "
... | \n", "... | \n", "... | \n", "
2021-11-29 | \n", "CAUCA | \n", "27 | \n", "
2020-03-30 | \n", "CAUCA | \n", "2 | \n", "
2022-04-11 | \n", "RISARALDA | \n", "6 | \n", "
2020-06-20 | \n", "NARIÑO | \n", "103 | \n", "
2021-07-18 | \n", "CORDOBA | \n", "144 | \n", "
28124 rows × 2 columns
\n", "\n", " | datum | \n", "georegion | \n", "entries | \n", "
---|---|---|---|
0 | \n", "2022-05-14 | \n", "GE | \n", "68 | \n", "
1 | \n", "2022-05-15 | \n", "GE | \n", "48 | \n", "
2 | \n", "2022-05-16 | \n", "GE | \n", "146 | \n", "
3 | \n", "2022-05-17 | \n", "GE | \n", "118 | \n", "
4 | \n", "2022-05-18 | \n", "GE | \n", "118 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
1833 | \n", "2022-05-09 | \n", "GE | \n", "210 | \n", "
1834 | \n", "2022-05-10 | \n", "GE | \n", "146 | \n", "
1835 | \n", "2022-05-11 | \n", "GE | \n", "124 | \n", "
1836 | \n", "2022-05-12 | \n", "GE | \n", "146 | \n", "
1837 | \n", "2022-05-13 | \n", "GE | \n", "120 | \n", "
1838 rows × 3 columns
\n", "