{ "cells": [ { "cell_type": "markdown", "id": "3b3c1874-30f1-41e4-9220-aa3a0b636d20", "metadata": {}, "source": [ "# Extracting data from epigraphhub database (Python version)\n", "\n", "This notebook provides some examples of how the functions in the `epigraphhub_db.py` module can be used. " ] }, { "cell_type": "markdown", "id": "37d6f01c-0d82-4344-b5e8-9d585d8b352a", "metadata": {}, "source": [ "### The function `get_agg_data()`\n", "\n", "This function queries a table saved in the epigraphhub database and returns a column's aggregated value related to another column with location names.\n", "\n", "Besides the `schema` and `table_name`it's necessary to provide a list with the name of three columns. The first column should contain dates, which will be used as an index. The second should contain locations to be considered in the aggregation. The third column should contain the values that will be aggregated.\n", "\n", "With this function, we can transform for example, the individual data of covid-19 in Colombia into a time series that represents the daily number of cases by `departamento`." ] }, { "cell_type": "code", "execution_count": 1, "id": "1d22d9c5-ce70-4293-8012-657e24ae67ce", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'dbname': 'epigraphhub', 'host': 'localhost', 'password': 'epigraph', 'port': 5432, 'username': 'epigraph'}\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
departamento_nomcount
fecha_de_notificaci_n
2021-12-20STA MARTA D.E.102
2021-07-30NORTE SANTANDER169
2020-09-29AMAZONAS1
2020-04-19STA MARTA D.E.2
2020-04-11META17
.........
2021-11-29CAUCA27
2020-03-30CAUCA2
2022-04-11RISARALDA6
2020-06-20NARIÑO103
2021-07-18CORDOBA144
\n", "

28124 rows × 2 columns

\n", "
" ], "text/plain": [ " departamento_nom count\n", "fecha_de_notificaci_n \n", "2021-12-20 STA MARTA D.E. 102\n", "2021-07-30 NORTE SANTANDER 169\n", "2020-09-29 AMAZONAS 1\n", "2020-04-19 STA MARTA D.E. 2\n", "2020-04-11 META 17\n", "... ... ...\n", "2021-11-29 CAUCA 27\n", "2020-03-30 CAUCA 2\n", "2022-04-11 RISARALDA 6\n", "2020-06-20 NARIÑO 103\n", "2021-07-18 CORDOBA 144\n", "\n", "[28124 rows x 2 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from epigraphhub.data.epigraphhub_db import get_agg_data\n", "df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',\n", " columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],\n", " method = 'COUNT', ini_date = '2020-01-01'\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "25e05533-7171-41fb-a889-e1407df9d39d", "metadata": {}, "source": [ "### The function `get_data_by_location()`\n", "\n", "This function queries a table saved in the epigraphhub database and has the possibility to filter the output given a list of locations and the name of the column to filter. \n", "\n", "For example, we have the `foph_cases_d` table, which represents the number of cases of covid-19 by canton in Switzerland. Using the function, we can get the output just for the cantons `GE` and `BE`. " ] }, { "cell_type": "code", "execution_count": 2, "id": "de674218-9583-4968-bf21-2b5216695d61", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datumgeoregionentries
02022-05-14GE68
12022-05-15GE48
22022-05-16GE146
32022-05-17GE118
42022-05-18GE118
............
18332022-05-09GE210
18342022-05-10GE146
18352022-05-11GE124
18362022-05-12GE146
18372022-05-13GE120
\n", "

1838 rows × 3 columns

\n", "
" ], "text/plain": [ " datum georegion entries\n", "0 2022-05-14 GE 68\n", "1 2022-05-15 GE 48\n", "2 2022-05-16 GE 146\n", "3 2022-05-17 GE 118\n", "4 2022-05-18 GE 118\n", "... ... ... ...\n", "1833 2022-05-09 GE 210\n", "1834 2022-05-10 GE 146\n", "1835 2022-05-11 GE 124\n", "1836 2022-05-12 GE 146\n", "1837 2022-05-13 GE 120\n", "\n", "[1838 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from epigraphhub.data.epigraphhub_db import get_data_by_location\n", "\n", "df = get_data_by_location(schema = 'switzerland', table_name = 'foph_cases_d', \n", " loc = ['GE', 'BE'], columns = ['datum', 'georegion', 'entries'],\n", " loc_column = 'georegion')\n", "\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "f9e3a44f2f7108c4b7beba943bd42895a37b8963dbda2768ecd4cf1430c6d52e" } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }