Extracting data from epigraphhub database (Python version)

This notebook provides some examples of how the functions in the epigraphhub_db.py module can be used.

The function get_agg_data()

This function queries a table saved in the epigraphhub database and returns a column’s aggregated value related to another column with location names.

Besides the schema and table_nameit’s necessary to provide a list with the name of three columns. The first column should contain dates, which will be used as an index. The second should contain locations to be considered in the aggregation. The third column should contain the values that will be aggregated.

With this function, we can transform for example, the individual data of covid-19 in Colombia into a time series that represents the daily number of cases by departamento.

from epigraphhub.data.epigraphhub_db import get_agg_data
df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
                  columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
                  method = 'COUNT', ini_date = '2020-01-01'
                 )

df
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_267/3903642927.py in <module>
----> 1 from epigraphhub.data.epigraphhub_db import get_agg_data
      2 df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
      3                   columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
      4                   method = 'COUNT', ini_date = '2020-01-01'
      5                  )

ModuleNotFoundError: No module named 'epigraphhub'

The function get_data_by_location()

This function queries a table saved in the epigraphhub database and has the possibility to filter the output given a list of locations and the name of the column to filter.

For example, we have the foph_cases_d table, which represents the number of cases of covid-19 by canton in Switzerland. Using the function, we can get the output just for the cantons GE and BE.

from epigraphhub.data.epigraphhub_db import get_data_by_location

df = get_data_by_location(schema = 'switzerland', table_name = 'foph_cases_d', 
                       loc = ['GE', 'BE'], columns = ['datum', 'georegion', 'entries'],
                       loc_column = 'georegion')

df
datum georegion entries
0 2022-05-14 GE 68
1 2022-05-15 GE 48
2 2022-05-16 GE 146
3 2022-05-17 GE 118
4 2022-05-18 GE 118
... ... ... ...
1833 2022-05-09 GE 210
1834 2022-05-10 GE 146
1835 2022-05-11 GE 124
1836 2022-05-12 GE 146
1837 2022-05-13 GE 120

1838 rows × 3 columns