Extracting data from epigraphhub database (Python version)¶

This notebook provides some examples of how the functions in the epigraphhub_db.py module can be used.

The function `get_agg_data()`¶

This function queries a table saved in the epigraphhub database and returns a column’s aggregated value related to another column with location names.

Besides the schema and table_nameit’s necessary to provide a list with the name of three columns. The first column should contain dates, which will be used as an index. The second should contain locations to be considered in the aggregation. The third column should contain the values that will be aggregated.

With this function, we can transform for example, the individual data of covid-19 in Colombia into a time series that represents the daily number of cases by departamento.

from epigraphhub.data.epigraphhub_db import get_agg_data
df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
                  columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
                  method = 'COUNT', ini_date = '2020-01-01'
                 )

df

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_267/3903642927.py in <module>
----> 1 from epigraphhub.data.epigraphhub_db import get_agg_data
      2 df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
      3                   columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
      4                   method = 'COUNT', ini_date = '2020-01-01'
      5                  )

ModuleNotFoundError: No module named 'epigraphhub'

The function `get_data_by_location()`¶

This function queries a table saved in the epigraphhub database and has the possibility to filter the output given a list of locations and the name of the column to filter.

For example, we have the foph_cases_d table, which represents the number of cases of covid-19 by canton in Switzerland. Using the function, we can get the output just for the cantons GE and BE.

from epigraphhub.data.epigraphhub_db import get_data_by_location

df = get_data_by_location(schema = 'switzerland', table_name = 'foph_cases_d', 
                       loc = ['GE', 'BE'], columns = ['datum', 'georegion', 'entries'],
                       loc_column = 'georegion')

df

	datum	georegion	entries
0	2022-05-14	GE	68
1	2022-05-15	GE	48
2	2022-05-16	GE	146
3	2022-05-17	GE	118
4	2022-05-18	GE	118
...	...	...	...
1833	2022-05-09	GE	210
1834	2022-05-10	GE	146
1835	2022-05-11	GE	124
1836	2022-05-12	GE	146
1837	2022-05-13	GE	120

1838 rows × 3 columns

Extracting data from epigraphhub database (Python version)¶

The function `get_agg_data()`¶

The function `get_data_by_location()`¶

EpigraphHub Library

Navigation

Related Topics

Extracting data from epigraphhub database (Python version)¶

The function get_agg_data()¶

The function get_data_by_location()¶

The function `get_agg_data()`¶

The function `get_data_by_location()`¶