Extracting data from epigraphhub database (Python version)¶
This notebook provides some examples of how the functions in the epigraphhub_db.py
module can be used.
The function get_agg_data()
¶
This function queries a table saved in the epigraphhub database and returns a column’s aggregated value related to another column with location names.
Besides the schema
and table_name
it’s necessary to provide a list with the name of three columns. The first column should contain dates, which will be used as an index. The second should contain locations to be considered in the aggregation. The third column should contain the values that will be aggregated.
With this function, we can transform for example, the individual data of covid-19 in Colombia into a time series that represents the daily number of cases by departamento
.
from epigraphhub.data.epigraphhub_db import get_agg_data
df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
method = 'COUNT', ini_date = '2020-01-01'
)
df
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_267/3903642927.py in <module>
----> 1 from epigraphhub.data.epigraphhub_db import get_agg_data
2 df = get_agg_data(schema = 'colombia', table_name = 'positive_cases_covid_d',
3 columns = ['fecha_de_notificaci_n', 'departamento_nom', 'id_'],
4 method = 'COUNT', ini_date = '2020-01-01'
5 )
ModuleNotFoundError: No module named 'epigraphhub'
The function get_data_by_location()
¶
This function queries a table saved in the epigraphhub database and has the possibility to filter the output given a list of locations and the name of the column to filter.
For example, we have the foph_cases_d
table, which represents the number of cases of covid-19 by canton in Switzerland. Using the function, we can get the output just for the cantons GE
and BE
.
from epigraphhub.data.epigraphhub_db import get_data_by_location
df = get_data_by_location(schema = 'switzerland', table_name = 'foph_cases_d',
loc = ['GE', 'BE'], columns = ['datum', 'georegion', 'entries'],
loc_column = 'georegion')
df
datum | georegion | entries | |
---|---|---|---|
0 | 2022-05-14 | GE | 68 |
1 | 2022-05-15 | GE | 48 |
2 | 2022-05-16 | GE | 146 |
3 | 2022-05-17 | GE | 118 |
4 | 2022-05-18 | GE | 118 |
... | ... | ... | ... |
1833 | 2022-05-09 | GE | 210 |
1834 | 2022-05-10 | GE | 146 |
1835 | 2022-05-11 | GE | 124 |
1836 | 2022-05-12 | GE | 146 |
1837 | 2022-05-13 | GE | 120 |
1838 rows × 3 columns