{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6770f85f",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Downloading data from World Bank Data (Python version)\n",
    "\n",
    "This section explains how to use the functions in the `worldbank` module of the `epigraphhub` package to download data hosted on the World Bank Data platform.\n",
    "\n",
    "All the functions in this module are built on the package [**wbgapi**](https://pypi.org/project/wbgapi/).\n",
    "\n",
    "### Function `search_in_database()`\n",
    "\n",
    "This function searches the names of the databases hosted on the World Bank Data platform for a keyword. It searches over all the databases and returns the matches as a pandas DataFrame with some information about each database found.\n",
    "\n",
    "The most important columns of the returned DataFrame are:\n",
    "\n",
    "* The column `name`, which is matched against the keyword;\n",
    "\n",
    "* The column `id`, which we will use to refer to the database in other functions;\n",
    "\n",
    "* The column `lastupdated`, which gives the date when the data in the database was last updated.\n",
    "\n",
    "This function has a single parameter, `keyword`, which must be a string.\n",
    "\n",
    "For example, you can search for all the databases with the keyword `global` in their name. In this case, the result is:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b08de47c",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>lastupdated</th>\n",
       "      <th>name</th>\n",
       "      <th>code</th>\n",
       "      <th>databid</th>\n",
       "      <th>description</th>\n",
       "      <th>url</th>\n",
       "      <th>dataavailability</th>\n",
       "      <th>metadataavailability</th>\n",
       "      <th>concepts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15</td>\n",
       "      <td>2020-07-27</td>\n",
       "      <td>Global Economic Monitor</td>\n",
       "      <td>GEM</td>\n",
       "      <td>1179</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>27</td>\n",
       "      <td>2022-01-11</td>\n",
       "      <td>Global Economic Prospects</td>\n",
       "      <td>GEP</td>\n",
       "      <td>999</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>28</td>\n",
       "      <td>2018-10-15</td>\n",
       "      <td>Global Financial Inclusion</td>\n",
       "      <td>FDX</td>\n",
       "      <td>1228</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>32</td>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>Global Financial Development</td>\n",
       "      <td>GFD</td>\n",
       "      <td>1250</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>34</td>\n",
       "      <td>2013-04-12</td>\n",
       "      <td>Global Partnership for Education</td>\n",
       "      <td>GPE</td>\n",
       "      <td>1247</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>69</td>\n",
       "      <td>2019-06-27</td>\n",
       "      <td>Global Financial Inclusion and Consumer Protec...</td>\n",
       "      <td>RFA</td>\n",
       "      <td>3703</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>73</td>\n",
       "      <td>2019-06-27</td>\n",
       "      <td>Global Financial Inclusion and Consumer Protec...</td>\n",
       "      <td>RFI</td>\n",
       "      <td>3704</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>82</td>\n",
       "      <td>2021-03-24</td>\n",
       "      <td>Global Public Procurement</td>\n",
       "      <td>GPP</td>\n",
       "      <td>3724</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>86</td>\n",
       "      <td>2021-09-24</td>\n",
       "      <td>Global Jobs Indicators Database (JOIN)</td>\n",
       "      <td>JON</td>\n",
       "      <td>3735</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id lastupdated                                               name code  \\\n",
       "0  15  2020-07-27                            Global Economic Monitor  GEM   \n",
       "1  27  2022-01-11                          Global Economic Prospects  GEP   \n",
       "2  28  2018-10-15                         Global Financial Inclusion  FDX   \n",
       "3  32  2021-12-03                       Global Financial Development  GFD   \n",
       "4  34  2013-04-12                   Global Partnership for Education  GPE   \n",
       "5  69  2019-06-27  Global Financial Inclusion and Consumer Protec...  RFA   \n",
       "6  73  2019-06-27  Global Financial Inclusion and Consumer Protec...  RFI   \n",
       "7  82  2021-03-24                          Global Public Procurement  GPP   \n",
       "8  86  2021-09-24             Global Jobs Indicators Database (JOIN)  JON   \n",
       "\n",
       "  databid description url dataavailability metadataavailability concepts  \n",
       "0    1179                                Y                    Y        3  \n",
       "1     999                                Y                    N        3  \n",
       "2    1228                                Y                    Y        3  \n",
       "3    1250                                Y                    N        3  \n",
       "4    1247                                Y                    Y        3  \n",
       "5    3703                                Y                    Y        3  \n",
       "6    3704                                Y                    Y        3  \n",
       "7    3724                                Y                    N        3  \n",
       "8    3735                                Y                    Y        3  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.worldbank import search_in_database\n",
    "\n",
    "df_db = search_in_database('global')\n",
    "\n",
    "df_db"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13cbd132",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "If you use `keyword = 'all'`, all the available databases will be returned.\n",
    "\n",
    "After selecting a database, we can use the function `search_in_indicators()` to see which indicators we can get from this database.\n",
    "\n",
    "### Function `search_in_indicators()`\n",
    "\n",
    "This function returns a DataFrame with the indicators matched by partial name. It accepts two parameters: the first, `keyword`, is a string that is matched against the names of the indicators in a specific database; the second, `db`, selects the database. The `db` parameter only accepts **int** values and must be filled with the **id** number of the database, which can be obtained with the function `search_in_database()`.\n",
    "\n",
    "If the `db` parameter is not filled, the function uses the default `db = 2`. In this configuration, the list of indicators from the database **World Development Indicators** is returned.\n",
    "\n",
    "For example, to get the names of the indicators related to `air pollution` in `db = 2`, just type `search_in_indicators('air pollution', db = 2)`, and the returned DataFrame will be:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cb544c55",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EN.ATM.PM25.MC.M3</td>\n",
       "      <td>PM2.5 air pollution, mean annual exposure (mic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EN.ATM.PM25.MC.ZS</td>\n",
       "      <td>PM2.5 air pollution, population exposed to lev...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>SH.STA.AIRP.FE.P5</td>\n",
       "      <td>Mortality rate attributed to household and amb...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SH.STA.AIRP.MA.P5</td>\n",
       "      <td>Mortality rate attributed to household and amb...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>SH.STA.AIRP.P5</td>\n",
       "      <td>Mortality rate attributed to household and amb...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  id                                              value\n",
       "0  EN.ATM.PM25.MC.M3  PM2.5 air pollution, mean annual exposure (mic...\n",
       "1  EN.ATM.PM25.MC.ZS  PM2.5 air pollution, population exposed to lev...\n",
       "2  SH.STA.AIRP.FE.P5  Mortality rate attributed to household and amb...\n",
       "3  SH.STA.AIRP.MA.P5  Mortality rate attributed to household and amb...\n",
       "4     SH.STA.AIRP.P5  Mortality rate attributed to household and amb..."
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.worldbank import search_in_indicators \n",
    "\n",
    "df_ind = search_in_indicators('air pollution', db = 2)\n",
    "\n",
    "df_ind "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c9142d0",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We will use the values in the `id` column to get the data for the indicators described in the corresponding `value` column. To get this data, we will use the function `get_worldbank_data()`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42b01c33",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Function `get_worldbank_data()`\n",
    "\n",
    "The function `get_worldbank_data()` returns a DataFrame with the indicators available in a database of the World Bank Data platform.\n",
    "\n",
    "This function has the following parameters:\n",
    "\n",
    "* `ind`: A list of strings, where each value is an indicator's `id`. An indicator's `id` can be obtained with the function `search_in_indicators()`.\n",
    "\n",
    "* `country`: A list of strings, where each value is the ISO code of a country of interest.\n",
    "\n",
    "* `db`: An int value identifying the database from which the data is retrieved. You can obtain this value with the function `search_in_database()`.\n",
    "\n",
    "* `time`: If `time = 'all'`, the function returns all the data available. You can also specify a range of years. For example, to get the data for the period between 2010 and 2020, set `time = range(2010, 2021)`.\n",
    "\n",
    "* `columns`: Used to rename the columns in the returned DataFrame. By default, the indicator columns are named after the values in `ind`. To rename the columns, provide a list of strings with the same length as the list in the parameter `ind`. The columns are renamed in order: the first value in `columns` becomes the new name of the first value in `ind`.\n",
    "\n",
    "For example, we can get the data for the first two indicators that we obtained in the last section for the countries Brazil and Switzerland. In this case `ind = ['EN.ATM.PM25.MC.M3', 'EN.ATM.PM25.MC.ZS']`, `country = ['BRA', 'CHE']`, and `db = 2` (the indicators referred to in `ind` are available in the database with id 2).\n",
    "\n",
    "Using these parameters, the result will be:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "e7ce0305",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>en_atm_pm25_mc_m3</th>\n",
       "      <th>en_atm_pm25_mc_zs</th>\n",
       "      <th>frequency</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>12.922220</td>\n",
       "      <td>93.000705</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-01-01</th>\n",
       "      <td>BRA</td>\n",
       "      <td>15.955285</td>\n",
       "      <td>90.938123</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-01-01</th>\n",
       "      <td>BRA</td>\n",
       "      <td>15.912798</td>\n",
       "      <td>91.928375</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>13.049221</td>\n",
       "      <td>94.785235</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>12.261388</td>\n",
       "      <td>91.820914</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           country  en_atm_pm25_mc_m3  en_atm_pm25_mc_zs frequency\n",
       "date                                                              \n",
       "2010-01-01     CHE          12.922220          93.000705    yearly\n",
       "2010-01-01     BRA          15.955285          90.938123    yearly\n",
       "2011-01-01     BRA          15.912798          91.928375    yearly\n",
       "2011-01-01     CHE          13.049221          94.785235    yearly\n",
       "2012-01-01     CHE          12.261388          91.820914    yearly"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.worldbank import get_worldbank_data\n",
    "\n",
    "ind = ['EN.ATM.PM25.MC.M3', 'EN.ATM.PM25.MC.ZS']\n",
    "country = ['BRA', 'CHE']\n",
    "\n",
    "df = get_worldbank_data(ind, country, db= 2, time = range(2010, 2021))\n",
    "\n",
    "df = df.sort_index()\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5623c1f7",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "By default, the function converts the column names to lower case and replaces '.' with '_'. If you would like to rename the columns to 'air_1' and 'air_2', for example, just add the parameter `columns = ['air_1', 'air_2']`, and the result will be:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "41a82f54",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>air_1</th>\n",
       "      <th>air_2</th>\n",
       "      <th>frequency</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>12.922220</td>\n",
       "      <td>93.000705</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-01-01</th>\n",
       "      <td>BRA</td>\n",
       "      <td>15.955285</td>\n",
       "      <td>90.938123</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-01-01</th>\n",
       "      <td>BRA</td>\n",
       "      <td>15.912798</td>\n",
       "      <td>91.928375</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>13.049221</td>\n",
       "      <td>94.785235</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-01-01</th>\n",
       "      <td>CHE</td>\n",
       "      <td>12.261388</td>\n",
       "      <td>91.820914</td>\n",
       "      <td>yearly</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           country      air_1      air_2 frequency\n",
       "date                                              \n",
       "2010-01-01     CHE  12.922220  93.000705    yearly\n",
       "2010-01-01     BRA  15.955285  90.938123    yearly\n",
       "2011-01-01     BRA  15.912798  91.928375    yearly\n",
       "2011-01-01     CHE  13.049221  94.785235    yearly\n",
       "2012-01-01     CHE  12.261388  91.820914    yearly"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.worldbank import get_worldbank_data\n",
    "\n",
    "ind = ['EN.ATM.PM25.MC.M3', 'EN.ATM.PM25.MC.ZS']\n",
    "country = ['BRA', 'CHE']\n",
    "\n",
    "df = get_worldbank_data(ind, country, db= 2, time = range(2010, 2021), columns = ['air_1', 'air_2'])\n",
    "\n",
    "df = df.sort_index()\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86a41235",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Function `get_pop_data()`\n",
    "\n",
    "This function gets the population data, stratified by age and sex, from the database with id 2, called **World Development Indicators**. It has three parameters:\n",
    "\n",
    "* `country`: A string with the ISO code of the country for which you want to get the data.\n",
    "\n",
    "* `time`: If `time = 'all'`, the function returns all the data available. You can also specify a range of years. For example, to get the data for the period between 2010 and 2020, set `time = range(2010, 2021)`.\n",
    "\n",
    "* `fx_et`: Selects the type of stratification of the population data. There are three possibilities:\n",
    "\n",
    "  - If `fx_et = '5Y'`, the population by 5-year age groups is returned.\n",
    "\n",
    "  - If `fx_et = 'IN'`, the population divided into 3 age groups is returned.\n",
    "\n",
    "  - If `fx_et = 'TOTL'`, the total population, without age groups, is returned.\n",
    "\n",
    "The function returns a pandas DataFrame.\n",
    "\n",
    "In the cell below, you can see an example of how to get the population data divided into three age groups for Switzerland.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "50d1348c",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pop_0014_fe</th>\n",
       "      <th>pop_0014_ma</th>\n",
       "      <th>pop_1564_fe</th>\n",
       "      <th>pop_1564_ma</th>\n",
       "      <th>pop_65up_fe</th>\n",
       "      <th>pop_65up_ma</th>\n",
       "      <th>pop_totl_fe</th>\n",
       "      <th>pop_totl_ma</th>\n",
       "      <th>frequency</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2016-01-01</th>\n",
       "      <td>604573.0</td>\n",
       "      <td>637409.0</td>\n",
       "      <td>2775453.0</td>\n",
       "      <td>2829540.0</td>\n",
       "      <td>846466.0</td>\n",
       "      <td>679897.0</td>\n",
       "      <td>4226492.0</td>\n",
       "      <td>4146846.0</td>\n",
       "      <td>yearly</td>\n",
       "      <td>CHE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-01-01</th>\n",
       "      <td>612125.0</td>\n",
       "      <td>644879.0</td>\n",
       "      <td>2790916.0</td>\n",
       "      <td>2846187.0</td>\n",
       "      <td>860828.0</td>\n",
       "      <td>696905.0</td>\n",
       "      <td>4263869.0</td>\n",
       "      <td>4187971.0</td>\n",
       "      <td>yearly</td>\n",
       "      <td>CHE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2018-01-01</th>\n",
       "      <td>618492.0</td>\n",
       "      <td>651077.0</td>\n",
       "      <td>2801611.0</td>\n",
       "      <td>2857509.0</td>\n",
       "      <td>873428.0</td>\n",
       "      <td>712212.0</td>\n",
       "      <td>4293531.0</td>\n",
       "      <td>4220798.0</td>\n",
       "      <td>yearly</td>\n",
       "      <td>CHE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-01-01</th>\n",
       "      <td>624324.0</td>\n",
       "      <td>656882.0</td>\n",
       "      <td>2811358.0</td>\n",
       "      <td>2867308.0</td>\n",
       "      <td>886912.0</td>\n",
       "      <td>728496.0</td>\n",
       "      <td>4322594.0</td>\n",
       "      <td>4252686.0</td>\n",
       "      <td>yearly</td>\n",
       "      <td>CHE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-01</th>\n",
       "      <td>629609.0</td>\n",
       "      <td>662354.0</td>\n",
       "      <td>2819974.0</td>\n",
       "      <td>2875524.0</td>\n",
       "      <td>902457.0</td>\n",
       "      <td>746978.0</td>\n",
       "      <td>4352040.0</td>\n",
       "      <td>4284856.0</td>\n",
       "      <td>yearly</td>\n",
       "      <td>CHE</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            pop_0014_fe  pop_0014_ma  pop_1564_fe  pop_1564_ma  pop_65up_fe  \\\n",
       "2016-01-01     604573.0     637409.0    2775453.0    2829540.0     846466.0   \n",
       "2017-01-01     612125.0     644879.0    2790916.0    2846187.0     860828.0   \n",
       "2018-01-01     618492.0     651077.0    2801611.0    2857509.0     873428.0   \n",
       "2019-01-01     624324.0     656882.0    2811358.0    2867308.0     886912.0   \n",
       "2020-01-01     629609.0     662354.0    2819974.0    2875524.0     902457.0   \n",
       "\n",
       "            pop_65up_ma  pop_totl_fe  pop_totl_ma frequency country  \n",
       "2016-01-01     679897.0    4226492.0    4146846.0    yearly     CHE  \n",
       "2017-01-01     696905.0    4263869.0    4187971.0    yearly     CHE  \n",
       "2018-01-01     712212.0    4293531.0    4220798.0    yearly     CHE  \n",
       "2019-01-01     728496.0    4322594.0    4252686.0    yearly     CHE  \n",
       "2020-01-01     746978.0    4352040.0    4284856.0    yearly     CHE  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from epigraphhub.data.worldbank import get_pop_data\n",
    "\n",
    "country = 'CHE'\n",
    "time = range(2016, 2022)\n",
    "df_pop = get_pop_data(country, time, fx_et = 'IN')\n",
    "\n",
    "df_pop"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "f9e3a44f2f7108c4b7beba943bd42895a37b8963dbda2768ecd4cf1430c6d52e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}