Downloading Google Trends Data (Python version)

Google Trends data can be quite useful for understanding behavioral differences in response to a public health risk, particularly when it comes to gauging how that risk is perceived.

The ggtrends module from EpigraphHub returns the results as convenient pandas DataFrames.

from epigraphhub.data.ggtrends import (
    historical_interest,
    interest_by_region,
    interest_over_time,
)
import pandas as pd
pd.options.plotting.backend = "plotly"
import warnings
warnings.filterwarnings('ignore')
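
The imports above require the epigraphhub package to be available in the active environment. A minimal sketch for checking this before running the rest of the page follows. The helper name is_installed is hypothetical, and the pip command in the message assumes the package is published on PyPI under the name epigraphhub.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("epigraphhub"):
    # Assumption: the package is installable from PyPI under this name.
    print("epigraphhub is not installed; try: pip install epigraphhub")
```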

Fetch trends by region

keywords = ["covid", "vaccine"]
interest_by_region(keywords)
geoName          geoCode  covid  vaccine
Afghanistan      AF          65       35
Albania          AL          85       15
Algeria          DZ          97        3
American Samoa   AS          79       21
Andorra          AD          98        2
...              ...        ...      ...
Western Sahara   EH          93        7
Yemen            YE          62       38
Zambia           ZM          74       26
Zimbabwe         ZW          84       16
Åland Islands    AX          86       14

250 rows × 3 columns
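
Because the result is an ordinary pandas DataFrame indexed by geoName, the usual pandas operations apply. The sketch below rebuilds a few rows from the output above by hand, so it runs without network access, and ranks them by the "vaccine" column. In the rows shown, the two keyword columns add up to 100, which suggests the values are shares among the compared keywords within each region; treat that reading as an assumption rather than documented behaviour.

```python
import pandas as pd

# A few rows copied from the output above; a real call returns about 250 regions.
regions = pd.DataFrame(
    {
        "geoCode": ["AF", "AL", "DZ", "YE", "ZM"],
        "covid": [65, 85, 97, 62, 74],
        "vaccine": [35, 15, 3, 38, 26],
    },
    index=pd.Index(
        ["Afghanistan", "Albania", "Algeria", "Yemen", "Zambia"], name="geoName"
    ),
)

# Rank regions by the value in the "vaccine" column, highest first.
top_vaccine = regions.sort_values("vaccine", ascending=False).head(3)
print(top_vaccine)

# Check whether the keyword columns sum to 100 in every sampled row.
rows_sum_to_100 = bool((regions["covid"] + regions["vaccine"]).eq(100).all())
print(rows_sum_to_100)
```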

Fetching time-series

We can get the interest over time for a specific geographic location using the country's two-letter ISO code. In the example for Brazil below, note that the keywords have to be translated into the local language.

By default, the interest_over_time function returns a one-year series.

Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.

iot_BR = interest_over_time(["covid", "vaccine", "vacina"], geo='BR')
iot_BR
iot_BR[["covid", "vaccine", "vacina"]].plot(labels=dict(index="time", value="Popularity", variable="Terms"))
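
To make the relative scale described above concrete, the sketch below rescales a small table so that its highest value becomes 100. The raw counts are invented for illustration only, since Google does not publish raw search volumes, and this is not a reproduction of Google's actual normalization.

```python
import pandas as pd

# Hypothetical raw search volumes, made up for this illustration.
raw = pd.DataFrame(
    {"covid": [200, 400, 100], "vacina": [40, 100, 0]},
    index=pd.to_datetime(["2022-01-02", "2022-01-09", "2022-01-16"]),
)

# Scale so the highest value across all terms and dates becomes 100.
peak = raw.max().max()
relative = (raw / peak * 100).round().astype(int)
print(relative)
```

In this toy table "covid" peaks at 100 on the second date, and every other value is expressed as a percentage of that peak.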

You can also drill down to subregions and to different timeframes. Below we show the same search for the state of Ceará in Brazil, from 17 March 2022 to 17 April 2022.

iot_BR_CE = interest_over_time(["covid", "vaccine", "vacina"], geo='BR-CE', timeframe='2022-03-17 2022-04-17')
iot_BR_CE[["covid", "vaccine", "vacina"]].plot(labels={'index':"time", 'value':"Popularity", 'variable':"Terms"})
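
The timeframe argument in the call above is a string holding a start date and an end date separated by a single space. A small helper can build it from date objects and catch reversed ranges early. The name make_timeframe is hypothetical and not part of epigraphhub.

```python
from datetime import date


def make_timeframe(start: date, end: date) -> str:
    """Format two dates as 'YYYY-MM-DD YYYY-MM-DD', the form used in the call above."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"{start.isoformat()} {end.isoformat()}"


timeframe = make_timeframe(date(2022, 3, 17), date(2022, 4, 17))
print(timeframe)
```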

With the historical_interest function we can get hourly results for a time window of at most one week.

hi = historical_interest(
    ["covid", "vacina"],
    geo='BR',
    year_start=2022, month_start=1, day_start=10, hour_start=0,
    year_end=2022, month_end=1, day_end=17, hour_end=0,
    # frequency='hourly',
)
hi[["covid", "vacina"]].plot()
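
Given the one-week limit mentioned above, a longer period has to be covered with several requests. The sketch below only computes the consecutive windows. weekly_windows is a hypothetical helper, not part of epigraphhub, and no request is sent.

```python
from datetime import datetime, timedelta


def weekly_windows(start, end, max_days=7):
    """Yield (window_start, window_end) pairs, each at most max_days long."""
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


windows = list(weekly_windows(datetime(2022, 1, 10), datetime(2022, 1, 31)))
for w_start, w_end in windows:
    print(w_start, "->", w_end)
```

Each pair could supply the year, month, day and hour arguments of one historical_interest call. Since the values are relative to the peak within each requested window, series fetched in separate windows are probably not directly comparable without rescaling, and the boundary hour shared by two adjacent windows may need to be de-duplicated.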


©2022, Flávio Codeço Coelho.