Downloading Google Trends Data (Python version)
Google Trends data can be quite useful for understanding behavioral differences in response to a public health risk, particularly when it comes to gauging the perception of that risk.
The ggtrends module from Epigraphhub returns the results as convenient pandas DataFrames.
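from epigraphhub.data.ggtrends import (
    historical_interest,
    interest_by_region,
    interest_over_time,
)
import pandas as pd
import warnings

# Use plotly for DataFrame.plot() calls (requires the plotly package).
pd.options.plotting.backend = "plotly"
# Silence library warnings to keep the notebook output clean.
warnings.filterwarnings('ignore')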
Fetch trends by region
keywords = ["covid", "vaccine"]
interest_by_region(keywords)
geoName | geoCode | covid | vaccine |
---|---|---|---|
Afghanistan | AF | 65 | 35 |
Albania | AL | 85 | 15 |
Algeria | DZ | 97 | 3 |
American Samoa | AS | 79 | 21 |
Andorra | AD | 98 | 2 |
... | ... | ... | ... |
Western Sahara | EH | 93 | 7 |
Yemen | YE | 62 | 38 |
Zambia | ZM | 74 | 26 |
Zimbabwe | ZW | 84 | 16 |
Åland Islands | AX | 86 | 14 |
250 rows × 3 columns
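Because the result is a regular pandas DataFrame indexed by geoName, the usual pandas tools apply. The sketch below rebuilds a handful of rows from the table above by hand (it does not call the API) and ranks them by the vaccine column. In the rows shown, the two keyword columns add up to 100 for each region, which the last line checks for this small sample only.

```python
import pandas as pd

# A few rows copied from the output above; the real result has 250 rows.
by_region = pd.DataFrame(
    {
        "geoCode": ["AF", "AL", "DZ", "YE", "ZM"],
        "covid": [65, 85, 97, 62, 74],
        "vaccine": [35, 15, 3, 38, 26],
    },
    index=pd.Index(
        ["Afghanistan", "Albania", "Algeria", "Yemen", "Zambia"],
        name="geoName",
    ),
)

# Rank regions by the value in the "vaccine" column, highest first.
ranked = by_region.sort_values("vaccine", ascending=False)
top_vaccine = ranked.head(3)
print(top_vaccine)

# In this sample, the two keyword columns sum to 100 in every row.
sums_to_100 = bool(((by_region["covid"] + by_region["vaccine"]) == 100).all())
print(sums_to_100)
```

The same calls should work on the full DataFrame returned by interest_by_region(keywords).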
Fetching time-series
We can get the interest over time for a specific geographic location using the country's two-letter ISO code. The example below is for Brazil; remember that you have to translate the keywords into the local language.
By default, the interest_over_time function returns a one-year series.
Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.
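To make the 0 to 100 scale concrete, here is a minimal sketch of that kind of rescaling. The raw volumes are invented for illustration, and the snippet only shows how a series is expressed relative to its own peak; it is not a reproduction of how Google computes its numbers.

```python
import pandas as pd

# Hypothetical raw search volumes (made up; Google does not publish raw counts).
raw = pd.Series(
    [120, 300, 600, 150],
    index=pd.to_datetime(["2022-01-02", "2022-01-09", "2022-01-16", "2022-01-23"]),
    name="raw_volume",
)

# Express each value relative to the peak, so that the peak becomes 100.
relative = (raw / raw.max() * 100).round().astype(int)
print(relative.tolist())  # [20, 50, 100, 25]
```

One consequence of this scaling is that values from two separate requests are each relative to their own peak, so they may not be directly comparable.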
iot_BR = interest_over_time(["covid", "vaccine", "vacina"], geo='BR')
iot_BR
iot_BR[["covid", "vaccine", "vacina"]].plot(labels=dict(index="time", value="Popularity", variable="Terms"))
You can also drill down to subregions and use different timeframes. Below we show the same search for the state of Ceará in Brazil, from 17 March 2022 to 17 April 2022.
iot_BR_CE = interest_over_time(["covid", "vaccine", "vacina"], geo='BR-CE', timeframe='2022-03-17 2022-04-17')
iot_BR_CE[["covid", "vaccine", "vacina"]].plot(labels={'index':"time", 'value':"Popularity", 'variable':"Terms"})
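The timeframe argument in the call above is a string made of a start date and an end date separated by a space. If the dates come from elsewhere in your code, a small helper can build that string. The name make_timeframe is hypothetical and not part of Epigraphhub.

```python
from datetime import date


def make_timeframe(start: date, end: date) -> str:
    """Return 'YYYY-MM-DD YYYY-MM-DD', the format used for timeframe above.

    Hypothetical helper, not part of the epigraphhub package.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"{start.isoformat()} {end.isoformat()}"


timeframe = make_timeframe(date(2022, 3, 17), date(2022, 4, 17))
print(timeframe)  # 2022-03-17 2022-04-17
```

The resulting string can then be passed as timeframe=timeframe in place of the literal.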
With the historical_interest function we can get hourly results over a time window of at most one week:
hi = historical_interest(
    ["covid", "vacina"],
    geo='BR',
    year_start=2022, month_start=1, day_start=10, hour_start=0,
    year_end=2022, month_end=1, day_end=17, hour_end=0,
    # frequency='hourly',
)
hi[["covid", "vacina"]].plot()
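Given the one-week limit, a longer period has to be covered with several requests. The sketch below only prepares the arguments: it splits a date range into windows of at most seven days and maps each window onto the keyword names used in the call above. The helpers weekly_windows and window_kwargs are hypothetical, and nothing here calls the API.

```python
from datetime import datetime, timedelta


def weekly_windows(start: datetime, end: datetime):
    """Yield (window_start, window_end) pairs, each at most 7 days long."""
    step = timedelta(days=7)
    current = start
    while current < end:
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


def window_kwargs(window_start: datetime, window_end: datetime) -> dict:
    """Map a window onto the keyword arguments used in the call above."""
    return dict(
        year_start=window_start.year, month_start=window_start.month,
        day_start=window_start.day, hour_start=window_start.hour,
        year_end=window_end.year, month_end=window_end.month,
        day_end=window_end.day, hour_end=window_end.hour,
    )


# Three weeks starting on the same date as the example above.
windows = list(weekly_windows(datetime(2022, 1, 10), datetime(2022, 1, 31)))
for w_start, w_end in windows:
    print(window_kwargs(w_start, w_end))
```

Each dictionary could then be unpacked into a call such as historical_interest(["covid", "vacina"], geo='BR', **kwargs). Since the values are relative to the peak within each request, results from different windows may need care before being stitched together.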