Clusterization of time series (Python version)
This notebook provides some examples of how the functions in the clustering.py
module can be used.
import pandas as pd
from epigraphhub.analysis.clustering import *
In this tutorial we will use the data saved at ./data/data_to_get_clusters.csv. This table contains the number of cases reported in Switzerland by canton.
df = pd.read_csv('./data/data_to_get_clusters.csv')
df.set_index('datum', inplace = True)
df.index = pd.to_datetime(df.index)
df.head()
datum | georegion | entries |
---|---|---|
2021-11-25 | NW | 72 |
2021-11-26 | NW | 73 |
2021-11-27 | NW | 42 |
2021-11-28 | NW | 46 |
2021-12-15 | NW | 47 |
Function get_lag()
This function computes the lag and correlation between two time series.
The example below computes the lag and correlation between the case series of the cantons GE and ZH:
lag, corr = get_lag(df.loc[df.georegion == 'GE']['entries'],
df.loc[df.georegion == 'ZH']['entries'])
print('Lag:', lag)
print('Correlation:', corr)
Lag: 2
Correlation: 0.8106405497638762
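To make the idea of "lag and correlation" concrete, here is a minimal sketch that shifts one series against the other and keeps the shift with the highest Pearson correlation. The function name best_lag, the maxlags parameter, and the sign convention (a positive lag means the second series follows the first) are assumptions made for illustration; get_lag() in the module may differ in all of these.

```python
import numpy as np


def best_lag(x, y, maxlags=5):
    """Return (lag, corr) maximising the correlation of x[t] with y[t + lag].

    Hypothetical helper for illustration; assumes x and y have equal length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    best = (0, -np.inf)
    for lag in range(-maxlags, maxlags + 1):
        if lag >= 0:
            # Pair x[t] with y[t + lag].
            a, b = x[: n - lag], y[lag:]
        else:
            # Pair x[t + |lag|] with y[t].
            a, b = x[-lag:], y[: n + lag]
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best[1]:
            best = (lag, corr)
    return best


# Synthetic check: y is x delayed by three time steps.
t = np.arange(200)
x = np.sin(t / 10)
y = np.sin((t - 3) / 10)
lag, corr = best_lag(x, y)
print("Lag:", lag)  # 3 for this synthetic pair
print("Correlation:", corr)
```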
Function lag_ccf()
This function takes as input a data frame (or array) where each column represents the time series of a different region, and computes the lag and correlation between every pair of columns.
It returns two matrices: one with the correlation values and one with the corresponding lags.
# First, reshape the data into the right format: one column per region
inc = df.sort_index().pivot(columns = 'georegion', values = 'entries')
inc
datum | AG | AI | AR | BE | BL | BS | CH | CHFL | FL | FR | ... | SH | SO | SZ | TG | TI | UR | VD | VS | ZG | ZH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-02-24 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2020-02-25 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2020-02-26 | 0 | 0 | 0 | 0 | 1 | 1 | 10 | 10 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
2020-02-27 | 0 | 0 | 0 | 1 | 0 | 0 | 10 | 10 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 1 |
2020-02-28 | 0 | 0 | 0 | 0 | 2 | 2 | 10 | 10 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2022-08-05 | 245 | 1 | 14 | 438 | 124 | 82 | 2975 | 2991 | 16 | 112 | ... | 36 | 109 | 37 | 73 | 148 | 5 | 243 | 108 | 46 | 592 |
2022-08-06 | 188 | 2 | 9 | 259 | 90 | 65 | 2027 | 2033 | 6 | 76 | ... | 17 | 70 | 18 | 47 | 105 | 0 | 150 | 68 | 19 | 429 |
2022-08-07 | 101 | 0 | 0 | 175 | 74 | 45 | 1344 | 1345 | 1 | 43 | ... | 13 | 66 | 10 | 29 | 36 | 5 | 113 | 35 | 2 | 368 |
2022-08-08 | 275 | 4 | 11 | 299 | 168 | 115 | 2795 | 2828 | 33 | 108 | ... | 37 | 138 | 34 | 76 | 119 | 1 | 246 | 119 | 47 | 466 |
2022-08-09 | 1 | 0 | 0 | 0 | 0 | 1 | 11 | 11 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 3 |
898 rows × 29 columns
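The reshaping step is easier to follow on a small table. The sketch below applies the same sort_index().pivot(...) call to a hypothetical four-row data frame (the values are made up for illustration) that has the same long layout as the tutorial data.

```python
import pandas as pd

# Hypothetical toy data in the same long layout: one row per date and region.
toy = pd.DataFrame(
    {
        "datum": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]
        ),
        "georegion": ["GE", "ZH", "GE", "ZH"],
        "entries": [10, 20, 12, 25],
    }
).set_index("datum")

# Long to wide: the index (datum) stays, each region becomes a column.
wide = toy.sort_index().pivot(columns="georegion", values="entries")
print(wide)
```

Each row of the result is one date and each column one region, which is the layout passed to lag_ccf() in the next cell.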
cmat, lags = lag_ccf(inc.values)
cmat
array([[1. , 0.93219106, 0.97233978, 0.99445709, 0.9933896 ,
0.98480909, 0.97886981, 0.97912046, 0.98246341, 0.88853248,
0.82893076, 0.97567614, 0.95930308, 0.87066806, 0.98806147,
0.87204554, 0.96777624, 0.95259553, 0.98525947, 0.98586852,
0.99781372, 0.96439329, 0.98758893, 0.92073587, 0.96877853,
0.88168057, 0.87183881, 0.98570835, 0.98842752],
[0.93219106, 1. , 0.96870904, 0.93841094, 0.92882828,
0.9114159 , 0.92703689, 0.92729665, 0.93836863, 0.88056667,
0.8263914 , 0.95401096, 0.92381444, 0.87692333, 0.95095366,
0.85943588, 0.94785507, 0.96175727, 0.96368431, 0.92512183,
0.92392819, 0.97200266, 0.9564174 , 0.8352603 , 0.94269904,
0.85998975, 0.86520975, 0.91929927, 0.91427012],
[0.97233978, 0.96870904, 1. , 0.97316659, 0.96691297,
0.95269372, 0.95714361, 0.95746985, 0.97808039, 0.87563468,
0.81749141, 0.97845912, 0.94140117, 0.86188734, 0.96673358,
0.85503369, 0.96720425, 0.96778045, 0.99219622, 0.96461279,
0.96396098, 0.98282589, 0.99205713, 0.87225511, 0.95720783,
0.86221873, 0.86162402, 0.95874015, 0.95689113],
[0.99445709, 0.93841094, 0.97316659, 1. , 0.99263981,
0.98526499, 0.99008278, 0.99024459, 0.97469619, 0.92136698,
0.8662935 , 0.98263878, 0.97100516, 0.904567 , 0.98848103,
0.90663142, 0.9789012 , 0.9640843 , 0.98721301, 0.97431223,
0.99198994, 0.97681487, 0.98652605, 0.9354266 , 0.97170765,
0.91359121, 0.90486877, 0.98996368, 0.99153287],
[0.9933896 , 0.92882828, 0.96691297, 0.99263981, 1. ,
0.9950439 , 0.97707072, 0.97729626, 0.97469322, 0.88707821,
0.82878166, 0.96986404, 0.96194557, 0.87126502, 0.98486508,
0.87032351, 0.96447814, 0.94752546, 0.97753076, 0.9787014 ,
0.99013865, 0.96156626, 0.9841536 , 0.92336759, 0.96628388,
0.8821222 , 0.87203399, 0.98556503, 0.98951323],
[0.98480909, 0.9114159 , 0.95269372, 0.98526499, 0.9950439 ,
1. , 0.97630161, 0.97647737, 0.96360632, 0.8873001 ,
0.83664112, 0.95816632, 0.96424888, 0.87338035, 0.97326099,
0.8731083 , 0.94925844, 0.92687181, 0.96598938, 0.9717501 ,
0.97959542, 0.94868959, 0.97530763, 0.93853343, 0.95133029,
0.88918537, 0.87275073, 0.98358051, 0.99174382],
[0.97886981, 0.92703689, 0.95714361, 0.99008278, 0.97707072,
0.97630161, 1. , 0.99999886, 0.94960264, 0.95759429,
0.91803729, 0.9782752 , 0.97723156, 0.94649938, 0.97633192,
0.94606532, 0.97447968, 0.96095139, 0.97788007, 0.94961681,
0.97458384, 0.97531835, 0.97307136, 0.96436928, 0.96445036,
0.95664699, 0.94420805, 0.99013992, 0.9911085 ],
[0.97912046, 0.92729665, 0.95746985, 0.99024459, 0.97729626,
0.97647737, 0.99999886, 1. , 0.95006805, 0.95726108,
0.91755377, 0.97838375, 0.97728014, 0.94614759, 0.97652338,
0.94565904, 0.97456665, 0.96103845, 0.9781267 , 0.95001734,
0.97481775, 0.97546302, 0.97335546, 0.96420479, 0.96458517,
0.95626316, 0.94382268, 0.99021893, 0.99122109],
[0.98246341, 0.93836863, 0.97808039, 0.97469619, 0.97469322,
0.96360632, 0.94960264, 0.95006805, 1. , 0.84618378,
0.77983752, 0.9546705 , 0.94044332, 0.82951599, 0.96676492,
0.82062607, 0.94605571, 0.93373766, 0.98062641, 0.98574573,
0.97508883, 0.95804964, 0.98342411, 0.88103643, 0.94512802,
0.83242453, 0.83050314, 0.95697902, 0.96404144],
[0.88853248, 0.88056667, 0.87563468, 0.92136698, 0.88707821,
0.8873001 , 0.95759429, 0.95726108, 0.84618378, 1. ,
0.98054024, 0.92427362, 0.92578435, 0.98372571, 0.9085216 ,
0.99298982, 0.92748016, 0.92073423, 0.90740451, 0.83260199,
0.88693022, 0.92494564, 0.88389342, 0.94063121, 0.90814707,
0.99271195, 0.99352412, 0.92559938, 0.91629 ],
[0.82893076, 0.8263914 , 0.81749141, 0.8662935 , 0.82878166,
0.83664112, 0.91803729, 0.91755377, 0.77983752, 0.98054024,
1. , 0.87397851, 0.8787494 , 0.9709824 , 0.8489258 ,
0.98504541, 0.87991416, 0.87744091, 0.85178743, 0.76439118,
0.82734018, 0.87539203, 0.82548046, 0.92335007, 0.85603806,
0.9904317 , 0.98410589, 0.88170748, 0.87235252],
[0.97567614, 0.95401096, 0.97845912, 0.98263878, 0.96986404,
0.95816632, 0.9782752 , 0.97838375, 0.9546705 , 0.92427362,
0.87397851, 1. , 0.95414095, 0.90318587, 0.9797663 ,
0.91379184, 0.9838573 , 0.97897218, 0.98823846, 0.94496673,
0.97340999, 0.98558647, 0.98354739, 0.90742596, 0.97426975,
0.9147592 , 0.90754604, 0.97577107, 0.96836947],
[0.95930308, 0.92381444, 0.94140117, 0.97100516, 0.96194557,
0.96424888, 0.97723156, 0.97728014, 0.94044332, 0.92578435,
0.8787494 , 0.95414095, 1. , 0.93479564, 0.95855053,
0.90174563, 0.94612603, 0.93247287, 0.95675029, 0.94963123,
0.95159186, 0.95653621, 0.95925313, 0.96235316, 0.94279815,
0.92474124, 0.92323357, 0.96683507, 0.97099303],
[0.87066806, 0.87692333, 0.86188734, 0.904567 , 0.87126502,
0.87338035, 0.94649938, 0.94614759, 0.82951599, 0.98372571,
0.9709824 , 0.90318587, 0.93479564, 1. , 0.88929196,
0.97194303, 0.90677544, 0.90557377, 0.89020114, 0.82404108,
0.86454214, 0.91606551, 0.87030115, 0.94357762, 0.89305628,
0.98351431, 0.98666352, 0.91304345, 0.90673858],
[0.98806147, 0.95095366, 0.96673358, 0.98848103, 0.98486508,
0.97326099, 0.97633192, 0.97652338, 0.96676492, 0.9085216 ,
0.8489258 , 0.9797663 , 0.95855053, 0.88929196, 1. ,
0.89161366, 0.98107538, 0.96859398, 0.9820885 , 0.96373956,
0.98819614, 0.97446734, 0.9790546 , 0.91050226, 0.98264953,
0.89470035, 0.89303309, 0.98106632, 0.97562452],
[0.87204554, 0.85943588, 0.85503369, 0.90663142, 0.87032351,
0.8731083 , 0.94606532, 0.94565904, 0.82062607, 0.99298982,
0.98504541, 0.91379184, 0.90174563, 0.97194303, 0.89161366,
1. , 0.91912466, 0.91050262, 0.88886721, 0.80555883,
0.87272569, 0.90990927, 0.86437748, 0.93142589, 0.89438954,
0.99220001, 0.98607043, 0.9155098 , 0.90462989],
[0.96777624, 0.94785507, 0.96720425, 0.9789012 , 0.96447814,
0.94925844, 0.97447968, 0.97456665, 0.94605571, 0.92748016,
0.87991416, 0.9838573 , 0.94612603, 0.90677544, 0.98107538,
0.91912466, 1. , 0.987473 , 0.97937752, 0.9332251 ,
0.96961761, 0.98582447, 0.96933118, 0.89880479, 0.98170436,
0.91650468, 0.90564464, 0.97536318, 0.96073804],
[0.95259553, 0.96175727, 0.96778045, 0.9640843 , 0.94752546,
0.92687181, 0.96095139, 0.96103845, 0.93373766, 0.92073423,
0.87744091, 0.97897218, 0.93247287, 0.90557377, 0.96859398,
0.91050262, 0.987473 , 1. , 0.97350884, 0.9208362 ,
0.95242532, 0.98444268, 0.96386909, 0.87406361, 0.97380401,
0.90972002, 0.89915327, 0.95654805, 0.94200538],
[0.98525947, 0.96368431, 0.99219622, 0.98721301, 0.97753076,
0.96598938, 0.97788007, 0.9781267 , 0.98062641, 0.90740451,
0.85178743, 0.98823846, 0.95675029, 0.89020114, 0.9820885 ,
0.88886721, 0.97937752, 0.97350884, 1. , 0.96825925,
0.97950217, 0.98907138, 0.99360596, 0.9055526 , 0.973181 ,
0.89549469, 0.8895485 , 0.97786776, 0.97486198],
[0.98586852, 0.92512183, 0.96461279, 0.97431223, 0.9787014 ,
0.9717501 , 0.94961681, 0.95001734, 0.98574573, 0.83260199,
0.76439118, 0.94496673, 0.94963123, 0.82404108, 0.96373956,
0.80555883, 0.9332251 , 0.9208362 , 0.96825925, 1. ,
0.97998051, 0.93932695, 0.97809006, 0.89846557, 0.93715992,
0.82633649, 0.81566456, 0.95858297, 0.97030381],
[0.99781372, 0.92392819, 0.96396098, 0.99198994, 0.99013865,
0.97959542, 0.97458384, 0.97481775, 0.97508883, 0.88693022,
0.82734018, 0.97340999, 0.95159186, 0.86454214, 0.98819614,
0.87272569, 0.96961761, 0.95242532, 0.97950217, 0.97998051,
1. , 0.95839736, 0.98039334, 0.91287639, 0.97001636,
0.87858991, 0.86948612, 0.98412129, 0.98323766],
[0.96439329, 0.97200266, 0.98282589, 0.97681487, 0.96156626,
0.94868959, 0.97531835, 0.97546302, 0.95804964, 0.92494564,
0.87539203, 0.98558647, 0.95653621, 0.91606551, 0.97446734,
0.90990927, 0.98582447, 0.98444268, 0.98907138, 0.93932695,
0.95839736, 1. , 0.98054915, 0.90060591, 0.97282894,
0.91421667, 0.90688464, 0.97009081, 0.96208466],
[0.98758893, 0.9564174 , 0.99205713, 0.98652605, 0.9841536 ,
0.97530763, 0.97307136, 0.97335546, 0.98342411, 0.88389342,
0.82548046, 0.98354739, 0.95925313, 0.87030115, 0.9790546 ,
0.86437748, 0.96933118, 0.96386909, 0.99360596, 0.97809006,
0.98039334, 0.98054915, 1. , 0.90223858, 0.96301313,
0.87688397, 0.86584692, 0.97656702, 0.97814585],
[0.92073587, 0.8352603 , 0.87225511, 0.9354266 , 0.92336759,
0.93853343, 0.96436928, 0.96420479, 0.88103643, 0.94063121,
0.92335007, 0.90742596, 0.96235316, 0.94357762, 0.91050226,
0.93142589, 0.89880479, 0.87406361, 0.9055526 , 0.89846557,
0.91287639, 0.90060591, 0.90223858, 1. , 0.89022818,
0.95414953, 0.94190943, 0.94143665, 0.9566929 ],
[0.96877853, 0.94269904, 0.95720783, 0.97170765, 0.96628388,
0.95133029, 0.96445036, 0.96458517, 0.94512802, 0.90814707,
0.85603806, 0.97426975, 0.94279815, 0.89305628, 0.98264953,
0.89438954, 0.98170436, 0.97380401, 0.973181 , 0.93715992,
0.97001636, 0.97282894, 0.96301313, 0.89022818, 1. ,
0.89543124, 0.89378495, 0.97334639, 0.95577015],
[0.88168057, 0.85998975, 0.86221873, 0.91359121, 0.8821222 ,
0.88918537, 0.95664699, 0.95626316, 0.83242453, 0.99271195,
0.9904317 , 0.9147592 , 0.92474124, 0.98351431, 0.89470035,
0.99220001, 0.91650468, 0.90972002, 0.89549469, 0.82633649,
0.87858991, 0.91421667, 0.87688397, 0.95414953, 0.89543124,
1. , 0.99019747, 0.92526134, 0.92008886],
[0.87183881, 0.86520975, 0.86162402, 0.90486877, 0.87203399,
0.87275073, 0.94420805, 0.94382268, 0.83050314, 0.99352412,
0.98410589, 0.90754604, 0.92323357, 0.98666352, 0.89303309,
0.98607043, 0.90564464, 0.89915327, 0.8895485 , 0.81566456,
0.86948612, 0.90688464, 0.86584692, 0.94190943, 0.89378495,
0.99019747, 1. , 0.91085973, 0.9010808 ],
[0.98570835, 0.91929927, 0.95874015, 0.98996368, 0.98556503,
0.98358051, 0.99013992, 0.99021893, 0.95697902, 0.92559938,
0.88170748, 0.97577107, 0.96683507, 0.91304345, 0.98106632,
0.9155098 , 0.97536318, 0.95654805, 0.97786776, 0.95858297,
0.98412129, 0.97009081, 0.97656702, 0.94143665, 0.97334639,
0.92526134, 0.91085973, 1. , 0.99092239],
[0.98842752, 0.91427012, 0.95689113, 0.99153287, 0.98951323,
0.99174382, 0.9911085 , 0.99122109, 0.96404144, 0.91629 ,
0.87235252, 0.96836947, 0.97099303, 0.90673858, 0.97562452,
0.90462989, 0.96073804, 0.94200538, 0.97486198, 0.97030381,
0.98323766, 0.96208466, 0.97814585, 0.9566929 , 0.95577015,
0.92008886, 0.9010808 , 0.99092239, 1. ]])
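For readers who want to see how such a pair of matrices can be assembled, the following is a rough sketch under stated assumptions: it scans a fixed window of shifts for every pair of columns and stores the highest Pearson correlation and the shift that produced it. The name pairwise_lag_matrices and the maxlags parameter are hypothetical, and lag_ccf() may use a different estimator or lag convention.

```python
import numpy as np


def pairwise_lag_matrices(values, maxlags=5):
    """Return (cmat, lags) for a 2-D array with one time series per column.

    Hypothetical helper for illustration, not the module's implementation.
    """
    values = np.asarray(values, dtype=float)
    n_obs, n_series = values.shape
    cmat = np.eye(n_series)
    lags = np.zeros((n_series, n_series), dtype=int)
    for i in range(n_series):
        for j in range(i + 1, n_series):
            best_lag, best_corr = 0, -np.inf
            for lag in range(-maxlags, maxlags + 1):
                if lag >= 0:
                    a, b = values[: n_obs - lag, i], values[lag:, j]
                else:
                    a, b = values[-lag:, i], values[: n_obs + lag, j]
                corr = np.corrcoef(a, b)[0, 1]
                if corr > best_corr:
                    best_lag, best_corr = lag, corr
            # The correlation is symmetric; the lag flips sign when the
            # roles of the two series are swapped.
            cmat[i, j] = cmat[j, i] = best_corr
            lags[i, j] = best_lag
            lags[j, i] = -best_lag
    return cmat, lags


# Synthetic data: column 1 is column 0 delayed by two steps.
t = np.arange(200)
demo = np.column_stack(
    [np.sin(t / 10), np.sin((t - 2) / 10), np.cos(t / 7)]
)
demo_cmat, demo_lags = pairwise_lag_matrices(demo)
print(demo_lags)
```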
Function plot_matrix()
This function takes as input a matrix and some strings to be used in the plot. It plots the results of the lag_ccf() function.
fig = plot_matrix(cmat, inc.columns, title='Correlation matrix', label_scale='Correlation')
fig = plot_matrix(lags, inc.columns, title='Highest correlation lag', label_scale='Lag')
Function plot_xcorr()
This function takes as input a data frame where each column represents a different time series, together with two strings naming the columns whose correlation we want to compute.
fig = plot_xcorr(inc, X='GE', Y='ZH', ini_date='2021-01-01')
Function compute_clusters()
This function applies hierarchical clustering given a data frame where each column represents a time series in a specific region.
inc, clusters, all_regions, fig = compute_clusters(
df, columns = ['georegion', 'entries'],
t=0.8,
drop_values = ['CH', 'CHFL'],
smooth = True,
ini_date = '2020-05-01',
plot = True)
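As a rough illustration of hierarchical clustering on correlated series, the sketch below groups regions from a correlation matrix using SciPy. It assumes a distance of 1 minus the correlation, complete linkage, and that t is a distance threshold for cutting the tree; these are illustrative choices and may not match what compute_clusters() does internally. The helper name cluster_from_correlation and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_from_correlation(corr, labels, t=0.8):
    """Group labels whose series are strongly correlated.

    Hypothetical helper: distance = 1 - correlation, complete linkage,
    tree cut at distance t.
    """
    dist = 1.0 - np.asarray(corr, dtype=float)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangular) distance vector.
    tree = linkage(squareform(dist, checks=False), method="complete")
    ids = fcluster(tree, t=t, criterion="distance")
    groups = {}
    for label, cluster_id in zip(labels, ids):
        groups.setdefault(int(cluster_id), []).append(label)
    return list(groups.values())


# Toy correlation matrix: A and B move together, as do C and D.
toy_corr = np.array(
    [
        [1.00, 0.95, 0.10, 0.10],
        [0.95, 1.00, 0.10, 0.10],
        [0.10, 0.10, 1.00, 0.90],
        [0.10, 0.10, 0.90, 1.00],
    ]
)
toy_clusters = cluster_from_correlation(toy_corr, ["A", "B", "C", "D"], t=0.5)
print(toy_clusters)
```

With this distance definition, a smaller t produces more and tighter clusters, while a larger t merges regions more aggressively.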