Clusterization of time series (Python version)

This notebook shows, through examples, how to use the functions of the clustering.py module (epigraphhub.analysis.clustering).

import pandas as pd
from epigraphhub.analysis.clustering import (compute_clusters, get_lag, lag_ccf,
                                             plot_matrix, plot_xcorr)

In this tutorial we use the data stored in ./data/data_to_get_clusters.csv. This table contains the number of cases reported in Switzerland by canton. It has one row per date (datum) and region (georegion), and the number of cases is in entries.

df = pd.read_csv('./data/data_to_get_clusters.csv')
df.set_index('datum', inplace = True)
df.index = pd.to_datetime(df.index)
df.head()
           georegion  entries
datum
2021-11-25        NW       72
2021-11-26        NW       73
2021-11-27        NW       42
2021-11-28        NW       46
2021-12-15        NW       47
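The three loading steps above (read the file, make datum the index, convert it to dates) can also be done in a single pd.read_csv call. Below is a minimal sketch. The inline CSV is a hypothetical stand-in for the real file: it has the same columns but made-up values.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./data/data_to_get_clusters.csv: same columns, made-up values
csv_text = """datum,georegion,entries
2021-01-01,NW,10
2021-01-02,NW,12
2021-01-01,ZH,30
"""

# index_col + parse_dates does the work of set_index('datum') followed by pd.to_datetime()
df = pd.read_csv(io.StringIO(csv_text), index_col="datum", parse_dates=True)
print(df.head())
```

On the real file, the equivalent call would be pd.read_csv('./data/data_to_get_clusters.csv', index_col='datum', parse_dates=True).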

Function get_lag()

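This function computes the lag and the correlation between two time series. The lag is the shift between the series at which their correlation is highest, and the correlation is the value reached at that lag.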

In the example below let’s compute the lag and correlation between the series of cases in the cantons of GE and ZH:

lag, corr = get_lag(df.loc[df.georegion == 'GE']['entries'], 
                    df.loc[df.georegion == 'ZH']['entries'])

print('Lag:', lag)
print('Correlation:', corr)
Lag: 2
Correlation: 0.8106405497638762
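To make the idea behind get_lag() concrete, here is a minimal NumPy sketch of a lag search. It shifts one series against the other over a window of lags and keeps the lag with the highest Pearson correlation. This is not the library's implementation. The function name best_lag_corr, the max_lag=14 window and the sign convention for the lag are all assumptions of this sketch.

```python
import numpy as np


def best_lag_corr(x, y, max_lag=14):
    """Return (lag, corr): the shift of y relative to x with the highest Pearson correlation.

    Assumed convention: a positive lag pairs x[t] with y[t + lag], i.e. y trails x.
    max_lag=14 is an arbitrary window chosen for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        if len(a) < 3:  # too few overlapping points for a meaningful correlation
            continue
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Synthetic check: y is x delayed by 3 steps, so the best lag should be 3
rng = np.random.default_rng(0)
x = rng.normal(size=200).cumsum()
y = np.roll(x, 3)  # y[t] = x[t - 3] (the first 3 values wrap around)
lag, corr = best_lag_corr(x, y)
print("Lag:", lag, "Correlation:", round(corr, 3))
```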

Function lag_ccf()

This function computes the lag and correlation between every pair of columns of its input. The input is a 2-D array (here the values of a data frame, inc.values) in which each column is the time series of a different region.

It returns two matrices. The first (cmat below) holds the correlations, and the second (lags) holds the corresponding lags.

# First, reshape the data into wide format: one row per date, one column per region
inc = df.sort_index().pivot(columns = 'georegion', values = 'entries')

inc
georegion AG AI AR BE BL BS CH CHFL FL FR ... SH SO SZ TG TI UR VD VS ZG ZH
datum
2020-02-24 0 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 1 0 0 0 0 0
2020-02-25 1 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
2020-02-26 0 0 0 0 1 1 10 10 0 0 ... 0 0 0 0 0 0 1 0 0 1
2020-02-27 0 0 0 1 0 0 10 10 0 0 ... 0 0 0 0 3 0 0 0 0 1
2020-02-28 0 0 0 0 2 2 10 10 0 0 ... 0 0 0 0 0 0 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-08-05 245 1 14 438 124 82 2975 2991 16 112 ... 36 109 37 73 148 5 243 108 46 592
2022-08-06 188 2 9 259 90 65 2027 2033 6 76 ... 17 70 18 47 105 0 150 68 19 429
2022-08-07 101 0 0 175 74 45 1344 1345 1 43 ... 13 66 10 29 36 5 113 35 2 368
2022-08-08 275 4 11 299 168 115 2795 2828 33 108 ... 37 138 34 76 119 1 246 119 47 466
2022-08-09 1 0 0 0 0 1 11 11 0 0 ... 0 0 0 1 0 0 2 1 0 3

898 rows × 29 columns
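The pivot call above reshapes the long table (one row per date and region) into the wide table that lag_ccf() works on. Here is the same operation on a toy frame with made-up values:

```python
import pandas as pd

# Toy long-format table with the same columns as df (values are made up)
long_df = pd.DataFrame(
    {
        "datum": pd.to_datetime(["2020-02-25", "2020-02-24", "2020-02-24", "2020-02-25"]),
        "georegion": ["GE", "GE", "ZH", "ZH"],
        "entries": [3, 1, 0, 2],
    }
).set_index("datum")

# sort_index() orders the dates; pivot() turns each georegion value into its own column
wide = long_df.sort_index().pivot(columns="georegion", values="entries")
print(wide)
```

If a region has no row for some date, pivot() leaves NaN in that cell. Missing values may therefore need handling (for example with fillna) before computing correlations.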

cmat, lags = lag_ccf(inc.values)

cmat
array([[1.        , 0.93219106, 0.97233978, 0.99445709, 0.9933896 ,
        0.98480909, 0.97886981, 0.97912046, 0.98246341, 0.88853248,
        0.82893076, 0.97567614, 0.95930308, 0.87066806, 0.98806147,
        0.87204554, 0.96777624, 0.95259553, 0.98525947, 0.98586852,
        0.99781372, 0.96439329, 0.98758893, 0.92073587, 0.96877853,
        0.88168057, 0.87183881, 0.98570835, 0.98842752],
       [0.93219106, 1.        , 0.96870904, 0.93841094, 0.92882828,
        0.9114159 , 0.92703689, 0.92729665, 0.93836863, 0.88056667,
        0.8263914 , 0.95401096, 0.92381444, 0.87692333, 0.95095366,
        0.85943588, 0.94785507, 0.96175727, 0.96368431, 0.92512183,
        0.92392819, 0.97200266, 0.9564174 , 0.8352603 , 0.94269904,
        0.85998975, 0.86520975, 0.91929927, 0.91427012],
       [0.97233978, 0.96870904, 1.        , 0.97316659, 0.96691297,
        0.95269372, 0.95714361, 0.95746985, 0.97808039, 0.87563468,
        0.81749141, 0.97845912, 0.94140117, 0.86188734, 0.96673358,
        0.85503369, 0.96720425, 0.96778045, 0.99219622, 0.96461279,
        0.96396098, 0.98282589, 0.99205713, 0.87225511, 0.95720783,
        0.86221873, 0.86162402, 0.95874015, 0.95689113],
       [0.99445709, 0.93841094, 0.97316659, 1.        , 0.99263981,
        0.98526499, 0.99008278, 0.99024459, 0.97469619, 0.92136698,
        0.8662935 , 0.98263878, 0.97100516, 0.904567  , 0.98848103,
        0.90663142, 0.9789012 , 0.9640843 , 0.98721301, 0.97431223,
        0.99198994, 0.97681487, 0.98652605, 0.9354266 , 0.97170765,
        0.91359121, 0.90486877, 0.98996368, 0.99153287],
       [0.9933896 , 0.92882828, 0.96691297, 0.99263981, 1.        ,
        0.9950439 , 0.97707072, 0.97729626, 0.97469322, 0.88707821,
        0.82878166, 0.96986404, 0.96194557, 0.87126502, 0.98486508,
        0.87032351, 0.96447814, 0.94752546, 0.97753076, 0.9787014 ,
        0.99013865, 0.96156626, 0.9841536 , 0.92336759, 0.96628388,
        0.8821222 , 0.87203399, 0.98556503, 0.98951323],
       [0.98480909, 0.9114159 , 0.95269372, 0.98526499, 0.9950439 ,
        1.        , 0.97630161, 0.97647737, 0.96360632, 0.8873001 ,
        0.83664112, 0.95816632, 0.96424888, 0.87338035, 0.97326099,
        0.8731083 , 0.94925844, 0.92687181, 0.96598938, 0.9717501 ,
        0.97959542, 0.94868959, 0.97530763, 0.93853343, 0.95133029,
        0.88918537, 0.87275073, 0.98358051, 0.99174382],
       [0.97886981, 0.92703689, 0.95714361, 0.99008278, 0.97707072,
        0.97630161, 1.        , 0.99999886, 0.94960264, 0.95759429,
        0.91803729, 0.9782752 , 0.97723156, 0.94649938, 0.97633192,
        0.94606532, 0.97447968, 0.96095139, 0.97788007, 0.94961681,
        0.97458384, 0.97531835, 0.97307136, 0.96436928, 0.96445036,
        0.95664699, 0.94420805, 0.99013992, 0.9911085 ],
       [0.97912046, 0.92729665, 0.95746985, 0.99024459, 0.97729626,
        0.97647737, 0.99999886, 1.        , 0.95006805, 0.95726108,
        0.91755377, 0.97838375, 0.97728014, 0.94614759, 0.97652338,
        0.94565904, 0.97456665, 0.96103845, 0.9781267 , 0.95001734,
        0.97481775, 0.97546302, 0.97335546, 0.96420479, 0.96458517,
        0.95626316, 0.94382268, 0.99021893, 0.99122109],
       [0.98246341, 0.93836863, 0.97808039, 0.97469619, 0.97469322,
        0.96360632, 0.94960264, 0.95006805, 1.        , 0.84618378,
        0.77983752, 0.9546705 , 0.94044332, 0.82951599, 0.96676492,
        0.82062607, 0.94605571, 0.93373766, 0.98062641, 0.98574573,
        0.97508883, 0.95804964, 0.98342411, 0.88103643, 0.94512802,
        0.83242453, 0.83050314, 0.95697902, 0.96404144],
       [0.88853248, 0.88056667, 0.87563468, 0.92136698, 0.88707821,
        0.8873001 , 0.95759429, 0.95726108, 0.84618378, 1.        ,
        0.98054024, 0.92427362, 0.92578435, 0.98372571, 0.9085216 ,
        0.99298982, 0.92748016, 0.92073423, 0.90740451, 0.83260199,
        0.88693022, 0.92494564, 0.88389342, 0.94063121, 0.90814707,
        0.99271195, 0.99352412, 0.92559938, 0.91629   ],
       [0.82893076, 0.8263914 , 0.81749141, 0.8662935 , 0.82878166,
        0.83664112, 0.91803729, 0.91755377, 0.77983752, 0.98054024,
        1.        , 0.87397851, 0.8787494 , 0.9709824 , 0.8489258 ,
        0.98504541, 0.87991416, 0.87744091, 0.85178743, 0.76439118,
        0.82734018, 0.87539203, 0.82548046, 0.92335007, 0.85603806,
        0.9904317 , 0.98410589, 0.88170748, 0.87235252],
       [0.97567614, 0.95401096, 0.97845912, 0.98263878, 0.96986404,
        0.95816632, 0.9782752 , 0.97838375, 0.9546705 , 0.92427362,
        0.87397851, 1.        , 0.95414095, 0.90318587, 0.9797663 ,
        0.91379184, 0.9838573 , 0.97897218, 0.98823846, 0.94496673,
        0.97340999, 0.98558647, 0.98354739, 0.90742596, 0.97426975,
        0.9147592 , 0.90754604, 0.97577107, 0.96836947],
       [0.95930308, 0.92381444, 0.94140117, 0.97100516, 0.96194557,
        0.96424888, 0.97723156, 0.97728014, 0.94044332, 0.92578435,
        0.8787494 , 0.95414095, 1.        , 0.93479564, 0.95855053,
        0.90174563, 0.94612603, 0.93247287, 0.95675029, 0.94963123,
        0.95159186, 0.95653621, 0.95925313, 0.96235316, 0.94279815,
        0.92474124, 0.92323357, 0.96683507, 0.97099303],
       [0.87066806, 0.87692333, 0.86188734, 0.904567  , 0.87126502,
        0.87338035, 0.94649938, 0.94614759, 0.82951599, 0.98372571,
        0.9709824 , 0.90318587, 0.93479564, 1.        , 0.88929196,
        0.97194303, 0.90677544, 0.90557377, 0.89020114, 0.82404108,
        0.86454214, 0.91606551, 0.87030115, 0.94357762, 0.89305628,
        0.98351431, 0.98666352, 0.91304345, 0.90673858],
       [0.98806147, 0.95095366, 0.96673358, 0.98848103, 0.98486508,
        0.97326099, 0.97633192, 0.97652338, 0.96676492, 0.9085216 ,
        0.8489258 , 0.9797663 , 0.95855053, 0.88929196, 1.        ,
        0.89161366, 0.98107538, 0.96859398, 0.9820885 , 0.96373956,
        0.98819614, 0.97446734, 0.9790546 , 0.91050226, 0.98264953,
        0.89470035, 0.89303309, 0.98106632, 0.97562452],
       [0.87204554, 0.85943588, 0.85503369, 0.90663142, 0.87032351,
        0.8731083 , 0.94606532, 0.94565904, 0.82062607, 0.99298982,
        0.98504541, 0.91379184, 0.90174563, 0.97194303, 0.89161366,
        1.        , 0.91912466, 0.91050262, 0.88886721, 0.80555883,
        0.87272569, 0.90990927, 0.86437748, 0.93142589, 0.89438954,
        0.99220001, 0.98607043, 0.9155098 , 0.90462989],
       [0.96777624, 0.94785507, 0.96720425, 0.9789012 , 0.96447814,
        0.94925844, 0.97447968, 0.97456665, 0.94605571, 0.92748016,
        0.87991416, 0.9838573 , 0.94612603, 0.90677544, 0.98107538,
        0.91912466, 1.        , 0.987473  , 0.97937752, 0.9332251 ,
        0.96961761, 0.98582447, 0.96933118, 0.89880479, 0.98170436,
        0.91650468, 0.90564464, 0.97536318, 0.96073804],
       [0.95259553, 0.96175727, 0.96778045, 0.9640843 , 0.94752546,
        0.92687181, 0.96095139, 0.96103845, 0.93373766, 0.92073423,
        0.87744091, 0.97897218, 0.93247287, 0.90557377, 0.96859398,
        0.91050262, 0.987473  , 1.        , 0.97350884, 0.9208362 ,
        0.95242532, 0.98444268, 0.96386909, 0.87406361, 0.97380401,
        0.90972002, 0.89915327, 0.95654805, 0.94200538],
       [0.98525947, 0.96368431, 0.99219622, 0.98721301, 0.97753076,
        0.96598938, 0.97788007, 0.9781267 , 0.98062641, 0.90740451,
        0.85178743, 0.98823846, 0.95675029, 0.89020114, 0.9820885 ,
        0.88886721, 0.97937752, 0.97350884, 1.        , 0.96825925,
        0.97950217, 0.98907138, 0.99360596, 0.9055526 , 0.973181  ,
        0.89549469, 0.8895485 , 0.97786776, 0.97486198],
       [0.98586852, 0.92512183, 0.96461279, 0.97431223, 0.9787014 ,
        0.9717501 , 0.94961681, 0.95001734, 0.98574573, 0.83260199,
        0.76439118, 0.94496673, 0.94963123, 0.82404108, 0.96373956,
        0.80555883, 0.9332251 , 0.9208362 , 0.96825925, 1.        ,
        0.97998051, 0.93932695, 0.97809006, 0.89846557, 0.93715992,
        0.82633649, 0.81566456, 0.95858297, 0.97030381],
       [0.99781372, 0.92392819, 0.96396098, 0.99198994, 0.99013865,
        0.97959542, 0.97458384, 0.97481775, 0.97508883, 0.88693022,
        0.82734018, 0.97340999, 0.95159186, 0.86454214, 0.98819614,
        0.87272569, 0.96961761, 0.95242532, 0.97950217, 0.97998051,
        1.        , 0.95839736, 0.98039334, 0.91287639, 0.97001636,
        0.87858991, 0.86948612, 0.98412129, 0.98323766],
       [0.96439329, 0.97200266, 0.98282589, 0.97681487, 0.96156626,
        0.94868959, 0.97531835, 0.97546302, 0.95804964, 0.92494564,
        0.87539203, 0.98558647, 0.95653621, 0.91606551, 0.97446734,
        0.90990927, 0.98582447, 0.98444268, 0.98907138, 0.93932695,
        0.95839736, 1.        , 0.98054915, 0.90060591, 0.97282894,
        0.91421667, 0.90688464, 0.97009081, 0.96208466],
       [0.98758893, 0.9564174 , 0.99205713, 0.98652605, 0.9841536 ,
        0.97530763, 0.97307136, 0.97335546, 0.98342411, 0.88389342,
        0.82548046, 0.98354739, 0.95925313, 0.87030115, 0.9790546 ,
        0.86437748, 0.96933118, 0.96386909, 0.99360596, 0.97809006,
        0.98039334, 0.98054915, 1.        , 0.90223858, 0.96301313,
        0.87688397, 0.86584692, 0.97656702, 0.97814585],
       [0.92073587, 0.8352603 , 0.87225511, 0.9354266 , 0.92336759,
        0.93853343, 0.96436928, 0.96420479, 0.88103643, 0.94063121,
        0.92335007, 0.90742596, 0.96235316, 0.94357762, 0.91050226,
        0.93142589, 0.89880479, 0.87406361, 0.9055526 , 0.89846557,
        0.91287639, 0.90060591, 0.90223858, 1.        , 0.89022818,
        0.95414953, 0.94190943, 0.94143665, 0.9566929 ],
       [0.96877853, 0.94269904, 0.95720783, 0.97170765, 0.96628388,
        0.95133029, 0.96445036, 0.96458517, 0.94512802, 0.90814707,
        0.85603806, 0.97426975, 0.94279815, 0.89305628, 0.98264953,
        0.89438954, 0.98170436, 0.97380401, 0.973181  , 0.93715992,
        0.97001636, 0.97282894, 0.96301313, 0.89022818, 1.        ,
        0.89543124, 0.89378495, 0.97334639, 0.95577015],
       [0.88168057, 0.85998975, 0.86221873, 0.91359121, 0.8821222 ,
        0.88918537, 0.95664699, 0.95626316, 0.83242453, 0.99271195,
        0.9904317 , 0.9147592 , 0.92474124, 0.98351431, 0.89470035,
        0.99220001, 0.91650468, 0.90972002, 0.89549469, 0.82633649,
        0.87858991, 0.91421667, 0.87688397, 0.95414953, 0.89543124,
        1.        , 0.99019747, 0.92526134, 0.92008886],
       [0.87183881, 0.86520975, 0.86162402, 0.90486877, 0.87203399,
        0.87275073, 0.94420805, 0.94382268, 0.83050314, 0.99352412,
        0.98410589, 0.90754604, 0.92323357, 0.98666352, 0.89303309,
        0.98607043, 0.90564464, 0.89915327, 0.8895485 , 0.81566456,
        0.86948612, 0.90688464, 0.86584692, 0.94190943, 0.89378495,
        0.99019747, 1.        , 0.91085973, 0.9010808 ],
       [0.98570835, 0.91929927, 0.95874015, 0.98996368, 0.98556503,
        0.98358051, 0.99013992, 0.99021893, 0.95697902, 0.92559938,
        0.88170748, 0.97577107, 0.96683507, 0.91304345, 0.98106632,
        0.9155098 , 0.97536318, 0.95654805, 0.97786776, 0.95858297,
        0.98412129, 0.97009081, 0.97656702, 0.94143665, 0.97334639,
        0.92526134, 0.91085973, 1.        , 0.99092239],
       [0.98842752, 0.91427012, 0.95689113, 0.99153287, 0.98951323,
        0.99174382, 0.9911085 , 0.99122109, 0.96404144, 0.91629   ,
        0.87235252, 0.96836947, 0.97099303, 0.90673858, 0.97562452,
        0.90462989, 0.96073804, 0.94200538, 0.97486198, 0.97030381,
        0.98323766, 0.96208466, 0.97814585, 0.9566929 , 0.95577015,
        0.92008886, 0.9010808 , 0.99092239, 1.        ]])
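Matrices like these can be sketched with a vectorized lag search. For each lag, a single np.corrcoef call gives the correlations of all column pairs, and each entry keeps its best value over the lags. This is not the code of lag_ccf(). The name lag_corr_matrices, the ±14 lag window and the convention that a positive lag means column j trails column i are assumptions made for illustration.

```python
import numpy as np


def lag_corr_matrices(data, max_lag=14):
    """Sketch of a lag_ccf()-style computation (not the library code).

    data: 2-D array with one column per region (like inc.values).
    Returns (cmat, lags): cmat[i, j] is the highest Pearson correlation between
    column i at time t and column j at time t + lag, over lag in [-max_lag, max_lag];
    lags[i, j] is the lag where that maximum occurs.
    """
    data = np.asarray(data, dtype=float)
    n_t, n = data.shape
    cmat = np.full((n, n), -np.inf)
    lags = np.zeros((n, n), dtype=int)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = data[: n_t - lag], data[lag:]
        else:
            a, b = data[-lag:], data[: n_t + lag]
        # corrcoef of the 2n stacked variables; the top-right n x n block is corr(a[:, i], b[:, j])
        c = np.corrcoef(a.T, b.T)[:n, n:]
        better = c > cmat  # NaN (e.g. from a constant column) never counts as better
        cmat[better] = c[better]
        lags[better] = lag
    return cmat, lags


# Synthetic check with three series: column 1 is column 0 delayed by 2 steps
rng = np.random.default_rng(1)
base = rng.normal(size=300).cumsum()
data = np.column_stack([base, np.roll(base, 2), rng.normal(size=300).cumsum()])
cmat, lags = lag_corr_matrices(data)
print(np.round(cmat, 3))
print(lags)
```

Shifting column j forward is the same as shifting column i backward. As a result, in this sketch cmat comes out symmetric (like the matrix printed above) and lags antisymmetric, with lags[j, i] == -lags[i, j].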

Function plot_matrix()

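This function plots a matrix such as those returned by lag_ccf(). It takes four inputs:

- the matrix;
- the labels of its rows and columns (here the region names, inc.columns);
- a title;
- a label for the scale (label_scale).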

fig = plot_matrix(cmat, inc.columns, title='Correlation matrix', label_scale='Correlation')
fig = plot_matrix(lags, inc.columns, title='Highest correlation lag', label_scale='Lag')
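A labelled matrix plot of this kind can be sketched with Matplotlib's imshow and a labelled color bar. The function plot_labelled_matrix below is a hypothetical stand-in written for illustration. The heatmap style, color map and figure size are assumptions, not details taken from plot_matrix().

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_labelled_matrix(mat, labels, title="", label_scale=""):
    """Hypothetical stand-in for plot_matrix(): heatmap with one tick per region and a labelled color bar."""
    fig, ax = plt.subplots(figsize=(8, 7))
    im = ax.imshow(mat, cmap="viridis")
    ticks = np.arange(len(labels))
    ax.set_xticks(ticks)
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(labels)
    ax.set_title(title)
    fig.colorbar(im, ax=ax, label=label_scale)  # the color bar gets its own axes
    fig.tight_layout()
    return fig


# Toy 3 x 3 matrix standing in for cmat
toy = np.array([[1.0, 0.9, 0.6], [0.9, 1.0, 0.7], [0.6, 0.7, 1.0]])
fig = plot_labelled_matrix(toy, ["GE", "VD", "ZH"], title="Correlation matrix", label_scale="Correlation")
```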

Function plot_xcorr()

This function plots the cross-correlation between two time series. It takes two inputs:

- a data frame in which each column is a different time series;
- the names of the two columns, X and Y, whose correlation we want to compute.

fig = plot_xcorr(inc, X='GE', Y='ZH', ini_date='2021-01-01')
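The curve behind a cross-correlation plot is the correlation of X with Y shifted by each lag in a window. Below is a pandas sketch. It assumes that ini_date restricts the data to dates from that day on and that the window is ±14 days. The name xcorr_curve is hypothetical and not part of the library.

```python
import numpy as np
import pandas as pd


def xcorr_curve(wide, x, y, ini_date=None, max_lag=14):
    """Correlation between wide[x] at t and wide[y] at t + lag, for every lag in [-max_lag, max_lag].

    Assumptions: ini_date keeps rows from that date on; the ±max_lag window is arbitrary.
    """
    if ini_date is not None:
        wide = wide.loc[ini_date:]  # partial-string slicing; needs a sorted DatetimeIndex
    lags = range(-max_lag, max_lag + 1)
    # shift(-lag) moves wide[y] so that row t holds the value at t + lag; corr() skips the NaN rows
    values = [wide[x].corr(wide[y].shift(-lag)) for lag in lags]
    return pd.Series(values, index=pd.Index(lags, name="lag"), name=f"corr({x}, {y})")


# Synthetic wide table: ZH repeats GE four days later
dates = pd.date_range("2020-12-01", periods=120, freq="D")
base = np.random.default_rng(2).normal(size=120).cumsum()
wide = pd.DataFrame({"GE": base, "ZH": np.roll(base, 4)}, index=dates)
curve = xcorr_curve(wide, "GE", "ZH", ini_date="2021-01-01")
print(curve.idxmax(), round(curve.max(), 3))
```

With Matplotlib installed, curve.plot(kind="bar") draws the resulting series against the lag.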

Function compute_clusters()

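This function applies hierarchical clustering to the time series of the different regions. Unlike lag_ccf(), it takes the data frame in long format (df). It also takes columns, the names of the region column and of the value column.

In the example below, drop_values excludes CH and CHFL. These are national totals (Switzerland, and Switzerland plus Liechtenstein) rather than individual cantons.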

inc, clusters, all_regions, fig = compute_clusters(
    df, columns = ['georegion', 'entries'],
    t=0.8,
    drop_values = ['CH', 'CHFL'],
    smooth = True,
    ini_date = '2020-05-01',
    plot = True)
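The core of a hierarchical clustering step can be sketched in a few lines:

1. Drop the aggregate regions.
2. Optionally smooth the series.
3. Turn correlations into distances.
4. Merge the closest clusters until the closest pair is farther apart than a threshold t.

Everything here is an assumption for illustration: the names cluster_regions and complete_linkage_labels, complete linkage, 1 - correlation as the distance, and a centered rolling mean as the smoothing. Because the distance may differ from the library's, the t=0.8 used with compute_clusters() is not directly comparable. In practice, SciPy's scipy.cluster.hierarchy.linkage together with fcluster(..., criterion='distance') handles the clustering part.

```python
import numpy as np
import pandas as pd


def complete_linkage_labels(dist, t):
    """Agglomerative clustering with complete linkage on a distance matrix.

    Merges the two closest clusters until the closest pair is farther apart than t.
    For complete linkage this should match SciPy's
    fcluster(linkage(...), t, criterion="distance") up to ties.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best_d, best_a, best_b = np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        if best_d > t:
            break
        merged = clusters.pop(best_b)  # best_b > best_a, so best_a's index is unchanged
        clusters[best_a].extend(merged)
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters, start=1):
        labels[members] = k
    return labels


def cluster_regions(wide, t, drop=(), smooth_window=None):
    """Hypothetical compute_clusters()-like pipeline on a wide table (one column per region)."""
    data = wide.drop(columns=[c for c in drop if c in wide.columns])
    if smooth_window:
        # assumption: smoothing as a centered rolling mean
        data = data.rolling(smooth_window, center=True, min_periods=1).mean()
    # assumption: dissimilarity = 1 - Pearson correlation (the library may use another distance)
    dist = 1.0 - data.corr().to_numpy()
    return pd.Series(complete_linkage_labels(dist, t), index=data.columns, name="cluster")


# Synthetic regions: GE and VD follow one wave, ZH and AG another; CH is their total
x = 2 * np.pi * np.arange(200) / 50
wide = pd.DataFrame({
    "GE": np.sin(x), "VD": np.sin(x) + 0.1 * np.sin(3 * x),
    "ZH": np.cos(x), "AG": np.cos(x) + 0.1 * np.cos(3 * x),
})
wide["CH"] = wide.sum(axis=1)
clusters = cluster_regions(wide, t=0.5, drop=["CH"], smooth_window=7)
print(clusters)
```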