topic Re: Clustering different Time series in Statistical Procedures

Clustering different Time series

harmonic — Wed, 19 Jun 2024 13:24:35 GMT

Hello community,

I would like to cluster my 20 time series, each covering a three-year period.

For instance, I would like to compare the distributions with more than just a graph.

Looking at the plot, I could perhaps put Distributions 1, 3 and 4 in Cluster A and Distribution 2 in Cluster B.

Is there a statistical method that is more accurate than this graphical comparison?

Re: Clustering different Time series

Ksharp — Thu, 20 Jun 2024 02:40:21 GMT

Very interesting question.

You could use the R-square of an OLS regression to check whether two time series are similar.

@Rick_SAS may have a better idea.

data have;
 set sashelp.stocks;
 keep stock close date;
run;

/*First check the time series by stocks*/
proc sgpanel data=have;
panelby stock/onepanel columns=1;
series x=date y=close;
run;

/*Calculate the RSquare of OLS for checking the difference between two series*/
proc sort data=have;by date stock;run;
proc transpose data=have out=have2(drop=_NAME_);
by date;
var close;
id stock;
run;

proc reg data=have2 noprint rsquare outest=outest;
IBM_Intel:       model IBM=Intel;
IBM_Microsoft:   model IBM=Microsoft;
Intel_Microsoft: model Intel=Microsoft;
quit;
proc print data=outest noobs;run;

You can see that Intel and Microsoft have the largest R-square, which means they are the most similar pair of series.

You also need to set a CUTOFF value to cluster these time series.

Re: Clustering different Time series

harmonic — Thu, 20 Jun 2024 08:29:03 GMT

To cluster 20 time series, should I compare every pair of series? And how would I choose the number of clusters with this method, given that it only measures the similarity between two series?

Re: Clustering different Time series

Ksharp — Thu, 20 Jun 2024 09:10:27 GMT

If you have lots of variables to compare, you could also try the Pearson correlation coefficient from PROC CORR.

Since corr**2 = R-square for a regression with a single predictor, that is a lot easier to code.

data have;
 set sashelp.stocks;
 keep stock close date;
run;
proc sort data=have;by date stock;run;
proc transpose data=have out=have2(drop=_NAME_);
by date;
var close;
id stock;
run;

proc corr data=have2 outp=outp noprint;
var IBM Intel Microsoft;
run;

Again, Intel and Microsoft are the most similar pair.

Choosing the number of clusters is the hard part.

You need to set a cutoff value to form the clusters.

E.g.

Here, if you decide that corr>0.6 means two series belong together, then Intel and Microsoft form one cluster and IBM is another cluster.
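As a rough sketch of that cutoff step (not run against your data), the correlation matrix in outp can be reshaped into one row per pair with a 0/1 flag. The names pairs, series1, series2 and same are made up for illustration, and 0.6 is just the example cutoff. For 20 series you would list all 20 variables in the ARRAY statement.

```sas
/* Sketch only: one row per pair of series, flagged against a cutoff. */
/* Assumes outp from the PROC CORR step above.                         */
data pairs;
  set outp(where=(_type_='CORR'));
  length series1 series2 $32;
  array v{*} IBM Intel Microsoft;   /* list every series here */
  series1 = _name_;
  do j = 1 to dim(v);
    series2 = vname(v{j});
    corr = v{j};
    /* keep each unordered pair once and skip the diagonal */
    if upcase(series1) < upcase(series2) then do;
      same = (corr > 0.6);          /* 1 = treat the pair as one group */
      output;
    end;
  end;
  keep series1 series2 corr same;
run;

proc print data=pairs noobs;
run;
```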

If you have more stocks, it gets harder to code the step that builds the clusters.

E.g.

a b 0.98
b c 0.89
d b 0.2
e d 0.3
e f 0.86

If you set cutoff=0.8, then

a b 1
b c 1
d b 0
e d 0
e f 1

So, scanning them by eye:

a, b, c are one cluster

e, f are another cluster

d is another cluster

You could write code to do this automatically, but that is another story (it is a tree-search problem).
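One possible shortcut, offered as an untested sketch: when you have the full correlation matrix, chaining pairs that pass a cutoff gives the same groups as single-linkage clustering cut at the matching distance. The code below assumes outp from the PROC CORR step above, uses 1-corr as the distance (my assumption, so strongly negative correlations count as far apart), and cuts at 0.4, which corresponds to corr of about 0.6. The names dist, tree, grp and series are made up.

```sas
/* Sketch only: cutoff grouping via single linkage on 1-corr. */
data dist(type=distance);
  set outp(where=(_type_='CORR') rename=(_name_=series));
  array v{*} IBM Intel Microsoft;   /* same order as the VAR statement */
  do j = 1 to dim(v);
    v{j} = 1 - v{j};                /* correlation -> distance, diagonal = 0 */
  end;
  drop j _type_;
run;

proc cluster data=dist method=single outtree=tree noprint;
  id series;
  var IBM Intel Microsoft;
run;

/* Cut the tree at distance 0.4 to get disjoint groups */
proc tree data=tree level=0.4 out=grp noprint;
run;

proc sort data=grp;
  by cluster;
run;

proc print data=grp noobs;
run;
```

Changing LEVEL= plays the same role as changing the cutoff, so the number of groups still depends on the value you pick.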

Re: Clustering different Time series

harmonic — Thu, 20 Jun 2024 09:17:00 GMT

I already used PROC TSMODEL to calculate the Ward distance and the triangular matrix, which I passed to PROC CLUSTER and PROC TREE; this is the result.
I would like to know whether there is a different method, because at an R-squared of around 0.7 I get 8 clusters. Is that perhaps because the series cannot be separated into 3 clusters?

Re: Clustering different Time series

Ksharp — Thu, 20 Jun 2024 09:41:53 GMT

The method I demonstrated is different from PROC CLUSTER or PROC FASTCLUS.

Once the cutoff value is set, the number of clusters is fixed.

E.g.

a b 0.98
b c 0.89
d b 0.2
e d 0.3
e f 0.86

If you set cutoff=0.9 then

a b 1
b c 0
d b 0
e d 0
e f 0

a,b is one cluster

c is one cluster

d is one cluster

e is one cluster

f is one cluster

That gives five clusters, unlike the three clusters I showed above.

If you really want to decide the number of clusters, you could try Principal Component Analysis. See Rick_SAS's blog here:

https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html

But you still need to make the decision yourself.

Deciding the number of clusters is a well-known, unsolved statistical question.

P.S. If you want to use SAS/ETS to solve this problem, I suggest posting your question in the Forecasting forum:

https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/bd-p/forecasting_econometrics

Experts in time series analysis there could give you constructive advice.

Re: Clustering different Time series

Ksharp — Thu, 20 Jun 2024 09:47:48 GMT

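BTW, if you want to "separate the series with 3 clusters", you could try k-means clustering with PROC FASTCLUS and the MAXCLUSTERS= option.

Another way is the nonparametric, density-based method in PROC MODECLUS with the R= option (fixed radius), or K= for a nearest-neighbor version.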
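A minimal sketch of the FASTCLUS route, using sashelp.stocks as a stand-in for your data (untested on your 20 series). It assumes every series has the same dates, puts one series per row, and standardizes each series first so that the grouping is driven by shape and not by price level; that standardization is my assumption, not a requirement. The names s, sz, wide and km are made up. With 20 series you would set MAXCLUSTERS=3.

```sas
/* Sketch only: k-means on whole series, one row per series. */
proc sort data=sashelp.stocks out=s(keep=stock date close);
  by stock date;
run;

/* Standardize each series to mean 0, std 1 */
proc standard data=s mean=0 std=1 out=sz;
  by stock;
  var close;
run;

/* One row per stock, one column (t1, t2, ...) per date */
proc transpose data=sz out=wide(drop=_name_) prefix=t;
  by stock;
  var close;
run;

/* Only 3 series in this sample, so ask for 2 clusters */
proc fastclus data=wide maxclusters=2 out=km noprint;
  var t:;
  id stock;
run;

proc print data=km noobs;
  var stock cluster distance;
run;
```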