<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Clustering different Time series in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933096#M46535</link>
    <description>&lt;P&gt;I already used PROC TSMODEL to calculate the Ward distance and a triangular matrix, which I passed to PROC CLUSTER and PROC TREE; this is the result.&lt;BR /&gt;I would like to know if there is a different method, because for an R-squared of around 0.7 I get 8 clusters. Is this perhaps because the series cannot be separated into 3 clusters?&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harmonic_0-1718874949047.png" style="width: 752px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97657iD51A9521A7D9DC7C/image-dimensions/752x714?v=v2" width="752" height="714" role="button" title="harmonic_0-1718874949047.png" alt="harmonic_0-1718874949047.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 20 Jun 2024 09:17:00 GMT</pubDate>
    <dc:creator>harmonic</dc:creator>
    <dc:date>2024-06-20T09:17:00Z</dc:date>
    <item>
      <title>Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/932989#M46529</link>
      <description>&lt;P&gt;Hello community,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to cluster my 20 time series, each covering a three-year period.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For instance, I would like to compare the distributions in a way that is not only graphical.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harmonic_0-1718803363834.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97628i122BD7542435F210/image-size/medium?v=v2&amp;amp;px=400" role="button" title="harmonic_0-1718803363834.png" alt="harmonic_0-1718803363834.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From this view, I might put Distributions 1, 3, and 4 in Cluster A and Distribution 2 in Cluster B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a statistical method that is more accurate than this graphical comparison?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 13:24:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/932989#M46529</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2024-06-19T13:24:35Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933072#M46532</link>
      <description>&lt;P&gt;Very interesting question.&lt;/P&gt;
&lt;P&gt;You could use the R-square of an OLS regression to check whether two time series are similar.&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt; might have a better idea.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
 set sashelp.stocks;
 keep stock close date;
run;

/*First check the time series by stocks*/
proc sgpanel data=have;
panelby stock/onepanel columns=1;
series x=date y=close;
run;

/*Calculate the RSquare of OLS for checking the difference between two series*/
proc sort data=have;by date stock;run;
proc transpose data=have out=have2(drop=_NAME_);
by date;
var close;
id stock;
run;

proc reg data=have2 noprint rsquare outest=outest;
IBM_Intel:       model IBM=Intel;
IBM_Microsoft:   model IBM=Microsoft;
Intel_Microsoft: model Intel=Microsoft;
quit;
proc print data=outest noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ksharp_0-1718851109206.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97652iE069AC50786A57E7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Ksharp_0-1718851109206.png" alt="Ksharp_0-1718851109206.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can see that Intel and Microsoft have the largest R-square, which means they are the most similar pair of series.&lt;/P&gt;
&lt;P&gt;You would also need to set a CUTOFF value to cluster these time series.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 02:40:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933072#M46532</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-06-20T02:40:21Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933090#M46533</link>
      <description>&lt;P&gt;To cluster 20 time series, should I compare every pair of series? And how could I choose the number of clusters with this method, given that it only measures the similarity between two series?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 08:29:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933090#M46533</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2024-06-20T08:29:03Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933094#M46534</link>
      <description>&lt;P&gt;If you have many variables to compare, you could also use the Pearson correlation coefficient from PROC CORR.&lt;/P&gt;
&lt;P&gt;Since corr**2 = RSquare for a simple regression with an intercept, this is much easier to code.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
 set sashelp.stocks;
 keep stock close date;
run;
proc sort data=have;by date stock;run;
proc transpose data=have out=have2(drop=_NAME_);
by date;
var close;
id stock;
run;

proc corr data=have2 outp=outp noprint;
var IBM Intel Microsoft;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ksharp_0-1718873712535.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97656i4D9BCFCB6A1F5978/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Ksharp_0-1718873712535.png" alt="Ksharp_0-1718873712535.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Intel and Microsoft are still the most similar pair.&lt;/P&gt;
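&lt;P&gt;To see the corr**2 = RSquare relationship directly, you can square the correlations in OUTP (the data set created by the PROC CORR step above) in the same session. This is a minimal sketch, and the data set name RSQ is illustrative. The squared values should match the _RSQ_ column that PROC REG wrote to OUTEST in the earlier reply.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumes OUTP from the PROC CORR step above.                 */
/* Keep only its correlation rows and square each correlation; */
/* for a simple regression with an intercept, r**2 = R-square. */
data rsq;
 set outp(where=(_TYPE_ eq "CORR"));
 array r{*} IBM Intel Microsoft;
 do i = 1 to dim(r);
   r{i} = r{i}**2;
 end;
 drop i _TYPE_;
run;
proc print data=rsq noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;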
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Choosing the number of clusters is hard.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You need to set a cutoff value to form the clusters.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;For example:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Here, if you decide that corr&amp;gt;0.6 means two series belong together, then Intel and Microsoft form one cluster and IBM forms another.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;With more stocks, getting the clusters this way becomes harder to code.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;For example:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;a b 0.98
b c 0.89
d b 0.2
e d 0.3
e f 0.86&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;If you set cutoff=0.8, then (1 means the pair is above the cutoff):&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;a b 1
b c 1
d b 0
e d 0
e f 1&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;Then scan them by eye:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;a, b, c is one cluster&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;e, f is another cluster&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;d is another cluster&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You could write code to do this automatically, but that is another story (it is a graph-search problem: each cluster is a group of series connected by pairs above the cutoff).&lt;/SPAN&gt;&lt;/P&gt;
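&lt;P&gt;One way to automate the eyeball step uses a swapped-in technique rather than a hand-written tree search. Single-linkage clustering merges two groups as soon as any pair between them is close enough. Cutting the single-linkage tree at distance 1 - cutoff should therefore give the same groups as linking every pair whose corr is above the cutoff, apart from ties exactly at the cutoff. This minimal sketch makes several assumptions: it uses the OUTP data set from the PROC CORR step above, takes 1 - corr as the distance, and uses cutoff=0.6 (cut height 0.4). The names DIST, TREE, CLUS, and SERIES are illustrative.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumes OUTP from the PROC CORR step above.                 */
/* Turn its correlation rows into a distance matrix 1 - corr.  */
data dist(type=distance);
 set outp(where=(_TYPE_ eq "CORR"));
 length series $32;
 series = _NAME_;
 array d{*} IBM Intel Microsoft;
 do i = 1 to dim(d);
   d{i} = 1 - d{i};
 end;
 keep series IBM Intel Microsoft;
run;

/* Single linkage; NONORM keeps the merge heights in 1 - corr units */
proc cluster data=dist method=single nonorm outtree=tree noprint;
 id series;
 var IBM Intel Microsoft;
run;

/* Cut the tree at height 0.4, which corresponds to cutoff=0.6 */
proc tree data=tree out=clus height=h level=0.4 noprint;
 id series;
run;
proc print data=clus noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With these three stocks, this should reproduce the Intel/Microsoft versus IBM split described above. To try another cutoff c, set LEVEL= to 1 - c; for cutoff=0.8 that is LEVEL=0.2.&lt;/P&gt;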
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 09:10:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933094#M46534</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-06-20T09:10:27Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933096#M46535</link>
      <description>&lt;P&gt;I already used PROC TSMODEL to calculate the Ward distance and a triangular matrix, which I passed to PROC CLUSTER and PROC TREE; this is the result.&lt;BR /&gt;I would like to know if there is a different method, because for an R-squared of around 0.7 I get 8 clusters. Is this perhaps because the series cannot be separated into 3 clusters?&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="harmonic_0-1718874949047.png" style="width: 752px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97657iD51A9521A7D9DC7C/image-dimensions/752x714?v=v2" width="752" height="714" role="button" title="harmonic_0-1718874949047.png" alt="harmonic_0-1718874949047.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 09:17:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933096#M46535</guid>
      <dc:creator>harmonic</dc:creator>
      <dc:date>2024-06-20T09:17:00Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933102#M46536</link>
      <description>&lt;P&gt;The method I demonstrated is different from PROC CLUSTER or PROC FASTCLUS.&lt;/P&gt;
&lt;P&gt;Once the cutoff value is set, the number of clusters is determined.&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;a b 0.98
b c 0.89
d b 0.2
e d 0.3
e f 0.86  &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you set cutoff=0.9, then&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;a b 1
b c 0
d b 0
e d 0
e f 0&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;a, b is one cluster&lt;/P&gt;
&lt;P&gt;c is one cluster&lt;/P&gt;
&lt;P&gt;d is one cluster&lt;/P&gt;
&lt;P&gt;e is one cluster&lt;/P&gt;
&lt;P&gt;f is one cluster&lt;/P&gt;
&lt;P&gt;That gives five clusters, unlike the three clusters I showed above.&lt;/P&gt;
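&lt;P&gt;By contrast, PROC CLUSTER and PROC TREE let you fix the number of clusters and let the cut height follow from that choice. This is a minimal sketch, not harmonic's PROC TSMODEL workflow. It standardizes each SASHELP.STOCKS series, lays the series out one row each, clusters them with Ward's method, and asks PROC TREE for NCLUSTERS=2. There are only three series here; with 20 series, NCLUSTERS=3 would be the analogue. The data set names LONG, Z, WIDE, TREE, and CLUS are illustrative.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Long layout: one row per stock and date, sorted by stock for BY processing */
proc sort data=sashelp.stocks(keep=stock date close) out=long;
 by stock date;
run;

/* Standardize each series so the clusters reflect shape, not price level */
proc standard data=long mean=0 std=1 out=z;
 by stock;
 var close;
run;

/* Wide layout: one row per series, one column per date (t1, t2, ...). */
/* Assumes every stock has the same dates in the same order.           */
proc transpose data=z out=wide(drop=_NAME_) prefix=t;
 by stock;
 var close;
run;

/* Ward's method on the standardized series */
proc cluster data=wide method=ward outtree=tree noprint;
 id stock;
 var t:;
run;

/* Fix the number of clusters instead of a cutoff */
proc tree data=tree out=clus nclusters=2 noprint;
 id stock;
run;
proc print data=clus noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;NCLUSTERS= forces that many groups wherever the cut height ends up. Whether those groups are well separated is a separate question; the tree diagram (run PROC TREE without NOPRINT) can help you judge it.&lt;/P&gt;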
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you really want to decide the number of clusters, you could try Principal Component Analysis (PCA); see Rick_SAS's blog here:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But you still need to make the final decision yourself.&lt;/P&gt;
&lt;P&gt;Deciding the number of clusters is an open, &lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;unsolved&lt;/STRONG&gt; &lt;/FONT&gt;statistical question.&lt;/P&gt;
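&lt;P&gt;One PCA-based option in SAS/STAT is PROC VARCLUS. It clusters variables (here, the series) using principal components and keeps splitting a cluster while its second eigenvalue is above the MAXEIGEN= threshold. This is a swapped-in procedure, not necessarily what the blog post describes. The sketch rebuilds the wide HAVE2 layout from the earlier replies and assumes MAXEIGEN=1 as the stopping rule; the name VTREE is illustrative.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Monthly closing prices in wide form: one column per stock */
data have;
 set sashelp.stocks;
 keep stock close date;
run;
proc sort data=have;by date stock;run;
proc transpose data=have out=have2(drop=_NAME_);
 by date;
 var close;
 id stock;
run;

/* Keep splitting clusters of series while a cluster's    */
/* second eigenvalue is above 1 (the MAXEIGEN= threshold) */
proc varclus data=have2 maxeigen=1 outtree=vtree;
 var IBM Intel Microsoft;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The cluster summary in the printed output shows how many clusters VARCLUS stopped at. Changing MAXEIGEN= (or using MAXCLUSTERS=) changes that number, so the final choice is still a judgment call.&lt;/P&gt;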
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;P.S. If you want to use SAS/ETS to solve this problem, I suggest posting your question in the Forecasting forum:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/bd-p/forecasting_econometrics" target="_blank"&gt;https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/bd-p/forecasting_econometrics&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The time series analysis experts there can give you constructive advice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jun 2024 09:41:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933102#M46536</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-06-20T09:41:53Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering different Time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933103#M46537</link>
      <description>BTW, if you want to "separate the series with 3 clusters", you could try k-means clustering with PROC FASTCLUS and MAXCLUSTERS=3.&lt;BR /&gt;&lt;BR /&gt;Another way is nonparametric clustering with PROC MODECLUS, using K= for k-nearest-neighbor density estimation or R= for a fixed radius.</description>
      <pubDate>Thu, 20 Jun 2024 09:47:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-different-Time-series/m-p/933103#M46537</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-06-20T09:47:48Z</dc:date>
    </item>
  </channel>
</rss>

