PGStats
Opal | Level 21

In the article "An Introduction to Similarity Analysis Using SAS" by Leonard et al., similarity matrices are introduced in these terms:

Similarity measures can be used to compare several
time sequences to form a similarity matrix. This situation usually
arises in time series clustering. For example, given K time
sequences, a (KxK) symmetric matrix can be constructed whose ijth
element contains the similarity measure between the ith and jth
sequence.
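Written out, the construction in the quoted passage looks like this. The notation ($y_i$ for the $i$th sequence, $s(\cdot,\cdot)$ for the similarity measure) is ours, not the article's:

$$
S = \begin{pmatrix} s_{11} & \cdots & s_{1K} \\ \vdots & \ddots & \vdots \\ s_{K1} & \cdots & s_{KK} \end{pmatrix}, \qquad s_{ij} = s(y_i, y_j), \qquad s_{ij} = s_{ji}.
$$

If the measure is symmetric, as the passage assumes, only the $K(K-1)/2$ pairs with $i < j$ carry new information. For example, $K = 4$ sequences give $4 \cdot 3 / 2 = 6$ distinct pairs.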


That's a neat idea. However, Proc Similarity (in SAS/ETS 9.3) doesn't accept the same series being listed as both an input and a target sequence. What is the best way to get a similarity matrix with Proc Similarity?

PG

Accepted Solution
udo_sas
SAS Employee

Hello -

This example might be useful. Additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...

Thanks,

Udo

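/* Number the products: Series = 1, 2, ... (one value per product).  */
/* Relies on SASHELP.SNACKS already being sorted by PRODUCT.         */
data tmp;
   set sashelp.snacks;
   by product;
   retain Series 0;
   if first.product then series + 1;
run;

/* PROC TRANSPOSE with a BY statement needs the data sorted by DATE. */
proc sort data=tmp out=tmp2;
   by date;
run;

/* Reshape long to wide: one row per date, one column per product    */
/* (C_1, C_2, ...) holding QtySold. NAME= and LABEL= rename the      */
/* _NAME_ and _LABEL_ columns (German: Reihe = series,               */
/* Etikett = label).                                                 */
proc transpose data=tmp2
      out=tmp3
      prefix=C_
      name=reihe
      label=Etikett;
   by date;
   id series;
   var QtySold;
run;

/* No INPUT statement: the TARGET series also serve as the input     */
/* series, so OUTSUM= holds one row per series (_INPUT_) and one     */
/* column per target series, i.e. the pairwise matrix.               */
proc similarity data=tmp3 out=_null_ outsum=summary;
   id date interval=day accumulate=total;
   target _numeric_ / normalize=standard measure=mabsdevmax;
run;

/* Drop the status column and mark the result as a distance matrix   */
/* so that PROC CLUSTER reads it as distances, not coordinates.      */
data matrix(type=distance);
   set summary;
   drop _status_;
run;

/* Hierarchical clustering with average linkage.                     */
proc cluster data=matrix outtree=tree method=average;
   id _input_;
run;

/* Cut the dendrogram into 4 clusters.                               */
proc tree data=tree out=result nclusters=4;
   id _input_;
run;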
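For reference, METHOD=AVERAGE in the PROC CLUSTER step is average linkage. At each merge, the distance between clusters $C_k$ and $C_l$ is the mean of the pairwise distances $d_{ij}$ (here, the MABSDEVMAX values passed in through the TYPE=DISTANCE data set), taking one series from each cluster:

$$
D(C_k, C_l) = \frac{1}{|C_k|\,|C_l|} \sum_{i \in C_k} \sum_{j \in C_l} d_{ij}
$$

For example, once series $a$ and $b$ have been merged, the distance from the cluster $\{a, b\}$ to a single series $c$ is $(d_{ac} + d_{bc})/2$.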

PGStats
Opal | Level 21

Thanks, Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, the target sequences are also treated as input sequences. I hadn't read the clustering example because I have a different application in mind.

PG

udo_sas
SAS Employee

PS: My colleague posted a very interesting blog post today, "How to color clusters in a dendrogram" on The DO Loop, which might be of interest.
