In the article An Introduction to Similarity Analysis Using SAS by Leonard et al., similarity matrices are introduced in these terms:
Similarity measures can be used to compare several
time sequences to form a similarity matrix. This situation usually
arises in time series clustering. For example, given K time
sequences, a (KxK) symmetric matrix can be constructed whose ijth
element contains the similarity measure between the ith and jth
sequence.
That's a neat idea. However, Proc Similarity (in SAS/ETS 9.3) doesn't allow the same series to be listed as both an input and a target sequence. What is the best way to get a similarity matrix with Proc Similarity?
PG
Hello -
This example might be useful - additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...
Thanks,
Udo
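/* Number the products 1, 2, 3, ... so each one becomes its own series. */
/* BY-group processing assumes sashelp.snacks is already ordered by product. */
data tmp;
set sashelp.snacks;
by product;
retain Series 0;
if first.product then series+1;
run;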
proc sort data=tmp out=tmp2;
by date;
run;
proc transpose data=tmp2
OUT=tmp3
PREFIX=C_
NAME=reihe
LABEL=Etikett
;
BY Date;
ID series;
VAR QtySold;
run;
proc similarity data=tmp3 out=_null_ outsum=summary;
id date interval=day accumulate=total;
target _numeric_ /normalize=standard measure=mabsdevmax;
run;
data matrix(type=distance);
set summary;
drop _status_;
run;
proc cluster data=matrix outtree=tree method=average;
id _input_;
run;
proc tree data=tree out=result nclusters=4;
id _input_;
run;
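If the goal is the matrix itself rather than the clusters, the data set built above can be printed directly. This is a minimal sketch that reuses the names from the code above (`matrix`, `_input_`). It assumes the OUTSUM= data set has one row per input series and one column per target series, which is what the clustering steps above rely on.

```sas
/* Print the K x K matrix of pairwise measures: one row per input     */
/* series (labelled by _input_) and one column per target series.    */
/* Assumes the data set MATRIX created above from the OUTSUM= output. */
proc print data=matrix;
id _input_;
run;
```

Because no INPUT statement is given, every series named in TARGET is compared with every other one, so the printed table should be square.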
Thanks Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, the target sequences are also used as input sequences. I hadn't read the clustering example since I have another application in mind.
PG
PS: my colleague posted a very interesting blog today on The DO Loop, "How to color clusters in a dendrogram", which might be of interest.