In the article "An Introduction to Similarity Analysis Using SAS" by Leonard et al., similarity matrices are introduced in these terms:
Similarity measures can be used to compare several
time sequences to form a similarity matrix. This situation usually
arises in time series clustering. For example, given K time
sequences, a (KxK) symmetric matrix can be constructed whose ijth
element contains the similarity measure between the ith and jth
sequence.
That's a neat idea. However, PROC SIMILARITY (in SAS/ETS 9.3) doesn't accept the same series listed as both an input and a target sequence. What is the best way to get a similarity matrix with PROC SIMILARITY?
PG
Hello -
This example might be useful - additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...
Thanks,
Udo
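/* Number each product's series 1, 2, ... (BY processing assumes the data are sorted by product) */
data tmp;
   set sashelp.snacks;
   by product;
   retain series 0;
   if first.product then series+1;
run;

/* PROC TRANSPOSE with BY date needs the data sorted by date */
proc sort data=tmp out=tmp2;
   by date;
run;

/* One row per date, one column per series: C_1, C_2, ... */
proc transpose data=tmp2
               out=tmp3
               prefix=C_
               name=reihe
               label=Etikett;
   by date;
   id series;
   var QtySold;
run;

/* No INPUT statement, so every target series is also used as an input;
   OUTSUM= then holds the measure for each pair of series */
proc similarity data=tmp3 out=_null_ outsum=summary;
   id date interval=day accumulate=total;
   target _numeric_ / normalize=standard measure=mabsdevmax;
run;

/* Flag the summary as a distance matrix for PROC CLUSTER */
data matrix(type=distance);
   set summary;
   drop _status_;
run;

/* Average-linkage clustering on the distances, then cut the tree into 4 clusters */
proc cluster data=matrix outtree=tree method=average;
   id _input_;
run;

proc tree data=tree out=result nclusters=4;
   id _input_;
run;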
Thanks, Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, the target sequences are also used as input sequences. I hadn't read the clustering example since I have another application in mind.
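For anyone reading later, here is a minimal sketch of that difference. It assumes the tmp3 data set from Udo's example above and arbitrarily picks the series C_1 to C_4; the output names rect and square are made up for illustration. As I understand the OUTSUM= layout, each row is an input series (named in _INPUT_) and each target series gets its own column, so the first call should give a 2 x 2 table and the second a 4 x 4 one.

```sas
/* Explicit INPUT statement: inputs and targets are different series,
   so OUTSUM= should have one row per input and one column per target. */
proc similarity data=tmp3 out=_null_ outsum=rect;
   id date interval=day accumulate=total;
   input  C_1 C_2 / normalize=standard;
   target C_3 C_4 / normalize=standard measure=mabsdevmax;
run;

/* No INPUT statement: the targets double as inputs, so the same
   four series should yield a square 4 x 4 summary. */
proc similarity data=tmp3 out=_null_ outsum=square;
   id date interval=day accumulate=total;
   target C_1-C_4 / normalize=standard measure=mabsdevmax;
run;

/* Compare the two layouts side by side in the output. */
proc print data=rect noobs;
run;

proc print data=square noobs;
run;
```

Only the square form is suitable as a TYPE=DISTANCE data set for PROC CLUSTER, which is why the clustering example leaves the INPUT statement out.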
PG
PS: my colleague posted a very interesting blog today on How to color clusters in a dendrogram - The DO Loop - which might be of interest.