In the article An Introduction to Similarity Analysis Using SAS by Leonard et al., similarity matrices are introduced in these terms:
"Similarity measures can be used to compare several time sequences to form a similarity matrix. This situation usually arises in time series clustering. For example, given K time sequences, a (KxK) symmetric matrix can be constructed whose ijth element contains the similarity measure between the ith and jth sequence."
That's a neat idea. However, Proc Similarity (in SAS/ETS 9.3) doesn't allow the same series to be listed as both an input and a target sequence. What is the best way to get a similarity matrix with Proc Similarity?
PG
Hello -
This example might be useful - additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...
Thanks,
Udo
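/* Number each product so every series gets a short, valid column name */
data tmp;
   set sashelp.snacks;
   by product;
   retain Series 0;
   if first.product then series+1;
run;
/* PROC TRANSPOSE below needs the data sorted by its BY variable */
proc sort data=tmp out=tmp2;
   by date;
run;
/* Reshape to one row per date and one column (C_1, C_2, ...) per series */
proc transpose data=tmp2
      out=tmp3
      prefix=C_
      name=reihe
      label=Etikett;
   by date;
   id series;
   var QtySold;
run;
/* No INPUT statement: each target series is compared with all the others */
proc similarity data=tmp3 out=_null_ outsum=summary;
   id date interval=day accumulate=total;
   target _numeric_ / normalize=standard measure=mabsdevmax;
run;
/* Mark the summary as a distance matrix so PROC CLUSTER reads it as one */
data matrix(type=distance);
   set summary;
   drop _status_;
run;
/* Average-linkage hierarchical clustering on the distance matrix */
proc cluster data=matrix outtree=tree method=average;
   id _input_;
run;
/* Cut the tree into four clusters */
proc tree data=tree out=result nclusters=4;
   id _input_;
run;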
Thanks Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, the target sequences are also used as input sequences. I hadn't read the clustering example since I have another application in mind.
PG
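For readers who only want the matrix and not the clustering, here is a minimal sketch of the point above. It assumes the wide data set tmp3 from Udo's example already exists. The three columns C_1, C_2 and C_3 and the output name simmatrix are picked purely for illustration.

```sas
/* Sketch only: assumes tmp3 (one column per series) was built as in the example above. */
/* simmatrix is a hypothetical output name; C_1-C_3 are three of the transposed series. */
proc similarity data=tmp3 out=_null_ outsum=simmatrix;
   id date interval=day accumulate=total;
   /* No INPUT statement, so the listed targets also serve as the input sequences */
   target C_1 C_2 C_3 / normalize=standard measure=mabsdevmax;
run;

/* One row per input series (_INPUT_) and one column per target series */
proc print data=simmatrix noobs;
run;
```

Listing only a few series in TARGET keeps the printed matrix small enough to inspect by eye before running it on every series with _numeric_.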
PS: my colleague posted a very interesting blog today on How to color clusters in a dendrogram - The DO Loop - which might be of interest.