Solved: How to get a Similarity Matrix

PGStats · Posted 06-25-2013 12:55 PM

In the article An Introduction to Similarity Analysis Using SAS by Leonard et al., similarity matrices are introduced in these terms:

Similarity measures can be used to compare several
time sequences to form a similarity matrix. This situation usually
arises in time series clustering. For example, given K time
sequences, a (KxK) symmetric matrix can be constructed whose ijth
element contains the similarity measure between the ith and jth
sequence.

That's a neat idea. However, Proc Similarity (in SAS/ETS 9.3) doesn't allow the same series to be listed as both an input and a target sequence. What is the best way to get a similarity matrix with Proc Similarity?

PG
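To make the quoted definition concrete outside SAS, here is a minimal Python sketch of a (KxK) symmetric matrix of pairwise measures. It is only an illustration: the function names (`standardize`, `mean_abs_dev`, `similarity_matrix`) and the toy data are made up for this example, the measure is a plain mean absolute difference between z-scored sequences of equal length, and it does not try to reproduce what Proc Similarity computes.

```python
from statistics import mean, pstdev


def standardize(series):
    """Z-score a sequence using the population standard deviation."""
    m = mean(series)
    s = pstdev(series)
    if s == 0:
        # A constant sequence has no spread; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - m) / s for x in series]


def mean_abs_dev(a, b):
    """Mean absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity_matrix(sequences):
    """Return a K x K symmetric matrix; entry [i][j] compares sequences i and j."""
    z = [standardize(s) for s in sequences]
    k = len(z)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = mean_abs_dev(z[i], z[j])
            matrix[i][j] = d  # upper triangle
            matrix[j][i] = d  # mirror into the lower triangle
    return matrix


# Hypothetical toy data: three short sequences of equal length.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same shape as the first, different scale
    [4.0, 3.0, 2.0, 1.0],  # reversed trend
]
m = similarity_matrix(series)
for row in m:
    print(["%.3f" % v for v in row])
```

Because each sequence is standardized first, the first two toy sequences end up (numerically) at distance zero, while the reversed one is far from both. Every sequence plays both roles, row and column, which is exactly the situation the question asks about.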

udo_sas · Posted 06-25-2013 02:45 PM

Hello -

This example might be useful - additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...

Thanks,

Udo

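/* Give each product an integer series number (BY requires the data to be sorted by product) */
data tmp;
   set sashelp.snacks;
   by product;
   retain Series 0;
   if first.product then series+1;
run;

/* Sort by date so the series can be transposed into columns */
proc sort data=tmp out=tmp2;
   by date;
run;

/* One row per date, one column (C_1, C_2, ...) per series */
proc transpose data=tmp2
               OUT=tmp3
               PREFIX=C_
               NAME=reihe
               LABEL=Etikett;
   BY Date;
   ID series;
   VAR QtySold;
run;

/* With no INPUT statement, the target series also serve as inputs, */
/* so OUTSUM= holds the measure for every pair of series            */
proc similarity data=tmp3 out=_null_ outsum=summary;
   id date interval=day accumulate=total;
   target _numeric_ / normalize=standard measure=mabsdevmax;
run;

/* Mark the summary as a distance matrix for PROC CLUSTER */
data matrix(type=distance);
   set summary;
   drop _status_;
run;

/* Average-linkage clustering on the distance matrix */
proc cluster data=matrix outtree=tree method=average;
   id _input_;
run;

/* Cut the tree into four clusters */
proc tree data=tree out=result nclusters=4;
   id _input_;
run;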
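For readers curious about what the last two steps do conceptually, the following Python sketch merges clusters by average linkage on a precomputed distance matrix until a requested number of clusters remains. It is an illustration under stated assumptions, not a reimplementation of PROC CLUSTER or PROC TREE: the function name `average_linkage` and the toy matrix are hypothetical, and PROC CLUSTER has its own options for how input distances are treated, which this sketch does not attempt to reproduce.

```python
def average_linkage(dist, n_clusters):
    """Agglomerate items by average linkage until n_clusters remain.

    dist is a symmetric K x K matrix of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    if not 1 <= n_clusters <= len(dist):
        raise ValueError("n_clusters must be between 1 and the number of items")

    # Start with every item in its own cluster.
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None  # (distance, position p, position q) of the closest pair
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)

    return sorted(clusters)


# Hypothetical distance matrix: items 0 and 1 are close, as are items 2 and 3.
toy = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
two = average_linkage(toy, 2)
three = average_linkage(toy, 3)
print(two)
print(three)
```

With the toy matrix, asking for two clusters groups items 0 and 1 together and items 2 and 3 together; asking for three leaves items 2 and 3 on their own.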

PGStats · Posted 06-25-2013 03:34 PM

Thanks Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, target sequences are also treated as input sequences. I hadn't read the clustering example since I have another application in mind.

PG

udo_sas · Posted 06-26-2013 10:53 AM

PS: My colleague posted a very interesting blog today, How to color clusters in a dendrogram (The DO Loop), which might be of interest.
