  PGStats
Opal | Level 21

## How to get a Similarity Matrix

In the article "An Introduction to Similarity Analysis Using SAS" by Leonard et al., similarity matrices are introduced in these terms:

> Similarity measures can be used to compare several time sequences to form a similarity matrix. This situation usually arises in time series clustering. For example, given K time sequences, a (KxK) symmetric matrix can be constructed whose ijth element contains the similarity measure between the ith and jth sequence.
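
In symbols, a sketch of the matrix the quote describes. The names $x_i$, $m$, and $S$ are our own notation, not the article's, and the symmetry condition assumes the chosen measure gives the same value in either direction:

$$
S = \begin{pmatrix} s_{11} & \cdots & s_{1K} \\ \vdots & \ddots & \vdots \\ s_{K1} & \cdots & s_{KK} \end{pmatrix}, \qquad s_{ij} = m(x_i, x_j), \qquad s_{ij} = s_{ji}.
$$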

That's a neat idea. However, Proc Similarity (in SAS/ETS 9.3) doesn't allow the same series to be listed as both an input and a target sequence. What is the best way to get a similarity matrix with Proc Similarity?

PG


udo_sas
SAS Employee

## Re: How to get a Similarity Matrix

Hello -

This example might be useful. Additional information can be found here: http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_similarity_sec...

Thanks,

Udo

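    /* Number each product's series 1, 2, ... (BY requires the data sorted by product) */
    data tmp;
      set sashelp.snacks;
      by product;
      retain Series 0;
      if first.product then series+1;
    run;

    /* Sort by date so PROC TRANSPOSE can use Date as the BY variable */
    proc sort data=tmp out=tmp2;
      by date;
    run;

    /* One row per date, one column per series (C_1, C_2, ...) */
    proc transpose data=tmp2
        OUT=tmp3
        PREFIX=C_
        NAME=reihe
        LABEL=Etikett
        ;
      BY Date;
      ID series;
      VAR QtySold;
    run;

    /* No INPUT statement: every target series is also used as an input */
    proc similarity data=tmp3 out=_null_ outsum=summary;
      id date interval=day accumulate=total;
      target _numeric_ /normalize=standard measure=mabsdevmax;
    run;

    /* Mark the summary as a distance matrix for PROC CLUSTER */
    data matrix(type=distance);
      set summary;
      drop _status_;
    run;

    proc cluster data=matrix outtree=tree method=average;
      id _input_;
    run;

    /* Cut the tree into four clusters */
    proc tree data=tree out=result nclusters=4;
      id _input_;
    run;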

PGStats
Opal | Level 21

## Re: How to get a Similarity Matrix

Thanks, Udo. That's very helpful. I didn't realize that the INPUT statement was optional and that, in its absence, the target sequences are also treated as input sequences. I hadn't read the clustering example since I have another application in mind.
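
For anyone who wants just the matrix without the clustering steps, here is a minimal sketch of that behavior. The toy data set and its names (`toy`, `x`, `y`, `z`, `simmatrix`) are made up for illustration; the procedure options are the ones used in Udo's example above.

```sas
/* Hypothetical toy data: three short daily series (x, y, z are illustrative names) */
data toy;
  format date date9.;
  input date :date9. x y z;
  datalines;
01JAN2012 1 2 5
02JAN2012 2 3 4
03JAN2012 3 5 3
04JAN2012 4 6 2
05JAN2012 5 8 1
;
run;

/* No INPUT statement: as discussed in this thread, each TARGET series is
   also treated as an input, so OUTSUM= holds all pairwise measures. */
proc similarity data=toy out=_null_ outsum=simmatrix;
  id date interval=day;
  target x y z / normalize=standard measure=mabsdevmax;
run;

/* Inspect the resulting matrix; _INPUT_ identifies the input series of each row */
proc print data=simmatrix noobs;
run;
```

The OUTSUM= data set can then be post-processed like `summary` in the accepted solution, for example by dropping `_status_`.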

PG

udo_sas
SAS Employee

## Re: How to get a Similarity Matrix

PS: My colleague posted a very interesting blog post today on how to color clusters in a dendrogram (on The DO Loop), which might be of interest.
