## Assessment of Similarity

Suppose we have a number of column vectors of data, each column representing a case; say, 10 such columns.

Rows represent characteristics; say, 50 such characteristics.

Some cells in the matrix are empty: case Xi has missing data on characteristic Yj.

The question is: how can one describe the similarities among these cases?

A parallel example, perhaps: images of people. All have two eyes and a nose, but some have short hair and some long, so they are not similar on that measure.

So, what are the similarities among the cases?

All suggestions and thoughts appreciated.

Thanks.

Nicholas Kormanik

## Re: Assessment of Similarity

Posted in reply to NicholasKormanik

Please be more specific. Preferably provide a small sample data set and what you want your result to look like.

## Re: Assessment of Similarity

Posted in reply to NicholasKormanik

Cluster analysis?

Check PROC CLUSTER (which clusters observations) or PROC VARCLUS (which clusters variables).
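To make the cluster-analysis suggestion concrete, here is a minimal sketch, assuming the data have already been arranged with one row per case in a data set WIDE, with a case identifier CASE and numeric characteristics C1-C50 (these names are hypothetical). PROC DISTANCE builds a case-by-case distance matrix, and PROC CLUSTER then groups the cases hierarchically. Missing cells need thought: check the PROC DISTANCE documentation for how your chosen METHOD= treats them.

```sas
/* Hypothetical input: WIDE has one row per case, an ID variable */
/* CASE, and numeric characteristics C1-C50 (names assumed).     */
proc distance data=wide out=dist method=euclid;
   var interval(c1-c50 / std=std);  /* standardize each characteristic */
   id case;                         /* label rows/columns by case      */
run;

/* Average-linkage hierarchical clustering of the cases,   */
/* read from the distance matrix produced above            */
proc cluster data=dist(type=distance) method=average outtree=tree;
   id case;
run;

/* Draw the dendrogram from the saved cluster history */
proc tree data=tree;
run;
```

Euclidean distance on standardized values with average linkage is only one reasonable starting point; other METHOD= choices in either procedure may suit the data better.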

## Re: Assessment of Similarity

Posted in reply to NicholasKormanik

Likely the first step would be to transpose the data, as most SAS procedures expect each row to represent an observation (case) and each column a separate characteristic.
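A minimal PROC TRANSPOSE sketch of that first step, assuming the original data set HAVE has one row per characteristic, a character variable CHARNAME holding each characteristic's name, and the ten cases in CASE1-CASE10 (all names hypothetical):

```sas
/* Hypothetical input HAVE: one row per characteristic (50 rows), */
/* CHARNAME names the characteristic, CASE1-CASE10 are the cases. */
/* Empty cells are ordinary SAS missing values.                   */
proc transpose data=have out=wide name=case;
   id charname;        /* characteristic names become column names   */
   var case1-case10;   /* each of these columns becomes one output row */
run;
```

The output data set WIDE then has one row per case, a variable CASE holding the original column name (CASE1, CASE2, ...), and one column per characteristic; empty cells stay missing.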

What types of similarities are you looking for: between individuals, or between groups of characteristics?

Here is a brief example comparing age and sex distributions, using a sample data set (SASHELP.CLASS) that ships with SAS and that you should have available

(not claiming it is the best example, just one way):

```
proc freq data=sashelp.class;
   /* crosstab of age by sex, with chi-square tests */
   tables age*sex / chisq;
run;
```

The chi-square test checks whether the distribution of sex is the same across age groups. A large Prob value (p-value) means the data show no evidence of a difference, while a small one suggests that the distribution of sex differs by age.

Showing some example input and what the result should look like would help. Approaches vary depending on whether the characteristics are categorical (sex, for example) or continuous (height measurements), and whether you are mixing the two.
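For a mix of categorical and continuous characteristics, one commonly used measure is Gower's coefficient, which PROC DISTANCE offers as METHOD=GOWER (similarity) and METHOD=DGOWER (dissimilarity, 1 minus GOWER). A minimal sketch, assuming a data set WIDE with one row per case, an ID variable CASE, continuous characteristics NUM1-NUM30, and categorical ones CAT1-CAT20 (all names hypothetical). By its definition, Gower's coefficient skips any characteristic that is missing for either case in a pair, which suits data with empty cells; confirm the exact treatment in the PROC DISTANCE documentation.

```sas
/* Hypothetical input WIDE: one row per case, ID variable CASE,      */
/* continuous NUM1-NUM30 and categorical CAT1-CAT20 (names assumed). */
proc distance data=wide out=gowdist method=dgower;
   var interval(num1-num30)   /* continuous characteristics  */
       nominal(cat1-cat20);   /* categorical characteristics */
   id case;
run;

/* Case-by-case dissimilarity matrix */
proc print data=gowdist;
run;
```

The resulting matrix can be inspected directly or passed to PROC CLUSTER as TYPE=DISTANCE input.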
