NKormanik
Barite | Level 11

Suppose we have a number of column vectors of data, each column representing a case.  Suppose 10 such columns.

 

Rows represent characteristics.  Suppose 50 such characteristics.

 

Some cells in the matrix are empty -- missing data of Case Xi on Characteristic Yj.

 

The question is, How can one describe the similarities among the given cases?

 

A parallel example, perhaps:  Images of people.  Two eyes, a nose.  But, some short hair, some long, so not similar on that measure.

 

So, what are the similarities among the cases?

 

All suggestions and thoughts appreciated.

 

Thanks.

 

Nicholas Kormanik

 

 

3 REPLIES
PeterClemmensen
Tourmaline | Level 20

Please be more specific. Preferably provide a small sample data set and what you want your result to look like. 

Ksharp
Super User

Cluster analysis?

Check PROC CLUSTER (which groups observations) or PROC VARCLUS (which groups variables).
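As a rough sketch of the clustering route, assuming the data already has one row per case and numeric characteristics: the data set CASES and the variables CASE and CHAR1-CHAR3 below are made up for illustration. As far as I know, PROC CLUSTER run on raw coordinates leaves out any observation with a missing value, so this sketch first builds a distance matrix with PROC DISTANCE, which by default works from the nonmissing values of each pair of cases. Check the documentation for your release before relying on that behavior.

```sas
/* Hypothetical data: one row per case, with some missing cells */
data cases;
  input case $ char1 char2 char3;
  datalines;
case1 5.1 3.5 1.4
case2 4.9 .   1.4
case3 .   3.2 1.3
case4 6.3 3.3 6.0
;
run;

/* Standardize the characteristics and compute pairwise Euclidean distances */
proc distance data=cases method=euclid out=dist;
  var interval(char1-char3 / std=std);
  id case;
run;

/* Average-linkage hierarchical clustering on the distance matrix */
proc cluster data=dist(type=distance) method=average outtree=tree;
  id case;
run;

/* Cut the tree into two clusters (2 is an arbitrary choice for this sketch) */
proc tree data=tree nclusters=2 out=groups noprint;
run;

proc print data=groups;
run;
```

Cases that land in the same cluster are the ones that are most alike under this particular distance measure. A different measure or linkage method may group them differently.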

ballardw
Super User

Likely the first step would be to transpose the data, as most SAS procedures expect each row to represent an observation (case) and each column a separate characteristic.
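A minimal sketch of that transpose step, assuming the original layout has one row per characteristic and one column per case. The data set WIDE and the variable names CHARACTERISTIC and CASE1-CASE3 are invented for illustration, and the real data would have 50 rows and 10 case columns.

```sas
/* Hypothetical layout: one row per characteristic, one column per case */
data wide;
  input characteristic $ case1 case2 case3;
  datalines;
char1 5.1 4.9 .
char2 3.5 .   3.2
char3 1.4 1.4 1.3
;
run;

/* After transposing: one row per case, one column per characteristic */
proc transpose data=wide out=cases(rename=(_name_=case));
  id characteristic;   /* values of CHARACTERISTIC become the new variable names */
  var case1-case3;     /* each of these columns becomes one output row */
run;

proc print data=cases;
run;
```

Missing cells stay missing after the transpose, so the result can go straight into procedures that have their own missing-value handling.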

 

 

What types of similarities are you looking for? Between individual cases, or between groups of the characteristics?

Here is a brief example of comparing age and sex distributions using a supplied SAS data set that you should have available (not claiming it is the best example, just one way):

proc freq data=sashelp.class;
 tables age*sex /chisq;
run;

The chi-square test checks whether the distribution of sex is the same across age groups. A large Prob value (p-value) means there is no evidence of a difference, while a small one indicates that there is some difference in the distribution of sex across ages.

 

 

Showing some example input and what the result should look like would be helpful. Approaches vary depending on whether the characteristics are categorical (sex, for example) or continuous (height measurements), and whether you are mixing the two.
