04-23-2015 03:19 PM

I have a dataset with five categorical variables.

V1 has 3 levels: CO, KY, NY

V2 has 2 levels: A, F

V3 has 2 levels: H, P

V4 has 2 levels: C, P

V5 has 4 levels: A, B, C, D

The data below show all the combinations we observed. Based on this dataset, how can I find all valid combinations of different levels of these variables?

What I am looking for is combinations like [CO]..

.

.

.

.

.

How should I write my program to list all valid combinations? Any suggestions are highly appreciated. Thanks!

V1 V2 V3 V4 V5

CO A P C D

CO F H C D

CO F P C D

KY A P C D

KY F H C D

KY F P C D

NY A P C D

NY A P P A

NY A P P B

NY A P P C

NY A P P D

NY F H C D

NY F H P A

NY F H P B

NY F H P C

NY F H P D

NY F P C D

NY F P P A

NY F P P B

NY F P P C

NY F P P D

04-23-2015 04:54 PM

I would start with something like:

proc freq data=have;

tables v1*v2*v3*v4*v5 / list nocum nopercent;

run;

If you need an output data set, add OUT=want to the options on the TABLES statement.
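
A minimal sketch of the output data set version, assuming the input data set is named have as above; want is only a placeholder name. NOPRINT suppresses the printed table, and the OUT= data set should hold one row per observed combination along with the COUNT and PERCENT variables that PROC FREQ adds:

```sas
/* Write the observed V1-V5 combinations to a data set instead of printing them. */
/* "have" and "want" are placeholder data set names.                             */
proc freq data=have noprint;
  tables v1*v2*v3*v4*v5 / out=want;
run;

/* Inspect the result: one row per combination that occurs in the data. */
proc print data=want;
run;
```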

04-23-2015 05:06 PM

Thanks for your reply! Actually, the dataset I listed there is the output from PROC FREQ on a larger dataset. But that only gives me the combinations with one level from each variable. I'm looking for a way to grab multiple levels from each variable. For example, V2 can take the form A, F or (A, F). V1 can take CO, KY, NY, (CO, KY), (CO, NY), (KY, NY) or (CO, KY, NY). I have actually figured out a way to list all POSSIBLE combinations like this, but the part where I got stuck is how to identify the valid combinations out of all possible combinations. As I mentioned, [KY]..

.
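
For reference, the "all possible" step for a single variable can be sketched roughly as below. This is only an illustration for V1, not necessarily the approach actually used: the data set name v1_sets and the variables v1_set, mask and i are hypothetical, and the sketch says nothing about which combinations are valid. Each value of mask from 1 to 7 is read as a bit pattern that selects a non-empty subset of the three levels.

```sas
/* Enumerate every non-empty subset of the V1 levels (CO, KY, NY).      */
/* All names here (v1_sets, v1_set, mask, i) are illustrative only.     */
data v1_sets;
  array lev[3] $2 _temporary_ ('CO', 'KY', 'NY');
  length v1_set $20;
  do mask = 1 to 2**dim(lev) - 1;
    v1_set = '';
    do i = 1 to dim(lev);
      /* Bit i-1 of mask decides whether level i belongs to this subset. */
      if band(mask, 2**(i - 1)) then v1_set = catx(',', v1_set, lev[i]);
    end;
    output;
  end;
  keep v1_set;
run;
```

The same loop could be repeated for V2 through V5, and the resulting data sets cross joined to get every possible combination of level sets.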

04-23-2015 06:18 PM

It looks like you have to explain what [CO.KY]..

.

Maybe you should provide a small example dataset with 3 variables and then show what result you are expecting.