New Contributor
Posts: 2

Generate combinations of categorical variables with varying levels

I have a dataset with five categorical variables.

V1 has 3 levels: CO, KY, NY

V2 has 2 levels: A, F

V3 has 2 levels: H, P

V4 has 2 levels: C, P

V5 has 4 levels: A, B, C, D

The data below shows all the combinations we observed. Based on this dataset, how can I find all valid combinations of different levels of these variables?

What I am looking for is combinations like [CO].., OR [CO].[A.F].., OR [CO.KY].., OR [CO.KY.NY]..[A.B.C.D]. For each variable, I can take one or multiple levels, but I want to make sure that the combination is represented in the data. For example, [KY]..

How should I write my program to list all valid combinations? Any suggestions are highly appreciated. Thanks!

V1 V2 V3 V4 V5

CO A P C D

CO F H C D

CO F P C D

KY A P C D

KY F H C D

KY F P C D

NY A P C D

NY A P P A

NY A P P B

NY A P P C

NY A P P D

NY F H C D

NY F H P A

NY F H P B

NY F H P C

NY F H P D

NY F P C D

NY F P P A

NY F P P B

NY F P P C

NY F P P D

Super User
Posts: 13,583

Re: Generate combinations of categorical variables with varying levels

proc freq data=have;
  tables v1*v2*v3*v4*v5 / list nocum nopercent;
run;

If you need an output data set, add OUT=want to the options on the TABLES statement.
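For anyone who wants to cross-check that listing outside SAS, here is a rough Python sketch of the same idea: count how often each V1 to V5 combination occurs. It assumes the 21 rows posted in the question are the input; the names `DATA`, `rows`, and `counts` are only illustrative.

```python
from collections import Counter

# The rows posted in the question (V1 V2 V3 V4 V5), copied as-is.
DATA = """
CO A P C D
CO F H C D
CO F P C D
KY A P C D
KY F H C D
KY F P C D
NY A P C D
NY A P P A
NY A P P B
NY A P P C
NY A P P D
NY F H C D
NY F H P A
NY F H P B
NY F H P C
NY F H P D
NY F P C D
NY F P P A
NY F P P B
NY F P P C
NY F P P D
"""

# One tuple per record, e.g. ("CO", "A", "P", "C", "D").
rows = [tuple(line.split()) for line in DATA.strip().splitlines()]

# Frequency of each distinct combination, similar in spirit to the LIST output.
counts = Counter(rows)

for combo, n in sorted(counts.items()):
    print(*combo, n)
```

On the posted rows every combination occurs exactly once, so the listing simply reproduces the table.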

New Contributor
Posts: 2

Re: Generate combinations of categorical variables with varying levels

Thanks for your reply! Actually, the dataset I listed there is the output from PROC FREQ on a larger dataset. But that only gives me the combinations with one level from each variable. I'm looking for a way to grab multiple levels from each variable. For example, V2 can take the form A, F or (A, F). V1 can take CO, KY, NY, (CO, KY), (CO, NY), (KY, NY) or (CO, KY, NY). I have actually figured out a way to list all POSSIBLE combinations like this, but the part where I got stuck is how to identify the valid combinations out of all possible combinations. As I mentioned, [KY]..
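One way to pin the question down is sketched below in Python rather than SAS, since the point is the logic and not the syntax. It rests on an assumption the thread does not confirm: a selection of levels (one non-empty subset per variable) counts as "valid" only if every single-level combination in its cross product appears in the observed data. The function names (`nonempty_subsets`, `is_valid`, `valid_selections`, `label`) are made up for illustration.

```python
from itertools import chain, combinations, product

# Observed combinations (V1 V2 V3 V4 V5), copied from the table in the question.
DATA = """
CO A P C D
CO F H C D
CO F P C D
KY A P C D
KY F H C D
KY F P C D
NY A P C D
NY A P P A
NY A P P B
NY A P P C
NY A P P D
NY F H C D
NY F H P A
NY F H P B
NY F H P C
NY F H P D
NY F P C D
NY F P P A
NY F P P B
NY F P P C
NY F P P D
"""
OBSERVED = {tuple(line.split()) for line in DATA.strip().splitlines()}


def nonempty_subsets(levels):
    """Return an iterator over every non-empty subset of `levels`, as tuples."""
    return chain.from_iterable(
        combinations(levels, r) for r in range(1, len(levels) + 1)
    )


def is_valid(selection, observed=OBSERVED):
    """ASSUMED rule: valid when every cell of the cross product was observed."""
    return all(cell in observed for cell in product(*selection))


def valid_selections(observed=OBSERVED):
    """Yield every selection (one level subset per variable) that is valid."""
    n_vars = len(next(iter(observed)))
    # Levels of each variable, taken from the observed rows.
    levels = [sorted({row[i] for row in observed}) for i in range(n_vars)]
    candidates = product(*(list(nonempty_subsets(lv)) for lv in levels))
    for selection in candidates:
        if is_valid(selection, observed):
            yield selection


def label(selection):
    """Format a selection in the bracket notation used in the question."""
    return ".".join("[" + ".".join(subset) + "]" for subset in selection)


valid = list(valid_selections())
print(len(valid), "valid selections under the assumed rule")
print(label(valid[0]))
```

If "valid" is instead meant as "at least one of the underlying single-level combinations was observed", replacing `all` with `any` in `is_valid` gives that looser rule. With 7 x 3 x 3 x 3 x 15 = 2,835 candidate selections, brute force is cheap at this size.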

Super User
Posts: 13,583

Re: Generate combinations of categorical variables with varying levels

It looks like you have to explain what [CO.KY].. means, as I wouldn't think that what you imply with [CO.KY] can occur in your data. That would mean that a variable has two values for a single record?

Maybe you should provide a small example dataset with 3 variables and then show what result you are expecting.
