
Generate combinations of categorical variables with varying levels

New Contributor
Posts: 2


I have a dataset with five categorical variables.

V1 has 3 levels: CO, KY, NY
V2 has 2 levels: A, F
V3 has 2 levels: H, P
V4 has 2 levels: C, P
V5 has 4 levels: A, B, C, D

The data below shows all the combinations we observed. Based on this dataset, how can I find all valid combinations of different levels of these variables?

What I am looking for is combinations like [CO]...., OR [CO].[A.F]...., OR [CO.KY]...., OR [CO.KY.NY].]....[A.B.C.D]. For each variable, I can take one or multiple levels, but I want to make sure that the combination is represented in the data. For example, [KY].... is not a valid combination. When multiple levels are selected, order is not important. So A.B.D is the ...

How should I write my program to list all valid combinations? Any suggestions are highly appreciated. Thanks!

V1  V2  V3  V4  V5
CO  A   P   C   D
CO  F   H   C   D
CO  F   P   C   D
KY  A   P   C   D
KY  F   H   C   D
KY  F   P   C   D
NY  A   P   C   D
NY  A   P   P   A
NY  A   P   P   B
NY  A   P   P   C
NY  A   P   P   D
NY  F   H   C   D
NY  F   H   P   A
NY  F   H   P   B
NY  F   H   P   C
NY  F   H   P   D
NY  F   P   C   D
NY  F   P   P   A
NY  F   P   P   B
NY  F   P   P   C
NY  F   P   P   D
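As a rough sizing step, here is a minimal Python sketch (Python only for illustration, not a SAS solution) that enumerates every POSSIBLE combination by picking one nonempty subset of levels per variable. Each subset is stored as a sorted tuple so that order does not matter (A.B.D and B.A.D are the same). The names `LEVELS`, `nonempty_subsets`, `SUBSETS`, and `candidates` are hypothetical.

```python
from itertools import combinations, product

# Levels of each variable, as listed in the question.
LEVELS = {
    "V1": ["CO", "KY", "NY"],
    "V2": ["A", "F"],
    "V3": ["H", "P"],
    "V4": ["C", "P"],
    "V5": ["A", "B", "C", "D"],
}


def nonempty_subsets(levels):
    """Return every nonempty subset of `levels` as a sorted tuple."""
    return [
        tuple(sorted(combo))
        for size in range(1, len(levels) + 1)
        for combo in combinations(levels, size)
    ]


# Candidate subsets for each variable.
SUBSETS = {var: nonempty_subsets(lv) for var, lv in LEVELS.items()}

# Every possible combination: one subset chosen per variable.
candidates = list(product(*SUBSETS.values()))

print({var: len(s) for var, s in SUBSETS.items()})
print(len(candidates))
```

A variable with k levels has 2^k - 1 nonempty subsets, so this prints 7, 3, 3, 3 and 15 subsets and 7 × 3 × 3 × 3 × 15 = 2835 candidates. Only some of these are backed by the data, which is the filtering question being asked.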

Super User
Posts: 11,343

Re: Generate combinations of categorical variables with varying levels

I would start with something like:

proc freq data=have;
   /* LIST prints one line per observed V1-V5 combination with its frequency */
   tables v1*v2*v3*v4*v5 / list nocum nopercent;
run;

If you need an output data set, add OUT=want to the TABLES statement.
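The LIST layout is essentially a count of each distinct row. For comparison outside SAS, a minimal Python sketch using `collections.Counter`; the `records` list is a hypothetical raw input (rows borrowed from the table above, with one repeated), not real data.

```python
from collections import Counter

# Hypothetical raw records (V1..V5); one row appears twice.
records = [
    ("CO", "A", "P", "C", "D"),
    ("CO", "A", "P", "C", "D"),
    ("NY", "F", "H", "P", "A"),
    ("KY", "F", "P", "C", "D"),
]

# Comparable to TABLES v1*v2*v3*v4*v5 / LIST: one entry per observed combination.
freq = Counter(records)

for combo, count in sorted(freq.items()):
    print(*combo, count)
```

This prints `CO A P C D 2`, `KY F P C D 1` and `NY F H P A 1`. As with the LIST output, each line has exactly one level per variable, so this step alone cannot produce multi-level selections.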

New Contributor
Posts: 2

Re: Generate combinations of categorical variables with varying levels

Thanks for your reply! Actually, the dataset I listed there is the output of PROC FREQ on a larger dataset. But that only gives me the combinations with one level from each variable. I'm looking for a way to take multiple levels from each variable. For example, V2 can take the form A, F, or (A, F). V1 can take CO, KY, NY, (CO, KY), (CO, NY), (KY, NY), or (CO, KY, NY). I have actually figured out a way to list all POSSIBLE combinations like this; the part where I got stuck is how to identify the valid combinations among all possible combinations. As I mentioned, [KY].... is not a valid combination, as it didn't show up in the data. The single-level combinations are easy...
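The thread does not pin down what "represented in the data" means for a multi-level selection. Assuming (an assumption, not the poster's stated rule) that a combination is valid only when every single-level combination it expands to appears in the PROC FREQ output, the filter is a cross product plus a membership test. A minimal Python sketch; `OBSERVED`, `is_valid`, and `valid` are hypothetical names.

```python
from itertools import combinations, product

# Observed single-level combinations from the posted table (V1..V5).
OBSERVED = {
    tuple(line.split())
    for line in """
CO A P C D
CO F H C D
CO F P C D
KY A P C D
KY F H C D
KY F P C D
NY A P C D
NY A P P A
NY A P P B
NY A P P C
NY A P P D
NY F H C D
NY F H P A
NY F H P B
NY F H P C
NY F H P D
NY F P C D
NY F P P A
NY F P P B
NY F P P C
NY F P P D
""".strip().splitlines()
}

LEVELS = [["CO", "KY", "NY"], ["A", "F"], ["H", "P"], ["C", "P"], ["A", "B", "C", "D"]]


def nonempty_subsets(levels):
    """Every nonempty subset of `levels`, keeping the input order."""
    return [c for k in range(1, len(levels) + 1) for c in combinations(levels, k)]


def is_valid(combo, observed):
    """ASSUMPTION: valid means every cell of the cross product was observed."""
    return all(cell in observed for cell in product(*combo))


valid = [
    combo
    for combo in product(*(nonempty_subsets(lv) for lv in LEVELS))
    if is_valid(combo, OBSERVED)
]
```

Under this reading, [CO.KY.NY].[A.F].[P].[C].[D] is kept because all six of its rows were observed, while any selection that includes KY, A and H together is rejected because KY A H C D never occurs.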

Super User
Posts: 11,343

Re: Generate combinations of categorical variables with varying levels

It looks like you have to explain what [CO.KY].... means, as I wouldn't think that what you imply with [CO.KY] can occur in your data. That would mean that a variable has 2 values in a single record?

Maybe you should provide a small example dataset with 3 variables and then show what result you are expecting.
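One way to read a selection like [CO.KY] without giving a record two values is as shorthand for the set of single-valued records it covers. That still leaves two plausible meanings of "valid", and they can disagree. A minimal Python sketch of both readings, using the three KY rows from the posted table; the helpers `strict_valid` and `loose_valid` are hypothetical.

```python
from itertools import product

# The KY rows from the posted table (V1..V5).
OBSERVED = {
    ("KY", "A", "P", "C", "D"),
    ("KY", "F", "H", "C", "D"),
    ("KY", "F", "P", "C", "D"),
}


def strict_valid(combo, observed):
    # Every single-valued record the selection stands for must exist.
    return all(cell in observed for cell in product(*combo))


def loose_valid(combo, observed):
    # At least one observed record uses only the selected levels.
    return any(
        all(value in chosen for value, chosen in zip(row, combo))
        for row in observed
    )


combo = (("KY",), ("A", "F"), ("H",), ("C",), ("D",))
print(strict_valid(combo, OBSERVED), loose_valid(combo, OBSERVED))
```

It prints `False True`: the loose reading accepts the selection because KY F H C D exists, while the strict reading rejects it because KY A H C D does not. A small example with the expected result would settle which reading is intended.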
