HuiZ
Calcite | Level 5

I have a dataset with five categorical variables.

V1 has 3 levels: CO, KY, NY
V2 has 2 levels: A, F
V3 has 2 levels: H, P
V4 has 2 levels: C, P
V5 has 4 levels: A, B, C, D

The data below shows all combinations we observed. Based on this dataset, how can I find all valid combinations of different levels of these variables?

What I am looking for is combinations like [CO].. .., OR [CO].[A.F]. .., OR [CO.KY].. .., OR [CO.KY.NY].].. ..[A.B.C.D]. For each variable, I can take one or multiple levels, but I want to make sure that the combination is represented in the data. For example, [KY].. . . is not a valid combination. When multiple levels are selected, order is not important. So A.B.D is the ...

How should I write my program to list all valid combinations? Any suggestions are highly appreciated. Thanks!

V1  V2  V3  V4  V5
CO  A   P   C   D
CO  F   H   C   D
CO  F   P   C   D
KY  A   P   C   D
KY  F   H   C   D
KY  F   P   C   D
NY  A   P   C   D
NY  A   P   P   A
NY  A   P   P   B
NY  A   P   P   C
NY  A   P   P   D
NY  F   H   C   D
NY  F   H   P   A
NY  F   H   P   B
NY  F   H   P   C
NY  F   H   P   D
NY  F   P   C   D
NY  F   P   P   A
NY  F   P   P   B
NY  F   P   P   C
NY  F   P   P   D
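
For anyone who wants to reproduce the question, the 21 observed rows can be read in with a short DATA step. This is only a sketch: the data set name have is an assumption (it matches the name used in the first reply), and the five variables are read as character.

```sas
/* Sketch: read the 21 observed combinations from the listing above. */
/* The data set name HAVE is an assumption, not part of the question. */
data have;
   input v1 $ v2 $ v3 $ v4 $ v5 $;
   datalines;
CO A P C D
CO F H C D
CO F P C D
KY A P C D
KY F H C D
KY F P C D
NY A P C D
NY A P P A
NY A P P B
NY A P P C
NY A P P D
NY F H C D
NY F H P A
NY F H P B
NY F H P C
NY F H P D
NY F P C D
NY F P P A
NY F P P B
NY F P P C
NY F P P D
;
run;
```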

3 REPLIES
ballardw
Super User

I would start with something like:

proc freq data=have;
     tables v1*v2*v3*v4*v5 / list nocum nopercent;
run;

If I need an output data set, then I add OUT=want to the options of the TABLES statement (after the slash).
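
A minimal sketch of that variant, assuming the input data set is named have; the NOPRINT option and the output name want are illustrative choices rather than part of the reply:

```sas
/* Sketch: write the observed combinations to a data set instead of printing. */
proc freq data=have noprint;
   tables v1*v2*v3*v4*v5 / out=want;
run;
```

The output data set holds one row per observed combination of V1 to V5, with the variables COUNT and PERCENT added by PROC FREQ.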

HuiZ
Calcite | Level 5

Thanks for your reply! Actually, the dataset I listed there is the output from PROC FREQ on a larger dataset. But that only gives me the combinations with one level from each variable. I'm looking for a way to grab multiple levels from each variable. For example, V2 can take the form A, F or (A, F). V1 can take CO, KY, NY, (CO, KY), (CO, NY), (KY, NY) or (CO, KY, NY).

I have actually figured out a way to list all POSSIBLE combinations like this, but where I got stuck is how to identify the valid combinations among all possible combinations. As I mentioned, [KY].. . . is not a valid combination because it does not show up in the data. The single-level combinations are easy...
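
One possible way to separate valid from merely possible combinations is sketched below. It rests on an interpretation the thread never confirms: a combination of level subsets counts as valid only when every cell of its cross product appears in the observed data. The sketch also assumes that the observed rows sit in a data set named have with one row per distinct combination and character variables V1 to V5, and that the levels are the ones listed in the question. The names valid, s1 to s5 and the working variables are made up for illustration.

```sas
/* Sketch only, under the assumptions stated above. */
data valid (keep=s1-s5);
   if 0 then set have;                 /* pick up V1-V5 attributes at compile time */
   length s1-s5 $16;                   /* subset labels such as CO.KY */

   /* Levels of each variable, one row per variable, padded with blanks */
   array lev[5, 4] $2 _temporary_
      ('CO' 'KY' 'NY' ' '
       'A'  'F'  ' '  ' '
       'H'  'P'  ' '  ' '
       'C'  'P'  ' '  ' '
       'A'  'B'  'C'  'D');
   array nlev[5] _temporary_ (3 2 2 2 4);   /* number of levels per variable */
   array m[5] m1-m5;                        /* chosen subset of each variable as a bit mask */
   array s[5] s1-s5;
   array v[5] v1-v5;

   /* Every non-empty subset of levels, for every variable */
   do m1 = 1 to 7;
   do m2 = 1 to 3;
   do m3 = 1 to 3;
   do m4 = 1 to 3;
   do m5 = 1 to 15;

      /* Build each subset label and the size of the cross product */
      cells = 1;
      do k = 1 to 5;
         s[k] = ' ';
         size = 0;
         do j = 1 to nlev[k];
            if band(m[k], 2**(j-1)) then do;
               s[k] = catx('.', s[k], lev[k, j]);
               size + 1;
            end;
         end;
         cells = cells * size;
      end;

      /* Count the observed rows that fall inside this cross product */
      hits = 0;
      do p = 1 to nobs;
         set have point=p nobs=nobs;
         inside = 1;
         do k = 1 to 5;
            if findw(s[k], strip(v[k]), '. ') = 0 then inside = 0;
         end;
         hits + inside;
      end;

      /* Valid only if every cell of the cross product was observed */
      if hits = cells then output;

   end; end; end; end; end;
   stop;                               /* required because SET uses POINT= */
run;
```

Under this interpretation, any combination that pairs KY with level P of V4 is rejected, because KY is only observed together with V4 = C. Rereading have for each of the 2,835 candidates is affordable at this size; a larger problem would probably call for a hash lookup or some pruning instead.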

ballardw
Super User

It looks like you have to explain what [CO.KY].. .., means, as I wouldn't think that what you imply with [CO.KY] can occur in your data. Would that mean that a variable has 2 values for a single record?

Maybe you should provide a small example dataset with 3 variables and then show what result you are expecting.
