ncnickel
Calcite | Level 5

Hello,

I originally posted this in the Data Step section and was advised to post it here in the Statistical Procedures section.

I have a set of seven dichotomous indicator variables:

Step_1 (1=exposed, 0=not exposed)

Step_2 (1=exposed, 0=not exposed)

Step_3 (1=exposed, 0=not exposed)

Step_4 (1=exposed, 0=not exposed)

Step_5 (1=exposed, 0=not exposed)

Step_6 (1=exposed, 0=not exposed)

Step_7 (1=exposed, 0=not exposed)

Each of these Steps is a different hospital policy.

Currently, hospitals receive recognition only for having all Steps in place. It is all or nothing: a hospital with all Steps gets credit, and a hospital missing even one Step gets none. Having all Steps in place is associated with improved health outcomes, but having ALL Steps in place is a huge barrier. There is research suggesting that a greater number of Steps in place is associated with better health (e.g., all Steps is better than 6 Steps, which is better than 5 Steps, which is better than 4 Steps, etc.).

Some States are using this information to start programs where they will recognize hospitals for each additional 2 Steps they have in place. That is, a hospital will receive 1 star for having 2 Steps in place, 2 stars for having 4 Steps in place, 3 stars for having 6 Steps in place.

The problem, though, is that we don't know which combinations of 2 Steps to prioritize. That is, do certain combinations of 2 Steps have a larger impact than other combinations of 2 Steps?

Our institute's research question, then, is: "Which combination of 2 Steps is associated with the greatest improvement in health, which combination is associated with the 2nd greatest improvement, which with the 3rd greatest, and so forth?" This information is to be used by these State programs so they can tell hospitals which Steps to prioritize (meaning which combinations of 2 Steps give the biggest bang in terms of health improvement).

To identify which combination has the greatest impact, we would like to create an indicator variable for each of the different combinations of 2 Steps and then use these indicator variables to estimate the associated health impact.

For example, say combination 1 is exposure to Step_1 and Step_2. What is the effect of being exposed to combination 1 (i.e., exposed to BOTH Step_1 AND Step_2) as compared with not being exposed to combination 1 (i.e., exposed to NEITHER Step_1 nor Step_2)?

Ideally, we would like to create two types of combination variables:

1) Exposed to both Steps in the combination without regard to exposure to Other Steps

meaning you could have

Step_1=1, Step_2=1, and then any exposure status for Step_3 through Step_7

and

2) Exposed to both Steps in the combination and ONLY exposure to those two Steps

meaning you would have

Step_1=1, Step_2=1, and then Step_3=0, Step_4=0, Step_5=0, Step_6=0, Step_7=0

Is there a way to operationalize this simply in SAS without writing out the code for each and every combination? I came across CALL COMB -type commands and saw how they were used to create observations with different combinations of names, but as a new SAS user I am struggling to see how to extend this to creating new indicator variables out of old indicator variables.
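For reference, the two indicator types described above could be generated with a small macro loop rather than by writing out all 21 pairs by hand. This is only a sketch under stated assumptions: the data set name HOSPITALS, the ID variable hosp_id, the macro name pair_flags, and the output names both_i_j and only_i_j are all hypothetical, and the three data lines are made up purely so the step runs.

```sas
/* Hypothetical example data: one row per hospital, Step_1-Step_7 coded 0/1 */
data hospitals;
  input hosp_id Step_1-Step_7;
  datalines;
1 1 1 0 0 0 0 0
2 1 1 1 0 0 0 0
3 0 0 0 0 0 0 0
;
run;

/* Emits one pair of assignment statements for every i < j (21 pairs when n=7) */
%macro pair_flags(n=7);
  %local i j;
  %do i = 1 %to %eval(&n - 1);
    %do j = %eval(&i + 1) %to &n;
      /* Type 1: both Steps in place, other Steps ignored */
      both_&i._&j = (Step_&i = 1 and Step_&j = 1);
      /* Type 2: both Steps in place and no other Step in place */
      only_&i._&j = (both_&i._&j = 1 and n_steps = 2);
    %end;
  %end;
%mend pair_flags;

data pairs;
  set hospitals;
  /* Number of Steps in place; SUM ignores missing values */
  n_steps = sum(of Step_1-Step_7);
  %pair_flags(n=7)
run;
```

In this made-up data, hospital 1 has both_1_2=1 and only_1_2=1, while hospital 2 has both_1_2=1 but only_1_2=0 because Step_3 is also in place. A missing Step value is treated as "not exposed" by the comparisons, which may or may not be what you want. These flags compare "both" against "everything else", so the "exposed to NEITHER" reference group mentioned above would need an additional restriction or a separate variable.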

2 Replies
Rick_SAS
SAS Super FREQ

Several SAS procedures support the "GLM notation," which enables you to specify a model that includes main effects and all two-way interactions. You use the "|" operator to specify interactions and the "@2" operator to limit the interactions to two-way. For example, a GLM analysis might be

proc glm data=a;
   class s1-s7;
   model y = s1|s2|s3|s4|s5|s6|s7 @2;
run;

If the procedure you want to use does not support the GLM notation, you can use PROC GLMMOD with the OUTDESIGN= option to generate the design matrix.
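A minimal sketch of the PROC GLMMOD route, reusing the names a, y, and s1-s7 from the example above. The simulated data set and the output data set names DESIGN and PARMS are assumptions added only so the step runs on its own.

```sas
/* Hypothetical simulated data: seven 0/1 indicators and a continuous response */
data a;
  call streaminit(1);
  array s{7} s1-s7;
  do id = 1 to 200;
    do k = 1 to 7;
      s{k} = rand('bernoulli', 0.5);
    end;
    y = rand('normal');
    output;
  end;
  drop k;
run;

/* Write the design matrix for main effects plus all two-way interactions */
proc glmmod data=a outdesign=design outparm=parms noprint;
  class s1-s7;
  model y = s1|s2|s3|s4|s5|s6|s7 @2;
run;

/* The column labels in DESIGN show which effect each column represents */
proc contents data=design;
run;
```

Because of the CLASS statement, the design uses the GLM parameterization, with one column per level (or per combination of levels) of each effect. The PARMS data set maps each column number to its effect and levels.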

For more on specifying interactions, see http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_introcom_a00...

mostater
Obsidian | Level 7

This doesn't answer your question, but have you considered including all 7 variables in one model and then examining the semi-partial R-squared values (assuming your dependent variable is continuous)? You could sum those values for each of the 21 pairs to come up with a hierarchical ordering. You could add interactions as Rick@SAS suggests to get at more complex relationships between the variables.
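One possible way to get those values, sketched under assumptions: the SCORR2 option on the MODEL statement in PROC REG prints squared semi-partial correlations based on Type II sums of squares. The data set a, the response y, and the predictors s1-s7 are hypothetical and simulated here only so the step runs.

```sas
/* Hypothetical simulated data: seven 0/1 indicators and a continuous response */
data a;
  call streaminit(2024);
  array s{7} s1-s7;
  do id = 1 to 200;
    do k = 1 to 7;
      s{k} = rand('bernoulli', 0.5);
    end;
    y = 0.5*s1 + 0.3*s2 + rand('normal');
    output;
  end;
  drop k;
run;

/* All seven indicators in one model; SCORR2 adds squared semi-partial correlations */
proc reg data=a;
  model y = s1-s7 / scorr2;
run;
quit;
```

The squared semi-partial correlations for any two Steps can then be added to give a rough score for that pair. That sum ignores any interaction between the two Steps.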
