Question about creating indicator variables

ncnickel · Posted 07-19-2011 09:10 PM

Hello,

I originally posted this in the Data Step section and was advised to post it here in the Statistical Procedures section.

I have a set of seven dichotomous indicator variables:

Step_1 (1=exposed, 0=not exposed)

Step_2 (1=exposed, 0=not exposed)

Step_3 (1=exposed, 0=not exposed)

Step_4 (1=exposed, 0=not exposed)

Step_5 (1=exposed, 0=not exposed)

Step_6 (1=exposed, 0=not exposed)

Step_7 (1=exposed, 0=not exposed)

Each of these Steps is a different hospital policy.

Currently, hospitals receive recognition only for having all Steps in place. It is all or nothing: if you have all Steps, you get credit; if you are missing even one Step, you get no credit. Having all Steps in place is associated with improved health outcomes. However, getting ALL Steps in place is a huge barrier. There is research suggesting that having more Steps in place is associated with better health (e.g., all Steps is better than 6 Steps, which is better than 5 Steps, which is better than 4 Steps, etc.).

Some States are using this information to start programs where they will recognize hospitals for each additional 2 Steps they have in place. That is, a hospital will receive 1 star for having 2 Steps in place, 2 stars for having 4 Steps in place, 3 stars for having 6 Steps in place.

The problem, though, is that we don't know which combinations of 2 Steps to prioritize. That is, do certain combinations of 2 Steps have a larger impact than other combinations of 2 Steps?

Our institute's research question, then, is: "Which combination of 2 Steps is associated with the greatest improvement in health, which combination is associated with the 2nd greatest improvement, which with the 3rd greatest, and so forth?" This information is to be used by these State programs so they can tell hospitals which Steps to prioritize (meaning which combinations of 2 Steps give the biggest bang in terms of health improvement).

To identify which combination has the greatest impact, we would like to create an indicator variable for each of the different combinations of 2 Steps, and then use these indicator variables to estimate the associated health impact.

e.g., say combination1 is exposed to Step_1 and Step_2. What is the effect of being exposed to combination 1 (i.e., exposed to BOTH Step_1 AND Step_2) as compared with not being exposed to combination 1 (i.e., exposed to NEITHER Step_1 nor Step_2)?

Ideally, we would like to create two types of combination variables:

1) Exposed to both Steps in the combination without regard to exposure to Other Steps

meaning you could have

Step_1=1, Step_2=1, and then any exposure status for Step_3 through Step_7

and

2) Exposed to both Steps in the combination and ONLY exposure to those two Steps

meaning you would have

Step_1=1, Step_2=1, and then Step_3=0, Step_4=0, Step_5=0, Step_6=0, Step_7=0

Is there a way to operationalize this simply in SAS without writing out the code for each and every combination? I came across CALL COMB-type commands and saw how they were used to create observations with different combinations of names, but as a new SAS user I am struggling to see how to extend this to creating new indicator variables out of old indicator variables.

Rick_SAS · Posted 07-20-2011 09:05 AM

Several SAS procedures support the "GLM notation" that enables you to specify a model that includes main effects and all two-way interactions. You use the "|" operator to specify interactions and the "@2" operator to limit the interactions to two-way. For example, a GLM analysis might be

proc glm data=a;
   class s1-s7;
   model y = s1|s2|s3|s4|s5|s6|s7 @2;
run;

If the procedure you want to use does not support the GLM notation, you can use PROC GLMMOD with the OUTDESIGN= option to generate the design matrix.
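As a rough sketch of the GLMMOD route, reusing the placeholder names from the example above (data set a, response y, variables s1-s7), the step below writes the design matrix to a data set. The output names design and parms are made up for illustration. The CLASS statement is deliberately left out here, on the assumption that the variables are numeric 0/1: each two-way interaction column is then the product of the two variables, which equals 1 only when both Steps are in place.

```sas
/* Sketch only: a, y, s1-s7 are the placeholder names used above;      */
/* design and parms are hypothetical output data set names.            */
/* Without a CLASS statement the numeric 0/1 variables are treated as  */
/* continuous, so each interaction column is the product s_i * s_j.    */
proc glmmod data=a outdesign=design outparm=parms noprint;
   model y = s1|s2|s3|s4|s5|s6|s7 @2;
run;

/* OUTPARM= maps each design column number to the effect it represents */
proc print data=parms;
run;
```

With seven main effects and 21 two-way interactions, the design data set should hold an intercept column plus 28 effect columns, along with the response.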

For more on specifying interactions, see http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_introcom_a00...
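If explicit indicator variables are still wanted, one possible approach is a DATA step that loops over the 21 pairs with arrays. This is a minimal sketch, not a tested solution: the data set names hosp and pairs and the new variable names both_1-both_21, only_1-only_21, and nsteps are invented for illustration, and the Step variables are assumed to be numeric 0/1 with the names given in the question.

```sas
/* Sketch only: hosp, pairs, both_*, only_*, nsteps are hypothetical names. */
data pairs;
   set hosp;
   array step{7}  Step_1-Step_7;
   array both{21} both_1-both_21;   /* type 1: both Steps in place, others free */
   array only{21} only_1-only_21;   /* type 2: both Steps in place, no others   */

   nsteps = sum(of step{*});        /* number of Steps in place (missing ignored) */

   k = 0;                           /* pair counter: (1,2), (1,3), ..., (6,7)   */
   do i = 1 to 6;
      do j = i + 1 to 7;
         k = k + 1;
         both{k} = (step{i} = 1 and step{j} = 1);
         only{k} = (both{k} = 1 and nsteps = 2);
      end;
   end;
   drop i j k;
run;
```

Pair number k follows the loop order, so both_1 is Step_1 with Step_2, both_2 is Step_1 with Step_3, and both_21 is Step_6 with Step_7. Note that a value of 0 here means "not both", which includes hospitals with exactly one of the two Steps; a comparison against hospitals with neither Step would need an additional condition.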

mostater · Posted 07-22-2011 01:02 PM

This doesn't answer your question, but have you considered including all 7 variables in one model and then examining the semi-partial R-squared values (assuming your dependent variable is continuous)? You could sum those values for each of the 21 pairs to come up with a hierarchical ordering. You could add interactions as Rick_SAS suggests to get at more complex relationships between the variables.
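A minimal sketch of that idea, assuming a continuous response and reusing the placeholder names a, y, and s1-s7 from the earlier example: the SCORR2 option on the MODEL statement in PROC REG requests squared semi-partial correlations based on Type II sums of squares.

```sas
/* Sketch only: a, y, s1-s7 are placeholder names from the example above. */
proc reg data=a;
   model y = s1-s7 / scorr2;   /* squared semi-partial correlations (Type II) */
run;
quit;
```

Summing the values for two variables gives only a rough ranking of pairs, since squared semi-partial correlations are not strictly additive when the predictors are correlated with one another.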
