
Best way to create certain variables

Contributor
Posts: 67

Best way to create certain variables

Hello!

I have two questions. I think they are related, though, so I'll present both just in case.

So I have 6 mutually exclusive groups. Group 1 is people who have only trauma #1, group 2 is people who have only trauma #3, group 3 is people who have only trauma #5, group 4 is people who have only traumas #1 and #3, group 5 is people who have only traumas #1 and #5, and group 6 is people who have only traumas #3 and #5.

Each trauma has the same variables, but they are renamed to reflect which trauma they apply to. For example, I want to see who had trauma inflicted on them by a parent. That variable is t"x"perp, where x = the trauma number. Here is my code, but I feel that it is pretty messy. I do the same for sibling, other adult, and other youth.

/*Parent*/
if group = 1 and t1perp=1 then parent = 1; else
if group = 2 and t3perp=1 then parent = 1; else
if group = 3 and t5perp=1 then parent = 1; else
if group = 4 and (t1perp =1 or t3perp =1) then parent =1; else
if group = 5 and (t1perp =1 or t5perp =1) then parent =1; else
if group = 6 and (t3perp =1 or t5perp =1) then parent =1;

/*Other adult relative*/
if group = 1 and t1perar=1 then otheradult = 1; else
if group = 2 and t3perar=1 then otheradult = 1; else
if group = 3 and t5perar=1 then otheradult = 1; else
if group = 4 and (t1perar =1 or t3perar =1) then otheradult =1; else
if group = 5 and (t1perar =1 or t5perar =1) then otheradult =1; else
if group = 6 and (t3perar =1 or t5perar =1) then otheradult =1;

/*Sibling*/
if group = 1 and t1persb=1 then sibling = 1; else
if group = 2 and t3persb=1 then sibling = 1; else
if group = 3 and t5persb=1 then sibling = 1; else
if group = 4 and (t1persb =1 or t3persb =1) then sibling =1; else
if group = 5 and (t1persb =1 or t5persb =1) then sibling =1; else
if group = 6 and (t3persb =1 or t5persb =1) then sibling =1;

What would be the best way to condense this? A macro? An array? I think hard-coding it the way I have is very elementary/amateurish and susceptible to error.
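One way to condense the IF/ELSE chains above is with arrays. Each group is defined by which of traumas 1, 3 and 5 it contains, so a single rule ("set the flag to 1 if any trauma included in this group has this perpetrator") can replace all 18 statements. This is a sketch, not a tested production step: the sample data, the array names who/flag/inc and the loop indices r and t are hypothetical. The t*perp/t*perar/t*persb variable names come from the question. Like the original chains, the flags stay missing (not 0) when nothing matches.

```sas
/* Hypothetical sample data: only the variables used below */
data have;
  input id group t1perp t3perp t5perp t1perar t3perar t5perar
        t1persb t3persb t5persb;
  datalines;
1 1 1 . . . . . . . .
2 4 . 1 . . . . . . 1
3 6 1 . . . 1 1 . . .
4 2 . . . . . . . 1 .
;
run;

data want;
  set have;
  /* Rows = perpetrator role, columns = trauma 1, 3, 5 (SAS fills arrays row by row) */
  array who{3,3} t1perp  t3perp  t5perp
                 t1perar t3perar t5perar
                 t1persb t3persb t5persb;
  array flag{3} parent otheradult sibling;   /* output flags, one per role (row) */
  array inc{3} _temporary_;                  /* does this group include trauma 1, 3, 5? */

  inc{1} = group in (1, 4, 5);   /* groups that contain trauma #1 */
  inc{2} = group in (2, 4, 6);   /* groups that contain trauma #3 */
  inc{3} = group in (3, 5, 6);   /* groups that contain trauma #5 */

  do r = 1 to 3;
    do t = 1 to 3;
      /* Same rule as the IF/ELSE chains: only traumas in the subject's group count */
      if inc{t} and who{r, t} = 1 then flag{r} = 1;
    end;
  end;
  drop r t;
run;
```

Adding "other youth" would only mean one more row in who (its three t*per.. variables) and one more name in flag, with the outer loop running to 4.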



Second issue:

For those who are in groups 3, 5, or 6 (i.e., the groups whose subjects endorsed trauma #5), I want to see which subtypes they have if they have exactly 3 subtypes. There are multiple combinations, though. Any ideas on how to condense this code?

You can endorse any 3 of the 5.

array subtype {5} t5typt5 t5typen t5typva t5typed t5typoth;

if group in (3,5,6) and subtype{1}=. & subtype{2}=. & subtype{3} = . & subtype{4}=. and subtype{5}=.
   then combotype3=0;

if group in (3,5,6) and subtype{1}=1 & subtype{2}=1 & subtype{3} = 1 & subtype{4} ne 1 & subtype{5} ne 1 then combotype3=1; else
if group in (3,5,6) and subtype{1}=1 & subtype{2}=1 & subtype{4} = 1 & subtype{3} ne 1 & subtype{5} ne 1 then combotype3=2; else
if group in (3,5,6) and subtype{1}=1 & subtype{2}=1 & subtype{5} = 1 & subtype{3} ne 1 & subtype{4} ne 1 then combotype3=3; else
if group in (3,5,6) and subtype{1}=1 & subtype{3}=1 & subtype{4} = 1 & subtype{2} ne 1 & subtype{5} ne 1 then combotype3=4; else
if group in (3,5,6) and subtype{1}=1 & subtype{3}=1 & subtype{5} = 1 & subtype{2} ne 1 & subtype{4} ne 1 then combotype3=5; else
if group in (3,5,6) and subtype{1}=1 & subtype{4}=1 & subtype{5} = 1 & subtype{2} ne 1 & subtype{3} ne 1 then combotype3=6; else
if group in (3,5,6) and subtype{2}=1 & subtype{3}=1 & subtype{4} = 1 & subtype{1} ne 1 & subtype{5} ne 1 then combotype3=7; else
if group in (3,5,6) and subtype{2}=1 & subtype{3}=1 & subtype{5} = 1 & subtype{1} ne 1 & subtype{4} ne 1 then combotype3=8; else
if group in (3,5,6) and subtype{2}=1 & subtype{4}=1 & subtype{5} = 1 & subtype{1} ne 1 & subtype{3} ne 1 then combotype3=9; else
if group in (3,5,6) and subtype{3}=1 & subtype{4}=1 & subtype{5} = 1 & subtype{1} ne 1 & subtype{2} ne 1 then combotype3=10;
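The ten hard-coded combinations can also be generated by nested loops over the subtype array declared above. This sketch counts the endorsed subtypes with the N function (which works here only because the subtypes are either 1 or missing), then walks every three-subtype combination in the same order as the chain, (1,2,3)=1, (1,2,4)=2, ..., (3,4,5)=10, so the codes match. The sample data set have2 and the helper variables nsub, code, a, b and c are hypothetical.

```sas
/* Hypothetical sample data: 1 = endorsed, . = not endorsed */
data have2;
  input id group t5typt5 t5typen t5typva t5typed t5typoth;
  datalines;
1 3 1 1 1 . .
2 5 1 . . 1 1
3 6 . . 1 1 1
4 3 . . . . .
5 5 1 1 . . .
6 1 1 1 1 . .
;
run;

data want2;
  set have2;
  array subtype{5} t5typt5 t5typen t5typva t5typed t5typoth;
  if group in (3, 5, 6) then do;
    nsub = n(of subtype{*});   /* number of endorsed subtypes (values are 1 or .) */
    if nsub = 0 then combotype3 = 0;
    else if nsub = 3 then do;
      /* Enumerate the 10 combinations a < b < c in the original chain's order */
      code = 0;
      do a = 1 to 3;
        do b = a + 1 to 4;
          do c = b + 1 to 5;
            code = code + 1;
            if subtype{a} = 1 and subtype{b} = 1 and subtype{c} = 1 then combotype3 = code;
          end;
        end;
      end;
    end;
  end;
  drop nsub code a b c;
run;
```

Subjects with 1, 2, 4 or 5 endorsed subtypes (or outside groups 3, 5 and 6) are left missing, as in the original chain.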

Also, if you know of any resources I can consult to learn more about arrays, I'd appreciate it. Suggestions are welcome :)

Thanks!

Super User
Posts: 10,550

Re: Best way to create certain variables

For the first part, I would look for common elements and factor them out. Since your code is done by group, you might start by organizing it that way:

if group = 1 then do;
   /* code only done for group 1 */
end;

Since you said explicitly that your groups do not overlap, the SELECT statement might reduce your code somewhat.

select (group);
     when (1) do; /* code related to group 1 */ end;
     when (2) do; /* code related to group 2 */ end;
...
     when (6) do; /* code related to group 6 */ end;
     otherwise put "Unexpected group: " group=; /* or something that makes sense if group is not 1 to 6 */
end;
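As a concrete sketch, here is the SELECT structure applied to the parent flag from the original question; the other roles would follow the same pattern. The sample data (including an out-of-range group 7 to exercise OTHERWISE) is hypothetical. Note that a SELECT with no matching WHEN and no OTHERWISE stops the DATA step with an error, which is one reason to keep the OTHERWISE branch.

```sas
/* Hypothetical sample data */
data have;
  input id group t1perp t3perp t5perp;
  datalines;
1 1 1 . .
2 5 . . 1
3 6 1 . .
4 7 1 . .
;
run;

data want;
  set have;
  select (group);
    when (1) if t1perp = 1 then parent = 1;
    when (2) if t3perp = 1 then parent = 1;
    when (3) if t5perp = 1 then parent = 1;
    when (4) if t1perp = 1 or t3perp = 1 then parent = 1;
    when (5) if t1perp = 1 or t5perp = 1 then parent = 1;
    when (6) if t3perp = 1 or t5perp = 1 then parent = 1;
    /* Any group outside 1-6 is reported in the log and parent stays missing */
    otherwise put "Unexpected group: " group=;
  end;
run;
```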

I think we need more description of your subtype variables to suggest ways to combine them, since we only know that they take the value 1 and something else. Are they dichotomous or multivalued? Do they all have the same range of values, or different ranges?

Contributor
Posts: 67

Re: Best way to create certain variables

Hello!

I haven't used the SELECT statement before; it might be worth looking up :) I've narrowed my sample down to only those in groups 1-6. Regarding the subtype variables, they are dichotomous (either "1" for "yes" or missing "." for "no"). Does this help?

Thanks!

Best,

Gina

Super User
Posts: 10,550

Re: Best way to create certain variables

One approach I was thinking of is a dummy variable that combines your subtype variables into a pseudo-binary number. It could be built as a character variable using a concatenation function, or as a numeric such as this:

Dummy = sum(t5typt5*10000, t5typen*1000, t5typva*100, t5typed*10, t5typoth );
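To check the arithmetic, here is a sketch for one hypothetical subject who endorsed subtypes 1, 3 and 5 (the values are made up for illustration). Because SUM skips missing arguments, only the endorsed subtypes contribute a digit: 10000 + 100 + 1 = 10101. PUT with the Z5. format gives the same number as a zero-padded string, which matters for patterns that start with 0, such as 00111.

```sas
/* Hypothetical subject who endorsed subtypes 1, 3 and 5 */
data demo;
  t5typt5 = 1; t5typen = .; t5typva = 1; t5typed = .; t5typoth = 1;
  /* Missing products are skipped by SUM, so each endorsed subtype adds one digit */
  Dummy = sum(t5typt5*10000, t5typen*1000, t5typva*100, t5typed*10, t5typoth);
  length key $5;
  key = put(Dummy, z5.);   /* zero-padded text form of the same pattern */
  put Dummy= key=;         /* write both values to the log */
run;
```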

Then you could build a custom informat to assign values to your combotype3 variable.

proc format;
invalue combo3type
   '11100' = 1
   '11010' = 2
   '11001' = 3
   '10110' = 4
   '10101' = 5
   '10011' = 6
   '01110' = 7
   '01101' = 8
   '01011' = 9
   '00111' = 10
   other   = .
;
run;

Then the code would be (the quoted values keep their leading zeros, so PUT with the Z5. format turns Dummy into a matching zero-padded string before INPUT reads it):

Dummy = sum(t5typt5*10000, t5typen*1000, t5typva*100, t5typed*10, t5typoth );

if group in (3,5,6) then combotype3 = input(put(Dummy, z5.), combo3type5.);

You would probably want to drop the dummy variable after confirming the code is working.

The informat approach has an added benefit: if someone later needs a value for when only 2 of the subtypes are set, it is easy to add. Also, if an additional subtype is added, the informat is quick to change and the dummy variable only needs one more digit position.

Or, if one of the subtypes acquires a second valid value, say 2, then adding a 2 in the correct position in the informat is relatively easy.

Multiple dummy values can be assigned to the same result, for example '01000', '00100' = 27.

This approach is limited in the number of subtypes it can handle, depending on your OS and the resulting numeric storage: on Windows and UNIX a numeric variable stores integers exactly only up to 2**53 (about 9.0E15), so with one digit per subtype there is room for roughly 16 subtypes. A character dummy built with a concatenation function does not have that limit.
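For the character-variable option, here is a sketch that builds the key as text, one 0/1 character per subtype, so it is not bound by numeric precision. The informat name combokey, the data set names, the loop index i and the extra '00000' = 0 entry (mirroring the question's combotype3=0 case for subjects with no subtypes) are assumptions for illustration. This version only distinguishes 1 from anything else; a multivalued subtype would need the raw value appended instead of the 0/1 comparison.

```sas
/* Hypothetical informat keyed on 0/1 strings, same codes as the question */
proc format;
  invalue combokey
    '11100' = 1  '11010' = 2  '11001' = 3  '10110' = 4  '10101' = 5
    '10011' = 6  '01110' = 7  '01101' = 8  '01011' = 9  '00111' = 10
    '00000' = 0
    other   = .;
run;

/* Hypothetical sample data: 1 = endorsed, . = not endorsed */
data have2;
  input id group t5typt5 t5typen t5typva t5typed t5typoth;
  datalines;
1 3 1 . 1 1 .
2 6 . . . . .
3 5 1 1 . . .
;
run;

data want3;
  set have2;
  array subtype{5} t5typt5 t5typen t5typva t5typed t5typoth;
  length key $5;
  key = '';
  do i = 1 to dim(subtype);
    /* (subtype{i} = 1) is 1 or 0, so a missing subtype becomes 0; CATS strips blanks */
    key = cats(key, subtype{i} = 1);
  end;
  if group in (3, 5, 6) then combotype3 = input(key, combokey5.);
  drop i;
run;
```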

