DATA Step, Macro, Functions and more

array and loops? recoding into different variables

Reply
Contributor
Posts: 66

array and loops? recoding into different variables

Hi,

I have a number of variables i'd like to recode into different variables by range, my data looks like:

id Rate1 Rate2 Rate3 Rate4

1 0 10 23 54

2 18 39 23 84

3 4 87 92 8

...

I'd like to create new variables that group them into category indicators for each Rate

0-25, 25-50, 50-75, 75-max

I'd like the data to come out:

id Rate1 Rate2 Rate3 Rate4 R1_0_25 R1_25_50 R1_50_75 R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M  R4_0_25 R4_25_50 R4_50_75 R4_75_M

1 0 10 23 54 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0

2 18 39 23 84 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1

3 4 87 92 8 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0

...

I started with array's but i didnt think through it very clearly yet so that didnt work, any help is very much appreciated!

ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 ;

ARRAY NEWCAT {*} R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M  R4_0_25 R4_25_50 R4_50_75 R4_75_M ;

Do i=1 to dim(NEWCAT) ;

  if 0 LE RATELIST{i} LE 25 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if 25 LE RATELIST{i} LE 50 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if 50 LE RATELIST{i} LE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if RATELIST{i} GE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

end;

Thank you.

Super User
Posts: 10,454

Re: array and loops? recoding into different variables

You have a problem in that you have the values of 25, 50, and 75 assigned to two categories when equal. Which category do you want when value is exactly equal to 25?

How will you use these recoded values? It may be that you don't need to create a new variable but a custom format.

For example if I wanted to run proc freq on the rates and know how many and percent were within the range then I could do:

Proc format;

value quartile

0 - < 25 = '0 to < 25'

25 -<50 = '25 to <50'

50 -<75 = '50 to <75'

75 - high='75+'

;

run;

proc freq data=yourdatasetname;

tables rate1 rate2 rate3 rate4;

format rate1 rate2 rate3 rate4 quartile.;

run;

Most of the analysis procedures will use the formatted value.

or to recode one:

R2_0_25 = (0 le Rate2 le 25); /* though I think you may want (0 le Rate2 lt 25)  */

Frequent Contributor
Posts: 81

Re: array and loops? recoding into different variables

sorry you're right with the recodes

0 - < 25 = '0 to < 25'

25 -<50 = '25 to <50'

50 -<75 = '50 to <75'

75 - high='75+'

Although a format will not work for me, i need the rates recoded into a new array of variables as indicators.

Valued Guide
Posts: 632

Re: array and loops? recoding into different variables

When forming numeric groups one can usually use the values to calculate the array index.  Here I have added a rate5 so the number of rates and the number of groups are not the same.

data have;
input id Rate1 Rate2 Rate3 Rate4 rate5;
datalines;
1 0 10 23 54 99
2 18 39 23 84 -5
3 4 87 92 8 33
run;

data want(drop=i r index rateSmiley Happy;
   set have;
   ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 rate5;
                    /*0-<25, 25-<50, 50-<75, 75-max*/
   ARRAY NEWCAT {*} R1_0_25 R1_25_50 R1_50_75 R1_75_M
                    R2_0_25 R2_25_50 R2_50_75 R2_75_M
                    R3_0_25 R3_25_50 R3_50_75 R3_75_M 
                    R4_0_25 R4_25_50 R4_50_75 R4_75_M 
                    R5_0_25 R5_25_50 R5_50_75 R5_75_M ;
   do i = 1 to dim(newcat);
      newcat{i}=0;
   end;

   do r = 1 to dim(ratelist);
      i = floor(ratelist{r}/25)+1; /* map rate into a nominal value between 1 and 4 */
      i = max(i,1);                /* map rates<0 to 1*/
      i = min(i,dim(ratelist));    /* map large values to upper rate */
      index = i + ((r-1)*4);          /* 4 (75/25+1) is number of groups  - not number or rates */
      newcat{index} = 1;
   end;
   run;

Frequent Contributor
Posts: 81

Re: array and loops? recoding into different variables

Thanks Art. (I've changed accounts, new company)

The code above works for 4 even groups, if i were to have uneven groups, how would that change?

For example instead of even 25 percent intervals, i have

0-25

25-40

40-60

60-90

90+

Valued Guide
Posts: 632

Re: array and loops? recoding into different variables

Here are a couple of approaches when your groups have unequal width.  First an assignment statement - this will be faster than IF-THEN/ELSE processing.

i = 1 * (0 <=ratelist{r} <25)
    +2 * (25<=ratelist{r} <40)
    +3 * (40<=ratelist{r} <60)
    +4 * (60<=ratelist{r} <90)
    +5 * (90<=ratelist{r});

You could also create an informat that maps the RATELIST{r} to the index value.

proc format;
invalue raterange
   0 -<25=1
   25-<40=2 

   40-<60=3
   60-<90=4
   90-high=5;

The assignment statement then becomes:

     i = input(ratelist{r},raterange.);

If the number of ranges becomes much more than 5 I would recommend the format approach.

Valued Guide
Posts: 2,174

Re: array and loops? recoding into different variables

the format approach offers most flexibility as it handles the ranges better than almost all other approaches

Ask a Question
Discussion stats
  • 6 replies
  • 290 views
  • 0 likes
  • 5 in conversation