Hi,
I have a number of variables i'd like to recode into different variables by range, my data looks like:
id Rate1 Rate2 Rate3 Rate4
1 0 10 23 54
2 18 39 23 84
3 4 87 92 8
...
I'd like to create new variables that group them into category indicators for each Rate
0-25, 25-50, 50-75, 75-max
I'd like the data to come out:
id Rate1 Rate2 Rate3 Rate4 R1_0_25 R1_25_50 R1_50_75 R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M R4_0_25 R4_25_50 R4_50_75 R4_75_M
1 0 10 23 54 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0
2 18 39 23 84 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1
3 4 87 92 8 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0
...
I started with array's but i didnt think through it very clearly yet so that didnt work, any help is very much appreciated!
ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 ;
ARRAY NEWCAT {*} R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M R4_0_25 R4_25_50 R4_50_75 R4_75_M ;
Do i=1 to dim(NEWCAT) ;
if 0 LE RATELIST{i} LE 25 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;
if 25 LE RATELIST{i} LE 50 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;
if 50 LE RATELIST{i} LE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;
if RATELIST{i} GE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;
end;
Thank you.
You have a problem in that you have the values of 25, 50, and 75 assigned to two categories when equal. Which category do you want when value is exactly equal to 25?
How will you use these recoded values? It may be that you don't need to create a new variable but a custom format.
For example if I wanted to run proc freq on the rates and know how many and percent were within the range then I could do:
Proc format;
value quartile
0 - < 25 = '0 to < 25'
25 -<50 = '25 to <50'
50 -<75 = '50 to <75'
75 - high='75+'
;
run;
proc freq data=yourdatasetname;
tables rate1 rate2 rate3 rate4;
format rate1 rate2 rate3 rate4 quartile.;
run;
Most of the analysis procedures will use the formatted value.
or to recode one:
R2_0_25 = (0 le Rate2 le 25); /* though I think you may want (0 le Rate2 lt 25) */
sorry you're right with the recodes
0 - < 25 = '0 to < 25'
25 -<50 = '25 to <50'
50 -<75 = '50 to <75'
75 - high='75+'
Although a format will not work for me, i need the rates recoded into a new array of variables as indicators.
When forming numeric groups one can usually use the values to calculate the array index. Here I have added a rate5 so the number of rates and the number of groups are not the same.
data have;
input id Rate1 Rate2 Rate3 Rate4 rate5;
datalines;
1 0 10 23 54 99
2 18 39 23 84 -5
3 4 87 92 8 33
run;
data want(drop=i r index rate:);
set have;
ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 rate5;
/*0-<25, 25-<50, 50-<75, 75-max*/
ARRAY NEWCAT {*} R1_0_25 R1_25_50 R1_50_75 R1_75_M
R2_0_25 R2_25_50 R2_50_75 R2_75_M
R3_0_25 R3_25_50 R3_50_75 R3_75_M
R4_0_25 R4_25_50 R4_50_75 R4_75_M
R5_0_25 R5_25_50 R5_50_75 R5_75_M ;
do i = 1 to dim(newcat);
newcat{i}=0;
end;
do r = 1 to dim(ratelist);
i = floor(ratelist{r}/25)+1; /* map rate into a nominal value between 1 and 4 */
i = max(i,1); /* map rates<0 to 1*/
i = min(i,dim(ratelist)); /* map large values to upper rate */
index = i + ((r-1)*4); /* 4 (75/25+1) is number of groups - not number or rates */
newcat{index} = 1;
end;
run;
Thanks Art. (I've changed accounts, new company)
The code above works for 4 even groups, if i were to have uneven groups, how would that change?
For example instead of even 25 percent intervals, i have
0-25
25-40
40-60
60-90
90+
Here are a couple of approaches when your groups have unequal width. First an assignment statement - this will be faster than IF-THEN/ELSE processing.
i = 1 * (0 <=ratelist{r} <25)
+2 * (25<=ratelist{r} <40)
+3 * (40<=ratelist{r} <60)
+4 * (60<=ratelist{r} <90)
+5 * (90<=ratelist{r});
You could also create an informat that maps the RATELIST{r} to the index value.
proc format;
invalue raterange
0 -<25=1
25-<40=2
40-<60=3
60-<90=4
90-high=5;
The assignment statement then becomes:
i = input(ratelist{r},raterange.);
If the number of ranges becomes much more than 5 I would recommend the format approach.
the format approach offers most flexibility as it handles the ranges better than almost all other approaches
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.