BookmarkSubscribeRSS Feed
Danglytics
Calcite | Level 5

Hi,

I have a number of variables i'd like to recode into different variables by range, my data looks like:

id Rate1 Rate2 Rate3 Rate4

1 0 10 23 54

2 18 39 23 84

3 4 87 92 8

...

I'd like to create new variables that group them into category indicators for each Rate

0-25, 25-50, 50-75, 75-max

I'd like the data to come out:

id Rate1 Rate2 Rate3 Rate4 R1_0_25 R1_25_50 R1_50_75 R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M  R4_0_25 R4_25_50 R4_50_75 R4_75_M

1 0 10 23 54 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0

2 18 39 23 84 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1

3 4 87 92 8 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0

...

I started with array's but i didnt think through it very clearly yet so that didnt work, any help is very much appreciated!

ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 ;

ARRAY NEWCAT {*} R1_75_M R2_0_25 R2_25_50 R2_50_75 R2_75_M R3_0_25 R3_25_50 R3_50_75 R3_75_M  R4_0_25 R4_25_50 R4_50_75 R4_75_M ;

Do i=1 to dim(NEWCAT) ;

  if 0 LE RATELIST{i} LE 25 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if 25 LE RATELIST{i} LE 50 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if 50 LE RATELIST{i} LE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

  if RATELIST{i} GE 75 then NEWCAT{i}= 1; else NEWCAT{i}=0 ;

end;

Thank you.

6 REPLIES 6
ballardw
Super User

You have a problem in that you have the values of 25, 50, and 75 assigned to two categories when equal. Which category do you want when value is exactly equal to 25?

How will you use these recoded values? It may be that you don't need to create a new variable but a custom format.

For example if I wanted to run proc freq on the rates and know how many and percent were within the range then I could do:

Proc format;

value quartile

0 - < 25 = '0 to < 25'

25 -<50 = '25 to <50'

50 -<75 = '50 to <75'

75 - high='75+'

;

run;

proc freq data=yourdatasetname;

tables rate1 rate2 rate3 rate4;

format rate1 rate2 rate3 rate4 quartile.;

run;

Most of the analysis procedures will use the formatted value.

or to recode one:

R2_0_25 = (0 le Rate2 le 25); /* though I think you may want (0 le Rate2 lt 25)  */

DangIT
Fluorite | Level 6

sorry you're right with the recodes

0 - < 25 = '0 to < 25'

25 -<50 = '25 to <50'

50 -<75 = '50 to <75'

75 - high='75+'

Although a format will not work for me, i need the rates recoded into a new array of variables as indicators.

ArtC
Rhodochrosite | Level 12

When forming numeric groups one can usually use the values to calculate the array index.  Here I have added a rate5 so the number of rates and the number of groups are not the same.

data have;
input id Rate1 Rate2 Rate3 Rate4 rate5;
datalines;
1 0 10 23 54 99
2 18 39 23 84 -5
3 4 87 92 8 33
run;

data want(drop=i r index rate:);
   set have;
   ARRAY RATELIST {*} RATE1 RATE2 RATE3 RATE4 rate5;
                    /*0-<25, 25-<50, 50-<75, 75-max*/
   ARRAY NEWCAT {*} R1_0_25 R1_25_50 R1_50_75 R1_75_M
                    R2_0_25 R2_25_50 R2_50_75 R2_75_M
                    R3_0_25 R3_25_50 R3_50_75 R3_75_M 
                    R4_0_25 R4_25_50 R4_50_75 R4_75_M 
                    R5_0_25 R5_25_50 R5_50_75 R5_75_M ;
   do i = 1 to dim(newcat);
      newcat{i}=0;
   end;

   do r = 1 to dim(ratelist);
      i = floor(ratelist{r}/25)+1; /* map rate into a nominal value between 1 and 4 */
      i = max(i,1);                /* map rates<0 to 1*/
      i = min(i,dim(ratelist));    /* map large values to upper rate */
      index = i + ((r-1)*4);          /* 4 (75/25+1) is number of groups  - not number or rates */
      newcat{index} = 1;
   end;
   run;

DangIT
Fluorite | Level 6

Thanks Art. (I've changed accounts, new company)

The code above works for 4 even groups, if i were to have uneven groups, how would that change?

For example instead of even 25 percent intervals, i have

0-25

25-40

40-60

60-90

90+

ArtC
Rhodochrosite | Level 12

Here are a couple of approaches when your groups have unequal width.  First an assignment statement - this will be faster than IF-THEN/ELSE processing.

i = 1 * (0 <=ratelist{r} <25)
    +2 * (25<=ratelist{r} <40)
    +3 * (40<=ratelist{r} <60)
    +4 * (60<=ratelist{r} <90)
    +5 * (90<=ratelist{r});

You could also create an informat that maps the RATELIST{r} to the index value.

proc format;
invalue raterange
   0 -<25=1
   25-<40=2 

   40-<60=3
   60-<90=4
   90-high=5;

The assignment statement then becomes:

     i = input(ratelist{r},raterange.);

If the number of ranges becomes much more than 5 I would recommend the format approach.

Peter_C
Rhodochrosite | Level 12

the format approach offers most flexibility as it handles the ranges better than almost all other approaches

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1065 views
  • 0 likes
  • 5 in conversation