## "Categorizing a continuous variable"

Solved
Super Contributor
Posts: 338

Hi Colleagues,

I have a continuous variable called pcntge.
I want to split it into discrete categories, as shown below.

data a;
input pcntge;
datalines;
0.0001    /*any value GT zero should be categorized as 14*/
.    /*missing values should come as missing (which means a dot)*/
0   /*all zeros should be categorized as 1*/
-9   /* LT 0 to GT -10 should be categorized as 2*/
-19  /* LT -10 to GT -20 should be categorized as 3*/
-29  /* LT -20 to GT -30 should be categorized as 4*/
-39  /* LT -30 to GT -40 should be categorized as 5*/
-49  /* LT -40 to GT -50 should be categorized as 6*/
-59  /* LT -50 to GT -60 should be categorized as 7*/
-69  /* LT -60 to GT -70 should be categorized as 8*/
-79  /* LT -70 to GT -80 should be categorized as 9*/
-89  /* LT -80 to GT -90 should be categorized as 10*/
-99  /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;

This is the code I attempted, but it didn't work. Any help would be really appreciated.

Data b;
set a  ;

if (. <pcntge <= 0) then category=1;
else if (0 <PCNTGE<= -10) then category=2;
else if (-10 < PCNTGE<=  -20)  then category=3;
else if (-20 < PCNTGE<= -30  ) then category=4;
else if (-30 < PCNTGE<= -40  ) then category=5;
else if (-40 < PCNTGE<= -50  ) then category=6;
else if (-50 < PCNTGE<= -60  ) then category=7;
else if (-60 < PCNTGE<=  -70)  then category=8;
else if (-70 < PCNTGE<= -80  ) then category=9;
else if (-80 < PCNTGE<= -90  ) then category=10;
else if (-90 < PCNTGE<= -100  ) then category=11;

else if (PCNTGE=-100) then category=12;
else if (-100 < PCNTGE  ) then category=13;
run;

Thanks

Mirisage

Accepted Solutions
Solution
‎12-29-2011 05:46 PM
PROC Star
Posts: 8,163

## "Categorizing a continuous variable"

I may not have correctly captured your rules, but I think that the following is closer to what you want:

data a;
input pcntge;
cards;
.2
.0001
.
-.0001
-1
-11
-70
;

Data b;
set a  ;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge > -100 then category=12;
run;

All Replies
Super Contributor
Posts: 338

## "Categorizing a continuous variable"

Hi Art,

Thank you very much for this code which works correctly after revising the last two statements (below is the revised one).

data b;
SET a;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge = -100 then category=12;
else if pcntge < -100 then category=13;
run;

Thanks

Mirisage

Posts: 4,736

## "Categorizing a continuous variable"

else if (-10 < PCNTGE<= -20) then category=3;

it should be:

else if (-10 > PCNTGE>= -20) then category=3;

Besides IF statements, you could also use a format, as shown below:

proc format;
value _recode (min=17)
<0 - high   = 14
0          = 1
-10 -<  0    = 2
-20 -< -10   = 3
-30 -< -20   = 4
-40 -< -30   = 5
-50 -< -40   = 6
-60 -< -50   = 7
-70 -< -60   = 8
-80 -< -70   = 9
-90 -< -80   = 10
-100<-< -90  = 11
-100         = 12
low -< -100  = 13
;
run;

data a;
input pcntge;
format pcntge category best32.;
category=input(put(pcntge,_recode.),best32.);
datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;

Super Contributor
Posts: 338

## "Categorizing a continuous variable"

Hi Patrick,

This is great!

Thank you very much.

This works well only when I revise the first line of the category definitions under "proc format" as follows.

Instead of "<0 - high   = 14", as you suggested, I had to change it to "0 - high   = 14". Only then does the code work. Logically, however, it should be ">0 - high   = 14", shouldn't it? But when I use ">0 - high   = 14", the code doesn't work.

If you have time, could you please shed some light on why "0 - high   = 14" works, while ">0 - high   = 14", which is what it should be logically, doesn't?

Thanks again

Mirisage

Super User
Posts: 8,070

## Re: "Categorizing a continuous variable"

The syntax you used for excluding the lower bound from the range was wrong. Also, SAS will automatically assign a value that is both the upper bound of one range and the lower bound of another to the lower range.

See the manual pages for PROC FORMAT.

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473474.htm

```
You can use the less than (<) symbol to exclude values from ranges. If you are excluding the first value in a range, then put the < after the value. If you are excluding the last value in a range, then put the < before the value. For example, the following range does not include 0:
0<-100
Likewise, the following range does not include 100:
0-<100
If a value at the high end of one range also appears at the low end of another range, and you do not use the < noninclusion notation, then PROC FORMAT assigns the value to the first range. For example, in the following ranges, the value AJ is part of the first range:
'AA'-'AJ'=1 'AJ'-'AZ'=2
In this example, to include the value AJ in the second range, use the noninclusive notation on the first range:
'AA'-<'AJ'=1 'AJ'-'AZ'=2
```
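
As a small illustration of the notation quoted above, the sketch below defines a three-range format and writes one value from each range to the log. The format name `sign_demo` and its labels are made up for this example; only the range syntax comes from the manual.

```sas
proc format;
  value sign_demo
    low -< 0  = 'negative'   /* "<" before the high value: 0 is excluded */
    0         = 'zero'       /* single-value range */
    0 <- high = 'positive'   /* "<" after the low value: 0 is excluded */
  ;
run;

data _null_;
  do x = -1, 0, 1;
    put x= x sign_demo.;     /* raw value, then the formatted value */
  end;
run;
```

Because 0 is excluded from both open-ended ranges, it can only match the single-value range in the middle.
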
Posts: 4,736

## Re: "Categorizing a continuous variable"

Tom is of course right. It should be:  0 <- high = 14
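
With that correction applied, a sketch of the complete recode might look like the following. Only the first range differs from the format posted earlier in the thread; the dataset name `check` and the shortened test data are assumptions for illustration.

```sas
proc format;
  value _recode (min=17)
    0 <- high    = 14   /* strictly greater than 0 */
    0            = 1
    -10 -< 0     = 2
    -20 -< -10   = 3
    -30 -< -20   = 4
    -40 -< -30   = 5
    -50 -< -40   = 6
    -60 -< -50   = 7
    -70 -< -60   = 8
    -80 -< -70   = 9
    -90 -< -80   = 10
    -100 <-< -90 = 11   /* both end points excluded */
    -100         = 12
    low -< -100  = 13   /* strictly less than -100 */
  ;
run;

/* Hypothetical check dataset: a few values around the boundaries */
data check;
  input pcntge;
  /* PUT applies the format; INPUT converts the label back to a number */
  category = input(put(pcntge, _recode.), best32.);
  datalines;
0.0001
.
0
-9
-100
-100.001
;
run;

proc print data=check;
run;
```

Note that with these ranges a value of exactly -10 falls in category 2, because the range -10 -< 0 includes its low end, whereas the IF/ELSE chain earlier in the thread puts -10 in category 3. The original rules do not say which is intended, so adjust the `<` placement to suit.
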

Super Contributor
Posts: 338

## Re: "Categorizing a continuous variable"

Hi Tom and Patrick,

Wish you a happy 2012!

Tom, your nice explanation made the logic clear to me.

Thank you very much.

Mirisage
