Hi Colleagues,
I have a continuous variable called pcntge.
I want to derive discrete categories from it, as shown below.
data a;
input pcntge;
datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;
This is the code I attempted, but it didn't work. Any help would be really appreciated.
Data b;
set a ;
if (. <pcntge <= 0) then category=1;
else if (0 <PCNTGE<= -10) then category=2;
else if (-10 < PCNTGE<= -20) then category=3;
else if (-20 < PCNTGE<= -30 ) then category=4;
else if (-30 < PCNTGE<= -40 ) then category=5;
else if (-40 < PCNTGE<= -50 ) then category=6;
else if (-50 < PCNTGE<= -60 ) then category=7;
else if (-60 < PCNTGE<= -70) then category=8;
else if (-70 < PCNTGE<= -80 ) then category=9;
else if (-80 < PCNTGE<= -90 ) then category=10;
else if (-90 < PCNTGE<= -100 ) then category=11;
else if (PCNTGE=-100) then category=12;
else if (-100 < PCNTGE ) then category=13;
run;
Thanks
Mirisage
I may not have correctly captured your rules, but I think that the following is closer to what you want:
data a;
input pcntge;
cards;
.2
.0001
.
-.0001
-1
-11
-70
;
Data b;
set a ;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge > -100 then category=12;
run;
Hi Art,
Thank you very much for this code which works correctly after revising the last two statements (below is the revised one).
data b;
SET a;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge = -100 then category=12;
else if pcntge < -100 then category=13;
run;
Thanks
Mirisage
Your conditions are never true: no value can be both greater than -10 and less than or equal to -20. Instead of:
else if (-10 < PCNTGE<= -20) then category=3;
it should be:
else if (-10 > PCNTGE>= -20) then category=3;
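To see why the original form never fires, here is a minimal sketch that evaluates both comparisons over a few test values. SAS treats a chained comparison as the two comparisons joined by AND. The variable names x, original_form and corrected_form are only illustrative.

```sas
data _null_;
   do x = -25 to 5 by 5;
      /* x > -10 AND x <= -20: no value satisfies both, so this is always 0 */
      original_form  = (-10 < x <= -20);
      /* x < -10 AND x >= -20: 1 only for values from -20 up to, but not including, -10 */
      corrected_form = (-10 > x >= -20);
      put x= original_form= corrected_form=;
   end;
run;
```

In the log, original_form should be 0 on every row, while corrected_form should be 1 only for x=-20 and x=-15.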
Besides IF statements, you could also use a format, as shown below:
proc format;
value _recode (min=17)
<0 - high = 14
0 = 1
-10 -< 0 = 2
-20 -< -10 = 3
-30 -< -20 = 4
-40 -< -30 = 5
-50 -< -40 = 6
-60 -< -50 = 7
-70 -< -60 = 8
-80 -< -70 = 9
-90 -< -80 = 10
-100<-< -90 = 11
-100 = 12
low -< -100 = 13
;
run;
data a;
input pcntge;
format pcntge category best32.;
category=input(put(pcntge,_recode.),best32.);
datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;
Hi Patrick,
This is great!
Thank you very much.
This works well only when I revise the first line of the category definition under "proc format" as follows.
Instead of " <0 - high = 14", as you suggested, I had to revise it to " 0 - high = 14". Only then does the code work. However, logically it has to be
>0 - high = 14, doesn't it? But when I use >0 - high = 14, the code doesn't work.
If you have time, could you please shed some light on why "0 - high = 14" works while ">0 - high = 14", which is what it logically has to be, doesn't?
Thanks again
Mirisage
The syntax you used for eliminating the lower bound from the range was wrong. Also SAS will automatically assign a value that is the upper and lower bounds of two ranges to the lower range.
See the manual pages for PROC FORMAT.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473474.htm
You can use the less than (<) symbol to exclude values from ranges. If you are excluding the first value in a range, then put the < after the value. If you are excluding the last value in a range, then put the < before the value. For example, the following range does not include 0:
0<-100
Likewise, the following range does not include 100:
0-<100
If a value at the high end of one range also appears at the low end of another range, and you do not use the < noninclusion notation, then PROC FORMAT assigns the value to the first range. For example, in the following ranges, the value AJ is part of the first range:
'AA'-'AJ'=1 'AJ'-'AZ'=2
In this example, to include the value AJ in the second range, use the noninclusive notation on the first range:
'AA'-<'AJ'=1 'AJ'-'AZ'=2
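The shared-endpoint rule quoted above can be tried with a small numeric sketch. The format names shared_ep and excl_ep, the labels, and the test value 50 are made up for illustration and are not from the documentation.

```sas
proc format;
   /* 50 is the high end of the first range and the low end of the second */
   value shared_ep
      0  - 50  = 'first'
      50 - 100 = 'second'
   ;
   /* the < before 50 excludes it from the first range */
   value excl_ep
      0  -< 50 = 'first'
      50 - 100 = 'second'
   ;
run;

data _null_;
   x = 50;
   with_shared   = put(x, shared_ep.);
   with_excluded = put(x, excl_ep.);
   put x= with_shared= with_excluded=;
run;
```

Going by the documentation text, 50 should be labelled first under shared_ep and second under excl_ep.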
Tom is of course right. It should be: 0 <- high = 14
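Putting that correction into the format posted earlier gives something like the sketch below. Only the first range is changed. The labels are quoted here for clarity, and the data set names a and b are taken from the earlier posts. Note that the boundary values (-10, -20, and so on) follow the ranges as Patrick wrote them, which may differ from the IF/ELSE version at those exact points.

```sas
proc format;
   value _recode (min=17)
      0 <- high    = '14'   /* 0 excluded: strictly positive values only */
      0            = '1'
      -10 -< 0     = '2'
      -20 -< -10   = '3'
      -30 -< -20   = '4'
      -40 -< -30   = '5'
      -50 -< -40   = '6'
      -60 -< -50   = '7'
      -70 -< -60   = '8'
      -80 -< -70   = '9'
      -90 -< -80   = '10'
      -100 <-< -90 = '11'
      -100         = '12'
      low -< -100  = '13'
   ;
run;

data b;
   set a;
   /* PUT applies the format; INPUT converts the label text back to a number */
   category = input(put(pcntge, _recode.), best32.);
run;
```

With this version, 0 should fall into category 1 and 0.0001 into category 14. Missing values match none of the ranges, so they are expected to pass through and come back as missing.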
Hi Tom and Patrick,
Wish you a happy 2012!
Tom, I clearly understood the logic from your nice explanation.
Thank you very much.
Mirisage