Hi Colleagues,
I have a continuous variable called pcntge.
I want to derive discrete categories from it, as shown below.
data a;
input pcntge;
datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;
This is the code I attempted, but it didn't work. Any help would be really appreciated.
Data b;
set a ;
if (. <pcntge <= 0) then category=1;
else if (0 <PCNTGE<= -10) then category=2;
else if (-10 < PCNTGE<= -20) then category=3;
else if (-20 < PCNTGE<= -30 ) then category=4;
else if (-30 < PCNTGE<= -40 ) then category=5;
else if (-40 < PCNTGE<= -50 ) then category=6;
else if (-50 < PCNTGE<= -60 ) then category=7;
else if (-60 < PCNTGE<= -70) then category=8;
else if (-70 < PCNTGE<= -80 ) then category=9;
else if (-80 < PCNTGE<= -90 ) then category=10;
else if (-90 < PCNTGE<= -100 ) then category=11;
else if (PCNTGE=-100) then category=12;
else if (-100 < PCNTGE ) then category=13;
run;
Thanks
Mirisage
Accepted Solutions
I may not have correctly captured your rules, but I think that the following is closer to what you want:
data a;
input pcntge;
cards;
.2
.0001
.
-.0001
-1
-11
-70
;
Data b;
set a ;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge > -100 then category=12;
run;
Hi Art,
Thank you very much for this code, which works correctly once the last two statements are revised (the revised version is below).
data b;
SET a;
if missing(pcntge) then do;
call missing(category);
end;
else if pcntge > 0 then category=14;
else if pcntge = 0 then category=1;
else if pcntge > -10 then category=2;
else if pcntge > -20 then category=3;
else if pcntge > -30 then category=4;
else if pcntge > -40 then category=5;
else if pcntge > -50 then category=6;
else if pcntge > -60 then category=7;
else if pcntge > -70 then category=8;
else if pcntge > -80 then category=9;
else if pcntge > -90 then category=10;
else if pcntge > -100 then category=11;
else if pcntge = -100 then category=12;
else if pcntge < -100 then category=13;
run;
Thanks
Mirisage
Your conditions are never true. Instead of:
else if (-10 < PCNTGE<= -20) then category=3;
it should be:
else if (-10 > PCNTGE>= -20) then category=3;
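To see why the original conditions never fire: SAS evaluates a chained comparison such as -10 < pcntge <= -20 as (-10 < pcntge) AND (pcntge <= -20), and no number satisfies both. A minimal sketch, using made-up test values and the illustrative variable names original and corrected (neither is from the thread):

```sas
/* Illustrative check only; the test values and flag names are assumptions. */
data _null_;
  do pcntge = -5, -15, -25;
    /* Requires pcntge > -10 AND pcntge <= -20 at once, which is impossible */
    original  = (-10 < pcntge <= -20);
    /* Same test as (-10 > pcntge >= -20), written low-to-high */
    corrected = (-20 <= pcntge < -10);
    put pcntge= original= corrected=;
  end;
run;
```

The log should show original=0 on every row and corrected=1 only for pcntge=-15.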
Besides IF statements, you could also use a format, like the one below:
proc format;
value _recode (min=17)
<0 - high = 14
0 = 1
-10 -< 0 = 2
-20 -< -10 = 3
-30 -< -20 = 4
-40 -< -30 = 5
-50 -< -40 = 6
-60 -< -50 = 7
-70 -< -60 = 8
-80 -< -70 = 9
-90 -< -80 = 10
-100<-< -90 = 11
-100 = 12
low -< -100 = 13
;
run;
data a;
input pcntge;
format pcntge category best32.;
category=input(put(pcntge,_recode.),best32.);
datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;
Hi Patrick,
This is great!
Thank you very much.
This works well only when I revise the first line of the category definitions under "proc format" as follows.
Instead of " <0 - high = 14", as you suggested, I had to change it to " 0 - high = 14". Only then does the code work. However, logically it should be
>0 - high = 14, shouldn't it? But when I use >0 - high = 14, the code doesn't work.
If you have time, could you please shed some light on why 0 - high = 14 works, while >0 - high = 14, which seems logically correct, does not?
Thanks again
Mirisage
The syntax you used for excluding the lower bound from the range was wrong. Also, SAS will automatically assign a value that is both the upper bound of one range and the lower bound of another to the lower range.
See the manual pages for PROC FORMAT.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473474.htm
You can use the less than (<) symbol to exclude values from ranges. If you are excluding the first value in a range, then put the < after the value. If you are excluding the last value in a range, then put the < before the value. For example, the following range does not include 0:
0<-100
Likewise, the following range does not include 100:
0-<100
If a value at the high end of one range also appears at the low end of another range, and you do not use the < noninclusion notation, then PROC FORMAT assigns the value to the first range. For example, in the following ranges, the value AJ is part of the first range:
'AA'-'AJ'=1 'AJ'-'AZ'=2
In this example, to include the value AJ in the second range, use the noninclusive notation on the first range:
'AA'-<'AJ'=1 'AJ'-'AZ'=2
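A small sketch of the two exclusion notations quoted above. The format names LOWEXCL and HIGHEXCL, the labels, and the test values are made up for illustration:

```sas
/* Hypothetical formats: one excludes the low end, the other the high end. */
proc format;
  value lowexcl  0 <- 100 = 'in range'   /* 0 excluded, 100 included */
                 other    = 'out of range';
  value highexcl 0 -< 100 = 'in range'   /* 0 included, 100 excluded */
                 other    = 'out of range';
run;

data _null_;
  do x = 0, 50, 100;
    low_excluded  = put(x, lowexcl.);
    high_excluded = put(x, highexcl.);
    put x= low_excluded= high_excluded=;
  end;
run;
```

In the log, 0 should come out as "out of range" only under LOWEXCL, and 100 only under HIGHEXCL.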
Tom is of course right. It should be: 0 <- high = 14
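Putting that correction into the format posted earlier gives something like the sketch below. Only the first range differs from the earlier post. As assumptions of this sketch, the labels are quoted as a precaution, the MIN= option is dropped, and the data set name CHECK and the boundary test values are made up:

```sas
proc format;
  value _recode
    0    <-  high = '14'   /* strictly greater than 0 */
    0             = '1'
    -10  -<  0    = '2'
    -20  -< -10   = '3'
    -30  -< -20   = '4'
    -40  -< -30   = '5'
    -50  -< -40   = '6'
    -60  -< -50   = '7'
    -70  -< -60   = '8'
    -80  -< -70   = '9'
    -90  -< -80   = '10'
    -100 <-< -90  = '11'   /* excludes both -100 and -90 */
    -100          = '12'
    low  -< -100  = '13'
  ;
run;

/* Boundary check on a few values, including a missing value */
data check;
  do pcntge = 0.0001, 0, -9, -99, -100, -100.001, .;
    category = input(put(pcntge, _recode.), best32.);
    output;
  end;
run;

proc print data=check;
run;
```

The printed categories should be 14, 1, 2, 11, 12 and 13, with the missing value staying missing.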
Hi Tom and Patrick,
Wish you a happy 2012!
Tom, I clearly understood the logic from your nice explanation.
Thank you very much.
Mirisage