"Categorizing a continuous variable"

Hi Colleagues,


      I have a continuous variable called pcntge.
      I want to convert it into discrete categories, as shown below.

    data a;
      input pcntge;
      datalines;
       0.0001    /*any value GT zero should be categorized as 14*/
        .    /*missing values should come as missing (which means a dot)*/
        0   /*all zeros should be categorized as 1*/
      -9   /* LT 0 to GT -10 should be categorized as 2*/
      -19  /* LT -10 to GT -20 should be categorized as 3*/
      -29  /* LT -20 to GT -30 should be categorized as 4*/
      -39  /* LT -30 to GT -40 should be categorized as 5*/
      -49  /* LT -40 to GT -50 should be categorized as 6*/
      -59  /* LT -50 to GT -60 should be categorized as 7*/
      -69  /* LT -60 to GT -70 should be categorized as 8*/
      -79  /* LT -70 to GT -80 should be categorized as 9*/
      -89  /* LT -80 to GT -90 should be categorized as 10*/
      -99  /* LT -90 to GT -100 should be categorized as 11*/
      -100 /*all -100 should be categorized as 12*/
      -100.001 /*any value LT -100 should be categorized as 13*/
      ;
      Run;

This is the code I attempted, but it didn't work. Any help would be really appreciated.


      Data b;
             set a  ;

                   if (. <pcntge <= 0) then category=1;
              else if (0 <PCNTGE<= -10) then category=2;
              else if (-10 < PCNTGE<=  -20)  then category=3;
              else if (-20 < PCNTGE<= -30  ) then category=4;
              else if (-30 < PCNTGE<= -40  ) then category=5;
              else if (-40 < PCNTGE<= -50  ) then category=6;
              else if (-50 < PCNTGE<= -60  ) then category=7;
              else if (-60 < PCNTGE<=  -70)  then category=8;
              else if (-70 < PCNTGE<= -80  ) then category=9;
              else if (-80 < PCNTGE<= -90  ) then category=10;
              else if (-90 < PCNTGE<= -100  ) then category=11;

              else if (PCNTGE=-100) then category=12;
              else if (-100 < PCNTGE  ) then category=13;
      run;

Thanks

Mirisage


Accepted Solutions
Solution
‎12-29-2011 05:46 PM
PROC Star
Posts: 7,363

I may not have correctly captured your rules, but I think that the following is closer to what you want:

data a;
  input pcntge;
  cards;
.2
.0001
.
-.0001
-1
-11
-70
;

Data b;
  set a  ;
  if missing(pcntge) then do;
    call missing(category);
  end;
  else if pcntge > 0 then category=14;
  else if pcntge = 0 then category=1;
  else if pcntge > -10 then category=2;
  else if pcntge > -20 then category=3;
  else if pcntge > -30 then category=4;
  else if pcntge > -40 then category=5;
  else if pcntge > -50 then category=6;
  else if pcntge > -60 then category=7;
  else if pcntge > -70 then category=8;
  else if pcntge > -80 then category=9;
  else if pcntge > -90 then category=10;
  else if pcntge > -100 then category=11;
  else if pcntge > -100 then category=12;
run;

Super Contributor
Posts: 338

Hi Art,

Thank you very much for this code, which works correctly after revising the last two statements (the revised version is below).

data b;
  SET a;
  if missing(pcntge) then do;
    call missing(category);
  end;
  else if pcntge > 0 then category=14;
  else if pcntge = 0 then category=1;
  else if pcntge > -10 then category=2;
  else if pcntge > -20 then category=3;
  else if pcntge > -30 then category=4;
  else if pcntge > -40 then category=5;
  else if pcntge > -50 then category=6;
  else if pcntge > -60 then category=7;
  else if pcntge > -70 then category=8;
  else if pcntge > -80 then category=9;
  else if pcntge > -90 then category=10;
  else if pcntge > -100 then category=11;
  else if pcntge = -100 then category=12;
  else if pcntge < -100 then category=13;
run;

Thanks

Mirisage

Respected Advisor
Posts: 3,893

Your conditions can never be true. Instead of:

else if (-10 < PCNTGE<= -20) then category=3;

it should be:

else if (-10 > PCNTGE>= -20) then category=3;
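
To see why, here is a minimal sketch that evaluates both conditions above for a few test values. The test values and the variable names original and reversed are made up for illustration. SAS treats a chained comparison as two comparisons joined by AND, so the original form asks for a value that is greater than -10 and at the same time no greater than -20.

```sas
/* Sketch: compare the original and the reversed condition on sample values. */
data _null_;
  do pcntge = -5, -15, -20, -25;
    /* pcntge > -10 AND pcntge <= -20: no number satisfies both */
    original = (-10 < pcntge <= -20);
    /* pcntge < -10 AND pcntge >= -20 */
    reversed = (-10 > pcntge >= -20);
    put pcntge= original= reversed=;
  end;
run;
```

In the log, original should be 0 for every value, while reversed should be 1 for -15 and -20 and 0 for -5 and -25.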

Besides IF statements, you could also use a format like the one below:

proc format;
  value _recode (min=17)
    <0 - high   = 14
     0          = 1
   -10 -<  0    = 2
   -20 -< -10   = 3
   -30 -< -20   = 4
   -40 -< -30   = 5
   -50 -< -40   = 6
   -60 -< -50   = 7
   -70 -< -60   = 8
   -80 -< -70   = 9
   -90 -< -80   = 10
   -100<-< -90  = 11
   -100         = 12
   low -< -100  = 13
;
run;

data a;
  input pcntge;
  format pcntge category best32.;
  category=input(put(pcntge,_recode.),best32.);
  datalines;
0.0001 /*any value GT zero should be categorized as 14*/
. /*missing values should come as missing (which means a dot)*/
0 /*all zeros should be categorized as 1*/
-9 /* LT 0 to GT -10 should be categorized as 2*/
-19 /* LT -10 to GT -20 should be categorized as 3*/
-29 /* LT -20 to GT -30 should be categorized as 4*/
-39 /* LT -30 to GT -40 should be categorized as 5*/
-49 /* LT -40 to GT -50 should be categorized as 6*/
-59 /* LT -50 to GT -60 should be categorized as 7*/
-69 /* LT -60 to GT -70 should be categorized as 8*/
-79 /* LT -70 to GT -80 should be categorized as 9*/
-89 /* LT -80 to GT -90 should be categorized as 10*/
-99 /* LT -90 to GT -100 should be categorized as 11*/
-100 /*all -100 should be categorized as 12*/
-100.001 /*any value LT -100 should be categorized as 13*/
;
Run;

Super Contributor
Posts: 338

Hi Patrick,

This is great!

Thank you very much.

This works well only when I revise the first line of the category definitions under "proc format", as follows.

Instead of "<0 - high   = 14", as you suggested, I had to revise it to "0 - high   = 14". Only then does the code work. Logically, however, it should be

">0 - high   = 14", shouldn't it? But when I use ">0 - high   = 14", the code doesn't work.

If you have time, could you please shed some light on why "0 - high   = 14" works while ">0 - high   = 14", which seems logically correct, doesn't?

Thanks again

Mirisage

Super User
Posts: 6,500

The syntax you used to exclude the lower bound from the range was wrong.  Also, when a value is both the upper bound of one range and the lower bound of another, SAS automatically assigns it to the lower range.

See the manual pages for PROC FORMAT.

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473474.htm

You can use the less than (<) symbol to exclude values from ranges. If you are excluding the first value in a range, then put the < after the value. If you are excluding the last value in a range, then put the < before the value. For example, the following range does not include 0:

   0<-100

Likewise, the following range does not include 100:

   0-<100

If a value at the high end of one range also appears at the low end of another range, and you do not use the < noninclusion notation, then PROC FORMAT assigns the value to the first range. For example, in the following ranges, the value AJ is part of the first range:

   'AA'-'AJ'=1 'AJ'-'AZ'=2

In this example, to include the value AJ in the second range, use the noninclusive notation on the first range:

   'AA'-<'AJ'=1 'AJ'-'AZ'=2

Respected Advisor
Posts: 3,893

Tom is of course right. It should be:  0 <- high = 14
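
For reference, here is a sketch of the whole format with that first range corrected, applied to a few of the test values from the question. The format name pctcat and the data set name check are hypothetical, and the labels are quoted only for readability. The other ranges follow the format posted earlier in the thread, so exact multiples of 10 such as -10 or -90 are not categorized the same way as in the IF/ELSE version (for example, -10 gets 2 here but 3 there).

```sas
proc format;
  value pctcat
       0 <-  high  = '14'   /* strictly greater than 0 */
       0           = '1'
     -10 -<  0     = '2'    /* -10 <= x < 0 */
     -20 -< -10    = '3'
     -30 -< -20    = '4'
     -40 -< -30    = '5'
     -50 -< -40    = '6'
     -60 -< -50    = '7'
     -70 -< -60    = '8'
     -80 -< -70    = '9'
     -90 -< -80    = '10'
    -100 <-< -90   = '11'   /* both end points excluded */
    -100           = '12'
     low -< -100   = '13'   /* strictly less than -100 */
  ;
run;

data check;
  input pcntge;
  /* PUT applies the format, INPUT turns the label back into a number. */
  /* Missing values match no range, so category stays missing.         */
  category = input(put(pcntge, pctcat.), 32.);
  datalines;
0.0001
.
0
-9
-99
-100
-100.001
;
run;

proc print data=check;
run;
```

With these ranges the category column should read 14, missing, 1, 2, 11, 12, and 13, in that order.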

Super Contributor
Posts: 338

Hi Tom and Patrick,

Wish you a happy 2012!

Tom, your nice explanation made the logic clear to me.

Thank you very much.

Mirisage
