Help using Base SAS procedures

Continuous variable into categories dynamically?

Accepted Solution Solved
Reply
Contributor NWV
Contributor
Posts: 29
Accepted Solution

Continuous variable into categories dynamically?

I have continuous data in a variable that looks like the attached text file.  Sorry, I cannot paste anything into this forum box.  If anyone knows why, please tell me! Too much to do in the html editor

I am dividing it into 20 groups, based on the percentiles.  I use proc univariate to get the percentiles:

proc univariate data=d1 ;

var X;

output out=getpercentiles pctlpre=P_ pctlpts=5 to 100 by 5;

run;

I want the resulting percentiles to be the top of the categories.  I then write if/then statements to categorize my original data:

if X ge 0 and LTCumExpT2 le 0.019 then bin=1;*note, lowest category starts at 0;

else if X gt 0.019 and X le 0.038 then bin=2;

else if X gt 0.038 and X le 0.078 then bin=3;

else if X gt 0.078 and X le 0.095 then bin=4;

else if X gt 0.095 and X le 0.1265 then bin=5;

else if X gt 0.1265 and X le 0.166 then bin=6;

else if X gt 0.166 and X le 0.254 then bin=7;

....

else if X gt 8.088 and X le 34.435 then bin=20;

Is there a way to get a new variable "bin" with a number from 1 to 20 in it more dynamically?  Rather than having to create all those if then statements?  I have searched the forums, and most people say to use a format, but that can't be done here, with the gt/le operators, as far as I can tell (trust me, I tried :smileygrinSmiley Happy.

If it was just the one variable I had to do this for, I wouldn't mind.  But I have to do it for about 20 of them, and was hoping to make it more painless.

Thank you,

Nicole

Attachment

Accepted Solutions
Solution
‎09-11-2014 05:14 PM
Super User
Posts: 5,069

Re: Continuous variable into categories dynamically?

You have all the pieces you need.  There is no need to type, just put the pieces together.

You may need to tweak this a little bit, but here is the idea (using the names for data sets and variables that appear in your example):

data want;

   if _n_=1 then set getpercentiles;

   set d1;

   if x > P_20 then bin=20;

   else if x > P_19 then bin=19;

   else if x > P_18 then bin=18;

   ...

   else if x > P_2 then bin=2;

   else bin=1;

   drop P_:;

run;

You may need to reverse the order, or change the comparisons slightly.  But all the pieces are there and you only have to assemble them into the same data set.

If you have to do this for 20 variables, you may want to use different prefixes in the output data sets for each.  But first, figure out how to do it for one variable without typing in the cutoffs.

This is not a comment about which solution is best, only an answer to your original question.

Good luck.

View solution in original post


All Replies
Trusted Advisor
Posts: 1,605

Re: Continuous variable into categories dynamically?

I think PROC RANK with the GROUPS= option can do what you want.

As a side issue, I question the value of performing a statistical analysis on percentiles of continuous data, when you could perform an equivalent analysis on the raw continuous data.

Super User
Posts: 17,734

Re: Continuous variable into categories dynamically?

You can paste data with a browser other than IE.

You can do this with a format, but I agree with Paige, Proc Rank is better option.

For format creation see this paper:

http://www2.sas.com/proceedings/forum2007/068-2007.pdf

Solution
‎09-11-2014 05:14 PM
Super User
Posts: 5,069

Re: Continuous variable into categories dynamically?

You have all the pieces you need.  There is no need to type, just put the pieces together.

You may need to tweak this a little bit, but here is the idea (using the names for data sets and variables that appear in your example):

data want;

   if _n_=1 then set getpercentiles;

   set d1;

   if x > P_20 then bin=20;

   else if x > P_19 then bin=19;

   else if x > P_18 then bin=18;

   ...

   else if x > P_2 then bin=2;

   else bin=1;

   drop P_:;

run;

You may need to reverse the order, or change the comparisons slightly.  But all the pieces are there and you only have to assemble them into the same data set.

If you have to do this for 20 variables, you may want to use different prefixes in the output data sets for each.  But first, figure out how to do it for one variable without typing in the cutoffs.

This is not a comment about which solution is best, only an answer to your original question.

Good luck.

Contributor NWV
Contributor
Posts: 29

Re: Continuous variable into categories dynamically?

Astounding, thank you, that was the solution I needed.  Now I can even make a macro to handle it easily.  I have only recently become acquainted with the if _n_=1 kind of statement, but I see much future use for it.

Paige, I had tried proc rank with the group option.  However, it seemed that proc rank wanted to force a certain number of observations into each group, rather than assign the group (bin) based on the value of my continuous variable.  And I agree that this seems like a somewhat backwards/extraneous was of analyzing a variable, however, we are working with a group that does not "believe" our models, and we have to break the math down into it's component steps.

Reeza, I had tried proc format, but with groupings like > x- <=y I got errors in proc format.  It did not seem to like the > in front of the range.  Maybe I could have changed my logic to make it work.

Thank you all.

Respected Advisor
Posts: 2,655

Re: Continuous variable into categories dynamically?

I sympathize with your problems regarding the categorization of continuous variables.  For my own peace of mind, I would analyze both ways--a great way to make a problem (=adverse event rate) "disappear" is to categorize the continuous covariates or continuous response variables.

Steve Denham

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 1410 views
  • 0 likes
  • 5 in conversation