BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NWV
Calcite | Level 5 NWV
Calcite | Level 5

I have continuous data in a variable that looks like the attached text file.  Sorry, I cannot paste anything into this forum box.  If anyone knows why, please tell me! Too much to do in the html editor

I am dividing it into 20 groups, based on the percentiles.  I use proc univariate to get the percentiles:

proc univariate data=d1 ;

var X;

output out=getpercentiles pctlpre=P_ pctlpts=5 to 100 by 5;

run;

I want the resulting percentiles to be the top of the categories.  I then write if/then statements to categorize my original data:

if X ge 0 and LTCumExpT2 le 0.019 then bin=1;*note, lowest category starts at 0;

else if X gt 0.019 and X le 0.038 then bin=2;

else if X gt 0.038 and X le 0.078 then bin=3;

else if X gt 0.078 and X le 0.095 then bin=4;

else if X gt 0.095 and X le 0.1265 then bin=5;

else if X gt 0.1265 and X le 0.166 then bin=6;

else if X gt 0.166 and X le 0.254 then bin=7;

....

else if X gt 8.088 and X le 34.435 then bin=20;

Is there a way to get a new variable "bin" with a number from 1 to 20 in it more dynamically?  Rather than having to create all those if then statements?  I have searched the forums, and most people say to use a format, but that can't be done here, with the gt/le operators, as far as I can tell (trust me, I tried :smileygrin:).

If it was just the one variable I had to do this for, I wouldn't mind.  But I have to do it for about 20 of them, and was hoping to make it more painless.

Thank you,

Nicole

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You have all the pieces you need.  There is no need to type, just put the pieces together.

You may need to tweak this a little bit, but here is the idea (using the names for data sets and variables that appear in your example):

data want;

   if _n_=1 then set getpercentiles;

   set d1;

   if x > P_20 then bin=20;

   else if x > P_19 then bin=19;

   else if x > P_18 then bin=18;

   ...

   else if x > P_2 then bin=2;

   else bin=1;

   drop P_:;

run;

You may need to reverse the order, or change the comparisons slightly.  But all the pieces are there and you only have to assemble them into the same data set.

If you have to do this for 20 variables, you may want to use different prefixes in the output data sets for each.  But first, figure out how to do it for one variable without typing in the cutoffs.

This is not a comment about which solution is best, only an answer to your original question.

Good luck.

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

I think PROC RANK with the GROUPS= option can do what you want.

As a side issue, I question the value of performing a statistical analysis on percentiles of continuous data, when you could perform an equivalent analysis on the raw continuous data.

--
Paige Miller
Reeza
Super User

You can paste data with a browser other than IE.

You can do this with a format, but I agree with Paige, Proc Rank is better option.

For format creation see this paper:

http://www2.sas.com/proceedings/forum2007/068-2007.pdf

Astounding
PROC Star

You have all the pieces you need.  There is no need to type, just put the pieces together.

You may need to tweak this a little bit, but here is the idea (using the names for data sets and variables that appear in your example):

data want;

   if _n_=1 then set getpercentiles;

   set d1;

   if x > P_20 then bin=20;

   else if x > P_19 then bin=19;

   else if x > P_18 then bin=18;

   ...

   else if x > P_2 then bin=2;

   else bin=1;

   drop P_:;

run;

You may need to reverse the order, or change the comparisons slightly.  But all the pieces are there and you only have to assemble them into the same data set.

If you have to do this for 20 variables, you may want to use different prefixes in the output data sets for each.  But first, figure out how to do it for one variable without typing in the cutoffs.

This is not a comment about which solution is best, only an answer to your original question.

Good luck.

NWV
Calcite | Level 5 NWV
Calcite | Level 5

Astounding, thank you, that was the solution I needed.  Now I can even make a macro to handle it easily.  I have only recently become acquainted with the if _n_=1 kind of statement, but I see much future use for it.

Paige, I had tried proc rank with the group option.  However, it seemed that proc rank wanted to force a certain number of observations into each group, rather than assign the group (bin) based on the value of my continuous variable.  And I agree that this seems like a somewhat backwards/extraneous was of analyzing a variable, however, we are working with a group that does not "believe" our models, and we have to break the math down into it's component steps.

Reeza, I had tried proc format, but with groupings like > x- <=y I got errors in proc format.  It did not seem to like the > in front of the range.  Maybe I could have changed my logic to make it work.

Thank you all.

SteveDenham
Jade | Level 19

I sympathize with your problems regarding the categorization of continuous variables.  For my own peace of mind, I would analyze both ways--a great way to make a problem (=adverse event rate) "disappear" is to categorize the continuous covariates or continuous response variables.

Steve Denham

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 4985 views
  • 0 likes
  • 5 in conversation