I use the following code to obtain the percentiles of over 100,000 observations:
proc rank data = CombineScoredTemp groups = 100 out = CombineScoredTemp;
var Predicted_SLR;
ranks Percentile_SLR;
run;
But it provides the percentiles as integers. Any suggestions on how to get a real number out of this?
Thank you, as usual.
Do you mean that you want 0.01000000 instead of 1? Or 1.00000000?
SAS actually does not differentiate between integer and real in its storage. The active display format controls how many, if any, decimals are displayed. Print your resulting set with Format Percentile_slr F16.8 ; to see more decimals.
If you want to know the percentiles as in which value is the 7th percentile you likely need another procedure than rank as except for tie breaking ranks tend to be integer.
proc rank ranks data based on Predicted_SLR variable and makes 100 groups each contains around 1000 observations. For percentile try proc univariate.
I think you're looking for the CDF instead:
ods table cdfplot=want;;
proc univariate data=CombineScoredTemp;
var Predicted_SLR;
cdfplot Predicted_SLR;
run;quit;
Thank you for all of the answers. CDF seems to show the most promise. But how may I modify the code to accomodate saving the actual percentile on a record-by-record basis?
This would be the equivalent of a PERCENTRANK() function in Excel.
Thanks.
i'm not sure what your question is. Can you clarify what you mean by saving and row by row? The output dataset has the CDf values for all observations.
I am probably missing something.
I run the following code and it makes a nice CDF plot:
ods table cdfplot = want;;
proc univariate data = CombineScoredTemp;
var Predicted_SLR;
cdfplot Predicted_SLR;
run;
quit;
But it does not add the percentiles to the data file.
Plus, I receive a warning saying that the CDFPlot was not created. I am sure that both of these are related, somehow.
I am pretty sure I just mis-read the supplied code or something. But I am currently lost.
I especially am interested in setting this right as I recently read an article on the web very much trashing the PERCENTRANK() procedure in Excel.
Thank you, as usual.
the ods statement before the proc should store the values in a new table called want.
Depending on SAS version you may need to turn ODS Graphics on; to see plot.
What do you think of the following for my solution?
proc rank data = CombineScoredTemp groups = 100000000 out = CombineScoredTemp;
var Predicted_SLR;
ranks PERCENTILE_SLR;
run;
data CombineScoredTemp;
set CombineScoredTemp;
PERCENTILE_SLR = PERCENTILE_SLR / 1000000;
run;
Only problem I see is that it is limited to 6 decimal places. Maybe 7.
I thought you wanted to compute the percentiles for Predicted_SLR that can be done in a number of ways in SAS PROC STDIZE being a pretty easy method.
Your PROC RANKS GROUPS=100 is similar in that it groups the observations into "percentile groups". The computed RANK that you named PERCENTILE_SLR in the RANK statement is not the percentile "value" as computed by PROC STDIZE.
Am I understanding correctly?
This method does not account for ties. If you have no ties or don't care you can more easily calculate your 'percentiles' using a data step.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.