- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm pretty new to SAS programming language and I would need advice about the best practice in the following problem. I have this table containing cut-off values of a percentile distribution:
proc univariate data=temp1 noprint;
var ptb;
output out=percentiles1 pctlpts=10 20 30 40 50 60 70 80 90 100 pctlpre=P;
run;
proc print data=percentiles1;
run;
Now, I would like to iterate over "temp1" table and decide in which bucket fits. I tried something like this:
data temp2 (drop = p10 p20 p30 p40 p50 p60 p70 p80 p90 p100);
merge temp1 percentiles1;
if ptb <= p10 then bucket = '1st percentile';
else if ptb <= p20 then bucket = '2nd percentile';
... and so on ...
run;
But this would work only for the first row. I was thinking to do a cross join in SQL, but I think it's not the best approach to copy the row of percentiles everywhere.
So, what would be the best approach?
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Other than creating text values (likely not a best idea ) Proc Ranks will do this.
proc ranks data=test1 groups=10 out=want; var ptb ; ranks ptbtile; run;
The variable ptbtile will have values of 0 to 9 (10 groups).
Assign a format to display the value of '1st decile' if needed (decile would be more descriptive then "1st percentile" as that would indicate the P1 not P10 cutoff.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Other than creating text values (likely not a best idea ) Proc Ranks will do this.
proc ranks data=test1 groups=10 out=want; var ptb ; ranks ptbtile; run;
The variable ptbtile will have values of 0 to 9 (10 groups).
Assign a format to display the value of '1st decile' if needed (decile would be more descriptive then "1st percentile" as that would indicate the P1 not P10 cutoff.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
EDIT: solved, I had to use "by" groups
Do you know if there is any way to use 'class' keyword with proc ranks?
I would like the ranking to be done separately for each different date.
I obtained the cut-off values using proc univariate with class and it looks good, but I would also need something similar for ranking.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@littlewho wrote:
Hello,
I'm pretty new to SAS programming language and I would need advice about the best practice in the following problem. I have this table containing cut-off values of a percentile distribution:
proc univariate data=temp1 noprint; var ptb; output out=percentiles1 pctlpts=10 20 30 40 50 60 70 80 90 100 pctlpre=P; run; proc print data=percentiles1; run;
Now, I would like to iterate over "temp1" table and decide in which bucket fits. I tried something like this:
data temp2 (drop = p10 p20 p30 p40 p50 p60 p70 p80 p90 p100); merge temp1 percentiles1; if ptb <= p10 then bucket = '1st percentile'; else if ptb <= p20 then bucket = '2nd percentile'; ... and so on ... run;
But this would work only for the first row. I was thinking to do a cross join in SQL, but I think it's not the best approach to copy the row of percentiles everywhere.
So, what would be the best approach?
Thanks!
Neither of these. Use PROC RANK instead. 0 will correspond to the first decile, 1 to the second decile and so forth.
proc rank data=temp1 out=temp2 groups=10;
var ptb
ranks ptb_rank;
run;
To fix your code, you need to merge it to all rows which is accomplished with a little code trick.
data temp2 (drop = p10 p20 p30 p40 p50 p60 p70 p80 p90 p100); set temp1; if _n_ =1 then set percentiles1; if ptb <= p10 then bucket = '1st percentile'; else if ptb <= p20 then bucket = '2nd percentile'; ... and so on ... run;