Hi everyone,
I am trying to calculate percentiles for the following data (the actual data set is quite large). Many observations have the same age, 36.0833, so the 70th, 80th and 90th percentiles all come out the same. Is there a way to calculate percentiles so that the large number of tied observations does not drive the result and each percentile gets a different value?
Thanks in advance for your help.
data have;
infile cards expandtabs truncover;
input stock date : yymmn6. age;
format date yymmn6.;
cards;
10006 196202 36.0833
14656 196202 36.0833
14664 196202 36.0833
14699 196202 36.0833
14701 196202 36.0833
14728 196202 36.0833
14736 196202 36.0833
14760 196202 36.0833
14779 196202 36.0833
14795 196202 36.0833
14816 196202 36.0833
14824 196202 36.0833
14859 196202 36.0833
14867 196202 36.0833
14875 196202 36.0833
14883 196202 36.0833
14891 196202 36.0833
14904 196202 36.0833
14912 196202 36.0833
14920 196202 36.0833
14955 196202 36.0833
15034 196202 36.0833
15499 196202 36.0833
15528 196202 36.0833
15560 196202 36.0833
15755 196202 36.0833
16029 196202 36.0833
16109 196202 36.0833
16117 196202 36.0833
16280 196202 36.0833
19334 196202 36.0833
25486 196202 36.0833
27561 196202 36.0833
27692 196202 36.0833
28513 196202 36.0833
75471 196202 36.0833
10014 196202 36
12298 196202 36
15536 196202 35.9167
15544 196202 35.9167
16985 196202 33.3333
17005 196202 33.3333
17013 196202 33.3333
17056 196202 33.25
17072 196202 33.25
17099 196202 33.25
17101 196202 33.25
17128 196202 33.1667
17144 196202 33.1667
17160 196202 33.1667
21573 196202 33.1667
17224 196202 33.0833
17232 196202 33.0833
17240 196202 33.0833
17267 196202 33.0833
17291 196202 33.0833
17304 196202 33
17312 196202 33
17320 196202 33
17339 196202 33
17347 196202 33
17398 196202 33
17400 196202 32.9167
17435 196202 32.9167
17443 196202 32.9167
17451 196202 32.9167
17478 196202 32.9167
17515 196202 32.8333
17523 196202 32.8333
17558 196202 32.75
17566 196202 32.75
17582 196202 32.75
17590 196202 32.75
17646 196202 32.75
17654 196202 32.75
17830 196202 32.75
17670 196202 32.6667
17689 196202 32.6667
17718 196202 32.6667
17726 196202 32.6667
17734 196202 32.6667
17865 196202 32.5833
17881 196202 32.5833
17910 196202 32.5833
17929 196202 32.5833
17945 196202 32.5
17953 196202 32.5
17961 196202 32.5
18016 196202 32.5
18032 196202 32.5
18040 196202 32.5
18067 196202 32.5
18075 196202 32.4167
18091 196202 32.4167
18112 196202 32.4167
18147 196202 32.4167
;run;
proc univariate data=HAVE noprint;
var age;
by date;
output out=WANT pctlpts = 10 20 30 40 50 60 70 80 90 pctlpre=GR;
run;
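Here is why the top percentiles coincide in the posted sample, worked by hand with PROC UNIVARIATE's default percentile definition (PCTLDEF=5). There are n = 96 observations. Sorted in ascending order, the 56 ages from 32.4167 to 33.3333 come first, then two at 35.9167, then two at 36, and the 36 firms aged 36.0833 fill positions 61 to 96.

$$
np = j + g,\quad j = \lfloor np \rfloor,\qquad
y_p = \begin{cases} \tfrac{1}{2}\left(x_{(j)} + x_{(j+1)}\right), & g = 0 \\ x_{(j+1)}, & g > 0 \end{cases}
$$

$$
\begin{aligned}
p = 0.6:&\quad np = 57.6 \;\Rightarrow\; y_{0.6} = x_{(58)} = 35.9167 \\
p = 0.7:&\quad np = 67.2 \;\Rightarrow\; y_{0.7} = x_{(68)} = 36.0833 \\
p = 0.8:&\quad np = 76.8 \;\Rightarrow\; y_{0.8} = x_{(77)} = 36.0833 \\
p = 0.9:&\quad np = 86.4 \;\Rightarrow\; y_{0.9} = x_{(87)} = 36.0833
\end{aligned}
$$

So every percentile above the 62.5th (np > 60) returns 36.0833, however many percentile points are requested.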
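Then remove those duplicated values:
proc sort data=have out=distinct nodupkey;
  by date age;   /* keep one row per distinct DATE/AGE pair */
run;
proc univariate data=distinct noprint;
  var age;
  by date;
  /* percentiles of the distinct ages within each DATE */
  output out=WANT pctlpts = 10 20 30 40 50 60 70 80 90 pctlpre=GR;
run;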
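For comparison, here is the same default definition (PCTLDEF=5) applied by hand to the de-duplicated sample. This is my own arithmetic, not output from running the code. Only n = 15 distinct ages remain. In ascending order, the 11th through 15th are 33.25, 33.3333, 35.9167, 36 and 36.0833.

$$
\begin{aligned}
p = 0.7:&\quad np = 10.5 \;\Rightarrow\; y_{0.7} = x_{(11)} = 33.25 \\
p = 0.8:&\quad np = 12 \;\Rightarrow\; y_{0.8} = \tfrac{1}{2}\left(x_{(12)} + x_{(13)}\right) = \tfrac{1}{2}(33.3333 + 35.9167) = 34.625 \\
p = 0.9:&\quad np = 13.5 \;\Rightarrow\; y_{0.9} = x_{(14)} = 36
\end{aligned}
$$

The three cutoffs now differ because each distinct age counts once, no matter how many firms share it.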
Thanks a lot, KSharp, it's really helpful. I am using it just as a cutoff point, but I will look into the data again. Have a good day 🙂
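Since the values are only used as cutoff points, here is one way to apply them. It is a minimal sketch that assumes HAVE comes from the data step in the question and WANT comes from either PROC UNIVARIATE step, with one row per DATE holding GR10 to GR90. GROUPED, HAVE_SORTED and AGE_GRP are hypothetical names.

```sas
/* Sketch: put each firm into an age group using the GR10-GR90 cutoffs.     */
/* Assumes HAVE (data step above) and WANT (PROC UNIVARIATE output, one row */
/* per DATE). GROUPED, HAVE_SORTED and AGE_GRP are hypothetical names.      */
proc sort data=have out=have_sorted;
  by date;
run;

data grouped;
  merge have_sorted(in=in_have) want(keep=date gr:);
  by date;
  if in_have;                        /* keep firm rows only */
  array cut{9} gr10 gr20 gr30 gr40 gr50 gr60 gr70 gr80 gr90;
  /* AGE_GRP = number of cutoffs strictly below AGE, so it runs 0 to 9 */
  age_grp = 0;
  do i = 1 to dim(cut);
    age_grp = age_grp + (age > cut{i});
  end;
  drop i gr:;
run;
```

A firm whose age equals a cutoff exactly goes into the lower group. Changing `>` to `>=` would push those ties into the upper group instead.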
Look at PROC RANK instead, and at how it deals with ties.
As KSharp mentioned, these are not percentiles, so be careful not to refer to them as such when you report your analysis.
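For the PROC RANK route suggested above, here is a minimal sketch that still yields cutoff points. It assumes HAVE from the data step in the question, sorted by DATE. RANKED, CUTOFFS, AGE_GRP, N_FIRMS, AGE_MIN and AGE_MAX are hypothetical names. With GROUPS=10, tied ages receive the same rank and therefore the same group. Groups can be unequal in size and some group numbers may not appear, but no age value is split across two groups. Each group's minimum and maximum age can then serve as its boundaries.

```sas
/* Sketch of the PROC RANK route. Assumes HAVE from the data step above,   */
/* sorted by DATE. RANKED, CUTOFFS, AGE_GRP, N_FIRMS, AGE_MIN and AGE_MAX  */
/* are hypothetical names.                                                  */
proc rank data=have out=ranked groups=10;   /* default TIES=MEAN */
  by date;
  var age;
  ranks age_grp;          /* 0 = youngest group, 9 = oldest group */
run;

/* Per-group boundaries: tied ages share a group, so ranges do not overlap */
proc means data=ranked noprint nway;
  by date;
  class age_grp;
  var age;
  output out=cutoffs(drop=_type_ _freq_) n=n_firms min=age_min max=age_max;
run;
```

TIES=DENSE may also be worth comparing. As I understand the PROC RANK documentation, that option bases the GROUPS= formula on distinct values, which is closer in spirit to de-duplicating first. A PROC FREQ on AGE_GRP is a quick way to check what your release actually does.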
Thanks, Reeza. I guess PROC RANK will not work for me because I need cutoff points. Many firms have the same beginning date, so a large number of firms have the same age. As a result, several of the top percentiles end up with the same age.