12-13-2017 10:51 AM
I'm using PROC UNIVARIATE to calculate 90th percentiles for a response time stored as hh:mm:ss. The 90th percentiles come out in minutes only; I want the seconds as well, in mm:ss format. How can I get the percentiles in mm:ss format? Also, can anyone explain the math behind how PROC UNIVARIATE calculates the 90th percentile?
My code is
proc univariate data=new;
  output out=test pctlpre=P_ pctlpts=50,90 pctlname=P50 P90 Mean=MEAN std=STDDEV ;
run;
12-13-2017 11:00 AM
What unit is your original data in, e.g. minutes, seconds, etc.?
Usually you need to apply a format in a next step to control the display of the output. If you want multiple representations you’ll need multiple variables.
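As a minimal sketch of the "multiple variables" idea, the step below copies one percentile value into two variables and gives each its own display. The data set name pctl, the variable names, and the value of 305 seconds are all made up for illustration; they are not taken from the original poster's data.

```sas
/* Hypothetical stand-in for a percentile output data set */
data pctl;
  p90 = 305;                 /* assumed value: 305 seconds */
run;

/* One stored value, two representations, each in its own variable */
data pctl_views;
  set pctl;
  p90_hms = p90;             /* same number of seconds, shown as h:mm:ss */
  p90_min = p90 / 60;        /* converted to decimal minutes */
  format p90_hms time8. p90_min 8.2;
run;

proc print data=pctl_views;
run;
```

A format only changes how a value is displayed, so the stored number in p90_hms is still a count of seconds.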
Regarding the 90th percentile, see the docs:
12-13-2017 11:38 AM
My data has two time variables, both in hh:mm:ss format. I take the difference of the two and calculate the 90th percentile of that difference. For example:
Time1 Time2 Diff
01:30:20 01:30:10 00:00:10
12:25:00 12:20:00 00:05:00
03:30:10 03:25:05 00:05:05
I use the Diff variable to calculate the 90th percentile, and I want the result displayed like 00:05:05.
12-13-2017 11:41 AM
Also can anyone explain me the math behind calculating 90th percentile using proc univariate.
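Percentiles are basically an order statistic. The data is sorted (or binned or grouped, however you want to think of it) from smallest non-missing value to largest. Then the value at the p-th percentage position is reported. If you have 100 values then it is basically the 90th value, if you have 10 values then the 9th. If you have 10,000 values then the 90th percentile would be roughly the 9,000th value.
The exact rule for a position that falls on or between values is controlled by the PCTLDEF= option on the PROC UNIVARIATE statement, which offers five definitions. Under the default (PCTLDEF=5), if n*p is a whole number j, the result is the average of the j-th and (j+1)-th ordered values; otherwise it is the next ordered value above position n*p.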
Note that the same value may be reported for multiple percentiles if the data values are few in number (4 values means anything less than 25th percentile is the first value, anything over 75th is the 4th value).
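To make the arithmetic concrete, here is a rough hand computation of the 90th percentile under the default PCTLDEF=5 rule, using the three differences from the example rows in this thread (10, 300 and 305 seconds). The data set and variable names are made up for illustration, and the sketch is hard-coded for n=3 and p=0.90, so it skips the edge cases (p=0 or p=1) and the numeric fuzzing that the procedure handles itself.

```sas
/* Three differences in seconds, from the example rows in this thread */
data diffs;
  input diff;
  datalines;
10
300
305
;

proc sort data=diffs;
  by diff;
run;

/* Hand computation for n=3, p=0.90 under the PCTLDEF=5 rule */
data p90_by_hand;
  array x[3] _temporary_;
  do i = 1 to 3;             /* load the sorted values into x[1..3] */
    set diffs;
    x[i] = diff;
  end;
  n  = 3;
  p  = 0.90;
  np = n * p;                /* position: 2.7 */
  j  = floor(np);            /* integer part: 2 */
  g  = np - j;               /* fractional part: about 0.7 */
  if g > 0 then p90 = x[j + 1];               /* take the next ordered value */
  else          p90 = (x[j] + x[j + 1]) / 2;  /* whole number: average neighbours */
  format p90 time8.;
  keep n p j g p90;
  output;
  stop;
run;

/* The procedure's own answer, for comparison */
proc univariate data=diffs pctldef=5 noprint;
  var diff;
  output out=chk pctlpre=P pctlpts=90;
run;
```

Here n*p = 2.7 is not a whole number, so the rule picks the third ordered value, 305 seconds, which is the 00:05:05 from the example rows.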
12-13-2017 02:53 PM
Without seeing how you are reading and printing the data, it is hard to be sure, but I suggest you use the TIMEw.d informat and format. See if the following example addresses your needs:
data have;
  informat Time1 Time2 time10.;
  input Time1 Time2;
  diff = Time1 - Time2;
  datalines;
01:30:20 01:30:10
12:25:00 12:20:00
03:30:10 03:25:05
;

proc print data=have;
  format _NUMERIC_ time10.;
run;

proc univariate data=have;
  var diff;
  output out=test pctlpre=P_ pctlpts=50,90 pctlname=P50 P90 Mean=MEAN std=STDDEV ;
run;

proc print data=test;
  format _NUMERIC_ time10.;
run;
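If the goal is specifically an mm:ss display rather than hh:mm:ss, one option is the MMSSw.d format in place of TIMEw.d. The sketch below is a variation on the example above, not the original poster's code: it drops PCTLNAME= so the output variables get the default names P50 and P90, and the width of 8 is an arbitrary choice.

```sas
data have;
  informat Time1 Time2 time10.;
  input Time1 Time2;
  diff = Time1 - Time2;      /* difference in seconds */
  datalines;
01:30:20 01:30:10
12:25:00 12:20:00
03:30:10 03:25:05
;

proc univariate data=have noprint;
  var diff;
  output out=test pctlpre=P pctlpts=50 90;   /* creates P50 and P90 */
run;

proc print data=test;
  format P50 P90 mmss8.;     /* show seconds as minutes:seconds */
run;
```

Because MMSS counts total minutes, a value of an hour or more would print as 60 or more minutes rather than rolling over into hours.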