- Proc Means for determining number of observations ...

10-06-2014 06:15 PM

Hi Everyone

I would like to determine the number of observations that fall within the 10th, 25th, 50th, 75th, and 90th percentiles of a population. I thought PROC MEANS would do this, since I already use it to calculate quartiles, but I am not sure that is the case. Variations of the PROC MEANS step below do not seem to work.

Paul

proc means data=test1 noprint missing;

var TprFilingToIssueJoined;

by County TprFileYear;

output out=tprpercents n= nmiss= p10 p25 p50 p75 p90 p95 /autoname;

run;

Accepted Solutions

Solution

Posted in reply to Paul_NYS

10-06-2014 09:24 PM

PROC MEANS works; you forgot the = sign after each percentile keyword (p10, p25, and so on) in your original code. There should have been a warning or something in the log, though.

If the divisions were equal, you could look at PROC RANK. You could also break the data into 20 groups and then regroup them, which is probably easier to code.

proc means data=test1 noprint missing;

var TprFilingToIssueJoined;

by County TprFileYear;

output out=tprpercents n= nmiss= p10= p25= p50= p75= p90= p95= /autoname;

run;

Posted in reply to Paul_NYS

10-06-2014 06:25 PM

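That doesn't make much sense analytically.

By definition, 10% of your data falls at or below the 10th percentile, i.e. n*0.1 observations.

Likewise, 25% of your data falls at or below the 25th percentile, i.e. n*0.25 observations.

The actual counts are usually within +/- 1 of those figures.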

Posted in reply to Paul_NYS

10-06-2014 07:12 PM

I use proc univariate for this:

title3 "make test data";

data test(drop=_i);

do _i = 1 to 137;

xyz = int(ranuni(_i)*1234);

output;

end;

run;

title3 "Get deciles with PCTLPTS= option";

proc univariate data=test noprint;

var xyz;

output out=deciles pctlpts=10 25 50 75 90 pctlpre=P;

run;
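
The deciles data set above holds a single row of cutoffs (P10, P25, P50, P75, P90), so one possible way to turn those cutoffs into counts is to read that row once in a DATA step and compare each observation against it. This is only a sketch under that assumption; the names segmented and segment are made up for illustration and are not from the original post.

```sas
/* Sketch only: segmented and segment are hypothetical names. */
data segmented;
  /* Read the one-row cutoff data set once; its values are retained. */
  if _n_ = 1 then set deciles;
  set test;
  length segment $6;
  if missing(xyz) then segment = ' ';
  else if xyz <= P10 then segment = '0-10';
  else if xyz <= P25 then segment = '10-25';
  else if xyz <= P50 then segment = '25-50';
  else if xyz <= P75 then segment = '50-75';
  else if xyz <= P90 then segment = '75-90';
  else segment = '90-100';
  drop P10 P25 P50 P75 P90;
run;

/* Count the observations that land in each segment. */
proc freq data=segmented;
  tables segment;
run;
```

Ties at a cutoff go to the lower segment here because of the <= comparisons; switch to < if the opposite convention is wanted.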

Posted in reply to Orsini

10-06-2014 08:47 PM

I don't have SAS with me now, but I think Proc Univariate could work. I am trying to segment a data set (population) into smaller sub-populations based on the time (in days) it takes to achieve adjudication.

The population divisions would be percentages of the population who achieved adjudication the fastest: 10%, 25%, 50%, 75%, 90%. I need to know which time represents each of these points and then subdivide the population observations into each segment: 0-10, 11-25, etc.

So Proc Univariate appears to be a way to at least identify the observations that represent each percentile. Then I would assume I could segment the population using these values.

Paul

Posted in reply to Paul_NYS

10-07-2014 10:28 AM

Why not use PROC RANK with GROUPS=100?
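
A minimal sketch of the PROC RANK idea, assuming the data set and variable names from the original post (test1, TprFilingToIssueJoined, County, TprFileYear). The names sorted, ranked, pct_rank, segmented, and segment are hypothetical, and the segment boundaries are one possible reading of the 10/25/50/75/90 split.

```sas
/* Sketch only: sorted, ranked, pct_rank, segmented, segment are hypothetical names. */
proc sort data=test1 out=sorted;
  by County TprFileYear;
run;

/* GROUPS=100 assigns each observation a percentile group from 0 to 99. */
proc rank data=sorted out=ranked groups=100 ties=low;
  by County TprFileYear;
  var TprFilingToIssueJoined;
  ranks pct_rank;
run;

/* Collapse the 100 groups into the unequal segments of interest. */
data segmented;
  set ranked;
  length segment $6;
  if missing(pct_rank) then segment = ' ';
  else if pct_rank < 10 then segment = '0-10';
  else if pct_rank < 25 then segment = '10-25';
  else if pct_rank < 50 then segment = '25-50';
  else if pct_rank < 75 then segment = '50-75';
  else if pct_rank < 90 then segment = '75-90';
  else segment = '90-100';
run;

/* Number of observations per segment within each County and TprFileYear. */
proc freq data=segmented;
  by County TprFileYear;
  tables segment / missing;
run;
```

With small BY groups or many tied values the segments will not hold exactly 10%, 15%, 25%, and so on of the observations, so the PROC FREQ counts are worth checking.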