## How to obtain sum of 2.5th-97.5th percentile?

Hello,

How to get the obs between 2.5% and 97.5% percentile of the sum in the dataset?  The sum is in dollar 10.2 format. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: How to obtain sum of 2.5th-97.5th percentile?

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

``````/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;

/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;

/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;

/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;

``````

If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;

var Sum;

run;

8 REPLIES 8

## Re: How to obtain sum of 2.5th-97.5th ?

I assume you mean PERCENTILE, but you did not say that.

If that is correct, then this makes sense "How to get the obs between 2.5% and 97.5%" but the next words "of the sum" make no sense in this context. Please explain further.

--
Paige Miller

percentile

## Re: How to obtain sum of 2.5th-97.5th ?

I asked for an explanation, I don't see one.

--
Paige Miller

## Re: How to obtain sum of 2.5th-97.5th ?

If you mean "sum the values of the observations between the 2.5th and 97.5 percentiles of the variable"

Steps: 1) determine the given percentiles, Proc univariate may be best here.

2) filter the data with a Where statement or data set option using the values of the percentiles

3) sum the variable

## Re: How to obtain sum of 2.5th-97.5th percentile?

So which part do you not understand, calculating the percentiles, filtering the data or calculating the sum?

## Re: How to obtain sum of 2.5th-97.5th percentile?

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

``````/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;

/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;

/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;

/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;

``````

If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;

var Sum;

run;

## Re: How to obtain sum of 2.5th-97.5th percentile?

Yes, you are right.

## Re: How to obtain sum of 2.5th-97.5th percentile?

Is your data already sorted?   If so, you can use the metadata item NOBS (number of obs in the dataset) to generate an upper and lower bound for observations to be submitted to the summing process:

``````proc sql noprint;
select nobs, floor(nobs/40)+1) , floor(nobs-nobs/40)
into :nobs, :LOWER_BOUND     , :UPPER_BOUND
from dictionary.tables  where libname='WORK' and memname='HAVE';
quit;

%put &=nobs  &=lower_bound &=upper_bound;

proc means data=have (firstobs=&lower_bound  obs=&upper_bound)  sum n min max;
var x;
run;``````

To be precise, this code EXCLUDES as close to the lower and upper 2.5% that is achievable, but never excludes less than the upper and lower 2.5 percent.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Discussion stats
• 8 replies
• 338 views
• 3 likes
• 6 in conversation