Hello,
How to get the obs between 2.5% and 97.5% percentile of the sum in the dataset? The sum is in dollar 10.2 format. Thanks.
I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:
/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;
/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;
/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;
If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;
var Sum;
run;
I assume you mean PERCENTILE, but you did not say that.
If that is correct, then this makes sense "How to get the obs between 2.5% and 97.5%" but the next words "of the sum" make no sense in this context. Please explain further.
percentile
I asked for an explanation, I don't see one.
If you mean "sum the values of the observations between the 2.5th and 97.5 percentiles of the variable"
Steps: 1) determine the given percentiles, Proc univariate may be best here.
2) filter the data with a Where statement or data set option using the values of the percentiles
3) sum the variable
I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:
/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;
/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;
/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;
If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;
var Sum;
run;
Is your data already sorted? If so, you can use the metadata item NOBS (number of obs in the dataset) to generate an upper and lower bound for observations to be submitted to the summing process:
proc sql noprint;
select nobs, floor(nobs/40)+1) , floor(nobs-nobs/40)
into :nobs, :LOWER_BOUND , :UPPER_BOUND
from dictionary.tables where libname='WORK' and memname='HAVE';
quit;
%put &=nobs &=lower_bound &=upper_bound;
proc means data=have (firstobs=&lower_bound obs=&upper_bound) sum n min max;
var x;
run;
To be precise, this code EXCLUDES as close to the lower and upper 2.5% that is achievable, but never excludes less than the upper and lower 2.5 percent.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.