BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello,

How to get the obs between 2.5% and 97.5% percentile of the sum in the dataset?  The sum is in dollar 10.2 format. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
   Sum = 1000*i;
   output;
end;

/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
   var Sum;
   output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5; 
run;

/* copy the lower and upper pctl value into macro variables */
data _null_;
   set Pctls;
   call symputx("LowerPctl", Pctl_2_5);
   call symputx("UpperPctl", Pctl_97_5);
run;

/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;

 If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;

   var Sum;

run;

 

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

I assume you mean PERCENTILE, but you did not say that.

 

If that is correct, then this makes sense "How to get the obs between 2.5% and 97.5%" but the next words "of the sum" make no sense in this context. Please explain further.

--
Paige Miller
ybz12003
Rhodochrosite | Level 12

percentile

PaigeMiller
Diamond | Level 26

I asked for an explanation, I don't see one.

--
Paige Miller
ballardw
Super User

If you mean "sum the values of the observations between the 2.5th and 97.5 percentiles of the variable"

 

Steps: 1) determine the given percentiles, Proc univariate may be best here.

2) filter the data with a Where statement or data set option using the values of the percentiles

3) sum the variable

 

 

 

Reeza
Super User
So which part do you not understand, calculating the percentiles, filtering the data or calculating the sum?
Rick_SAS
SAS Super FREQ

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
   Sum = 1000*i;
   output;
end;

/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
   var Sum;
   output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5; 
run;

/* copy the lower and upper pctl value into macro variables */
data _null_;
   set Pctls;
   call symputx("LowerPctl", Pctl_2_5);
   call symputx("UpperPctl", Pctl_97_5);
run;

/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;

 If you want the sum of these values, you can use PROC MEANS:
proc means data=middle SUM;

   var Sum;

run;

 

ybz12003
Rhodochrosite | Level 12
Yes, you are right.
mkeintz
PROC Star

Is your data already sorted?   If so, you can use the metadata item NOBS (number of obs in the dataset) to generate an upper and lower bound for observations to be submitted to the summing process:

 

proc sql noprint;
  select nobs, floor(nobs/40)+1) , floor(nobs-nobs/40)
    into :nobs, :LOWER_BOUND     , :UPPER_BOUND
  from dictionary.tables  where libname='WORK' and memname='HAVE';
quit;

%put &=nobs  &=lower_bound &=upper_bound;

proc means data=have (firstobs=&lower_bound  obs=&upper_bound)  sum n min max;
  var x;
run;

To be precise, this code EXCLUDES as close to the lower and upper 2.5% that is achievable, but never excludes less than the upper and lower 2.5 percent.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1218 views
  • 3 likes
  • 6 in conversation