Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- How to obtain sum of 2.5th-97.5th percentile?

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 08-18-2023 12:51 PM
(1102 views)

Hello,

How to get the obs between 2.5% and 97.5% percentile of the sum in the dataset? The sum is in dollar 10.2 format. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

```
/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;
/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;
/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;
```

If you want the sum of these values, you can use PROC MEANS:**proc means data=middle SUM;**

** var Sum;**

**run;**

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I assume you mean PERCENTILE, but you did not say that.

If that is correct, then this makes sense "How to get the obs between 2.5% and 97.5%" but the next words "of the sum" make no sense in this context. Please explain further.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

percentile

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I asked for an explanation, I don't see one.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you mean "sum the values of the observations between the 2.5th and 97.5 percentiles of the variable"

Steps: 1) determine the given percentiles, Proc univariate may be best here.

2) filter the data with a Where statement or data set option using the values of the percentiles

3) sum the variable

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So which part do you not understand, calculating the percentiles, filtering the data or calculating the sum?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am going to assume that you have a variable in your data set named "Sum", and that you want the observations in the middle 95th percentile of that variable:

```
/* generate some example datas */
data Have;
format Sum DOLLAR10.2;
do i = 1 to 100;
Sum = 1000*i;
output;
end;
/* Find the 2.5th and 97.5th percentiles for the SUM variable */
proc univariate data=Have noprint;
var Sum;
output out=Pctls pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* copy the lower and upper pctl value into macro variables */
data _null_;
set Pctls;
call symputx("LowerPctl", Pctl_2_5);
call symputx("UpperPctl", Pctl_97_5);
run;
/* extract the middle 95% of the data */
data Middle;
set Have;
where &LowerPctl <= Sum and Sum <= &UpperPctl;
run;
```

If you want the sum of these values, you can use PROC MEANS:**proc means data=middle SUM;**

** var Sum;**

**run;**

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, you are right.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Is your data already sorted? If so, you can use the metadata item NOBS (number of obs in the dataset) to generate an upper and lower bound for observations to be submitted to the summing process:

```
proc sql noprint;
select nobs, floor(nobs/40)+1) , floor(nobs-nobs/40)
into :nobs, :LOWER_BOUND , :UPPER_BOUND
from dictionary.tables where libname='WORK' and memname='HAVE';
quit;
%put &=nobs &=lower_bound &=upper_bound;
proc means data=have (firstobs=&lower_bound obs=&upper_bound) sum n min max;
var x;
run;
```

To be precise, this code EXCLUDES as close to the lower and upper 2.5% that is achievable, but never excludes less than the upper and lower 2.5 percent.

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

Are you ready for the spotlight? We're accepting content ideas for **SAS Innovate 2025** to be held May 6-9 in Orlando, FL. The call is **open **until September 16. Read more here about **why** you should contribute and **what is in it** for you!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.