Trimming the top and bottom 5% of a dataset


09-01-2016 01:14 AM

From another similar question on these boards I found the following solution:

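```
%let trim=0.05;            /* fraction to drop from each tail */
data trimmed;
  set have nobs=NN;        /* NN = total number of observations in HAVE */
  /* keep rows whose position _N_ lies between the 5% and 95% marks */
  if &trim*NN < _N_ <= (1-&trim)*NN
    then output;
run;
```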

1) Do you concur with this way of trimming the tails? (I'm not exactly sure what's going on with this. Just want it to work.)

2) Data needs to be sorted by the relevant variable we want to trim, yes?
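Because `_N_` is just the observation's position in the input order, the position-based filter above only trims the value tails if HAVE is sorted by the variable first. A minimal sketch, assuming the variable to trim is called `x` (a placeholder name, not from the thread):

```sas
/* x is a placeholder for the variable whose tails you want to trim */
proc sort data=have out=have_sorted;
  by x;
run;

%let trim=0.05;
data trimmed;
  set have_sorted nobs=nn;   /* nn = number of observations after sorting */
  /* subsetting IF: keep positions above the first 5% and up to the 95% mark */
  if &trim*nn < _n_ <= (1-&trim)*nn;
run;
```

Missing values sort before all nonmissing numbers in SAS, so any missing `x` would count toward the lower tail; adding `where not missing(x);` to the PROC SORT step excludes them before the positions are counted.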

Thanks!

Nicholas Kormanik

Accepted Solutions

Solution

09-02-2016 05:17 PM


Posted in reply to NicholasKormanik

09-01-2016 03:58 AM

All Replies


Posted in reply to NicholasKormanik

09-01-2016 01:19 AM

It depends. A "5%" rule is vague, and it is often confused with percentiles.

So define your rules first.
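As an illustration of why the rule matters (the notation is introduced here for this sketch, not taken from the thread), trimming by position and trimming by percentile are two different conditions, where $x_{(1)} \le \dots \le x_{(n)}$ are the sorted values and $t = 0.05$:

$$
\text{position rule: keep } x_{(i)} \iff t\,n < i \le (1-t)\,n
$$

$$
\text{percentile rule: keep } x \iff P_{5} \le x \le P_{95}
$$

The position rule always drops about $t\,n$ rows from each end. The percentile rule drops only values strictly outside the cutoffs, so when many values are tied at $P_5$ or $P_{95}$ it can drop noticeably fewer rows.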


Posted in reply to Reeza

09-01-2016 02:29 AM

The approach I'm using at present is the manual, roundabout way.

First, I run PROC UNIVARIATE to get the actual _P5_ and _P95_ values (say, 30 and 156).

I then plug these numbers into the following code.

```
data tails_trimmed_5_percent;
set have;
where 30 <= N <= 156;
run;
```

I have many such datasets, so I was thinking there might be a more direct, general approach.

Something like this, just imagining, mind you:

`where (lower_tail_5_percent) <= N <= (upper_tail_5_percent);`

But it appears life ain't that easy.
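For many datasets, one way to get that "imagined" WHERE without copying numbers by hand is to let PROC UNIVARIATE write the cutoffs to a dataset and read them back in the same DATA step. This is a sketch under assumptions: `%trim_tails`, `_cutoffs`, `_lo`, and `_hi` are hypothetical names introduced here (they must not clash with variables already in your data), and the cutoffs use PROC UNIVARIATE's default percentile definition.

```sas
/* Hypothetical helper: keep rows of &DATA whose &VAR lies between   */
/* its 5th and 95th percentiles (inclusive), writing the result to &OUT */
%macro trim_tails(data=, var=, out=);
  /* one-row dataset holding the two cutoffs */
  proc univariate data=&data noprint;
    var &var;
    output out=_cutoffs p5=_lo p95=_hi;
  run;

  data &out;
    /* read the cutoffs once; SET variables are retained for every row */
    if _n_ = 1 then set _cutoffs;
    set &data;
    if _lo <= &var <= _hi;   /* subsetting IF: keep the middle 90% */
    drop _lo _hi;
  run;
%mend trim_tails;

/* same result as the manual WHERE 30 <= N <= 156 step, without hard-coding */
%trim_tails(data=have, var=N, out=tails_trimmed_5_percent);
```

Rows with a missing value of the variable fail the comparison and are dropped, which matches what the hard-coded WHERE clause does.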


Posted in reply to NicholasKormanik

09-01-2016 03:06 AM

A fast way to get it is to use PROC RANK:

```
proc rank data=sashelp.cars groups=100 out=temp;
  var invoice;
  ranks rank;
run;

data want;
  set temp;
  /* GROUPS=100 numbers the groups 0-99; 0-4 and 95-99 are the 5% tails */
  if 5 <= rank <= 94;
run;
```

Another way is to use IML. Do you want IML code?
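As I understand the PROC RANK documentation, GROUPS= assigns each nonmissing value to a group with the formula below, where $r$ is the value's rank, $k$ the number of groups, and $n$ the number of nonmissing values:

$$
g = \left\lfloor \frac{r\,k}{n+1} \right\rfloor, \qquad k = 100 \;\Rightarrow\; g \in \{0, 1, \dots, 99\}
$$

So groups 0 to 4 hold roughly the lowest 5% and groups 95 to 99 roughly the highest 5%, which means the middle 90% corresponds to $5 \le g \le 94$.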


Posted in reply to Ksharp

09-01-2016 03:14 AM

Yes, please. IML code, too.

Thanks a million!


Posted in reply to NicholasKormanik

09-01-2016 03:43 AM

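Here it is. I set values greater than the 95th percentile or less than the 5th percentile to missing.

```
/* sample data: two numeric columns of 100 random integers */
data have;
  do i=1 to 100;
    a=ceil(ranuni(1)*100);
    b=ceil(ranuni(2)*100);
    output;
  end;
  drop i;
run;

%let low=0.05;
%let high=0.95;

proc iml;
  use have;
  read all var _num_ into x[c=vname];
  close have;
  /* q[1,i] and q[2,i] are the 5th and 95th percentiles of column i */
  call qntl(q, x, {&low, &high});
  do i=1 to ncol(x);
    /* set values outside [P5, P95] to missing */
    idx = loc(x[,i] < q[1,i] | x[,i] > q[2,i]);
    if ncol(idx) > 0 then x[idx,i] = .;
  end;
  create want from x[c=vname];
  append from x;
  close want;
quit;
```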
