Trim data set

New Contributor
Posts: 3
Good day to you,

I am trying to trim a data set that is sorted in ascending order, but I have no idea how to remove k% of the data from the start and from the end. Most of the posts I could find show how to compute a trimmed mean, but what I want is simply the trimmed data set itself.

 

XN = DAT[POS1:ENDPOS];
CALL SORT(XN);

...

 

I am very new to SAS/IML and hope someone can help me out.

Thanks in advance!


Accepted Solutions
Solution
‎10-20-2015 03:25 AM
SAS Super FREQ
Posts: 3,221

Re: Trim data set

Trimming is the act of truncating the upper and lower tails of an empirical UNIVARIATE distribution, so we don't usually talk about trimming a data set, we talk about trimming a variable.

 

To trim a variable, look at p. 2-3 of my 2010 SAS Global Forum paper, which has an algorithm for computing the trimmed mean and variance of every column in a matrix.  You can modify it to extract the "middle" observations:

 

proc iml;
/* assume v is a column vector. Return the sorted 
   elements that result from trimming the largest and smallest
   proportion of values. The 'prop' parameter is 0 < prop < 1. */
start TrimVec(v, prop);
   n = nrow(v);      /* num rows (assume no missing values) */
   d = ceil(prop*n); /* number of observations to trim */
   z = v;            /* copy it */
   call sort(z,1);   /* sort it */
   w = z[d+1:n-d, ]; /* trim d largest and d smallest values */
   return (w);
finish;

use sashelp.cars;
read all var "mpg_city" into x;
close;
trimX = TrimVec(x, 0.12);

 

The key is the statement

w = z[ d+1:n-d, ];

The expression d+1:n-d uses the index operator (:) to represent the observations to keep.  You can use the same syntax to extract those rows from an entire matrix:

smallerMatrix = bigMatrix[ d+1:n-d, ];
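
As a rough sketch of that statement in context: the row range only removes the tails if the matrix has first been sorted by the variable of interest. The example below assumes the Sashelp.Class sample data and a 10% trim from each end; the names m and trimmed are illustrative and do not come from the code above.

```sas
proc iml;
use sashelp.class;
read all var {"Height" "Weight"} into m;  /* column 1 = Height, column 2 = Weight */
close sashelp.class;

call sort(m, 1);          /* sort the rows of m by column 1 (Height) */
n = nrow(m);              /* number of rows (assumes no missing values) */
d = ceil(0.10*n);         /* rows to drop from each end */
trimmed = m[d+1:n-d, ];   /* keep the middle rows, all columns */
print n d (nrow(trimmed))[label="rowsKept"];
quit;
```

Sorting the whole matrix by one column keeps each row intact, so the other columns stay aligned with the trimmed variable.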

 

See the article "Creating vectors that contain evenly spaced values" for a description of the index operator (:).
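
To make the index arithmetic concrete, here is a small sketch on made-up data (the vector 1 through 10 and the proportion 0.2 are assumptions chosen for illustration). It shows that d+1:n-d is just the row vector of positions to keep.

```sas
proc iml;
idx = 3:8;            /* index operator: the row vector {3 4 5 6 7 8} */
z = T(1:10);          /* column vector 1,2,...,10, already sorted */
n = nrow(z);          /* n = 10 */
d = ceil(0.2*n);      /* d = 2 values trimmed from each end */
w = z[d+1:n-d, ];     /* same rows as z[idx, ]: the values 3 through 8 */
print idx, w;
quit;
```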



All Replies
Contributor EH
Contributor
Posts: 32

Re: Trim data set

G'day!

 

I am not sure if/why you need IML. If the data set is already sorted, regular DATA step code would be something like:

 

%let percentage = 20;

 

data trimmed;

   set base nobs=size;

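   /* &percentage is the total percentage trimmed, so half of it comes off each end; */
   /* CEIL and the strict < make both ends drop the same number of rows */
   if ceil(size * &percentage / 200) < _N_ <= size - ceil(size * &percentage / 200) then output;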

run;
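
For a version that can be run end to end, here is a minimal sketch. It assumes the Sashelp.Class sample data stands in for the base data set and that Height is the variable to trim on; the variable cut is a hypothetical helper, not part of the post. The PROC SORT step matters because the DATA step trims by row position only.

```sas
%let percentage = 20;   /* total percentage to trim (half from each end) */

/* sort first: the DATA step below trims by row position */
proc sort data=sashelp.class out=base;
   by height;
run;

data trimmed;
   set base nobs=size;
   cut = ceil(size * &percentage / 200);   /* rows to drop from each end */
   if cut < _N_ <= size - cut then output;
   drop cut;
run;
```

Computing the cutoff once keeps the lower and upper bounds consistent, so the same number of rows is dropped from both ends.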

 

Hope this helps,

Eric

New Contributor
Posts: 3

Re: Trim data set

Thanks for trying to help me. I really appreciate it.

New Contributor
Posts: 3

Re: Trim data set

Thanks, Rick. I finally got my solution after a week of trial and error.