vince_tsp
Calcite | Level 5

Good day to you,

I am trying to trim a data set that is sorted in ascending order, but I have no idea how to trim k% of the data from the start and from the end. Most of the posts I could find explain how to compute a trimmed mean, but what I want is just to trim the data set.

 

XN = DAT[POS1:ENDPOS];
CALL SORT(XN);

...

 

I am very new to SAS/IML and hope someone can help me out.

Thanks in advance!

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Trimming is the act of truncating the upper and lower tails of an empirical UNIVARIATE distribution, so we don't usually talk about trimming a data set, we talk about trimming a variable.

 

To trim a variable, look at p. 2-3 of my 2010 SAS Global Forum paper, which has an algorithm for computing the trimmed mean and variance of every column in a matrix.  You can modify it to extract the "middle" observations:

 

proc iml;
/* assume v is a column vector. Return the sorted 
   elements that result from trimming the largest and smallest
   proportion of values. The 'prop' parameter is trimmed from each
   tail, so it should satisfy 0 < prop < 0.5. */
start TrimVec(v, prop);
   n = nrow(v);      /* num rows (assume no missing values) */
   d = ceil(prop*n); /* number of observations to trim from each tail */
   z = v;            /* copy it */
   call sort(z,1);   /* sort it */
   w = z[d+1:n-d, ]; /* trim d largest and d smallest values */
   return (w);
finish;

use sashelp.cars;
read all var "mpg_city" into x;
close;
trimX = TrimVec(x, 0.12);
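As a quick check of the arithmetic inside TrimVec, here is a minimal sketch on a made-up ten-element vector (the data values and the name keepIdx are illustrative assumptions, not part of the original post). With n = 10 and prop = 0.12, d = ceil(1.2) = 2, so two values are dropped from each tail and rows 3 through 8 of the sorted vector are kept.

```sas
proc iml;
x = {9, 1, 7, 3, 5, 8, 2, 6, 4, 10};  /* hypothetical 10 x 1 column vector */
prop = 0.12;
n = nrow(x);                 /* n = 10 */
d = ceil(prop*n);            /* ceil(1.2) = 2 values trimmed from each tail */
z = x;                       /* copy so that x is left unsorted */
call sort(z, 1);             /* sort ascending by column 1 */
keepIdx = (d+1):(n-d);       /* rows 3 through 8 */
w = z[keepIdx, ];            /* the six middle values */
print d, keepIdx, w;
quit;
```

Because d is computed with CEIL, the proportion actually removed from each tail can be slightly larger than prop when prop*n is not an integer.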

 

The key is the statement

w = z[ d+1:n-d, ];

The expression  d+1:n-d uses the index operator (:) to represent the observations to keep.  You can use the same syntax to extract those rows from an entire matrix:

smallerMatrix = bigMatrix[ d+1:n-d, ];
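For the matrix case, the rows need to be sorted by the column that defines the tails before the index expression is applied. The following is a rough sketch rather than a definitive implementation: the choice of Sashelp.Cars columns and the names m and trimmed are assumptions made for illustration, and it assumes the selected columns contain no missing values.

```sas
proc iml;
use sashelp.cars;
read all var {"mpg_city" "mpg_highway" "weight"} into m;
close sashelp.cars;

call sort(m, 1);                  /* sort all rows by column 1 (mpg_city) */
n = nrow(m);
d = ceil(0.12*n);                 /* rows to drop from each end */
trimmed = m[(d+1):(n-d), ];       /* keep the middle rows, all columns */

print (nrow(m))[label="rows before"] (nrow(trimmed))[label="rows after"];
quit;
```

Only the sort column is trimmed in the statistical sense; the other columns simply travel with the rows that are kept.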

 

See the article "Creating vectors that contain evenly spaced values" for a description of the index operator (:) .


4 REPLIES
EH
Obsidian | Level 7

Gday!

 

I am not sure if/why you need IML. Regular DATA step code would be something like:

 

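%let percentage = 20;   /* total percent to trim: half from each end */

data trimmed;
   set base nobs=size;  /* base must already be sorted */
   d = ceil(size * &percentage / 200);  /* observations to drop from each end */
   if d < _N_ <= size - d then output;  /* keep only the middle observations */
   drop d;
run;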
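The DATA step above relies on observation order, so the input has to be sorted by the variable being trimmed before it runs. A minimal sketch of that preparatory step, assuming Sashelp.Cars and mpg_city as stand-ins for the real data and variable (the original post does not say what base contains):

```sas
/* Create the sorted input data set that the trimming DATA step reads. */
/* sashelp.cars and mpg_city are placeholders for your own data.       */
proc sort data=sashelp.cars out=base;
   by mpg_city;
run;
```

Run this before the trimming step so that the first and last observations of base really are the smallest and largest values.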

 

Hope this helps,

Eric

vince_tsp
Calcite | Level 5
Thanks for trying to help me. I really appreciate it.

vince_tsp
Calcite | Level 5
Thanks Rick, I finally got my solution after a week of trial and error.
