SAS/IML Software and Matrix Computations

vince_tsp · Posted 10-18-2015 11:12 AM

Good day to you,

I am trying to trim a data set that is sorted in ascending order. I do not know how to trim k% of the data from the start and from the end. Most of the posts I could find explain how to compute a trimmed mean, but I only want to trim the data set itself.

XN = DAT[POS1:ENDPOS];
CALL SORT(XN);

...

I am very new to SAS/IML and hope someone can help me out.

Thanks in advance!

Rick_SAS · Posted 10-19-2015 06:25 AM

Trimming is the act of truncating the upper and lower tails of an empirical UNIVARIATE distribution, so we don't usually talk about trimming a data set; we talk about trimming a variable.

To trim a variable, see pp. 2-3 of my 2010 SAS Global Forum paper, which has an algorithm for computing the trimmed mean and variance of every column in a matrix. You can modify it to extract the "middle" observations:

proc iml;
/* Assume v is a column vector. Return the sorted elements that
   remain after trimming the largest and smallest proportion of
   values. 'prop' is the proportion trimmed from EACH tail, so it
   must be below 0.5; otherwise nothing is left to return. */
start TrimVec(v, prop);
   n = nrow(v);      /* num rows (assume no missing values) */
   d = ceil(prop*n); /* number of observations to trim from each tail */
   z = v;            /* copy it */
   call sort(z,1);   /* sort it */
   w = z[d+1:n-d, ]; /* trim d largest and d smallest values */
   return (w);
finish;

use sashelp.cars;
read all var "mpg_city" into x;
close;
trimX = TrimVec(x, 0.12);

The key is the statement

w = z[ d+1:n-d, ];

The expression d+1:n-d uses the index creation operator (:) to generate the row numbers of the observations to keep. You can use the same syntax to extract those rows from an entire matrix, provided the matrix is sorted by the variable you are trimming:

smallerMatrix = bigMatrix[ d+1:n-d, ];

See the article "Creating vectors that contain evenly spaced values" for a description of the index creation operator (:).
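
For a concrete picture of the indexing, here is a minimal sketch on made-up data. The 8 x 2 matrix, the value prop = 0.25, and the IF guard are assumptions added for illustration; they are not part of the paper's algorithm. The guard is there because when the first operand of (:) is larger than the second, the operator counts downward instead of producing an empty range, so an overly large proportion would not fail cleanly.

```sas
proc iml;
/* hypothetical 8 x 2 matrix; column 1 is the variable to trim on */
bigMatrix = {5 50, 1 10, 8 80, 3 30, 7 70, 2 20, 6 60, 4 40};
prop = 0.25;                 /* hypothetical proportion trimmed from each tail */
n = nrow(bigMatrix);         /* 8 rows */
d = ceil(prop*n);            /* 2 rows dropped at each end */
call sort(bigMatrix, 1);     /* sort all rows by column 1 */
if 2*d < n then do;          /* added guard: at least one row must survive */
   smallerMatrix = bigMatrix[(d+1):(n-d), ];  /* keep rows 3 through 6 */
   print smallerMatrix;
end;
else print "Nothing left to keep: prop is too large for n";
quit;
```

With these values the kept rows are the ones whose first column is 3, 4, 5, and 6; the two smallest and two largest rows are dropped.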

EH · Posted 10-19-2015 03:51 AM

G'day!

I am not sure if or why you need IML. Regular DATA step code would be something like:

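%let percentage = 20;  /* total percent to trim; half is removed from each end */

data trimmed;
   set base nobs=size;  /* base is assumed to be sorted already */
   /* strict < on the lower bound so that, when the cutoff is a whole
      number, both ends lose the same number of rows */
   if (size * &percentage / 200) < _N_ <= size - (size * &percentage / 200) then output;
run;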
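
The step above selects rows purely by position, so it only trims the tails if base is already in sorted order. A minimal sketch of preparing it, assuming purely for illustration that the source is sashelp.cars and the variable of interest is mpg_city (substitute your own data set and variable):

```sas
/* hypothetical input: write a copy of sashelp.cars, sorted by the
   variable to trim on, to the data set BASE used in the DATA step */
proc sort data=sashelp.cars out=base;
   by mpg_city;
run;
```

Using OUT= leaves the original data set untouched.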

Hope this helps,

Eric

vince_tsp · Posted 10-20-2015 03:26 AM

Thanks for trying to help me. I really appreciate it.

vince_tsp · Posted 10-20-2015 03:28 AM

Thanks, Rick. I finally got my solution after a week of trial and error.
