Good day to you,
I am trying to trim a data set that is sorted in ascending order, but I have no idea how to trim k% of the data from the start and from the end. Most of the posts I could find show how to compute a trimmed mean, but what I want is just to trim the data set.
XN = DAT[POS1:ENDPOS];
CALL SORT(XN);
...
I am very new to SAS/IML and hope someone can help me out.
Thanks in advance!
Accepted Solutions
Trimming is the act of truncating the upper and lower tails of an empirical UNIVARIATE distribution, so we don't usually talk about trimming a data set, we talk about trimming a variable.
To trim a variable, look at pp. 2-3 of my 2010 SAS Global Forum paper, which has an algorithm for computing the trimmed mean and variance of every column in a matrix. You can modify it to extract the "middle" observations:
proc iml;
/* Assume v is a column vector. Return the sorted elements that
   remain after trimming the proportion 'prop' of values from EACH
   tail. Because d values are dropped from both ends, 'prop' must
   satisfy 0 < prop < 0.5 (so that 2*d < n). */
start TrimVec(v, prop);
   n = nrow(v);          /* num rows (assume no missing values) */
   d = ceil(prop*n);     /* number of observations to trim from each tail */
   z = v;                /* copy it so the caller's vector is unchanged */
   call sort(z, 1);      /* sort the copy in ascending order */
   w = z[d+1:n-d, ];     /* trim d largest and d smallest values */
   return (w);
finish;

use sashelp.cars;
read all var "mpg_city" into x;
close;
trimX = TrimVec(x, 0.12);
The key is the statement
w = z[ d+1:n-d, ];
The expression d+1:n-d uses the index operator (:) to represent the observations to keep. You can use the same syntax to extract those rows from an entire matrix:
smallerMatrix = bigMatrix[ d+1:n-d, ];
See the article "Creating vectors that contain evenly spaced values" for a description of the index operator (:).
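As a rough illustration of the whole-matrix case, here is a minimal sketch that sorts a two-column matrix by its first column and then keeps only the middle rows. The matrix name M, the choice of the MPG_Highway column as a second variable, and the value prop = 0.12 are assumptions made for the example; they are not part of the original answer.

```sas
proc iml;
/* Hypothetical example: trim the rows of a matrix by the values in column 1 */
use sashelp.cars;
read all var {"mpg_city" "mpg_highway"} into M;
close;

prop = 0.12;                          /* proportion trimmed from EACH tail (assumed value) */
n = nrow(M);
d = ceil(prop*n);                     /* rows to drop from each end */

call sort(M, 1);                      /* sort all rows by column 1, ascending */
smallerMatrix = M[(d+1):(n-d), ];     /* keep rows d+1 through n-d */

print n d (nrow(smallerMatrix))[label="rows kept"];
quit;
```

For an input with n = 428 rows, d = ceil(0.12*428) = ceil(51.36) = 52, so rows 53 through 376 are kept, which is 428 - 2*52 = 324 rows. The rows are trimmed according to the sort column only, so this trims one variable and carries the other columns along with it.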
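G'day!
I am not sure if or why you need IML. Regular DATA step code would be something like:
%let percentage = 20; /* total percent to trim, half from each tail */
data trimmed;
set base nobs=size; /* base must already be sorted by the variable of interest */
/* strict lower bound: when size * percentage / 200 is a whole number, both tails lose the same number of rows */
if (size * &percentage / 200) < _N_ <= size - (size * &percentage / 200) then output;
run;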
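A slightly fuller sketch of the same idea follows, assuming the input is sashelp.cars and the variable to trim on is mpg_city (both are assumptions for illustration). It sorts the data first and rounds the cut down to a whole number, so that both tails always lose the same number of rows; the helper variable cut is hypothetical.

```sas
%let percentage = 20;   /* total percent to trim, split evenly between the tails */

/* Sort first: the trimming below relies on row order */
proc sort data=sashelp.cars out=base;
   by mpg_city;
run;

data trimmed;
   set base nobs=size;
   cut = floor(size * &percentage / 200);   /* whole number of rows per tail */
   if cut < _N_ <= size - cut;              /* keep only the middle rows */
   drop cut;
run;
```

Rounding down with floor is a design choice here, not the only option; using ceil instead would match the IML answer, which trims at least the requested proportion from each tail.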
Hope this helps,
Eric