03-17-2016 04:01 PM
I have a data set that has 50 array variables (x_1, x_2, x_3...) and each variable has around 7000 observations and I need to push each variable thru a macro that will take them thru proc univariate to create a trimmed means for each variable by a group variable (shown as AGE below) and then produce one table that has all of the amended variables in it. Any help with how this macro would be set up would be greatly appreciated.
proc univariate data=DATASET trimmed=1 .1;
var X_1 X_2 ... ;
03-17-2016 04:23 PM
If your variables are truly named X_1 X_2 and such then you don't need any macro.
proc univariate data=dataset trimmed=1.1; by age; var x_1 x_2 x_3 ; output out=means mean=x_1 x_2 x_3 ; run;
Just put your list of variables on the Var line and then after the mean.
03-17-2016 06:01 PM
I agree with @ballardw that you most likely won't need a macro. However, I'm not quite sure what you mean by "amended variables."
Given the large number of variables (times the number of BY groups), you will probably want to create a dataset containing the trimmed means.
Please note that with your specification trimmed=1 .1 you request two different types of k-times trimmed means: k=1 and k=ceil(n_i * 0.1) where n_i is the size of the i-th BY group. I think you need to use ODS OUTPUT to get these into a dataset. The MEAN= option of the OUTPUT statement requests the ordinary (untrimmed) means, regardless of the TRIMMED= option in the PROC UNIVARIATE statement.
/* Create test data */ data have; array x_; do i=1 to 7000; age=51+int(i/2500); do j=1 to dim(x_); x_[j]=100+10*rannor(2718); end; output; end; drop i j; run; /* Write two types of trimmed means to dataset TMEANS */ ods listing close; ods output TrimmedMeans=tmeans; proc univariate data=have trimmed=1 .1; by age; var x:; run; ods listing;
Or do you need the trimmed data, i.e., a dataset where the smallest and largest k values per BY group are deleted or set to missing?
03-18-2016 03:57 PM
Correct, I do need the trimmed data per BY group in a data set with the outliers either deleted or set to missing.
Within the output for proc univariate or proc mean can you specify an output argument similar to the below that would work? Where I want the descriptive statistics for all the variables within my array (x_1, x_2, x_3.... x_56) but I want a seperate table for each variable.
proc mean data= ;
var x_1 x_2 x_3... x_56;
IF var = x_1 THEN output=X_1 mean=average n=n lclm =lower uclm=upper ;
IF var = x_2 THEN output=X_2 mean=average n=n lclm =lower uclm=upper ;
IF var = x_3 THEN output=X_3 mean=average n=n lclm =lower uclm=upper ;
IF var = x_56 THEN output=X_56 mean=average n=n lclm =lower uclm=upper ;
03-18-2016 04:58 PM
Unfortunately, as far as I know, PROC UNIVARIATE does not provide an option to write trimmed (or Winsorized) data to a dataset, and PROC MEANS (and PROC SUMMARY) do not even compute trimmed means.
So, you'll have to trim your data "manually," e.g., in a data step or with SAS/IML (if available). Earlier this month, there was a similar discussion about Winsorizing variables with valuable contributions by @Rick_SAS and others. I think it should be possible to adapt code for Winsorizing data in such a way that the outliers are removed or set to missing rather than replaced by less extreme values.
As a first guess, I could imagine that it'll be helpful to transpose your data (at least temporarily) so that the 56 variables are turned into BY groups of a single variable. 56*7000=392000 observations are still easy to handle with SAS. Then you could trim the values within the "combined" BY groups (i.e. the new ones due to transposing within the old ones "BY age") and eventually re-transpose.
Just give it a try. If you experience any difficulties, please feel free to ask again. Good luck!
03-19-2016 03:51 PM
Why do you think you need to winsorize each variable? If your answer is "because there might be outliers," I would note that there are statistical techniques that are robust to outliers, such as robust regression.