BookmarkSubscribeRSS Feed
eabbo
Calcite | Level 5

I have a data set that has 50 array variables (x_1, x_2, x_3...) and each variable has around 7000 observations and I need to push each variable thru a macro that will take them thru proc univariate to create a trimmed means for each variable by a group variable (shown as AGE below) and then produce one table that has all of the amended variables in it.  Any help with how this macro would be set up would be greatly appreciated.

 

 

 

proc univariate data=DATASET  trimmed=1 .1;

   by AGE
   var X_1 X_2 ...  ;
run;

 

 

5 REPLIES 5
ballardw
Super User

If your variables are truly named X_1 X_2 and such then you don't need any macro.


proc univariate data=dataset trimmed=1.1;
   by age;
   var x_1 x_2 x_3 ;
output out=means mean=x_1 x_2 x_3 ;
run;

Just put your list of variables on the Var line and then after the mean.

 

FreelanceReinh
Jade | Level 19

Hello @eabbo,

 

I agree with @ballardw that you most likely won't need a macro. However, I'm not quite sure what you mean by "amended variables."

 

Given the large number of variables (times the number of BY groups), you will probably want to create a dataset containing the trimmed means.


Please note that with your specification trimmed=1 .1 you request two different types of k-times trimmed means: k=1 and k=ceil(n_i 0.1) where n_i is the size of the i-th BY group. I think you need to use ODS OUTPUT to get these into a dataset. The MEAN= option of the OUTPUT statement requests the ordinary (untrimmed) means, regardless of the TRIMMED= option in the PROC UNIVARIATE statement.

 

/* Create test data */

data have;
array x_[50];
do i=1 to 7000;
  age=51+int(i/2500);
  do j=1 to dim(x_);
    x_[j]=100+10*rannor(2718);
  end;
  output;
end;
drop i j;
run;

/* Write two types of trimmed means to dataset TMEANS */

ods listing close;
ods output TrimmedMeans=tmeans;
proc univariate data=have trimmed=1 .1;
by age;
var x:;
run;
ods listing;

 

Or do you need the trimmed data, i.e., a dataset where the smallest and largest k values per BY group are deleted or set to missing?

eabbo
Calcite | Level 5

Correct, I do need the trimmed data per BY group in a data set with the outliers either deleted or set to missing.

 

 

 

Within the output for proc univariate or proc mean can you specify an output argument similar to the below that would work? Where I want the descriptive statistics for all the variables within my array (x_1, x_2, x_3.... x_56) but I want a seperate table for each variable.

 

proc mean data= ;

by age;

var x_1 x_2 x_3... x_56;

output 

     IF var = x_1 THEN output=X_1 mean=average n=n lclm =lower uclm=upper ;

   IF var = x_2 THEN output=X_2 mean=average n=n lclm =lower uclm=upper ;

   IF var = x_3 THEN output=X_3 mean=average n=n lclm =lower uclm=upper ;

....

   IF var = x_56 THEN output=X_56 mean=average n=n lclm =lower uclm=upper ;

run;

FreelanceReinh
Jade | Level 19

Unfortunately, as far as I know, PROC UNIVARIATE does not provide an option to write trimmed (or Winsorized) data to a dataset, and PROC MEANS (and PROC SUMMARY) do not even compute trimmed means.

 

So, you'll have to trim your data "manually," e.g., in a data step or with SAS/IML (if available). Earlier this month, there was a similar discussion about Winsorizing variables with valuable contributions by @Rick_SAS and others. I think it should be possible to adapt code for Winsorizing data in such a way that the outliers are removed or set to missing rather than replaced by less extreme values.

 

As a first guess, I could imagine that it'll be helpful to transpose your data (at least temporarily) so that the 56 variables are turned into BY groups of a single variable. 56*7000=392000 observations are still easy to handle with SAS. Then you could trim the values within the "combined" BY groups (i.e. the new ones due to transposing within the old ones "BY age") and eventually re-transpose.

 

Just give it a try. If you experience any difficulties, please feel free to ask again. Good luck!

Rick_SAS
SAS Super FREQ

Why do you think you need to winsorize each variable? If your answer is "because there might be outliers," I would note that there are statistical techniques that are robust to outliers, such as robust regression.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1398 views
  • 1 like
  • 4 in conversation