Winsorized mean across variables

Contributor
Posts: 40

I would like to create a winsorized mean across variables and store it as a new variable in the original data set.

Is there a way in SAS to get a winsorized mean across variables? Below is the code for generating the input data set.

data have;
  input var1 $ var2 var3;
datalines;
_1 10 12
_2 20 14
_3 30 16
_4 40 18
_5 50 20
_6 60 22
_7 70 24
_8 80 26
;
run;

I know PROC IML can compute a winsorized mean for each column, but I need the same thing for each row instead. The desired output data set would contain all the original variables (var1, var2, var3) plus the calculated winsorized mean for each row.

proc iml;
    use have;
    read all var _NUM_ into X;
    read all var _CHAR_ into C;
    close have;
    mean = mean(x[,],"winsorized", 1);
    create want from mean;
    append from mean;
quit;

Thanks for your help.


All Replies
Respected Advisor
Posts: 2,655

Re: Winsorized mean across variables

Can you read all of this into a matrix, then transpose the matrix, get the winsorized mean for each column (transposed row), append to the matrix, and then re-transpose?

Steve Denham

Solution
‎03-19-2014 02:46 PM
SAS Super FREQ
Posts: 3,406

Re: Winsorized mean across variables

Agree with Steve.

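data have;
array var[10];
do i = 1 to 20;
   do j = 1 to dim(var);
      var[j] = rand("normal", i);   /* normal with mean i, std dev 1 */
   end;
   output;
end;
drop i j;   /* keep the loop counters out of the _NUM_ read below */
run;

proc iml;
use have;
read all var _NUM_ into X[c=varNames];
close have;

/* MEAN works column-wise, so transpose X to get one winsorized mean per row */
winMean = mean(X`, "winsorized", 1);
newX = X || winMean`;

varNames = varNames || "WinMean";
create want from newX[c=varNames];
append from newX;
close want;
quit;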

Contributor
Posts: 40

Re: Winsorized mean across variables

Rick, perfect, thanks so much. One additional question: what if I have a character variable in the original data set and want to include it in the newly created data set? PROC IML is not letting me mix character and numeric variables.

Can you please let me know how you would do this?

Appreciate your response

Thanks so much

Grand Advisor
Posts: 17,338

Re: Winsorized mean across variables

Are you going into IML solely for the winsorized mean? If so, PROC UNIVARIATE can also calculate it. The process is similar: transpose, calculate, then transpose again.

Contributor
Posts: 40

Re: Winsorized mean across variables

I have 120,000 observations. Transposing 120K observations and running PROC UNIVARIATE on them would have performance issues. That is why I was looking at PROC IML for the winsorized mean.

Grand Advisor
Posts: 17,338

Re: Winsorized mean across variables

How many variables do you have? For 120K rows with 50 variables, it processed in less than a minute for me.

The output comes in a column format, so it's transpose, UNIVARIATE, then merge.

data random;
    array var(50) var1-var50;
    do ID=1 to 120000;
        do k=1 to 50;
            var(k)=rand('normal', 20, k);
        end;
        output;
    end;
    drop k;
run;

*Transpose: one row per ID and variable, values in the column value1;
proc transpose data=random out=step1 prefix=value;
    by ID;
run;

ods listing close;

*Winsorized mean of the transposed values within each ID;
proc univariate data=step1 winsorized=0.1;
    by ID;
    var value1;
    ods output winsorizedMeans=step2;
run;
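
The code above stops after the UNIVARIATE step, so here is a rough sketch of the final merge. It assumes the ODS table written to step2 stores the estimate in a column named Mean; that name is an assumption worth checking with PROC CONTENTS on step2 first. WinMean is just an illustrative name for the new variable.

```sas
ods listing;   /* reopen the listing destination closed earlier */

/* Attach the per-ID winsorized mean to the original wide data.
   Assumption: step2 has one row per ID and a column named Mean. */
data want;
    merge random step2(keep=ID Mean rename=(Mean=WinMean));
    by ID;
run;
```

Both data sets are already in ID order, so no extra sort should be needed before the merge.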

SAS Super FREQ
Posts: 3,406

Re: Winsorized mean across variables

The efficient way is to write ONLY the winsorized mean to a new data set and then use the DATA step to add the new column to the original data. This avoids reading all the data into IML just so that you can write it out again:

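proc iml;
use have;
read all var _NUM_ into X;
close have;

WinMean = mean(X`, "winsorized", 1);
create want var {"WinMean"};
append;
close want;
quit;

/* one-to-one merge: row k of want lines up with row k of have, */
/* so character variables in have are carried along unchanged   */
data want;
merge have want;
run;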
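
If winsorizing a single value in each tail is all that is needed, a plain DATA step can also do it row by row without leaving the original data set. This is only a sketch under stated assumptions: the data set have2 and the variables id and x1-x5 are made up for illustration, and every row is assumed to have at least three nonmissing values. For the first row, the sorted values 10, 12, 14, 16, 50 become 12, 12, 14, 16, 16, which gives a winsorized mean of 14.

```sas
/* Hypothetical example data: one character ID and five numeric columns */
data have2;
    input id $ x1-x5;
    datalines;
a 10 12 50 14 16
b 1 2 3 4 100
;
run;

data want2;
    set have2;
    array v[*] x1-x5;
    /* Winsorize one value per tail: swap the minimum for the second-smallest
       value and the maximum for the second-largest, then average.
       Assumes at least three nonmissing values in the row. */
    WinMean = (sum(of v[*]) - smallest(1, of v[*]) - largest(1, of v[*])
                            + smallest(2, of v[*]) + largest(2, of v[*])) / n(of v[*]);
run;
```

Because this never leaves the DATA step, character variables such as id stay in the output automatically. Winsorizing more than one value per tail would need a sort of the array values instead of this shortcut.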
