Hi there!
I am new to SAS and currently using SAS 9.4. I have a dataset containing several financial data for 1.708 companies. I am trying to winsorize this dataset because I am computing index indicators that are based on the average GVA, and there are some outliers in the GVA variable that are inflating the average, causing bias to the indexes. I am using Rick Wicklin's winsorize module (code below) but I keep getting the same error ((execution) Matrix has not been set to a value.) after trying to load all numeric variables into matrix x.
NOTE: Module WINSORIZE defined.
26
27 /* test the algorithm on numerical vars in a data set */
28 use &DSName;
29 read all var _NUM_ into X[colname=varNames];
30 close;
NOTE: Closing DEPO.GVA
31 winX = Winsorize(X, 0.01);
ERROR: (execution) Matrix has not been set to a value.
I have used proc contents and double checked that the variables are numeric, so I don't understand why this error is showing up and can't find any post with a similar problem. Can you please advise?
Many thanks!!
Code used:
libname depo "D:\OneDrive - Universidade do Algarve\Ambiente de Trabalho\Data\08-11-2023 Portugal";
%let DSName = depo.GVA;
proc iml;
/* SAS/IML module to Winsorize each column of a matrix. 
   Input proportion of observations to Winsorize: prop < 0.5. 
   Ex:  y = Winsorize(x, 0.1) computes the two-side 10% Winsorized data */
start Winsorize(x, prop);
   p = ncol(x);            /* number of columns */
   w = x;                  /* copy of x */
   do i = 1 to p;
      z = x[,i];           /* copy i_th column */
      n = countn(z);       /* count nonmissing values */
      k = ceil(prop*n);    /* number of obs to trim from each tail */
      r = rank(z);         /* rank values in i_th column */
      /* find target values and obs with smaller/larger values */
      lowIdx = loc(r<=k & r^=.);
      lowVal = z[loc(r=k+1)]; 
      highIdx = loc(r>=n-k+1);
      highVal = z[loc(r=n-k)]; 
      /* Winsorize (replace) k smallest and k largest values */
      w[lowIdx,i] = lowVal;
      w[highIdx,i] = highVal;
   end;
   return(w);
finish;
 
/* test the algorithm on numerical vars in a data set */
use &DSName;
read all var _NUM_ into X[colname=varNames];
close;
winX = Winsorize(X, 0.01);
create WinData from winX[colname=varNames];
append from winX;;
close;
proc means data=WinData;
run;
I will take a guess that you might have columns of data that are full of missing values, or with only a single non-missing value. Remove these variables first and you should be OK.
You've omitted the most important part of the log, which is the set of statements AFTER the ERROR. Those statements tell you which operation failed and the line number of the module on which the error occurred. Please include the entire log.
If you change the %LET statement to 
%let DSName = sashelp.cars;
or
%let DSName = sashelp.class;
then the program runs without error, so I assume the issue is related to your data. For example, perhaps there is a variable that has all missing values.
It might be helpful if you post a summary of your data by running the following code:
proc means data=&DSName N NMISS Min Mean Std Max;
var _numeric_;
run;
Dear Rick,
Thank you for your response!
Indeed, some variables had all missing values, which I have now dropped. However, I am still getting the same error. I will post the entire log below:
154  %let DSName = depo.gvanonmiss;
155  proc iml;
NOTE: IML Ready
156  /* SAS/IML module to Winsorize each column of a matrix.
157     Input proportion of observations to Winsorize: prop < 0.5.
158     Ex:  y = Winsorize(x, 0.1) computes the two-side 10% Winsorized data */
159  start Winsorize(x, prop);
160     p = ncol(x);
160!                            /* number of columns */
161     w = x;
161!                            /* copy of x */
162     do i = 1 to p;
163        z = x[,i];
163!                            /* copy i_th column */
164        n = countn(z);
164!                            /* count nonmissing values */
165        k = ceil(prop*n);
165!                            /* number of obs to trim from each tail */
166        r = rank(z);
166!                            /* rank values in i_th column */
167        /* find target values and obs with smaller/larger values */
168        lowIdx = loc(r<=k & r^=.);
169        lowVal = z[loc(r=k+1)];
170        highIdx = loc(r>=n-k+1);
171        highVal = z[loc(r=n-k)];
172        /* Winsorize (replace) k smallest and k largest values */
173        w[lowIdx,i] = lowVal;
174        w[highIdx,i] = highVal;
175     end;
176     return(w);
177  finish;
NOTE: Module WINSORIZE defined.
178
179  /* test the algorithm on numerical vars in a data set */
180  use &DSName;
181  read all var _NUM_ into X[colname=varNames];
182  close;
NOTE: Closing DEPO.GVANONMISS
183  winX = Winsorize(X, 0.05);
ERROR: (execution) Matrix has not been set to a value.
 operation : [ at line 169 column 17
 operands  : z, _TEM1003
z   1708 rows      1 col     (numeric)
_TEM1003      0 row       0 col     (type ?, size 0)
 statement : ASSIGN at line 169 column 7
 traceback : module WINSORIZE at line 169 column 7
NOTE: Paused in module WINSORIZE.
184
185  create WinData from winX[colname=varNames];
ERROR: Matrix winX has not been set to a value.
 statement : CREATE at line 185 column 1
186  append from winX;
ERROR: No data set is currently open for output.
 statement : APPEND at line 186 column 1
186!                  ;
187  close;
188
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IML used (Total process time):
      real time           0.09 seconds
      cpu time            0.09 seconds
189  proc means data=WinData;
ERROR: File WORK.WINDATA.DATA does not exist.
190  run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
As per my data, there are over 600 variables, so I will post a printscreen of a portion of the summary if that helps:
Many thanks for the help!
I will take a guess that you might have columns of data that are full of missing values, or with only a single non-missing value. Remove these variables first and you should be OK.
Dear IanWakeling,
Thank you for your answer!
Yes, indeed, I did remove all the variables with only missing values but, however, there are some variables with only one non-missing value.in fact, this dataset contains financial information over a period of 24 years so some of the first years have little to no information.
I will try removing these variables and running the code and see if it works now.
Many thanks!
Dear IanWakeling,
Removing all variables with only one non-missing value worked. Thank you so much!
Kind regards,
Carina
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.