Hi there!
I am new to SAS and currently using SAS 9.4. I have a dataset containing several financial data for 1.708 companies. I am trying to winsorize this dataset because I am computing index indicators that are based on the average GVA, and there are some outliers in the GVA variable that are inflating the average, causing bias to the indexes. I am using Rick Wicklin's winsorize module (code below) but I keep getting the same error ((execution) Matrix has not been set to a value.) after trying to load all numeric variables into matrix x.
NOTE: Module WINSORIZE defined.
26
27 /* test the algorithm on numerical vars in a data set */
28 use &DSName;
29 read all var _NUM_ into X[colname=varNames];
30 close;
NOTE: Closing DEPO.GVA
31 winX = Winsorize(X, 0.01);
ERROR: (execution) Matrix has not been set to a value.
I have used proc contents and double checked that the variables are numeric, so I don't understand why this error is showing up and can't find any post with a similar problem. Can you please advise?
Many thanks!!
Code used:
libname depo "D:\OneDrive - Universidade do Algarve\Ambiente de Trabalho\Data\08-11-2023 Portugal";
%let DSName = depo.GVA;
proc iml;
/* SAS/IML module to Winsorize each column of a matrix.
Input proportion of observations to Winsorize: prop < 0.5.
Ex: y = Winsorize(x, 0.1) computes the two-side 10% Winsorized data */
start Winsorize(x, prop);
p = ncol(x); /* number of columns */
w = x; /* copy of x */
do i = 1 to p;
z = x[,i]; /* copy i_th column */
n = countn(z); /* count nonmissing values */
k = ceil(prop*n); /* number of obs to trim from each tail */
r = rank(z); /* rank values in i_th column */
/* find target values and obs with smaller/larger values */
lowIdx = loc(r<=k & r^=.);
lowVal = z[loc(r=k+1)];
highIdx = loc(r>=n-k+1);
highVal = z[loc(r=n-k)];
/* Winsorize (replace) k smallest and k largest values */
w[lowIdx,i] = lowVal;
w[highIdx,i] = highVal;
end;
return(w);
finish;
/* test the algorithm on numerical vars in a data set */
use &DSName;
read all var _NUM_ into X[colname=varNames];
close;
winX = Winsorize(X, 0.01);
create WinData from winX[colname=varNames];
append from winX;;
close;
proc means data=WinData;
run;
I will take a guess that you might have columns of data that are full of missing values, or with only a single non-missing value. Remove these variables first and you should be OK.
You've omitted the most important part of the log, which is the set of statements AFTER the ERROR. Those statements tell you which operation failed and the line number of the module on which the error occurred. Please include the entire log.
If you change the %LET statement to
%let DSName = sashelp.cars;
or
%let DSName = sashelp.class;
then the program runs without error, so I assume the issue is related to your data. For example, perhaps there is a variable that has all missing values.
It might be helpful if you post a summary of your data by running the following code:
proc means data=&DSName N NMISS Min Mean Std Max;
var _numeric_;
run;
Dear Rick,
Thank you for your response!
Indeed, some variables had all missing values, which I have now dropped. However, I am still getting the same error. I will post the entire log below:
154 %let DSName = depo.gvanonmiss;
155 proc iml;
NOTE: IML Ready
156 /* SAS/IML module to Winsorize each column of a matrix.
157 Input proportion of observations to Winsorize: prop < 0.5.
158 Ex: y = Winsorize(x, 0.1) computes the two-side 10% Winsorized data */
159 start Winsorize(x, prop);
160 p = ncol(x);
160! /* number of columns */
161 w = x;
161! /* copy of x */
162 do i = 1 to p;
163 z = x[,i];
163! /* copy i_th column */
164 n = countn(z);
164! /* count nonmissing values */
165 k = ceil(prop*n);
165! /* number of obs to trim from each tail */
166 r = rank(z);
166! /* rank values in i_th column */
167 /* find target values and obs with smaller/larger values */
168 lowIdx = loc(r<=k & r^=.);
169 lowVal = z[loc(r=k+1)];
170 highIdx = loc(r>=n-k+1);
171 highVal = z[loc(r=n-k)];
172 /* Winsorize (replace) k smallest and k largest values */
173 w[lowIdx,i] = lowVal;
174 w[highIdx,i] = highVal;
175 end;
176 return(w);
177 finish;
NOTE: Module WINSORIZE defined.
178
179 /* test the algorithm on numerical vars in a data set */
180 use &DSName;
181 read all var _NUM_ into X[colname=varNames];
182 close;
NOTE: Closing DEPO.GVANONMISS
183 winX = Winsorize(X, 0.05);
ERROR: (execution) Matrix has not been set to a value.
operation : [ at line 169 column 17
operands : z, _TEM1003
z 1708 rows 1 col (numeric)
_TEM1003 0 row 0 col (type ?, size 0)
statement : ASSIGN at line 169 column 7
traceback : module WINSORIZE at line 169 column 7
NOTE: Paused in module WINSORIZE.
184
185 create WinData from winX[colname=varNames];
ERROR: Matrix winX has not been set to a value.
statement : CREATE at line 185 column 1
186 append from winX;
ERROR: No data set is currently open for output.
statement : APPEND at line 186 column 1
186! ;
187 close;
188
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IML used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
189 proc means data=WinData;
ERROR: File WORK.WINDATA.DATA does not exist.
190 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
As per my data, there are over 600 variables, so I will post a printscreen of a portion of the summary if that helps:
Many thanks for the help!
I will take a guess that you might have columns of data that are full of missing values, or with only a single non-missing value. Remove these variables first and you should be OK.
Dear IanWakeling,
Thank you for your answer!
Yes, indeed, I did remove all the variables with only missing values but, however, there are some variables with only one non-missing value.in fact, this dataset contains financial information over a period of 24 years so some of the first years have little to no information.
I will try removing these variables and running the code and see if it works now.
Many thanks!
Dear IanWakeling,
Removing all variables with only one non-missing value worked. Thank you so much!
Kind regards,
Carina
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.