Dear all,
I wrote a simulation that about ten percent of the time end with a matrix data that contains one 'zero' column.
I identify those cases with if abs(det(xpx))=0.
I clean the zero column and I try to calculate the OLS coefficients.
But unfortunately I get a matrix without the OLS coefficients.
The program works but I don't get the 10 percent results matrix with data, even though I record the data files and there should be coefficients for the OLS. (data with the zero columns).
I don't understand why. I enclose the program. You can 'run' it and see the results.
1. Can someone understand why I get matrix without data for the 10 percent cases of the simulation?
2. Any suggestions how to correct that?
3. Maybe there is more efficient way to write this program that at the end I can get one matrix with also the 10 percent result, Ideas are welcome..
Thanks in advance!
Orit
proc iml;
n=100;
b=j(6,n);
bpsl=j(5,n);
bnsl=j(5,n);
bptl=j(5,n);
bntl=j(5,n);
do j=1 to n;
xtl=j(240, 1);
xsl=j(240, 1);
ptl=j(240, 1);
psl=j(240, 1);
ntl=j(240, 1);
nsl=j(240, 1);
y=j(240, 1);
xtl[1,1]=-1.73754E-16;*/SAS avg obs*/;
xsl[1,1] =-1.12557E-15;
ptl[1,1]=0.0051111;
psl[1,1] =0.2604981;
ntl[1,1]=-0.0051111;
nsl[1,1] =-0.2604981;
y[1,1] =0.9783607;
a=0.18935 ;*first regression*;
k=0.83843;
ka=0.17226;
kb=-0.39250;
kc=-0.21518;
kd=5.16021;
c=-0.03537; *second regression*;
e=0.02979;
d=0.94756;
h=-2.04415;
m=-0.00034939; *third regression*;
l=0.00023025;
q=-0.00045077;
r=0.98906;
/** specify the mean and covariance of the population **/
Mean = {0, 0, 0};
Cov = {0.0467930604 0.0024632274 0.0000520215,
0.0024632274 0.0080371426 0.0000384574,
0.0000520215 0.0000384574 0.0000022165};
NumSamples = 240;
call randseed(0); /** set seed for the RandNormal module **/
U = RandNormal(NumSamples, Mean, Cov);
do i=2 to 240;
Y[i,1]=a+k*y[(i-1),1]+ka*psl[(i-1),1]+kb*ptl[(i-1),1]+kc*nsl[(i-1),1]+kd*ntl[(i-1),1]+u[i,1];
xsl[i,1]=c+e*y[(i-1),1]+d*xsl[(i-1),1]+h*xtl[(i-1),1]+u[i,2];
xtl[i,1]=m+l*y[(i-1),1]+q*xsl[(i-1),1]+r*xtl[(i-1),1]+u[i,3];
if xsl[i,1]>=0 then do;
psl[i,1]=xsl[i,1];
nsl[i,1]=0;
end;
if xsl[i,1]<0 then do;
psl[i,1]=0;
nsl[i,1]=xsl[i,1];
end;
if xtl[i,1]>=0 then do;
ptl[i,1]=xtl[i,1];
ntl[i,1]=0;
end;
if xtl[i,1]<0 then do;
ptl[i,1]=0;
ntl[i,1]=xtl[i,1];
end;
end;
ylag=lag(y);
/* Step 1: Compute X`X and X`y */
x = j(nrow(y), 1, 1) || ylag || psl|| ptl ||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(x), v); /* start with 1:240, exclude values in v */
xnew = x[idx, ]; /* extract submatrix */
xpx = xnew`* xnew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
xpy = xnew` * ynew;
if abs(det(xpx))=0 then do;
z1 = ynew || xnew;
create SINGULAR from z1[c={Y Ylag psl ptl nsl ntl}];
append from z1;
close SINGULAR;
end;
if abs(det(xpx))=0 then do;
if ssq(psl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || ptl ||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bpsl[,j]= solve(zpz, xpy);
end;
if ssq(ptl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bptl[,j]= solve(zpz, xpy);
end;
if ssq(nsl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl|| ptl ||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = x[idx, ]; /* extract submatrix */
zpz = znew`* xnew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bnsl[,j]= solve(zpz, xpy);
end;
if ssq(ntl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl|| ptl ||nsl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(x), v); /* start with 1:240, exclude values in v */
znew = x[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bntl[,j]= solve(zpz, xpy);
end;
end;
IF(abs(DET(xpx)))>0 then do;
*xpxi = inv(xpx); /* form inverse crossproducts */
*b[,j] = xpxi * xpy; /* solve for parameter estimates */
*b= solve(xpx, xpy);
*print b;
b[,j]= solve(xpx, xpy);
end;
end;
bt=b`;
varNames = "b0":"b5";
experiment="Sim1":"Sim100";
print bt[colname=varNames
rowname=experiment
label="Expanded Model"];
bpslt=bpsl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim100";
print bpslt[colname=varNames
rowname=experiment
label="Expanded Model"];
bnslt=bnsl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim100";
print bnslt[colname=varNames
rowname=experiment
label="Expanded Model"];
bptlt=bptl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim100";
print bptlt[colname=varNames
rowname=experiment
label="Expanded Model"];
bntlt=bntl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim100";
print bntlt[colname=varNames
rowname=experiment
label="Expanded Model"];
quit;
proc print data=singular;
run;
Here's an example of how I'd solve the problem without using any determinants or IF-THEN statements.
proc iml;
x = randnormal(100, j(1,6,0), I(6));
y = randnormal(100, 0, 1);
x[,3]=0; x[,6]=0; /* set some columns to exact 0 */
xpx = x`*x; /* normal equations */
xpy = x`*y;
/* Option 1: Use generalized inverse to solve (possibly singular) normal equations */
beta = ginv(xpx)* xpy;
print beta;
/* Option 2: delete columns of xpx that are exactly zero.
Set coeffs of deleted columns to missing */
beta = j(nrow(xpx), 1, .); /* or you can initialize beta to zero */
dropVars = loc( xpx[##,]=0 );
keepVars = setdif(1:ncol(xpx), dropVars);
beta[keepVars] = solve(xpx[keepVars,keepVars], xpy[keepVars]);
print beta;
Given the complexity of your program, it is difficult to answer your questions.
Try this: Make the program reproducible by using a positive seed to the RANDSEED call, such as seed=1. You then can say something like
"During iteration j=3 [or whatever], my program computes [specify matrices and values that you don't understand]. I expect to see [what you think should happen]. Can anyone explain why I don't get what I expect?"
Thank you Rick for your tip. I'll try to be more clear.
I used Rick tip- I used in the RANDSEED call seed=1,
and during iteration 96 I got column of zero's in the variable 'ptl'.
I have to exclude that variable because the matrix is not invertible.
I catch those situations by the if statement if abs(det(xpx))=0 then do;
And then I have to recognize that variable ( ssq(ptl)=0) and exclude him by:
if ssq(ptl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl || nsl || ntl; /*Without ptl variable */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
v = {1}; /* specify rows to exclude */
idy = setdif(1:nrow(y), v); /* start with 1:5, exclude values in v */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bptl[,j]= solve(zpz, xpy);
end;
But I get the matrix bptl (bptlt=bptl`;) of coefficients OLS results empty in simulation 96.
I'll really appreciate any help,
Thanks,
Orit
I think you cut and pasted and forgot to change variable names. Shouldn't the assignment statement be
bptl[,j]= solve(zpz, zpy); /* ZPY, not XPY */
If this is the mistake, you've made it other places as well.
I'll also mention that you don't need to compute IDY. IDX and IDY are the same. This is fortunate because your program computes
ynew = y[idx, ]; /* should be IDY in subscript */
but it doesn't matter because IDX=IDY.
Your code assumes that you never have multiple zero columns. That is ssq(psl)=0 and ssq(ptl)=0 are assumed never to happen simultaneously. If you can have multiple columns be zero simultaneously, you might consider treating all cases at once:
1) Form x matrix
2) compute ssq = x[##, ]; /* ssq for each column */
3) badCols = loc(ssq=0); /* check for all zeros */
4) remove the bad columns, then compute parameter estimates for the good columns, etc
Dear Rick,
Thanks for your comments and ideas, I corrected those mistakes.
But, unfortunately SAS doesn't recognize this part of the program,( in the 'log' it seems that everything is o.k)
I tried to print the Z, ZPZ, ZPY matrices but I get "ERROR: Matrix z has not been set to a value"
even though in iteration 96 it should be with a value.
Why it doesn't create the Z matrix? ( in the part that starts with: if abs(det(xpx))=0 then do;)
Thank you!
Orit
enclose the corrected program:
proc iml;
n=100;
b=j(6,n);
bpsl=j(5,n);
bnsl=j(5,n);
bptl=j(5,n);
bntl=j(5,n);
do j=1 to n;
xtl=j(240, 1);
xsl=j(240, 1);
ptl=j(240, 1);
psl=j(240, 1);
ntl=j(240, 1);
nsl=j(240, 1);
y=j(240, 1);
xtl[1,1]=-1.73754E-16;*/SAS avg obs*/;
xsl[1,1] =-1.12557E-15;
ptl[1,1]=0.0051111;
psl[1,1] =0.2604981;
ntl[1,1]=-0.0051111;
nsl[1,1] =-0.2604981;
y[1,1] =0.9783607;
a=0.18935 ;*first regression*;
k=0.83843;
ka=0.17226;
kb=-0.39250;
kc=-0.21518;
kd=5.16021;
c=-0.03537; *second regression*;
e=0.02979;
d=0.94756;
h=-2.04415;
m=-0.00034939; *third regression*;
l=0.00023025;
q=-0.00045077;
r=0.98906;
/** specify the mean and covariance of the population **/
Mean = {0, 0, 0};
Cov = {0.0467930604 0.0024632274 0.0000520215,
0.0024632274 0.0080371426 0.0000384574,
0.0000520215 0.0000384574 0.0000022165};
NumSamples = 240;
call randseed(1); /** set seed for the RandNormal module **/
U = RandNormal(NumSamples, Mean, Cov);
do i=2 to 240;
Y[i,1]=a+k*y[(i-1),1]+ka*psl[(i-1),1]+kb*ptl[(i-1),1]+kc*nsl[(i-1),1]+kd*ntl[(i-1),1]+u[i,1];
xsl[i,1]=c+e*y[(i-1),1]+d*xsl[(i-1),1]+h*xtl[(i-1),1]+u[i,2];
xtl[i,1]=m+l*y[(i-1),1]+q*xsl[(i-1),1]+r*xtl[(i-1),1]+u[i,3];
if xsl[i,1]>=0 then do;
psl[i,1]=xsl[i,1];
nsl[i,1]=0;
end;
if xsl[i,1]<0 then do;
psl[i,1]=0;
nsl[i,1]=xsl[i,1];
end;
if xtl[i,1]>=0 then do;
ptl[i,1]=xtl[i,1];
ntl[i,1]=0;
end;
if xtl[i,1]<0 then do;
ptl[i,1]=0;
ntl[i,1]=xtl[i,1];
end;
end;
ylag=lag(y);
/* Step 1: Compute X`X and X`y */
x = j(nrow(y), 1, 1) || ylag || psl|| ptl ||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(x), v); /* start with 1:240, exclude values in v */
xnew = x[idx, ]; /* extract submatrix */
xpx = xnew`* xnew; /* cross-products */
ynew = y[idx, ]; /* extract submatrix*/
xpy = xnew` * ynew;
if abs(det(xpx))=0 then do;
z1 = ynew || xnew;
create SINGULAR from z1[c={Y Ylag psl ptl nsl ntl}];
append from z1;
close SINGULAR;
end;
if abs(det(xpx))=0 then do;
if ssq(psl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || ptl ||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bpsl[,j]= solve(zpz, zpy);
end;
if ssq(ptl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl||nsl||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bptl[,j]= solve(zpz, zpy);
end;
if ssq(nsl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl|| ptl ||ntl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(z), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* xnew; /* cross-products */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bnsl[,j]= solve(zpz, zpy);
end;
if ssq(ntl)=0 then do;
z = j(nrow(y), 1, 1) || ylag || psl|| ptl ||nsl; /* add intercept column */
v = {1}; /* specify rows to exclude */
idx = setdif(1:nrow(x), v); /* start with 1:240, exclude values in v */
znew = z[idx, ]; /* extract submatrix */
zpz = znew`* znew; /* cross-products */
ynew = y[idx, ]; /* extract submatrix*/
zpy = znew` * ynew;
bntl[,j]= solve(zpz, zpy);
end;
end;
IF(abs(DET(xpx)))>0 then do;
*xpxi = inv(xpx); /* form inverse crossproducts */
*b[,j] = xpxi * xpy; /* solve for parameter estimates */
*b= solve(xpx, xpy);
*print b;
b[,j]= solve(xpx, xpy);
end;
end;
bt=b`;
varNames = "b0":"b5";
experiment="Sim1":"Sim10000";
print bt[colname=varNames
rowname=experiment
label="Expanded Model"];
bpslt=bpsl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim10000";
print bpslt[colname=varNames
rowname=experiment
label="Expanded Model"];
bnslt=bnsl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim10000";
print bnslt[colname=varNames
rowname=experiment
label="Expanded Model"];
bptlt=bptl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim10000";
print bptlt[colname=varNames
rowname=experiment
label="Expanded Model"];
bntlt=bntl`;
varNames = "b0":"b4";
experiment="Sim1":"Sim10000";
print bntlt[colname=varNames
rowname=experiment
label="Expanded Model"];
quit;
proc print data=singular;
run;
When you compute ssq(psl), ssq(ptl), etc, you are including the first observation of x in the computation. That observation was deleted when you formed xnew.
I think you want to examine whether any values of xnew[##,] are zero, or alternately ssq(psl[idx]).
Dear Rick,
I also used your advice and consider to treat all 'special' cases at once:
if abs(det(xpx))=0 then do;
ssq = xnew[##, ];
znew=xnew[ ,loc(ssq>0)];
zpz = znew`* znew;
zpy = znew` * ynew;
bpsl[,j]= solve(zpz, zpy);
end;
Even though it's really more short and efficient, the problem that I get coefficients but I don't know for which variables.
In the cases where if abs(det(xpx))=0, each time the 'zero' columns come from different variable.
For statistical checks it's really important that I know for each variable his coefficient.
Do you think there is a way in this efficient code that I can know for each coefficient his variable?
Thanks,
Orit
Yes, there is some bookkeeping that I left out that you would need to figure out. Given that you are having lots of problems getting the code to run, I suggest that you ignore my suggestion and save the generalizations for another day.
Dear Rick,
Thank you for your great tips!
Given I have both of your books, maybe there are pages that you recommend me to read about the generalizations ..?
Have a great day,
Orit
When you exclude a column, what value do you assign to the coefficient? Zero? One? Missing?
If a variable has the constant value 0, the corresponding regression coefficient can have any arbitrary value. I guess setting the coefficient to 'missing' makes the most sense?
Dear Rick,
Yes you right. When the variable is a column of zero the coefficient should be missing.
The matrix of the coefficients should be something like:
bo b1 b2 b3 b4 b5 b6
sim1 0.5 0.2 0.3 . 0.8 0.9 0.05
sim2 0.5 0.2 0.3 0.8 0.8 . 0.05
sim3 0.5 0.2 0.3 . 0.8 0.9 .
sim4 0.5 0.2 0.3 0.7 0.8 0.9 0.33
etc..
I would really appreciate your opinion.
Thanks,
Orit
Here's an example of how I'd solve the problem without using any determinants or IF-THEN statements.
proc iml;
x = randnormal(100, j(1,6,0), I(6));
y = randnormal(100, 0, 1);
x[,3]=0; x[,6]=0; /* set some columns to exact 0 */
xpx = x`*x; /* normal equations */
xpy = x`*y;
/* Option 1: Use generalized inverse to solve (possibly singular) normal equations */
beta = ginv(xpx)* xpy;
print beta;
/* Option 2: delete columns of xpx that are exactly zero.
Set coeffs of deleted columns to missing */
beta = j(nrow(xpx), 1, .); /* or you can initialize beta to zero */
dropVars = loc( xpx[##,]=0 );
keepVars = setdif(1:ncol(xpx), dropVars);
beta[keepVars] = solve(xpx[keepVars,keepVars], xpy[keepVars]);
print beta;
Brilliant !!
Thanks so much,
Orit
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.