Dear ALL,
I am testing some code to compute the variance via Taylor series linearization for a stratified cluster design. However my answer is only correct up to 2 significant digits. Can anyone offer suggestions as to why my answer is not exactly the same as when done with PROC SURVEYREG. See link for formulas SAS/STAT(R) 9.22 User's Guide. I initially thought that the inv( ) function was the culprit but that does not seem to be the case here.
/* TEST CODE */
proc surveyreg data=exexx;
model ptincom = cage white;
weight suppwgt;
cluster keyfitz;
strata strata;
output out = simex residual = res;
run;
proc iml;
use simex;
read all var {cage white} into X;
read all var {ptincom} into y;
read all var {suppwgt} into w;
read all var {strata} into strata;
read all var {keyfitz} into psu;
read all var {res} into e;
close simex;
X = J(nrow(y),1) || X;
M = J(3,3,0);
ustrata = unique(strata);
do h = 1 to ncol(ustrata); /* strata index*/
hx = loc(strata = ustrata
ww = w[hx];
ee = e[hx];
XX = X[hx,];
clx = psu[hx];
uclx = unique(clx);
U = J(3,ncol(uclx),.);
do i = 1 to ncol(uclx); /* cluster (PSU) index*/
dx = loc(clx = uclx);
d = ww[dx]#ee[dx];
P = t(d#XX[dx,]);
eb = P[,+]; /* (p x 1) sum columns */
U[,i] = eb;
end;
Z = U[,:]; /* row means */
nh = ncol(U); /* number of PSUs in each stratum */
R = nh/(nh - 1)*(U - Z)*t(U - Z);
M = M + R;
end;
n = nrow(X);
M = (n-1)/(n-3)*M;
B = t(X)*(w#X); /* X' D X where D = diag(w) */
V = inv(B)*M*inv(B); /* covariance estimation */
se = sqrt(vecdiag(V)); /* estimated standard error */
print se;
quit;
Regards,
Raphael
Interesting program. I am not an expert in survey statistics, but I don't see anything obviously wrong. Do you have any missing values in your data? Any degenerate strata? That might affect DOF computations.
There is a SAS macro that does computations similar to SURVEYREG. You might try the macro and see what results it gives. If it matches the PROC, then look at the code it calls and see if you can find where it differs from your code.
You didn't include data, so no way for anyone to run your code. The following simulation code creates a data set that may or may not be similar to your data. For this data set, the standard errors from SURVEYREG and IML match, so I assume that there is something special about your data.
data exexx;
call streaminit(1234);
do strata = 1 to 5;
do keyfitz = 1 to 4;
do i = 1 to 3+rand("poisson",5);
cage=rand("Normal");
white=rand("Normal");
ptincom=3+0.5*cage-1.9*white + rand("Normal");
suppwgt=strata+10;
output;
end;
end;
end;
run;
It turns that the problem is with the data and not the code. Thank you so much.
Raphael
Do any of the variables used in the surveyreg except the weight variable have a format assigned? If so, might it cause some grouping?
If you have values in the data of 12345.678 and 12345.999 and the format is f5. then both could be treated as 12345 in any calculations.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.