Dear ALL,
I am testing some code to compute the variance via Taylor series linearization for a stratified cluster design. However my answer is only correct up to 2 significant digits. Can anyone offer suggestions as to why my answer is not exactly the same as when done with PROC SURVEYREG. See link for formulas SAS/STAT(R) 9.22 User's Guide. I initially thought that the inv( ) function was the culprit but that does not seem to be the case here.
/* TEST CODE */
proc surveyreg data=exexx;
model ptincom = cage white;
weight suppwgt;
cluster keyfitz;
strata strata;
output out = simex residual = res;
run;
proc iml;
use simex;
read all var {cage white} into X;
read all var {ptincom} into y;
read all var {suppwgt} into w;
read all var {strata} into strata;
read all var {keyfitz} into psu;
read all var {res} into e;
close simex;
X = J(nrow(y),1) || X;
M = J(3,3,0);
ustrata = unique(strata);
do h = 1 to ncol(ustrata); /* strata index*/
hx = loc(strata = ustrata);
ww = w[hx];
ee = e[hx];
XX = X[hx,];
clx = psu[hx];
uclx = unique(clx);
U = J(3,ncol(uclx),.);
do i = 1 to ncol(uclx); /* cluster (PSU) index*/
dx = loc(clx = uclx);
d = ww[dx]#ee[dx];
P = t(d#XX[dx,]);
eb = P[,+]; /* (p x 1) sum columns */
U[,i] = eb;
end;
Z = U[,:]; /* row means */
nh = ncol(U); /* number of PSUs in each stratum */
R = nh/(nh - 1)*(U - Z)*t(U - Z);
M = M + R;
end;
n = nrow(X);
M = (n-1)/(n-3)*M;
B = t(X)*(w#X); /* X' D X where D = diag(w) */
V = inv(B)*M*inv(B); /* covariance estimation */
se = sqrt(vecdiag(V)); /* estimated standard error */
print se;
quit;
Regards,
Raphael