Contributor
Posts: 21

Variance Estimation PROC SURVEYREG

Dear ALL,

I am testing some code to compute the variance via Taylor series linearization for a stratified cluster design. However my answer is only correct up to 2 significant digits. Can anyone offer suggestions as to why my answer is not exactly the same as when done with PROC SURVEYREG. See link for formulas SAS/STAT(R) 9.22 User's Guide. I initially thought that the inv( ) function was the culprit but that does not seem to be the case here.

/* TEST CODE */

proc surveyreg data=exexx;

model ptincom = cage white;

weight suppwgt;

cluster keyfitz;

strata strata;

output out = simex residual = res;

run;

proc iml;

use simex;

read all var {cage white} into X;

read all var {ptincom} into y;

read all var {suppwgt} into w;

read all var {strata} into strata;

read all var {keyfitz} into psu;

read all var {res} into e;

close simex;

X = J(nrow(y),1) || X;

M = J(3,3,0);

ustrata = unique(strata);

do h = 1 to ncol(ustrata);                /* strata index*/

hx = loc(strata = ustrata);

ww = w[hx];

ee = e[hx];

XX = X[hx,];

clx = psu[hx];

uclx = unique(clx);

U = J(3,ncol(uclx),.);

do i = 1 to ncol(uclx);           /* cluster (PSU) index*/

dx = loc(clx = uclx);

d = ww[dx]#ee[dx];

P = t(d#XX[dx,]);

eb = P[,+];                /* (p x 1) sum columns */

U[,i] = eb;

end;

Z = U[,:];             /* row means */

nh = ncol(U);           /* number of PSUs in each stratum */

R = nh/(nh - 1)*(U - Z)*t(U - Z);

M = M + R;

end;

n = nrow(X);

M = (n-1)/(n-3)*M;

B = t(X)*(w#X); /* X' D X where D = diag(w) */

V = inv(B)*M*inv(B); /* covariance estimation  */

se = sqrt(vecdiag(V)); /* estimated standard error */

print se;

quit;

Regards,

Raphael

SAS Super FREQ
Posts: 4,172

Re: Variance Estimation PROC SURVEYREG

Interesting program.   I am not an expert in survey statistics, but I don't see anything obviously wrong.  Do you have any missing values in your data? Any degenerate strata? That might affect DOF computations.

There is a SAS macro that does computations similar to SURVEYREG. You might try the macro and see what results it gives.  If it matches the PROC, then look at the code it calls and see if you can find where it differs from your code.

You didn't include data, so no way for anyone to run your code.  The following simulation code creates a data set that may or may not be similar to your data. For this data set, the standard errors from SURVEYREG and IML match, so I assume that there is something special about your data.

data exexx;

call streaminit(1234);

do strata = 1 to 5;

do keyfitz = 1 to 4;

do i = 1 to 3+rand("poisson",5);

cage=rand("Normal");

white=rand("Normal");

ptincom=3+0.5*cage-1.9*white + rand("Normal");

suppwgt=strata+10;

output;

end;

end;

end;

run;

Contributor
Posts: 21

Re: Variance Estimation PROC SURVEYREG

It turns that the problem is with the data and not the code. Thank you so much.

Raphael

Super User
Posts: 13,338

Re: Variance Estimation PROC SURVEYREG

Do any of the variables used in the surveyreg except the weight variable have a format assigned? If so, might it cause some grouping?

If you have values in the data of 12345.678 and 12345.999 and the format is f5. then both could be treated as 12345 in any calculations.

Discussion stats
• 3 replies
• 400 views
• 7 likes
• 3 in conversation