Beneath the hood of the DOMAIN statement in PROC SURVEYLOGISTIC

Hello -


I have a complex survey sample and want to fit a logistic regression to a subsample. An observation is retained in the subsample if a variable D takes the value 1 and excluded if D takes the value 0. A complication is that I don't know what proportion of the overall population (sampling frame) has D=1. D does not define a stratum or a cluster in the design, and it occurs in varying proportions across the strata.


Leslie Kish, in his classic book 'Survey Sampling' (1965) calls analysis of such a subsample "subclass analysis", and gives formulas for the estimation of a mean of a variable across such a subsample, as well as the variance of the mean. The variance, in particular, is inflated because of uncertainty around the true proportion of the population in each stratum for which D=1 holds.
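
To make the subclass case concrete, here is a sketch of the subclass mean written as a ratio estimator, with its usual Taylor-linearization variance. The notation is my own assumption rather than Kish's, and the variance is the with-replacement form with no finite population correction.

```latex
% Assumed notation: strata h = 1..H, sampled PSUs i = 1..n_h, units j,
% sampling weights w_{hij}, and \delta_{hij} = 1 if D = 1, 0 otherwise.

% Subclass mean as a ratio of two estimated totals
\[
\hat{\bar{Y}}_D = \frac{\hat{Y}_D}{\hat{N}_D}
  = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\,\delta_{hij}\,y_{hij}}
         {\sum_{h}\sum_{i}\sum_{j} w_{hij}\,\delta_{hij}}
\]

% Linearized PSU-level totals and their stratum means
\[
z_{hi} = \frac{1}{\hat{N}_D}\sum_{j} w_{hij}\,\delta_{hij}\,
         \bigl(y_{hij} - \hat{\bar{Y}}_D\bigr),
\qquad
\bar{z}_{h} = \frac{1}{n_h}\sum_{i=1}^{n_h} z_{hi}
\]

% Variance estimate: the inner sum runs over ALL sampled PSUs in stratum h,
% including PSUs with no D = 1 cases, for which z_{hi} = 0
\[
\widehat{\operatorname{Var}}\bigl(\hat{\bar{Y}}_D\bigr)
  = \sum_{h=1}^{H} \frac{n_h}{n_h - 1}
    \sum_{i=1}^{n_h} \bigl(z_{hi} - \bar{z}_{h}\bigr)^2
\]
```

Because the denominator is itself an estimate and every sampled PSU contributes to the sum, the variance reflects the uncertainty in the subclass size. Dropping the D=0 records before analysis would lose that component.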


In the SAS/STAT procedure SURVEYLOGISTIC, all that is necessary to achieve correct estimation of regression parameters in this situation is to include the statement DOMAIN D;. However, I have reviewed the full documentation for SAS/STAT, as well as a range of methodological papers, and nowhere is it clear exactly what formula PROC SURVEYLOGISTIC uses to estimate the variance of the logistic regression parameters for a domain. I am writing a research paper, and need to satisfy myself as to exactly what the software is doing to analyse my data.
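
For concreteness, a minimal sketch of the kind of call in question. The data set name (mysurvey) and every variable name other than D (stratum, psu, wt, y, x1, x2) are hypothetical placeholders, not my actual data. Note that the DOMAIN statement takes the variable name directly.

```sas
/* Sketch only: mysurvey, stratum, psu, wt, y, x1 and x2 are hypothetical names */
proc surveylogistic data=mysurvey;
   strata  stratum;               /* design strata                              */
   cluster psu;                   /* primary sampling units                     */
   weight  wt;                    /* sampling weights                           */
   domain  D;                     /* separate analysis for each level of D      */
   model   y(event='1') = x1 x2;  /* logistic model for a binary outcome y      */
run;
```

The full sample is passed to the procedure, with no WHERE clause or prior subsetting on D, so that the complete design is available for variance estimation. The results of interest are those reported for the D=1 domain.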


If anybody knows where I might find the exact mathematics behind the DOMAIN statement in PROC SURVEYLOGISTIC, I would be very grateful for a pointer.


Many thanks,

