Hi. One of my regression covariates is LIMITK, a categorical variable indicating whether the respondent has a work-limiting condition. It has four levels: -9 (not applicable), -8 (did not answer), 1 (condition affects the type of work undertaken), and 2 (condition does not affect the type of work undertaken). I flag members of my subclass of interest with the indicator variable SUBCLASS and include the clause DOMAIN=SUBCLASS. Crucially, no records with SUBCLASS=1 are coded -8 for LIMITK. Nevertheless, the SAS output provides an estimate for the coefficient of the LIMITK dummy corresponding to the value -8. My best guess is that SAS finds it more expedient to keep the estimator the same across both domains and simply fill in zeros where there are no applicable values. This approach, while computationally expedient, would not affect the estimated parameter vector beta-hat. Does that make sense to you?

I'm also wondering whether there is much point in bothering with DOMAIN analysis at all when the overall sample runs into hundreds of thousands of records. Even if the domain contains a quarter of the records, the variance of the estimate of its size will be the variance of a sample proportion, p(1-p)/n, where n is very large. If this formula is at the heart of what SAS adds to the process when the DOMAIN statement is used (and I'm assuming it is, based on my reading of Kish (1965)), it seems to suggest that it really isn't worth the bother.
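For the first question, here is a minimal sketch in Python/NumPy, not SAS, of the idea that zeroing out records outside the domain leaves beta-hat unchanged. It assumes a plain weighted least-squares fit in which records outside the domain get weight zero, and it checks that the point estimates match a fit on the domain subset alone. The data, the weights, the `in_domain` indicator and the `wls` helper are all hypothetical. This is not SAS's internal algorithm, and the sketch only concerns point estimates, not standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # intercept + one covariate
y = 1.0 + 2.0 * x + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)             # hypothetical sampling weights
in_domain = rng.random(n) < 0.25              # hypothetical SUBCLASS == 1 flag

def wls(X, y, w):
    """Hypothetical helper: weighted least squares by scaling rows by sqrt(w).
    Rows with w == 0 become all-zero rows and contribute nothing to the fit."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Full data, but weights set to zero outside the domain
beta_zeroed = wls(X, y, np.where(in_domain, w, 0.0))
# Domain records only
beta_subset = wls(X[in_domain], y[in_domain], w[in_domain])

print(np.allclose(beta_zeroed, beta_subset))
```

Zero-weighted rows drop out of the normal equations entirely, so the two fits solve the same least-squares problem.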
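For the second question, a quick sketch of the arithmetic behind p(1-p)/n. It assumes simple random sampling. A weighted, stratified or clustered design can have a different variance (a design effect), so treat this only as the textbook baseline. The values p = 0.25 and n = 300,000 are hypothetical stand-ins for "a quarter of hundreds of thousands of records".

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion under simple random sampling:
    sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.25, 300_000   # hypothetical: domain is a quarter of 300,000 records
se = proportion_se(p, n)
print(f"SE = {se:.6f}, relative SE = {se / p:.4%}")
```

Here p(1-p)/n = 0.1875/300,000 = 6.25e-7, so the standard error is about 0.00079, roughly 0.32% of p.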