03-04-2014 04:41 AM
I am wondering whether proc glimmix or proc surveylogistic is the right proc for our analysis.
We analyze data from a European survey (34 countries) on quality of life. The sampling procedure is described as a "multi-stage stratified and clustered sampling design". Our outcome variable is binary.
There are three levels of data to be taken into account (individuals, regions, countries). I have read a few things about these procedures. Some suggest that proc glimmix should be used (since one can take into account the three levels of data), but others state that one should apply proc surveylogistic, since the parameter estimates might otherwise be biased. However, in that procedure one cannot consider the different levels of data.
Can you recommend which procedure should be used?
Additionally, we are considering performing a multiple imputation for missing values (we have just a small amount of missing data in our regular variables, but want to adjust for one variable that has 23% missing data). Proc mianalyze needs, as I have read, the parameter estimates and covariance matrices, which at least in proc glimmix are not part of the output by default. How does it work with proc surveylogistic?
Thank you for your help in advance!
03-05-2014 04:15 PM
There is no simple answer to your question. It depends on your objectives, and you can find examples of both approaches (SURVEYLOGISTIC or GLIMMIX). SURVEYLOGISTIC (like all the "SURVEY" PROCs) is essentially a fixed-effects procedure. It can explicitly handle two levels of a hierarchy. The "SURVEY" PROCs are for so-called design-based analyses, in which the populations being analyzed are considered finite. The goal is usually to estimate global parameters, such as the expected value, very precisely, and taking the survey design into account helps in getting those precise estimates. Many would tell you to use this approach.
However, a so-called model-based analysis can be done with mixed-model software, such as GLIMMIX. In that case you are not taking into account the (variable) sampling weights, so you can get some bias in your results for the global expected value, although use of covariates can apparently minimize that bias. If you consider the factor effects in your hierarchy to be random (a reasonable assumption), then you want variance estimates for each factor. You can get these with GLIMMIX (or MIXED for normal data). The book "Small Area Estimation" by J.N.K. Rao makes a good argument in favor of the model-based approach (this book is very heavy on theory). The approach gives a very natural way to get "small area estimates" or "small area predictions" for specific levels of the factors (say, for region 1 of country 2). Essentially, you are using BLUPs. With design-based analyses, you often need to use synthetic or composite 'estimators' for the same thing.
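To make the contrast concrete, here is a minimal sketch of what the two approaches might look like in code. The data set name (qol) and all variable names (good_qol, age, female, country, region, stratum, psu, sampwt) are hypothetical placeholders, not taken from your survey, so treat this as an outline under those assumptions rather than a finished analysis.

```sas
/* Design-based sketch: strata, clusters, and sampling weights      */
/* enter the estimation. All names below are hypothetical.          */
proc surveylogistic data=qol;
   strata  stratum;     /* assumed stratification variable          */
   cluster psu;         /* assumed primary sampling unit            */
   weight  sampwt;      /* assumed sampling weight                  */
   model good_qol(event='1') = age female;
run;

/* Model-based sketch: random intercepts for countries and for      */
/* regions nested within countries; sampling weights are NOT used.  */
proc glimmix data=qol method=laplace;
   class country region;
   model good_qol(event='1') = age female
         / dist=binary link=logit solution;
   /* SOLUTION on a RANDOM statement prints the predicted random    */
   /* effects (the BLUP-type predictions mentioned above)           */
   random intercept / subject=country solution;
   random intercept / subject=region(country) solution;
run;
```

The first step reports design-based estimates of the fixed effects only. The second adds covariance parameter estimates for the country and region levels, plus the predicted random effects for each country and each region within country.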
So, I am not telling you which approach you should take. I recommend you read "Small area estimation for survey data analysis using SAS software" by P. K. Mukhopadhyay and A. McDowell (2011 SAS Global Forum; find it online). It gives a nice comparison of the design-based and model-based approaches (although for normal data and two levels). Also, if you have SAS/STAT 13.1 (newly released), then GLIMMIX has new features for model-based analyses of survey data that allow for survey weights (different from the usual weights in GLIMMIX). An example is 43.18 in the GLIMMIX User's Guide (for 13.1). The method is called "pseudo-likelihood for weighted multi-level models", although this "pseudo-likelihood" is different from the pseudo-likelihood calculated with the linearization methods for GLMMs.
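For orientation, a weighted multilevel fit in 13.1 might look roughly like the sketch below. It is a two-level outline only (individuals within regions). The data set and variable names are hypothetical, and it assumes you have a separate weight for each level (w_person for individuals within a region, w_region for regions), which may need to be derived and scaled from the overall sampling weight. Check the User's Guide example for the exact setup.

```sas
/* Weighted two-level sketch, assuming SAS/STAT 13.1 or later.      */
/* qol, good_qol, age, female, region, w_person, and w_region are   */
/* hypothetical names used for illustration only.                   */
proc glimmix data=qol method=quadrature(qpoints=5) empirical=classical;
   class region;
   model good_qol(event='1') = age female
         / dist=binary link=logit solution
           obsweight=w_person;   /* observation-level survey weight */
   random intercept / subject=region
                      weight=w_region;  /* cluster-level weight     */
run;
```

EMPIRICAL=CLASSICAL requests sandwich-type standard errors, which is the usual choice when the likelihood is weighted in this way.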