Fluorite | Level 6

Hello everyone, 

I am new to statistical programming for multilevel/mixed modeling. I am conducting a study in which the outcome is a binary (yes/no) indicator of whether a facility provides HIV treatment. My primary independent variable is the facility payment type, i.e. whether the facility accepts private insurance, public insurance, or cash payment; covariates include the census region, state policy on HIV programs, and state HIV prevalence rate.

I am fitting a logistic regression as follows:

logit(P(provision of HIV treatment = yes)) = B0 + B1 (payment type) + B2 (census region) + B3 (state HIV policy) + B4 (tertile of state HIV prevalence rate)
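In SAS terms, the model above could be sketched roughly as follows. The dataset name (analysis) and all variable names are hypothetical placeholders, and the outcome is assumed to be coded 1 = yes, 0 = no.

```sas
/* Minimal sketch: ordinary (fixed-effects) logistic regression.        */
/* ASSUMPTIONS: dataset "analysis" has one row per facility;            */
/* hiv_treatment is coded 1 = yes, 0 = no; all names are hypothetical.  */
proc logistic data=analysis;
    /* Treat every predictor as categorical with reference coding */
    class payment_type census_region hiv_policy prev_tertile / param=ref;
    /* Model the probability that the facility provides HIV treatment */
    model hiv_treatment(event='1') = payment_type census_region
                                     hiv_policy prev_tertile;
run;
```

The odds ratios for payment_type from this fit are the quantities of primary interest.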

My goal is to assess the association between payment type and provision of HIV treatment. My concern is that the unit of analysis for this study is the facility, whereas state HIV policy and prevalence rate come from state-level data and are assigned to each facility according to the state in which it is located.


For example: the prevalence tertile takes the values 0, 1, and 2. If California has the highest HIV prevalence rate, it falls in the highest tertile group and gets the value 2.

Because California has a prevalence tertile of 2, all facilities in California are assigned the value 2 for the purpose of the analysis. Similarly, facilities in other states are assigned the value of the tertile into which their state falls.
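A minimal sketch of this assignment step, assuming two hypothetical datasets: facilities (one row per facility, with a state variable) and state_info (one row per state, with hiv_policy and prev_tertile). None of these names come from my actual data.

```sas
/* Attach state-level values to every facility in that state.           */
/* ASSUMPTION: "facilities" and "state_info" are hypothetical datasets  */
/* that share a common key variable named state.                        */
proc sql;
    create table analysis as
    select f.*, s.hiv_policy, s.prev_tertile
    from facilities as f
    left join state_info as s
        on f.state = s.state;
quit;

/* Count facilities per state and tertile to see how unevenly */
/* the states contribute to the facility-level analysis.      */
proc freq data=analysis;
    tables state*prev_tertile / list missing;
run;
```

The frequency table shows which states contribute many facilities and which contribute few, which is the imbalance I am worried about.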


I am not sure which model is correct here. Will a simple PROC LOGISTIC do, and is it valid to extend state-level information to the facility level, given that some states have a large number of facilities and others have very few?


Your help is appreciated!


Jade | Level 19

The only way I can imagine fitting a mixed model, rather than a fixed-effects model, with the predictors you have is if the census regions in your data are not exhaustive of all the census regions to which you want your inferences to apply (a G-side effect), or if some sort of spatial correlation is to be applied to that variable (an R-side effect). In either case, a generalized estimating equation (GEE) model with census_region as the clustering variable may be more tenable in terms of model convergence and run time.
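As a rough sketch of the GEE approach, something along these lines could be tried in PROC GENMOD. The dataset and variable names are hypothetical, the outcome is assumed to be coded 1 = yes, 0 = no, and the exchangeable working correlation is only one possible choice.

```sas
/* Minimal GEE sketch: logistic model with census_region as the cluster. */
/* ASSUMPTIONS: dataset "analysis" and all variable names are            */
/* hypothetical; hiv_treatment is coded 1 = yes, 0 = no.                 */
proc genmod data=analysis descending;   /* descending: model P(outcome = 1) */
    /* The SUBJECT= variable must appear in the CLASS statement */
    class census_region payment_type hiv_policy prev_tertile;
    model hiv_treatment = payment_type census_region hiv_policy prev_tertile
          / dist=binomial link=logit;
    /* Exchangeable working correlation among facilities in a region */
    repeated subject=census_region / type=exch;
run;
```

With only a few clusters the empirical standard errors can be unstable, so it is worth comparing these results with the plain PROC LOGISTIC fit.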


