lbahanan
Calcite | Level 5

Hello,

 

Is it OK to use PROC SURVEYLOGISTIC to compare a group of 41 participants with 10144 participants using national survey data (NHANES) and weights?

Or is there a special syntax?

 

Thank you!

sbxkoenk
SAS Super FREQ

Hello,

 

Can you elaborate a bit?

What exactly do you want to do?

 

Do you want to fit a binary response model where 41 people have a 1-response and 10144 have a 0-response (or vice versa)?

 

Where exactly do you have this severe unbalanced-ness (if that is an existing English word)?

 

Thanks,

Koen

lbahanan
Calcite | Level 5
Yes, exactly. I want to fit a binary response model where 41 people have a 1-response and 10144 have a 0-response. This variable is the predictor (independent variable), and I have other confounders in the model.
sbxkoenk
SAS Super FREQ

Hello,

 

I am not really familiar with survey research.

There may be some peculiarities and intricacies when the data at hand are survey data.

 

Anyway, you have very few observations in the rare category (n = 41).

Thus, what you want to do is known as rare event modelling.

The problem is that maximum likelihood estimation of the parameters of the logistic model is well known to suffer from substantial bias when you have such a small number of cases in the rarer of the two categories.

 

You could use a method, known as "penalized likelihood" (also called the Firth method, after its inventor). Penalized likelihood is a general approach to reducing small-sample bias in maximum likelihood estimation.

In PROC LOGISTIC there's the "firth" option on the model statement, but PROC SURVEYLOGISTIC does not have this option.
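
For reference, here is a minimal sketch of what the FIRTH option looks like in PROC LOGISTIC. The data set name (work.analysis) and the variable names (outcome, rare_group, age, sex) are placeholders I made up for illustration, not names from the question. Note that this sketch has no STRATA or CLUSTER information, so its standard errors would not reflect a complex survey design such as NHANES.

```sas
/* Sketch only: work.analysis, outcome, rare_group, age and sex are hypothetical names. */
/* FIRTH requests penalized (Firth) likelihood estimation on the MODEL statement.       */
/* CLPARM=PL and CLODDS=PL request profile-likelihood confidence limits.                */
proc logistic data=work.analysis;
   class sex / param=ref;
   model outcome(event='1') = rare_group age sex / firth clparm=pl clodds=pl;
run;
```

Profile-likelihood limits are requested here because Wald limits can be unreliable when one category is very sparse; whether that choice suits your analysis is a judgment call.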

 

I hope a survey analytics specialist will chime in to help you out.

 

See also this paper from SAS Global Forum 2020:

Paper 4654-2020
Rare Events or Non-Convergence with a Binary Outcome? The Power of Firth Regression in PROC LOGISTIC
Patrick Karabon, Oakland University William Beaumont School of Medicine

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4654-2020.pdf

 

Kind regards,

Koen

lbahanan
Calcite | Level 5
Thank you so much for the helpful comment!

Unfortunately, I’m using PROC SURVEYLOGISTIC.

Can I proceed with it and mention the substantial bias in the limitations? Or is that not acceptable?

Thank you!!
Lina
sbxkoenk
SAS Super FREQ

Hello,

 

You ask:
"Can I proceed with it and mention the substantial bias in the limitations? Or is that not acceptable?"

Is this for a research paper in a journal?
It is a severe limitation.
I would still account for the serious imbalance in your analysis one way or another.

 

Good luck,
Koen

sbxkoenk
SAS Super FREQ

Hello,

 

On top of my previous reply (see above) ...

I know that @gcjfernandez is rather savvy about our survey procedures. Would be good to have his opinion.

 

Thanks,

Koen

lbahanan
Calcite | Level 5
Yes, it is a research paper.

Can you have a look at this paper?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6475717/#!po=1.85185

As I understand it, they only used PROC SURVEYLOGISTIC.

gcjfernandez
SAS Employee

Please provide the details of your data, design, and objectives.

The paper you cited used NHANES data, and using SURVEYLOGISTIC is appropriate there.

Do your data also come from a national survey such as NHANES?

If so, what is the reference population? Finite or infinite?

What is the probability design? Is there stratification, or a multistage cluster design? Are there survey weights? Was missing value imputation used?

Was there a post-stratification adjustment?

Please watch my past presentation on this topic.

 

Please check this link: https://communities.sas.com/t5/Ask-the-Expert/What-Are-Best-Practices-for-Using-SAS-Survey-Procedure...

 

Thanks

 

lbahanan
Calcite | Level 5

Thank you for your helpful comment!

 

Yes, I'm using NHANES. So, I don't have any problem, right?

If so, could you please explain the reason?

I'm comparing 41 subjects with 10144 subjects.

 

Thank you!!

 

gcjfernandez
SAS Employee

Glad to know your data are from NHANES. You should therefore have the right design weights (the MEC exam weights, e.g. WTMEC2YR or WTMEC4YR) and design variables (strata and primary sampling unit) in your data. You will be choosing one of the variance estimation methods (Taylor series, which is the default, jackknife, or BRR) in your analysis. When you make the comparison between the 41 participants and the full sample in SURVEYLOGISTIC with the correct syntax, you are making inferences about the reference population. Therefore you can use LSMEANS, ESTIMATE, or LSMESTIMATE when making your comparison. If there are any issues with convergence or estimation, the SAS log will notify you.
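
As a rough illustration of the syntax described above, the following sketch shows one way the design information could be passed to PROC SURVEYLOGISTIC. SDMVSTRA, SDMVPSU and WTMEC2YR are the usual NHANES names for the masked variance stratum, the masked variance PSU, and the two-year MEC exam weight; the data set name (work.nhanes) and the analysis variables (outcome, rare_group, age, sex) are hypothetical placeholders, and the right weight depends on which survey cycles and components you combine.

```sas
/* Sketch only: work.nhanes, outcome, rare_group, age and sex are hypothetical names. */
/* VARMETHOD=TAYLOR is the default and is written out here only for clarity.          */
proc surveylogistic data=work.nhanes varmethod=taylor;
   strata  sdmvstra;                    /* NHANES masked variance stratum              */
   cluster sdmvpsu;                     /* NHANES masked variance PSU                  */
   weight  wtmec2yr;                    /* two-year MEC exam weight                    */
   class   rare_group sex / param=glm;  /* LSMEANS requires GLM parameterization       */
   model   outcome(event='1') = rare_group age sex;
   lsmeans rare_group / diff oddsratio cl;  /* small group vs. the rest, adjusted      */
run;
```

The LSMEANS statement with DIFF and ODDSRATIO reports the adjusted comparison between the two levels of rare_group as an odds ratio with confidence limits; ESTIMATE or LSMESTIMATE could be used instead for custom contrasts.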

Hope this helps.

 

 

lbahanan
Calcite | Level 5

It helps a lot!

Thank you so much!

 
