_maldini_
Barite | Level 11

I am building sequential logistic regression models using PROC SURVEYLOGISTIC (weighted data). I have an exposure variable, an outcome variable and several covariates. As you would expect, the more covariates I add to the model, the lower the "Number of Observations Used" (See image #1).

 

[Image #1: Screen Shot 2022-08-30 at 4.59.57 PM.png]

 

I'm trying to create a table (above) that displays the number of events and the weighted percentage for each level of the exposure variable, for each model (each model has a different number of observations).

My understanding is that the difference between the number of observations in the domain and the number of observations used is a function of missing values, non-positive weights and possibly other factors. 

[Image #2: image.png]

 

The log (See image #3 below) says nothing of missing values or non-positive weights.

 

[Image #3: Screen Shot 2022-08-30 at 5.11.42 PM.png]

 

I can get the number of events and weighted percentage by using PROC SURVEYFREQ (not shown) with a DOMAIN statement, but that uses the Number of Observations in Domain, not the Number of Observations Used.

 

My syntax for the DOMAIN variable is intended to eliminate any missing values: 

 

flag_1 =
    (
    cann_use_status ne . and
    ever_told_mi in (1, 2) and
    age_yrs ne . and
    gender ne . and
    race_all ne . and
    educ_gtet_20 ne . and
    hh_income ne .
    );

How can I get the number of events (i.e., number of "Yes" in ever_told_mi) and the weighted percentage using the Number of Observations Used, not the Number of Observations in Domain? 

 

 

Sorry for the long and complicated question. Thanks for your help.

1 REPLY
ballardw
Super User

I'm really not sure what your question is. You state: "The log (See image #3 below) says nothing of missing values or non-positive weights."

However, the third NOTE line in your log image clearly states exactly that. I am not going to retype the text.

 

Hint: Post logs as text. Copy the text from the log, open a text box on the forum, and paste it in. That should be easier than creating an image, and it makes it much easier for us to suggest a code change or point out a syntax error.

 

If you want results for observations whose DOMAIN variable is missing, then specify the MISSING option.

From the documentation:

When determining levels of a DOMAIN variable, an observation with missing values for this DOMAIN variable is excluded, unless you specify the MISSING option.
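
For the table itself, here is a minimal sketch of one way to get counts and weighted percentages on the same rows the model used. It assumes the only reasons for exclusion are missing model variables and missing or nonpositive weights. The data set names have and want and the design variables strat, psu, and wt are hypothetical placeholders, and the indicator used_1 is introduced here for illustration only. As far as I know, PROC SURVEYFREQ has no DOMAIN statement, so the indicator is crossed in the TABLES request instead, which keeps the full sample in the variance estimation rather than subsetting the data.

```sas
/* Hypothetical names: data sets HAVE and WANT, design variables
   STRAT (stratum), PSU (cluster), and WT (sampling weight). */
data want;
    set have;
    /* Complete-case indicator for this model: every model variable is
       nonmissing and the weight is positive. A missing weight compares
       as less than zero in SAS, so wt > 0 also rejects missing weights. */
    used_1 = (
        cmiss(of cann_use_status age_yrs gender race_all
                 educ_gtet_20 hh_income) = 0
        and ever_told_mi in (1, 2)
        and wt > 0
    );
run;

proc surveyfreq data=want;
    strata  strat;
    cluster psu;
    weight  wt;
    /* used_1 acts as the layer (domain) variable: one exposure-by-outcome
       table is produced per level of used_1. ROW requests row percentages,
       that is, the weighted percentage of each outcome level within each
       exposure level. */
    tables used_1 * cann_use_status * ever_told_mi / row;
run;
```

In the used_1 = 1 layer, the Frequency column should give the unweighted number of events and the row percent the weighted percentage within each exposure level. If the total for that layer does not match the Number of Observations Used reported by PROC SURVEYLOGISTIC, something else is excluding rows and the flag needs another condition.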

 

 




 
