Number of Observations in Domain v. Number of Observations Used

_maldini_ · Posted 08-30-2022 08:25 PM

I am building sequential logistic regression models using PROC SURVEYLOGISTIC (weighted data). I have an exposure variable, an outcome variable and several covariates. As you would expect, the more covariates I add to the model, the lower the "Number of Observations Used" (See image #1).

I'm trying to create a table (above) that displays the number of events and the weighted percentage for each level of the exposure variable, for each model (each model has a different number of observations).

My understanding is that the difference between the number of observations in the domain and the number of observations used is a function of missing values, non-positive weights and possibly other factors.

The log (See image #3 below) says nothing of missing values or non-positive weights.

I can get the number of events and weighted percentage by using PROC SURVEYFREQ (not shown) with a DOMAIN statement, but that uses the Number of Observations in Domain, not the Number of Observations Used.

My syntax for the DOMAIN variable is intended to eliminate any missing values:

flag_1 =
	(
	cann_use_status ne . and
	ever_told_mi in (1,2) and
	age_yrs ne . and
	gender ne . and
	race_all ne . and
	educ_gtet_20 ne . and
	hh_income ne .
	);
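For illustration only, a flag like this could also require a positive weight, since observations with missing or non-positive weights are among the reasons listed above for a lower "Number of Observations Used". The sketch below assumes all listed variables are numeric; the data set names survey_data and analysis_data and the weight variable samp_wt are hypothetical and do not appear in the original post.

```sas
/* Sketch only: survey_data, analysis_data and samp_wt are hypothetical names. */
data analysis_data;
    set survey_data;
    /* The Boolean expression evaluates to 1 (complete case) or 0 (otherwise). */
    flag_1 = (
        nmiss(cann_use_status, age_yrs, gender, race_all,
              educ_gtet_20, hh_income) = 0   /* no missing exposure or covariates */
        and ever_told_mi in (1, 2)           /* valid outcome values only         */
        and samp_wt > 0                      /* missing or non-positive weight fails */
    );
run;
```

The flag keeps every row in the data set rather than deleting incomplete ones, so the full sample design stays available to the survey procedures.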

How can I get the number of events (i.e., number of "Yes" in ever_told_mi) and the weighted percentage using the Number of Observations Used, not the Number of Observations in Domain?

Sorry for the long and complicated question. Thanks for your help.

ballardw · Posted 08-31-2022 11:06 AM

I'm really not sure what your question is. You state: "The log (See image #3 below) says nothing of missing values or non-positive weights."

However, the third Note line clearly states that. I am not going to retype the text.

Hint: Post logs as text. Copy the text from the log, open a text box on the forum, and paste the text in. That should be easier than creating an image, and it makes it much easier for us to suggest a code change or point out a syntax error.

If you want results for observations where the DOMAIN variable is missing, then specify the MISSING option.

From the documentation:

When determining levels of a DOMAIN variable, an observation with missing values for this DOMAIN variable is excluded, unless you specify the MISSING option.
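As a rough sketch of one way to get counts and weighted percentages restricted to complete cases, the flag can be placed first in a PROC SURVEYFREQ TABLES request so that it forms the table layers. This assumes flag_1 marks the observations the model actually uses, which should be checked against the "Number of Observations Used" reported by PROC SURVEYLOGISTIC. The names analysis_data, strat_var, psu_var and samp_wt are hypothetical placeholders for the actual data set and design variables.

```sas
/* Sketch only: analysis_data, strat_var, psu_var and samp_wt are hypothetical names. */
proc surveyfreq data=analysis_data;
    strata  strat_var;    /* hypothetical stratum variable */
    cluster psu_var;      /* hypothetical cluster (PSU) variable */
    weight  samp_wt;      /* hypothetical sampling weight */
    /* flag_1 forms the layers. In the flag_1 = 1 layer, ROW requests   */
    /* row percentages of ever_told_mi within each exposure level.      */
    tables flag_1 * cann_use_status * ever_told_mi / row;
run;
```

In the flag_1 = 1 layer, the Frequency column holds the unweighted counts and the row percentages are weighted. If the layer total does not match the number of observations used by the model, the flag is probably missing an exclusion reason.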

