sdaniels429
Fluorite | Level 6

Hi, SAS community,

 

I'm wondering if there is a quick way to generate an indicator for the observations included in a regression model. In Stata, after running a regression, we can use the post-estimation function e(sample) to identify the actual analytic sample. I haven't been able to find an equivalent in SAS. Is there a quick way to do it?

 

Thank you in advance!

Accepted Solutions
PGStats
Opal | Level 21

Typically, SAS regression procedures exclude observations that contain missing values for any variable used in the model. Procedures usually report a count of the complete observations that were used in the fit. To get a dataset showing which observations were included, you could run:

 

data included;
    set myData;
    where ... ; /* Same statement as the WHERE statement in the regression procedure */
    excluded = cmiss(myYvar, myX1Var, ... ) > 0; /* All the vars from the MODEL statement */
run;
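
To make the pattern concrete, here is a minimal sketch on made-up data. The data set `myData`, the variables `y`, `x1`, `x2`, and the flag name `in_sample` are all hypothetical, and no WHERE statement is assumed. It only mirrors the usual listwise-deletion rule, so treat it as an approximation of what a given procedure actually does.

```sas
/* Hypothetical example data: y is the outcome, x1 and x2 are predictors */
data myData;
    input id y x1 x2;
    datalines;
1 10 1.0 5
2 12 .   6
3 .  2.0 7
4 15 3.0 .
5 18 4.0 9
6 20 5.0 4
;
run;

data flagged;
    set myData;
    /* cmiss() counts how many of its arguments are missing        */
    /* (it accepts both numeric and character variables)           */
    in_sample = (cmiss(y, x1, x2) = 0);  /* 1 = complete case, 0 = would be dropped */
run;

/* Quick check of how many rows fall in and out of the analytic sample */
proc freq data=flagged;
    tables in_sample;
run;
```

Using a 1/0 flag named for inclusion (rather than `excluded`) is just a design choice that lines up with Stata's e(sample) convention.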

 

Alternatively, if the regression procedure supports an output statement, you can ask for predicted values and look for missing predictions.
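
A rough sketch of the OUTPUT-statement route, using PROC REG as one example of a procedure that supports it. The data set and variable names are hypothetical. One caveat worth hedging: PROC REG still computes a predicted value for a row where only the response is missing, so this sketch flags on the residual rather than on the prediction alone.

```sas
/* Hypothetical example data with a few missing values */
data myData;
    input id y x1 x2;
    datalines;
1  10 1.0 5
2  12 .   6
3  .  2.0 7
4  15 3.0 .
5  18 4.0 9
6  20 5.0 4
7  23 6.0 8
8  21 7.0 3
9  27 8.0 6
10 30 9.0 2
;
run;

/* Fit the model and write predicted values and residuals to reg_out */
proc reg data=myData noprint;
    model y = x1 x2;
    output out=reg_out p=pred r=resid;
run;
quit;

data reg_flagged;
    set reg_out;
    /* The residual is missing when y or any regressor is missing,  */
    /* whereas pred can be non-missing when only y is missing.      */
    in_sample = not missing(resid);
run;
```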

PG


Replies
Reeza
Super User

The output data set generated should be pretty clear as to which observations were used and which were not. Which PROC are you using?

 




 

 

sdaniels429
Fluorite | Level 6

Thank you @Reeza and @PGStats for your responses!

I'm not referring to a specific type of proc. I have multiple regressions to run with different kinds of missingness in the independent and dependent variables. I then want to see whether there are systematic differences among the analytic samples I get when examining different outcomes/key independent variables.

I was hoping there would be a quicker way to generate indicators for the different analytic samples in one data set.

However, based on your responses, it looks like I do have to take a few more steps: first output the data set, then generate indicators for included/excluded observations. And because I have multiple output data sets to generate, I then need to merge all of them. Am I understanding correctly?

PGStats
Opal | Level 21

I feel we would need more concrete example code to help you further with programming issues.

 

Note, however, that comparing models fitted on different sets of observations can be very tricky. It might be easier to build a set of complete observations for all your models by

1) deleting any observation showing a missing value for any of the variables from any of your models

or

2) imputing values to replace missing values for any of the variables from any of your models.
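
Both options can be sketched roughly as follows. The data set `allData` and the variables `y1`, `y2`, `x1`-`x3` are hypothetical stand-ins for "all variables from all models". The imputation step uses simple mean replacement through PROC STDIZE with the REPONLY option, which is only one crude possibility and is swapped in here for illustration; multiple imputation (e.g. PROC MI) is usually the more defensible choice.

```sas
/* Hypothetical data: two outcomes and three predictors across all models */
data allData;
    input id y1 y2 x1 x2 x3;
    datalines;
1 10 3 1.0 5 2
2 12 . 2.0 6 1
3 .  4 2.5 7 3
4 15 5 3.0 . 2
5 18 6 4.0 9 .
6 20 7 5.0 4 4
;
run;

/* Option 1: keep only rows that are complete for every model variable */
data complete_cases;
    set allData;
    if cmiss(y1, y2, x1, x2, x3) = 0;  /* subsetting IF drops incomplete rows */
run;

/* Option 2: replace missing predictor values with the variable mean.   */
/* REPONLY replaces missing values only, without standardizing the data. */
proc stdize data=allData out=imputed method=mean reponly;
    var x1 x2 x3;
run;
```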

 

PG
Reeza
Super User
SAS regression procs use row-wise (listwise) deletion. So unless you're running models with different variables, you're going to have the same base sample every time.

If any variable specified in the PROC is missing a value (even one not actually used in the model, e.g. listed in CLASS but not included in MODEL), that whole row is eliminated.
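
Given that rule, one way to get indicators for several analytic samples in a single data set, without outputting and merging, is to compute one flag per model in one DATA step. This is only a sketch: the two models, the variable names, and the flag names below are hypothetical, and it assumes no WHERE, WEIGHT, or FREQ statements affect which rows are used.

```sas
/* Hypothetical data: grp is a character CLASS variable */
data allData;
    input id y1 y2 x1 x2 x3 grp $;
    datalines;
1 10 3 1.0 5 2 a
2 12 . 2.0 6 1 b
3 .  4 2.5 7 3 a
4 15 5 3.0 . 2 b
5 18 6 4.0 9 . a
6 20 7 5.0 4 4 .
;
run;

data flagged;
    set allData;
    /* Model A (assumed): y1 = x1 x2, with grp listed in CLASS */
    in_modelA = (cmiss(y1, x1, x2, grp) = 0);
    /* Model B (assumed): y2 = x1 x3 */
    in_modelB = (cmiss(y2, x1, x3) = 0);
run;

/* Cross-tabulate the two analytic samples to see where they differ */
proc freq data=flagged;
    tables in_modelA * in_modelB;
run;
```

From there, something like PROC MEANS with a CLASS statement on the flags could be used to compare characteristics across the samples.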
