sdaniels429
Fluorite | Level 6

Hi, SAS community,

 

I'm wondering if there is a quick way to generate an indicator for the observations included in a regression model. In Stata, after running the regression, we can use a post-estimation function (e(sample)) to identify the actual analytic sample. I haven't been able to find an equivalent function in SAS. Is there a quick way to do it?

 

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Typically, SAS regression procedures exclude observations that have a missing value for any variable used in the model. Procedures usually report a count of the complete observations that were included in the fit. To get a dataset showing which observations were included, you could run:

 

data included;
    set myData;
    where ... ; /* Same condition as the WHERE statement in the regression procedure */
    excluded = cmiss(myYvar, myX1Var, ... ) > 0; /* List all the vars from the MODEL statement; 1 = excluded, 0 = included */
run;

 

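Alternatively, if the regression procedure supports an OUTPUT statement, you can ask for predicted values and look for missing predictions. Note that in many procedures an observation with a missing response but complete predictors still gets a predicted value, so a missing residual is usually the safer flag.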
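As a rough sketch of the OUTPUT statement approach, assuming PROC REG and a toy dataset (the variable myX2Var, the id column, and the output names regOut, resid, and inSample are made up for illustration):

```sas
/* Toy data: id 2 has a missing predictor, id 3 has a missing response */
data myData;
    input id myYvar myX1Var myX2Var;
    datalines;
1 10 1 3
2 12 2 .
3  . 3 5
4 15 4 6
5 18 5 8
6 17 6 7
;
run;

/* Fit the model and write residuals to an output dataset */
proc reg data=myData noprint;
    model myYvar = myX1Var myX2Var;
    output out=regOut r=resid;   /* r= requests residuals */
run;
quit;

/* A residual is missing whenever the response or a predictor is missing */
data regFlag;
    set regOut;
    inSample = not missing(resid);   /* 1 = used in the fit, 0 = excluded */
run;
```

Other procedures use different OUTPUT keywords, so check the documentation for the procedure you are running before reusing this pattern.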

PG


5 REPLIES
Reeza
Super User

The output data set generated should be pretty clear as to which observations were used and which were not. Which PROC are you using?

 




 

 

sdaniels429
Fluorite | Level 6

Thank you @Reeza and @PGStats for your responses!

I'm not referring to a specific type of proc. I have multiple regressions to run, with different kinds of missingness in the independent and dependent variables. I then want to see whether there are systematic differences among the analytic samples I get when examining different outcomes/key independent variables.

I was hoping there would be a quicker way to generate indicators for the different analytic samples in one data set.

However, based on your responses, it looks like I do have to take a few more steps: first output the data set, then generate indicators for included/excluded observations. And because I have multiple output data sets to generate, I would then need to merge them all. Am I understanding correctly?
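One way to avoid the output-and-merge steps is to compute one CMISS-based indicator per model in a single DATA step. This is only a sketch: the models, the variables myY2var and myX2Var, and the inModelA/inModelB/inModelC names are hypothetical. It also assumes observations are dropped only for missing values, so any WHERE, FREQ, or WEIGHT exclusions and any CLASS variables would need to be accounted for as well.

```sas
/* Toy data with different patterns of missingness */
data myData;
    input id myYvar myY2var myX1Var myX2Var;
    datalines;
1 10 5 1 3
2 12 . 2 .
3  . 7 3 5
4 15 8 4 6
;
run;

/* One indicator per model: 1 = all of that model's variables are present */
data flagged;
    set myData;
    inModelA = (cmiss(myYvar,  myX1Var) = 0);           /* myYvar  = myX1Var         */
    inModelB = (cmiss(myYvar,  myX1Var, myX2Var) = 0);  /* myYvar  = myX1Var myX2Var */
    inModelC = (cmiss(myY2var, myX1Var) = 0);           /* myY2var = myX1Var         */
run;

/* Cross-tabulate two of the analytic samples to see how they overlap */
proc freq data=flagged;
    tables inModelA*inModelB;
run;
```

All the indicators end up in one data set, so they can be compared directly against other variables without merging.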

PGStats
Opal | Level 21

I feel we would need more concrete example code to help you further with programming issues.

 

Note, however, that comparing models fitted on different sets of observations can be very tricky. It might be easier to build a single set of complete observations for all your models by either:

1) deleting any observation with a missing value for any of the variables from any of your models, or

2) imputing values to replace missing values for any of the variables from any of your models.
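Option 1 could look something like the sketch below. The variable list is hypothetical and stands in for the union of all variables across your models. Option 2 is not shown.

```sas
/* Toy data: only ids 1 and 4 are complete on every variable */
data myData;
    input id myYvar myY2var myX1Var myX2Var;
    datalines;
1 10 5 1 3
2 12 . 2 .
3  . 7 3 5
4 15 8 4 6
;
run;

/* Keep only observations with no missing value in any model variable */
data completeCases;
    set myData;
    if cmiss(myYvar, myY2var, myX1Var, myX2Var) = 0;   /* subsetting IF */
run;
```

Every model fitted on completeCases then uses the same observations.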

 

PG
Reeza
Super User
Most SAS modeling procs use row-wise (listwise) deletion. So unless you're running models with different variables, you're going to have the same base sample every time.

Typically, if any variable named in the PROC is missing a value, that whole row is eliminated, even when the variable isn't actually used in the model (e.g., listed in CLASS but not included in MODEL).
