Hi, SAS community,
I'm wondering if there is a quick way to generate an indicator for the observations included in a regression model. In Stata, after running the regression, we can use a post-estimation function (e(sample)) to identify the actual analytic sample. I haven't been able to find an equivalent function in SAS. Is there a quick way to do it?
Thank you in advance!
Typically, SAS regression procedures exclude observations that contain missing values. Procedures usually give you a count of complete observations that were included in the fit. To get a dataset showing which observations were included, you could run:
data included;
    set myData;
    where ... ; /* Same statement as the where statement in the regression procedure */
    excluded = cmiss(myYvar, myX1Var, ... ) > 0; /* All the vars from the model statement */
run;
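Alternatively, if the regression procedure supports an OUTPUT statement, you can ask for predicted values or residuals and look for missing ones. Residuals are the safer check, because some procedures (PROC REG, for example) still compute a predicted value when only the response is missing.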
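A minimal sketch of the OUTPUT-statement approach, assuming PROC REG and hypothetical names (myData, y, x1, x2, in_model1) that are not from the original post. It flags on the residual rather than the predicted value, since PROC REG can still produce a prediction for an observation whose response is missing but whose regressors are complete. Weights, FREQ variables, or a WHERE statement could exclude further observations, so treat this as a starting point only.

```sas
/* Hypothetical data set and variable names: myData, y, x1, x2 */
proc reg data=myData noprint;
    model y = x1 x2;
    /* Write predicted values and residuals for every input observation */
    output out=reg_out p=pred r=resid;
run;
quit;

data flagged;
    set reg_out;
    /* 1 if the observation has a residual, i.e. it was used in the fit */
    in_model1 = not missing(resid);
run;
```

The resulting in_model1 variable plays roughly the role of Stata's e(sample) for this one model.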
The output data set generated should be pretty clear as to which observations were used and which were not. Which PROC are you using?
@sdaniels429 wrote:
Thank you @Reeza and @PGStats for your responses!
I'm not referring to a specific proc. I have multiple regressions to run, with different kinds of missingness in the independent and dependent variables. I then want to see whether there are systematic differences among the analytic samples I get when examining different outcomes and key independent variables.
I was hoping there would be a quicker way to generate indicators for the different analytic samples in one data set.
However, based on your responses, it looks like I do have to take a few more steps: first output the data set, then generate indicators for included/excluded observations. And because I have multiple output data sets to generate, I then need to merge them all. Am I understanding correctly?
I feel we would need more concrete example code to help you further with programming issues.
Note however that comparing models fitted on different sets of observations can be very tricky. It might be easier to build a set of complete observations for all your models by
1) deleting any observation showing a missing value for any of the variables from any of your models, or
2) imputing values to replace missing values for any of the variables from any of your models.
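As a rough sketch of option 1, and of building several analytic-sample indicators in one data set without merging, a single DATA step with CMISS may be enough. All names here (myData, y1, y2, x1-x3, in_model1, in_model2, in_all) are hypothetical, and the sketch assumes that missing values in the MODEL variables are the only reason an observation is dropped (no WHERE, WEIGHT, or FREQ statements).

```sas
/* Hypothetical names: y1, y2 are outcomes; x1, x2, x3 are predictors */
data flagged;
    set myData;
    in_model1 = (cmiss(y1, x1, x2) = 0);          /* model y1 = x1 x2 */
    in_model2 = (cmiss(y2, x1, x3) = 0);          /* model y2 = x1 x3 */
    in_all    = (cmiss(y1, y2, x1, x2, x3) = 0);  /* complete for every model */
run;

/* Cross-tabulate the indicators to see how the analytic samples overlap */
proc freq data=flagged;
    tables in_model1*in_model2 / missing;
run;

/* Option 1: keep only observations that are complete for all models */
data complete;
    set flagged;
    if in_all;
run;
```

Fitting every model on the complete data set keeps the samples identical across models, at the cost of discarding observations that some models could have used.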