Solved: How to quickly generate an indicator for observations included in regression models

sdaniels429 · Posted 08-25-2020 01:24 PM

Hi, SAS community,

I'm wondering whether there is a quick way to generate an indicator for the observations included in a regression model. In Stata, after running a regression, we can use the post-estimation function e(sample) to identify the actual analytic sample, but I haven't been able to find an equivalent in SAS. Is there a quick way to do this?

Thank you in advance!

PGStats · Posted 08-25-2020 02:23 PM

Typically, SAS regression procedures exclude any observation that has a missing value for a variable used in the analysis, and they usually report how many complete observations were used in the fit. To get a data set showing which observations were included, you could run:

data included;
   set myData;
   where ... ;   /* same WHERE condition as in the regression procedure */
   excluded = cmiss(myYvar, myX1Var, ... ) > 0;   /* list every variable from the MODEL statement; 1 = row dropped from the fit */
run;

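Alternatively, if the regression procedure supports an OUTPUT statement, you can request predicted values and look for missing predictions. Note that procedures such as PROC REG still compute predictions for observations whose response is missing but whose predictors are complete, so check for a missing response as well.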
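A minimal sketch of the OUTPUT-statement approach, using PROC REG. The data set mydata and the variables id, y, x1 and x2 are hypothetical. Because PROC REG still predicts rows whose response is missing, the flag tests the response as well as the prediction.

```sas
/* Hypothetical example data: y is the response, x1 and x2 are predictors */
data mydata;
   input id y x1 x2;
   datalines;
1 10 1 5
2 12 2 .
3 .  3 7
4 15 4 8
5 11 . 6
6 14 6 9
7 13 5 4
8 16 7 6
;
run;

/* Fit the model and write every input row, plus its prediction, to PREDS */
proc reg data=mydata;
   model y = x1 x2;
   output out=preds p=pred_y;
run;
quit;

/* in_sample = 1 only for rows the fit actually used:
   the prediction is missing when a predictor is missing, and the
   response is checked separately because PROC REG still predicts
   rows whose response is missing */
data preds_flag;
   set preds;
   in_sample = not (missing(pred_y) or missing(y));
run;
```

With this data, rows 2 and 5 (missing predictor) and row 3 (missing response) get in_sample = 0. Comparing the count of in_sample = 1 rows with the "Number of Observations Used" line in the PROC REG output is a quick sanity check.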

PG


Reeza · Posted 08-25-2020 02:20 PM

The output data set should make it clear which observations were used and which were not. Which PROC are you using?

sdaniels429 · Posted 08-25-2020 04:01 PM

Thank you @Reeza and @PGStats for your response!

I'm not referring to a specific PROC. I have multiple regressions to run, with different kinds of missingness in the independent and dependent variables. I then want to see whether there are systematic differences among the analytic samples when examining different outcomes and key independent variables.

I was hoping there would be a quicker way to generate indicators for the different analytic samples in one data set.

However, based on your responses, it looks like I need a few more steps: first output the data set, then generate indicators for the included and excluded observations. And because I have multiple output data sets, I then need to merge them all. Am I understanding correctly?
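Since each indicator depends only on which of a model's variables are missing, a sketch like the following builds one flag per model in a single DATA step, so no output data sets or merges are needed. The data set, the variables y1, y2 and x1-x3, and the three model specifications are hypothetical; substitute every variable each PROC uses (for example MODEL, CLASS, WEIGHT and FREQ variables).

```sas
/* Hypothetical data: two outcomes (y1, y2) and three predictors (x1-x3) */
data mydata;
   input id y1 y2 x1 x2 x3;
   datalines;
1 10 3 1 5 2
2 12 . 2 . 1
3 .  4 3 7 2
4 15 5 4 8 .
5 11 2 . 6 1
6 14 6 6 9 2
;
run;

/* One 0/1 flag per hypothetical model: 1 when none of that model's
   variables is missing, i.e. the row is in its analytic sample */
data samples;
   set mydata;
   in_m1 = (cmiss(y1, x1, x2) = 0);      /* model 1: y1 = x1 x2    */
   in_m2 = (cmiss(y2, x1, x3) = 0);      /* model 2: y2 = x1 x3    */
   in_m3 = (cmiss(y1, x1, x2, x3) = 0);  /* model 3: y1 = x1 x2 x3 */
run;

/* How the analytic samples overlap */
proc freq data=samples;
   tables in_m1*in_m2*in_m3 / list;
run;

/* Compare a covariate between rows in and out of model 1's sample */
proc means data=samples n mean;
   class in_m1;
   var x3;
run;
```

The flags reflect missing values only. If a procedure also drops rows for some other reason, its reported number of observations used will not match the flag count, so it is worth checking once per model.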

PGStats · Posted 08-25-2020 04:36 PM

I feel we would need more concrete example code to help you further with programming issues.

Note, however, that comparing models fitted on different sets of observations can be very tricky. It might be easier to build a set of complete observations for all your models by either:

1) deleting any observation that has a missing value for any of the variables in any of your models, or

2) imputing values to replace the missing values for any of the variables in any of your models.
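Suggestion 1) can be sketched as a single subsetting DATA step. The data set and the variable list (y1, y2, x1-x3) are hypothetical; the list should contain every variable used by any of the models. Imputation (suggestion 2) is not shown.

```sas
/* Hypothetical data: outcomes y1, y2 and predictors x1-x3 */
data mydata;
   input id y1 y2 x1 x2 x3;
   datalines;
1 10 3 1 5 2
2 12 . 2 . 1
3 .  4 3 7 2
4 15 5 4 8 .
5 11 2 . 6 1
6 14 6 6 9 2
;
run;

/* Keep only rows with no missing value on any variable used by any
   model, so every model is fitted on exactly the same rows */
data complete;
   set mydata;
   if cmiss(of y1 y2 x1-x3) = 0;
run;
```

Running every model on COMPLETE gives all of them the same analytic sample (rows 1 and 6 in this toy data), at the cost of discarding rows that some models could have used.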

PG

Reeza · Posted 08-26-2020 03:23 PM

SAS regression PROCs use row-wise (listwise) deletion. So unless you're running models with different variables, you're going to have the same base sample every time.

If any variable named in the PROC has a missing value, the whole row is eliminated, even if that variable is not actually used (e.g., it is listed in CLASS but not included in MODEL).
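Because the flag has to cover every variable named in the PROC, including character CLASS variables, CMISS is the convenient function: it accepts character arguments and counts blank values as missing, whereas NMISS is meant for numeric arguments. A sketch with a hypothetical data set, where group stands in for a CLASS variable of a hypothetical PROC GLM:

```sas
/* Hypothetical data: group is a character variable named in CLASS;
   a single period in list input is read as a missing (blank) value */
data mydata;
   input id y x1 group $;
   datalines;
1 10 1 A
2 12 2 .
3 11 3 B
4 .  4 A
5 14 5 B
;
run;

/* Flag rows complete on every variable named in the PROC, e.g.
   CLASS group; MODEL y = x1;  (group is counted even though it is
   not in MODEL, following the post above) */
data flagged;
   set mydata;
   in_sample = (cmiss(y, x1, group) = 0);
run;
```

Here row 2 (blank group) and row 4 (missing y) get in_sample = 0.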
