I have run a logistic model. Total observations were 162K, and the final model used 62K due to missing values. How can I save the number of observations used (62K) for the logistic model in SAS that includes all the variables used in the model?
You have not shown the code you ran so let's take an example from the documentation of PROC LOGISTIC.
proc logistic data=Neuralgia;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;
So the dependent variable is PAIN and the independent variables are TREATMENT, SEX, AGE and DURATION.
To make a subset of NEURALGIA that includes only the cases that have no missing values for any of those variables, use:
data want;
set Neuralgia;
if 0 = cmiss(of Pain Treatment Sex Age Duration);
run;
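If the aim is to keep the count itself, one possible sketch is to store the number of rows in WANT in a macro variable. The macro variable name n_used is made up for illustration, and the TRIMMED option assumes SAS 9.3 or later.

```sas
proc sql noprint;
/* n_used is a hypothetical macro variable name */
select count(*) into :n_used trimmed from want;
quit;

/* write the saved count to the log */
%put NOTE: Rows in WANT: &n_used;
```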
To see if you get the same answer, run the same regression against the subset.
proc logistic data=want;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;
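The count that PROC LOGISTIC itself reports can probably also be captured with ODS OUTPUT. This is only a sketch: it assumes the "Number of Observations" table has the ODS name NObs, as listed in the PROC LOGISTIC documentation (ODS TRACE ON will confirm it in your release), and the data set name nobs_used is made up.

```sas
/* nobs_used is a hypothetical data set name */
ods output NObs=nobs_used;
proc logistic data=Neuralgia;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;

/* shows the number of observations read and used */
proc print data=nobs_used;
run;
```

The "used" count in that table should match the number of rows in the complete-case subset.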
@salehrahman wrote:
I have run a logistic model. Total observations were 162K, and the final model used 62K due to missing values. How can I save the number of observations used (62K) for the logistic model in SAS that includes all the variables used in the model?
Am I understanding you properly? You want to save the number 62000 on every record in a SAS data set that contains all variables used? Doesn't really make sense to me. Could you explain further?
If I wanted only the data actually used by a model, and knew that the reason observations were not used was missing values, I might be tempted to filter the data before running the model:
data formodel;
set yourdataset;
array _c (*) <list of names of character variables in model goes here>;
array _n (*) <list of names of numeric variables in model goes here>;
if cmiss(of _c(*), of _n(*)) > 0 then delete;
run;
Arrays are a way to reference similar variables for some purpose. An array can hold only character or only numeric variables, so if your model has both you would need two arrays. An array is one way to write short code for functions that accept multiple variables: you can use "of arrayname(*)" to pass all the variables in the array.
Or you could skip the arrays and list all of the variables. Without OF, CMISS requires a comma-separated list of names.
CMISS returns how many of the variables are missing, or 0 if none are. So the above code deletes any record with a count above 0 and keeps the records that should have been used.
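As a rough illustration of the comma-separated form, here is the same filter written without arrays. It assumes the Neuralgia data set and model variables from the PROC LOGISTIC documentation example rather than your own data.

```sas
data formodel;
set Neuralgia;
/* comma-separated arguments: no OF keyword, no arrays */
if cmiss(Pain, Treatment, Sex, Age, Duration) > 0 then delete;
run;
```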
I would rerun the model with the above set to see if you get the same results.
Caveat: if you have records with a missing dependent variable and are using the model output to get a predicted value for them, make sure that you do not include the dependent variable in the CMISS check above.
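A minimal sketch of that scoring case, again assuming the Neuralgia example variables: only the predictors are checked, so rows with a missing Pain stay in the data and can still receive a predicted probability from the OUTPUT statement. The names forscore, scored and phat are hypothetical.

```sas
data forscore;
set Neuralgia;
/* check predictors only; Pain is deliberately left out */
if cmiss(of Treatment Sex Age Duration) = 0;
run;

proc logistic data=forscore;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration;
/* rows with missing Pain are not used in the fit but are still scored */
output out=scored p=phat;
run;
```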
No need to define arrays. Just list the variables in the CMISS() call.