salehrahman
Calcite | Level 5

I have run a logistic model. Total observations were 162K, and the final model used 62K due to missing values. How can I save the number of observations used (62K) for the logistic model in SAS that includes all the variables used in the model?

Accepted Solution
Tom
Super User

You have not shown the code you ran, so let's take an example from the documentation of PROC LOGISTIC.

proc logistic data=Neuralgia;
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;

So the dependent variable is PAIN and the independent variables are TREATMENT, SEX, AGE, and DURATION.

To make a subset of NEURALGIA that includes only the cases with no missing values for any of those variables, use:

data want;
  set Neuralgia;
  /* Subsetting IF: keep a row only when none of the model variables is missing */
  if 0 = cmiss(of Pain Treatment Sex Age Duration);
run;

To check that you get the same results, run the same regression against the subset.

proc logistic data=want;
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;
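Another quick check is to count the complete cases directly with PROC SQL and compare the count with "Number of Observations Used" in the PROC LOGISTIC output. This is a sketch that assumes the Neuralgia data set from the documentation example already exists and that missing values are the only reason rows were dropped from the fit. The macro variable name n_complete is hypothetical. Inside PROC SQL the CMISS arguments are written as a comma-separated list.

```sas
/* Count rows with no missing value in any model variable.            */
/* Assumes the Neuralgia data set from the documentation example.     */
/* n_complete is a hypothetical macro variable name.                  */
proc sql noprint;
   select count(*)
      into :n_complete trimmed
      from Neuralgia
      where cmiss(Pain, Treatment, Sex, Age, Duration) = 0;
quit;

%put NOTE: Complete cases: &n_complete;
```

If this count differs from the number of observations PROC LOGISTIC reports as used, something other than missing values (or a variable left out of the list) is excluding rows.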


5 Replies
PaigeMiller
Diamond | Level 26

@salehrahman wrote:

I have run a logistic model. Total observations were 162K, and the final model used 62K due to missing values. How can I save the number of observations used (62K) for the logistic model in SAS that includes all the variables used in the model?


Am I understanding you properly? You want to save the number 62000 on every record in a SAS data set that contains all variables used? Doesn't really make sense to me. Could you explain further?
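If the goal is only to keep the count itself (the 62K) rather than the records, one option is to capture the "Number of Observations" table with ODS OUTPUT. This is a sketch under assumptions: it uses the Neuralgia example from the documentation, it assumes NObs is the ODS table name for that table and NObsUsed is the column holding the used count, and n_used and nobs_used are hypothetical names. Check the PROC PRINT output before relying on the column name, since it may differ between procedures or releases.

```sas
/* Save PROC LOGISTIC's "Number of Observations" table as a data set. */
/* Assumes the Neuralgia data set from the documentation example.     */
ods output NObs=nobs_used;
proc logistic data=Neuralgia;
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;

/* Inspect the columns first; names may differ by procedure/release.  */
proc print data=nobs_used;
run;

/* Assumption: the table has a numeric column named NObsUsed.         */
data _null_;
   set nobs_used;
   call symputx('n_used', NObsUsed);
run;

%put NOTE: Observations used by the model: &n_used;
```

The data set NOBS_USED then holds the count permanently, and the macro variable makes it easy to reuse in later steps.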

--
Paige Miller
ballardw
Super User

If I thought I wanted only the data actually used by a model, and knew that observations were excluded because of missing values, I might be tempted to filter the data before running the model:

data formodel;
   set yourdataset;
   array _c (*) <names of the character variables in the model>;
   array _n (*) <names of the numeric variables in the model>;
   /* Drop any record with at least one missing model variable */
   if cmiss( of _c(*), of _n(*)) > 0 then delete;
run;

Arrays are a way to reference a group of similar variables. A single array can hold only character variables or only numeric variables, so if your model has both types you need two arrays. Arrays are also a convenient shorthand for functions that accept multiple arguments: "of arrayname(*)" passes all of the variables in the array.

Or you could skip the arrays and list all of the variables directly. CMISS accepts either a comma-separated list of variables or a space-separated list preceded by OF.

CMISS returns the number of its arguments that are missing, or 0 if none are. So the code above keeps only the records where the count is 0, which are the records the model should have used.
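To see what CMISS returns, here is a small sketch on made-up data; the data set TOY and its variables are hypothetical. It shows both ways of passing the arguments (comma-separated and with OF) and that CMISS counts missing values for character and numeric variables alike.

```sas
/* Hypothetical toy data: Pain is character, Age is numeric.          */
/* With list input, a single period is read as missing for both.      */
data toy;
   input id Pain $ Age;
   datalines;
1 Yes 60
2 .   71
3 No  .
4 .   .
;

data toy_counts;
   set toy;
   n_missing = cmiss(Pain, Age);          /* comma-separated arguments   */
   complete  = (cmiss(of Pain Age) = 0);  /* OF list; 1 if none missing */
run;

proc print data=toy_counts;
run;
```

Here only row 1 has no missing values (n_missing=0, complete=1), so it is the only row a filter like the one above would keep.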

I would rerun the model on the filtered data set to see if you get the same results.

Caveat: if some records have a missing dependent variable and you are using the model output to get predicted values for them, do not include the dependent variable in the list above.
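A sketch of that caveat, assuming the Neuralgia example and hypothetical output names TO_SCORE, SCORED, and PHAT: leave the response out of the CMISS list so rows with a missing PAIN stay in the data. Those rows are not used to fit the model, but the OUTPUT statement can still give them a predicted probability as long as every predictor is present.

```sas
/* Keep rows with complete predictors; PAIN (the response) may be missing. */
data to_score;
   set Neuralgia;
   if cmiss(of Treatment Sex Age Duration) = 0;
run;

proc logistic data=to_score;
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
   /* Rows with missing Pain are excluded from the fit but still scored */
   output out=scored p=phat;
run;
```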

Tom
Super User

No need to define arrays. Just list the variables in the CMISS() call.

salehrahman
Calcite | Level 5
Wonderful. It works. I have 100K records now. I have to make sure all the variables are included. I have not run the logistic regression yet.
