☑ This topic is solved.
chester2018
Obsidian | Level 7

Hi everyone! I'm running into some trouble with making a dataset from proc logistic. My number of observations used (2913) is smaller than the number of observations read (4457), and I want to create a dataset with only the number of observations used. Below is my code:

proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0")  work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat
	 work_cat / link=glogit;
	output out=used_obs (keep=PASB_Cat education_cat income_cat 
	 work_cat);
run;

data used_obs_clean;
    set used_obs;
    if nmiss(of PASB_Cat education_cat income_cat
	 work_cat) = 0;
run;

Below is the log of the code, which basically says "Invalid Numeric Data"

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       71:17   72:46   
 NOTE: Invalid numeric data, PASB_Cat='0' , at line 71 column 17.

And of course since I have error messages, my output dataset has no data! I appreciate any help!


7 REPLIES
StatDave
SAS Super FREQ

Use an ODS OUTPUT statement for this, not the OUTPUT statement. The ODS OUTPUT statement can save any displayed table in a data set. See this note on using it. The table you want is named NObs, so use this statement to keep just the number of observations used (dropping the number read):

ods output nobs=myobsused(where=(label ? "Used")); 
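For context, here is a minimal sketch of where that statement would sit, reusing the data set and variable names from the question. The PROC PRINT step is only an illustrative way to look at the saved table; the resulting data set holds the observation count, not the observations themselves.

```sas
proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
    /* Save the displayed NObs table, keeping only the "Used" row */
    ods output nobs=myobsused(where=(label ? "Used"));
run;

/* Inspect the saved count (illustrative step, not part of the original reply) */
proc print data=myobsused noobs;
run;
```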

chester2018
Obsidian | Level 7

Thanks for the reply! But I'm struggling with the variables not being in the smaller dataset. This is my new code!

proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0")  work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat
	 work_cat / link=glogit;
	ods output nobs=myobsused (keep=PASB_Cat education_cat income_cat 
	 work_cat);
run;

Based on the Number of Observations Used (2913), I expected the smaller dataset to contain those observations along with their variables.

StatDave
SAS Super FREQ
Your wording is confusing, but I'm now guessing that you want a data set of all of the observations that were used in the analysis. What I showed with the ODS OUTPUT statement provides only the count of those observations, not the observations themselves. If you want the used observations themselves, use the OUTPUT statement with the PREDPROBS=I option to create a copy of your data set along with the predicted probabilities of your response levels. On the OUT= data set, you can add a WHERE= data set option to exclude any observation that has a missing predicted probability or a missing response value. Such observations did not contribute to fitting the model. For example, if the numeric response is Y with levels 1, 2, or 3:

output out=myusedobs(where=(IP_1 ne . and Y ne .)) predprobs=i;
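Applied to the model in the question, this might look like the sketch below. Two assumptions are worth flagging: the log suggests PASB_Cat is a character variable, so the MISSING function is used here in place of a comparison with `.`, and the predicted-probability variable is assumed to be named IP_0 because "0" is one of the response levels. Check the variable names in the output data set before relying on them.

```sas
proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
    /* PREDPROBS=I adds individual predicted probabilities (IP_<level>).       */
    /* IP_0 is an assumed name, based on "0" being a level of PASB_Cat.        */
    /* MISSING() accepts both character and numeric arguments.                 */
    output out=used_obs (where=(not missing(IP_0) and not missing(PASB_Cat)))
           predprobs=i;
run;
```

A KEEP= option can be added alongside WHERE= to drop the extra prediction variables, as long as the variables named in the WHERE= expression are still available when it is evaluated.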
chester2018
Obsidian | Level 7

This worked! Thank you so much for your help! I truly appreciate it! My smaller dataset has all the variables/observations I needed and there are no missing values!

FreelanceReinh
Jade | Level 19

Hello @chester2018,

Just to explain the issue with your DATA step: The NMISS function requires numeric arguments, but some of your variables are character. Use the CMISS function instead. It works for both numeric and character arguments. Since you restricted dataset USED_OBS to the variables of interest, you can also simplify the IF condition:

if cmiss(of _all_)=0;
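Put together, the corrected DATA step from the question would look roughly like this. It is a sketch that assumes USED_OBS contains only the four model variables, as in the original KEEP= list, so that `_all_` covers exactly the variables of interest.

```sas
data used_obs_clean;
    set used_obs;
    /* CMISS counts missing values across character and numeric variables; */
    /* the subsetting IF keeps only rows where none are missing.           */
    if cmiss(of _all_) = 0;
run;
```

If USED_OBS carries additional variables, listing the four variables explicitly in CMISS avoids dropping rows because of missing values in unrelated columns.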

Ksharp
Super User
As FreelanceReinh said, use CMISS() instead of NMISS().
CMISS() handles both character and numeric variables, while NMISS() is suited only to numeric variables:

if nmiss(PASB_Cat , education_cat ,income_cat ,work_cat) = 0;
-->
if cmiss(PASB_Cat , education_cat ,income_cat ,work_cat) = 0;
