Statistical Procedures

Output Dataset with Number of Observations Used from Proc Logistic

chester2018 · Posted 01-21-2025 04:07 PM

Hi everyone! I'm running into some trouble with making a dataset from proc logistic. My number of observations used (2913) is smaller than the number of observations read (4457), and I want to create a dataset with only the number of observations used. Below is my code:

proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0")  work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat
	 work_cat / link=glogit;
	output out=used_obs (keep=PASB_Cat education_cat income_cat 
	 work_cat);
run;

data used_obs_clean;
    set used_obs;
    if nmiss(of PASB_Cat education_cat income_cat
	 work_cat) = 0;
run;

Below is the log from that code, which basically says "Invalid numeric data":

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       71:17   72:46   
 NOTE: Invalid numeric data, PASB_Cat='0' , at line 71 column 17.

And of course since I have error messages, my output dataset has no data! I appreciate any help!

StatDave · Posted 01-21-2025 04:24 PM

Use an ODS OUTPUT statement for this, not the OUTPUT statement. The ODS OUTPUT statement can save any displayed table in a data set. See this note on using it. The table you want is named NObs, so use this statement to keep just the number of observations used (dropping the number read):

ods output nobs=myobsused(where=(label ? "Used"));
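As a rough sketch of where that statement could go, here it is placed inside the PROC LOGISTIC step from the original question (the data set and variable names are taken from that post; MYOBSUSED is just an illustrative name). The resulting data set holds the count only, not the observations themselves.

```sas
proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
    /* Save the displayed NObs table, keeping only the "Used" row */
    ods output nobs=myobsused(where=(label ? "Used"));
run;

/* Inspect the saved table: a single row with the used-observation count */
proc print data=myobsused;
run;
```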

chester2018 · Posted 01-21-2025 06:19 PM

Thanks for the reply! But I'm struggling with the variables not being in the smaller dataset. This is my new code!

proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0")  work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat
	 work_cat / link=glogit;
	ods output nobs=myobsused (keep=PASB_Cat education_cat income_cat 
	 work_cat);
run;

Based on the Number of Observations Used (2913), I should have the variables for the observations that were used in the smaller dataset.

StatDave · Posted 01-21-2025 06:41 PM

Your wording is confusing, but I now have to guess that you mean you want a data set of all of the observations that were used in the analysis. What I showed with the ODS OUTPUT statement just provides the actual count of those observations, not the observations themselves. If you want the used observations themselves, then use the OUTPUT statement with the PREDPROBS=I option to create a copy of your data set along with the predicted probabilities of your response levels. On the OUT= data set in the OUTPUT statement, you can use the WHERE= data set option to exclude any observations which have a missing predicted probability value or a missing response value. Such observations did not contribute to fitting the model. For example, if the numeric response is Y with levels 1, 2, or 3:

output out=myusedobs(where=(IP_1 ne . and Y ne .)) predprobs=i;
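Applied to the model from the original question, this might look like the sketch below. It assumes that PASB_Cat is a character variable (as the log note suggests) and that it has a level whose value is 0 (as the REF="0" option suggests), so that PREDPROBS=I creates a variable named IP_0. If the levels differ, the IP_ variable name would need to change accordingly.

```sas
proc logistic data=mydata;
    class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
    model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
    /* PREDPROBS=I adds one IP_<level> variable per response level.       */
    /* IP_0 is an assumption: it exists only if the response has level 0. */
    /* MISSING() handles both character and numeric variables.            */
    output out=myusedobs(where=(IP_0 ne . and not missing(PASB_Cat)))
           predprobs=i;
run;

/* Optional check: the row count should match "Number of Observations Used" */
proc sql;
    select count(*) as n_used from myusedobs;
quit;
```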

chester2018 · Posted 01-21-2025 10:40 PM

This worked! Thank you so much for your help! I truly appreciate it! My smaller dataset has all the variables/observations I needed and there are no missing values!

FreelanceReinh · Posted 01-21-2025 05:36 PM

Hello @chester2018,

Just to explain the issue with your DATA step: The NMISS function requires numeric arguments, but some of your variables are character. Use the CMISS function instead. It works for both numeric and character arguments. Since you restricted dataset USED_OBS to the variables of interest, you can also simplify the IF condition:

if cmiss(of _all_)=0;
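A small self-contained illustration of that subsetting IF, using a made-up data set (DEMO, GRP, and SCORE are hypothetical names, not from the original post) with one character and one numeric variable:

```sas
/* Hypothetical demo data: GRP is character, SCORE is numeric. */
/* In list input, a single period is read as a missing value.  */
data demo;
    length grp $1;
    input grp $ score;
    datalines;
0 10
. 20
1 .
2 30
;
run;

/* Keep only rows where no variable, character or numeric, is missing */
data demo_complete;
    set demo;
    if cmiss(of _all_) = 0;
run;
```

Because CMISS counts missing values of either type, the same condition works whether or not all variables in the data set are numeric.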

coder1234 · Posted 01-21-2025 05:45 PM

debugged?

Ksharp · Posted 01-21-2025 07:50 PM

As FreelanceReinh said, use CMISS() instead of NMISS().
CMISS() handles both character and numeric variables, while NMISS() is suited only to numeric variables:

if nmiss(PASB_Cat , education_cat ,income_cat ,work_cat) = 0;
-->
if cmiss(PASB_Cat , education_cat ,income_cat ,work_cat) = 0;
