Hi everyone! I'm running into some trouble with making a dataset from proc logistic. My number of observations used (2913) is smaller than the number of observations read (4457), and I want to create a dataset with only the number of observations used. Below is my code:
proc logistic data=mydata;
class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
model PASB_Cat (ref="0") = education_cat income_cat
work_cat / link=glogit;
output out=used_obs (keep=PASB_Cat education_cat income_cat
work_cat);
run;
data used_obs_clean;
set used_obs;
if nmiss(of PASB_Cat education_cat income_cat
work_cat) = 0;
run;
Below is the log of the code, which basically says "Invalid Numeric Data"
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      71:17   72:46
NOTE: Invalid numeric data, PASB_Cat='0' , at line 71 column 17.
And of course since I have error messages, my output dataset has no data! I appreciate any help!
- Tags:
- proc logistic
Accepted Solutions
output out=myusedobs(where=(IP_1 ne . and Y ne .)) predprobs=i;
Use an ODS OUTPUT statement for this, not the OUTPUT statement. The ODS OUTPUT statement can save any displayed table in a data set. See this note on using it. The table you want is named NObs, so use this statement to keep just the number of observations used (dropping the number read):
ods output nobs=myobsused(where=(label ? "Used"));
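As a rough sketch of where that statement goes, the ODS OUTPUT statement can sit inside the PROC LOGISTIC step from the original question. The dataset and variable names are taken from the question; the PROC PRINT step is only an illustrative addition. Note that the NObs table holds the counts themselves, not the individual observations:

```sas
proc logistic data=mydata;
   class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
   model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
   /* Save the "Number of Observations" table, keeping only the "Used" row */
   ods output nobs=myobsused(where=(label ? "Used"));
run;

/* Illustrative check: myobsused contains the count, not the modeling variables */
proc print data=myobsused;
run;
```

Because the table has no columns named after the model variables, a KEEP= option listing them on this dataset would not return the rows used in the fit.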
Thanks for the reply! But I'm struggling with the variables not being in the smaller dataset. This is my new code!
proc logistic data=mydata;
class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
model PASB_Cat (ref="0") = education_cat income_cat
work_cat / link=glogit;
ods output nobs=myobsused (keep=PASB_Cat education_cat income_cat
work_cat);
run;
Based on the Number of Observations Used (2913), I should have the variables for just those observations in the smaller dataset.
output out=myusedobs(where=(IP_1 ne . and Y ne .)) predprobs=i;
This worked! Thank you so much for your help! I truly appreciate it! My smaller dataset has all the variables/observations I needed and there are no missing values!
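For anyone adapting the OUTPUT statement above, here is a minimal sketch of how it might look inside the PROC LOGISTIC step from the question. It rests on several assumptions: Y in the reply is read as a placeholder for the response (PASB_Cat here), PASB_Cat is assumed to have a level "1" so that PREDPROBS=I creates a variable named IP_1, and MISSING() is swapped in for the "ne ." comparison because the log suggests PASB_Cat is a character variable:

```sas
proc logistic data=mydata;
   class education_cat (ref="0") income_cat (ref="0") work_cat (ref="0");
   model PASB_Cat (ref="0") = education_cat income_cat work_cat / link=glogit;
   /* PREDPROBS=I adds individual predicted probabilities (IP_<level>).       */
   /* Assumption: IP_1 exists because "1" is one of the PASB_Cat levels.      */
   /* IP_1 is missing when a predictor is missing, so the first test drops    */
   /* those rows; the second test drops rows with a missing response.         */
   output out=myusedobs(where=(not missing(IP_1) and not missing(PASB_Cat)))
          predprobs=i;
run;
```

The resulting dataset still carries the added prediction variables, so a KEEP= or DROP= option can trim it to the modeling variables if needed.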
Hello @chester2018,
Just to explain the issue with your DATA step: The NMISS function requires numeric arguments, but some of your variables are character. Use the CMISS function instead. It works for both numeric and character arguments. Since you restricted dataset USED_OBS to the variables of interest, you can also simplify the IF condition:
if cmiss(of _all_)=0;
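A small sketch on made-up data may help show the difference. The dataset toy and its values are hypothetical; PASB_Cat is deliberately created as character and income_cat as numeric:

```sas
/* Hypothetical toy data: one character and one numeric variable */
data toy;
   length PASB_Cat $1 income_cat 8;
   PASB_Cat='0'; income_cat=1; output;
   PASB_Cat=' '; income_cat=2; output;   /* missing character value */
   PASB_Cat='1'; income_cat=.; output;   /* missing numeric value   */
   PASB_Cat='2'; income_cat=3; output;
run;

data toy_complete;
   set toy;
   /* CMISS counts missing values across character and numeric arguments, */
   /* so only rows with no missing value in any variable are kept         */
   if cmiss(of _all_) = 0;
run;
```

Here toy_complete keeps the first and last rows. Calling NMISS on PASB_Cat instead would force a character-to-numeric conversion, which is what produced the notes in the original log.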
CMISS() handles both character and numeric variables, while NMISS() is suited only to numeric variables:
if nmiss(PASB_Cat, education_cat, income_cat, work_cat) = 0;
-->
if cmiss(PASB_Cat, education_cat, income_cat, work_cat) = 0;