Hi,
I'm trying to write code for my professor that limits the input data for a proc freq step to the observations that our proc genmod step actually uses. The data set is too large to filter manually, so I'm wondering whether there is any way to do this.
For example, our entire data set has roughly 10,000 observations, and our regression model uses about 7,000 of them. I'm trying to write a proc freq step that uses only those 7,000 observations instead of all 10,000 in the data set.
It would help us answer if you could post a scenario showing what you have and what you want.
The scenario is that, depending on the model, some observations in the data set are left out, so our N varies from model to model. We want to run proc freq and see frequencies for only the observations used in a given model. Right now we can only run frequencies on the entire data set.
How do you identify them when you run proc genmod?
data= (data set reference)
I'm not sure if that's what you're asking for or not.
So you can run something like:
proc genmod data=(data set reference);
Why can't you then add to the same program:
proc freq data=(data set reference);
Isn't it that simple?
Here's an example of our code:
proc genmod;
title '92CATHOLIC/PROTESTANT COMPARISON MODEL WITH CATHOLIC INTERACTION VARS';
model dv92n = dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18 / D=MULTINOMIAL;
run;
Based on the variables we used in the model, it leaves out some observations from the data set and produces this output:
Number of Observations Read 10317
Number of Observations Used 9089
Missing Values 1228
What we want is for proc freq to use exactly the same 9,089 observations that the model uses.
Did you try:
where 0 < nmiss(
dv92n dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18)
;
Nope, I'll give it a try! Thanks!
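I like Tom's suggestion, but I think the variables have to be separated by commas, and the condition isn't quite what you want: 0 < nmiss(...) selects the observations that have at least one missing value, which are exactly the ones the model drops. I think the following comes closer: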
data test;
set sashelp.class;
if _n_ eq 3 then call missing(age);
if _n_ eq 5 then do;
call missing(age);
call missing(height);
end;
run;
data want;
set test;
where nmiss(age,height,weight) < 1;
run;
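A minimal sketch of applying the same complete-case condition directly in proc freq, so no intermediate data set is needed. It rebuilds Art's TEST data from SASHELP.CLASS so it runs on its own; the normal model on weight, age and height and the TABLES variable SEX are illustrative choices, not the original poster's model. The idea is that the WHERE condition lists every variable in the MODEL statement (response and predictors), so proc freq sees the same observations GENMOD reports as "Used".

```sas
/* Illustrative data: SASHELP.CLASS with a few values set to missing,
   built the same way as the TEST data set above */
data test;
   set sashelp.class;
   if _n_ eq 3 then call missing(age);
   if _n_ eq 5 then call missing(age, height);
run;

/* GENMOD drops any observation with a missing response or predictor */
proc genmod data=test;
   model weight = age height / dist=normal;
run;

/* Same complete-case rule applied inside PROC FREQ.
   NMISS takes numeric arguments separated by commas; list every
   variable from the MODEL statement here. */
proc freq data=test;
   where nmiss(weight, age, height) = 0;
   tables sex;
run;
```

With this data, the frequency total in proc freq should match the "Number of Observations Used" line from GENMOD (17 of the 19 rows). For the real model, the variable list inside NMISS would be the response and predictor variables from the MODEL statement; character variables would need CMISS instead of NMISS.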
How do you restrict the input to GENMOD to those 7000?
I would also look into the OUTPUT statement in GENMOD. It writes a data set that contains the input variables plus additional information such as residuals and predicted values. Running proc freq on that data set might do what you're looking for, possibly with a WHERE statement that filters on a diagnostic or other condition.
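A rough sketch of the OUTPUT statement approach, again on illustrative SASHELP.CLASS data with a normal model rather than the multinomial model from the thread. The output data set name GENMOD_OUT and the predicted-value variable P_WEIGHT are hypothetical names chosen here. The filter assumes that an observation with a missing predictor gets no predicted value, and it also checks the response, because an observation with a missing response but complete predictors may still receive a prediction even though it was not used in the fit.

```sas
/* Illustrative data with some missing predictor values */
data test;
   set sashelp.class;
   if _n_ eq 3 then call missing(age);
   if _n_ eq 5 then call missing(age, height);
run;

proc genmod data=test;
   model weight = age height / dist=normal;
   /* PRED= names the variable that holds the predicted value;
      the OUT= data set also carries the input variables */
   output out=genmod_out pred=p_weight;
run;

/* Keep rows that have a prediction AND a nonmissing response,
   i.e. the rows that could have been used to fit the model */
proc freq data=genmod_out;
   where p_weight ne . and weight ne .;
   tables sex;
run;
```

For a multinomial model like the one in the thread, it would be worth checking how the OUT= data set is laid out before counting rows, since it may not contain exactly one row per input observation; in that case the NMISS-based WHERE filter is the simpler route.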