02-27-2012 01:51 PM
I'm trying to write code for my professor that will limit the input to a PROC FREQ step to the observations that our PROC GENMOD step actually uses. The data set is too large to subset by hand, so I'm wondering whether there is a way to do this.
For example, our entire data set has roughly 10,000 observations, and our regression model uses about 7,000 of them. I'm trying to write a PROC FREQ step that uses only those 7,000 observations instead of all 10,000.
02-27-2012 01:59 PM
The scenario is that, depending on the model, some observations in the data set get left out, so our N varies from model to model. We want to be able to run PROC FREQ and see the frequencies for only those observations used in a given model. As of now we can only run frequencies on the entire data set.
02-27-2012 02:17 PM
So you can run something like:
proc genmod data=(data set reference);
Why can't you then add to the same program:
proc freq data=(data set reference);
Isn't it that simple?
02-27-2012 02:53 PM
Here's an example of our code:
title '92CATHOLIC/PROTESTANT COMPARISON MODEL WITH CATHOLIC INTERACTION VARS';
model dv92n = dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18 / D=MULTINOMIAL;
Based on the variables we used in the model, it drops observations from the data set and produces this output:
Number of Observations Read 10317
Number of Observations Used 9089
Missing Values 1228
What we want is for a PROC FREQ step to use the exact same 9089 observations that are used in the model.
02-27-2012 02:55 PM
Did you try:
where 0 < nmiss(
dv92n dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18)
02-27-2012 03:23 PM
I like Tom's suggestion, but think that the variables have to be separated by commas and that the condition isn't quite what you want. I think the following comes closer:
if _n_ eq 3 then call missing(age);
if _n_ eq 5 then do;
where nmiss(age,height,weight) < 1;
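For anyone reading this later, the idea can be sketched as a complete program. This is only a minimal sketch, assuming the SASHELP.CLASS sample data set is available; the data set name `have`, the choice of which values to blank out, and the toy model are made up for illustration and are not the poster's actual data or model.

```sas
/* Hypothetical test data: copy SASHELP.CLASS and blank out a few values */
data have;
  set sashelp.class;
  if _n_ eq 3 then call missing(age);
  if _n_ eq 5 then call missing(height, weight);
run;

/* The model skips any observation with a missing response or covariate */
proc genmod data=have;
  model weight = age height / dist=normal;
run;

/* Keep only the rows where none of the model variables is missing */
proc freq data=have;
  where nmiss(age, height, weight) = 0;
  tables sex;
run;
```

Comparing the "Number of Observations Used" line from GENMOD with the total in the PROC FREQ table is a quick way to check that both steps worked from the same rows. Note that NMISS accepts numeric arguments only; if any model variable is character, CMISS takes the same comma-separated list and handles both types.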
02-27-2012 02:01 PM
How do you restrict the input to GENMOD to those 7000?
I would also investigate the OUTPUT statement in GENMOD. It gives you a data set with all of the input variables plus additional information such as residuals and predicted values. Running PROC FREQ on that data set might do what you're looking for, or it might let you use a WHERE statement to filter on a diagnostic or other condition.
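A rough sketch of that approach, assuming the SASHELP.CLASS sample data set and a simple normal model rather than the multinomial model from this thread. The names `have`, `scored`, `p_hat`, and `r_raw` are made up for illustration.

```sas
/* Hypothetical test data with a few missing values */
data have;
  set sashelp.class;
  if _n_ eq 3 then call missing(age);
  if _n_ eq 5 then call missing(height, weight);
run;

proc genmod data=have;
  model weight = age height / dist=normal;
  /* Write the input variables plus predicted values and raw residuals */
  output out=scored pred=p_hat resraw=r_raw;
run;

/* A raw residual exists only when both the response and the prediction do */
proc freq data=scored;
  where not missing(r_raw);
  tables sex;
run;
```

Filtering on the residual rather than the predicted value is deliberate: an observation with a missing response but complete covariates can still receive a predicted value even though it was not used in the fit. With D=MULTINOMIAL the OUT= data set may be laid out differently, so it is worth checking its structure with PROC CONTENTS or PROC PRINT before running frequencies on it.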