Limiting Data for the Proc Freq command

whitecj · Posted 02-27-2012 01:51 PM

Hi,

I'm trying to write code for my professor that will limit the input data for a proc freq command to the observations that our proc genmod command actually uses. The data set is too large to manipulate manually, so I'm wondering if there is any way to do this.

For example, our entire data set has roughly 10,000 observations, and our regression model uses about 7,000 of them. I'm trying to write a proc freq command that will use only those 7,000 observations instead of all 10,000.

Hima · Posted 02-27-2012 01:53 PM

It would be easier for us to answer if you could post an example of what you have and what you want.

whitecj · Posted 02-27-2012 01:59 PM

The scenario is that, depending on the model, some observations in the data set get left out, so our N varies from model to model. We want to be able to run proc freq commands and see the frequencies of only those observations used in a given model. As of now we can only run frequencies on the entire data set.

art297 · Posted 02-27-2012 01:54 PM

How do you identify them when you run proc genmod?

whitecj · Posted 02-27-2012 02:01 PM

data= (data set reference)

I'm not sure if that's what you're asking for or not.

Astounding · Posted 02-27-2012 02:17 PM

So you can run something like:

proc genmod data=(data set reference);

Why can't you then add to the same program:

proc freq data=(data set reference);

Isn't it that simple?

whitecj · Posted 02-27-2012 02:53 PM

Here's an example of our code:

proc genmod;

title '92CATHOLIC/PROTESTANT COMPARISON MODEL WITH CATHOLIC INTERACTION VARS';

model dv92n = dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18 / D=MULTINOMIAL;

run;

Because of missing values in the variables we used for the model, it drops some observations from the data set and produces this output:

Number of Observations Read 10317

Number of Observations Used 9089

Missing Values 1228

What we want is for a proc freq command to use the exact same 9089 values that are used in the model.

Tom · Posted 02-27-2012 02:55 PM

Did you try:

where 0 < nmiss(

dv92n dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18)

;

whitecj · Posted 02-27-2012 03:00 PM

Nope, I'll give it a try! Thanks!

art297 · Posted 02-27-2012 03:23 PM

I like Tom's suggestion, but think that the variables have to be separated by commas and that the condition isn't quite what you want. I think the following comes closer:

data test;

set sashelp.class;

if _n_ eq 3 then call missing(age);

if _n_ eq 5 then do;

call missing(age);

call missing(height);

end;

run;

data want;

set test;

where nmiss(age,height,weight) < 1;

run;
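The same filter can also go straight into PROC FREQ, so no intermediate data set is needed. Below is a minimal sketch, assuming all of the variables involved are numeric (NMISS only accepts numeric arguments; CMISS would be the function to look at if some are character). The TABLES variable, sex, is just an illustration. For the real data, the NMISS argument list would be the response plus every variable on the MODEL statement, separated by commas.

```sas
/* Test data: sashelp.class with a few values set to missing */
data test;
  set sashelp.class;
  if _n_ eq 3 then call missing(age);
  if _n_ eq 5 then call missing(age, height);
run;

/* Count only the rows where none of the listed variables is missing */
proc freq data=test;
  where nmiss(age, height, weight) = 0;
  tables sex;
run;
```

With this test data the frequencies should be based on 17 of the 19 rows, since rows 3 and 5 each have at least one missing value.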

ballardw · Posted 02-27-2012 02:01 PM

How do you restrict the input to GENMOD to those 7000?

I would also investigate the OUTPUT statement in GENMOD. It can write a data set containing all of the input variables plus additional information such as residuals and predicted values. Running proc freq on that data set might do what you're looking for, or it might let you use a where statement to filter on a diagnostic or other condition.
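A rough sketch of that idea, using sashelp.heart (which has some missing values) and a simple normal model rather than the multinomial model from this thread. The data set name scored and the variable name r_raw are made up for the example. The raw residual can only be computed when the response and all predictors are present, so a non-missing residual marks the observations that were used in the fit.

```sas
/* Fit the model and save the input variables plus the raw residual */
proc genmod data=sashelp.heart;
  model systolic = cholesterol weight / dist=normal;
  output out=scored resraw=r_raw;   /* scored and r_raw are hypothetical names */
run;

/* Frequencies for only the observations that have a residual */
proc freq data=scored;
  where not missing(r_raw);
  tables sex;
run;
```

Before relying on this with a multinomial model, it is worth checking the layout of the OUT= data set, since it may not contain exactly one row per input observation.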
