## Limiting Data for the Proc Freq command

Occasional Contributor
Posts: 5


Hi,

I'm trying to write code for my professor that limits the input data for a PROC FREQ step to the observations that our PROC GENMOD step actually uses. The data set is too large to subset by hand, so I'm wondering whether there is any way to do this.

For example, our entire data set has roughly 10,000 observations, and our regression model uses about 7,000 of them. I'm trying to write a PROC FREQ step that uses only those 7,000 observations instead of all 10,000.

Regular Contributor
Posts: 233


It would help us answer if you could post a scenario showing what you have and what you want.

Occasional Contributor
Posts: 5


The scenario is that, depending on the model, some observations in the data set get left out, so our N varies from model to model. We want to run PROC FREQ and see the frequencies for only the observations used in a given model. Right now we can only run frequencies on the entire data set.

PROC Star
Posts: 8,169


How do you identify them when you run proc genmod?

Occasional Contributor
Posts: 5


data= (data set reference)

I'm not sure if that's what you're asking for or not.

Super User
Posts: 6,785


So you can run something like:

proc genmod data=(data set reference);

Why can't you then add to the same program:

proc freq data=(data set reference);

Isn't it that simple?

Occasional Contributor
Posts: 5


Here's an example of our code:

    proc genmod;
      title '92CATHOLIC/PROTESTANT COMPARISON MODEL WITH CATHOLIC INTERACTION VARS';
      model dv92n = dv1975 divsep92 widow92 nevermarr92 child92le18 income92
                    vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18
                    vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18
                    / D=MULTINOMIAL;
    run;

Because of missing values in the variables we use in the model, some observations from the data set are left out, and the procedure produces this output:

    Number of Observations Used    9089
    Missing Values                 1228

What we want is for PROC FREQ to use exactly the same 9089 observations that the model uses.
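A minimal sketch of one approach, assuming missing values are the only reason GENMOD drops observations: build a complete-case data set once and point both procedures at it. It uses `sashelp.class` as stand-in data with a few values blanked out; the data set names `test` and `complete`, the normal model, and the `sex` table are hypothetical. For the model above, the NMISS list would be `dv92n` plus every variable on the MODEL statement.

```sas
/* Hypothetical stand-in data: copy sashelp.class and blank out a few values */
data test;
  set sashelp.class;
  if _n_ = 3 then call missing(age);
  if _n_ = 5 then call missing(age, height);
run;

/* Keep only rows with no missing value in any model variable       */
/* (response plus predictors). A subsetting IF accepts an OF list.  */
data complete;
  set test;
  if nmiss(of weight age height) = 0;
run;

/* Both procedures now read exactly the same observations */
proc genmod data=complete;
  model weight = age height / dist=normal;
run;

proc freq data=complete;
  tables sex;
run;
```

With this stand-in data, `complete` should keep 17 of the 19 rows. If any model variable is character, CMISS rather than NMISS is the function to use, since CMISS counts missing values of both types.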

Super User
Posts: 8,127


Did you try:

    where 0 < nmiss(dv92n dv1975 divsep92 widow92 nevermarr92 child92le18 income92
        vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18
        vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18);

Occasional Contributor
Posts: 5


Nope, I'll give it a try! Thanks!

PROC Star
Posts: 8,169


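I like Tom's suggestion, but I think the variables have to be separated by commas, and the condition isn't quite what you want: `0 < nmiss(...)` keeps the rows that have at least one missing value, which is the opposite of what you need. I think the following comes closer: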

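    data test;
      set sashelp.class;
      /* make some values missing to simulate incomplete rows */
      if _n_ eq 3 then call missing(age);
      if _n_ eq 5 then do;
        call missing(age);
        call missing(height);
      end;
    run;

    data want;
      set test;
      /* keep only rows with no missing value in the listed variables */
      where nmiss(age,height,weight) < 1;
    run;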
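The same filter can also go straight into the PROC FREQ step, so no intermediate data set is needed. A sketch using the same kind of stand-in data as the `test` step above; `sex` and the `freqs` output data set are hypothetical choices. The names stay comma-separated because a WHERE expression does not accept the `OF` variable-list syntax.

```sas
/* Hypothetical stand-in data with a few values blanked out */
data test;
  set sashelp.class;
  if _n_ = 3 then call missing(age);
  if _n_ = 5 then call missing(age, height);
run;

/* Filter inside PROC FREQ itself: only complete rows are tabulated */
proc freq data=test;
  where nmiss(age, height, weight) = 0;
  tables sex / out=freqs;  /* OUT= saves the counts for checking */
run;
```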

Super User
Posts: 13,583


How do you restrict the input to GENMOD to those 7000?

I would also investigate the OUTPUT statement in GENMOD. Its OUT= data set contains all of the input variables plus additional statistics you request, such as residuals and predicted values. Running PROC FREQ on that data set might do what you're looking for, or it could let you use a WHERE statement to filter on a diagnostic or some other condition.
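A sketch of the OUTPUT-statement idea on stand-in data, with hypothetical names (`diag`, `r`) and a simple normal model. It assumes that the raw residual (`RESRAW=`) is missing exactly for the rows the model could not use, so filtering on it reproduces the analysis sample. For a multinomial model the available statistics and the layout of the OUT= data set may differ, so that would need checking against the GENMOD documentation.

```sas
/* Hypothetical stand-in data with a few values blanked out */
data test;
  set sashelp.class;
  if _n_ = 3 then call missing(age);
  if _n_ = 5 then call missing(age, height);
run;

/* OUT= keeps every input variable plus the requested statistics;   */
/* assumption: RESRAW= is missing for rows excluded from the fit.   */
proc genmod data=test;
  model weight = age height / dist=normal;
  output out=diag resraw=r;
run;

/* Tabulate only the rows that have a residual, i.e. the fitted rows */
proc freq data=diag;
  where r is not missing;
  tables sex;
run;
```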
