whitecj
Calcite | Level 5

Hi,

I'm trying to write code for my professor that will limit the input data for a proc freq command to the observations that our proc genmod command actually uses. The data set is too large to manipulate manually, so I'm wondering if there is any way to do this.

For example, our entire data set is roughly 10,000 observations, and our regression model uses about 7,000 of them. I'm trying to write a proc freq command that will use only those 7,000 observations instead of all 10,000 in the data set.

Hima
Obsidian | Level 7

It would help us answer if you could post a scenario showing what you have and what you want.

whitecj
Calcite | Level 5

The scenario is that, depending on the model, several observations from the data set get left out, so our N varies from model to model. We want to run proc freq commands and see the frequencies for only the data points used in a given model. As of now we can only run frequencies on the entire data set.

art297
Opal | Level 21

How do you identify them when you run proc genmod?

whitecj
Calcite | Level 5

data= (data set reference)

I'm not sure if that's what you're asking for or not.

Astounding
PROC Star

So you can run something like:

proc genmod data=(data set reference);

Why can't you then add to the same program:

proc freq data=(data set reference);

Isn't it that simple?

whitecj
Calcite | Level 5

Here's an example of our code:

proc genmod;

title '92CATHOLIC/PROTESTANT COMPARISON MODEL WITH CATHOLIC INTERACTION VARS';

model dv92n = dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18 / D=MULTINOMIAL;

run;

Based on the variables we used for the model, it leaves out some data points from the data set and produces this output:

Number of Observations Read    10317
Number of Observations Used     9089
Missing Values                  1228

What we want is for a proc freq command to use the exact same 9089 values that are used in the model.

Tom
Super User

Did you try:

where 0 < nmiss(dv92n dv1975 divsep92 widow92 nevermarr92 child92le18 income92 vcath92divsep92 vcath92widow92 vcath92nevermarr92 vcath92child92le18 vcathprot92divsep92 vcathprot92widow92 vcathprot92nevermarr92 vcathprot92child92le18);


whitecj
Calcite | Level 5

Nope, I'll give it a try! Thanks!

art297
Opal | Level 21

I like Tom's suggestion, but think that the variables have to be separated by commas and that the condition isn't quite what you want.  I think the following comes closer:

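data test;
  set sashelp.class;
  /* blank out a few values to simulate missing data */
  if _n_ eq 3 then call missing(age);
  if _n_ eq 5 then do;
    call missing(age);
    call missing(height);
  end;
run;

data want;
  set test;
  /* keep only rows where none of the listed variables is missing */
  where nmiss(age,height,weight) < 1;
run;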
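
Applied to the model from this thread, the same idea could be sketched as a WHERE statement placed directly in PROC FREQ, so no intermediate data set is needed. This is only a sketch under assumptions: `mydata` is a hypothetical data set name (the posted PROC GENMOD has no DATA= option), the TABLES variables are picked arbitrarily for illustration, and the variable names are read from the posted MODEL statement. CMISS is used instead of NMISS in case any of the variables is character; if all are numeric, NMISS behaves the same here.

```sas
/* Sketch only: mydata is a hypothetical data set name. */
proc freq data=mydata;
  /* Keep rows where none of the model variables is missing.   */
  /* CMISS counts missing values among numeric and character   */
  /* arguments, so the variable types do not matter here.      */
  where cmiss(dv92n, dv1975, divsep92, widow92, nevermarr92,
              child92le18, income92,
              vcath92divsep92, vcath92widow92,
              vcath92nevermarr92, vcath92child92le18,
              vcathprot92divsep92, vcathprot92widow92,
              vcathprot92nevermarr92, vcathprot92child92le18) = 0;
  /* The TABLES variables are chosen only for illustration. */
  tables dv92n divsep92;
run;
```

The number of observations PROC FREQ reports should match the "Number of Observations Used" (9089) from GENMOD. If it does not, some observations were probably excluded for a reason other than missing values, and the WHERE condition would need to be adjusted.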

ballardw
Super User

How do you restrict the input to GENMOD to those 7000?

I would also investigate the OUTPUT statement in GENMOD. It can write a data set containing all of the input variables plus additional information such as residuals and predicted values. Proc freq on that data might do what you're looking for, possibly with a where statement to filter based on a diagnostic or other condition.
