Help using Base SAS procedures

How to get number of non-missing without running the entire frequency

Accepted Solution Solved
Reply
Contributor
Posts: 57
Accepted Solution

How to get number of non-missing without running the entire frequency

Hi SAS friends,

is there a way to run a frequency on a variable without displaying every level for that variable? For example i have a variable called Phone for which i have 1M observations. I don't want to know the number of occurrences of phone i just want to know if its missing. If i run the normal proc freq with the missing option it takes 20+minutes to get results.

I'm running SAS 9.4 with STAT.

Thank you,

data BigDS;

do i = 1 to 1000000;

  PhoneNums=INT((RanUni(07022014)*10000000000));

  MissRate=RanUni(1234);

  if MissRate>=.8 then call missing(PhoneNums);

  keep PhoneNums;

  output;

end;

run;

proc freq data = BigDS;

  table PhoneNums/missing;

run;


Accepted Solutions
Solution
‎07-02-2014 11:16 AM
Respected Advisor
Posts: 3,328

Re: How to get number of non-missing without running the entire frequency

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

View solution in original post


All Replies
Respected Advisor
Respected Advisor
Posts: 5,030

Re: How to get number of non-missing without running the entire frequency

You could do a proc sort with nodupkey.  Or alternatively use:
proc sql;

     create table WANT as

     select     count(1)

     from       HAVE

     where     PHONENUMS is null;

quit;

As for freq, I don't use it so don't know.

Super Contributor
Posts: 256

Re: How to get number of non-missing without running the entire frequency


proc sql;
  select nmiss(phonenums) from bigds;
  quit;

Solution
‎07-02-2014 11:16 AM
Respected Advisor
Posts: 3,328

Re: How to get number of non-missing without running the entire frequency

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

Contributor
Posts: 57

Re: How to get number of non-missing without running the entire frequency

Astounding is there a low-high equivalent for character variables?

Respected Advisor
Posts: 3,328

Re: How to get number of non-missing without running the entire frequency

For character variables, the range low-high does not help because the hex code for a blank falls in the middle of the range of possible values.  Instead, this format would do the trick:

proc format;

value $present ' '='Missing' other='Present';

run;

Community Manager
Posts: 2,155

Re: How to get number of non-missing without running the entire frequency

Your original program might also pay a penalty for creating large ODS output, depending on your settings.  If running in SAS Enterprise Guide or in SAS Display Manager with default settings, you're creating a large HTML or SAS Report output.  The creation/transfer of that file would slow down the overall process.

has good suggestions for how to reduce the "counting" part of the process, making it much more efficient.  But your original approach would run pretty fast if you output just a data set, and not any report.

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count;

run;

Or optionally, to declutter the output:

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count(where=(PhoneNums=.));

run;

Chris

Occasional Contributor
Posts: 12

Re: How to get number of non-missing without running the entire frequency

I have been using the following in my autoexec file for the last 12 years or so:

 

PROC FORMAT /*LIB=BBBANAL.SEGFORMATS*/ ; 
  VALUE MISSING .="Missing" Other="Not Missing" ;
  VALUE $MISSING ' '="Missing" Other="Not Missing" ;

RUN ;

Contributor
Posts: 57

Re: How to get number of non-missing without running the entire frequency

Thank you everyone for your input. i like asking these questions because i always learn multiple ways to accomplish something.

Esteemed Advisor
Esteemed Advisor
Posts: 5,216

Re: How to get number of non-missing without running the entire frequency

If you just want to know where there are ANY missing values then use the NLEVELS output from proc FREQ.

proc freq data = BigDS nlevels ;

  table PhoneNums / noprint;

run;

Post a Question
Discussion Stats
  • 15 replies
  • 1275 views
  • 8 likes
  • 10 in conversation