Help using Base SAS procedures

How to get number of non-missing without running the entire frequency

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

How to get number of non-missing without running the entire frequency

Hi SAS friends,

is there a way to run a frequency on a variable without displaying every level for that variable? For example i have a variable called Phone for which i have 1M observations. I don't want to know the number of occurrences of phone i just want to know if its missing. If i run the normal proc freq with the missing option it takes 20+minutes to get results.

I'm running SAS 9.4 with STAT.

Thank you,

data BigDS;

do i = 1 to 1000000;

  PhoneNums=INT((RanUni(07022014)*10000000000));

  MissRate=RanUni(1234);

  if MissRate>=.8 then call missing(PhoneNums);

  keep PhoneNums;

  output;

end;

run;

proc freq data = BigDS;

  table PhoneNums/missing;

run;


Accepted Solutions
Solution
‎07-02-2014 11:16 AM
Respected Advisor
Posts: 3,810

Re: How to get number of non-missing without running the entire frequency

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

View solution in original post


All Replies
Esteemed Advisor
Esteemed Advisor
Posts: 5,604

Re: How to get number of non-missing without running the entire frequency

You could do a proc sort with nodupkey.  Or alternatively use:
proc sql;

     create table WANT as

     select     count(1)

     from       HAVE

     where     PHONENUMS is null;

quit;

As for freq, I don't use it so don't know.

Super Contributor
Posts: 261

Re: How to get number of non-missing without running the entire frequency


proc sql;
  select nmiss(phonenums) from bigds;
  quit;

Solution
‎07-02-2014 11:16 AM
Respected Advisor
Posts: 3,810

Re: How to get number of non-missing without running the entire frequency

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

Contributor
Posts: 62

Re: How to get number of non-missing without running the entire frequency

Astounding is there a low-high equivalent for character variables?

Respected Advisor
Posts: 3,810

Re: How to get number of non-missing without running the entire frequency

For character variables, the range low-high does not help because the hex code for a blank falls in the middle of the range of possible values.  Instead, this format would do the trick:

proc format;

value $present ' '='Missing' other='Present';

run;

Community Manager
Posts: 2,302

Re: How to get number of non-missing without running the entire frequency

Your original program might also pay a penalty for creating large ODS output, depending on your settings.  If running in SAS Enterprise Guide or in SAS Display Manager with default settings, you're creating a large HTML or SAS Report output.  The creation/transfer of that file would slow down the overall process.

has good suggestions for how to reduce the "counting" part of the process, making it much more efficient.  But your original approach would run pretty fast if you output just a data set, and not any report.

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count;

run;

Or optionally, to declutter the output:

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count(where=(PhoneNums=.));

run;

Chris

Occasional Contributor
Posts: 12

Re: How to get number of non-missing without running the entire frequency

I have been using the following in my autoexec file for the last 12 years or so:

 

PROC FORMAT /*LIB=BBBANAL.SEGFORMATS*/ ; 
  VALUE MISSING .="Missing" Other="Not Missing" ;
  VALUE $MISSING ' '="Missing" Other="Not Missing" ;

RUN ;

Contributor
Posts: 62

Re: How to get number of non-missing without running the entire frequency

Thank you everyone for your input. i like asking these questions because i always learn multiple ways to accomplish something.

Esteemed Advisor
Esteemed Advisor
Posts: 5,380

Re: How to get number of non-missing without running the entire frequency

If you just want to know where there are ANY missing values then use the NLEVELS output from proc FREQ.

proc freq data = BigDS nlevels ;

  table PhoneNums / noprint;

run;

Contributor
Posts: 62

Re: How to get number of non-missing without running the entire frequency

Tom that is awesome! can you route the results of a Nlevels table to a dataset?

Esteemed Advisor
Esteemed Advisor
Posts: 5,380

Re: How to get number of non-missing without running the entire frequency

ODS OUTPUT.

But watch because if there are no missing values then the variable NMissLevels will not even be on the output dataset.

ods output nlevels=nlevels;

ods exclude nlevels;

proc freq data = BigDS nlevels ;

  table PhoneNums / noprint;

run;


1722  data _null_; set nlevels; put (_all_) (=/); run;

TableVar=PhoneNums

NLevels=799696

NMissLevels=1

NNonMissLevels=799695

New Contributor
Posts: 2

Re: How to get number of non-missing without running the entire frequency

Tom, how does this produce a table? I'm also trying to save the value from running nlevels so I can merge it in with other data. I just need the one count variable produced with nlevels, but when I try to save the output it saves the underlying individual level data, not the nlevels value.

Esteemed Advisor
Esteemed Advisor
Posts: 5,380

Re: How to get number of non-missing without running the entire frequency

The ODS OUTPUT statement tells it to save the output that the NLEVELS option produces as a dataset.

The ODS EXCLUDE statement tells it to not produce any printed output for the NLEVELS option.

The NOPRINT option on the TABLES statement tells it not to produce any printed output for the variable frequencies.

So all that it produces is the dataset with the number of levels (and number of missing levels if any).

Valued Guide
Posts: 720

Re: How to get number of non-missing without running the entire frequency

Hi ... if you have a number of variables (NUM, CHAR, or both types), you can try the method in this paper (it uses NLEVELS)...

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

Using your data step to create 3 numeric variables ...

data x;

do i = 1 to 1000000;

  Phone1=INT((RanUni(07022014)*10000000000));

  phone2 = phone1;

  phone3 = phone2;

  if RanUni(1234)>=.8 then call missing(Phone1);

  if RanUni(1234)>=.9 then call missing(Phone2);

  if RanUni(1234)>=.4 then call missing(Phone3);

  output;

end;

keep Phone: ;

run;

CPU time was < 1 second to produce a report ...

Obs     var         miss    p_miss         ok     p_ok

1     Phone1    199,849     20.0     800,151     80.0

2     phone2     99,747     10.0     900,253     90.0

3     phone3    600,976     60.1     399,024     39.9

You can also use PROC MI to look for patterns in missing numeric data if you have two or more numeric variables (takes longer than the NLEVELS approach) ...

proc mi data=x;

ods select misspattern;

run;

Post a Question
Discussion Stats
  • 15 replies
  • 1368 views
  • 8 likes
  • 10 in conversation