BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAShole
Pyrite | Level 9

Hi SAS friends,

is there a way to run a frequency on a variable without displaying every level for that variable? For example i have a variable called Phone for which i have 1M observations. I don't want to know the number of occurrences of phone i just want to know if its missing. If i run the normal proc freq with the missing option it takes 20+minutes to get results.

I'm running SAS 9.4 with STAT.

Thank you,

data BigDS;

do i = 1 to 1000000;

  PhoneNums=INT((RanUni(07022014)*10000000000));

  MissRate=RanUni(1234);

  if MissRate>=.8 then call missing(PhoneNums);

  keep PhoneNums;

  output;

end;

run;

proc freq data = BigDS;

  table PhoneNums/missing;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

View solution in original post

15 REPLIES 15
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You could do a proc sort with nodupkey.  Or alternatively use:
proc sql;

     create table WANT as

     select     count(1)

     from       HAVE

     where     PHONENUMS is null;

quit;

As for freq, I don't use it so don't know.

slchen
Lapis Lazuli | Level 10


proc sql;
  select nmiss(phonenums) from bigds;
  quit;

Astounding
PROC Star

Outside of the SQL world, there are two standard ways to accomplish this.  First, you could switch to PROC MEANS:

proc means data=have n nmiss;

var phonenums;

run;

The advantage here is that you can easily add many variables to the VAR statement.

To use PROC FREQ, apply a format that groups all nonmissing values:

proc format;

   value present low-high='Present' other='Missing';

run;

Then use the format:

proc freq data=have;

   tables phonenum / missing;

   format phonenum present.;

run;

Only testing will tell you which is fastest.

SAShole
Pyrite | Level 9

Astounding is there a low-high equivalent for character variables?

Astounding
PROC Star

For character variables, the range low-high does not help because the hex code for a blank falls in the middle of the range of possible values.  Instead, this format would do the trick:

proc format;

value $present ' '='Missing' other='Present';

run;

ChrisHemedinger
Community Manager

Your original program might also pay a penalty for creating large ODS output, depending on your settings.  If running in SAS Enterprise Guide or in SAS Display Manager with default settings, you're creating a large HTML or SAS Report output.  The creation/transfer of that file would slow down the overall process.

has good suggestions for how to reduce the "counting" part of the process, making it much more efficient.  But your original approach would run pretty fast if you output just a data set, and not any report.

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count;

run;

Or optionally, to declutter the output:

proc freq data = BigDS noprint;

  table PhoneNums/missing out=miss_count(where=(PhoneNums=.));

run;

Chris

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
bbenbaruch
Quartz | Level 8

I have been using the following in my autoexec file for the last 12 years or so:

 

PROC FORMAT /*LIB=BBBANAL.SEGFORMATS*/ ; 
  VALUE MISSING .="Missing" Other="Not Missing" ;
  VALUE $MISSING ' '="Missing" Other="Not Missing" ;

RUN ;

SAShole
Pyrite | Level 9

Thank you everyone for your input. i like asking these questions because i always learn multiple ways to accomplish something.

Tom
Super User Tom
Super User

If you just want to know where there are ANY missing values then use the NLEVELS output from proc FREQ.

proc freq data = BigDS nlevels ;

  table PhoneNums / noprint;

run;

SAShole
Pyrite | Level 9

Tom that is awesome! can you route the results of a Nlevels table to a dataset?

Tom
Super User Tom
Super User

ODS OUTPUT.

But watch because if there are no missing values then the variable NMissLevels will not even be on the output dataset.

ods output nlevels=nlevels;

ods exclude nlevels;

proc freq data = BigDS nlevels ;

  table PhoneNums / noprint;

run;


1722  data _null_; set nlevels; put (_all_) (=/); run;

TableVar=PhoneNums

NLevels=799696

NMissLevels=1

NNonMissLevels=799695

sarak
Calcite | Level 5

Tom, how does this produce a table? I'm also trying to save the value from running nlevels so I can merge it in with other data. I just need the one count variable produced with nlevels, but when I try to save the output it saves the underlying individual level data, not the nlevels value.

Tom
Super User Tom
Super User

The ODS OUTPUT statement tells it to save the output that the NLEVELS option produces as a dataset.

The ODS EXCLUDE statement tells it to not produce any printed output for the NLEVELS option.

The NOPRINT option on the TABLES statement tells it not to produce any printed output for the variable frequencies.

So all that it produces is the dataset with the number of levels (and number of missing levels if any).

MikeZdeb
Rhodochrosite | Level 12

Hi ... if you have a number of variables (NUM, CHAR, or both types), you can try the method in this paper (it uses NLEVELS)...

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

Using your data step to create 3 numeric variables ...

data x;

do i = 1 to 1000000;

  Phone1=INT((RanUni(07022014)*10000000000));

  phone2 = phone1;

  phone3 = phone2;

  if RanUni(1234)>=.8 then call missing(Phone1);

  if RanUni(1234)>=.9 then call missing(Phone2);

  if RanUni(1234)>=.4 then call missing(Phone3);

  output;

end;

keep Phone: ;

run;

CPU time was < 1 second to produce a report ...

Obs     var         miss    p_miss         ok     p_ok

1     Phone1    199,849     20.0     800,151     80.0

2     phone2     99,747     10.0     900,253     90.0

3     phone3    600,976     60.1     399,024     39.9

You can also use PROC MI to look for patterns in missing numeric data if you have two or more numeric variables (takes longer than the NLEVELS approach) ...

proc mi data=x;

ods select misspattern;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 5334 views
  • 8 likes
  • 10 in conversation