Help using Base SAS procedures

How to count missing records in each variable

Reply
Occasional Contributor
Posts: 10

How to count missing records in each variable

Hi Everyone,

I have a Data set ,it contains character variables .some of the records are missing my requirement is how many records are missing in each variable.i tried with Freq and means.Means and summary giving stats only for numeric variables but i my Data set is contains character variables. is their any alternate method to get the requirement

Example:

DATA _null_;

Input Customer_Id $ Gender $ location $;

Cards;

1001  M   XXXX

1002       YYYY

         M   ZZZZ

          F  

;

RUN;

My Requirement is :

column_name     Miss  Nmiss

custom_id          2          2

Gender               1         3

location              1          3

Valued Guide
Posts: 2,168

Re: How to count missing records in each variable

If you don't mind multiple passes through your data the following mIght build the table you seek with PROC SQL

proc contents data = your.data out= conts noprint ;

Run;

proc sql noprint ;

Select ' insert into cStats select quote(!! Name !! '), sum( case when ' !! name !! ' is null then 1 else 0 end ) as miss, sum( case when , !! Name !! ' is null then 0 else 1 end ) as nMiss'

Into :collector separated by ' ; '

From conts

where type = 2

order by name ;

Create table cstats ( column_name char(32), miss integer, nmiss integer) ;

&collector ;

quit ;

Super User
Posts: 1,066

Re: How to count missing records in each variable

Alternately,

PROC FORMAT;

    VALUE $miss

      " " = "Missing"

      OTHER = "Non_Missing";

RUN;

%macro freq(in,out);

ods output OneWayFreqs=OneWayFreqs;

proc freq data=have;

  table &in / missing ;

  format &in $miss.;

run;

ods output close ;

proc transpose data=OneWayFreqs out=&out(drop=_name_);

by table;

id f_&in ;

var Frequency;

run;

%mend;

%freq(Customer_Id,want1)

%freq(Gender,want2)

%freq(location,want3)

data all;

  set want1 want2 want3;

run;

Thanks,

Jag

Thanks,
Jag
Valued Guide
Posts: 2,168

Re: How to count missing records in each variable

Should be possible to acquire these stats on a single pass

data stats ;

do rows=0 by 1 while( not end_data ) ;

set your.data end= end_data ;

Array chrs _character_ ;

Array counts(999) _temporary_ ;

do over chrs ;

Counts(_i_) + missing( chr) ;

End ;

End ;

length colunn_name $32 ;

Do over chrs ;

Column_name = vname( chrs) ;

Miss = counts(_i_) ;

Nmiss = rows- miss ;

Output ;

End ;

stop;

KEEP COLUMN_NAME MISS NMISS ;

run;

Occasional Contributor
Posts: 8

Re: How to count missing records in each variable

[ Edited ]

works a treat!  Just a couple of small typos:

 

 - Counts(_i_) + missing( chrS) ;

 - length coluMn_name $32 ;

Esteemed Advisor
Posts: 7,056

Re: How to count missing records in each variable

If I correctly understand what you are trying to do, wouldn't something like the following accomplish the task?

data have;

  input a b $ c d $ e f $;

  cards;

1 . . 4 5 .

1 2 3 4 5 6

. . . . . .

1 2 . . 5 6

. . . . . .

. 2 . 4 5 .

;

data need;

  set have;

  num_missing=cmiss(of a--f);

run;

proc freq data=need;

  tables num_missing;

run;

or, to do it in a single pass:

data want (keep=num_missing);

  set have end=last;

  num_missing+cmiss(of a--f) eq 6;

  if last then output;

run;

proc print;

run;

Super Contributor
Posts: 276

Re: How to count missing records in each variable

HI Trinesh..

I know my code not so effictive ,But if you have only 2 or 3 variables ,you may get the desired results with bellow simple datastep code..

Data Test;

input ID $ Name $;

ID_new="ID    ";

Name_New="Name";

cards;

100 San

. sant

101 .

102 .

103 ref

;

run;

Data New1;

Set New(Rename=(ID_new=var ID=name) Keep=ID ID_new  ) New(Rename=(name_new=var) Keep=name name_new ) ;

run;

Proc Sql;

create table final as

Select Var,Sum(missing(name)) as missing,count(Name) as nonmissing from new1

group by var;

quit;

Thanks,

Sanjeev.K

Occasional Contributor
Posts: 9

Re: How to count missing records in each variable

Here is a cheat's way of doing it - assuming you don't need it to be dynamic in identifying column names. If you do then you need to be clever and add some macro code perhaps.

DATA test;
infile datalines dlm=',' dsd;
Input Customer_Id $ Gender $ location $;

Cards;
1001,M,XXXX
1002,,YYYY
,M,ZZZZ
,F,
;
RUN;

data test2(drop=count_customer count_gender count_location Customer_Id Gender location);
   retain count_customer count_gender count_location;
   set test end=end;

   if _n_ = 1 then
      do;
          count_customer =0;
          count_gender=0;
          count_location=0;
       end;

   if Customer_Id = "" then
          count_customer = count_customer+1;

   if Gender = "" then
          count_gender = count_gender+1;

   if location = "" then
          count_location = count_location+1;

   if end = 1 then
       do;
          column_name = "Customer_Id";
          miss = count_customer;
          not_missing = _n_ - count_customer;
          output;

         column_name = "Gender";
          miss = count_gender;
          not_missing = _n_ - count_gender;
          output;

         column_name = "Location";
          miss = count_location;
          not_missing = _n_ - count_location;
          output;
       end;
run;

Occasional Contributor
Posts: 5

Re: How to count missing records in each variable

Try this.

%macro missing_rpt(data=);

proc contents data=&data noprint out=_contents;

run;

data _null_;

   set _contents end=last;

   call symput(cats('var',_n_),strip(name));

   call symput(cats('type',_n_),strip(type));

   if last then do;

      call symput('nvars',strip(_n_));

      end;

   run;

%let delim=;

proc sql;

create table _missing_counts as

select %do i=1 %to &nvars;

           %if &i>1 %then %let delim=%str(,);

           %if &&type&i=1 %then %do;

              &delim.sum(case when &&var&i=. then 1 else 0 end) as &&var&i

              %end;

           %else %do;

              &delim.sum(case when &&var&i='' then 1 else 0 end) as &&var&i

              %end;

           %end;

from &data

;quit;

proc transpose data=_missing_counts out=_missing_counts_t;

run;

proc sql;

select _name_ label='Variable',

       col1 label='Missing Value Count'

from _missing_counts_t

order by _name_

;quit;

%mend;

%missing_rpt(data=mylib.mydata);

Respected Advisor
Posts: 3,773

Re: How to count missing records in each variable

Getting the counts is easy if you apply a format and PROC SUMMARY will do all the variables at once character and numeric.  Then it is "just" a matter of formatting the output for the report.

DATA cust;
   Input Customer_Id $ Gender $ location $ x y;
Cards;
1001  M   XXXX 1 2
1002  .   YYYY . 2
.     M   ZZZZ 1 2
.     F   .    3 2
;
  
RUN;
proc format;
  
value _miss ._-.Z = 'Missing' other='NotMissing';
  
value $_miss ' '  = 'Missing' other='NotMissing';
  
run;
  
proc summary chartype missing;
  
class _all_ / mlf; *MLF converts all class variables to character;
  
ways 1;
  
format _numeric_ _miss. _character_ $_miss.;
  
output out=count;
   run;

data report0(keep=_type_ variable value _freq_);
   set count;
   array v
  • _character_;
      
    length variable $32 value $16;
       variable = vname(v[indexc(_type_,
    '1')]);
       value = coalescec(of v
  • );
      
  • format _all_; *remove formats from _TYPE_ and _FREQ_;
      
    run;
    proc print;
      
    run;
    proc transpose out=report(drop=_name_);
       by _type_ variable;
       var _freq_;
       id value;
       run;
    proc print;
      
    run;
    Ask a Question
    Discussion stats
    • 9 replies
    • 1653 views
    • 1 like
    • 9 in conversation