BookmarkSubscribeRSS Feed
SonyTrinesh
Calcite | Level 5

Hi Everyone,

I have a Data set ,it contains character variables .some of the records are missing my requirement is how many records are missing in each variable.i tried with Freq and means.Means and summary giving stats only for numeric variables but i my Data set is contains character variables. is their any alternate method to get the requirement

Example:

DATA _null_;

Input Customer_Id $ Gender $ location $;

Cards;

1001  M   XXXX

1002       YYYY

         M   ZZZZ

          F  

;

RUN;

My Requirement is :

column_name     Miss  Nmiss

custom_id          2          2

Gender               1         3

location              1          3

9 REPLIES 9
Peter_C
Rhodochrosite | Level 12

If you don't mind multiple passes through your data the following mIght build the table you seek with PROC SQL

proc contents data = your.data out= conts noprint ;

Run;

proc sql noprint ;

Select ' insert into cStats select quote(!! Name !! '), sum( case when ' !! name !! ' is null then 1 else 0 end ) as miss, sum( case when , !! Name !! ' is null then 0 else 1 end ) as nMiss'

Into :collector separated by ' ; '

From conts

where type = 2

order by name ;

Create table cstats ( column_name char(32), miss integer, nmiss integer) ;

&collector ;

quit ;

Jagadishkatam
Amethyst | Level 16

Alternately,

PROC FORMAT;

    VALUE $miss

      " " = "Missing"

      OTHER = "Non_Missing";

RUN;

%macro freq(in,out);

ods output OneWayFreqs=OneWayFreqs;

proc freq data=have;

  table &in / missing ;

  format &in $miss.;

run;

ods output close ;

proc transpose data=OneWayFreqs out=&out(drop=_name_);

by table;

id f_&in ;

var Frequency;

run;

%mend;

%freq(Customer_Id,want1)

%freq(Gender,want2)

%freq(location,want3)

data all;

  set want1 want2 want3;

run;

Thanks,

Jag

Thanks,
Jag
Peter_C
Rhodochrosite | Level 12

Should be possible to acquire these stats on a single pass

data stats ;

do rows=0 by 1 while( not end_data ) ;

set your.data end= end_data ;

Array chrs _character_ ;

Array counts(999) _temporary_ ;

do over chrs ;

Counts(_i_) + missing( chr) ;

End ;

End ;

length colunn_name $32 ;

Do over chrs ;

Column_name = vname( chrs) ;

Miss = counts(_i_) ;

Nmiss = rows- miss ;

Output ;

End ;

stop;

KEEP COLUMN_NAME MISS NMISS ;

run;

AllanBowe
Barite | Level 11

works a treat!  Just a couple of small typos:

 

 - Counts(_i_) + missing( chrS) ;

 - length coluMn_name $32 ;

/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs
art297
Opal | Level 21

If I correctly understand what you are trying to do, wouldn't something like the following accomplish the task?

data have;

  input a b $ c d $ e f $;

  cards;

1 . . 4 5 .

1 2 3 4 5 6

. . . . . .

1 2 . . 5 6

. . . . . .

. 2 . 4 5 .

;

data need;

  set have;

  num_missing=cmiss(of a--f);

run;

proc freq data=need;

  tables num_missing;

run;

or, to do it in a single pass:

data want (keep=num_missing);

  set have end=last;

  num_missing+cmiss(of a--f) eq 6;

  if last then output;

run;

proc print;

run;

kuridisanjeev
Quartz | Level 8

HI Trinesh..

I know my code not so effictive ,But if you have only 2 or 3 variables ,you may get the desired results with bellow simple datastep code..

Data Test;

input ID $ Name $;

ID_new="ID    ";

Name_New="Name";

cards;

100 San

. sant

101 .

102 .

103 ref

;

run;

Data New1;

Set New(Rename=(ID_new=var ID=name) Keep=ID ID_new  ) New(Rename=(name_new=var) Keep=name name_new ) ;

run;

Proc Sql;

create table final as

Select Var,Sum(missing(name)) as missing,count(Name) as nonmissing from new1

group by var;

quit;

Thanks,

Sanjeev.K

PaulLee
Fluorite | Level 6

Here is a cheat's way of doing it - assuming you don't need it to be dynamic in identifying column names. If you do then you need to be clever and add some macro code perhaps.

DATA test;
infile datalines dlm=',' dsd;
Input Customer_Id $ Gender $ location $;

Cards;
1001,M,XXXX
1002,,YYYY
,M,ZZZZ
,F,
;
RUN;

data test2(drop=count_customer count_gender count_location Customer_Id Gender location);
   retain count_customer count_gender count_location;
   set test end=end;

   if _n_ = 1 then
      do;
          count_customer =0;
          count_gender=0;
          count_location=0;
       end;

   if Customer_Id = "" then
          count_customer = count_customer+1;

   if Gender = "" then
          count_gender = count_gender+1;

   if location = "" then
          count_location = count_location+1;

   if end = 1 then
       do;
          column_name = "Customer_Id";
          miss = count_customer;
          not_missing = _n_ - count_customer;
          output;

         column_name = "Gender";
          miss = count_gender;
          not_missing = _n_ - count_gender;
          output;

         column_name = "Location";
          miss = count_location;
          not_missing = _n_ - count_location;
          output;
       end;
run;

Trey
Fluorite | Level 6

Try this.

%macro missing_rpt(data=);

proc contents data=&data noprint out=_contents;

run;

data _null_;

   set _contents end=last;

   call symput(cats('var',_n_),strip(name));

   call symput(cats('type',_n_),strip(type));

   if last then do;

      call symput('nvars',strip(_n_));

      end;

   run;

%let delim=;

proc sql;

create table _missing_counts as

select %do i=1 %to &nvars;

           %if &i>1 %then %let delim=%str(,);

           %if &&type&i=1 %then %do;

              &delim.sum(case when &&var&i=. then 1 else 0 end) as &&var&i

              %end;

           %else %do;

              &delim.sum(case when &&var&i='' then 1 else 0 end) as &&var&i

              %end;

           %end;

from &data

;quit;

proc transpose data=_missing_counts out=_missing_counts_t;

run;

proc sql;

select _name_ label='Variable',

       col1 label='Missing Value Count'

from _missing_counts_t

order by _name_

;quit;

%mend;

%missing_rpt(data=mylib.mydata);

data_null__
Jade | Level 19

Getting the counts is easy if you apply a format and PROC SUMMARY will do all the variables at once character and numeric.  Then it is "just" a matter of formatting the output for the report.

DATA cust;
   Input Customer_Id $ Gender $ location $ x y;
Cards;
1001  M   XXXX 1 2
1002  .   YYYY . 2
.     M   ZZZZ 1 2
.     F   .    3 2
;
  
RUN;
proc format;
  
value _miss ._-.Z = 'Missing' other='NotMissing';
  
value $_miss ' '  = 'Missing' other='NotMissing';
  
run;
  
proc summary chartype missing;
  
class _all_ / mlf; *MLF converts all class variables to character;
  
ways 1;
  
format _numeric_ _miss. _character_ $_miss.;
  
output out=count;
   run;

data report0(keep=_type_ variable value _freq_);
   set count;
   array v
  • _character_;
      
    length variable $32 value $16;
       variable = vname(v[indexc(_type_,
    '1')]);
       value = coalescec(of v
  • );
      
  • format _all_; *remove formats from _TYPE_ and _FREQ_;
      
    run;
    proc print;
      
    run;
    proc transpose out=report(drop=_name_);
       by _type_ variable;
       var _freq_;
       id value;
       run;
    proc print;
      
    run;

    SAS Innovate 2025: Register Now

    Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
    Sign up by Dec. 31 to get the 2024 rate of just $495.
    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 9 replies
    • 11300 views
    • 2 likes
    • 9 in conversation