BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi all,

I have a data set more than 100 fields ,where I have to check the missing values of each field  an the percentage of missing .

So i have to write a code where multiple data sets can be executed and should give  the reports of each field of all data set in one final output .

your help is much appreciated

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi.  Take a look at (fixed the indenting in the data so you can actually see what's going on) ...

 

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

 

It explains why this SAS code works.  All you have to do for any data set is change the name of the data set used in PROC FREQ.  No other changes are required.  The code produced the attached report when I used the SASHELP.HEART.  All you have to do is cut/paste/change data set name/run.

 

proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;

 

ods listing close;
ods output onewayfreqs=tables;

proc freq data=sashelp.heart;
   tables _all_ / missing;
   format _numeric_ nm.
   _character_ $ch.;
run;

ods output close;
ods listing;

 

data report;
length var $32;
do until (last.table);
  set tables;
  by table notsorted;
  array names(*) f_: ;
  select (names(_n_));
    when ('0') do;
    miss = frequency;
    p_miss = percent;
  end;
  when ('1') do;
    ok = frequency;
    p_ok = percent;
  end;
  end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;

 


report.png

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

For one dataset, you would use PROC MEANS with the NMISS option in the OUTPUT statement.

 

For multiple datasets, you could use a macro that loops through all of them, each using PROC MEANS.

--
Paige Miller
Astounding
PROC Star

Here's an approach:

 

proc format;

value present low-high='Present' other='Missing';

value $present ' '='Missing' other='Present';

run;

 

proc freq data=mydata;

   tables _character_ _numeric_ / missing;

   format _character_ $present.;

   format _numeric_ present.;

run;

 

You will get an error message if you encounter a data set that contains all numeric variables (or all character variables).  A mix is required.

 

Good luck.

MikeZdeb
Rhodochrosite | Level 12

Hi.  Take a look at (fixed the indenting in the data so you can actually see what's going on) ...

 

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step

http://www.lexjansen.com/nesug/nesug11/ds/ds12.pdf

 

It explains why this SAS code works.  All you have to do for any data set is change the name of the data set used in PROC FREQ.  No other changes are required.  The code produced the attached report when I used the SASHELP.HEART.  All you have to do is cut/paste/change data set name/run.

 

proc format;
value nm . = '0' other = '1';
value $ch ' ' = '0' other = '1';
run;

 

ods listing close;
ods output onewayfreqs=tables;

proc freq data=sashelp.heart;
   tables _all_ / missing;
   format _numeric_ nm.
   _character_ $ch.;
run;

ods output close;
ods listing;

 

data report;
length var $32;
do until (last.table);
  set tables;
  by table notsorted;
  array names(*) f_: ;
  select (names(_n_));
    when ('0') do;
    miss = frequency;
    p_miss = percent;
  end;
  when ('1') do;
    ok = frequency;
    p_ok = percent;
  end;
  end;
end;
miss = coalesce(miss,0);
ok = coalesce(ok,0);
p_miss = coalesce(p_miss,0);
p_ok = coalesce(p_ok,0);
var = scan(table,-1);
keep var miss ok p_: ;
format miss ok comma7. p_: 5.1;
label
miss = 'N_MISSING'
ok = 'N_OK'
p_miss = '%_MISSING'
p_ok = '%_OK'
var = 'VARIABLE'
;
run;

 


report.png

Here is code to get you started. It will create a data set called missing_vars that has the percent and variable names of the data set test. You will need to macro-tize this so it will run through all the data sets you need to include in your report.

 

data test;
input x1 x2 x3 x4 y $;
cards;
1 . 1 1 a
13 2 . 5 b
11 . 1 . c
2 3 4 5 d
3 4 5 6 e
. 8 9 . f
5 6 7 8 g
8 9 8 9 h
3 5 8 . i
6 8 7 9 j
;
options mprint symbolgen mlogic;
data _null_;
if 0 then set test;
array num(*) _numeric_;
array chars(*) _character_;
call symput('nums',trim(left(put(dim(num),8.))));
call symput('chars',trim(left(put(dim(chars),8.))));
stop;
run;
data missing_vars;
set test end=end nobs=obsnum;
array num(*) _numeric_;
array chars(*) $ _character_;
array misscountn (&nums.) (&nums.*0);
array misscountc (&chars.) (&chars.*0);

do i=1 to dim(num);
if num(i) = . then misscountn(i) + 1;
end;
do i=1 to dim(chars);
if chars(i) = '' then misscountc(i) + 1;
end;

if end=1 then do;
do i= 1 to dim(num);
percentmiss= misscountn(i) / obsnum ;
var=(vname(num(i)));
output;
end;
do i= 1 to dim(chars);
percentmiss= misscountc(i) / obsnum ;
var=(vname(chars(i)));
output;
end;
end;
keep percentmiss var;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 4074 views
  • 1 like
  • 5 in conversation