Solved: Re: Count missing values of 100+ variables in with column output

jenim514 · Posted 11-17-2015 12:26 PM

HI! I have a table with 100+ variables and I would like to know the easiest/ fastest way to check for missing/nonmissing values without having to run frequency count for each varaible in separate tables. Preferably, I would like all variables in on column with missing and nonmissing values in columns. For example:

Variable Missing Non_Missing

RAWREG 2 8

PVPRISM 0 10

DAYS 1 9

OVRIDE3 10 0

Again, I have 100+ variables, so can I use some kind of Select (*) function w/o having to type out all variables?

Here is a sample data set:

Obs	RAWREG	PVPRISM	DAYS
1	01	0939	408
2		0943	379
3	09	0029	170
4	11	0948
5		0948	59
6	04	6108	57
7	06	0993	59
8	06	6111	59
9	06	0993	220
10	04	0901	59

BrunoMueller · Posted 11-18-2015 02:04 PM

The NLEVELS of Proc FREQ does not exactly do what you need. Had another go using the Hash object in the DATA Step.

Have a look

data have;
  infile cards dlm="," dsd;
  input
    RAWREG 
    PVPRISM $
    OVRIDE3 $
    DAYS 
  ;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
,,3,
;
data _null_;
  set have end=last;
  array _xchar{*} _character_;
  array _xnum{*} _numeric_;

  if _n_ = 1 then do;
    length 
      _colName $ 32
      _nMiss 8
      _n 8
    ;
    declare hash hmiss( ordered: "Y");
    hmiss.defineKey("_colName");
    hmiss.defineData("_colName", "_n", "_nMiss");
    hmiss.defineDone();
  end;

  do i = 1 to dim(_xchar);
    _nMiss=0;
    _n=0;
    _colName = vname( _xchar{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xchar{i}) = 1);
    _n + (missing(_xchar{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;
  end;

  do i = 1 to dim(_xnum);
    _nMiss=0;
    _n=0;
    _colName = vname( _xnum{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xnum{i}) = 1);
    _n + (missing(_xnum{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;

  end;

  if last = 1 then do;
    hmiss.output(dataset: "hashmiss");
  end;
run;

proc print data=hashmiss;
run;

proc print data=have;
run;

Bruno

View solution in original post

PaigeMiller · Posted 11-17-2015 12:30 PM

UNTESTED CODE

proc summary data=have;
     var rawreg--days;
     output out=stats n= nmiss=/autoname;
run;

--
Paige Miller

jenim514 · Posted 11-17-2015 12:45 PM

I'm getting error with this code:

ERROR: Variable RAWREG in list does not match type prescribed for this list
ERROR: Variable PVPRISM in list does not match type prescribed for this list

...these error line go on for each variable in my table. I'm not sure what this means?

LinusH · Posted 11-17-2015 01:08 PM

It seems I can't say this often enough: don't have tables with hundreds of variables. It just causes headache.

Change your data model (e.g. by transposing your data), then it's much easier to analyze it.

Data never sleeps

Astounding · Posted 11-17-2015 01:27 PM

This means that you created your data set with all character variables, no numeric variables. You have two basic choices at this point. Either change your data, so that you actually have numeric variables. Or, write a more complex program that will count missings and nonmissings for character variables. As you make your choice, you might want to consider how you plan to use the data in the future ... will you actually need numeric variables in your data set.

At this point, first you make your choice. Then the programming can begin.

PGStats · Posted 11-17-2015 01:28 PM

It means that your variables are character strings, not numbers as they appear. You should fix that first.

PG

pearsoninst · Posted 11-17-2015 01:38 PM

Did you tried this ?

proc means data = sample n nmiss;
var _numeric_;
run;

FreelanceReinh · Posted 11-17-2015 01:49 PM

Wouldn't this be sufficient to get a quick overview?

proc format;
value miss
._-.z = 'Missing'
other = 'Non_Missing';

value $miss
' ' = 'Missing'
other = 'Non_Missing';
run;

proc freq data=have;
format _character_ $miss.
       _numeric_ miss.;
tables _all_ / missing;
run;

Rick_SAS · Posted 11-17-2015 02:10 PM

See the ideas, examples, and links in the article "Count the number of missing values in for each variable"

BrunoMueller · Posted 11-18-2015 02:50 AM

Proc FREQ has the NLEVELS option that is helpfull for this, see code sample below.

data have;
infile cards dlm="," dsd;
input
 RAWREG $
 PVPRISM $
 OVRIDE3 $
 DAYS $
;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
;

proc freq data=have nlevels;
  table _all_ / noprint;
run;

It will create this output:

The FREQ Procedure

Number of Variable Levels
Variable	Levels	Missing Levels	Nonmissing Levels
RAWREG	6	1	5
PVPRISM	8	0	8
OVRIDE3	1	1	0
DAYS	7	1	6

Bruno

jenim514 · Posted 11-18-2015 01:36 PM

Hi! Nlevels definitely works, but I data set is huge...14 Million observations and 100+ variables. It keeps running out of memory every time I run the code. It worked great for a smaller test data set!

Rick_SAS · Posted 11-18-2015 06:04 AM

Please clarify: Are you interested in only NUMERIC variables, or do you want the tabulation for numeric and character variables?

jenim514 · Posted 11-18-2015 12:32 PM

Preferably bother character and numeric. But I know that code can get a little tricky. I can do it separate, counting character using:

proc format;
value $missfmt ' '= 'Missing' other='Not Missing';
run;
proc freq data= have;
format _CHAR_ $missfmt.;
tables _char_ / missing missprint nocum nopercent;
by
run;

But is there a way to for the output to be alphabetical in the results viewer?

jenim514 · Posted 11-18-2015 12:59 PM

Preferably bother character and numeric. But I know that code can get a little tricky. I can do it separate, counting character using:

proc format;

value $missfmt ' '= 'Missing' other='Not Missing';

run;

proc freq data= have;

format _CHAR_ $missfmt.;

tables _char_ / missing missprint nocum nopercent;

run;

***But is there a way to for the character sas variables to be alphabetical in the results viewer?

Steelers_In_DC · Posted 11-18-2015 01:07 PM

Here is a solution, the _all_ adds some noise but it'll work.:

data have;
infile cards dsd;
input Obs RAWREG$ PVPRISM$ OVRIDE3 DAYS;
cards;
1,01,0939,,408
2,,0943,,379
3,09,0029,,170
4,11,0948,,
5,,0948,,59
6,04,6108,,57
7,06,0993,,59
8,06,6111,,59
9,06,0993,,220
10,04,0901,,59
;

proc sql;
create table get_max as
select *,max(obs) as max_obs
from have;

proc transpose data=get_max out=tran;by max_obs;var _all_;

data want (keep=Variable missing non_missing);
set tran(rename=(_name_ = Variable));
Missing=nmiss(of col: );
Non_Missing=max_obs - missing;

if variable in ('Obs','max_obs') then delete;
run;

Registration is open

SAS Training: Just a Click Away