BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

HI!  I have a table with 100+ variables and I would like to know the easiest/ fastest way to check for missing/nonmissing values without having to run frequency count for each varaible in separate tables.  Preferably, I would like all variables in on column with missing and nonmissing values in columns.  For example: 

 

Variable         Missing      Non_Missing

RAWREG       2               8

PVPRISM        0              10

DAYS              1               9

OVRIDE3        10             0

 

 

Again, I have 100+ variables, so can I use some kind of Select (*) function w/o having to type out all variables?

 

Here is a sample data set:

 

Obs RAWREG PVPRISM OVRIDE3 DAYS
1 01 0939   408
2   0943   379
3 09 0029   170
4 11 0948    
5   0948   59
6 04 6108   57
7 06 0993   59
8 06 6111   59
9 06 0993   220
10 04 0901   59
1 ACCEPTED SOLUTION

Accepted Solutions
BrunoMueller
SAS Super FREQ

The NLEVELS of Proc FREQ does not exactly do what you need. Had another go using the Hash object in the DATA Step.

 

Have a look

 

data have;
  infile cards dlm="," dsd;
  input
    RAWREG 
    PVPRISM $
    OVRIDE3 $
    DAYS 
  ;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
,,3,
;
data _null_;
  set have end=last;
  array _xchar{*} _character_;
  array _xnum{*} _numeric_;

  if _n_ = 1 then do;
    length 
      _colName $ 32
      _nMiss 8
      _n 8
    ;
    declare hash hmiss( ordered: "Y");
    hmiss.defineKey("_colName");
    hmiss.defineData("_colName", "_n", "_nMiss");
    hmiss.defineDone();
  end;

  do i = 1 to dim(_xchar);
    _nMiss=0;
    _n=0;
    _colName = vname( _xchar{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xchar{i}) = 1);
    _n + (missing(_xchar{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;
  end;

  do i = 1 to dim(_xnum);
    _nMiss=0;
    _n=0;
    _colName = vname( _xnum{i} );
    _rc = hmiss.find();
    _nMiss + (missing(_xnum{i}) = 1);
    _n + (missing(_xnum{i}) = 0);

    if _rc = 0 then do;
      hmiss.replace();
    end;
    else do;
      hmiss.add();
    end;

  end;

  if last = 1 then do;
    hmiss.output(dataset: "hashmiss");
  end;
run;

proc print data=hashmiss;
run;

proc print data=have;
run;

Bruno

View solution in original post

23 REPLIES 23
PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

proc summary data=have;
     var rawreg--days;
     output out=stats n= nmiss=/autoname;
run;
--
Paige Miller
jenim514
Pyrite | Level 9
I'm getting error with this code:

ERROR: Variable RAWREG in list does not match type prescribed for this list
ERROR: Variable PVPRISM in list does not match type prescribed for this list

...these error line go on for each variable in my table. I'm not sure what this means?

LinusH
Tourmaline | Level 20

It seems I can't say this often enough: don't have tables with hundreds of variables. It just causes headache.

Change your data model (e.g. by transposing your data), then it's much easier to analyze it.

Data never sleeps
Astounding
PROC Star

This means that you created your data set with all character variables, no numeric variables.  You have two basic choices at this point.  Either change your data, so that you actually have numeric variables.  Or, write a more complex program that will count missings and nonmissings for character variables.  As you make your choice, you might want to consider how you plan to use the data in the future ... will you actually need numeric variables in your data set. 

 

At this point, first you make your choice.  Then the programming can begin.

PGStats
Opal | Level 21

It means that your variables are character strings, not numbers as they appear. You should fix that first.

PG
pearsoninst
Pyrite | Level 9
Did you tried this ?

proc means data = sample n nmiss;
var _numeric_;
run;
FreelanceReinh
Jade | Level 19

Wouldn't this be sufficient to get a quick overview?

 

proc format;
value miss
._-.z = 'Missing'
other = 'Non_Missing';

value $miss
' ' = 'Missing'
other = 'Non_Missing';
run;

proc freq data=have;
format _character_ $miss.
       _numeric_ miss.;
tables _all_ / missing;
run;
BrunoMueller
SAS Super FREQ

Proc FREQ has the NLEVELS option that is helpfull for this, see code sample below.

 

data have;
infile cards dlm="," dsd;
input
 RAWREG $
 PVPRISM $
 OVRIDE3 $
 DAYS $
;
cards;
01,0939,,408
,0943,,379
09,0029,,170
11,0948,,
,0948,,59
04,6108,,57
06,0993,,59
06,6111,,59
06,0993,,220
04,0901,,59
;

proc freq data=have nlevels;
  table _all_ / noprint;
run;

It will create this output:

The FREQ Procedure

 

Number of Variable Levels
Variable Levels Missing Levels Nonmissing Levels
RAWREG 6 1 5
PVPRISM 8 0 8
OVRIDE3 1 1 0
DAYS 7 1 6
Bruno
jenim514
Pyrite | Level 9
Hi! Nlevels definitely works, but I data set is huge...14 Million observations and 100+ variables. It keeps running out of memory every time I run the code. It worked great for a smaller test data set!
Rick_SAS
SAS Super FREQ

Please clarify: Are you interested in only NUMERIC variables, or do you want the tabulation for numeric and character variables?

jenim514
Pyrite | Level 9
Preferably bother character and numeric. But I know that code can get a little tricky. I can do it separate, counting character using:

proc format;
value $missfmt ' '= 'Missing' other='Not Missing';
run;
proc freq data= have;
format _CHAR_ $missfmt.;
tables _char_ / missing missprint nocum nopercent;
by
run;

But is there a way to for the output to be alphabetical in the results viewer?
jenim514
Pyrite | Level 9

Preferably bother character and numeric. But I know that code can get a little tricky. I can do it separate, counting character using:

 

proc format;

value $missfmt ' '= 'Missing' other='Not Missing';

run;

proc freq data= have;

format _CHAR_ $missfmt.;

tables _char_ / missing missprint nocum nopercent;

run;

 

 

***But is there a way to for the character sas variables to be alphabetical in the results viewer?

Steelers_In_DC
Barite | Level 11

Here is a solution, the _all_ adds some noise but it'll work.:

 

data have;
infile cards dsd;
input Obs     RAWREG$     PVPRISM$     OVRIDE3     DAYS;
cards;
1,01,0939,,408
2,,0943,,379
3,09,0029,,170
4,11,0948,,
5,,0948,,59
6,04,6108,,57
7,06,0993,,59
8,06,6111,,59
9,06,0993,,220
10,04,0901,,59
;

proc sql;
create table get_max as
select *,max(obs) as max_obs
from have;

proc transpose data=get_max out=tran;by max_obs;var _all_;

data want (keep=Variable missing non_missing);
set tran(rename=(_name_ = Variable));
Missing=nmiss(of col: );
Non_Missing=max_obs - missing;

if variable in ('Obs','max_obs') then delete;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 10081 views
  • 9 likes
  • 10 in conversation