SAS dataset attribute analysis

Accepted Solution Solved
Reply
Contributor
Posts: 56
Accepted Solution

SAS dataset attribute analysis

Dear SAS support users,

I need a count on missing values of all variables (over 100) in my sas dataset.

The output should be:

Variable Name        Value missing      Value existing     % missing     % existing

var 1                         0                       5437                   0                 100%       * all values existing *;

var 2                       5437                    0                       100%            0             * all values missing *

var 3                      2000                    3437                    36.8%         63.2%

...............................................

Currently, I convert sas datset to csv file and download to local storage.

Then using excel formula COUNTBLANK to count the missing value and  other values derived from that missing value.

This process is time consuming for a large dataset. I would like to do this in SAS code.

Thanks for your support to share your knowledge.

Best Regards,

William


Accepted Solutions
Solution
‎06-12-2015 11:32 AM
Super User
Posts: 9,691

Re: SAS dataset attribute analysis

Code: Program


data class;
set sashelp.class;
if ranuni(0) lt .5 then call missing(name,age);
run;

proc transpose data=class(obs=0) out=name;
var _all_;
run;
data _null_;
set name end=last;
if _n_ eq 1 then call execute('proc sql;create table want as ');
call execute(catt('select "',_name_,'" as var_name length=40 ,
  nmiss(',_name_,') as missing ,
  n(',_name_,') as non_missing ,
  nmiss(',_name_,')/count(*) as per_missing format=percent8.2,
  n(',_name_,')/count(*) as per_non_missing format=percent8.2
  from class'));
if not last then call execute('union');
  else call execute(';quit;');
run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,430

Re: SAS dataset attribute analysis

Well, there are other ways depending on if all variables are the same type, numeric for instance, but this should cover both types:

data tmp;

  length var1-var5 $200;

  var1="abc";

  var6=3;

run;

proc sql noprint;

  create table RESULTS

  (

    VAR_NAME char(200),

    VAL_MISS num,

    VAL_EXIST num,

    PCENT_MISS num,

    PCENT_EXIST num

  );

  select  count(1)

  into    :TOT

  from    WORK.TMP;

quit;

data _null_;

  set sashelp.vcolumn (where=(libname="WORK" and memname="TMP"));

  call execute('proc sql;

                  insert into RESULTS

                  set VAR_NAME="'||strip(NAME)||'",

                      VAL_MISS=&TOT. - (select count('||strip(NAME)||') from TMP where '||strip(NAME)||' is not null),

                      VAL_EXIST=(select count('||strip(NAME)||') from TMP where '||strip(NAME)||' is not null),

                      PCENT_MISS=(&TOT. - (select count('||strip(NAME)||') from TMP where '||strip(NAME)||' is not null) / &TOT.) * 100,

                      PCENT_EXIST=((select count('||strip(NAME)||') from TMP where '||strip(NAME)||' is not null) / &TOT.) * 100;

                quit;');

run;

Solution
‎06-12-2015 11:32 AM
Super User
Posts: 9,691

Re: SAS dataset attribute analysis

Code: Program


data class;
set sashelp.class;
if ranuni(0) lt .5 then call missing(name,age);
run;

proc transpose data=class(obs=0) out=name;
var _all_;
run;
data _null_;
set name end=last;
if _n_ eq 1 then call execute('proc sql;create table want as ');
call execute(catt('select "',_name_,'" as var_name length=40 ,
  nmiss(',_name_,') as missing ,
  n(',_name_,') as non_missing ,
  nmiss(',_name_,')/count(*) as per_missing format=percent8.2,
  n(',_name_,')/count(*) as per_non_missing format=percent8.2
  from class'));
if not last then call execute('union');
  else call execute(';quit;');
run;
Contributor
Posts: 56

Re: SAS dataset attribute analysis

Hi Xia

Your SAS code is simple to understand and run quite well with small sample.

I need to verify the code in real data set with 125 variable names and about 50 millions of observations.

Let you know after I ran it.

Thanks in advance for your quick response.

William

Super User
Posts: 5,099

Re: SAS dataset attribute analysis

Here's an easy program, but with limitations.  It works for numeric variables only, and gives you just 3 of the 4 statistics.

proc tabulate data=have;

  var _numeric_;

  tables _numeric_, n='Value Existing' nmiss='Value Missing' pctn='% Existing';

run;

Good luck.

Contributor
Posts: 56

Re: SAS dataset attribute analysis

Dear Astouding

You program ran fast but the limitation on character variables.

I modified the program as follow:

-------------------------------------------------------------------------

data class;

set sashelp.class;

if ranuni(0) lt .5 then call missing(name,age);

if name = '  ' then name_count = . ;

else                name_count = 1 ;

run;

Proc tabulate data=class;

  var _numeric_;

  tables _numeric_, n='Value Existing' nmiss='Value Missing' ;

run;

----------------------------------------------------------------

output result :

           | Value Existing | Value Missing

Age               9              10

Height           19               0

Weight           19               0

name_count        9              10

----------------------------------------------------------

The problem is how to quickly convert all character variables like the name var above. My data set has about  105 character variables out of 125 total variables.

Super User
Posts: 5,099

Re: SAS dataset attribute analysis

Some of the other suggestions (such as Reeza's macro) will give you much closer to what you asked for.  However, if you want something quick and dirty that will work with character variables, you could try:

proc format;

   value $there ' '='Missing' other='Present';

run;

proc freq data=have;

   tables _character_ / missing;

   format _character_ $there.;

run;

It won't give you pretty output, but it does provide the right information.  And it is short and easy to understand. 

Super User
Posts: 17,960

Re: SAS dataset attribute analysis

Here's a macro that I wrote a while back.  I don't know that it calculates all your measures but they should all be derivable from what is there.

SAS - Missing Macro - For a dataset, variable level report the number of missing and non-missing obs...

Contributor
Posts: 56

Re: SAS dataset attribute analysis

Thanks for all contributions to my problem.

All solutions generated some levels of my purpose and the running time is a major impact on the efficiency of the solution.

Finally I comprise all solutions to get a method that is suitable to solve my problem.

Best Regards,

William

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 290 views
  • 6 likes
  • 5 in conversation