Desktop productivity for business analysts and programmers

Drop variables that have only missing or null values

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Drop variables that have only missing or null values

I have this kind of table and I would like to delete Header4 and Header6 which are containing only missing / null values.

Header 1Header 2Header 3Header 4Header 5Header 6Header 7
13153
24664
35736

And I have few other tables to similar of this table, but not sure which columns are missing values. Can I use macro language to run this process?

Thanks.


Accepted Solutions
Solution
‎05-23-2015 12:17 AM
Super User
Posts: 9,874

Re: Drop variables that have only missing or null values

OK. Actually I have already posted such code for many posts .

OR if you like IML , you also could get it ,and IML is very fast . But IML only can deal with the variables which are all NUM or CHAR .

 
data have;
  infile datalines dlm="," dsd missover;
  input (Header1 Header2 Header3 Header4 Header5 Header6 Header7) ($);
datalines;
1,3,1,,5,,3
2,4,6,,6,,4
3,5,7,,3,,6
;
run;
proc transpose data=have(obs=0) out=vname ;
 var _all_;
run;

proc sql;
 select catx(' ','n(',_name_,') as ',_name_) into : list separated by ',' from vname;
 create table temp as
  select &list from have;
quit;

proc transpose data=temp out=drop ;
 var _all_;
run;

proc sql;
 select _name_ into : drop separated by ' ' from drop where col1=0;
quit;

data want;
 set have(drop=&drop);
run;



Xia Keshan

View solution in original post


All Replies
Contributor
Posts: 42

Re: Drop variables that have only missing or null values

Hi,

I made this code some time ago. The macro takes a library in library=parameter and all the datasets inside. Checks for columns with all variables missing and does a copy of the table which is written to library in outlib=parameter. It also adds 'nomiss' string at the end of each table name.

As the macro generates a lot of code, it is written to temporary external file

%macro new(library=,outlib=);

%local i j k;

proc sql noprint;

select memname into :tables separated by " "

from dictionary.tables where libname="%upcase(&library.)";

quit;

%do i = 1 %to %util_countwords(&tables.);

  proc sql noprint;

  create table my_dictionary_columns as

  select name, monotonic() as N from dictionary.columns

  where memname="%scan(&tables,&i.)" and libname="%upcase(&library.)";

  quit;

data _null_;

file 'temp.txt';

set my_dictionary_columns end=the_end;

put "retain miss" _N_" 0; miss" _N_"=missing";

put "(" name ")";

put  "+miss" _N_";keep miss" _N_";";

if the_end then call symputx('N_columns',_N_);

run;

/*  option symbolgen;*/

  data miss%scan(&tables,&i.);

  set %upcase(&library.).%scan(&tables,&i.) end=the_end;

  %include 'temp.txt';

  nasumeno+1;

  keep nasumeno;

  if the_end then output;

  run;

  data concated;

  set miss%scan(&tables,&i.);

  array misses

  • miss: ;
  •   keep i ;

      do i = 1 to dim(misses);

          if misses/nasumeno=1 then do;

        check+1;

      output;

        end;

      end;

      call symputx('check',check);

      run;

      proc sql noprint;

      create table to_delete as

      select a.name

      from my_dictionary_columns as a inner join concated as b on(a.N=b.i)

      ;

      quit;

      data _null_;

      file 'drop_temp.txt';

      set to_delete;

      put "drop " name ";";

      run;

    /****************************************************************************************************/

      data &outlib..%substr(%scan(&tables,&i.),1,%sysfunc(min(27,%length(%scan(&tables,&i.)))))nomis;

      set %upcase(&library.).%scan(&tables,&i.);

      ;

      %if &check.>0 %then %do;

      %include 'drop_temp.txt';

      %put &check. variables removed due to missing values from %upcase(&library.).%scan(&tables,&i.);

      %end;

      %else %do;

      %put no missing data discovered in table %upcase(&library.).%scan(&tables,&i.);

      %end;

      run;

    %end;

    %mend new;


    %new(outlib=&outlib.,library=&library.);


    Jakub

    Super User
    Super User
    Posts: 7,720

    Re: Drop variables that have only missing or null values

    Quite simple really, normalise your data, then take variables with missing sum, then drop that list:

    data have;

      infile datalines dlm="," dsd missover;

      input Header1 Header2 Header3 Header4 Header5 Header6 Header7;

    datalines;

    1,3,1,,5,,3

    2,4,6,,6,,4

    3,5,7,,3,,6

    ;

    run;

    data inter (keep=head val_res);

      set have;

      array header{7};

      do i=1 to 7;

        head="HEADER"||strip(put(i,best.));

        val_res=header{i};

        output;

      end;

    run;

    proc sql noprint;

      select  distinct HEAD

      into    Smiley Very HappyROP_LIST separated by " "

      from    WORK.INTER

      group by HEAD

      having  sum(VAL_RES)=.;

    quit;

    data want;

      set have (drop=&DROP_LIST.);

    run;

    Respected Advisor
    Posts: 3,788

    Re: Drop variables that have only missing or null values

    If all the variables are numeric why all the extra processing?.  You can get Ns directly and done.  The more difficult question is how do you do this efficiently for data with both character and numeric variables and a large number of obs.  Once you find the first non-missing for a variable you can stop looking at that variable but variables with all missing values every observations must be tested.

    ods select none;
    proc means n stackods data=have;
       ods output summary=summary(where=(n=0));
       run;
    ods select all;

    5-22-2015 8-45-05 AM.png


    Super User
    Posts: 9,874

    Re: Drop variables that have only missing or null values

    This topic has been heavily talked . If your data are all numeric type , this could easily been done by the following.

    But I would like to use SQL which can also take care CHARACTER type variable and more control .

    data have;
      infile datalines dlm="," dsd missover;
      input Header1 Header2 Header3 Header4 Header5 Header6 Header7;
    datalines;
    1,3,1,,5,,3
    2,4,6,,6,,4
    3,5,7,,3,,6
    ;
    run;
    ods select none;
    ods output nlevels=want(where=(NNonMissLevels=0));
    proc freq data=have nlevels;
    tables _numeric_;
    run;
    
    
    

    x.png

    Xia Keshan

    Contributor
    Posts: 22

    Re: Drop variables that have only missing or null values

    My dataset, it is all character type.. Sorry, I should have mentioned at the beginning.

    Super User
    Super User
    Posts: 7,720

    Re: Drop variables that have only missing or null values

    Well, with a minor change to my code it should handle character:

    data inter (keep=head val_res);

      set have;

      array header{7};

      do i=1 to 7;

        head="HEADER"||strip(put(i,best.));

        if strip(header{i})="" then val_res=0;

        else val_res=1;

        output;

      end;

    run;

    proc sql noprint;

      select  distinct HEAD

      into    Smiley Very HappyROP_LIST separated by " "

      from    WORK.INTER

      group by HEAD

      having  sum(VAL_RES)=0;

    quit;

    data want;

      set have (drop=&DROP_LIST.);

    run;

    Note I am not at work so can't test.  A similar approach could do both char and numeric.

    Solution
    ‎05-23-2015 12:17 AM
    Super User
    Posts: 9,874

    Re: Drop variables that have only missing or null values

    OK. Actually I have already posted such code for many posts .

    OR if you like IML , you also could get it ,and IML is very fast . But IML only can deal with the variables which are all NUM or CHAR .

     
    data have;
      infile datalines dlm="," dsd missover;
      input (Header1 Header2 Header3 Header4 Header5 Header6 Header7) ($);
    datalines;
    1,3,1,,5,,3
    2,4,6,,6,,4
    3,5,7,,3,,6
    ;
    run;
    proc transpose data=have(obs=0) out=vname ;
     var _all_;
    run;
    
    proc sql;
     select catx(' ','n(',_name_,') as ',_name_) into : list separated by ',' from vname;
     create table temp as
      select &list from have;
    quit;
    
    proc transpose data=temp out=drop ;
     var _all_;
    run;
    
    proc sql;
     select _name_ into : drop separated by ' ' from drop where col1=0;
    quit;
    
    data want;
     set have(drop=&drop);
    run;
    
    
    
    

    Xia Keshan

    Super User
    Posts: 9,874

    Re: Drop variables that have only missing or null values

    IML code for such question. Since you start such a topic, I just wrote some IML code for this . IML is very fast, Rick might want see this.

     
    data have;
      infile datalines dlm="," dsd missover;
      input (Header1 Header2 Header3 Header4 Header5 Header6 Header7) ($);
    datalines;
    1,3,1,,5,,3
    2,4,6,,6,,4
    3,5,7,,3,,6
    ;
    run;
    
    proc iml;
    use have;
    read all var _char_ into x[c=vname];
    drop= vname[loc(missing(x)[+,]=nrow(x))];
    submit drop;
     data want;
      set have(drop=&drop);
     run;
    endsubmit;
    quit;
    
    
    

    Xia Keshan

    Contributor
    Posts: 22

    Re: Drop variables that have only missing or null values

    Thanks so much!! That works perfect, but PROC IML can't be run in SAS enterprise guide?

    Super User
    Posts: 9,874

    Re: Drop variables that have only missing or null values

    Yeah. But you can run it in SAS University Edition which is totally free , you can downlaod it from sas.com  .

    And one more thing , You could also try my PROC FREQ code  Like :

    tables  _char_ ;

    OR

    tables _all_;

    Xia Keshan

    Super User
    Posts: 9,874

    Re: Drop variables that have only missing or null values

    You can run IML code in EG as long as you have it .

    Run

    proc setinit;run;

    Check if you have IML .

    Xia Keshan

    Occasional Learner
    Posts: 1

    Re: Drop variables that have only missing or null values

    Thank you! This was just what I needed. I have been using a MUCH longer complex macro.

    It's curious that SAS doesn't properly recognize the code for purposes of assigning correct  colors to the various lines.

    Nonetheless, your code works like a charm!

    Gerry

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 12 replies
    • 2357 views
    • 5 likes
    • 6 in conversation