BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello, 

 

I am working with a very big dataset (more than 10k rows) and am trying to find all the columns that have no data in them. I ran a proc freq but is there a way to put all those columns that are completely empty into a list or a table? 

 

Thank you. 

5 REPLIES 5
Reeza
Super User

Have you tried any of the solutions found on here? There are several answered questions relating to this:

 

https://communities.sas.com/t5/SAS-Programming/deleting-empty-columns/m-p/308045

 

https://communities.sas.com/t5/New-SAS-User/Deleting-Variables-with-Only-Missing-Values-from-Dataset...

 

https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Delete-quot-zero-columns-quot/m-p/164261

 

If none of those work, please indicate how your situation is different so we can help you find a customized solution.


@marleeakerson wrote:

Hello, 

 

I am working with a very big dataset (more than 10k rows) and am trying to find all the columns that have no data in them. I ran a proc freq but is there a way to put all those columns that are completely empty into a list or a table? 

 

Thank you. 


 

marleeakerson
Calcite | Level 5
I have seen those but I am trying to find all the empty columns, not delete them, so I can compare them to another table I have.

Thank you.
Reeza
Super User

Then just don't drop them.

You'll get a list of your variables this way for example, slightly modified from first link:

 



ods select none;
ods output nlevels=temp;
proc freq data=your_data nlevels;
 tables _all_;
run;
proc sql;
 select tablevar into : drop separated by ','
  from temp
   where NNonMissLevels=0;
   
quit;


%put &drop;
ballardw
Super User

You might try the option NLEVELS on proc freq with the ODS Select to restrict the output to that table only.

Quick example:

data example;
   input x y z q;
datalines;
1 . 3  .
2 2 2  .
3 . 3  .
;

ods select nlevels;
proc freq data=example nlevels;
run;

The output only shows the "levels" information, which is number of different values.

The key for your part is in the result for Q that shows Nonmissing levels of 0. That is what appears if a variable only has missing values.