SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to count number of entries in each column and find empty columns

Accepted Solution Solved
Reply
Super Contributor
Posts: 441
Accepted Solution

How to count number of entries in each column and find empty columns

Hi,

suppose I have a very big table with millions of rows. Some columns are almost full, some less, some have very few data and some are completely empty.

Is it possible to count the number of entries that are in each column?

Thank you


Accepted Solutions
Solution
‎12-19-2014 08:49 PM
PROC Star
Posts: 7,474

Re: How to count number of entries in each column and find empty columns

You could do something like the following:

data test;

  set sashelp.class;

  if _n_ eq 7 then do;

    call missing(name);

    call missing(height);

  end;

run;

proc format;

value $missfmt ' '='Missing' other='Not Missing';

value  missfmt  . ='Missing' other='Not Missing';

run;

proc freq data=test;

  format _CHARACTER_ $missfmt.;

  tables _CHARACTER_ / missing nocum nopercent;

  format _NUMERIC_ missfmt.;

  tables _NUMERIC_ / missing nocum nopercent;

run;

View solution in original post


All Replies
Solution
‎12-19-2014 08:49 PM
PROC Star
Posts: 7,474

Re: How to count number of entries in each column and find empty columns

You could do something like the following:

data test;

  set sashelp.class;

  if _n_ eq 7 then do;

    call missing(name);

    call missing(height);

  end;

run;

proc format;

value $missfmt ' '='Missing' other='Not Missing';

value  missfmt  . ='Missing' other='Not Missing';

run;

proc freq data=test;

  format _CHARACTER_ $missfmt.;

  tables _CHARACTER_ / missing nocum nopercent;

  format _NUMERIC_ missfmt.;

  tables _NUMERIC_ / missing nocum nopercent;

run;

Super Contributor
Posts: 441

Re: How to count number of entries in each column and find empty columns

Hi Arthur and Patrick, thanks for the quick reply and both of your codes are very helpful!!

Just some small extra questions if I may:

when doing the proc freq, I realized that some columns which I thought were completely empty and which I was going to delete actually had some data, albeit small, which I might use in the future.

1) Is it possible to see what is the number of the observation where these data are present ?

2) is it possible to make an output file of all the columns but conditional on data being present on columns of my choice?

Again thank you very much!!!

PROC Star
Posts: 7,474

Re: How to count number of entries in each column and find empty columns

Not sure what you're asking. You could always use a datastep, with a where statement, only selecting records that have non-missing values for the variables you are interested in.

Super Contributor
Posts: 441

Re: How to count number of entries in each column and find empty columns

Hi arthur,

I have millions of lines of code, and in a special column of interest there are around 1000 rows with nonmissing data, and the rest are missing.

Is it possible to know where those nonmissing values are, i.e to output their observation number?

Thank you

Super Contributor
Posts: 441

Re: How to count number of entries in each column and find empty columns

Hi Arthur,

I actually figured out how to do my second little extra question, its quite simple if I'm correct and is of the following:

data a;

input name$ nu ;

datalines;

a 1

s 3

d .

g 4

t .

kk .

f 6

;

run;

data b;

set a;

if nu ne " ";

run;


thanks!

PROC Star
Posts: 7,474

Re: How to count number of entries in each column and find empty columns

Just about. But that, alone, won;t show you which observations were missing. How about adding one extra variable to your code:

data b;

set a;

if nu ne " ";

observation=_n_;

run;


Super Contributor
Posts: 441

Re: How to count number of entries in each column and find empty columns

Thanks!!

Respected Advisor
Posts: 4,173

Re: How to count number of entries in each column and find empty columns

Below code counting the number of distinct values per column in a table. Is that what you're after? It could become quite a big table if you're having a lot of different values.

data sample;

  set sashelp.class;

  if _n_=5 then call missing(sex);

run;

ods _all_ close;

ods output OneWayFreqs = Freqs(keep=table F_: Frequency rename=(table=Varname));;

proc freq data=sample ;

  table _all_  /nocum nopercent missing;

quit;

ods _all_ close;

ods listing;

data want(keep=Varname Value Frequency);

  set freqs;

  Varname=prxchange('s/^table(.*)/\1/oi',1,Varname);

  Value=coalescec(of F_Smiley Happy;

run;

/*proc print data=want;*/

/*run;*/

Super User
Posts: 10,030

Re: How to count number of entries in each column and find empty columns

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 529 views
  • 6 likes
  • 4 in conversation