## Count of non-zero observations for numerous variables

Solved
Occasional Contributor
Posts: 5

# Count of non-zero observations for numerous variables

Hi,

I have a dataset like the following:

CompanyVAR1VAR2VAR3VAR4VAR5
Company1028000
Company24001219
Company37065525
Company4802307
Company51015103452

The dataset has a total of 160 variables, of which I want to find summary statistics for each variable. However, I first want to determine the number of non-zero observations for each variable. To do this with a small number of variables I would probably do the following:

Proc sql;

create table want as

select *,

sum(case when VAR1 ne 0 then 1 else 0 end) as count1,

sum(case when VAR2 ne 0 then 1 else 0 end) as count2

from have;

Quit;

However, this is code is not as useful when I have 160 variables I want to get counts for. Is there an easier way this can be done? I assume this can be done with proc freq or proc summary, but am not very familiar with these procs. Keep in mind I also want to keep the values for each variable as I want to eventually gather summary statistics for each variable. Eventually I would want my final dataset to look something like this:

VariableNumber of non-zero observations for each variableMinMaxMeanMedian
Var14
Var22
Var33
Var43
Var54
Var6...etc.
Var7...etc.

I can probably handle the transposing and re-arranging the data from a proc means output, so I was mostly wondering what the best way to get the count of non-zero observations for each variable.

Thank you.

Accepted Solutions
Solution
‎08-03-2015 09:27 PM
Super User
Posts: 6,752

## Re: Count of non-zero observations for numerous variables

OK, in that case ...

There is little to change about the PROC MEANS.  It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

set have end=done;

array in {160} var1-var160;

array out {160} count1-count160;

retain count1-count160 0;

do _i_=1 to 160;

if in{_i_} not in (., 0) then out{_i_} + 1;

end;

if done;

keep count1-count160;

run;

So already the code gets a bit complex.  Then there is the requirement to combine both sets of results, which uses perhaps more complex tools.  How comfortable are you with complex code?  Would you prefer to drop the requirement about just counting the non-zeros?  Either path is possible.

All Replies
Super User
Posts: 6,752

## Re: Count of non-zero observations for numerous variables

There are ways to do this, but first you have to clarify what you want.  Are the Min, Max, Mean, and Median supposed to be based on the non-zero values only?  That's the easiest case of all.  Just change all zeros to missing values:

data want;

set have;

array vars {160} var1-var160;

do _i_=1 to 160;

if vars{_i_}=0 then vars{_i_}=.;

end;

run;

proc means data=want n min max mean median;

var var1-var160;

run;

If you want statistics to include all the zeros, that will take a little more work, but is still fairly easy.  So what are you looking for?

Good luck.

Occasional Contributor
Posts: 5

## Re: Count of non-zero observations for numerous variables

Thank you for the response. I would like to include any zero or negative values in my summary statistics, but have a count of how many observations were not equal to zero for each variable.

Solution
‎08-03-2015 09:27 PM
Super User
Posts: 6,752

## Re: Count of non-zero observations for numerous variables

OK, in that case ...

There is little to change about the PROC MEANS.  It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

set have end=done;

array in {160} var1-var160;

array out {160} count1-count160;

retain count1-count160 0;

do _i_=1 to 160;

if in{_i_} not in (., 0) then out{_i_} + 1;

end;

if done;

keep count1-count160;

run;

So already the code gets a bit complex.  Then there is the requirement to combine both sets of results, which uses perhaps more complex tools.  How comfortable are you with complex code?  Would you prefer to drop the requirement about just counting the non-zeros?  Either path is possible.

Occasional Contributor
Posts: 5

## Re: Count of non-zero observations for numerous variables

Thank you. This should work. I can probably get my data into the final desired format from the code provided by you and ballerdw + Jag.

Posts: 1,147

## Re: Count of non-zero observations for numerous variables

I believe   has provided you the right answer. The below will generate the output similar to but only nonzero obs count. Hope it helps

data have;

input Company\$    VAR1    VAR2    VAR3    VAR4    VAR5;

cards;

Company1    0    28    0    0    0

Company2    4    0    0    12    19

Company3    7    0    6    55    25

Company4    8    0    23    0    7

Company5    10    15    10    34    52

;

proc transpose data=have;

run;

data want;

set data1;

array cnt col1-col99;

array cnts cols1-cols99;

do over cnt;

if cnt in (0,.) then cnts=1;

else cnts=.;

allcnts=nmiss(of cols1-cols99);

end;

drop col: cols:;

run;

Thanks,

Jag

Thanks,
Jag
Super User
Posts: 13,501

## Re: Count of non-zero observations for numerous variables

And by non- zero do you have an negative values to consider?

If you want a count one way is a custom format and proc freq/report/tabulate

proc format;

value nonzero

0='Zero'

0<-high = 'Nonzero'  /* assumes all the non-zero values of interest are positive*/

;

run;

proc freq data= have;

tables _numeric_ ;

format _numeric_ nonzero.;

run;

Or any list of variables will work.

🔒 This topic is solved and locked.