turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Count of non-zero observations for numerous variab...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 04:15 PM

Hi,

I have a dataset like the following:

Company | VAR1 | VAR2 | VAR3 | VAR4 | VAR5 |
---|---|---|---|---|---|

Company1 | 0 | 28 | 0 | 0 | 0 |

Company2 | 4 | 0 | 0 | 12 | 19 |

Company3 | 7 | 0 | 6 | 55 | 25 |

Company4 | 8 | 0 | 23 | 0 | 7 |

Company5 | 10 | 15 | 10 | 34 | 52 |

The dataset has a total of 160 variables, of which I want to find summary statistics for each variable. However, I first want to determine the number of non-zero observations for each variable. To do this with a small number of variables I would probably do the following:

Proc sql;

create table want as

select *,

sum(case when VAR1 ne 0 then 1 else 0 end) as count1,

sum(case when VAR2 ne 0 then 1 else 0 end) as count2

from have;

Quit;

However, this is code is not as useful when I have 160 variables I want to get counts for. Is there an easier way this can be done? I assume this can be done with proc freq or proc summary, but am not very familiar with these procs. Keep in mind I also want to keep the values for each variable as I want to eventually gather summary statistics for each variable. Eventually I would want my final dataset to look something like this:

Variable | Number of non-zero observations for each variable | Min | Max | Mean | Median |
---|---|---|---|---|---|

Var1 | 4 | ||||

Var2 | 2 | ||||

Var3 | 3 | ||||

Var4 | 3 | ||||

Var5 | 4 | ||||

Var6 | ...etc. | ||||

Var7 | ...etc. |

I can probably handle the transposing and re-arranging the data from a proc means output, so I was mostly wondering what the best way to get the count of non-zero observations for each variable.

Thank you.

Accepted Solutions

Solution

08-03-2015
09:27 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 09:27 PM

OK, in that case ...

There is little to change about the PROC MEANS. It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

set have end=done;

array in {160} var1-var160;

array out {160} count1-count160;

retain count1-count160 0;

do _i_=1 to 160;

if in{_i_} not in (., 0) then out{_i_} + 1;

end;

if done;

keep count1-count160;

run;

So already the code gets a bit complex. Then there is the requirement to combine both sets of results, which uses perhaps more complex tools. How comfortable are you with complex code? Would you prefer to drop the requirement about just counting the non-zeros? Either path is possible.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 06:47 PM

There are ways to do this, but first you have to clarify what you want. Are the Min, Max, Mean, and Median supposed to be based on the non-zero values only? That's the easiest case of all. Just change all zeros to missing values:

data want;

set have;

array vars {160} var1-var160;

do _i_=1 to 160;

if vars{_i_}=0 then vars{_i_}=.;

end;

run;

proc means data=want n min max mean median;

var var1-var160;

run;

If you want statistics to include all the zeros, that will take a little more work, but is still fairly easy. So what are you looking for?

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 07:31 PM

Thank you for the response. I would like to include any zero or negative values in my summary statistics, but have a count of how many observations were not equal to zero for each variable.

Solution

08-03-2015
09:27 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 09:27 PM

OK, in that case ...

There is little to change about the PROC MEANS. It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

set have end=done;

array in {160} var1-var160;

array out {160} count1-count160;

retain count1-count160 0;

do _i_=1 to 160;

if in{_i_} not in (., 0) then out{_i_} + 1;

end;

if done;

keep count1-count160;

run;

So already the code gets a bit complex. Then there is the requirement to combine both sets of results, which uses perhaps more complex tools. How comfortable are you with complex code? Would you prefer to drop the requirement about just counting the non-zeros? Either path is possible.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 10:05 PM

Thank you. This should work. I can probably get my data into the final desired format from the code provided by you and ballerdw + Jag.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 09:49 PM

I believe **Astounding** has provided you the right answer. The below will generate the output similar to **ballardw** but only nonzero obs count. Hope it helps

Please try

data have;

input Company$ VAR1 VAR2 VAR3 VAR4 VAR5;

cards;

Company1 0 28 0 0 0

Company2 4 0 0 12 19

Company3 7 0 6 55 25

Company4 8 0 23 0 7

Company5 10 15 10 34 52

;

proc transpose data=have;

run;

data want;

set data1;

array cnt col1-col99;

array cnts cols1-cols99;

do over cnt;

if cnt in (0,.) then cnts=1;

else cnts=.;

allcnts=nmiss(of cols1-cols99);

end;

drop col: cols:;

run;

Thanks,

Jag

Thanks,

Jag

Jag

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-03-2015 07:20 PM

And by non- zero do you have an negative values to consider?

If you want a count one way is a custom format and proc freq/report/tabulate

proc format;

value nonzero

0='Zero'

0<-high = 'Nonzero' /* assumes all the non-zero values of interest are positive*/

;

run;

proc freq data= have;

tables _numeric_ ;

format _numeric_ nonzero.;

run;

Or any list of variables will work.