Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Count the number of non-zero occurrences for a variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Count the number of non-zero occurrences for a variable


I'm working with a very large data set (over 15 million observations and over 1000 variables). I would like to count the number of non-zero entries for a handful of variables from within my data set.  For example, in the table below VAR 1 has 3 occurrences, and VAR 2 has two occurrences.  I'm a beginner, so please provide some explanation if possible.  I appreciate any help!

Client #Var 1Var 2

1

2452699
200
336940
415480
503554

Accepted Solutions
Solution
‎07-31-2015 10:01 AM
Super User
Super User
Posts: 7,942

Re: Count the number of non-zero occurrences for a variable

Ah, ok, so you want to check if the sum per column is 0.  Then your best off checking one of the procedures associated with that.  Proc means, or proc summary.  Say:

proc means data=have;

     var var1-var1000;

     output out=results mean=mean;

run;

Then if the resulting mean for the variable is 0 you know they are all zeroes. 

View solution in original post


All Replies
Super Contributor
Posts: 337

Re: Count the number of non-zero occurrences for a variable

If you have SAS IML, there is one really fast way to count missing/not missing.

Look at this post: http://blogs.sas.com/content/iml/2011/09/19/count-the-number-of-missing-values-for-each-variable.htm...

Do you have HP procs and a grid license? That is another route.

I hope it helps,

Miguel

Valued Guide
Posts: 860

Re: Count the number of non-zero occurrences for a variable

I'm sure there will be some good array solutions, but this is the first thing that came to mind.  If it is only a handful of variables it's a viable solution.

data have;

infile cards dsd;

input Client Var1 Var2;

cards;

1,245,2699

2,0,0

3,3694,0

4,1548,0

5,0,3554

;

run;

data want;

set have;

count = sum((var1 ne 0)+(var2 ne 0));

run;

Super User
Super User
Posts: 7,942

Re: Count the number of non-zero occurrences for a variable

Hi,

Yes, you can use arrays.  This would be preferred if there were many variables. 

data have;

  client=1; var1=245; var2=2696; output;

  client=2; var1=0; var2=0; output;

run;

data want (drop=i);

  set have;

  array list{*} var1 var2;  

  count_miss=0;

  do i=1 to dim(list);

    if list{i}=0 then count_miss=sum(count_miss,1);

  end;

run;

If they all have the same prefix then its even easier:

data want (drop=i);

  set have;

  array var{*};  

  count_miss=0;

  do i=1 to dim(var);

    if var{i}=0 then count_miss=sum(count_miss,1);

  end;

run;

Valued Guide
Posts: 860

Re: Count the number of non-zero occurrences for a variable

RW9, I'm a beginner at arrays.  I ran what you have and get the following error.  Can you assist?

22     GOPTIONS ACCESSIBLE;
23     data want2 (drop=i);
24       set have;
25       array var{*};

ERROR: The array var has been defined with zero elements.

26       count_miss=0;
27       do i=1 to dim(var);
28         if var{i}=0 then count_miss=sum(count_miss,1);

ERROR: Too many array subscripts specified for array var.

29       end;
30     run;
Occasional Contributor
Posts: 6

Re: Count the number of non-zero occurrences for a variable

Hi RW9,

I ran the code, but I'm not getting what I had hoped for.  The code ran the number of misses (zero values), by row.  For example, client 1 would have 0 misses and client 2 would have 2 misses.  However, I want the total number of times each variable (column) has a value.  In my real data set, I have over 15 million clients and 1000 variables. I want separate totals for each variable, so variable 1 would have 8 million occurrences and variable 2 would have 9 million (as an example).  Can you help me modify the code?

Thanks,

Valued Guide
Posts: 860

Re: Count the number of non-zero occurrences for a variable

I thought you were checking "a handful" of variables.  Do you need to check 1000 now?  If so are they similarly named, as in VAR1 VAR2...VAR1000?

Occasional Contributor
Posts: 6

Re: Count the number of non-zero occurrences for a variable

Posted in reply to Steelers_In_DC

I am only checking a handful of them.  They are not similarly named.  I think RW9's response did the trick. 

Solution
‎07-31-2015 10:01 AM
Super User
Super User
Posts: 7,942

Re: Count the number of non-zero occurrences for a variable

Ah, ok, so you want to check if the sum per column is 0.  Then your best off checking one of the procedures associated with that.  Proc means, or proc summary.  Say:

proc means data=have;

     var var1-var1000;

     output out=results mean=mean;

run;

Then if the resulting mean for the variable is 0 you know they are all zeroes. 

Super User
Posts: 11,343

Re: Count the number of non-zero occurrences for a variable

A different approach:

Use a custom format to identify ranges of values of interest. This one treats every non-missing value except 0 as nonzero and shows 0 as such.

proc format;

value nonzero

low - <0, 0<-high = 'Non-zero'

0 = 'Zero';

run;


_numeric_ is a special variable list identifier that says to do what ever for all numeric variables. So can be a short hand for large lists.

Using proc freq and the format will give you nonzero and zero counts and percentages for each variable plus the missing by default.

proc freq data=have;

tables _numeric_;

format _numeric_ nonzero. ;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 2198 views
  • 1 like
  • 5 in conversation