Solved: Re: Count of specific value across row

mt88

Hello,

I'd have a dataset where I'd like to create a new variable that gives me a count of a specific value for a set of variables across a row of data. For example, in this test dataset I want to create a new variable that that gives me a count of '0' values across variables a, b, c, and d.

data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0;
run;

The new variable (newvar) would return the following:

newvar

1

2

1

2

1

2

0

2

Any help is appreciated. I can provide more information if that is helpful.

Thanks, Matt

ballardw

Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.

The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'0');
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.

And here is an example where the result is incorrect for two-digit value

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;

The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.

If your data is more complex (and possibly contains many more variables or search for more values):

data test;
   input a b c d;
   array v(*) a b c d;
   do i=1 to dim(v);
      numzero = sum(numzero,(v[i]=0));
      numone  = sum(numzero,(v[i]=1));
      numnine = sum(numzero,(v[i]=9));
   end;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

View solution in original post

ballardw

Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.

The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'0');
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.

And here is an example where the result is incorrect for two-digit value

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;

The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.

If your data is more complex (and possibly contains many more variables or search for more values):

data test;
   input a b c d;
   array v(*) a b c d;
   do i=1 to dim(v);
      numzero = sum(numzero,(v[i]=0));
      numone  = sum(numzero,(v[i]=1));
      numnine = sum(numzero,(v[i]=9));
   end;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

mkeintz

Let's generalize a bit. Say you have integer values that range from -3 0 to 15, but you only want counts of 0's, 4's, 8's, and 14's. Then you could

data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
5 6 14 15
8 12 4  0
run;

data want (drop=_:);
  set test;
  array values a -- d;
  array counts {0:15}  count_0  _dum1 - _dum3
                       count_4  _dum5 - _dum7 
                       count_8  _dum9 - _dum13
                       count_14 _dum15 ;

  do _v=1 to dim(values);
    counts{values{_v}}=sum(counts{values{_v}},1);
  end;

  /*convert missing counts to zeroes */
  array cnts count_: ;    /* Was mistakenly array cnts  n_: ; */
  do over cnts;
    cnts=coalesce(cnts,0);
  end;

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom

A simple way to count is use the fact that SAS will evaluate boolean expressions to 1 (TRUE) or 0 (FALSE) so that you can just add up the results to find the number of TRUE values.

data test;
  input a b c d expect;
cards;
1 0 9 1  1
1 1 0 0  2
0 9 1 1  1
0 0 9 1  2
1 0 9 9  1
0 1 1 0  2
1 9 9 1  0
1 9 0 0  2
0 0 9 1  2
9 1 0 0  2
;

data want;
  set test;
  array x a b c d ;
  want=0;
  do over x;
    want+(x=0);
  end;
run;

Result

Obs    a    b    c    d    expect    want

  1    1    0    9    1       1        1
  2    1    1    0    0       2        2
  3    0    9    1    1       1        1
  4    0    0    9    1       2        2
  5    1    0    9    9       1        1
  6    0    1    1    0       2        2
  7    1    9    9    1       0        0
  8    1    9    0    0       2        2
  9    0    0    9    1       2        2
 10    9    1    0    0       2        2

FreelanceReinh

Hello @mt88,

@ballardw wrote:

And here is an example where the result is incorrect for two-digit value
data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;
The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.

This problem can be avoided by introducing a delimiter:

data want;
set test;
newvar=count(catx('||',.,of a--d,.),'|0|');
run;

(where 0 could be replaced by 11 or any other number whose character representation is to be counted -- or even by a non-numeric character string).

Count of specific value across row

Re: Count of specific value across row

Re: Count of specific value across row

Re: Count of specific value across row

Re: Count of specific value across row

Re: Count of specific value across row

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!