BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mt88
Calcite | Level 5

Hello,

 

I'd have a dataset where I'd like to create a new variable that gives me a count of a specific value for a set of variables across a row of data. For example, in this test dataset I want to create a new variable that that gives me a count of '0' values across variables a, b, c, and d.

 

data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0;
run;

 

The new variable (newvar) would return the following:

newvar

1

2

1

2

1

2

0

2

2

2

 

Any help is appreciated. I can provide more information if that is helpful.

 

Thanks, Matt

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.

The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'0');
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.

 

And here is an example where the result is incorrect for two-digit value

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;

The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.

 

If your data is more complex (and possibly contains many more variables or search for more values):

data test;
   input a b c d;
   array v(*) a b c d;
   do i=1 to dim(v);
      numzero = sum(numzero,(v[i]=0));
      numone  = sum(numzero,(v[i]=1));
      numnine = sum(numzero,(v[i]=9));
   end;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.

The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'0');
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.

 

And here is an example where the result is incorrect for two-digit value

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;

The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.

 

If your data is more complex (and possibly contains many more variables or search for more values):

data test;
   input a b c d;
   array v(*) a b c d;
   do i=1 to dim(v);
      numzero = sum(numzero,(v[i]=0));
      numone  = sum(numzero,(v[i]=1));
      numnine = sum(numzero,(v[i]=9));
   end;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
;
run;

 

mkeintz
PROC Star

Let's generalize a bit.  Say you have integer values that range from -3  0 to 15, but you only want counts of 0's, 4's, 8's, and 14's.  Then you could

 

 

data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
5 6 14 15
8 12 4  0
run;

data want (drop=_:);
  set test;
  array values a -- d;
  array counts {0:15}  count_0  _dum1 - _dum3
                       count_4  _dum5 - _dum7 
                       count_8  _dum9 - _dum13
                       count_14 _dum15 ;

  do _v=1 to dim(values);
    counts{values{_v}}=sum(counts{values{_v}},1);
  end;

  /*convert missing counts to zeroes */
  array cnts count_: ;    /* Was mistakenly array cnts  n_: ; */
  do over cnts;
    cnts=coalesce(cnts,0);
  end;

run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

A simple way to count is use the fact that SAS will evaluate boolean expressions to 1 (TRUE) or 0 (FALSE) so that you can just add up the results to find the number of TRUE values.

data test;
  input a b c d expect;
cards;
1 0 9 1  1
1 1 0 0  2
0 9 1 1  1
0 0 9 1  2
1 0 9 9  1
0 1 1 0  2
1 9 9 1  0
1 9 0 0  2
0 0 9 1  2
9 1 0 0  2
;

data want;
  set test;
  array x a b c d ;
  want=0;
  do over x;
    want+(x=0);
  end;
run;

Result

Obs    a    b    c    d    expect    want

  1    1    0    9    1       1        1
  2    1    1    0    0       2        2
  3    0    9    1    1       1        1
  4    0    0    9    1       2        2
  5    1    0    9    9       1        1
  6    0    1    1    0       2        2
  7    1    9    9    1       0        0
  8    1    9    0    0       2        2
  9    0    0    9    1       2        2
 10    9    1    0    0       2        2
FreelanceReinh
Jade | Level 19

Hello @mt88,

 


@ballardw wrote:

And here is an example where the result is incorrect for two-digit value

data test;
input a b c d;
Numzero = count(cats(a,b,c,d),'11');
cards;
1 0 9 1
1 1 0 0
0 9 11 1
0 0 9 1
;
run;

The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.


This problem can be avoided by introducing a delimiter:

data want;
set test;
newvar=count(catx('||',.,of a--d,.),'|0|');
run;

(where 0 could be replaced by 11 or any other number whose character representation is to be counted -- or even by a non-numeric character string).

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 800 views
  • 7 likes
  • 5 in conversation