Hello,
I'd have a dataset where I'd like to create a new variable that gives me a count of a specific value for a set of variables across a row of data. For example, in this test dataset I want to create a new variable that that gives me a count of '0' values across variables a, b, c, and d.
data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0;
run;
The new variable (newvar) would return the following:
newvar
1
2
1
2
1
2
0
2
2
2
Any help is appreciated. I can provide more information if that is helpful.
Thanks, Matt
Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.
The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.
data test; input a b c d; Numzero = count(cats(a,b,c,d),'0'); cards; 1 0 9 1 1 1 0 0 0 9 1 1 0 0 9 1 1 0 9 9 0 1 1 0 1 9 9 1 1 9 0 0 0 0 9 1 9 1 0 0 ; run;
Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.
And here is an example where the result is incorrect for two-digit value
data test; input a b c d; Numzero = count(cats(a,b,c,d),'11'); cards; 1 0 9 1 1 1 0 0 0 9 11 1 0 0 9 1 ; run;
The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.
If your data is more complex (and possibly contains many more variables or search for more values):
data test; input a b c d; array v(*) a b c d; do i=1 to dim(v); numzero = sum(numzero,(v[i]=0)); numone = sum(numzero,(v[i]=1)); numnine = sum(numzero,(v[i]=9)); end; cards; 1 0 9 1 1 1 0 0 0 9 1 1 0 0 9 1 1 0 9 9 0 1 1 0 1 9 9 1 1 9 0 0 0 0 9 1 9 1 0 0 ; run;
Your data implies all of the values are one digit, is that in fact the case? I ask because there is a moderately slick solution if it is that may not work if you have some values that are 2 or more digits or non-integer.
The CATS function creates a string with no spaces between converted numbers into a single long string and uses the Count function to determine how many 0 characters appear.
data test; input a b c d; Numzero = count(cats(a,b,c,d),'0'); cards; 1 0 9 1 1 1 0 0 0 9 1 1 0 0 9 1 1 0 9 9 0 1 1 0 1 9 9 1 1 9 0 0 0 0 9 1 9 1 0 0 ; run;
Note: Datalines end on the line with a semicolon or ;;;; if using datalines4; As such you do not want the semicolon on a line with data as it gets ignored.
And here is an example where the result is incorrect for two-digit value
data test; input a b c d; Numzero = count(cats(a,b,c,d),'11'); cards; 1 0 9 1 1 1 0 0 0 9 11 1 0 0 9 1 ; run;
The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.
If your data is more complex (and possibly contains many more variables or search for more values):
data test; input a b c d; array v(*) a b c d; do i=1 to dim(v); numzero = sum(numzero,(v[i]=0)); numone = sum(numzero,(v[i]=1)); numnine = sum(numzero,(v[i]=9)); end; cards; 1 0 9 1 1 1 0 0 0 9 1 1 0 0 9 1 1 0 9 9 0 1 1 0 1 9 9 1 1 9 0 0 0 0 9 1 9 1 0 0 ; run;
Let's generalize a bit. Say you have integer values that range from -3 0 to 15, but you only want counts of 0's, 4's, 8's, and 14's. Then you could
data test;
input a b c d;
cards;
1 0 9 1
1 1 0 0
0 9 1 1
0 0 9 1
1 0 9 9
0 1 1 0
1 9 9 1
1 9 0 0
0 0 9 1
9 1 0 0
5 6 14 15
8 12 4 0
run;
data want (drop=_:);
set test;
array values a -- d;
array counts {0:15} count_0 _dum1 - _dum3
count_4 _dum5 - _dum7
count_8 _dum9 - _dum13
count_14 _dum15 ;
do _v=1 to dim(values);
counts{values{_v}}=sum(counts{values{_v}},1);
end;
/*convert missing counts to zeroes */
array cnts count_: ; /* Was mistakenly array cnts n_: ; */
do over cnts;
cnts=coalesce(cnts,0);
end;
run;
A simple way to count is use the fact that SAS will evaluate boolean expressions to 1 (TRUE) or 0 (FALSE) so that you can just add up the results to find the number of TRUE values.
data test;
input a b c d expect;
cards;
1 0 9 1 1
1 1 0 0 2
0 9 1 1 1
0 0 9 1 2
1 0 9 9 1
0 1 1 0 2
1 9 9 1 0
1 9 0 0 2
0 0 9 1 2
9 1 0 0 2
;
data want;
set test;
array x a b c d ;
want=0;
do over x;
want+(x=0);
end;
run;
Result
Obs a b c d expect want 1 1 0 9 1 1 1 2 1 1 0 0 2 2 3 0 9 1 1 1 1 4 0 0 9 1 2 2 5 1 0 9 9 1 1 6 0 1 1 0 2 2 7 1 9 9 1 0 0 8 1 9 0 0 2 2 9 0 0 9 1 2 2 10 9 1 0 0 2 2
Hello @mt88,
@ballardw wrote:
And here is an example where the result is incorrect for two-digit value
data test; input a b c d; Numzero = count(cats(a,b,c,d),'11'); cards; 1 0 9 1 1 1 0 0 0 9 11 1 0 0 9 1 ; run;The incorrect result for line 1 is because the CATS result for the line looks like "1100" and so there are two 1's next to each other and the Count function treats them as such.
This problem can be avoided by introducing a delimiter:
data want;
set test;
newvar=count(catx('||',.,of a--d,.),'|0|');
run;
(where 0 could be replaced by 11 or any other number whose character representation is to be counted -- or even by a non-numeric character string).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.