Hello everyone,
I have a dataset in the following form:
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
I would like to count the number of times 1 appears in a row for each of the ID variables. In this case the result would look like:
id1, 3
id2, 0
id3, 2
My actual data set has a few hundred IDs and about 20 variables of interest (unfortunately with no patterned naming scheme like I have here). Is there a slick way to accomplish this? I'd prefer not to make a long list of variables with something like:
if x1 = 1 then x1_ind = 1 else x1_ind = 0;
but if that's what needs to be done then so be it.
Thanks!
This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
array x x1-x3;
sum=0;
do i=1 to dim(x);
if x(i)=1 then sum=sum+1;
end;
drop i;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
proc summary data=have nway;
class id;
var sum;
output out=want sum=;
run;
This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
array x x1-x3;
sum=0;
do i=1 to dim(x);
if x(i)=1 then sum=sum+1;
end;
drop i;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
proc summary data=have nway;
class id;
var sum;
output out=want sum=;
run;
This worked perfectly, thank you!
The only change I had to make was that I couldn't use the variable range syntax since my columns don't have patterned names, but since I already had them in a keep statement it wasn't a big issue:
array x var1 var2 ... varN;
The generic way to do the same operation with a group of variables on an observation is an array.
So one way to get the count per row and then sum:
data have; infile datalines delimiter=","; input ID $ x1 x2 x3; datalines; id1, 1, 2, 2 id1, 1, 2, 1 id2, 2, 2, 2 id3, 1, 2, 1 ; data want; set have ; array a x1 x2 x3; counter = 0; do i=1 to dim(a); counter = sum(counter,a[i]=1); end; drop i; run; proc means data=want sum; class id; var counter; run;
There are several different ways to provide a list of variables but we would need to know more about your exact data set structure, as in names of variables and order in the data set to provide any "slick" list.
The a[i]=1 uses the SAS behavior of a true comparison having a numeric value of 1 and 0 for false.
Hello @icrandell,
Another option: Use the COUNT function:
data want;
do until(last.id);
set have;
by id;
n1=sum(n1,count('#'||catx('##',of x1--x3)||'#','#1#'));
end;
run;
If the "x values" are only one-digit integers, the definition of n1 could be simplified to
n1=sum(n1,count(cats(of x1--x3),'1'));
The "double dash" variable list does not rely on common name prefixes, but on the variables' positions in the PDV. Alternatively, you could define and use an array.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.