Hello everyone,
I have a dataset in the following form:
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
I would like to count the number of times 1 appears in a row for each of the ID variables. In this case the result would look like:
id1, 3
id2, 0
id3, 2
My actual data set has a few hundred IDs and about 20 variables of interest (unfortunately with no patterned naming scheme like I have here). Is there a slick way to accomplish this? I'd prefer not to make a long list of variables with something like:
if x1 = 1 then x1_ind = 1 else x1_ind = 0;
but if that's what needs to be done then so be it.
Thanks!
This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
array x x1-x3;
sum=0;
do i=1 to dim(x);
if x(i)=1 then sum=sum+1;
end;
drop i;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
proc summary data=have nway;
class id;
var sum;
output out=want sum=;
run;
This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.
data have;
infile datalines delimiter=",";
input ID $ x1 x2 x3;
array x x1-x3;
sum=0;
do i=1 to dim(x);
if x(i)=1 then sum=sum+1;
end;
drop i;
datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;
proc summary data=have nway;
class id;
var sum;
output out=want sum=;
run;
This worked perfectly, thank you!
The only change I had to make was that I couldn't use the variable range syntax since my columns don't have patterned names, but since I already had them in a keep statement it wasn't a big issue:
array x var1 var2 ... varN;
The generic way to do the same operation with a group of variables on an observation is an array.
So one way to get the count per row and then sum:
data have; infile datalines delimiter=","; input ID $ x1 x2 x3; datalines; id1, 1, 2, 2 id1, 1, 2, 1 id2, 2, 2, 2 id3, 1, 2, 1 ; data want; set have ; array a x1 x2 x3; counter = 0; do i=1 to dim(a); counter = sum(counter,a[i]=1); end; drop i; run; proc means data=want sum; class id; var counter; run;
There are several different ways to provide a list of variables but we would need to know more about your exact data set structure, as in names of variables and order in the data set to provide any "slick" list.
The a[i]=1 uses the SAS behavior of a true comparison having a numeric value of 1 and 0 for false.
Hello @icrandell,
Another option: Use the COUNT function:
data want;
do until(last.id);
set have;
by id;
n1=sum(n1,count('#'||catx('##',of x1--x3)||'#','#1#'));
end;
run;
If the "x values" are only one-digit integers, the definition of n1 could be simplified to
n1=sum(n1,count(cats(of x1--x3),'1'));
The "double dash" variable list does not rely on common name prefixes, but on the variables' positions in the PDV. Alternatively, you could define and use an array.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.