Solved: Re: Sum by an ID variable according to a logical test

icrandell · Posted 03-27-2020 10:36 AM

Hello everyone,

I have a dataset in the following form:

data have;
	infile datalines delimiter=",";
	input ID $ x1 x2 x3;
	datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;

I would like to count the number of times 1 appears in a row for each of the ID variables. In this case the result would look like:

id1, 3
id2, 0
id3, 2

My actual data set has a few hundred IDs and about 20 variables of interest (unfortunately with no patterned naming scheme like I have here). Is there a slick way to accomplish this? I'd prefer not to make a long list of variables with something like:

if x1 = 1 then x1_ind = 1 else x1_ind = 0;

but if that's what needs to be done then so be it.

Thanks!

PaigeMiller · Posted 03-27-2020 10:50 AM

This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.

data have;
	infile datalines delimiter=",";
	input ID $ x1 x2 x3;
	array x x1-x3;
	sum=0;
	do i=1 to dim(x);
	    if x(i)=1 then sum=sum+1;
	end;
	drop i;
	datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;

proc summary data=have nway;
	class id;
	var sum;
	output out=want sum=;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 03-27-2020 10:50 AM

This may be a good example where a long data set is preferable to a wide data set, if you have that under your control. Or even use 0 and 1 rather than 2 and 1 would make the programming easier. Nevertheless, to search across rows you need an ARRAY.

data have;
	infile datalines delimiter=",";
	input ID $ x1 x2 x3;
	array x x1-x3;
	sum=0;
	do i=1 to dim(x);
	    if x(i)=1 then sum=sum+1;
	end;
	drop i;
	datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;

proc summary data=have nway;
	class id;
	var sum;
	output out=want sum=;
run;

--
Paige Miller

icrandell · Posted 03-27-2020 11:08 AM

This worked perfectly, thank you!

The only change I had to make was that I couldn't use the variable range syntax since my columns don't have patterned names, but since I already had them in a keep statement it wasn't a big issue:

array x var1 var2 ... varN;

ballardw · Posted 03-27-2020 10:55 AM

The generic way to do the same operation with a group of variables on an observation is an array.

So one way to get the count per row and then sum:

data have;
	infile datalines delimiter=",";
	input ID $ x1 x2 x3;
	datalines;
id1, 1, 2, 2
id1, 1, 2, 1
id2, 2, 2, 2
id3, 1, 2, 1
;

data want;
   set have ;
	array a x1 x2 x3;
	counter = 0;
	do i=1 to dim(a);
		counter = sum(counter,a[i]=1);
	end;
	drop i;
run;

proc means data=want sum;
	class id;
	var counter;
run;

There are several different ways to provide a list of variables but we would need to know more about your exact data set structure, as in names of variables and order in the data set to provide any "slick" list.

The a[i]=1 uses the SAS behavior of a true comparison having a numeric value of 1 and 0 for false.

FreelanceReinh · Posted 03-27-2020 11:21 AM

Hello @icrandell,

Another option: Use the COUNT function:

data want;
do until(last.id);
  set have;
  by id;
  n1=sum(n1,count('#'||catx('##',of x1--x3)||'#','#1#'));
end;
run;

If the "x values" are only one-digit integers, the definition of n1 could be simplified to

n1=sum(n1,count(cats(of x1--x3),'1'));

The "double dash" variable list does not rely on common name prefixes, but on the variables' positions in the PDV. Alternatively, you could define and use an array.

Sum by an ID variable according to a logical test

Re: Sum by an ID variable according to a logical test

Re: Sum by an ID variable according to a logical test

Re: Sum by an ID variable according to a logical test

Re: Sum by an ID variable according to a logical test

Re: Sum by an ID variable according to a logical test

Registration is open

SAS Training: Just a Click Away