Solved: Cumulative sum without PROC SQL

niam · Posted 01-28-2020 03:36 PM

Hi,

I am using PROC NLMIX and to calculate likelihoods, I need to take the sum of values in a column, for each unique ID. here is an example of my data:

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;
end;

I want to create the new column y that is the cumulative sum of values in X starting within each group identified by variable id1, in which the first value in the cumulative sum is the value of x in the same row. For example, in the first row, since the group is id1, and we have 3 observations with id1=A then we should have y= 1 + 2 + 3. This is the cumulative sum of x values in which the id1=A, staring from x1=1 which is the value of x in the first row.

Now, in the second row, since Y=2+3 since I want the cumulative sum to start from the value of x in the same row, and finally, in the third row, Y=3.

The final data-set should look like this:

ID1 ID2  X   Y
A     z1   1   6
A     z2   2   5
A     z3   3   3
B     y1   1   4
B     y2   3   3
C     w1   3  6
C     w2   2  3
C     w3   1  1

I need to achive this without proc sql because I am implementing it inside proc nlmixed. Maybe arrays is the way to go?

PaigeMiller · Posted 01-28-2020 03:45 PM

I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.

proc summary nway data=have;
    class id1;
	var x;
	output out=sums(drop=_:) sum=sum_x;
Run;
data want;
    merge have sums;
	by id1;
	prev_x=lag(x);
	if first.id1 then y=sum_x;
	else y+ (-prev_x);
	drop sum_x prev_x;
run;

By the way, SQL is a particularly poor choice for cumulative sums.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 01-28-2020 03:45 PM

I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.

proc summary nway data=have;
    class id1;
	var x;
	output out=sums(drop=_:) sum=sum_x;
Run;
data want;
    merge have sums;
	by id1;
	prev_x=lag(x);
	if first.id1 then y=sum_x;
	else y+ (-prev_x);
	drop sum_x prev_x;
run;

By the way, SQL is a particularly poor choice for cumulative sums.

--
Paige Miller

mkeintz · Posted 01-28-2020 03:52 PM

I am unaware of this being possible entirely within NLMIXED. But you can fairly easily precede your PROC NLMIXED with a data step producing the desired CUMSUM variable.

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;

data need;
  set have (in=firstpass)  have (in=secondpass);
  by id1;
  if first.id1 then cumsum=0;
  if firstpass=1 then cumsum+x;
  if secondpass;
  output;
  cumsum+ (-x);
run;

proc nlmixed data=need ....

The "trick" here is the SET HAVE …; BY ID1: pair of statements. It tells SAS to read all records from each ID1 once (the first pass), and then read those same records for the same ID1 once more (the second pass). Use the first pass to build the CUMSUM for all of ID1, and then use the second pass to progressively decrement the CUMSUM.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 01-28-2020 03:56 PM

A double DO Until() loop will do the job:

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;

data want;
do until(last.id1);
	set have; by id1;
	y = sum(y, x);
	end;
do until(last.id1);
	set have; by id1;
	output;
	y = y - x;
	end;
run;

PG

Sajid01 · Posted 01-28-2020 04:16 PM

data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
run;
proc sort data=have;
by id1;
run;
data have2;
retain row;
set have;
row=_n_;
run;
proc sort data=have2;
by id1 descending row;
run;
data have3;
Retain id1 id2 x y;
set have2;
by id1;
if first.id1 then do;
y=x;
end;
else do;
y=x+y;
end;
run;
proc sort data=have3 out=want(drop=row);
by id1 row;
run;

The output is like this :

id1	id2	x	y
A	z1	1	6
A	z2	2	5
A	z3	3	3
B	y1	1	4
B	y2	3	3
C	w1	3	6
C	w2	2	3
C	w3	1	1

Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

Re: Cumulative sum without PROC SQL

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away