Hi,
I am using PROC NLMIX and to calculate likelihoods, I need to take the sum of values in a column, for each unique ID. here is an example of my data:
data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
end;
I want to create the new column y that is the cumulative sum of values in X starting within each group identified by variable id1, in which the first value in the cumulative sum is the value of x in the same row. For example, in the first row, since the group is id1, and we have 3 observations with id1=A then we should have y= 1 + 2 + 3. This is the cumulative sum of x values in which the id1=A, staring from x1=1 which is the value of x in the first row.
Now, in the second row, since Y=2+3 since I want the cumulative sum to start from the value of x in the same row, and finally, in the third row, Y=3.
The final data-set should look like this:
ID1 ID2 X Y
A z1 1 6
A z2 2 5
A z3 3 3
B y1 1 4
B y2 3 3
C w1 3 6
C w2 2 3
C w3 1 1
I need to achive this without proc sql because I am implementing it inside proc nlmixed. Maybe arrays is the way to go?
I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.
proc summary nway data=have;
class id1;
var x;
output out=sums(drop=_:) sum=sum_x;
Run;
data want;
merge have sums;
by id1;
prev_x=lag(x);
if first.id1 then y=sum_x;
else y+ (-prev_x);
drop sum_x prev_x;
run;
By the way, SQL is a particularly poor choice for cumulative sums.
I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.
proc summary nway data=have;
class id1;
var x;
output out=sums(drop=_:) sum=sum_x;
Run;
data want;
merge have sums;
by id1;
prev_x=lag(x);
if first.id1 then y=sum_x;
else y+ (-prev_x);
drop sum_x prev_x;
run;
By the way, SQL is a particularly poor choice for cumulative sums.
I am unaware of this being possible entirely within NLMIXED. But you can fairly easily precede your PROC NLMIXED with a data step producing the desired CUMSUM variable.
data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
data need;
set have (in=firstpass) have (in=secondpass);
by id1;
if first.id1 then cumsum=0;
if firstpass=1 then cumsum+x;
if secondpass;
output;
cumsum+ (-x);
run;
proc nlmixed data=need ....
The "trick" here is the SET HAVE …; BY ID1: pair of statements. It tells SAS to read all records from each ID1 once (the first pass), and then read those same records for the same ID1 once more (the second pass). Use the first pass to build the CUMSUM for all of ID1, and then use the second pass to progressively decrement the CUMSUM.
A double DO Until() loop will do the job:
data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
data want;
do until(last.id1);
set have; by id1;
y = sum(y, x);
end;
do until(last.id1);
set have; by id1;
output;
y = y - x;
end;
run;
data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
run;
proc sort data=have;
by id1;
run;
data have2;
retain row;
set have;
row=_n_;
run;
proc sort data=have2;
by id1 descending row;
run;
data have3;
Retain id1 id2 x y;
set have2;
by id1;
if first.id1 then do;
y=x;
end;
else do;
y=x+y;
end;
run;
proc sort data=have3 out=want(drop=row);
by id1 row;
run;
The output is like this :
id1 | id2 | x | y |
A | z1 | 1 | 6 |
A | z2 | 2 | 5 |
A | z3 | 3 | 3 |
B | y1 | 1 | 4 |
B | y2 | 3 | 3 |
C | w1 | 3 | 6 |
C | w2 | 2 | 3 |
C | w3 | 1 | 1 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.