BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
niam
Quartz | Level 8

Hi, 

I am using PROC NLMIX and to calculate likelihoods, I need to take the sum of values in a column, for each unique ID. here is an example of my data:

 

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;
end;

I want to create the new column y that is the cumulative sum of values in X starting within each group identified by variable id1, in which the first value in the cumulative sum is the value of x in the same row. For example, in the first row, since the group is id1, and we have 3 observations with id1=A then we should have y= 1 + 2 + 3. This is the cumulative sum of x values in which the id1=A, staring from x1=1 which is the value of x in the first row. 

Now, in the second row, since Y=2+3 since I want the cumulative sum to start from the value of x in the same row, and finally, in the third row, Y=3. 

The final data-set should look like this:

ID1 ID2  X   Y
A     z1   1   6
A     z2   2   5
A     z3   3   3
B     y1   1   4
B     y2   3   3
C     w1   3  6
C     w2   2  3
C     w3   1  1

I need to achive this without proc sql because I am implementing it inside proc nlmixed. Maybe arrays is the way to go?

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.

 

proc summary nway data=have;
    class id1;
	var x;
	output out=sums(drop=_:) sum=sum_x;
Run;
data want;
    merge have sums;
	by id1;
	prev_x=lag(x);
	if first.id1 then y=sum_x;
	else y+ (-prev_x);
	drop sum_x prev_x;
run;

By the way, SQL is a particularly poor choice for cumulative sums.

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I'm not that familiar with NLMIXED, but I would suggest you simply create variable Y in a DATA step, and then use this newly create data set in NLMIXED.

 

proc summary nway data=have;
    class id1;
	var x;
	output out=sums(drop=_:) sum=sum_x;
Run;
data want;
    merge have sums;
	by id1;
	prev_x=lag(x);
	if first.id1 then y=sum_x;
	else y+ (-prev_x);
	drop sum_x prev_x;
run;

By the way, SQL is a particularly poor choice for cumulative sums.

--
Paige Miller
mkeintz
PROC Star

I am unaware of this being possible entirely within NLMIXED.  But you can fairly easily precede your PROC NLMIXED with a data step producing the desired CUMSUM variable.

 

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;

data need;
  set have (in=firstpass)  have (in=secondpass);
  by id1;
  if first.id1 then cumsum=0;
  if firstpass=1 then cumsum+x;
  if secondpass;
  output;
  cumsum+ (-x);
run;

proc nlmixed data=need .... 

The "trick" here is the SET HAVE …; BY ID1: pair of statements.  It tells SAS to read all records from each ID1 once (the first pass), and then read those same records for the same ID1 once more (the second pass).   Use the first pass to build the CUMSUM for all of ID1, and then use the second pass to progressively decrement the CUMSUM.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

A double DO Until() loop will do the job:

 

data have;
input id1 $  id2  $  x;
datalines;
A     z1   1    
A     z2   2
A     z3   3
B     y1   1
B     y2   3
C     w1   3
C     w2   2
C     w3   1
;

data want;
do until(last.id1);
	set have; by id1;
	y = sum(y, x);
	end;
do until(last.id1);
	set have; by id1;
	output;
	y = y - x;
	end;
run;
PG
Sajid01
Meteorite | Level 14

data have;
input id1 $ id2 $ x;
datalines;
A z1 1
A z2 2
A z3 3
B y1 1
B y2 3
C w1 3
C w2 2
C w3 1
;
run;
proc sort data=have;
by id1;
run;
data have2;
retain row;
set have;
row=_n_;
run;
proc sort data=have2;
by id1 descending row;
run;
data have3;
Retain id1 id2 x y;
set have2;
by id1;
if first.id1 then do;
y=x;
end;
else do;
y=x+y;
end;
run;
proc sort data=have3 out=want(drop=row);
by id1 row;
run;

The output is like this :

id1id2xy
Az116
Az225
Az333
By114
By233
Cw136
Cw223
Cw311

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1865 views
  • 5 likes
  • 5 in conversation