Hi all,
When I was coding for data step, I faced a problem that I don't know how tp cumulate a variable over ID.
Here's a part of my data:
Obs ID Difference 1 CGMHLK0001 0.00000 2 CGMHLK0001 0.98333 3 CGMHLK0001 0.51667 4 CGMHLK0001 0.63333 5 CGMHLK0001 0.50000 6 CGMHLK0001 0.36667 7 CGMHLK0001 0.55000 8 CGMHLK0001 0.58333 9 CGMHLK0001 0.50000 10 CGMHLK0001 0.35000 11 CGMHLK0001 0.50000 12 CGMHLK0001 0.63333 13 CGMHLK0001 0.38333 14 CGMHLK0002 0.00000 15 CGMHLK0002 1.00000 16 CGMHLK0002 1.00000 17 CGMHLK0002 1.00000 18 CGMHLK0002 1.00000 19 CGMHLK0002 1.00000 20 CGMHLK0002 1.00000 21 CGMHLK0002 1.15000 22 CGMHLK0002 0.83333 23 CGMHLK0002 1.00000
ID is each patient's ID number.
What I want to do is to create a new variable to cumulate the difference for each patient.
For example:
Obs ID Difference NewVariable 1 CGMHLK0001 0.00000 0 2 CGMHLK0001 0.98333 0.98333 3 CGMHLK0001 0.51667 1.5 4 CGMHLK0001 0.63333 2.13333 5 CGMHLK0001 0.50000 2.63333 6 CGMHLK0001 0.36667 3 7 CGMHLK0001 0.55000 3.55 8 CGMHLK0001 0.58333 4.13333 9 CGMHLK0001 0.50000 4.63333 10 CGMHLK0001 0.35000 4.98333 11 CGMHLK0001 0.50000 5.48333 12 CGMHLK0001 0.63333 6.11663 13 CGMHLK0001 0.38333 6.49996 14 CGMHLK0002 0.00000 0 15 CGMHLK0002 1.00000 1 16 CGMHLK0002 1.00000 2 17 CGMHLK0002 1.00000 3 18 CGMHLK0002 1.00000 4 19 CGMHLK0002 1.00000 5 20 CGMHLK0002 1.00000 6 21 CGMHLK0002 1.15000 7.15 22 CGMHLK0002 0.83333 7.98333 23 CGMHLK0002 1.00000 8.98333
While searching "do loop" for this, hope you guys can give me a hand.
Thanks in advance!
data have;
input ID$12. Difference;
cards;
CGMHLK0001 0.00000
CGMHLK0001 0.98333
CGMHLK0001 0.51667
CGMHLK0001 0.63333
CGMHLK0001 0.50000
CGMHLK0001 0.36667
CGMHLK0001 0.55000
CGMHLK0001 0.58333
CGMHLK0001 0.50000
CGMHLK0001 0.35000
CGMHLK0001 0.50000
CGMHLK0001 0.63333
CGMHLK0001 0.38333
CGMHLK0002 0.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.15000
CGMHLK0002 0.83333
CGMHLK0002 1.00000
;
data want;
set have;
by id difference notsorted;
retain newvar;
if first.id then newvar=difference;
else newvar+difference;
run;
data have;
input ID$12. Difference;
cards;
CGMHLK0001 0.00000
CGMHLK0001 0.98333
CGMHLK0001 0.51667
CGMHLK0001 0.63333
CGMHLK0001 0.50000
CGMHLK0001 0.36667
CGMHLK0001 0.55000
CGMHLK0001 0.58333
CGMHLK0001 0.50000
CGMHLK0001 0.35000
CGMHLK0001 0.50000
CGMHLK0001 0.63333
CGMHLK0001 0.38333
CGMHLK0002 0.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.00000
CGMHLK0002 1.15000
CGMHLK0002 0.83333
CGMHLK0002 1.00000
;
data want;
set have;
by id difference notsorted;
retain newvar;
if first.id then newvar=difference;
else newvar+difference;
run;
@Chung-Li Retain is the correct method. In fact, your solution uses a SUM statement which includes an implied RETAIN rather than an explicit RETAIN.
Typically in a datastep at the 'top' of each loop, all values are set to missing. RETAIN allows you to hold the value across the step boundary. Usually people run into issues with RETAIN either when using a do loop without a data step boundary or when using a variable that already exists. RETAIN works differently with variables that already exist...I usually just avoid that.
Thank you so much!
With your explaination plus example from Jag, I finally understand the difference between these settings.
Hey guys,
I've found the solution!
Here's the SAS code:
DATA want;
SET have;
BY ID;
IF FIRST.ID THEN NewTimeVariable=0;
NewTimeVariable+Difference;
RUN;
Reference: https://communities.sas.com/t5/SAS-Data-Management/Calculating-cumulative-sum/td-p/145799
I leave solution here.
You can use this method if you have same problem
@Chung-Li: In answer to one of your questions
NewTimeVariable+Difference;
is a form of sum that automatically retains the variable NewTimeVariable
As such, adding the retain statement when using that form, would be redundant. Would still produce the same result, but not necessary.
Art, CEO, AnalystFinder.com
Thank you for your explanation!
Now, I know when should I use "retain", and when shouldn't I use it.
Most importantly, knowing "+" would automatically trigger retain statement is very helpful.
Thanks again!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.