BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chung-Li
Quartz | Level 8

Hi all, 

 

When I was coding for data step, I faced a problem that I don't know how tp cumulate a variable over ID.

Here's a part of my data:

 

                                 Obs       ID                   Difference

                                  1    CGMHLK0001                0.00000
                                  2    CGMHLK0001                0.98333
                                  3    CGMHLK0001                0.51667
                                  4    CGMHLK0001                0.63333
                                  5    CGMHLK0001                0.50000
                                  6    CGMHLK0001                0.36667
                                  7    CGMHLK0001                0.55000
                                  8    CGMHLK0001                0.58333
                                  9    CGMHLK0001                0.50000
                                 10    CGMHLK0001                0.35000
                                 11    CGMHLK0001                0.50000
                                 12    CGMHLK0001                0.63333
                                 13    CGMHLK0001                0.38333
                                 14    CGMHLK0002                0.00000
                                 15    CGMHLK0002                1.00000
                                 16    CGMHLK0002                1.00000
                                 17    CGMHLK0002                1.00000
                                 18    CGMHLK0002                1.00000
                                 19    CGMHLK0002                1.00000
                                 20    CGMHLK0002                1.00000
                                 21    CGMHLK0002                1.15000
                                 22    CGMHLK0002                0.83333
                                 23    CGMHLK0002                1.00000

ID is each patient's ID number.

 

What I want to do is to create a new variable to cumulate the difference for each patient.

For example:

                                 Obs      ID                    Difference   NewVariable

                                  1    CGMHLK0001                0.00000        0
                                  2    CGMHLK0001                0.98333        0.98333
                                  3    CGMHLK0001                0.51667        1.5
                                  4    CGMHLK0001                0.63333        2.13333
                                  5    CGMHLK0001                0.50000        2.63333
                                  6    CGMHLK0001                0.36667        3
                                  7    CGMHLK0001                0.55000        3.55
                                  8    CGMHLK0001                0.58333        4.13333
                                  9    CGMHLK0001                0.50000        4.63333
                                 10    CGMHLK0001                0.35000        4.98333
                                 11    CGMHLK0001                0.50000        5.48333
                                 12    CGMHLK0001                0.63333        6.11663
                                 13    CGMHLK0001                0.38333        6.49996
                                 14    CGMHLK0002                0.00000        0
                                 15    CGMHLK0002                1.00000        1
                                 16    CGMHLK0002                1.00000        2
                                 17    CGMHLK0002                1.00000        3
                                 18    CGMHLK0002                1.00000        4
                                 19    CGMHLK0002                1.00000        5
                                 20    CGMHLK0002                1.00000        6
                                 21    CGMHLK0002                1.15000        7.15
                                 22    CGMHLK0002                0.83333        7.98333
                                 23    CGMHLK0002                1.00000        8.98333

 

While searching "do loop" for this, hope you guys can give me a hand.

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16
data have;
input ID$12.  Difference;
cards;
CGMHLK0001  0.00000
CGMHLK0001  0.98333
CGMHLK0001  0.51667
CGMHLK0001  0.63333
CGMHLK0001  0.50000
CGMHLK0001  0.36667
CGMHLK0001  0.55000
CGMHLK0001  0.58333
CGMHLK0001  0.50000
CGMHLK0001  0.35000
CGMHLK0001  0.50000
CGMHLK0001  0.63333
CGMHLK0001  0.38333
CGMHLK0002  0.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.15000
CGMHLK0002  0.83333
CGMHLK0002  1.00000
;

data want;
set have;
by id difference notsorted;
retain newvar;
if first.id then newvar=difference;
else newvar+difference;
run;
Thanks,
Jag

View solution in original post

9 REPLIES 9
Jagadishkatam
Amethyst | Level 16
data have;
input ID$12.  Difference;
cards;
CGMHLK0001  0.00000
CGMHLK0001  0.98333
CGMHLK0001  0.51667
CGMHLK0001  0.63333
CGMHLK0001  0.50000
CGMHLK0001  0.36667
CGMHLK0001  0.55000
CGMHLK0001  0.58333
CGMHLK0001  0.50000
CGMHLK0001  0.35000
CGMHLK0001  0.50000
CGMHLK0001  0.63333
CGMHLK0001  0.38333
CGMHLK0002  0.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.00000
CGMHLK0002  1.15000
CGMHLK0002  0.83333
CGMHLK0002  1.00000
;

data want;
set have;
by id difference notsorted;
retain newvar;
if first.id then newvar=difference;
else newvar+difference;
run;
Thanks,
Jag
Chung-Li
Quartz | Level 8
Jag,

Thank you for replying me so fast!!!
Basically, you and I use the same concept.
But, can I ask you a tiny question that what's the purpose of "retain"?
This confuses me all the time.
When I check SAS document for retain, it seems like this statement allows variable to be exsited throughout the whole "Do loop" process.
Yet, when I do not use this statement, it still work.

Could you please explain the difference to me, thanks!
Reeza
Super User

@Chung-Li Retain is the correct method. In fact, your solution uses a SUM statement which includes an implied RETAIN rather than an explicit RETAIN. 

 

Typically in a datastep at the 'top' of each loop, all values are set to missing.  RETAIN allows you to hold the value across the step boundary. Usually people run into issues with RETAIN either when using a do loop without a data step boundary or when using a variable that already exists. RETAIN works differently with variables that already exist...I usually just avoid that. 

Chung-Li
Quartz | Level 8

@Reeza,

 

Thank you so much!

With your explaination plus example from Jag, I finally understand the difference between these settings.

 

Jagadishkatam
Amethyst | Level 16
Yes the concept of retain is right. The reason your code worked without the retain is that the code NewTimeVariable+Difference; here represents a sum statement in which the retain is there. if you are using the code without the sum statement then you will find the use of retain like the code as below which requires the retain statement else you will not get correct result.

if first.id then newvar=difference;
else newvar=newvar+difference;
Thanks,
Jag
Chung-Li
Quartz | Level 8
You know what Jag, actually my first attempt for this task was using A=A+B without retain.
No wonder I can't get the right one no matter how hard I try...
But now, with your explanation, I totally understand the difference.
Thanks again, always learn so much from you guys!
Chung-Li
Quartz | Level 8

Hey guys,

 

I've found the solution!

Here's the SAS code:

DATA want;
		SET have;
		BY ID;
	IF FIRST.ID THEN NewTimeVariable=0;
	NewTimeVariable+Difference;
RUN;

 

Reference: https://communities.sas.com/t5/SAS-Data-Management/Calculating-cumulative-sum/td-p/145799

I leave solution here.

You can use this method if you have same problem 

art297
Opal | Level 21

@Chung-Li: In answer to one of your questions

	NewTimeVariable+Difference;

is a form of sum that automatically retains the variable NewTimeVariable

 

As such, adding the retain statement when using that form, would be redundant. Would still produce the same result, but not necessary.

 

Art, CEO, AnalystFinder.com

Chung-Li
Quartz | Level 8

@art297,

 

Thank you for your explanation!

Now, I know when should I use "retain", and when shouldn't I use it.

Most importantly, knowing "+" would automatically trigger retain statement is very helpful.

Thanks again!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1205 views
  • 5 likes
  • 4 in conversation