- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try out this code (modified according to @Tom's reply):
data want;
set have;
by year;
if first.year then cumsum=0;
cumsum+number;
run;
As a SAS user, I think computing cumulative sums in SAS is far less intuitive than doing so in Microsoft Excel. You will hardly be able to know what the line "cumsum+number;" means if nobody tells you that it is for computing cumulative sums.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
by year;
if first.year then runsum=number;
else runsum+number;
run;
The FIRST. reference is incorrect, you need to use variable listed in the By statement.
@toomanystepsint wrote:
I have a dataset.It has 3 variables:-year-date-numberI want the cumulative value of number resetting every year.The typical sas advice does not work for me.For example, according to this answer my code should be:data want;set have;by year;if first.date then runsum=number;else runsum+number;run;But this code gives me all null values for runsum. Why? That makes no sense. There's nothing I'm doing that the other answers aren't doing, this just isn't working.I've tried many other things, including adding a retain, and doing by year and date. Retain does nothing, all my values are just null. Doing by year and date just sets runsum to the value of number, it never cumulates. I've tried turning my date variable into datetime, into character, nothing works. Why?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unbelievable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm having a difficult time reconciling your response to @Reeza
@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.
Unbelievable.
with your original post:
data want;set have;by year;if first.date then runsum=number;else runsum+number;run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Re-read your code it says first.date, not first.year as posted.
I posted the corrected code.
@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.
Unbelievable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.
Unbelievable.
Please share the lines from the SAS log for your data step.
The code you posted in your question was using FIRST.DATE, but only listed YEAR in the BY statement. That will cause SAS to invent a first.date variable, which will never be true.
If the goal is to re-start the accumulator when starting a new year then use FIRST.YEAR in the IF statement. You can keep DATE in the BY statement if you want have the data step make sure the observations are actually sorted by DATE within YEAR and stop if it is not.
Let's make some sample data:
data have; input year date :date. count; format date date9. ; cards; 1 01JAN2000 10 1 02JAN2000 20 1 03JAN2000 30 2 01JAN2001 10 2 01FEB2001 15 ;
And try making a running sum of the COUNT variable.
Remember to use a NEW variable for the running sum. If you use a variable that already exists on the input dataset then the value read by the SET statement will replace the value calculated on the previous observation.
data want;
set have ;
by year date;
if first.year then running_sum=0;
running_sum + count;
run;
Result
running_ Obs year date count sum 1 1 01JAN2000 10 10 2 1 02JAN2000 20 30 3 1 03JAN2000 30 60 4 2 01JAN2001 10 10 5 2 01FEB2001 15 25
If the goal is something else then you need to explain it more clearly. Sharing example input data and the desired result from that input will help clarify what you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is not needed, but if you have a year AND date variable it ensures that the data is sorted correctly within the year to get the correct running sum. Otherwise, there could be a sum that's reset by year, but out of order by day.
This isn't stated in the question but best practice based on experience.
@Season wrote:
Your code is generally correct except for an issue that could be improved: the variable date is not needed in the BY statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
All right. I have got it. Thank you for your patient reply!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Season wrote:
Your code is generally correct except for an issue that could be improved: the variable date is not needed in the BY statement.
Please re-read my full response and you will see why it is there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try out this code (modified according to @Tom's reply):
data want;
set have;
by year;
if first.year then cumsum=0;
cumsum+number;
run;
As a SAS user, I think computing cumulative sums in SAS is far less intuitive than doing so in Microsoft Excel. You will hardly be able to know what the line "cumsum+number;" means if nobody tells you that it is for computing cumulative sums.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you read the LOG the reason is there. (Highlight added). An Uninitialized variable is always missing and so will never be "true". Which may well lead to @Reeza's correction to use the proper FIRST.<variablename>.
284 data want; 285 set have; 286 by year; 287 if first.date then runsum=number; 288 else runsum+number; 289 run; NOTE: Variable 'first.date'n is uninitialized. NOTE: There were 2 observations read from the data set USER.HAVE. NOTE: The data set USER.WANT has 2 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You asked for advice to (emphasis mine)
I want the cumulative value of number resetting every year.
Yet you accepted a response (reproduced below) from @Season that does not do a yearly reset:
data want;
set have;
by year;
cumsum+number;
output;
run;
Please clarify.
Otherwise, naive users may believe the accepted solution correctly addresses the problem as originally stated. Help us make sas communities a reliable source of advice.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for following my reply. Referencing @Tom's code, I have modified mine. It is at @toomanystepsint's discretion to choose the reply of whom as the solution.
A reminder for @toomanystepsint is that you can retract the acceptance of one's reply as the solution and accepting another reply as solution afterwards, regardless of whether the second reply is posted by the original solution provider.