BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
toomanystepsint
Fluorite | Level 6
I have a dataset.
 
It has 3 variables:
-year
-date
-number
 
I want the cumulative value of number resetting every year.
 
The typical sas advice does not work for me.
 
For example, according to this answer my code should be:
 
data want;
set have;
by year;
if first.date then runsum=number;
else runsum+number;
run;
 
But this code gives me all null values for runsum. Why? That makes no sense. There's nothing I'm doing that the other answers aren't doing, this just isn't working.
 
I've tried many other things, including adding a retain, and doing by year and date. Retain does nothing, all my values are just null. Doing by year and date just sets runsum to the value of number, it never cumulates. I've tried turning my date variable into datetime, into character, nothing works. Why?
1 ACCEPTED SOLUTION

Accepted Solutions
Season
Lapis Lazuli | Level 10

Try out this code (modified according to @Tom's reply):

data want;
set have;
by year;
if first.year then cumsum=0;
cumsum+number;
run;

As a SAS user, I think computing cumulative sums in SAS is far less intuitive than doing so in Microsoft Excel. You will hardly be able to know what the line "cumsum+number;" means if nobody tells you that it is for computing cumulative sums.

View solution in original post

14 REPLIES 14
Reeza
Super User
data want;
set have;
by year;
if first.year then runsum=number;
else runsum+number;
run;

The FIRST. reference is incorrect, you need to use variable listed in the By statement.

 


@toomanystepsint wrote:
I have a dataset.
 
It has 3 variables:
-year
-date
-number
 
I want the cumulative value of number resetting every year.
 
The typical sas advice does not work for me.
 
For example, according to this answer my code should be:
 
data want;
set have;
by year;
if first.date then runsum=number;
else runsum+number;
run;
 
But this code gives me all null values for runsum. Why? That makes no sense. There's nothing I'm doing that the other answers aren't doing, this just isn't working.
 
I've tried many other things, including adding a retain, and doing by year and date. Retain does nothing, all my values are just null. Doing by year and date just sets runsum to the value of number, it never cumulates. I've tried turning my date variable into datetime, into character, nothing works. Why?

 

toomanystepsint
Fluorite | Level 6
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.

Unbelievable.
mkeintz
PROC Star

I'm having a difficult time reconciling your response to @Reeza 

 


@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.

Unbelievable.

with your original post:

 

data want;
set have;
by year;
if first.date then runsum=number;
else runsum+number;
run;
which has a "first." test referring to DATE, which is a variable NOT in the BY statement.  SAS would only know to check for first.date if DATE were a variable in the BY statement.
 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

Re-read your code it says first.date, not first.year as posted. 

 

I posted the corrected code.

 


@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.

Unbelievable.

 

Tom
Super User Tom
Super User

@toomanystepsint wrote:
I AM using a variable listed in the by statement. You can see I'm saying "by year" and if first.year.

Unbelievable.

Please share the lines from the SAS log for your data step.

The code you posted in your question was using FIRST.DATE, but only listed YEAR in the BY statement.  That will cause SAS to invent a first.date variable, which will never be true.

 

If the goal is to re-start the accumulator when starting a new year then use FIRST.YEAR in the IF statement.  You can keep DATE in the BY statement if you want have the data step make sure the observations are actually sorted by DATE within YEAR and stop if it is not.

 

Let's make some sample data:

data have;
  input year date :date. count;
  format date date9. ;
cards;
1 01JAN2000 10
1 02JAN2000 20
1 03JAN2000 30
2 01JAN2001 10
2 01FEB2001 15
;

And try making a running sum of the COUNT variable.

Remember to use a NEW variable for the running sum.  If you use a variable that already exists on the input dataset then the value read by the SET statement will replace the value calculated on the previous observation.

data want;
  set have ;
  by year date;
  if first.year then running_sum=0;
  running_sum + count;
run;

Result

                                     running_
Obs    year         date    count       sum

 1       1     01JAN2000      10        10
 2       1     02JAN2000      20        30
 3       1     03JAN2000      30        60
 4       2     01JAN2001      10        10
 5       2     01FEB2001      15        25

If the goal is something else then you need to explain it more clearly.  Sharing example input data and the desired result from that input will help clarify what you want.

Season
Lapis Lazuli | Level 10
Your code is correct. But I have a question: why is the variable date in the BY statement?
Reeza
Super User

It is not needed, but if you have a year AND date variable it ensures that the data is sorted correctly within the year to get the correct running sum. Otherwise, there could be a sum that's reset by year, but out of order by day.

This isn't stated in the question but best practice based on experience.

 


@Season wrote:
Your code is generally correct except for an issue that could be improved: the variable date is not needed in the BY statement.

 

Season
Lapis Lazuli | Level 10

All right. I have got it. Thank you for your patient reply!

Tom
Super User Tom
Super User

@Season wrote:
Your code is generally correct except for an issue that could be improved: the variable date is not needed in the BY statement.

Please re-read my full response and you will see why it is there.

Season
Lapis Lazuli | Level 10
Thank you! I took a second look at your reply and have now understood why it was there.
Season
Lapis Lazuli | Level 10

Try out this code (modified according to @Tom's reply):

data want;
set have;
by year;
if first.year then cumsum=0;
cumsum+number;
run;

As a SAS user, I think computing cumulative sums in SAS is far less intuitive than doing so in Microsoft Excel. You will hardly be able to know what the line "cumsum+number;" means if nobody tells you that it is for computing cumulative sums.

ballardw
Super User

If you read the LOG the reason is there. (Highlight added). An Uninitialized variable is always missing and so will never be "true". Which may well lead to @Reeza's correction to use the proper FIRST.<variablename>.

 

284  data want;
285  set have;
286  by year;
287  if first.date then runsum=number;
288  else runsum+number;
289  run;

NOTE: Variable 'first.date'n is uninitialized.
NOTE: There were 2 observations read from the data set USER.HAVE.
NOTE: The data set USER.WANT has 2 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
mkeintz
PROC Star

@toomanystepsint 

 

You asked for advice to (emphasis mine)

I want the cumulative value of number resetting every year.

Yet you accepted a response (reproduced below) from @Season that does not do a yearly reset:

 

data want;
set have;
by year;
cumsum+number;
output;
run;

Please clarify.

 

Otherwise, naive users may believe the accepted solution correctly addresses the problem as originally stated.  Help us make sas communities a reliable source of advice.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Season
Lapis Lazuli | Level 10

Thank you for following my reply. Referencing @Tom's code, I have modified mine. It is at @toomanystepsint's discretion to choose the reply of whom as the solution.

A reminder for @toomanystepsint is that you can retract the acceptance of one's reply as the solution and accepting another reply as solution afterwards, regardless of whether the second reply is posted by the original solution provider.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1817 views
  • 6 likes
  • 6 in conversation