DATA Step, Macro, Functions and more

Suming values between records

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Suming values between records

Hi

I have a simple data step where I am trying to create a cumulative number column (CumulativeNumber) based on the value of another column (DistributiveNumber). This column resets to 0 each time a new section of County Year AgeRange ExitMonthCategory is reached. However as the attached example shows it is re-setting, but not summing correctly.

If you see anything wrong with the below code, please let me know.

Paul

proc sort data=s9CountyAges;

by County Year AgeRange ExitMonthCategory;

run;

data test1;

set s9CountyAges;

by County Year AgeRange ExitMonthCategory;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;


Accepted Solutions
Solution
‎03-07-2013 03:05 PM
Super User
Super User
Posts: 7,042

Re: Suming values between records

Two questions.

1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.

2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format?  This could cause the FIRST. to get trigger more often that it would appear from the displayed values.

View solution in original post


All Replies
Occasional Contributor
Posts: 17

Re: Suming values between records

Hi Paul,

Try with Sum function because :smileyplus: operator will not give correct summing of the data if empty values are present.

CumulativeNumber=sum(CumulativeNumber,DistributiveNumber);

Regards

Sylas.J

Super User
Posts: 5,503

Re: Suming values between records

Paul,

It's close to working.  This switch should do it:

if first.ExitMonthCategory then CumulativeNumber=0;

Good luck.

Regular Contributor
Posts: 195

Re: Suming values between records

Hi,

/*Identifying Duplicates by specified variables*/

proc sort nodupkey data = have dupout = dup;

      by country year agerange exitmonthCategory;

run;

/*Combine it with original data*/

data want(drop = distributivenumber1);

      merge have

                dup(rename = (distributivenumber = distributivenumber1));

      by country year agerange exitmonthCategory;

      if distributivenumber1 EQ . then do;

        cum_num = 0;

      end;

      else do;

            cum_num = sum(distributivenumber,distributivenumber1);

      end;

run;

Thanks,

Urvish

Regular Contributor
Posts: 216

Re: Suming values between records

I tried everyone's ideas above and it is still not summing at all. I have used this type of data step many times before and it has worked fine. I tried the below simplified variation of it and it is not working either.

Is there anything unusual that the the sum function needs to operate?

Paul

proc sort data=s9CountyAges;

by County Year AgeRange ExitMonthCategory;

run;

data test1;

set s9CountyAges;

by County Year AgeRange ExitMonthCategory;

CumulativeNumber=.;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Super User
Posts: 5,503

Re: Suming values between records

This statement does not belong in the code.  For every observation, you are setting:

CumulativeNumber=.;

Also, I still don't see my earlier suggestion incorporated.

Good luck.


Regular Contributor
Posts: 216

Re: Suming values between records

Posted in reply to Astounding

I tried it with 'ExitMonthCategory' and it did not work--it does not sum anything actually. When I have used this in the past, I always set the 'first' variable to the second to last one in the 'by' statement. This usually works.

Paul

data short;

set short;

by County Year AgeRange ExitMonthCategory;

if first.ExitMonthCategory then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Super User
Posts: 5,503

Re: Suming values between records

You could be right about which variable is the "first" one.  It all depends on the level at which you would like to accumulate.  The original post said you wanted to set your CumulativeNumber to 0 each time you hit a new County / Year / AgeRange / ExitMonthCategory.  If you use first.AgeRange, you will set your CumulativeNumber to 0 each time  you hit a new County / Year / AgeRange.  It's not that one is right and the other is wrong, it just depends on what you would like your cumulative number to represent.

There's nothing wrong with the last set of code you posted.  Instead of posting a spreadsheet, run a PROC PRINT and examine the first 5 or 10 lines.  If it doesn't look right, post those lines along with the output from a PROC CONTENTS.  There aren't too many places for this sort of problem to hide.

Regular Contributor
Posts: 216

Re: Suming values between records

Posted in reply to Astounding

Hi Astounding

I re-ran it, it didn't sum anything. Below is the first 20 records which show that the CumulativeNumber column is just being set equal to the DistributiveNumber column and it is not zeroing out either. I don't know of a way to attach the proc contents though. I can't post it on a website right now due to temporary inaccess.

If anything jumps out, please let me know.

Paul

proc sort data=short;

by County Year AgeRange ExitMonthCategory;

run;

data short;

set short;

by County Year AgeRange ExitMonthCategory;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Proc Print:

Obs County Year AgeRange ExitMonthCategory DistributiveNumber CumulativeNumber

1 Albany 2006 1 - 0 to 3 3 0 0

2 Albany 2006 1 - 0 to 3 4 0 0

3 Albany 2006 1 - 0 to 3 5 0 0

4 Albany 2006 1 - 0 to 3 6 0 0

5 Albany 2006 1 - 0 to 3 12 6 6

6 Albany 2006 1 - 0 to 3 18 7 7

7 Albany 2006 1 - 0 to 3 24 5 5

8 Albany 2006 1 - 0 to 3 30 0 0

9 Albany 2006 1 - 0 to 3 36 1 1

10 Albany 2006 1 - 0 to 3 42 0 0

11 Albany 2006 1 - 0 to 3 48 0 0

12 Albany 2006 1 - 0 to 3 54 0 0

13 Albany 2006 1 - 0 to 3 60 0 0

14 Albany 2006 1 - 0 to 3 66 0 0

15 Albany 2006 1 - 0 to 3 72 0 0

16 Albany 2006 2 - 4 to 6 3 0 0

17 Albany 2006 2 - 4 to 6 4 0 0

18 Albany 2006 2 - 4 to 6 5 0 0

19 Albany 2006 2 - 4 to 6 6 0 0

20 Albany 2006 2 - 4 to 6 12 2 2

Super Contributor
Posts: 578

Re: Suming values between records

Shouldn't this

data short;

set short;

be

data want;

set short;

Super User
Posts: 5,503

Re: Suming values between records

OK, there still looks like there is nothing wrong with the program.  So let's examine the data.  Try adding these statements, and let's see what these new variables tell us:

F_County = first.county;

F_Year = first.year;

F_AgeRange = first.agerange;

I suspect that some values are different, even if they look the same when they print.

Solution
‎03-07-2013 03:05 PM
Super User
Super User
Posts: 7,042

Re: Suming values between records

Two questions.

1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.

2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format?  This could cause the FIRST. to get trigger more often that it would appear from the displayed values.

Regular Contributor
Posts: 216

Re: Suming values between records

Hi Tom

Your question above on if the CumulativeNumber variable exists on the input data set was the problem. It did and once I created it at accumulation, it resolved the problem.

Thanks a lot for ending a lot of frustration!

Paul

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 534 views
  • 3 likes
  • 6 in conversation