BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

I have a simple data step where I am trying to create a cumulative number column (CumulativeNumber) based on the value of another column (DistributiveNumber). This column resets to 0 each time a new section of County Year AgeRange ExitMonthCategory is reached. However as the attached example shows it is re-setting, but not summing correctly.

If you see anything wrong with the below code, please let me know.

Paul

proc sort data=s9CountyAges;

by County Year AgeRange ExitMonthCategory;

run;

data test1;

set s9CountyAges;

by County Year AgeRange ExitMonthCategory;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Two questions.

1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.

2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format?  This could cause the FIRST. to get trigger more often that it would appear from the displayed values.

View solution in original post

12 REPLIES 12
Sylas
Fluorite | Level 6

Hi Paul,

Try with Sum function because :smileyplus: operator will not give correct summing of the data if empty values are present.

CumulativeNumber=sum(CumulativeNumber,DistributiveNumber);

Regards

Sylas.J

Astounding
PROC Star

Paul,

It's close to working.  This switch should do it:

if first.ExitMonthCategory then CumulativeNumber=0;

Good luck.

UrvishShah
Fluorite | Level 6

Hi,

/*Identifying Duplicates by specified variables*/

proc sort nodupkey data = have dupout = dup;

      by country year agerange exitmonthCategory;

run;

/*Combine it with original data*/

data want(drop = distributivenumber1);

      merge have

                dup(rename = (distributivenumber = distributivenumber1));

      by country year agerange exitmonthCategory;

      if distributivenumber1 EQ . then do;

        cum_num = 0;

      end;

      else do;

            cum_num = sum(distributivenumber,distributivenumber1);

      end;

run;

Thanks,

Urvish

Paul_NYS
Obsidian | Level 7

I tried everyone's ideas above and it is still not summing at all. I have used this type of data step many times before and it has worked fine. I tried the below simplified variation of it and it is not working either.

Is there anything unusual that the the sum function needs to operate?

Paul

proc sort data=s9CountyAges;

by County Year AgeRange ExitMonthCategory;

run;

data test1;

set s9CountyAges;

by County Year AgeRange ExitMonthCategory;

CumulativeNumber=.;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Astounding
PROC Star

This statement does not belong in the code.  For every observation, you are setting:

CumulativeNumber=.;

Also, I still don't see my earlier suggestion incorporated.

Good luck.


Paul_NYS
Obsidian | Level 7

I tried it with 'ExitMonthCategory' and it did not work--it does not sum anything actually. When I have used this in the past, I always set the 'first' variable to the second to last one in the 'by' statement. This usually works.

Paul

data short;

set short;

by County Year AgeRange ExitMonthCategory;

if first.ExitMonthCategory then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Astounding
PROC Star

You could be right about which variable is the "first" one.  It all depends on the level at which you would like to accumulate.  The original post said you wanted to set your CumulativeNumber to 0 each time you hit a new County / Year / AgeRange / ExitMonthCategory.  If you use first.AgeRange, you will set your CumulativeNumber to 0 each time  you hit a new County / Year / AgeRange.  It's not that one is right and the other is wrong, it just depends on what you would like your cumulative number to represent.

There's nothing wrong with the last set of code you posted.  Instead of posting a spreadsheet, run a PROC PRINT and examine the first 5 or 10 lines.  If it doesn't look right, post those lines along with the output from a PROC CONTENTS.  There aren't too many places for this sort of problem to hide.

Paul_NYS
Obsidian | Level 7

Hi Astounding

I re-ran it, it didn't sum anything. Below is the first 20 records which show that the CumulativeNumber column is just being set equal to the DistributiveNumber column and it is not zeroing out either. I don't know of a way to attach the proc contents though. I can't post it on a website right now due to temporary inaccess.

If anything jumps out, please let me know.

Paul

proc sort data=short;

by County Year AgeRange ExitMonthCategory;

run;

data short;

set short;

by County Year AgeRange ExitMonthCategory;

if first.AgeRange then CumulativeNumber=0;

CumulativeNumber + DistributiveNumber;

run;

Proc Print:

Obs County Year AgeRange ExitMonthCategory DistributiveNumber CumulativeNumber

1 Albany 2006 1 - 0 to 3 3 0 0

2 Albany 2006 1 - 0 to 3 4 0 0

3 Albany 2006 1 - 0 to 3 5 0 0

4 Albany 2006 1 - 0 to 3 6 0 0

5 Albany 2006 1 - 0 to 3 12 6 6

6 Albany 2006 1 - 0 to 3 18 7 7

7 Albany 2006 1 - 0 to 3 24 5 5

8 Albany 2006 1 - 0 to 3 30 0 0

9 Albany 2006 1 - 0 to 3 36 1 1

10 Albany 2006 1 - 0 to 3 42 0 0

11 Albany 2006 1 - 0 to 3 48 0 0

12 Albany 2006 1 - 0 to 3 54 0 0

13 Albany 2006 1 - 0 to 3 60 0 0

14 Albany 2006 1 - 0 to 3 66 0 0

15 Albany 2006 1 - 0 to 3 72 0 0

16 Albany 2006 2 - 4 to 6 3 0 0

17 Albany 2006 2 - 4 to 6 4 0 0

18 Albany 2006 2 - 4 to 6 5 0 0

19 Albany 2006 2 - 4 to 6 6 0 0

20 Albany 2006 2 - 4 to 6 12 2 2

DBailey
Lapis Lazuli | Level 10

Shouldn't this

data short;

set short;

be

data want;

set short;

Astounding
PROC Star

OK, there still looks like there is nothing wrong with the program.  So let's examine the data.  Try adding these statements, and let's see what these new variables tell us:

F_County = first.county;

F_Year = first.year;

F_AgeRange = first.agerange;

I suspect that some values are different, even if they look the same when they print.

Tom
Super User Tom
Super User

Two questions.

1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.

2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format?  This could cause the FIRST. to get trigger more often that it would appear from the displayed values.

Paul_NYS
Obsidian | Level 7

Hi Tom

Your question above on if the CumulativeNumber variable exists on the input data set was the problem. It did and once I created it at accumulation, it resolved the problem.

Thanks a lot for ending a lot of frustration!

Paul

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2329 views
  • 3 likes
  • 6 in conversation