Hi
I have a simple data step where I am trying to create a cumulative number column (CumulativeNumber) based on the value of another column (DistributiveNumber). This column resets to 0 each time a new section of County Year AgeRange ExitMonthCategory is reached. However as the attached example shows it is re-setting, but not summing correctly.
If you see anything wrong with the below code, please let me know.
Paul
proc sort data=s9CountyAges;
by County Year AgeRange ExitMonthCategory;
run;
data test1;
set s9CountyAges;
by County Year AgeRange ExitMonthCategory;
if first.AgeRange then CumulativeNumber=0;
CumulativeNumber + DistributiveNumber;
run;
Two questions.
1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.
2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format? This could cause the FIRST. to get trigger more often that it would appear from the displayed values.
Hi Paul,
Try with Sum function because :smileyplus: operator will not give correct summing of the data if empty values are present.
CumulativeNumber=sum(CumulativeNumber,DistributiveNumber);
Regards
Sylas.J
Paul,
It's close to working. This switch should do it:
if first.ExitMonthCategory then CumulativeNumber=0;
Good luck.
Hi,
/*Identifying Duplicates by specified variables*/
proc sort nodupkey data = have dupout = dup;
by country year agerange exitmonthCategory;
run;
/*Combine it with original data*/
data want(drop = distributivenumber1);
merge have
dup(rename = (distributivenumber = distributivenumber1));
by country year agerange exitmonthCategory;
if distributivenumber1 EQ . then do;
cum_num = 0;
end;
else do;
cum_num = sum(distributivenumber,distributivenumber1);
end;
run;
Thanks,
Urvish
I tried everyone's ideas above and it is still not summing at all. I have used this type of data step many times before and it has worked fine. I tried the below simplified variation of it and it is not working either.
Is there anything unusual that the the sum function needs to operate?
Paul
proc sort data=s9CountyAges;
by County Year AgeRange ExitMonthCategory;
run;
data test1;
set s9CountyAges;
by County Year AgeRange ExitMonthCategory;
CumulativeNumber=.;
if first.AgeRange then CumulativeNumber=0;
CumulativeNumber + DistributiveNumber;
run;
This statement does not belong in the code. For every observation, you are setting:
CumulativeNumber=.;
Also, I still don't see my earlier suggestion incorporated.
Good luck.
I tried it with 'ExitMonthCategory' and it did not work--it does not sum anything actually. When I have used this in the past, I always set the 'first' variable to the second to last one in the 'by' statement. This usually works.
Paul
data short;
set short;
by County Year AgeRange ExitMonthCategory;
if first.ExitMonthCategory then CumulativeNumber=0;
CumulativeNumber + DistributiveNumber;
run;
You could be right about which variable is the "first" one. It all depends on the level at which you would like to accumulate. The original post said you wanted to set your CumulativeNumber to 0 each time you hit a new County / Year / AgeRange / ExitMonthCategory. If you use first.AgeRange, you will set your CumulativeNumber to 0 each time you hit a new County / Year / AgeRange. It's not that one is right and the other is wrong, it just depends on what you would like your cumulative number to represent.
There's nothing wrong with the last set of code you posted. Instead of posting a spreadsheet, run a PROC PRINT and examine the first 5 or 10 lines. If it doesn't look right, post those lines along with the output from a PROC CONTENTS. There aren't too many places for this sort of problem to hide.
Hi Astounding
I re-ran it, it didn't sum anything. Below is the first 20 records which show that the CumulativeNumber column is just being set equal to the DistributiveNumber column and it is not zeroing out either. I don't know of a way to attach the proc contents though. I can't post it on a website right now due to temporary inaccess.
If anything jumps out, please let me know.
Paul
proc sort data=short;
by County Year AgeRange ExitMonthCategory;
run;
data short;
set short;
by County Year AgeRange ExitMonthCategory;
if first.AgeRange then CumulativeNumber=0;
CumulativeNumber + DistributiveNumber;
run;
Proc Print:
Obs County Year AgeRange ExitMonthCategory DistributiveNumber CumulativeNumber
1 Albany 2006 1 - 0 to 3 3 0 0
2 Albany 2006 1 - 0 to 3 4 0 0
3 Albany 2006 1 - 0 to 3 5 0 0
4 Albany 2006 1 - 0 to 3 6 0 0
5 Albany 2006 1 - 0 to 3 12 6 6
6 Albany 2006 1 - 0 to 3 18 7 7
7 Albany 2006 1 - 0 to 3 24 5 5
8 Albany 2006 1 - 0 to 3 30 0 0
9 Albany 2006 1 - 0 to 3 36 1 1
10 Albany 2006 1 - 0 to 3 42 0 0
11 Albany 2006 1 - 0 to 3 48 0 0
12 Albany 2006 1 - 0 to 3 54 0 0
13 Albany 2006 1 - 0 to 3 60 0 0
14 Albany 2006 1 - 0 to 3 66 0 0
15 Albany 2006 1 - 0 to 3 72 0 0
16 Albany 2006 2 - 4 to 6 3 0 0
17 Albany 2006 2 - 4 to 6 4 0 0
18 Albany 2006 2 - 4 to 6 5 0 0
19 Albany 2006 2 - 4 to 6 6 0 0
20 Albany 2006 2 - 4 to 6 12 2 2
Shouldn't this
data short;
set short;
be
data want;
set short;
OK, there still looks like there is nothing wrong with the program. So let's examine the data. Try adding these statements, and let's see what these new variables tell us:
F_County = first.county;
F_Year = first.year;
F_AgeRange = first.agerange;
I suspect that some values are different, even if they look the same when they print.
Two questions.
1) Does the variable you trying to accumulate into already exist on the input dataset? If so this will cause trouble because each time it reads a record it will override the value you are trying to carry forward from the previous observation.
2) Does the variable AGERANGE really have the values listed in the output or does it in fact have multiple distinct values that are grouped for display with a user defined format? This could cause the FIRST. to get trigger more often that it would appear from the displayed values.
Hi Tom
Your question above on if the CumulativeNumber variable exists on the input data set was the problem. It did and once I created it at accumulation, it resolved the problem.
Thanks a lot for ending a lot of frustration!
Paul
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.