Hi all,
I have a dataset that currently looks like this
acc_id | curr | ever |
1 | 0 | 1 |
2 | 1 | 1 |
2 | 1 | 1 |
3 | 0 | 1 |
3 | 1 | 1 |
and I need to add a couple of additional columns to calculate a running total for each id - sum_curr and sum_ever
acc_id | curr | ever | sum_curr | sum_ever |
1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 |
3 | 0 | 1 | 0 | 1 |
3 | 1 | 1 | 1 | 2 |
I'm struggling to work out how this would be done... any help would be greatly appreciated
thanks 🙂
Assuming that your "want" of id=1 and the value of Curr shown is a typo (changed from 0 to 1) this should work:
data have; input acc_id curr ever; datalines; 1 0 1 2 1 1 2 1 1 3 0 1 3 1 1 ; data want; set have; by acc_id; retain sum_curr sum_ever; if first.acc_id then do; sum_curr=curr; sum_ever=ever; end; else do; sum_curr=sum(sum_curr, curr); sum_ever=Sum(sum_ever, ever); end; run;
Please note the DATA step code to provide example data.
The Want step works with two major bits. The BY statement creates automatic variables Last. and First. that indicate whether the current record is the first or last of a by group. These automatic variables are numeric 1/0 for True/False and can be used to conditionally execute code. In this case to conditionally reset the accumulator variables.
The RETAIN statement creates variables that will hold the value across iterations of the data step.
I am using the SUM function in case your Curr or Ever variables ever have missing values but you want to accumulate across them. The + operator will return missing if any of the values are missing and would result in missing values for the total.
If your data is not actually sorted by ID but is grouped by the Id variable add NOTSORTED to the BY statement so the expected sort order of a BY statement is not violated causing a data error.
Assuming that your "want" of id=1 and the value of Curr shown is a typo (changed from 0 to 1) this should work:
data have; input acc_id curr ever; datalines; 1 0 1 2 1 1 2 1 1 3 0 1 3 1 1 ; data want; set have; by acc_id; retain sum_curr sum_ever; if first.acc_id then do; sum_curr=curr; sum_ever=ever; end; else do; sum_curr=sum(sum_curr, curr); sum_ever=Sum(sum_ever, ever); end; run;
Please note the DATA step code to provide example data.
The Want step works with two major bits. The BY statement creates automatic variables Last. and First. that indicate whether the current record is the first or last of a by group. These automatic variables are numeric 1/0 for True/False and can be used to conditionally execute code. In this case to conditionally reset the accumulator variables.
The RETAIN statement creates variables that will hold the value across iterations of the data step.
I am using the SUM function in case your Curr or Ever variables ever have missing values but you want to accumulate across them. The + operator will return missing if any of the values are missing and would result in missing values for the total.
If your data is not actually sorted by ID but is grouped by the Id variable add NOTSORTED to the BY statement so the expected sort order of a BY statement is not violated causing a data error.
Thank you!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.