- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I have a dataset that currently looks like this
acc_id | curr | ever |
1 | 0 | 1 |
2 | 1 | 1 |
2 | 1 | 1 |
3 | 0 | 1 |
3 | 1 | 1 |
and I need to add a couple of additional columns to calculate a running total for each id - sum_curr and sum_ever
acc_id | curr | ever | sum_curr | sum_ever |
1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 |
3 | 0 | 1 | 0 | 1 |
3 | 1 | 1 | 1 | 2 |
I'm struggling to work out how this would be done... any help would be greatly appreciated
thanks 🙂
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming that your "want" of id=1 and the value of Curr shown is a typo (changed from 0 to 1) this should work:
data have; input acc_id curr ever; datalines; 1 0 1 2 1 1 2 1 1 3 0 1 3 1 1 ; data want; set have; by acc_id; retain sum_curr sum_ever; if first.acc_id then do; sum_curr=curr; sum_ever=ever; end; else do; sum_curr=sum(sum_curr, curr); sum_ever=Sum(sum_ever, ever); end; run;
Please note the DATA step code to provide example data.
The Want step works with two major bits. The BY statement creates automatic variables Last. and First. that indicate whether the current record is the first or last of a by group. These automatic variables are numeric 1/0 for True/False and can be used to conditionally execute code. In this case to conditionally reset the accumulator variables.
The RETAIN statement creates variables that will hold the value across iterations of the data step.
I am using the SUM function in case your Curr or Ever variables ever have missing values but you want to accumulate across them. The + operator will return missing if any of the values are missing and would result in missing values for the total.
If your data is not actually sorted by ID but is grouped by the Id variable add NOTSORTED to the BY statement so the expected sort order of a BY statement is not violated causing a data error.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming that your "want" of id=1 and the value of Curr shown is a typo (changed from 0 to 1) this should work:
data have; input acc_id curr ever; datalines; 1 0 1 2 1 1 2 1 1 3 0 1 3 1 1 ; data want; set have; by acc_id; retain sum_curr sum_ever; if first.acc_id then do; sum_curr=curr; sum_ever=ever; end; else do; sum_curr=sum(sum_curr, curr); sum_ever=Sum(sum_ever, ever); end; run;
Please note the DATA step code to provide example data.
The Want step works with two major bits. The BY statement creates automatic variables Last. and First. that indicate whether the current record is the first or last of a by group. These automatic variables are numeric 1/0 for True/False and can be used to conditionally execute code. In this case to conditionally reset the accumulator variables.
The RETAIN statement creates variables that will hold the value across iterations of the data step.
I am using the SUM function in case your Curr or Ever variables ever have missing values but you want to accumulate across them. The + operator will return missing if any of the values are missing and would result in missing values for the total.
If your data is not actually sorted by ID but is grouped by the Id variable add NOTSORTED to the BY statement so the expected sort order of a BY statement is not violated causing a data error.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!!