Hello
I want to accumulate by group .
The group is more than 1 grouping variable.
I want to ask 2 questions please:
1-Why my code is giving wrong results?
2-How can create the calculation by using multiple FIRST. statements
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
data raw_tbl2;
set raw_tbl;
group_CAT=CATX(',',group,group2);
Run;
proc sort data=raw_tbl2;by group_CAT;Run;
data required;
set raw_tbl2;
by group_CAT;
retain Accum_Y;
if first.group_CAT then Accum_Y=Y;
Accum_Y+y;
run;
Hi @Ronein
If you use the variable group_cat, you just need to reinitialize the counter to 0 each time there is a new Group_cat value:
data required;
set raw_tbl2;
by group_CAT;
if first.group_CAT then Accum_Y=0;
Accum_Y+y;
run;
You can simplify your code as follow:
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
proc sort data=raw_tbl;
by group group2;
Run;
data required;
set raw_tbl;
by group group2;
if first.group2 then Accum_Y=0;
Accum_Y+y;
run;
The rational is that when you specify "BY" variables, SAS creates the following internal corresponding variables : Last. and First.
First.<var> = 1 for the first record and 0 otherwise.
In the above example, the first and last values would be as follow. So you just need to reinitialize the counter when First.group2 = 0.
ID group group2 Y First.group First.group2
1 a One 5 1 1
2 a One 10 0 0
3 a Two 15 0 1
4 a Two 20 0 0
5 b One 25 1 1
6 b Two 30 0 1
7 b Two 35 0 0
8 c Two 40 1 1
9 c Two 45 0 0
10 c One 50 0 1
What wrong results is it giving?
Yes you can have two different FIRST. in the same data step
proc sort data = raw_tbl;
by group group2;
run;
data want;
set raw_tbl;
by group group2;
if first.group2 and first.group;
/* Although I don't think the above is what you want, I think you want */
/* if first.group2; */
/* but that's up to you */
...
run;
Hi @Ronein
If you use the variable group_cat, you just need to reinitialize the counter to 0 each time there is a new Group_cat value:
data required;
set raw_tbl2;
by group_CAT;
if first.group_CAT then Accum_Y=0;
Accum_Y+y;
run;
You can simplify your code as follow:
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
proc sort data=raw_tbl;
by group group2;
Run;
data required;
set raw_tbl;
by group group2;
if first.group2 then Accum_Y=0;
Accum_Y+y;
run;
The rational is that when you specify "BY" variables, SAS creates the following internal corresponding variables : Last. and First.
First.<var> = 1 for the first record and 0 otherwise.
In the above example, the first and last values would be as follow. So you just need to reinitialize the counter when First.group2 = 0.
ID group group2 Y First.group First.group2
1 a One 5 1 1
2 a One 10 0 0
3 a Two 15 0 1
4 a Two 20 0 0
5 b One 25 1 1
6 b Two 30 0 1
7 b Two 35 0 0
8 c Two 40 1 1
9 c Two 45 0 0
10 c One 50 0 1
As others have already shown, you can have more than one variable in a BY statement.
And I think most of the suggestions you got solved the calculation problem you had. But to answer you question:
1-Why my code is giving wrong results?
I think you are missing an ELSE. Your summation code is
if first.group_CAT then Accum_Y=Y;
Accum_Y+y;
Which means that in the first OBS in the group, the accumulated value will be Y+Y, and that error will persist throughout the result data.
Just change it to
if first.group_CAT then Accum_Y=Y;
else Accum_Y+y;
and I think you will get the right result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.