Hello
I want to accumulate by group .
The group is more than 1 grouping variable.
I want to ask 2 questions please:
1-Why my code is giving wrong results?
2-How can create the calculation by using multiple FIRST. statements
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
data raw_tbl2;
set raw_tbl;
group_CAT=CATX(',',group,group2);
Run;
proc sort data=raw_tbl2;by group_CAT;Run;
data required;
set raw_tbl2;
by group_CAT;
retain Accum_Y;
if first.group_CAT then Accum_Y=Y;
Accum_Y+y;
run;
Hi @Ronein
If you use the variable group_cat, you just need to reinitialize the counter to 0 each time there is a new Group_cat value:
data required;
set raw_tbl2;
by group_CAT;
if first.group_CAT then Accum_Y=0;
Accum_Y+y;
run;
You can simplify your code as follow:
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
proc sort data=raw_tbl;
by group group2;
Run;
data required;
set raw_tbl;
by group group2;
if first.group2 then Accum_Y=0;
Accum_Y+y;
run;
The rational is that when you specify "BY" variables, SAS creates the following internal corresponding variables : Last. and First.
First.<var> = 1 for the first record and 0 otherwise.
In the above example, the first and last values would be as follow. So you just need to reinitialize the counter when First.group2 = 0.
ID group group2 Y First.group First.group2
1 a One 5 1 1
2 a One 10 0 0
3 a Two 15 0 1
4 a Two 20 0 0
5 b One 25 1 1
6 b Two 30 0 1
7 b Two 35 0 0
8 c Two 40 1 1
9 c Two 45 0 0
10 c One 50 0 1
What wrong results is it giving?
Yes you can have two different FIRST. in the same data step
proc sort data = raw_tbl;
by group group2;
run;
data want;
set raw_tbl;
by group group2;
if first.group2 and first.group;
/* Although I don't think the above is what you want, I think you want */
/* if first.group2; */
/* but that's up to you */
...
run;
Hi @Ronein
If you use the variable group_cat, you just need to reinitialize the counter to 0 each time there is a new Group_cat value:
data required;
set raw_tbl2;
by group_CAT;
if first.group_CAT then Accum_Y=0;
Accum_Y+y;
run;
You can simplify your code as follow:
Data raw_tbl;
input ID group $ group2 $ Y;
cards;
1 a One 5
2 a One 10
3 a Two 15
4 a Two 20
5 b One 25
6 b Two 30
7 b Two 35
8 c Two 40
9 c Two 45
10 c One 50
;
run;
proc sort data=raw_tbl;
by group group2;
Run;
data required;
set raw_tbl;
by group group2;
if first.group2 then Accum_Y=0;
Accum_Y+y;
run;
The rational is that when you specify "BY" variables, SAS creates the following internal corresponding variables : Last. and First.
First.<var> = 1 for the first record and 0 otherwise.
In the above example, the first and last values would be as follow. So you just need to reinitialize the counter when First.group2 = 0.
ID group group2 Y First.group First.group2
1 a One 5 1 1
2 a One 10 0 0
3 a Two 15 0 1
4 a Two 20 0 0
5 b One 25 1 1
6 b Two 30 0 1
7 b Two 35 0 0
8 c Two 40 1 1
9 c Two 45 0 0
10 c One 50 0 1
As others have already shown, you can have more than one variable in a BY statement.
And I think most of the suggestions you got solved the calculation problem you had. But to answer you question:
1-Why my code is giving wrong results?
I think you are missing an ELSE. Your summation code is
if first.group_CAT then Accum_Y=Y;
Accum_Y+y;
Which means that in the first OBS in the group, the accumulated value will be Y+Y, and that error will persist throughout the result data.
Just change it to
if first.group_CAT then Accum_Y=Y;
else Accum_Y+y;
and I think you will get the right result.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.