Hi community,
I need help with retaining values inside two o more groups. All samples I can find are with only one group.
My groups look like:
Group_A : A1, A2
Group_B : B1, B2, B3
My dataset is:
data have; input mydate:ddmmyy10. Group_A $ Group_B $ Val; format mydate ddmmyy10.; datalines; 01/04/2022 A1 B1 10 02/04/2022 A1 B1 10 03/04/2022 A1 B1 . 04/04/2022 A1 B1 . 05/04/2022 A1 B1 . 01/04/2022 A1 B2 . 02/04/2022 A1 B2 . 03/04/2022 A1 B2 20 04/04/2022 A1 B2 . 05/04/2022 A1 B2 . 01/04/2022 A1 B3 . 02/04/2022 A1 B3 40 03/04/2022 A1 B3 . 04/04/2022 A1 B3 42 05/04/2022 A1 B3 . 01/04/2022 A2 B1 . 02/04/2022 A2 B1 . 03/04/2022 A2 B1 15 04/04/2022 A2 B1 . 05/04/2022 A2 B1 . ; run;
I want the following results:
data want; input mydate:ddmmyy10. Group_A $ Group_B $ Val; format mydate ddmmyy10.; datalines; 01/04/2022 A1 B1 10 02/04/2022 A1 B1 10 03/04/2022 A1 B1 10 04/04/2022 A1 B1 10 05/04/2022 A1 B1 10 01/04/2022 A1 B2 . 02/04/2022 A1 B2 . 03/04/2022 A1 B2 20 04/04/2022 A1 B2 20 05/04/2022 A1 B2 20 01/04/2022 A1 B3 . 02/04/2022 A1 B3 40 03/04/2022 A1 B3 40 04/04/2022 A1 B3 42 05/04/2022 A1 B3 42 01/04/2022 A2 B1 . 02/04/2022 A2 B1 . 03/04/2022 A2 B1 15 04/04/2022 A2 B1 15 05/04/2022 A2 B1 15 ; run;
Thanks for your help.
Regards
So does adding it fix the issue?
data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;Let's make a dataset that shows the changed variable.
data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;Result
Obs period mydate Group_a Group_B Val new_val 1 202204 2022-04-01 A1 B1 10 10 2 202204 2022-04-02 A1 B1 10 10 3 202204 2022-04-03 A1 B1 . 10 4 202204 2022-04-04 A1 B1 . 10 5 202204 2022-04-05 A1 B1 . 10 6 202204 2022-04-01 A1 B2 . . 7 202204 2022-04-02 A1 B2 . . 8 202204 2022-04-03 A1 B2 20 20 9 202204 2022-04-04 A1 B2 . 20 10 202204 2022-04-05 A1 B2 . 20 11 202204 2022-04-01 A1 B3 . . 12 202204 2022-04-02 A1 B3 40 40 13 202204 2022-04-03 A1 B3 . 40 14 202204 2022-04-04 A1 B3 42 42 15 202204 2022-04-05 A1 B3 . 42 16 202204 2022-04-01 A2 B1 . . 17 202204 2022-04-02 A2 B1 . . 18 202204 2022-04-03 A2 B1 15 15 19 202204 2022-04-04 A2 B1 . 15 20 202204 2022-04-05 A2 B1 . 15 21 202205 2022-05-01 A1 B1 . . 22 202205 2022-05-02 A1 B1 . . 23 202205 2022-05-03 A1 B1 12 12 24 202205 2022-05-04 A1 B1 . 12 25 202205 2022-05-05 A1 B1 . 12 26 202205 2022-05-01 A1 B2 . . 27 202205 2022-05-02 A1 B2 . . 28 202205 2022-05-03 A1 B2 22 22 29 202205 2022-05-04 A1 B2 . 22 30 202205 2022-05-05 A1 B2 . 22
If you found examples for one group, then it should be relatively simple to change it to 2 or more groups
I recommend you review BY-Group Processing in the DATA Step 
Here's an example:
data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;
datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;
proc sort 
		data=work.have 
		out=work.srtdHave ;
	by Group_A Group_B mydate ;
run ;
data work.want (drop=value) ;	
	retain value . ;
	set srtdHave ;
	by Group_A Group_B mydate ;
	if first.Group_B then 
		value=. ;
	if val ne . then 
		value=val ;
	else
		val=value ;
	output work.want ;
run ;
Hi experts,
Thanks for your reply. I was trying to apply the solutions to my problem, but I have more variables. Please find a modified 'have' dataset.
data have; input period mydate:ddmmyy10. Group_a $ Group_B $ Val; format mydate ddmmyy10.; datalines; 202204 01/04/2022 A1 B1 10 202204 02/04/2022 A1 B1 10 202204 03/04/2022 A1 B1 . 202204 04/04/2022 A1 B1 . 202204 05/04/2022 A1 B1 . 202204 01/04/2022 A1 B2 . 202204 02/04/2022 A1 B2 . 202204 03/04/2022 A1 B2 20 202204 04/04/2022 A1 B2 . 202204 05/04/2022 A1 B2 . 202204 01/04/2022 A1 B3 . 202204 02/04/2022 A1 B3 40 202204 03/04/2022 A1 B3 . 202204 04/04/2022 A1 B3 42 202204 05/04/2022 A1 B3 . 202204 01/04/2022 A2 B1 . 202204 02/04/2022 A2 B1 . 202204 03/04/2022 A2 B1 15 202204 04/04/2022 A2 B1 . 202204 05/04/2022 A2 B1 . 202205 01/05/2022 A1 B1 . 202205 02/05/2022 A1 B1 . 202205 03/05/2022 A1 B1 12 202205 04/05/2022 A1 B1 . 202205 05/05/2022 A1 B1 . 202205 01/05/2022 A1 B2 . 202205 02/05/2022 A1 B2 . 202205 03/05/2022 A1 B2 22 202205 04/05/2022 A1 B2 . 202205 05/05/2022 A1 B2 . ; run;
The generated dataset has some incorrect val, for example at line:
period: 202205
mydate: 01/05/2022
group_A: A1
group_B: B1
Val: 10 (It must be missing, according to 'have' dataset).
Thanks for the suggestions.
So PERIOD is one of the BY variables also?
So does adding it fix the issue?
data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;Let's make a dataset that shows the changed variable.
data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;Result
Obs period mydate Group_a Group_B Val new_val 1 202204 2022-04-01 A1 B1 10 10 2 202204 2022-04-02 A1 B1 10 10 3 202204 2022-04-03 A1 B1 . 10 4 202204 2022-04-04 A1 B1 . 10 5 202204 2022-04-05 A1 B1 . 10 6 202204 2022-04-01 A1 B2 . . 7 202204 2022-04-02 A1 B2 . . 8 202204 2022-04-03 A1 B2 20 20 9 202204 2022-04-04 A1 B2 . 20 10 202204 2022-04-05 A1 B2 . 20 11 202204 2022-04-01 A1 B3 . . 12 202204 2022-04-02 A1 B3 40 40 13 202204 2022-04-03 A1 B3 . 40 14 202204 2022-04-04 A1 B3 42 42 15 202204 2022-04-05 A1 B3 . 42 16 202204 2022-04-01 A2 B1 . . 17 202204 2022-04-02 A2 B1 . . 18 202204 2022-04-03 A2 B1 15 15 19 202204 2022-04-04 A2 B1 . 15 20 202204 2022-04-05 A2 B1 . 15 21 202205 2022-05-01 A1 B1 . . 22 202205 2022-05-02 A1 B1 . . 23 202205 2022-05-03 A1 B1 12 12 24 202205 2022-05-04 A1 B1 . 12 25 202205 2022-05-05 A1 B1 . 12 26 202205 2022-05-01 A1 B2 . . 27 202205 2022-05-02 A1 B2 . . 28 202205 2022-05-03 A1 B2 22 22 29 202205 2022-05-04 A1 B2 . 22 30 202205 2022-05-05 A1 B2 . 22
Looks like a simple LOCF (last observation carried forward).
You can implement that with the UPDATE statement. Use the source dataset as both the BASE and the TRANSACTION data. Add an OUTPUT statement so that an observation is written after every transaction is applied instead of the normal processing which would only write one observation per BY group that had all of the transactions for it applied.
data want;
  update have(obs=0) have;
   by group_a group_b;
  output;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
