Hi community,
I need help with retaining values inside two o more groups. All samples I can find are with only one group.
My groups look like:
Group_A : A1, A2
Group_B : B1, B2, B3
My dataset is:
data have; input mydate:ddmmyy10. Group_A $ Group_B $ Val; format mydate ddmmyy10.; datalines; 01/04/2022 A1 B1 10 02/04/2022 A1 B1 10 03/04/2022 A1 B1 . 04/04/2022 A1 B1 . 05/04/2022 A1 B1 . 01/04/2022 A1 B2 . 02/04/2022 A1 B2 . 03/04/2022 A1 B2 20 04/04/2022 A1 B2 . 05/04/2022 A1 B2 . 01/04/2022 A1 B3 . 02/04/2022 A1 B3 40 03/04/2022 A1 B3 . 04/04/2022 A1 B3 42 05/04/2022 A1 B3 . 01/04/2022 A2 B1 . 02/04/2022 A2 B1 . 03/04/2022 A2 B1 15 04/04/2022 A2 B1 . 05/04/2022 A2 B1 . ; run;
I want the following results:
data want; input mydate:ddmmyy10. Group_A $ Group_B $ Val; format mydate ddmmyy10.; datalines; 01/04/2022 A1 B1 10 02/04/2022 A1 B1 10 03/04/2022 A1 B1 10 04/04/2022 A1 B1 10 05/04/2022 A1 B1 10 01/04/2022 A1 B2 . 02/04/2022 A1 B2 . 03/04/2022 A1 B2 20 04/04/2022 A1 B2 20 05/04/2022 A1 B2 20 01/04/2022 A1 B3 . 02/04/2022 A1 B3 40 03/04/2022 A1 B3 40 04/04/2022 A1 B3 42 05/04/2022 A1 B3 42 01/04/2022 A2 B1 . 02/04/2022 A2 B1 . 03/04/2022 A2 B1 15 04/04/2022 A2 B1 15 05/04/2022 A2 B1 15 ; run;
Thanks for your help.
Regards
So does adding it fix the issue?
data want;
update have(obs=0) have;
by period group_a group_b;
output;
run;
Let's make a dataset that shows the changed variable.
data change;
set have;
set want(keep=val rename=(val=new_val));
run;
Result
Obs period mydate Group_a Group_B Val new_val 1 202204 2022-04-01 A1 B1 10 10 2 202204 2022-04-02 A1 B1 10 10 3 202204 2022-04-03 A1 B1 . 10 4 202204 2022-04-04 A1 B1 . 10 5 202204 2022-04-05 A1 B1 . 10 6 202204 2022-04-01 A1 B2 . . 7 202204 2022-04-02 A1 B2 . . 8 202204 2022-04-03 A1 B2 20 20 9 202204 2022-04-04 A1 B2 . 20 10 202204 2022-04-05 A1 B2 . 20 11 202204 2022-04-01 A1 B3 . . 12 202204 2022-04-02 A1 B3 40 40 13 202204 2022-04-03 A1 B3 . 40 14 202204 2022-04-04 A1 B3 42 42 15 202204 2022-04-05 A1 B3 . 42 16 202204 2022-04-01 A2 B1 . . 17 202204 2022-04-02 A2 B1 . . 18 202204 2022-04-03 A2 B1 15 15 19 202204 2022-04-04 A2 B1 . 15 20 202204 2022-04-05 A2 B1 . 15 21 202205 2022-05-01 A1 B1 . . 22 202205 2022-05-02 A1 B1 . . 23 202205 2022-05-03 A1 B1 12 12 24 202205 2022-05-04 A1 B1 . 12 25 202205 2022-05-05 A1 B1 . 12 26 202205 2022-05-01 A1 B2 . . 27 202205 2022-05-02 A1 B2 . . 28 202205 2022-05-03 A1 B2 22 22 29 202205 2022-05-04 A1 B2 . 22 30 202205 2022-05-05 A1 B2 . 22
If you found examples for one group, then it should be relatively simple to change it to 2 or more groups
I recommend you review BY-Group Processing in the DATA Step
Here's an example:
data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;
datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;
proc sort
data=work.have
out=work.srtdHave ;
by Group_A Group_B mydate ;
run ;
data work.want (drop=value) ;
retain value . ;
set srtdHave ;
by Group_A Group_B mydate ;
if first.Group_B then
value=. ;
if val ne . then
value=val ;
else
val=value ;
output work.want ;
run ;
Hi experts,
Thanks for your reply. I was trying to apply the solutions to my problem, but I have more variables. Please find a modified 'have' dataset.
data have; input period mydate:ddmmyy10. Group_a $ Group_B $ Val; format mydate ddmmyy10.; datalines; 202204 01/04/2022 A1 B1 10 202204 02/04/2022 A1 B1 10 202204 03/04/2022 A1 B1 . 202204 04/04/2022 A1 B1 . 202204 05/04/2022 A1 B1 . 202204 01/04/2022 A1 B2 . 202204 02/04/2022 A1 B2 . 202204 03/04/2022 A1 B2 20 202204 04/04/2022 A1 B2 . 202204 05/04/2022 A1 B2 . 202204 01/04/2022 A1 B3 . 202204 02/04/2022 A1 B3 40 202204 03/04/2022 A1 B3 . 202204 04/04/2022 A1 B3 42 202204 05/04/2022 A1 B3 . 202204 01/04/2022 A2 B1 . 202204 02/04/2022 A2 B1 . 202204 03/04/2022 A2 B1 15 202204 04/04/2022 A2 B1 . 202204 05/04/2022 A2 B1 . 202205 01/05/2022 A1 B1 . 202205 02/05/2022 A1 B1 . 202205 03/05/2022 A1 B1 12 202205 04/05/2022 A1 B1 . 202205 05/05/2022 A1 B1 . 202205 01/05/2022 A1 B2 . 202205 02/05/2022 A1 B2 . 202205 03/05/2022 A1 B2 22 202205 04/05/2022 A1 B2 . 202205 05/05/2022 A1 B2 . ; run;
The generated dataset has some incorrect val, for example at line:
period: 202205
mydate: 01/05/2022
group_A: A1
group_B: B1
Val: 10 (It must be missing, according to 'have' dataset).
Thanks for the suggestions.
So PERIOD is one of the BY variables also?
So does adding it fix the issue?
data want;
update have(obs=0) have;
by period group_a group_b;
output;
run;
Let's make a dataset that shows the changed variable.
data change;
set have;
set want(keep=val rename=(val=new_val));
run;
Result
Obs period mydate Group_a Group_B Val new_val 1 202204 2022-04-01 A1 B1 10 10 2 202204 2022-04-02 A1 B1 10 10 3 202204 2022-04-03 A1 B1 . 10 4 202204 2022-04-04 A1 B1 . 10 5 202204 2022-04-05 A1 B1 . 10 6 202204 2022-04-01 A1 B2 . . 7 202204 2022-04-02 A1 B2 . . 8 202204 2022-04-03 A1 B2 20 20 9 202204 2022-04-04 A1 B2 . 20 10 202204 2022-04-05 A1 B2 . 20 11 202204 2022-04-01 A1 B3 . . 12 202204 2022-04-02 A1 B3 40 40 13 202204 2022-04-03 A1 B3 . 40 14 202204 2022-04-04 A1 B3 42 42 15 202204 2022-04-05 A1 B3 . 42 16 202204 2022-04-01 A2 B1 . . 17 202204 2022-04-02 A2 B1 . . 18 202204 2022-04-03 A2 B1 15 15 19 202204 2022-04-04 A2 B1 . 15 20 202204 2022-04-05 A2 B1 . 15 21 202205 2022-05-01 A1 B1 . . 22 202205 2022-05-02 A1 B1 . . 23 202205 2022-05-03 A1 B1 12 12 24 202205 2022-05-04 A1 B1 . 12 25 202205 2022-05-05 A1 B1 . 12 26 202205 2022-05-01 A1 B2 . . 27 202205 2022-05-02 A1 B2 . . 28 202205 2022-05-03 A1 B2 22 22 29 202205 2022-05-04 A1 B2 . 22 30 202205 2022-05-05 A1 B2 . 22
Looks like a simple LOCF (last observation carried forward).
You can implement that with the UPDATE statement. Use the source dataset as both the BASE and the TRANSACTION data. Add an OUTPUT statement so that an observation is written after every transaction is applied instead of the normal processing which would only write one observation per BY group that had all of the transactions for it applied.
data want;
update have(obs=0) have;
by group_a group_b;
output;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.