Solved: Re: SAS Retain values by two or more groups

MM88 · Posted 05-05-2023 07:56 AM

Hi community,

I need help with retaining values inside two o more groups. All samples I can find are with only one group.

My groups look like:

Group_A : A1, A2
Group_B : B1, B2, B3

My dataset is:

data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;

I want the following results:

data want;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 10
04/04/2022 A1 B1 10
05/04/2022 A1 B1 10
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 20
05/04/2022 A1 B2 20
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 40
04/04/2022 A1 B3 42
05/04/2022 A1 B3 42
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 15
05/04/2022 A2 B1 15
;
run;

Thanks for your help.
Regards

Tom · Posted 05-05-2023 10:18 AM

So does adding it fix the issue?

data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;

Let's make a dataset that shows the changed variable.

data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;

Result

Obs    period        mydate    Group_a    Group_B    Val    new_val

  1    202204    2022-04-01      A1         B1        10       10
  2    202204    2022-04-02      A1         B1        10       10
  3    202204    2022-04-03      A1         B1         .       10
  4    202204    2022-04-04      A1         B1         .       10
  5    202204    2022-04-05      A1         B1         .       10
  6    202204    2022-04-01      A1         B2         .        .
  7    202204    2022-04-02      A1         B2         .        .
  8    202204    2022-04-03      A1         B2        20       20
  9    202204    2022-04-04      A1         B2         .       20
 10    202204    2022-04-05      A1         B2         .       20
 11    202204    2022-04-01      A1         B3         .        .
 12    202204    2022-04-02      A1         B3        40       40
 13    202204    2022-04-03      A1         B3         .       40
 14    202204    2022-04-04      A1         B3        42       42
 15    202204    2022-04-05      A1         B3         .       42
 16    202204    2022-04-01      A2         B1         .        .
 17    202204    2022-04-02      A2         B1         .        .
 18    202204    2022-04-03      A2         B1        15       15
 19    202204    2022-04-04      A2         B1         .       15
 20    202204    2022-04-05      A2         B1         .       15
 21    202205    2022-05-01      A1         B1         .        .
 22    202205    2022-05-02      A1         B1         .        .
 23    202205    2022-05-03      A1         B1        12       12
 24    202205    2022-05-04      A1         B1         .       12
 25    202205    2022-05-05      A1         B1         .       12
 26    202205    2022-05-01      A1         B2         .        .
 27    202205    2022-05-02      A1         B2         .        .
 28    202205    2022-05-03      A1         B2        22       22
 29    202205    2022-05-04      A1         B2         .       22
 30    202205    2022-05-05      A1         B2         .       22

View solution in original post

AMSAS · Posted 05-05-2023 08:06 AM

If you found examples for one group, then it should be relatively simple to change it to 2 or more groups

I recommend you review BY-Group Processing in the DATA Step
Here's an example:

data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;

proc sort 
		data=work.have 
		out=work.srtdHave ;
	by Group_A Group_B mydate ;
run ;

data work.want (drop=value) ;	
	retain value . ;
	set srtdHave ;
	by Group_A Group_B mydate ;
	if first.Group_B then 
		value=. ;
	if val ne . then 
		value=val ;
	else
		val=value ;
	output work.want ;
run ;

MM88 · Posted 05-05-2023 10:05 AM

Hi experts,

Thanks for your reply. I was trying to apply the solutions to my problem, but I have more variables. Please find a modified 'have' dataset.

data have;
input period mydate:ddmmyy10. Group_a $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
202204 01/04/2022 A1 B1 10
202204 02/04/2022 A1 B1 10
202204 03/04/2022 A1 B1 .
202204 04/04/2022 A1 B1 .
202204 05/04/2022 A1 B1 .
202204 01/04/2022 A1 B2 .
202204 02/04/2022 A1 B2 .
202204 03/04/2022 A1 B2 20
202204 04/04/2022 A1 B2 .
202204 05/04/2022 A1 B2 .
202204 01/04/2022 A1 B3 .
202204 02/04/2022 A1 B3 40
202204 03/04/2022 A1 B3 .
202204 04/04/2022 A1 B3 42
202204 05/04/2022 A1 B3 .
202204 01/04/2022 A2 B1 .
202204 02/04/2022 A2 B1 .
202204 03/04/2022 A2 B1 15
202204 04/04/2022 A2 B1 .
202204 05/04/2022 A2 B1 .
202205 01/05/2022 A1 B1 .
202205 02/05/2022 A1 B1 .
202205 03/05/2022 A1 B1 12
202205 04/05/2022 A1 B1 .
202205 05/05/2022 A1 B1 .
202205 01/05/2022 A1 B2 .
202205 02/05/2022 A1 B2 .
202205 03/05/2022 A1 B2 22
202205 04/05/2022 A1 B2 .
202205 05/05/2022 A1 B2 .

;
run;

The generated dataset has some incorrect val, for example at line:

period: 202205

mydate: 01/05/2022

group_A: A1

group_B: B1

Val: 10 (It must be missing, according to 'have' dataset).

Thanks for the suggestions.

Tom · Posted 05-05-2023 10:12 AM

So PERIOD is one of the BY variables also?

MM88 · Posted 05-05-2023 10:15 AM

Yes, PERIOD is also a BY variable.

Tom · Posted 05-05-2023 10:18 AM

So does adding it fix the issue?

data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;

Let's make a dataset that shows the changed variable.

data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;

Result

Obs    period        mydate    Group_a    Group_B    Val    new_val

  1    202204    2022-04-01      A1         B1        10       10
  2    202204    2022-04-02      A1         B1        10       10
  3    202204    2022-04-03      A1         B1         .       10
  4    202204    2022-04-04      A1         B1         .       10
  5    202204    2022-04-05      A1         B1         .       10
  6    202204    2022-04-01      A1         B2         .        .
  7    202204    2022-04-02      A1         B2         .        .
  8    202204    2022-04-03      A1         B2        20       20
  9    202204    2022-04-04      A1         B2         .       20
 10    202204    2022-04-05      A1         B2         .       20
 11    202204    2022-04-01      A1         B3         .        .
 12    202204    2022-04-02      A1         B3        40       40
 13    202204    2022-04-03      A1         B3         .       40
 14    202204    2022-04-04      A1         B3        42       42
 15    202204    2022-04-05      A1         B3         .       42
 16    202204    2022-04-01      A2         B1         .        .
 17    202204    2022-04-02      A2         B1         .        .
 18    202204    2022-04-03      A2         B1        15       15
 19    202204    2022-04-04      A2         B1         .       15
 20    202204    2022-04-05      A2         B1         .       15
 21    202205    2022-05-01      A1         B1         .        .
 22    202205    2022-05-02      A1         B1         .        .
 23    202205    2022-05-03      A1         B1        12       12
 24    202205    2022-05-04      A1         B1         .       12
 25    202205    2022-05-05      A1         B1         .       12
 26    202205    2022-05-01      A1         B2         .        .
 27    202205    2022-05-02      A1         B2         .        .
 28    202205    2022-05-03      A1         B2        22       22
 29    202205    2022-05-04      A1         B2         .       22
 30    202205    2022-05-05      A1         B2         .       22

Tom · Posted 05-05-2023 09:52 AM

Looks like a simple LOCF (last observation carried forward).

You can implement that with the UPDATE statement. Use the source dataset as both the BASE and the TRANSACTION data. Add an OUTPUT statement so that an observation is written after every transaction is applied instead of the normal processing which would only write one observation per BY group that had all of the transactions for it applied.

data want;
  update have(obs=0) have;
   by group_a group_b;
  output;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!