BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MM88
Calcite | Level 5

Hi community,

 

I need help with retaining values inside two o more groups. All samples I can find are with only one group.

My groups look like:

Group_A : A1, A2
Group_B : B1, B2, B3

 

My dataset is:

 

data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;


I want the following results:

 

data want;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 10
04/04/2022 A1 B1 10
05/04/2022 A1 B1 10
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 20
05/04/2022 A1 B2 20
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 40
04/04/2022 A1 B3 42
05/04/2022 A1 B3 42
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 15
05/04/2022 A2 B1 15
;
run;

Thanks for your help.
Regards

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So does adding it fix the issue?

data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;

Let's make a dataset that shows the changed variable.

data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;

Result

Obs    period        mydate    Group_a    Group_B    Val    new_val

  1    202204    2022-04-01      A1         B1        10       10
  2    202204    2022-04-02      A1         B1        10       10
  3    202204    2022-04-03      A1         B1         .       10
  4    202204    2022-04-04      A1         B1         .       10
  5    202204    2022-04-05      A1         B1         .       10
  6    202204    2022-04-01      A1         B2         .        .
  7    202204    2022-04-02      A1         B2         .        .
  8    202204    2022-04-03      A1         B2        20       20
  9    202204    2022-04-04      A1         B2         .       20
 10    202204    2022-04-05      A1         B2         .       20
 11    202204    2022-04-01      A1         B3         .        .
 12    202204    2022-04-02      A1         B3        40       40
 13    202204    2022-04-03      A1         B3         .       40
 14    202204    2022-04-04      A1         B3        42       42
 15    202204    2022-04-05      A1         B3         .       42
 16    202204    2022-04-01      A2         B1         .        .
 17    202204    2022-04-02      A2         B1         .        .
 18    202204    2022-04-03      A2         B1        15       15
 19    202204    2022-04-04      A2         B1         .       15
 20    202204    2022-04-05      A2         B1         .       15
 21    202205    2022-05-01      A1         B1         .        .
 22    202205    2022-05-02      A1         B1         .        .
 23    202205    2022-05-03      A1         B1        12       12
 24    202205    2022-05-04      A1         B1         .       12
 25    202205    2022-05-05      A1         B1         .       12
 26    202205    2022-05-01      A1         B2         .        .
 27    202205    2022-05-02      A1         B2         .        .
 28    202205    2022-05-03      A1         B2        22       22
 29    202205    2022-05-04      A1         B2         .       22
 30    202205    2022-05-05      A1         B2         .       22

View solution in original post

6 REPLIES 6
AMSAS
SAS Super FREQ

If you found examples for one group, then it should be relatively simple to change it to 2 or more groups

I recommend you review BY-Group Processing in the DATA Step 
Here's an example:

 

data have;
input mydate:ddmmyy10. Group_A $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
01/04/2022 A1 B1 10
02/04/2022 A1 B1 10
03/04/2022 A1 B1 .
04/04/2022 A1 B1 .
05/04/2022 A1 B1 .
01/04/2022 A1 B2 .
02/04/2022 A1 B2 .
03/04/2022 A1 B2 20
04/04/2022 A1 B2 .
05/04/2022 A1 B2 .
01/04/2022 A1 B3 .
02/04/2022 A1 B3 40
03/04/2022 A1 B3 .
04/04/2022 A1 B3 42
05/04/2022 A1 B3 .
01/04/2022 A2 B1 .
02/04/2022 A2 B1 .
03/04/2022 A2 B1 15
04/04/2022 A2 B1 .
05/04/2022 A2 B1 .
;
run;

proc sort 
		data=work.have 
		out=work.srtdHave ;
	by Group_A Group_B mydate ;
run ;

data work.want (drop=value) ;	
	retain value . ;
	set srtdHave ;
	by Group_A Group_B mydate ;
	if first.Group_B then 
		value=. ;
	if val ne . then 
		value=val ;
	else
		val=value ;
	output work.want ;
run ;

 

MM88
Calcite | Level 5

Hi experts,

 

Thanks for your reply. I was trying to apply the solutions to my problem, but I have more variables. Please find a modified 'have' dataset.

 

data have;
input period mydate:ddmmyy10. Group_a $ Group_B $ Val;
format mydate ddmmyy10.;

datalines;
202204 01/04/2022 A1 B1 10
202204 02/04/2022 A1 B1 10
202204 03/04/2022 A1 B1 .
202204 04/04/2022 A1 B1 .
202204 05/04/2022 A1 B1 .
202204 01/04/2022 A1 B2 .
202204 02/04/2022 A1 B2 .
202204 03/04/2022 A1 B2 20
202204 04/04/2022 A1 B2 .
202204 05/04/2022 A1 B2 .
202204 01/04/2022 A1 B3 .
202204 02/04/2022 A1 B3 40
202204 03/04/2022 A1 B3 .
202204 04/04/2022 A1 B3 42
202204 05/04/2022 A1 B3 .
202204 01/04/2022 A2 B1 .
202204 02/04/2022 A2 B1 .
202204 03/04/2022 A2 B1 15
202204 04/04/2022 A2 B1 .
202204 05/04/2022 A2 B1 .
202205 01/05/2022 A1 B1 .
202205 02/05/2022 A1 B1 .
202205 03/05/2022 A1 B1 12
202205 04/05/2022 A1 B1 .
202205 05/05/2022 A1 B1 .
202205 01/05/2022 A1 B2 .
202205 02/05/2022 A1 B2 .
202205 03/05/2022 A1 B2 22
202205 04/05/2022 A1 B2 .
202205 05/05/2022 A1 B2 .

;
run;

The generated dataset has some incorrect val, for example at line:

period: 202205

mydate: 01/05/2022

group_A: A1

group_B: B1

Val: 10 (It must be missing, according to 'have' dataset).

 

Thanks for the suggestions.

 

Tom
Super User Tom
Super User

So PERIOD is one of the BY variables also?

 

MM88
Calcite | Level 5
Yes, PERIOD is also a BY variable.
Tom
Super User Tom
Super User

So does adding it fix the issue?

data want;
  update have(obs=0) have;
  by period group_a group_b;
  output;
run;

Let's make a dataset that shows the changed variable.

data change;
  set have;
  set want(keep=val rename=(val=new_val));
run;

Result

Obs    period        mydate    Group_a    Group_B    Val    new_val

  1    202204    2022-04-01      A1         B1        10       10
  2    202204    2022-04-02      A1         B1        10       10
  3    202204    2022-04-03      A1         B1         .       10
  4    202204    2022-04-04      A1         B1         .       10
  5    202204    2022-04-05      A1         B1         .       10
  6    202204    2022-04-01      A1         B2         .        .
  7    202204    2022-04-02      A1         B2         .        .
  8    202204    2022-04-03      A1         B2        20       20
  9    202204    2022-04-04      A1         B2         .       20
 10    202204    2022-04-05      A1         B2         .       20
 11    202204    2022-04-01      A1         B3         .        .
 12    202204    2022-04-02      A1         B3        40       40
 13    202204    2022-04-03      A1         B3         .       40
 14    202204    2022-04-04      A1         B3        42       42
 15    202204    2022-04-05      A1         B3         .       42
 16    202204    2022-04-01      A2         B1         .        .
 17    202204    2022-04-02      A2         B1         .        .
 18    202204    2022-04-03      A2         B1        15       15
 19    202204    2022-04-04      A2         B1         .       15
 20    202204    2022-04-05      A2         B1         .       15
 21    202205    2022-05-01      A1         B1         .        .
 22    202205    2022-05-02      A1         B1         .        .
 23    202205    2022-05-03      A1         B1        12       12
 24    202205    2022-05-04      A1         B1         .       12
 25    202205    2022-05-05      A1         B1         .       12
 26    202205    2022-05-01      A1         B2         .        .
 27    202205    2022-05-02      A1         B2         .        .
 28    202205    2022-05-03      A1         B2        22       22
 29    202205    2022-05-04      A1         B2         .       22
 30    202205    2022-05-05      A1         B2         .       22
Tom
Super User Tom
Super User

Looks like a simple LOCF (last observation carried forward).

 

You can implement that with the UPDATE statement.  Use the source dataset as both the BASE and the TRANSACTION data. Add an OUTPUT statement so that an observation is written after every transaction is applied instead of the normal processing which would only write one observation per BY group that had all of the transactions for it applied.

data want;
  update have(obs=0) have;
   by group_a group_b;
  output;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1608 views
  • 0 likes
  • 3 in conversation