Hi,
I am trying to merge two datasets together (have1 and have2) to create the table look_like which is the same as Have1 but has the variable type added to it.
Have1 table is well structed to have month end snapshots.
Have2 table will only populate when there is a change noting the date of change and the change type (only 3 examples of change type here but could be many more).
I need to merge type in the Have2 table to Have1 and the date_change within Have2 is important to when the type should appear in the Have1 table.
I need the merge to understand that the type should be merged to the ID that has the same date or the closest change date before it.
For instance, ID of 2 in have2 table has "new" for type in 01Jun22. This means in Have1 table, ID of 2 should show new for 01Jun22 and 01Jul22. However, for 01Aug22, there has been an update for ID of 2 and type should now say "change".
I am trying to avoid post merge manipulation as the actual data has millions of records and feel this could be hard to fix but welcome suggestions if my gut feeling is wrong on this.
data have1;
infile datalines dlm=' ' truncover;
input ID $ month_snapshot :date9. balance;
format start_month ddmmyy10.;
datalines;
1 01jun22 1000
1 01jul22 900
1 01aug22 800
2 01jun22 1000
2 01jul22 900
2 01aug22 800
3 01jun22 1000
3 01jul22 900
3 01aug22 800
;
run;
data have2;
infile datalines dlm='' truncover;
input ID $ date_change :date9. type $;
format date_change ddmmyy10.;
datalines;
1 01jun22 new
1 25jul22 change
2 01jun22 new
2 01aug22 change
3 01jun22 new
3 22jun22 change
3 28jul22 fixed
;
run;
data look_like;
infile datalines dlm='' truncover;
input ID $ start_month :date9. balance type $;
format start_month ddmmyy10.;
datalines;
1 01jun22 1000 new
1 01jul22 900 new
1 01aug22 800 change
2 01jun22 1000 new
2 01jul22 900 change
2 01aug22 800 change
3 01jun22 1000 new
3 01jul22 900 change
3 01aug22 800 fixed
;
run;
data look_like_attempt;
merge
have1 (in=a)
have2 (in=b drop=date_change)
;
by ID;
run;
Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:
data want ;
merge have1 (keep= rename=(month_snapshot=start_month) in=in1)
have2 (keep=id date_change rename=(date_change=start_month) in=in2) ;
by id start_month;
if in2 then set have2 (drop=id date_change);
if in1 then output;
if last.id then call missing(of _all_);
run;
This assumes both data sets are sorted by ID and relevent date.
All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.
You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Why do you carry forward type="change" for ID=1 from July 25 to Aug 1, but for ID=2 you apparently carry backward type=new from Aug 1 back to July 1?
Editted additional note:
In other words, from your description I expected the resulting sequence for ID=2 to be
but you have NEW, CHANGE, CHANGE
What are the rules you are trying to apply?
Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:
data want ;
merge have1 (keep= rename=(month_snapshot=start_month) in=in1)
have2 (keep=id date_change rename=(date_change=start_month) in=in2) ;
by id start_month;
if in2 then set have2 (drop=id date_change);
if in1 then output;
if last.id then call missing(of _all_);
run;
This assumes both data sets are sorted by ID and relevent date.
All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.
You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Your gut feeling was correct and I did have a couple of typo/errors in there - apologies for that.
Thank you very much for the solution. A smart way to do it and I'll certainly have a read of your paper.
Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:
data want ;
merge have1 (keep= rename=(month_snapshot=start_month) in=in1)
have2 (keep=id date_change rename=(date_change=start_month) in=in2) ;
by id start_month;
if in2 then set have2 (drop=id date_change);
if in1 then output;
if last.id then call missing(of _all_);
run;
This assumes both data sets are sorted by ID and relevant date.
All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.
You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.