Solved: Re: Merging data with dates

Leehammy · Posted 11-02-2022 01:21 PM

Hi,

I am trying to merge two datasets together (have1 and have2) to create the table look_like which is the same as Have1 but has the variable type added to it.

Have1 table is well structed to have month end snapshots.

Have2 table will only populate when there is a change noting the date of change and the change type (only 3 examples of change type here but could be many more).

I need to merge type in the Have2 table to Have1 and the date_change within Have2 is important to when the type should appear in the Have1 table.

I need the merge to understand that the type should be merged to the ID that has the same date or the closest change date before it.

For instance, ID of 2 in have2 table has "new" for type in 01Jun22. This means in Have1 table, ID of 2 should show new for 01Jun22 and 01Jul22. However, for 01Aug22, there has been an update for ID of 2 and type should now say "change".

I am trying to avoid post merge manipulation as the actual data has millions of records and feel this could be hard to fix but welcome suggestions if my gut feeling is wrong on this.

data have1;
infile datalines dlm=' ' truncover;
input ID $ month_snapshot :date9. balance;
format start_month ddmmyy10.;
datalines;
1 01jun22 1000
1 01jul22 900
1 01aug22 800
2 01jun22 1000
2 01jul22 900
2 01aug22 800
3 01jun22 1000
3 01jul22 900
3 01aug22 800
;
run;
data have2;
infile datalines dlm='' truncover;
input ID $ date_change :date9. type $;
format date_change ddmmyy10.;
datalines;
1 01jun22 new
1 25jul22 change
2 01jun22 new
2 01aug22 change
3 01jun22 new
3 22jun22 change
3 28jul22 fixed
;
run;
data look_like;
infile datalines dlm='' truncover;
input ID $ start_month :date9. balance type $;
format start_month ddmmyy10.;
datalines;
1 01jun22 1000 new
1 01jul22 900 new
1 01aug22 800 change
2 01jun22 1000 new
2 01jul22 900 change
2 01aug22 800 change
3 01jun22 1000 new
3 01jul22 900 change
3 01aug22 800 fixed
;
run;
data look_like_attempt;
merge
have1 (in=a)
have2 (in=b drop=date_change)
;
by ID;
run;

mkeintz · Posted 11-02-2022 02:11 PM

Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:

data want ;
  merge have1 (keep=                rename=(month_snapshot=start_month) in=in1)
        have2 (keep=id date_change  rename=(date_change=start_month)    in=in2) ;
  by id start_month;
  if in2 then set have2 (drop=id date_change);
  if in1 then output;
  if last.id then call missing(of _all_);
run;

This assumes both data sets are sorted by ID and relevent date.

All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.

You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 11-02-2022 01:51 PM

Why do you carry forward type="change" for ID=1 from July 25 to Aug 1, but for ID=2 you apparently carry backward type=new from Aug 1 back to July 1?

Editted additional note:

In other words, from your description I expected the resulting sequence for ID=2 to be

NEW (for Jun 1, from HAVE2)
NEW (for Jul 1 .. carried forward from Jun 1)
CHANGE (for Aug 1, from HAVE2)

but you have NEW, CHANGE, CHANGE

What are the rules you are trying to apply?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 11-02-2022 02:11 PM

Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:

data want ;
  merge have1 (keep=                rename=(month_snapshot=start_month) in=in1)
        have2 (keep=id date_change  rename=(date_change=start_month)    in=in2) ;
  by id start_month;
  if in2 then set have2 (drop=id date_change);
  if in1 then output;
  if last.id then call missing(of _all_);
run;

This assumes both data sets are sorted by ID and relevent date.

All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.

You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Leehammy · Posted 11-03-2022 05:01 AM

Your gut feeling was correct and I did have a couple of typo/errors in there - apologies for that.

Thank you very much for the solution. A smart way to do it and I'll certainly have a read of your paper.

mkeintz · Posted 11-02-2022 02:12 PM

Taking a chance that there is an error in your desired LOOKLIKE dataset, here is a program that carries forward the value of TYPE from unmatched HAVE2 observations to subsequent HAVE1 observations:

data want ;
  merge have1 (keep=                rename=(month_snapshot=start_month) in=in1)
        have2 (keep=id date_change  rename=(date_change=start_month)    in=in2) ;
  by id start_month;
  if in2 then set have2 (drop=id date_change);
  if in1 then output;
  if last.id then call missing(of _all_);
run;

This assumes both data sets are sorted by ID and relevant date.

All the variables in the conditional SET statement are "retained" until the same SET statement is conditionally executed. This is what carries forward the value of TYPE (and all other HAVE2 variables except those that are dropped). It is also the reason for the "if last.id then call missing ..." statement - you don't want to carry forward any values from one ID to the next.

You might also want to look at my paper History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Merging data with dates

Re: Merging data with dates

Re: Merging data with dates

Re: Merging data with dates

Re: Merging data with dates

Re: Merging data with dates

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!