DATA Step, Macro, Functions and more

Merging data with dates between two records

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Merging data with dates between two records

Hello, 

 

I have a created a data set which includes a a date stamp app_date for every application processed. 

 

I would like to join data from a different table which also includes a date error_date, but I unly want to join the data if the error date is before the app_date but after any prior app_date's.

 

Ive been racking my brain for a couple of hours and I cannot figure out the logic. Here is an example of what Im looking for: 

 

Table1: 

App_num      App_Date

1                     01jan16

                    08jan16

1                     13jan16

 

Table2:

App_num     Error_date     Error_Code

1                     09jan16           xxx

 

Output:

App_num      App_Date     Error_date     Error_Code

1                     01jan16

                    08jan16

1                     13jan16         09jan16           xxx

 

Any ideas to help? 

 

Thanks, 

 

Nukie

 

 


Accepted Solutions
Solution
‎02-22-2017 11:20 PM
Super Contributor
Posts: 252

Re: Merging data with dates between two records

Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of  slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.

 

This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.

data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
   app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;

proc sql;
create table want as
   select table1.app_num,
          table1.app_date,
          table2.error_date,
          table2.error_code
     from table1_scd as table1
     left join table2
       on table1.app_num = table2.app_num
      and table2.error_date between table1.app_date_previous and table1.app_date
    order by table1.app_num,
             table1.app_date;
quit;

View solution in original post


All Replies
Super Contributor
Posts: 252

Re: Merging data with dates between two records

This will work, but you may need to tweak it a bit, depending on how what you mean between to signify - is it inclusive at both ends or not? Also if you are going to have multiple entries per app_num in your error table, it becomes more complicated to solve.

 

data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data want;
merge table1(in=in_mast)
      table2(in=in_error);
by app_num;
if in_mast;
attrib save_app_date format=date9.;
attrib save_error_code length=$ 3;
retain save_app_date save_error_date save_error_code;
if first.app_num 
   then do;
        save_error_date = error_date;
        save_error_code = error_code;
        call missing(error_date, error_code); /* This is why the error values are being saved */
        end;
   else if save_app_date < save_error_date < app_date then do;
           error_date = save_error_date;
           error_code = save_error_code;
           end;
save_app_date = app_date;
drop save:;
run;
Solution
‎02-22-2017 11:20 PM
Super Contributor
Posts: 252

Re: Merging data with dates between two records

Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of  slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.

 

This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.

data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
   app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;

proc sql;
create table want as
   select table1.app_num,
          table1.app_date,
          table2.error_date,
          table2.error_code
     from table1_scd as table1
     left join table2
       on table1.app_num = table2.app_num
      and table2.error_date between table1.app_date_previous and table1.app_date
    order by table1.app_num,
             table1.app_date;
quit;
Occasional Contributor
Posts: 7

Re: Merging data with dates between two records

@LaurieF Perfect! This was the code was looking for.

Super User
Posts: 10,023

Re: Merging data with dates between two records

data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;
data want;
 set table1(in=ina) table2(in=inb rename=(error_date=app_date error_code=code));
 by app_num app_date;
 lag_date=lag(app_date);
 lag_code=lag(code);
 if lag(inb)=1 then do;error_date=lag_date;error_code=lag_code;end;
 if ina;
 format error_date date9.;
 drop lag_: code;
run;
Trusted Advisor
Posts: 1,019

Re: Merging data with dates between two records

[ Edited ]

The PDV (program data vector) is your friend.  In particular, you can take  advantage of the fact that variables from SET statements are automatically retained until the next SET that retrieves the same vars is encountered.  You want to "retain" vars from table 2 for the  subsequent table1 observation:

 

data want (drop=date);
  set table1 (in=in1 keep=app_date  rename=(app_date=date))
      table2 (in=in2 keep=error_date rename=(error_date=date));
  by date;

  if in1=1 then set table1;
  if in2=1 then set table2;
  if in1;
run;

 

The first SET statement, keeping only the BY variable, is just a means of controlling the sequence of data retrieved by the 2nd and 3rd SET statements. 

 

Editted addition: Just realized I didn't demonstrate how to incorporate this for by groups (e.g. APP_NUM): 

 

data want (drop=date);
  set table1 (in=in1 keep=app_num app_date   rename=(app_date=date))
      table2 (in=in2 keep=app_num error_date rename=(error_date=date));
  by app_num date;

  if in1=1 then set table1;
  if in2=1 then set table2;
  if in1 then output;
  if last.app_num then call missing(of _all_);
run;

 

Notes:

  1. This allows the table2 data to persist for ALL subseqent table 1 observations, until either a new table2 record or end-of-by-group.  If you don't want table2 to persist beyond a single following table1 record, then change 
        if last.app_num then call missing(of _all_);
    to
        if in1 then call missing(of _all_);
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 188 views
  • 1 like
  • 4 in conversation