BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WestChan
Fluorite | Level 6

Hello, 

 

I have a created a data set which includes a a date stamp app_date for every application processed. 

 

I would like to join data from a different table which also includes a date error_date, but I unly want to join the data if the error date is before the app_date but after any prior app_date's.

 

Ive been racking my brain for a couple of hours and I cannot figure out the logic. Here is an example of what Im looking for: 

 

Table1: 

App_num      App_Date

1                     01jan16

                    08jan16

1                     13jan16

 

Table2:

App_num     Error_date     Error_Code

1                     09jan16           xxx

 

Output:

App_num      App_Date     Error_date     Error_Code

1                     01jan16

                    08jan16

1                     13jan16         09jan16           xxx

 

Any ideas to help? 

 

Thanks, 

 

Nukie

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
LaurieF
Barite | Level 11

Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of  slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.

 

This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.

data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
   app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;

proc sql;
create table want as
   select table1.app_num,
          table1.app_date,
          table2.error_date,
          table2.error_code
     from table1_scd as table1
     left join table2
       on table1.app_num = table2.app_num
      and table2.error_date between table1.app_date_previous and table1.app_date
    order by table1.app_num,
             table1.app_date;
quit;

View solution in original post

5 REPLIES 5
LaurieF
Barite | Level 11

This will work, but you may need to tweak it a bit, depending on how what you mean between to signify - is it inclusive at both ends or not? Also if you are going to have multiple entries per app_num in your error table, it becomes more complicated to solve.

 

data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data want;
merge table1(in=in_mast)
      table2(in=in_error);
by app_num;
if in_mast;
attrib save_app_date format=date9.;
attrib save_error_code length=$ 3;
retain save_app_date save_error_date save_error_code;
if first.app_num 
   then do;
        save_error_date = error_date;
        save_error_code = error_code;
        call missing(error_date, error_code); /* This is why the error values are being saved */
        end;
   else if save_app_date < save_error_date < app_date then do;
           error_date = save_error_date;
           error_code = save_error_code;
           end;
save_app_date = app_date;
drop save:;
run;
LaurieF
Barite | Level 11

Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of  slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.

 

This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.

data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;

data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
   app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;

proc sql;
create table want as
   select table1.app_num,
          table1.app_date,
          table2.error_date,
          table2.error_code
     from table1_scd as table1
     left join table2
       on table1.app_num = table2.app_num
      and table2.error_date between table1.app_date_previous and table1.app_date
    order by table1.app_num,
             table1.app_date;
quit;
WestChan
Fluorite | Level 6

@LaurieF Perfect! This was the code was looking for.

Ksharp
Super User
data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
      app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;

data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
      error_date
      error_code;
cards;
1 09jan16 xxx
;
run;
data want;
 set table1(in=ina) table2(in=inb rename=(error_date=app_date error_code=code));
 by app_num app_date;
 lag_date=lag(app_date);
 lag_code=lag(code);
 if lag(inb)=1 then do;error_date=lag_date;error_code=lag_code;end;
 if ina;
 format error_date date9.;
 drop lag_: code;
run;
mkeintz
PROC Star

The PDV (program data vector) is your friend.  In particular, you can take  advantage of the fact that variables from SET statements are automatically retained until the next SET that retrieves the same vars is encountered.  You want to "retain" vars from table 2 for the  subsequent table1 observation:

 

data want (drop=date);
  set table1 (in=in1 keep=app_date  rename=(app_date=date))
      table2 (in=in2 keep=error_date rename=(error_date=date));
  by date;

  if in1=1 then set table1;
  if in2=1 then set table2;
  if in1;
run;

 

The first SET statement, keeping only the BY variable, is just a means of controlling the sequence of data retrieved by the 2nd and 3rd SET statements. 

 

Editted addition: Just realized I didn't demonstrate how to incorporate this for by groups (e.g. APP_NUM): 

 

data want (drop=date);
  set table1 (in=in1 keep=app_num app_date   rename=(app_date=date))
      table2 (in=in2 keep=app_num error_date rename=(error_date=date));
  by app_num date;

  if in1=1 then set table1;
  if in2=1 then set table2;
  if in1 then output;
  if last.app_num then call missing(of _all_);
run;

 

Notes:

  1. This allows the table2 data to persist for ALL subseqent table 1 observations, until either a new table2 record or end-of-by-group.  If you don't want table2 to persist beyond a single following table1 record, then change 
        if last.app_num then call missing(of _all_);
    to
        if in1 then call missing(of _all_);
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1072 views
  • 1 like
  • 4 in conversation