Hello,
I have a created a data set which includes a a date stamp app_date for every application processed.
I would like to join data from a different table which also includes a date error_date, but I unly want to join the data if the error date is before the app_date but after any prior app_date's.
Ive been racking my brain for a couple of hours and I cannot figure out the logic. Here is an example of what Im looking for:
Table1:
App_num App_Date
1 01jan16
1 08jan16
1 13jan16
Table2:
App_num Error_date Error_Code
1 09jan16 xxx
Output:
App_num App_Date Error_date Error_Code
1 01jan16
1 08jan16
1 13jan16 09jan16 xxx
Any ideas to help?
Thanks,
Nukie
Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.
This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.
data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;
data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
error_date
error_code;
cards;
1 09jan16 xxx
;
run;
data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;
proc sql;
create table want as
select table1.app_num,
table1.app_date,
table2.error_date,
table2.error_code
from table1_scd as table1
left join table2
on table1.app_num = table2.app_num
and table2.error_date between table1.app_date_previous and table1.app_date
order by table1.app_num,
table1.app_date;
quit;
This will work, but you may need to tweak it a bit, depending on how what you mean between to signify - is it inclusive at both ends or not? Also if you are going to have multiple entries per app_num in your error table, it becomes more complicated to solve.
data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;
data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
error_date
error_code;
cards;
1 09jan16 xxx
;
run;
data want;
merge table1(in=in_mast)
table2(in=in_error);
by app_num;
if in_mast;
attrib save_app_date format=date9.;
attrib save_error_code length=$ 3;
retain save_app_date save_error_date save_error_code;
if first.app_num
then do;
save_error_date = error_date;
save_error_code = error_code;
call missing(error_date, error_code); /* This is why the error values are being saved */
end;
else if save_app_date < save_error_date < app_date then do;
error_date = save_error_date;
error_code = save_error_code;
end;
save_app_date = app_date;
drop save:;
run;
Sorry - ignore that solution. It's rubbish. As I was walking @schnauzerrosie I rewrote it in my head. By turning table1 into a kind-of slowing-changing dimension, you can use SQL to create the output you want, with far fewer data contingencies. The only problem here would be if there were more than one error record within a date range.
This also, if your tables are really large, allows you to put integrity constraints on the tables, to make sure you're coping with duplicates. Note that I've created a low date for app_date_previous for the first instance per app_num - integrity constraints hate null values.
data table1;
infile cards;
attrib app_num length=8;
attrib app_date length=4 informat=date7. format=date9.;
input app_num
app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;
data table2;
infile cards;
attrib app_num length=8;
attrib error_date length=4 informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
error_date
error_code;
cards;
1 09jan16 xxx
;
run;
data table1_scd;
set table1;
retain app_date_previous;
attrib app_date_previous length=4 format=date9.;
by app_num;
if first.app_num then
app_date_previous = '1jan1900'd;
output;
app_date_previous = app_date;
run;
proc sql;
create table want as
select table1.app_num,
table1.app_date,
table2.error_date,
table2.error_code
from table1_scd as table1
left join table2
on table1.app_num = table2.app_num
and table2.error_date between table1.app_date_previous and table1.app_date
order by table1.app_num,
table1.app_date;
quit;
@LaurieF Perfect! This was the code was looking for.
data table1;
infile cards;
attrib app_num length=8;
attrib app_date informat=date7. format=date9.;
input app_num
app_date;
cards;
1 01jan16
1 08jan16
1 13jan16
;
run;
data table2;
infile cards;
attrib app_num length=8;
attrib error_date informat=date7. format=date9.;
attrib error_code length=$ 3;
input app_num
error_date
error_code;
cards;
1 09jan16 xxx
;
run;
data want;
set table1(in=ina) table2(in=inb rename=(error_date=app_date error_code=code));
by app_num app_date;
lag_date=lag(app_date);
lag_code=lag(code);
if lag(inb)=1 then do;error_date=lag_date;error_code=lag_code;end;
if ina;
format error_date date9.;
drop lag_: code;
run;
The PDV (program data vector) is your friend. In particular, you can take advantage of the fact that variables from SET statements are automatically retained until the next SET that retrieves the same vars is encountered. You want to "retain" vars from table 2 for the subsequent table1 observation:
data want (drop=date);
set table1 (in=in1 keep=app_date rename=(app_date=date))
table2 (in=in2 keep=error_date rename=(error_date=date));
by date;
if in1=1 then set table1;
if in2=1 then set table2;
if in1;
run;
The first SET statement, keeping only the BY variable, is just a means of controlling the sequence of data retrieved by the 2nd and 3rd SET statements.
Editted addition: Just realized I didn't demonstrate how to incorporate this for by groups (e.g. APP_NUM):
data want (drop=date);
set table1 (in=in1 keep=app_num app_date rename=(app_date=date))
table2 (in=in2 keep=app_num error_date rename=(error_date=date));
by app_num date;
if in1=1 then set table1;
if in2=1 then set table2;
if in1 then output;
if last.app_num then call missing(of _all_);
run;
Notes:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.