/*Hi SAS Community,
I have two tables (Have_1 and Have_2).
*/
data Have_1;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $ 17-23 balance bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
;
run;
data Have_2;
informat Post_date date9.;
Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;
Format Post_date date9.;
cards;
31AUG2010 111 50 100
31AUG2010 111 50 200
30SEP2011 555 77 700
30SEP2011 666 77 300
;
run;
Question:
I need to join these 2 tables so that I could get the answer below. It is mandatory that I need to keep all the records in have_1 table regardless of whether the have_2 table is having corresponding matching records or not. But reverse is not mandatory, i.e. have_2 table may have excess (non matching) records, and they should not show up in final table that I want.
Answer:
/* Final data set should be like this
Notice the 2 records from have_2 table are embedded within have_1 table in date sequence. And a value called “writoff” was assigned.
Current_date Account_number Arrears_Band balance bank_number (these are headings of below table)
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
31AUG2010 111 writoff 50 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
30SEP2011 555 writoff 77 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
The code below (Ksharp is acknowledged) generate the wanted table to some extent but I have newly included bank_number variable to reflect business reality. Now the results are misleading us.
data want;
set have_1 have_2(in=inb rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;
by Account_number;
if inb then Arrears_Band='writoff ';
run;
proc sort data=want;by Account_number Current_date;run;
This is the output generated by above code.
I have striken through 2 records, and they should not be in the output. Reason is their bank_number s are not matching although account_number s are matching.
Current_date Account_number Arrears_Band balance bank_number (these are headings of below table)
28FEB2010 | 111 | NPNA | 10 | 100 |
31MAR2010 | 111 | Current | 20 | 100 |
31MAY2010 | 111 | 30 - 60 | 30 | 100 |
31AUG2010 | 111 | writoff | 50 | 100 |
31AUG2010 | 111 | writoff | 50 | 200 |
28FEB2010 | 444 | Current | 40 | 900 |
31MAR2010 | 444 | 30 - 60 | 50 | 900 |
30APR2010 | 444 | 30 - 60 | 60 | 900 |
31MAY2010 | 444 | Current | 70 | 900 |
30APR2010 | 555 | Current | 80 | 700 |
31MAY2010 | 555 | NPNA | 90 | 700 |
31AUG2011 | 555 | 90 + | 12 | 700 |
30SEP2011 | 555 | NPNA | 0 | 700 |
30SEP2011 | 555 | writoff | 77 | 700 |
31MAY2010 | 666 | Current | 15 | 400 |
31AUG2011 | 666 | 90 + | 20 | 400 |
30SEP2011 | 666 | Current | 30 | 400 |
30SEP2011 | 666 | writoff | 77 | 300 |
I would appreciate any help to generate the correct table.
Thank you
Mirisage
one way:
data Have_1;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $ 17-23 balance bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
;
run;
proc sort;
by bank_number Account_number;
run;
data Have_2;
informat Post_date date9.;
Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;
Format Post_date date9.;
cards;
31AUG2010 111 50 100
31AUG2010 111 50 200
30SEP2011 555 77 700
30SEP2011 666 77 300
;
proc sort;
by bank_number Account_number;
run;
proc sql;
create table temp as
select * from have_2
where bank_number in (select bank_number from have_1);
quit;
data want;
set have_1 temp(in=inb rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;
by bank_number Account_number;
if inb then Arrears_Band='writoff ';
proc sort data=want;by Account_number bank_number Current_date;run;
proc print;run;
Message was edited by: Linlin
one way:
data Have_1;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $ 17-23 balance bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
;
run;
proc sort;
by bank_number Account_number;
run;
data Have_2;
informat Post_date date9.;
Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;
Format Post_date date9.;
cards;
31AUG2010 111 50 100
31AUG2010 111 50 200
30SEP2011 555 77 700
30SEP2011 666 77 300
;
proc sort;
by bank_number Account_number;
run;
proc sql;
create table temp as
select * from have_2
where bank_number in (select bank_number from have_1);
quit;
data want;
set have_1 temp(in=inb rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;
by bank_number Account_number;
if inb then Arrears_Band='writoff ';
proc sort data=want;by Account_number bank_number Current_date;run;
proc print;run;
Message was edited by: Linlin
A slight expansion of your original datastep solution:
data want;
set have_1 (in=ina)
have_2(in=inb rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;
by Account_number bank_number;
retain _Account_number _bank_number;
if ina then do;
_Account_Number=Account_Number;
_Bank_Number=Bank_Number;
output;
end;
else do;
if Account_Number=_Account_Number and
Bank_Number=_Bank_Number then do;
Arrears_Band='writoff ';
output;
end;
end;
run;
Hi,
First of all, I have to say I love the way you presented your questions. It was addressed so thoroughly that no more time-wasting back-and-forth clarifying is needed and we can cut to the chase right away.
On top of the answers you already got, if you don't mind some hash(), here is one (works for your current data, if your account-bank combination is not unique in have_2, then a minor tweak will fix that):
data want;
if _n_=1 then do;
dcl hash h(dataset: 'have_2(rename=(Net_Write_off_total_USD=balance Post_date=Current_date))', multidata: 'y');
h.definekey('account_number','bank_number');
h.definedata('balance','Current_date');
h.definedone();
end;
do until (last.bank_number);
set have_1;
by account_number bank_number notsorted;
output;
end;
if h.find()=0 then do;
Arrears_Band='writeoff';
output;
end;
run;
Haikuo
Hi Linlin, Art and Haikuo,
Thank very much to everyone of you.
All the 3 codes are producing the inteneded results given the sample data sets.
Hi Linlin,
Then I applied Linlin's code to my large dataset and it produced an unintended output. The reason is I did not cleverly reflect all "record possibilities" in my data set I posted.
Now I included one such additional business reality I observed in my original data set (please see "red color highlighed" record in have_1 dataset below). One more thing that I have still not reflected in
this small "posting data set" is this. All the bank_number s found in Have_1 data set are found in Have_2 dataset too.
data Have_1;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $ 17-23 balance bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
30SEP2011 999 Current 30 300 /*I added this record*/
;
run;
data Have_2;
informat Post_date date9.;
Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;
Format Post_date date9.;
cards;
31AUG2010 111 50 100
31AUG2010 111 50 200
30SEP2011 555 77 700
30SEP2011 666 77 300
;
run;
When Linlin's code is applied,it selects 17th record also highlighted in below output. This shouldn't have been
slected.Why? In have_1 dataset there is no account called 666 in bank_number 300 although there is a bank_number called 300.
I am sure Linlin will tweak her code in a few seconds and I will take hours/days to understand it.
1 | 28FEB2010 | 111 | NPNA | 10 | 100 |
2 | 31MAR2010 | 111 | Current | 20 | 100 |
3 | 31MAY2010 | 111 | 30 - 60 | 30 | 100 |
4 | 31AUG2010 | 111 | writoff | 50 | 100 |
5 | 28FEB2010 | 444 | Current | 40 | 900 |
6 | 31MAR2010 | 444 | 30 - 60 | 50 | 900 |
7 | 30APR2010 | 444 | 30 - 60 | 60 | 900 |
8 | 31MAY2010 | 444 | Current | 70 | 900 |
9 | 30APR2010 | 555 | Current | 80 | 700 |
10 | 31MAY2010 | 555 | NPNA | 90 | 700 |
11 | 31AUG2011 | 555 | 90 + | 12 | 700 |
12 | 30SEP2011 | 555 | NPNA | 0 | 700 |
13 | 30SEP2011 | 555 | writoff | 77 | 700 |
14 | 31MAY2010 | 666 | Current | 15 | 400 |
15 | 31AUG2011 | 666 | 90 + | 20 | 400 |
16 | 30SEP2011 | 666 | Current | 30 | 400 |
17 | 30SEP2011 | 666 | writoff | 77 | 300 |
18 | 30SEP2011 | 999 | Current | 30 | 300 |
Hi Art and Haikuo,
When I apply both of your codes even for the newly-included business reality, both your code produce intended results (meaning the output table doesn't have 17th record highlightedin above table).
Thanks again to all of you.
Mirisage
Updated:
data Have_1;
informat Current_date date9.;
Input Current_date Account_number $12-14 Arrears_Band $ 17-23 balance bank_number;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA 10 100
31MAR2010 111 Current 20 100
31MAY2010 111 30 - 60 30 100
28FEB2010 444 Current 40 900
31MAR2010 444 30 - 60 50 900
30APR2010 444 30 - 60 60 900
31MAY2010 444 Current 70 900
30APR2010 555 Current 80 700
31MAY2010 555 NPNA 90 700
31AUG2011 555 90 + 12 700
30SEP2011 555 NPNA 0 700
31MAY2010 666 Current 15 400
31AUG2011 666 90 + 20 400
30SEP2011 666 Current 30 400
30SEP2011 999 Current 30 300
;
run;
data Have_2;
informat Post_date date9.;
Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;
Format Post_date date9.;
cards;
31AUG2010 111 50 100
31AUG2010 111 50 200
30SEP2011 555 77 700
30SEP2011 666 77 300
;
proc sql;
create table temp as
select distinct a.* from have_2 as a,have_1 as b
where a.bank_number=b.bank_number and a.Account_number=b.Account_number
order by Account_number,bank_number;
quit;
data want;
set have_1 temp(in=inb rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;
by Account_number bank_number;
if inb then Arrears_Band='writoff ';
proc sort data=want;by Account_number bank_number Current_date;run;
proc print;run;
Current_ Account_ Arrears_ bank_
Obs date number Band balance number
1 28FEB2010 111 NPNA 10 100
2 31MAR2010 111 Current 20 100
3 31MAY2010 111 30 - 60 30 100
4 31AUG2010 111 writoff 50 100
5 28FEB2010 444 Current 40 900
6 31MAR2010 444 30 - 60 50 900
7 30APR2010 444 30 - 60 60 900
8 31MAY2010 444 Current 70 900
9 30APR2010 555 Current 80 700
10 31MAY2010 555 NPNA 90 700
11 31AUG2011 555 90 + 12 700
12 30SEP2011 555 NPNA 0 700
13 30SEP2011 555 writoff 77 700
14 31MAY2010 666 Current 15 400
15 31AUG2011 666 90 + 20 400
16 30SEP2011 666 Current 30 400
17 30SEP2011 999 Current 30 300
Hi Linlin,
Thank you very much, your revised code is working for my large data set too.
Thanks again!
Best regards
Mirisage
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.