Solved: Re: Why cannot I merge these two tables-II?

Mirisage · Posted 09-16-2012 09:58 PM

Hi SAS Forum,

I need to stack the attached two SAS tables so that I will get a table like below. In below table, the first 5 records are coming from attahced Book2 table.

The "bottom-most bolded record" of below table is coming from the attahced Book1 table.

It is the corresponding counterpart account in Book1 data set where arrerars_band = writoff (I find difficult to explain this).

Bank_number	Current_date	Account_number	Short_name	Currency_code	Product_code	Currency_short_name	Country	Transit	Balance	Product	Arrears_Band
10	28-Feb-10	1000000000	OOOOOOOOO RRRRRRRR	0	1	VVV	DDDDDDDD	Retail	1700	Personal Loan	NPNA
10	31-Mar-10	1000000000	OOOOOOOOO RRRRRRRR	0	1	VVV	DDDDDDDD	Retail	1700	Personal Loan	NPNA
10	30-Apr-10	1000000000	OOOOOOOOO RRRRRRRR	0	1	VVV	DDDDDDDD	Retail	1700	Personal Loan	NPNA
10	31-May-10	1000000000	OOOOOOOOO RRRRRRRR	0	1	VVV	DDDDDDDD	Retail	1700	Personal Loan	NPNA
10	30-Jun-10	1000000000	OOOOOOOOO RRRRRRRR	0	1	VVV	DDDDDDDD	Retail	1700	Personal Loan	NPNA
10	6-Jul-10	1000000000	missing	missing	missing	missing	missing	missing	2216	Personal Loan	writoff

The account_name called ‘8000000000’in attahced Book2 data set does not have a corresponding counterpart account in Book1 data set where arrerars_band = writoff.

So, I do not want the account_number called 8000000000 to show up in my final stacked output file.

AUTiger and Haikuo helped me to do the below code to tackle a "kind of similar situation" but the current situation in my hand is bit different.

So, this code is not capable of doing it.

proc sort data=A.BOOK1 out=file_A;

by bank_number Account_number ;

run;

proc sort data=A.BOOK2 Out=file_B;

by bank_number account_number ;

run;

data merged_file;

set file_B

file_A

;

by bank_number account_number ;

run;

Would there be way to handle this new situation?

Thanks

Mirisage

AUTigers · Posted 09-17-2012 11:18 PM

it works for your sample data.

proc sort data=ds.BOOK1 out=file_A;

by bank_number Account_number ;

run;

proc sort data=ds.BOOK2 Out=file_B;

by bank_number account_number ;

run;

data merged_file;

retain lastacct;

set file_A(in=ina)

file_B (in=inb)

;

by bank_number account_number ;

sortord=Ina*2+inb*1;

if ina and first.account_number then lastacct=account_number;

if ina or ( inb and lastacct=account_number) then output;

run;

proc sort data=merged_file;

by bank_number account_number sortord ;

run;

View solution in original post

ballardw · Posted 09-17-2012 11:09 AM

One way if what you want is records that come from both data sets is :

data merged_file;

set file_B (in= inb)

file_A (in= ina)

;

by bank_number account_number ;

if inb and ina;

run;

Tom · Posted 09-17-2012 12:45 PM

Sounds like you only want to append records from BOOK1 when the account is also in BOOK2. But that you want to keep all records from BOOK2 whether there is a record in BOOK1 or not.

In this case the accounts you want will always have their first record coming from BOOK2. Note that this code is dependent on the order the datasets are listed in the SET statement.

data want ;

retain keep;

drop keep;

set book2(in=in2) book1(in=in1);

by bank_number account_number;

if first.account_number then do;

if in2 then keep=1;

else keep=0;

end;

if keep;

run;

robertrao · Posted 09-17-2012 01:37 PM

Tom,

Could you explain the code.

if its the first of the account number(nothing but from book2) we are giving it a value of 1 for keep variable.

if its not coming from book2(if its not there in book2 ie its from book1) then we give a value of 0 for keep variable;

if keep gives us only the ones which have a value of 1(from book2).....if the first of the bank number is from book1(0) its eliminated!!!!

Lastly ,What is the use of retain keep and drop keep statements at the top????

why are we dropping it at the top? and again creating at the bottom????

Please explain??

Tom · Posted 09-17-2012 02:12 PM

First RETAIN and DROP are examples of non executable statements. That is they are used to compile the data step but do not do anything on each iteration. So it doesn't matter (other than for easy of reading) where in the data step they are coded.

When it starts a new account (IF FIRST.ACCOUNT_NUMBER) it assigns a value to the variable KEEP. The RETAIN statement insures that the value stays the same until it is changed. The first record must be from either BOOK2 (in2=1) or BOOK1(in1=1) but cannot be from both as we are using a SET statement instead of MERGE statement. So every record for the same account will have the same value of the KEEP variable.

IF KEEP; is the same as IF NOT KEEP THEN DELETE;. Thus only the accounts we have flagged to keep will be output to the new dataset.

AUTigers · Posted 09-17-2012 11:18 PM

it works for your sample data.

proc sort data=ds.BOOK1 out=file_A;

by bank_number Account_number ;

run;

proc sort data=ds.BOOK2 Out=file_B;

by bank_number account_number ;

run;

data merged_file;

retain lastacct;

set file_A(in=ina)

file_B (in=inb)

;

by bank_number account_number ;

sortord=Ina*2+inb*1;

if ina and first.account_number then lastacct=account_number;

if ina or ( inb and lastacct=account_number) then output;

run;

proc sort data=merged_file;

by bank_number account_number sortord ;

run;

SAS Innovate 2025: Call for Content

Classroom Training Available!