BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

/*Hi SAS Community,

I have two tables (Have_1 and Have_2).

*/

data Have_1;

informat Current_date date9.;

Input Current_date   Account_number $12-14 Arrears_Band $ 17-23  balance bank_number;

Format Current_date date9.;

cards;

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

28FEB2010  444  Current 40 900

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400

;

run;

data Have_2;

informat Post_date date9.;

Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;

Format Post_date date9.;

cards;

31AUG2010  111  50 100

31AUG2010  111  50 200

30SEP2011  555  77 700

30SEP2011  666  77 300

;

run;

Question:

I need to join these 2 tables so that I could get the answer below. It is mandatory that I need to keep all the records in have_1 table regardless of whether the have_2 table is having corresponding matching records or not. But reverse is not mandatory, i.e. have_2 table may have excess (non matching) records, and they should not show up in final table that I want.    

Answer:

/*    Final data set should be like this

Notice the 2 records from have_2 table are embedded within have_1 table in date sequence. And a value called “writoff” was assigned.

Current_date Account_number   Arrears_Band      balance     bank_number (these are headings of below table)

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

31AUG2010  111  writoff 50 100

28FEB2010  444  Current 40 900     

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

30SEP2011  555  writoff 77 700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400  

The code below (Ksharp is acknowledged) generate the wanted table to some extent but I have newly included bank_number variable to reflect business reality. Now the results are misleading us.

data want;

set have_1 have_2(in=inb  rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;

by Account_number;

if inb then Arrears_Band='writoff ';

run;

proc sort data=want;by Account_number Current_date;run;

This is the output generated by above code.

I have striken through 2 records, and they should not be in the output. Reason is their bank_number s are not matching although account_number s are matching.

Current_date  Account_number       Arrears_Band  balance         bank_number (these are headings of below table)

28FEB2010

111

NPNA

10

100

31MAR2010

111

Current

20

100

31MAY2010

111

30 - 60

30

100

31AUG2010

111

writoff

50

100

31AUG2010

111

writoff

50

200

28FEB2010

444

Current

40

900

31MAR2010

444

30 - 60

50

900

30APR2010

444

30 - 60

60

900

31MAY2010

444

Current

70

900

30APR2010

555

Current

80

700

31MAY2010

555

NPNA

90

700

31AUG2011

555

90 +

12

700

30SEP2011

555

NPNA

0

700

30SEP2011

555

writoff

77

700

31MAY2010

666

Current

15

400

31AUG2011

666

90 +

20

400

30SEP2011

666

Current

30

400

30SEP2011

666

writoff

77

300

I would appreciate any help to generate the correct table.

Thank you

Mirisage

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

one way:

data Have_1;

informat Current_date date9.;

Input Current_date   Account_number $12-14 Arrears_Band $ 17-23  balance bank_number;

Format Current_date date9.;

cards;

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

28FEB2010  444  Current 40 900

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400

;

run;

proc sort;

by bank_number Account_number;

run;

data Have_2;

informat Post_date date9.;

Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;

Format Post_date date9.;

cards;

31AUG2010  111  50 100

31AUG2010  111  50 200

30SEP2011  555  77 700

30SEP2011  666  77 300

;

proc sort;

by bank_number Account_number;

run;

proc sql;

  create table temp as

     select * from have_2

    where bank_number in (select bank_number from have_1);

quit;

data want;

set have_1 temp(in=inb  rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;

by bank_number Account_number;

if inb then Arrears_Band='writoff ';

proc sort data=want;by Account_number bank_number Current_date;run;

proc print;run;

Message was edited by: Linlin

View solution in original post

6 REPLIES 6
Linlin
Lapis Lazuli | Level 10

one way:

data Have_1;

informat Current_date date9.;

Input Current_date   Account_number $12-14 Arrears_Band $ 17-23  balance bank_number;

Format Current_date date9.;

cards;

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

28FEB2010  444  Current 40 900

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400

;

run;

proc sort;

by bank_number Account_number;

run;

data Have_2;

informat Post_date date9.;

Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;

Format Post_date date9.;

cards;

31AUG2010  111  50 100

31AUG2010  111  50 200

30SEP2011  555  77 700

30SEP2011  666  77 300

;

proc sort;

by bank_number Account_number;

run;

proc sql;

  create table temp as

     select * from have_2

    where bank_number in (select bank_number from have_1);

quit;

data want;

set have_1 temp(in=inb  rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;

by bank_number Account_number;

if inb then Arrears_Band='writoff ';

proc sort data=want;by Account_number bank_number Current_date;run;

proc print;run;

Message was edited by: Linlin

art297
Opal | Level 21

A slight expansion of your original datastep solution:

data want;

  set have_1 (in=ina)

      have_2(in=inb  rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;

  by Account_number bank_number;

  retain _Account_number _bank_number;

  if ina then do;

    _Account_Number=Account_Number;

    _Bank_Number=Bank_Number;

    output;

  end;

  else do;

    if Account_Number=_Account_Number and

     Bank_Number=_Bank_Number then do;

      Arrears_Band='writoff ';

      output;

    end;

  end;

run;

Haikuo
Onyx | Level 15

Hi,

First of all, I have to say I love the way you presented your questions. It was addressed so thoroughly that no more time-wasting back-and-forth clarifying is needed and we can cut to the chase right away.

On top of the answers you already got, if you don't mind some hash(), here is one (works for your current data, if your account-bank combination is not unique in have_2, then a minor tweak will fix that):

data want;

   if _n_=1 then do;

         dcl hash h(dataset: 'have_2(rename=(Net_Write_off_total_USD=balance Post_date=Current_date))', multidata: 'y');

         h.definekey('account_number','bank_number');

         h.definedata('balance','Current_date');

         h.definedone();

    end;

   do until (last.bank_number);

    set have_1;

    by account_number bank_number notsorted;

    output;

   end;

    if h.find()=0 then do;

      Arrears_Band='writeoff';

      output;

    end;

    run;

Haikuo

Mirisage
Obsidian | Level 7

Hi Linlin, Art and Haikuo,

Thank very much to everyone of you.

All the 3 codes are producing the inteneded results given the sample data sets.

Hi Linlin,

Then I applied Linlin's code to my large dataset and it produced an unintended output. The reason is I did not cleverly reflect all "record possibilities" in my data set I posted.

Now I included one such additional business reality I observed in my original data set (please see "red color highlighed" record in have_1 dataset below). One more thing that I have still not reflected in

this small "posting data set" is this. All the bank_number s found in Have_1 data set are found in Have_2 dataset too.

data Have_1;

informat Current_date date9.;

Input Current_date   Account_number $12-14 Arrears_Band $ 17-23  balance bank_number;

Format Current_date date9.;

cards;

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

28FEB2010  444  Current 40 900

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400

30SEP2011  999  Current 30 300 /*I added this record*/

;

run;

data Have_2;

informat Post_date date9.;

Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;

Format Post_date date9.;

cards;

31AUG2010  111  50 100

31AUG2010  111  50 200

30SEP2011  555  77 700

30SEP2011  666  77 300

;

run;

When Linlin's code is applied,it selects 17th record also highlighted in below output. This shouldn't have been

slected.Why? In have_1 dataset there is no account called 666 in bank_number 300 although there is a bank_number called 300.

I am sure Linlin will tweak her code in a few seconds and I will take hours/days to understand it.

1

28FEB2010

111

NPNA

10

100

2

31MAR2010

111

Current

20

100

3

31MAY2010

111

30 - 60

30

100

4

31AUG2010

111

writoff

50

100

5

28FEB2010

444

Current

40

900

6

31MAR2010

444

30 - 60

50

900

7

30APR2010

444

30 - 60

60

900

8

31MAY2010

444

Current

70

900

9

30APR2010

555

Current

80

700

10

31MAY2010

555

NPNA

90

700

11

31AUG2011

555

90 +

12

700

12

30SEP2011

555

NPNA

0

700

13

30SEP2011

555

writoff

77

700

14

31MAY2010

666

Current

15

400

15

31AUG2011

666

90 +

20

400

16

30SEP2011

666

Current

30

400

17

30SEP2011

666

writoff

77

300

18

30SEP2011

999

Current

30

300

Hi Art and Haikuo,

When I apply both of your codes even for the newly-included business reality, both your code produce intended results (meaning the output table doesn't have 17th record highlightedin  above table).

Thanks again to all of you.

Mirisage

Linlin
Lapis Lazuli | Level 10

Updated:

data Have_1;

informat Current_date date9.;

Input Current_date   Account_number $12-14 Arrears_Band $ 17-23  balance bank_number;

Format Current_date date9.;

cards;

28FEB2010  111  NPNA    10 100

31MAR2010  111  Current 20 100

31MAY2010  111  30 - 60 30 100

28FEB2010  444  Current 40 900

31MAR2010  444  30 - 60 50 900

30APR2010  444  30 - 60 60 900

31MAY2010  444  Current 70 900

30APR2010  555  Current 80 700

31MAY2010  555  NPNA    90 700

31AUG2011  555  90 +    12 700

30SEP2011  555  NPNA    0  700

31MAY2010  666  Current 15 400

31AUG2011  666  90 +    20 400

30SEP2011  666  Current 30 400

30SEP2011  999  Current 30 300

;

run;

data Have_2;

informat Post_date date9.;

Input Post_date Account_number $12-14 Net_Write_off_total_USD bank_number;

Format Post_date date9.;

cards;

31AUG2010  111  50 100

31AUG2010  111  50 200

30SEP2011  555  77 700

30SEP2011  666  77 300

;

 

proc sql;

  create table temp as

     select distinct a.* from have_2 as a,have_1 as b

       where a.bank_number=b.bank_number and a.Account_number=b.Account_number

    order by Account_number,bank_number;

quit;

  data want;

set have_1 temp(in=inb  rename=(Post_date=Current_date Net_Write_off_total_USD=balance)) ;

by  Account_number bank_number;

if inb then Arrears_Band='writoff ';

proc sort data=want;by Account_number bank_number Current_date;run;

proc print;run;

         Current_    Account_    Arrears_                bank_

           Obs         date     number       Band      balance    number

             1    28FEB2010      111       NPNA           10        100

             2    31MAR2010      111       Current        20        100

             3    31MAY2010      111       30 - 60        30        100

             4    31AUG2010      111       writoff        50        100

             5    28FEB2010      444       Current        40        900

             6    31MAR2010      444       30 - 60        50        900

             7    30APR2010      444       30 - 60        60        900

             8    31MAY2010      444       Current        70        900

             9    30APR2010      555       Current        80        700

            10    31MAY2010      555       NPNA           90        700

            11    31AUG2011      555       90 +           12        700

            12    30SEP2011      555       NPNA            0        700

            13    30SEP2011      555       writoff        77        700

            14    31MAY2010      666       Current        15        400

            15    31AUG2011      666       90 +           20        400

            16    30SEP2011      666       Current        30        400

            17    30SEP2011      999       Current        30        300

Mirisage
Obsidian | Level 7

Hi Linlin,

Thank you very much, your revised code is working for my large data set too.

Thanks again!

Best regards

Mirisage


sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 773 views
  • 4 likes
  • 4 in conversation