left join

Reply
Contributor
Posts: 44

left join

Hi team , 

I have the below tables..

First table

 

Member Send_date
5588644 1/04/2016
5588644 5/07/2016
5585645 1/04/2016
5585645 12/07/2016
5585646 1/04/2016
5585647 1/04/2016
5585647 10/10/2016
5585648 1/04/2016
5585649 2/04/2016
5585650 8/04/2016
5585651 1/04/2016
5585651 21/04/2016

 

Second Table 

Member Payment_date payment_amount
5588644 11/04/2016  $                  50.00
5588644 21/04/2016  $                  60.00
5585645 11/04/2016  $                150.00
5585645 22/07/2016  $                200.00
5585646 11/04/2016  $                260.00
5585647 11/04/2016  $                270.00
5585647 23/04/2016  $                270.00
5585647 20/10/2016  $                290.00
5585648 11/04/2016  $                620.00
5585649 12/04/2016  $                750.00
5585650 18/04/2016  $                820.00
5585651 11/04/2016  $                900.00

 

The output table should be in the below format ..   any ideas would be great .. 

 

Member Send_date Payment _date Payment _Amount
5588644 1/04/2016 11/04/2016 $50.00
5588644 1/04/2016 21/04/2016 $60.00
5588644 5/07/2016
5585645 1/04/2016 11/04/2016 $150.00
5585645 12/07/2016 22/07/2016 $200.00
5585646 1/04/2016 11/04/2016 $260.00
5585647 1/04/2016 11/04/2016 $270.00
5585647 1/04/2016 23/04/2016 $270.00
5585647 10/10/2016 20/10/2016 $290.00
5585648 1/04/2016 11/04/2016 $620.00
5585649 2/04/2016 12/04/2016 $750.00
5585650 8/04/2016 18/04/2016 $820.00
5585651 1/04/2016 11/04/2016 $900.00
5585651 21/04/2016

Super User
Posts: 789

Re: left join

[ Edited ]

 

This is a sort of LOCF (last observation carried forward) for table1 only.  The LOCF has these particular attributes:

  1. True for matching members only
  2. Output all records with table2 data.  Do not output records with table1 data only, unless it is the last record for a member.

By the way, is there an error in the first member number (588644)?  Except for that member, the data appear sorted by member number.

 

 

The logic of the program is this.

  1. The SEQ1_ and SEQ2_ arrays should have a dimension large enough to capture all the records for even the most common member.
  2. The first "do seq=1 by 1 until (last.member)": runs through all the dates for a given member, tracking the sequential position of each record in each table.  If a particular sequence has input from table one, set the corresponding SEQ1_ dummy.  Same for table2 and SEQ2_ dummies.
  3. If there are two table2 entries on the same date as a table 1 entry, then only set the table1 SEQ1_ dummy for the first of the two ("if first.date then seq1_{seq}=in1;").
  4. The second do group ("DO S=1 to SEQ;"):  rereads the same records, but sequence input from table1 and table2 as guided by the SEQ1_ and SEQ2_ dummy arrays.
  5. Since only the table1 data is wanted for LOCF, there is a call missing of the table2 variables after every output statement.

It shouldn't be too hard to extend this program to more-than-two datasets.

 

Regards,

Mark

 

data table1;
  input member send_date : ddmmyy10.;
  format send_date ddmmyy10.;
datalines;
5588644 1/04/2016 
5588644 5/07/2016 
5585645 1/04/2016 
5585645 12/07/2016 
5585646 1/04/2016 
5585647 1/04/2016 
5585647 10/10/2016 
5585648 1/04/2016 
5585649 2/04/2016 
5585650 8/04/2016 
5585651 1/04/2016 
5585651 21/04/2016 
run;
proc sort data=table1;
  by member send_date;
run;

data table2;
input Member Payment_date :ddmmyy10. payment_amount ;
format payment_date ddmmyy10.;
datalines;
5588644 11/04/2016 50.00 
5588644 21/04/2016 60.00 
5585645 11/04/2016 150.00 
5585645 22/07/2016 200.00 
5585646 11/04/2016 260.00 
5585647 11/04/2016 270.00 
5585647 23/04/2016 270.00 
5585647 20/10/2016 290.00 
5585648 11/04/2016 620.00 
5585649 12/04/2016 750.00 
5585650 18/04/2016 820.00 
5585651 11/04/2016 900.00 
run;
proc sort data=table2; 
  by member payment_date;
run;


data want (drop=date s n_s seq1_: seq2_:);
  array seq1_ {100};
  array seq2_ {100};

  do s=1 by 1 until (last.member);
    merge table1 (in=in1 keep=member send_date
                 rename=(send_date=date))
          table2 (in=in2 keep=member payment_date
                 rename=(payment_date=date));
    by member date ;
    if first.date then seq1_{s}=in1;  
    seq2_{s}=in2;
  end;

n_s=s; do s=1 to n_s; if seq1_{s} then set table1; if seq2_{s} then set table2; if seq2_{s} or s=n_s then do;
output; call missing(payment_date,payment_amount);
end; end; run;

 

 

 

Contributor
Posts: 44

Re: left join

Thanks Mark , the member are sometime not in sequence ..
Can this be used for large datasets like 500K..
do i have to increase the array..?
Super User
Posts: 789

Re: left join

[ Edited ]

No, the array size has to do with the number of records WITHIN a member, not the number of members.

 

500K shouldn't be much of a sort.

 

Mark

 

editted addition: 

this program assume no more tha 1 table 1 record per member/date.

Respected Advisor
Posts: 3,831

Re: left join

 Here a SQL version.

proc sql;
  create table want as
  select 
    t1.member,
    t1.send_date,
    t2.payment_date,
    t2.payment_amount
  from 
    table1 t1
    left join
    table2 t2
    on 
      t1.member=t2.member 
      and t1.send_date<=t2.Payment_date or missing(t2.Payment_date)
    group by t1.member, t2.Payment_date
    having min(t2.Payment_date-t1.send_date)=t2.Payment_date-t1.send_date
    order by t1.member, t1.send_date, t2.payment_date 
  ;
quit;
Contributor
Posts: 44

Re: left join

Thanks Patrick , I have tried to use the above , however the having statement is slowing the performance .. my datasets are massive ... around 500k
Respected Advisor
Posts: 3,831

Re: left join

I'm not astonished that performance isn't great for the SQL.

 

500K rows is not that massive. What's more important for you: maximum performance or a compromise resulting in code which is easier to understand and maintain?

 

There would be a HoH approach which should perform quite well - but it will take a bit of "brain massage" to understand how this works. Is HoH something you would consider?

http://support.sas.com/resources/papers/proceedings13/021-2013.pdf

 

Grand Advisor
Posts: 9,576

Re: left join

What is your logic?

5588644 1/04/2016 11/04/2016 $50.00 
5588644 1/04/2016 21/04/2016 $60.00 
5588644 5/07/2016 
5585645 1/04/2016 11/04/2016 $150.00 
5585645 12/07/2016 22/07/2016 $200.00 

Why the first have three obs,while the second have the two obs ?


Super User
Posts: 789

Re: left join

Have you tried to sort the 500k? Is that really a burden?
Ask a Question
Discussion stats
  • 8 replies
  • 261 views
  • 0 likes
  • 4 in conversation