Hi team ,
I have the below tables..
First table
Member | Send_date |
5588644 | 1/04/2016 |
5588644 | 5/07/2016 |
5585645 | 1/04/2016 |
5585645 | 12/07/2016 |
5585646 | 1/04/2016 |
5585647 | 1/04/2016 |
5585647 | 10/10/2016 |
5585648 | 1/04/2016 |
5585649 | 2/04/2016 |
5585650 | 8/04/2016 |
5585651 | 1/04/2016 |
5585651 | 21/04/2016 |
Second Table
Member | Payment_date | payment_amount |
5588644 | 11/04/2016 | $ 50.00 |
5588644 | 21/04/2016 | $ 60.00 |
5585645 | 11/04/2016 | $ 150.00 |
5585645 | 22/07/2016 | $ 200.00 |
5585646 | 11/04/2016 | $ 260.00 |
5585647 | 11/04/2016 | $ 270.00 |
5585647 | 23/04/2016 | $ 270.00 |
5585647 | 20/10/2016 | $ 290.00 |
5585648 | 11/04/2016 | $ 620.00 |
5585649 | 12/04/2016 | $ 750.00 |
5585650 | 18/04/2016 | $ 820.00 |
5585651 | 11/04/2016 | $ 900.00 |
The output table should be in the below format .. any ideas would be great ..
Member Send_date Payment _date Payment _Amount
5588644 1/04/2016 11/04/2016 $50.00
5588644 1/04/2016 21/04/2016 $60.00
5588644 5/07/2016
5585645 1/04/2016 11/04/2016 $150.00
5585645 12/07/2016 22/07/2016 $200.00
5585646 1/04/2016 11/04/2016 $260.00
5585647 1/04/2016 11/04/2016 $270.00
5585647 1/04/2016 23/04/2016 $270.00
5585647 10/10/2016 20/10/2016 $290.00
5585648 1/04/2016 11/04/2016 $620.00
5585649 2/04/2016 12/04/2016 $750.00
5585650 8/04/2016 18/04/2016 $820.00
5585651 1/04/2016 11/04/2016 $900.00
5585651 21/04/2016
This is a sort of LOCF (last observation carried forward) for table1 only. The LOCF has these particular attributes:
By the way, is there an error in the first member number (588644)? Except for that member, the data appear sorted by member number.
The logic of the program is this.
It shouldn't be too hard to extend this program to more-than-two datasets.
Regards,
Mark
data table1;
input member send_date : ddmmyy10.;
format send_date ddmmyy10.;
datalines;
5588644 1/04/2016
5588644 5/07/2016
5585645 1/04/2016
5585645 12/07/2016
5585646 1/04/2016
5585647 1/04/2016
5585647 10/10/2016
5585648 1/04/2016
5585649 2/04/2016
5585650 8/04/2016
5585651 1/04/2016
5585651 21/04/2016
run;
proc sort data=table1;
by member send_date;
run;
data table2;
input Member Payment_date :ddmmyy10. payment_amount ;
format payment_date ddmmyy10.;
datalines;
5588644 11/04/2016 50.00
5588644 21/04/2016 60.00
5585645 11/04/2016 150.00
5585645 22/07/2016 200.00
5585646 11/04/2016 260.00
5585647 11/04/2016 270.00
5585647 23/04/2016 270.00
5585647 20/10/2016 290.00
5585648 11/04/2016 620.00
5585649 12/04/2016 750.00
5585650 18/04/2016 820.00
5585651 11/04/2016 900.00
run;
proc sort data=table2;
by member payment_date;
run;
data want (drop=date s n_s seq1_: seq2_:);
array seq1_ {100};
array seq2_ {100};
do s=1 by 1 until (last.member);
merge table1 (in=in1 keep=member send_date
rename=(send_date=date))
table2 (in=in2 keep=member payment_date
rename=(payment_date=date));
by member date ;
if first.date then seq1_{s}=in1;
seq2_{s}=in2;
end;
n_s=s;
do s=1 to n_s;
if seq1_{s} then set table1;
if seq2_{s} then set table2;
if seq2_{s} or s=n_s then do;
output;
call missing(payment_date,payment_amount);
end;
end;
run;
No, the array size has to do with the number of records WITHIN a member, not the number of members.
500K shouldn't be much of a sort.
Mark
editted addition:
this program assume no more tha 1 table 1 record per member/date.
Here a SQL version.
proc sql;
create table want as
select
t1.member,
t1.send_date,
t2.payment_date,
t2.payment_amount
from
table1 t1
left join
table2 t2
on
t1.member=t2.member
and t1.send_date<=t2.Payment_date or missing(t2.Payment_date)
group by t1.member, t2.Payment_date
having min(t2.Payment_date-t1.send_date)=t2.Payment_date-t1.send_date
order by t1.member, t1.send_date, t2.payment_date
;
quit;
I'm not astonished that performance isn't great for the SQL.
500K rows is not that massive. What's more important for you: maximum performance or a compromise resulting in code which is easier to understand and maintain?
There would be a HoH approach which should perform quite well - but it will take a bit of "brain massage" to understand how this works. Is HoH something you would consider?
http://support.sas.com/resources/papers/proceedings13/021-2013.pdf
What is your logic? 5588644 1/04/2016 11/04/2016 $50.00 5588644 1/04/2016 21/04/2016 $60.00 5588644 5/07/2016 5585645 1/04/2016 11/04/2016 $150.00 5585645 12/07/2016 22/07/2016 $200.00 Why the first have three obs,while the second have the two obs ?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.