HI All,
I have a data set where I've matched on customer number (CRN), date (idate) and amount (fullpay) from two tables which has created a many to many data set where there are same payments on the same day.
For example if there were 3 payments in my full payment table and 3 payments in my deduction table I get nine lines (see below). Is there a way I can get a unique MATCH_ID for each unique PAID_ID?
This is what I have:
data have;
length crn paid_id match_id fullpay deduct issued idate 8;
informat idate anydtdte21.;
format idate ddmmyy10.;
infile datalines delimiter=',';
input crn paid_id match_id fullpay deduct issued idate;
datalines;
123456,4471,53360,66.03,66.03,0,5/09/2019
123456,4471,53361,66.03,41.07,24.96,5/09/2019
123456,4471,53362,66.03,0,66.03,5/09/2019
123456,4472,53360,66.03,66.03,0,5/09/2019
123456,4472,53361,66.03,41.07,24.96,5/09/2019
123456,4472,53362,66.03,0,66.03,5/09/2019
123456,4473,53360,66.03,66.03,0,5/09/2019
123456,4473,53361,66.03,41.07,24.96,5/09/2019
123456,4473,53362,66.03,0,66.03,5/09/2019
654321,11827,184385,57.4,16.8,40.6,15/03/2019
654321,11828,184385,57.4,16.8,40.6,15/03/2019
654321,11826,184385,57.4,16.8,40.6,15/03/2019
654321,11827,184384,57.4,57.4,0,15/03/2019
654321,11828,184384,57.4,57.4,0,15/03/2019
654321,11826,184384,57.4,57.4,0,15/03/2019
654321,11827,184383,57.4,57.4,0,15/03/2019
654321,11828,184383,57.4,57.4,0,15/03/2019
654321,11826,184383,57.4,57.4,0,15/03/2019
;
and this is what I need:
data want;
length crn paid_id match_id fullpay deduct issued idate 8;
informat idate anydtdte21.;
format idate ddmmyy10.;
infile datalines delimiter=',';
input crn paid_id match_id fullpay deduct issued idate;
datalines;
123456,4471,53360,66.03,66.03,0,5/09/2019
123456,4472,53361,66.03,41.07,24.96,5/09/2019
123456,4473,53362,66.03,0,66.03,5/09/2019
654321,11826,184383,57.4,57.4,0,15/03/2019
654321,11827,184384,57.4,57.4,0,15/03/2019
654321,11828,184385,57.4,16.8,40.6,15/03/2019
;
Any help gratefully received and many thanks in advance
cheers
Steve
please try below code
proc sort data=have;
by idate crn match_id ;
run;
data want;
set have;
by idate crn match_id ;
if first.match_id;
run;
I guess this should be fixed while you do the initial join. Could you show us the corresponding entries in the two tables from which you created "have"?
So do you want to resolve this problem in your initial join or from have to want?
I really appreciate that you've posted the data as working SAS data steps.
You need to fix the join so that it's no more many:many
For your sample data show us what you've got in the two source tables.
please try below code
proc sort data=have;
by idate crn match_id ;
run;
data want;
set have;
by idate crn match_id ;
if first.match_id;
run;
Hi Jag,
This appears to have solved it at first glance (large amount of data). Thanks so much, I think I was over complicating things a bit my end but running your code seems to do the trick!
cheers
Steve
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.