Solved: Complicated matching problem - Page 2

novinosrin · Posted 11-16-2020 12:23 PM

Sir @PaigeMiller FYI- I made a couple of edits to the above. Also at any point you want to test the code, please feel free to let me know if you need a details of the code in piecemeal. A nudge online/offline will alert me just in case.

PaigeMiller · Posted 11-16-2020 12:26 PM

Thanks, we'll see how it goes!

--
Paige Miller

novinosrin · Posted 11-16-2020 08:29 PM

One more logic clarification in the expected output please -

why 21jan isnt matched with 30jan as it occurs before 28th?

mkeintz · Posted 11-16-2020 11:58 PM

@novinosrin wrote:

One more logic clarification in the expected output please -

why 21jan isnt matched with 30jan as it occurs before 28th?

I had the same question.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 11-17-2020 09:28 AM

Just simple MERGE ?

data have ;
  infile datalines dsd truncover;
  length id $4;
  input id:$3. dateTran:DATE9. Trans:$5. SourceType:$2. TranValueADj:32.;
  id=cats('A',id);
  format dateTran DATE9.;
datalines4;
001,03JAN2020,Type1,M1,1776
001,03JAN2020,Type1,A2,1824.12
001,03JAN2020,Type1,M1,1000
001,17JAN2020,Type1,A2,1830.88
001,27JAN2020,Type1,M1,375.27
001,27JAN2020,Type1,M1,470.04
001,27JAN2020,Type1,M1,4712.3
001,31JAN2020,Type1,A2,1869.88
002,02JAN2020,Type1,B3,1848
002,03JAN2020,Type1,A2,3351
002,03JAN2020,Type1,A2,871
002,03JAN2020,Type1,A2,5135
002,03JAN2020,Type2,,1000
002,03JAN2020,Type1,A2,1070
002,03JAN2020,Type1,A2,8357
002,06JAN2020,Type1,B3,49425
002,06JAN2020,Type1,B3,61744.1
002,06JAN2020,Type1,B3,11273
002,07JAN2020,Type2,,2198
002,07JAN2020,Type1,B3,36006.95
002,08JAN2020,Type1,B3,29329.5
002,10JAN2020,Type1,B3,695
002,10JAN2020,Type2,,703
002,10JAN2020,Type1,B3,29912.3
002,13JAN2020,Type2,,250
002,13JAN2020,Type1,B3,24951
002,14JAN2020,Type2,,250
002,16JAN2020,Type1,B3,9759.5
002,17JAN2020,Type1,A2,652
002,17JAN2020,Type1,B3,4538
002,21JAN2020,Type1,B3,845
002,21JAN2020,Type1,B3,83436.27
002,21JAN2020,Type1,B3,75
002,21JAN2020,Type1,B3,8814
002,22JAN2020,Type2,,250
002,24JAN2020,Type1,B3,2270
002,24JAN2020,Type1,B3,2258
002,28JAN2020,Type1,B3,9135
003,03JAN2020,Type1,B3,100000
003,03JAN2020,Type1,A2,2061
003,07JAN2020,Type2,,100000
003,13JAN2020,Type1,A2,775.1
003,14JAN2020,Type1,B3,100000
003,16JAN2020,Type2,,100000
003,23JAN2020,Type1,B3,25000
003,24JAN2020,Type1,A2,121.16
003,27JAN2020,Type1,A2,146.67
003,31JAN2020,Type1,A2,1246.7
004,06JAN2020,Type1,A1,300
004,09JAN2020,Type2,,300
004,15JAN2020,Type1,A2,1204.5
004,31JAN2020,Type1,B1,0
004,31JAN2020,Type1,A2,3079.09
005,13JAN2020,Type1,B3,1000
005,21JAN2020,Type1,B3,1000
005,23JAN2020,Type2,,1000
005,28JAN2020,Type1,B3,1000
005,30JAN2020,Type2,,1000
006,02JAN2020,Type1,M1,250
006,03JAN2020,Type1,A2,23.72
006,03JAN2020,Type2,,200
006,06JAN2020,Type1,M1,400
006,09JAN2020,Type2,,400
006,16JAN2020,Type1,M1,150
006,21JAN2020,Type1,M1,190
006,22JAN2020,Type1,M1,200
006,23JAN2020,Type1,M1,200
006,24JAN2020,Type1,M1,184.75
006,27JAN2020,Type1,M1,120
006,28JAN2020,Type2,,200
006,29JAN2020,Type1,M1,550
007,02JAN2020,Type1,B1,0
007,13JAN2020,Type1,MF,180
007,13JAN2020,Type1,M1,175
007,15JAN2020,Type1,M1,240
007,16JAN2020,Type1,M1,440
007,16JAN2020,Type2,,175
007,17JAN2020,Type1,M1,345
007,21JAN2020,Type2,,240
007,22JAN2020,Type2,,440
007,23JAN2020,Type2,,345
;;;;
data type1 type2;
 set have;
 if Trans='Type1' then output type1;
  else output type2;
run;
proc sort data=type1;by id TranValueADj;run;
proc sort data=type2;by id TranValueADj;run;
data type2;
 set type2;
 _TranValueADj=TranValueADj;
run;

data want;
 ina=0;inb=0;
 merge type1(in=ina) type2(rename=(dateTran=_dateTran Trans=_Trans SourceType=_SourceType) in=inb);
 by id TranValueADj;
 if not ina then do;dateTran=_dateTran; Trans=_Trans;SourceType=_SourceType;call missing(of _:); end;
 if not inb then  call missing(of _:);
 drop _Trans _SourceType;
run;
proc sort data=want;by id dateTran;run;

PaigeMiller · Posted 11-17-2020 09:35 AM

Thanks, @Ksharp and @mkeintz , we will get around to trying all of the solutions offered as time permits, it does take a long time when the true data set has nearly 60million records and we have other things to do as well.

--
Paige Miller

PaigeMiller · Posted 11-18-2020 01:59 PM

Again, thanks to everyone who contributed. Here are some results

So it seems that the solution 3(@mkeintz) is giving me the correct answer. So each solution runs on the sample data and then I added 6 more accounts to sample data and it seemed to work with additional id’s except the merge one below. However when I run the full October dataset I have the following issues:

@PeterClemmensen solution 1: It’s not dropping enough records in the final solution but I can’t figure out where it’s not dropping them. I think it’s matching correctly and identifies 3162 matched of the 5107 but the final “want” dataset is only reduced by 1604? Not the 3162? Also on this solution, I am not following what he is doing with the U hash object.

Original data rows	want dataset	dataset reduced by
59,650,509	59,648,905	1,604

@novinosrin Solution 2: I am not able to run on the full data set and receive an error “Error: Duplicate Key”

@mkeintz – matches 3177 of the 5107. However it removes and non-type 1 or 2 transactions from final dataset but that is ok cause at least I know what it’s removing vs. solution 1.

@Ksharp Solution 4: the merge data set I did not try on the full data set since the problem below exists

--
Paige Miller

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Re: Complicated matching problem

Registration is open

SAS Training: Just a Click Away