Sir @PaigeMiller FYI- I made a couple of edits to the above. Also at any point you want to test the code, please feel free to let me know if you need a details of the code in piecemeal. A nudge online/offline will alert me just in case.
Thanks, we'll see how it goes!
One more logic clarification in the expected output please -
why 21jan isnt matched with 30jan as it occurs before 28th?
@novinosrin wrote:
One more logic clarification in the expected output please -
why 21jan isnt matched with 30jan as it occurs before 28th?
I had the same question.
Just simple MERGE ?
data have ; infile datalines dsd truncover; length id $4; input id:$3. dateTran:DATE9. Trans:$5. SourceType:$2. TranValueADj:32.; id=cats('A',id); format dateTran DATE9.; datalines4; 001,03JAN2020,Type1,M1,1776 001,03JAN2020,Type1,A2,1824.12 001,03JAN2020,Type1,M1,1000 001,17JAN2020,Type1,A2,1830.88 001,27JAN2020,Type1,M1,375.27 001,27JAN2020,Type1,M1,470.04 001,27JAN2020,Type1,M1,4712.3 001,31JAN2020,Type1,A2,1869.88 002,02JAN2020,Type1,B3,1848 002,03JAN2020,Type1,A2,3351 002,03JAN2020,Type1,A2,871 002,03JAN2020,Type1,A2,5135 002,03JAN2020,Type2,,1000 002,03JAN2020,Type1,A2,1070 002,03JAN2020,Type1,A2,8357 002,06JAN2020,Type1,B3,49425 002,06JAN2020,Type1,B3,61744.1 002,06JAN2020,Type1,B3,11273 002,07JAN2020,Type2,,2198 002,07JAN2020,Type1,B3,36006.95 002,08JAN2020,Type1,B3,29329.5 002,10JAN2020,Type1,B3,695 002,10JAN2020,Type2,,703 002,10JAN2020,Type1,B3,29912.3 002,13JAN2020,Type2,,250 002,13JAN2020,Type1,B3,24951 002,14JAN2020,Type2,,250 002,16JAN2020,Type1,B3,9759.5 002,17JAN2020,Type1,A2,652 002,17JAN2020,Type1,B3,4538 002,21JAN2020,Type1,B3,845 002,21JAN2020,Type1,B3,83436.27 002,21JAN2020,Type1,B3,75 002,21JAN2020,Type1,B3,8814 002,22JAN2020,Type2,,250 002,24JAN2020,Type1,B3,2270 002,24JAN2020,Type1,B3,2258 002,28JAN2020,Type1,B3,9135 003,03JAN2020,Type1,B3,100000 003,03JAN2020,Type1,A2,2061 003,07JAN2020,Type2,,100000 003,13JAN2020,Type1,A2,775.1 003,14JAN2020,Type1,B3,100000 003,16JAN2020,Type2,,100000 003,23JAN2020,Type1,B3,25000 003,24JAN2020,Type1,A2,121.16 003,27JAN2020,Type1,A2,146.67 003,31JAN2020,Type1,A2,1246.7 004,06JAN2020,Type1,A1,300 004,09JAN2020,Type2,,300 004,15JAN2020,Type1,A2,1204.5 004,31JAN2020,Type1,B1,0 004,31JAN2020,Type1,A2,3079.09 005,13JAN2020,Type1,B3,1000 005,21JAN2020,Type1,B3,1000 005,23JAN2020,Type2,,1000 005,28JAN2020,Type1,B3,1000 005,30JAN2020,Type2,,1000 006,02JAN2020,Type1,M1,250 006,03JAN2020,Type1,A2,23.72 006,03JAN2020,Type2,,200 006,06JAN2020,Type1,M1,400 006,09JAN2020,Type2,,400 006,16JAN2020,Type1,M1,150 006,21JAN2020,Type1,M1,190 006,22JAN2020,Type1,M1,200 006,23JAN2020,Type1,M1,200 006,24JAN2020,Type1,M1,184.75 006,27JAN2020,Type1,M1,120 006,28JAN2020,Type2,,200 006,29JAN2020,Type1,M1,550 007,02JAN2020,Type1,B1,0 007,13JAN2020,Type1,MF,180 007,13JAN2020,Type1,M1,175 007,15JAN2020,Type1,M1,240 007,16JAN2020,Type1,M1,440 007,16JAN2020,Type2,,175 007,17JAN2020,Type1,M1,345 007,21JAN2020,Type2,,240 007,22JAN2020,Type2,,440 007,23JAN2020,Type2,,345 ;;;; data type1 type2; set have; if Trans='Type1' then output type1; else output type2; run; proc sort data=type1;by id TranValueADj;run; proc sort data=type2;by id TranValueADj;run; data type2; set type2; _TranValueADj=TranValueADj; run; data want; ina=0;inb=0; merge type1(in=ina) type2(rename=(dateTran=_dateTran Trans=_Trans SourceType=_SourceType) in=inb); by id TranValueADj; if not ina then do;dateTran=_dateTran; Trans=_Trans;SourceType=_SourceType;call missing(of _:); end; if not inb then call missing(of _:); drop _Trans _SourceType; run; proc sort data=want;by id dateTran;run;
Again, thanks to everyone who contributed. Here are some results
So it seems that the solution 3(@mkeintz) is giving me the correct answer. So each solution runs on the sample data and then I added 6 more accounts to sample data and it seemed to work with additional id’s except the merge one below. However when I run the full October dataset I have the following issues:
@PeterClemmensen solution 1: It’s not dropping enough records in the final solution but I can’t figure out where it’s not dropping them. I think it’s matching correctly and identifies 3162 matched of the 5107 but the final “want” dataset is only reduced by 1604? Not the 3162? Also on this solution, I am not following what he is doing with the U hash object.
Original data rows |
want dataset |
dataset reduced by |
59,650,509 |
59,648,905 |
1,604 |
@novinosrin Solution 2: I am not able to run on the full data set and receive an error “Error: Duplicate Key”
@mkeintz – matches 3177 of the 5107. However it removes and non-type 1 or 2 transactions from final dataset but that is ok cause at least I know what it’s removing vs. solution 1.
@Ksharp Solution 4: the merge data set I did not try on the full data set since the problem below exists
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.