Sir @PaigeMiller FYI- I made a couple of edits to the above. Also at any point you want to test the code, please feel free to let me know if you need a details of the code in piecemeal. A nudge online/offline will alert me just in case.
Thanks, we'll see how it goes!
One more logic clarification in the expected output please -
why 21jan isnt matched with 30jan as it occurs before 28th?
@novinosrin wrote:
One more logic clarification in the expected output please -
why 21jan isnt matched with 30jan as it occurs before 28th?
I had the same question.
Just simple MERGE ?
data have ;
infile datalines dsd truncover;
length id $4;
input id:$3. dateTran:DATE9. Trans:$5. SourceType:$2. TranValueADj:32.;
id=cats('A',id);
format dateTran DATE9.;
datalines4;
001,03JAN2020,Type1,M1,1776
001,03JAN2020,Type1,A2,1824.12
001,03JAN2020,Type1,M1,1000
001,17JAN2020,Type1,A2,1830.88
001,27JAN2020,Type1,M1,375.27
001,27JAN2020,Type1,M1,470.04
001,27JAN2020,Type1,M1,4712.3
001,31JAN2020,Type1,A2,1869.88
002,02JAN2020,Type1,B3,1848
002,03JAN2020,Type1,A2,3351
002,03JAN2020,Type1,A2,871
002,03JAN2020,Type1,A2,5135
002,03JAN2020,Type2,,1000
002,03JAN2020,Type1,A2,1070
002,03JAN2020,Type1,A2,8357
002,06JAN2020,Type1,B3,49425
002,06JAN2020,Type1,B3,61744.1
002,06JAN2020,Type1,B3,11273
002,07JAN2020,Type2,,2198
002,07JAN2020,Type1,B3,36006.95
002,08JAN2020,Type1,B3,29329.5
002,10JAN2020,Type1,B3,695
002,10JAN2020,Type2,,703
002,10JAN2020,Type1,B3,29912.3
002,13JAN2020,Type2,,250
002,13JAN2020,Type1,B3,24951
002,14JAN2020,Type2,,250
002,16JAN2020,Type1,B3,9759.5
002,17JAN2020,Type1,A2,652
002,17JAN2020,Type1,B3,4538
002,21JAN2020,Type1,B3,845
002,21JAN2020,Type1,B3,83436.27
002,21JAN2020,Type1,B3,75
002,21JAN2020,Type1,B3,8814
002,22JAN2020,Type2,,250
002,24JAN2020,Type1,B3,2270
002,24JAN2020,Type1,B3,2258
002,28JAN2020,Type1,B3,9135
003,03JAN2020,Type1,B3,100000
003,03JAN2020,Type1,A2,2061
003,07JAN2020,Type2,,100000
003,13JAN2020,Type1,A2,775.1
003,14JAN2020,Type1,B3,100000
003,16JAN2020,Type2,,100000
003,23JAN2020,Type1,B3,25000
003,24JAN2020,Type1,A2,121.16
003,27JAN2020,Type1,A2,146.67
003,31JAN2020,Type1,A2,1246.7
004,06JAN2020,Type1,A1,300
004,09JAN2020,Type2,,300
004,15JAN2020,Type1,A2,1204.5
004,31JAN2020,Type1,B1,0
004,31JAN2020,Type1,A2,3079.09
005,13JAN2020,Type1,B3,1000
005,21JAN2020,Type1,B3,1000
005,23JAN2020,Type2,,1000
005,28JAN2020,Type1,B3,1000
005,30JAN2020,Type2,,1000
006,02JAN2020,Type1,M1,250
006,03JAN2020,Type1,A2,23.72
006,03JAN2020,Type2,,200
006,06JAN2020,Type1,M1,400
006,09JAN2020,Type2,,400
006,16JAN2020,Type1,M1,150
006,21JAN2020,Type1,M1,190
006,22JAN2020,Type1,M1,200
006,23JAN2020,Type1,M1,200
006,24JAN2020,Type1,M1,184.75
006,27JAN2020,Type1,M1,120
006,28JAN2020,Type2,,200
006,29JAN2020,Type1,M1,550
007,02JAN2020,Type1,B1,0
007,13JAN2020,Type1,MF,180
007,13JAN2020,Type1,M1,175
007,15JAN2020,Type1,M1,240
007,16JAN2020,Type1,M1,440
007,16JAN2020,Type2,,175
007,17JAN2020,Type1,M1,345
007,21JAN2020,Type2,,240
007,22JAN2020,Type2,,440
007,23JAN2020,Type2,,345
;;;;
data type1 type2;
set have;
if Trans='Type1' then output type1;
else output type2;
run;
proc sort data=type1;by id TranValueADj;run;
proc sort data=type2;by id TranValueADj;run;
data type2;
set type2;
_TranValueADj=TranValueADj;
run;
data want;
ina=0;inb=0;
merge type1(in=ina) type2(rename=(dateTran=_dateTran Trans=_Trans SourceType=_SourceType) in=inb);
by id TranValueADj;
if not ina then do;dateTran=_dateTran; Trans=_Trans;SourceType=_SourceType;call missing(of _:); end;
if not inb then call missing(of _:);
drop _Trans _SourceType;
run;
proc sort data=want;by id dateTran;run;
Again, thanks to everyone who contributed. Here are some results
So it seems that the solution 3(@mkeintz) is giving me the correct answer. So each solution runs on the sample data and then I added 6 more accounts to sample data and it seemed to work with additional id’s except the merge one below. However when I run the full October dataset I have the following issues:
@PeterClemmensen solution 1: It’s not dropping enough records in the final solution but I can’t figure out where it’s not dropping them. I think it’s matching correctly and identifies 3162 matched of the 5107 but the final “want” dataset is only reduced by 1604? Not the 3162? Also on this solution, I am not following what he is doing with the U hash object.
|
Original data rows |
want dataset |
dataset reduced by |
|
59,650,509 |
59,648,905 |
1,604 |
@novinosrin Solution 2: I am not able to run on the full data set and receive an error “Error: Duplicate Key”
@mkeintz – matches 3177 of the 5107. However it removes and non-type 1 or 2 transactions from final dataset but that is ok cause at least I know what it’s removing vs. solution 1.
@Ksharp Solution 4: the merge data set I did not try on the full data set since the problem below exists
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.