BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

Sir @PaigeMiller  FYI- I made a couple of edits to the above. Also at any point you want to test the code, please feel free to let me know if you need a details of the code in piecemeal. A nudge online/offline will alert me just in case. 

PaigeMiller
Diamond | Level 26

Thanks, we'll see how it goes!

--
Paige Miller
novinosrin
Tourmaline | Level 20

One more logic clarification in the expected output please -

results.PNG

why 21jan isnt matched with 30jan as it occurs before 28th? 

mkeintz
PROC Star

@novinosrin wrote:

One more logic clarification in the expected output please -

results.PNG

why 21jan isnt matched with 30jan as it occurs before 28th? 


I had the same question.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

Just simple MERGE ?

 

data have ;
  infile datalines dsd truncover;
  length id $4;
  input id:$3. dateTran:DATE9. Trans:$5. SourceType:$2. TranValueADj:32.;
  id=cats('A',id);
  format dateTran DATE9.;
datalines4;
001,03JAN2020,Type1,M1,1776
001,03JAN2020,Type1,A2,1824.12
001,03JAN2020,Type1,M1,1000
001,17JAN2020,Type1,A2,1830.88
001,27JAN2020,Type1,M1,375.27
001,27JAN2020,Type1,M1,470.04
001,27JAN2020,Type1,M1,4712.3
001,31JAN2020,Type1,A2,1869.88
002,02JAN2020,Type1,B3,1848
002,03JAN2020,Type1,A2,3351
002,03JAN2020,Type1,A2,871
002,03JAN2020,Type1,A2,5135
002,03JAN2020,Type2,,1000
002,03JAN2020,Type1,A2,1070
002,03JAN2020,Type1,A2,8357
002,06JAN2020,Type1,B3,49425
002,06JAN2020,Type1,B3,61744.1
002,06JAN2020,Type1,B3,11273
002,07JAN2020,Type2,,2198
002,07JAN2020,Type1,B3,36006.95
002,08JAN2020,Type1,B3,29329.5
002,10JAN2020,Type1,B3,695
002,10JAN2020,Type2,,703
002,10JAN2020,Type1,B3,29912.3
002,13JAN2020,Type2,,250
002,13JAN2020,Type1,B3,24951
002,14JAN2020,Type2,,250
002,16JAN2020,Type1,B3,9759.5
002,17JAN2020,Type1,A2,652
002,17JAN2020,Type1,B3,4538
002,21JAN2020,Type1,B3,845
002,21JAN2020,Type1,B3,83436.27
002,21JAN2020,Type1,B3,75
002,21JAN2020,Type1,B3,8814
002,22JAN2020,Type2,,250
002,24JAN2020,Type1,B3,2270
002,24JAN2020,Type1,B3,2258
002,28JAN2020,Type1,B3,9135
003,03JAN2020,Type1,B3,100000
003,03JAN2020,Type1,A2,2061
003,07JAN2020,Type2,,100000
003,13JAN2020,Type1,A2,775.1
003,14JAN2020,Type1,B3,100000
003,16JAN2020,Type2,,100000
003,23JAN2020,Type1,B3,25000
003,24JAN2020,Type1,A2,121.16
003,27JAN2020,Type1,A2,146.67
003,31JAN2020,Type1,A2,1246.7
004,06JAN2020,Type1,A1,300
004,09JAN2020,Type2,,300
004,15JAN2020,Type1,A2,1204.5
004,31JAN2020,Type1,B1,0
004,31JAN2020,Type1,A2,3079.09
005,13JAN2020,Type1,B3,1000
005,21JAN2020,Type1,B3,1000
005,23JAN2020,Type2,,1000
005,28JAN2020,Type1,B3,1000
005,30JAN2020,Type2,,1000
006,02JAN2020,Type1,M1,250
006,03JAN2020,Type1,A2,23.72
006,03JAN2020,Type2,,200
006,06JAN2020,Type1,M1,400
006,09JAN2020,Type2,,400
006,16JAN2020,Type1,M1,150
006,21JAN2020,Type1,M1,190
006,22JAN2020,Type1,M1,200
006,23JAN2020,Type1,M1,200
006,24JAN2020,Type1,M1,184.75
006,27JAN2020,Type1,M1,120
006,28JAN2020,Type2,,200
006,29JAN2020,Type1,M1,550
007,02JAN2020,Type1,B1,0
007,13JAN2020,Type1,MF,180
007,13JAN2020,Type1,M1,175
007,15JAN2020,Type1,M1,240
007,16JAN2020,Type1,M1,440
007,16JAN2020,Type2,,175
007,17JAN2020,Type1,M1,345
007,21JAN2020,Type2,,240
007,22JAN2020,Type2,,440
007,23JAN2020,Type2,,345
;;;;
data type1 type2;
 set have;
 if Trans='Type1' then output type1;
  else output type2;
run;
proc sort data=type1;by id TranValueADj;run;
proc sort data=type2;by id TranValueADj;run;
data type2;
 set type2;
 _TranValueADj=TranValueADj;
run;

data want;
 ina=0;inb=0;
 merge type1(in=ina) type2(rename=(dateTran=_dateTran Trans=_Trans SourceType=_SourceType) in=inb);
 by id TranValueADj;
 if not ina then do;dateTran=_dateTran; Trans=_Trans;SourceType=_SourceType;call missing(of _:); end;
 if not inb then  call missing(of _:);
 drop _Trans _SourceType;
run;
proc sort data=want;by id dateTran;run;
PaigeMiller
Diamond | Level 26

Thanks, @Ksharp and @mkeintz , we will get around to trying all of the solutions offered as time permits, it does take a long time when the true data set has nearly 60million records and we have other things to do as well.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Again, thanks to everyone who contributed. Here are some results

 

So it seems that the solution 3(@mkeintz) is giving me the correct answer.  So each solution runs on the sample data and then I added 6 more accounts to sample data and it seemed to work with additional id’s except the merge one below. However when I run the full  October dataset I have the following issues:

 

@PeterClemmensen solution 1: It’s not dropping enough records in the final solution but I can’t figure out where it’s not dropping them. I think it’s matching correctly and identifies 3162 matched of the 5107  but the final “want” dataset is only reduced by 1604? Not the 3162?   Also on this solution, I am not following what he is doing with the U hash object.

Original data rows

want dataset

dataset reduced by

                              59,650,509

                  59,648,905

                                    1,604

 

 

@novinosrin Solution 2: I am not able to run on the full data set and receive an error “Error: Duplicate Key”

 

@mkeintz – matches 3177 of the 5107.  However it removes and non-type 1 or 2 transactions from final dataset but that is ok cause at least I know what it’s removing vs. solution 1.

 

 

@Ksharp Solution 4: the merge data set I did not try on the full data set since the problem below exists

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1886 views
  • 4 likes
  • 6 in conversation