BookmarkSubscribeRSS Feed
brulard
Pyrite | Level 9

If someone could provide guidance on this issue. After doing an SQL inner join with two tables, there are duplicate records created.

Is there a join option, or string that can be added to a query to essentially dictate "move in a sequential order to match records from table 1 ONCE to matching record of table B, then move on to next row" (in other words, do not match a given value from table A more than once with value from table B). Apologies if this sounds crude.

 

The query I ran:

 

proc sql; create table DID as
select distinct

a.*,
b.TXN_DT AS TRANSACTION_DATE,b.MSTR_CRD_TXN_ID,b.POSTNG_DT,b.SRC_SYS_POSTNG_TM_CNT
FROM Z4 A inner join (select distinct * from TRANS) B ON
A.ACCT_ID=B.ACCT_ID AND B.TXN_AMT=A.AMOUNT and (A.EVENT_DT=B.POSTNG_DT or A.EVENT_DT=B.POSTNG_DT-1 or A.EVENT_DT=B.POSTNG_DT-2 or A.EVENT_DT=B.POSTNG_DT-3) ;QUIT;

 

/*the reason for the join criterion to various POSTNG_DT (dates) is that there sometime is a time lapse between when records from table a get loaded in table b*/

 

 

Sample of current output:

 

 

WANT?UNIQUE_ID_BUNIQUE_ID_A
YES536184298975059187
NO536184298975059188
NO536184307975059187
YES536184307975059188

 

Notice that when selecting which of the two 975059187 from UNIQUE_ID_A, the row desired has a lower value in UNIQUE_ID_B while the opposite occurs selecting which to pick for 975059188 (the value in B is higher).

 

And here is my attempt using a BASES sas merge, (but a possible issue here is I am missing some records in the output):

 

proc sort data=trans; by ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT TXN_AMT MSTR_CRD_TXN_ID;run;

 

proc sort data=z4; by ACCT_ID EVENT_DT amount event_id;run;

 

data merged;
merge trans (in=a KEEP=MSTR_CRD_TXN_ID SRC_SYS_ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT txn_amt)
z4 (IN=b) ;
by SRC_SYS_ACCT_ID ;
IF (POSTNG_DT-3=EVENT_DT OR POSTNG_DT-2=EVENT_DT or POSTNG_DT-1=EVENT_DT or POSTNG_DT=EVENT_DT) AND TXN_AMT=AMOUNT;RUN;

 

Thanks in advance

2 REPLIES 2
ballardw
Super User

It isn't clear if your "sample of output" is your actual result or your desired result.

If it is your actual result then I do not see any "duplicate". If you have some additional rule about relationships between Unique_Id_B and Unique_Id_A I think you need to be a bit more detailed.

 

It might help to provide short examples of the two sets and desired result.

brulard
Pyrite | Level 9

thanks for your suggestions. 

 

- "sample of output" is actual result (I added a first column to indicate the rows I ideally want to retain)

- Each unique id in Unique_ID_A, I only want to see once (e.g., 975059187 appears twice)

            -Additionally, there is also a unique identifier from table B (i.e., I would like to retain 536184298 linked with 975059187)

 

-Some details on this data: table B captures anytime an agent processes a credit on an account. Could be more than 1 credit for same account same day, and could get deleted prior to getting processed.

table A captures credits that did get processed. The objective is to identify which credits did get processed for a given agent.

 

-Additional rules: the linking variable between both tables is the acct_id. Then the criteria is, match by date, and amount.

 

   - Also, each table contains a unique identifier whose sequence is linear (a greater number means the credit got processed later)

 

Perhaps a data step merge is best approach to do the join? 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 484 views
  • 0 likes
  • 2 in conversation