Refining output of a join to exclude dup value created

Reply
Frequent Contributor
Posts: 103

Refining output of a join to exclude dup value created

[ Edited ]

If someone could provide guidance on this issue. After doing an SQL inner join with two tables, there are duplicate records created.

Is there a join option, or string that can be added to a query to essentially dictate "move in a sequential order to match records from table 1 ONCE to matching record of table B, then move on to next row" (in other words, do not match a given value from table A more than once with value from table B). Apologies if this sounds crude.

 

The query I ran:

 

proc sql; create table DID as
select distinct

a.*,
b.TXN_DT AS TRANSACTION_DATE,b.MSTR_CRD_TXN_ID,b.POSTNG_DT,b.SRC_SYS_POSTNG_TM_CNT
FROM Z4 A inner join (select distinct * from TRANS) B ON
A.ACCT_ID=B.ACCT_ID AND B.TXN_AMT=A.AMOUNT and (A.EVENT_DT=B.POSTNG_DT or A.EVENT_DT=B.POSTNG_DT-1 or A.EVENT_DT=B.POSTNG_DT-2 or A.EVENT_DT=B.POSTNG_DT-3) ;QUIT;

 

/*the reason for the join criterion to various POSTNG_DT (dates) is that there sometime is a time lapse between when records from table a get loaded in table b*/

 

 

Sample of current output:

 

 

WANT?UNIQUE_ID_BUNIQUE_ID_A
YES536184298975059187
NO536184298975059188
NO536184307975059187
YES536184307975059188

 

Notice that when selecting which of the two 975059187 from UNIQUE_ID_A, the row desired has a lower value in UNIQUE_ID_B while the opposite occurs selecting which to pick for 975059188 (the value in B is higher).

 

And here is my attempt using a BASES sas merge, (but a possible issue here is I am missing some records in the output):

 

proc sort data=trans; by ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT TXN_AMT MSTR_CRD_TXN_ID;run;

 

proc sort data=z4; by ACCT_ID EVENT_DT amount event_id;run;

 

data merged;
merge trans (in=a KEEP=MSTR_CRD_TXN_ID SRC_SYS_ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT txn_amt)
z4 (IN=b) ;
by SRC_SYS_ACCT_ID ;
IF (POSTNG_DT-3=EVENT_DT OR POSTNG_DT-2=EVENT_DT or POSTNG_DT-1=EVENT_DT or POSTNG_DT=EVENT_DT) AND TXN_AMT=AMOUNT;RUN;

 

Thanks in advance

Super User
Posts: 12,148

Re: Refining output of a join to exclude dups created

It isn't clear if your "sample of output" is your actual result or your desired result.

If it is your actual result then I do not see any "duplicate". If you have some additional rule about relationships between Unique_Id_B and Unique_Id_A I think you need to be a bit more detailed.

 

It might help to provide short examples of the two sets and desired result.

Frequent Contributor
Posts: 103

Re: Refining output of a join to exclude dup value created

thanks for your suggestions. 

 

- "sample of output" is actual result (I added a first column to indicate the rows I ideally want to retain)

- Each unique id in Unique_ID_A, I only want to see once (e.g., 975059187 appears twice)

            -Additionally, there is also a unique identifier from table B (i.e., I would like to retain 536184298 linked with 975059187)

 

-Some details on this data: table B captures anytime an agent processes a credit on an account. Could be more than 1 credit for same account same day, and could get deleted prior to getting processed.

table A captures credits that did get processed. The objective is to identify which credits did get processed for a given agent.

 

-Additional rules: the linking variable between both tables is the acct_id. Then the criteria is, match by date, and amount.

 

   - Also, each table contains a unique identifier whose sequence is linear (a greater number means the credit got processed later)

 

Perhaps a data step merge is best approach to do the join? 

Ask a Question
Discussion stats
  • 2 replies
  • 129 views
  • 0 likes
  • 2 in conversation