If someone could provide guidance on this issue. After doing an SQL inner join with two tables, there are duplicate records created.
Is there a join option, or string that can be added to a query to essentially dictate "move in a sequential order to match records from table 1 ONCE to matching record of table B, then move on to next row" (in other words, do not match a given value from table A more than once with value from table B). Apologies if this sounds crude.
The query I ran:
proc sql; create table DID as
select distinct
a.*,
b.TXN_DT AS TRANSACTION_DATE,b.MSTR_CRD_TXN_ID,b.POSTNG_DT,b.SRC_SYS_POSTNG_TM_CNT
FROM Z4 A inner join (select distinct * from TRANS) B ON
A.ACCT_ID=B.ACCT_ID AND B.TXN_AMT=A.AMOUNT and (A.EVENT_DT=B.POSTNG_DT or A.EVENT_DT=B.POSTNG_DT-1 or A.EVENT_DT=B.POSTNG_DT-2 or A.EVENT_DT=B.POSTNG_DT-3) ;QUIT;
/*the reason for the join criterion to various POSTNG_DT (dates) is that there sometime is a time lapse between when records from table a get loaded in table b*/
Sample of current output:
WANT? | UNIQUE_ID_B | UNIQUE_ID_A |
YES | 536184298 | 975059187 |
NO | 536184298 | 975059188 |
NO | 536184307 | 975059187 |
YES | 536184307 | 975059188 |
Notice that when selecting which of the two 975059187 from UNIQUE_ID_A, the row desired has a lower value in UNIQUE_ID_B while the opposite occurs selecting which to pick for 975059188 (the value in B is higher).
And here is my attempt using a BASES sas merge, (but a possible issue here is I am missing some records in the output):
proc sort data=trans; by ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT TXN_AMT MSTR_CRD_TXN_ID;run;
proc sort data=z4; by ACCT_ID EVENT_DT amount event_id;run;
data merged;
merge trans (in=a KEEP=MSTR_CRD_TXN_ID SRC_SYS_ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT txn_amt)
z4 (IN=b) ;
by SRC_SYS_ACCT_ID ;
IF (POSTNG_DT-3=EVENT_DT OR POSTNG_DT-2=EVENT_DT or POSTNG_DT-1=EVENT_DT or POSTNG_DT=EVENT_DT) AND TXN_AMT=AMOUNT;RUN;
Thanks in advance
It isn't clear if your "sample of output" is your actual result or your desired result.
If it is your actual result then I do not see any "duplicate". If you have some additional rule about relationships between Unique_Id_B and Unique_Id_A I think you need to be a bit more detailed.
It might help to provide short examples of the two sets and desired result.
thanks for your suggestions.
- "sample of output" is actual result (I added a first column to indicate the rows I ideally want to retain)
- Each unique id in Unique_ID_A, I only want to see once (e.g., 975059187 appears twice)
-Additionally, there is also a unique identifier from table B (i.e., I would like to retain 536184298 linked with 975059187)
-Some details on this data: table B captures anytime an agent processes a credit on an account. Could be more than 1 credit for same account same day, and could get deleted prior to getting processed.
table A captures credits that did get processed. The objective is to identify which credits did get processed for a given agent.
-Additional rules: the linking variable between both tables is the acct_id. Then the criteria is, match by date, and amount.
- Also, each table contains a unique identifier whose sequence is linear (a greater number means the credit got processed later)
Perhaps a data step merge is best approach to do the join?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.