topic Re: Refining output of a join to exclude dup value created in SAS Programming

Refining output of a join to exclude dup value created

brulard — Thu, 09 Nov 2017 11:41:28 GMT

If someone could provide guidance on this issue. After doing an SQL inner join with two tables, there are duplicate records created.

Is there a join option, or string that can be added to a query to essentially dictate "move in a sequential order to match records from table 1 ONCE to matching record of table B, then move on to next row" (in other words, do not match a given value from table A more than once with value from table B). Apologies if this sounds crude.

The query I ran:

proc sql; create table DID as
select distinct

a.*,
b.TXN_DT AS TRANSACTION_DATE,b.MSTR_CRD_TXN_ID,b.POSTNG_DT,b.SRC_SYS_POSTNG_TM_CNT
FROM Z4 A inner join (select distinct * from TRANS) B ON
A.ACCT_ID=B.ACCT_ID AND B.TXN_AMT=A.AMOUNT and (A.EVENT_DT=B.POSTNG_DT or A.EVENT_DT=B.POSTNG_DT-1 or A.EVENT_DT=B.POSTNG_DT-2 or A.EVENT_DT=B.POSTNG_DT-3) ;QUIT;

/*the reason for the join criterion to various POSTNG_DT (dates) is that there sometime is a time lapse between when records from table a get loaded in table b*/

Sample of current output:

WANT?	UNIQUE_ID_B	UNIQUE_ID_A
YES	536184298	975059187
NO	536184298	975059188
NO	536184307	975059187
YES	536184307	975059188

Notice that when selecting which of the two 975059187 from UNIQUE_ID_A, the row desired has a lower value in UNIQUE_ID_B while the opposite occurs selecting which to pick for 975059188 (the value in B is higher).

And here is my attempt using a BASES sas merge, (but a possible issue here is I am missing some records in the output):

proc sort data=trans; by ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT TXN_AMT MSTR_CRD_TXN_ID;run;

proc sort data=z4; by ACCT_ID EVENT_DT amount event_id;run;

data merged;
merge trans (in=a KEEP=MSTR_CRD_TXN_ID SRC_SYS_ACCT_ID POSTNG_DT SRC_SYS_POSTNG_TM_CNT txn_amt)
z4 (IN=b) ;
by SRC_SYS_ACCT_ID ;
IF (POSTNG_DT-3=EVENT_DT OR POSTNG_DT-2=EVENT_DT or POSTNG_DT-1=EVENT_DT or POSTNG_DT=EVENT_DT) AND TXN_AMT=AMOUNT;RUN;

Thanks in advance

Re: Refining output of a join to exclude dups created

ballardw — Wed, 08 Nov 2017 22:59:33 GMT

It isn't clear if your "sample of output" is your actual result or your desired result.

If it is your actual result then I do not see any "duplicate". If you have some additional rule about relationships between Unique_Id_B and Unique_Id_A I think you need to be a bit more detailed.

It might help to provide short examples of the two sets and desired result.

Re: Refining output of a join to exclude dup value created

brulard — Thu, 09 Nov 2017 12:33:37 GMT

thanks for your suggestions.

- "sample of output" is actual result (I added a first column to indicate the rows I ideally want to retain)

- Each unique id in Unique_ID_A, I only want to see once (e.g., 975059187 appears twice)

-Additionally, there is also a unique identifier from table B (i.e., I would like to retain 536184298 linked with 975059187)

-Some details on this data: table B captures anytime an agent processes a credit on an account. Could be more than 1 credit for same account same day, and could get deleted prior to getting processed.

table A captures credits that did get processed. The objective is to identify which credits did get processed for a given agent.

-Additional rules: the linking variable between both tables is the acct_id. Then the criteria is, match by date, and amount.

- Also, each table contains a unique identifier whose sequence is linear (a greater number means the credit got processed later)

Perhaps a data step merge is best approach to do the join?