I have a dataset that that appears as below. There may be multiple COD_MOT_RSPS_TRAN codes for the same SCORE_CUSTOMER_ACCOUNT_XID, TRANSACTION_DTTM and TRANSACTION_AMT:
I sorted with the following code:
proc sort data=work.final_decision;
by score_customer_account_xid transaction_dttm transaction_amt cod_mot_rsps_tran;
run;
What I want to do next is to create another dataset that gives me only the 1st observation by COD_MOT_RSPS_TRN if there are multiple and all other column values are equal (account, datetime, and amount fields).
As an example, here are two observations with the same account and timestamp, but with 2 different COD_MOS_RSPS_TRN values. In the resulting dataset, I would want to keep the observation associated with FL5 since that would would be the 1st record after sorting:
I attempted to accomplish this with the below code, but it removed far too many records (e.g. the example account above was completely removed from the dataset):
data work.final_decision_nodups;
set work.final_decision;
by score_customer_account_xid transaction_dttm transaction_amt cod_mot_rsps_tran;
if first.cod_mot_rsps_tran then;
else output;
run;
Can anyone tell me what I'm doing incorrectly? I'm on SAS EG 7.1. Thanks!
data work.final_decision_nodups;
set work.final_decision;
by score_customer_account_xid transaction_dttm transaction_amt
cod_mot_rsps_tran;
if first.cod_mot_rsps_tran then output;
run;
In this case, your IF THEN ELSE does not perform the task you wanted it to perform. You said "I want to ... create another dataset that gives me only the 1st observation by COD_MOT_RSPS_TRN". So your IF statement ought to mirror those words ... which ought to translate almost directly in SAS syntax. And so the adjustment to your code is above.
Thank you for your reply, but seems like something is still missing. When I run the code you suggest, then query against it for my example account, I receive the exact same output as from work.final_decision:
@lujo1017 wrote:
Thank you for your reply, but seems like something is still missing. When I run the code you suggest, then query against it for my example account, I receive the exact same output as from work.final_decision:
Why is this not the correct answer? It looks right to me. You get the first record for each value of COD_MOT_RSPS_TRAN, which is exactly what you asked for. You said:
What I want to do next is to create another dataset that gives me only the 1st observation by COD_MOT_RSPS_TRN if there are multiple and all other column values are equal (account, datetime, and amount fields).
My apologies, as I don't think I explained correctly. Using the example, I only want to keep the FL5 record. If the values in all the other columns are duplicates, I don't want multiple values for COD_MOT_RSPS_TRAN. I only want the 1st one.
Ok, got it. So instead of the code I provided earlier, you want to use:
data work.final_decision_nodups;
set work.final_decision;
by score_customer_account_xid transaction_dttm transaction_amt cod_mot_rsps_tran;
if first._________________ then output;
run;
where (homework assignment) you fill in the blank.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.