Hi all,
I am trying to get multiple transactions happened with customer accounts with travel agent to a single travel account. But, I don't want to see the same customer having the same travel account transactions (single or multiple transactions). The data looks like this
data have;
input date mmddyy10. cust_account_id$ travel_agent$ trasfer_amount travel_account_id$;
format date mmddyy10.;
cards;
1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd
8/17/1998 mnop jason 8521 defg
05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
12/25/2013 john molly 2594 matt
10/28/2019 jim molly 5000 matt
12/23/2019 jim molly 10000 matt
;
run;
I want the output dataset to look like
05/24/1999 klmn greg 8754 opqr 11/30/2000 ryan greg 12548 opqr 06/25/2011 mall greg 1487 opqr 12/25/2013 john molly 2594 matt 10/28/2019 jim molly 5000 matt 12/23/2019 jim molly 10000 matt
I used following steps:
proc sort data=have ;
by travel_account_id travel_agent;
run;
data want1;
set have;
by travel_account_id ;
if first.travel_account_id and last.travel_account_id then delete;
else output want1;
run;
proc sort data=want1;
by cust_account_id;
run;
data want2;
set want1;
by cust_account_id;
if first.cust_account_id and last.cust_account_id then output;
else delete;
run;
I am unable to get the output i want. Could you please share your thoughts on this please??
Sounds like you want this query:
select * from have
group by travel_agent , travel_account_id
having count(distinct cust_account_id) > 1
;
If you wanted to do it with data step then first sort, for each group count how many customers and the based on number of customers output the data or not.
proc sort data=have ;
by travel_agent travel_account_id cust_account_id ;
run;
data want ;
do until (last.travel_account_id);
set have;
by travel_agent travel_account_id cust_account_id ;
num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
set have;
by travel_agent travel_account_id cust_account_id ;
if num_cust > 1 then output;
end;
run;
Can you explain better what you are trying to do?
Also explain what you think the code you posted is going to do and how you think that helps you get to the answer you want.
Tom,
I am trying to get the records that have the same travel agent making multiple customer account transactions to a single travel account id. For example these below records are perfect for the above description (where multiple customer accounts (klmn, ryan and mall) transactions were done by same travel agent (greg) for the same travel account id (opqr).
05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
At the same time, molly (travel agent) had multiple customer account ids (john and jim) has single travel account id (matt). But in other records (unique record meaning single customer id and travel account id) need to be deleted and at the same time dave (travel agent) has same customer id as well as travel id (which I don't want).
So, in my first step (proc sort) I sorted and deleted unique records, but I am unable to accomplish the second criteria to delete these records in my data (below are the records)
1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd
thanks
Sounds like you want this query:
select * from have
group by travel_agent , travel_account_id
having count(distinct cust_account_id) > 1
;
If you wanted to do it with data step then first sort, for each group count how many customers and the based on number of customers output the data or not.
proc sort data=have ;
by travel_agent travel_account_id cust_account_id ;
run;
data want ;
do until (last.travel_account_id);
set have;
by travel_agent travel_account_id cust_account_id ;
num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
set have;
by travel_agent travel_account_id cust_account_id ;
if num_cust > 1 then output;
end;
run;
Here is one way to do this (if I understood your question):
proc sort data=have; by travel_agent cust_account_id; run;
data want;
do until(last.travel_agent);
set have; by travel_agent cust_account_id;
if first.cust_account_id and not first.travel_agent then keep=1;
end;
do until(last.travel_agent);
set have; by travel_agent cust_account_id;
if keep then output;
end;
drop keep;
run;
Let's see if I understand this correctly
If I an correct with #1 and #2, and #3 is irrelevant (i.e. no need to consider travel_agent), then there is only one rule: Drop all records in which a given travel_account_id has only one cust_account_id (i.e. rule 2 includes rule 1). If this is the case, then:
proc sort data=have out=need;
by travel_account_id ;
run;
data want2 (drop=more_than_one_cust);
set need (in=firstpass) need (in=secondpass);
by travel_account_id;
retain more_than_one_cust;
if firstpass then do;
if cust_account_id^=lag(cust_account_id) then more_than_one_cust='Y';
if first.travel_account_id then more_than_one_cust='N';
end;
if secondpass and more_than_one_cust='Y';
run;
Thank you all for your replies.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.