BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

Hi all,

          I am trying to get multiple transactions happened with customer accounts with travel agent to a single travel account. But, I don't want to see the same customer having the same travel account transactions (single or multiple transactions). The data looks like this

 

data have;
input date mmddyy10. cust_account_id$ travel_agent$ trasfer_amount travel_account_id$;
format date mmddyy10.;
cards;
1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd
8/17/1998 mnop jason 8521 defg
05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
12/25/2013 john molly 2594 matt
10/28/2019 jim molly 5000 matt
12/23/2019 jim molly 10000 matt
;
run;

 

I want the output dataset to look like 

05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
12/25/2013 john molly 2594 matt
10/28/2019 jim molly 5000 matt
12/23/2019 jim molly 10000 matt

 

I used following steps:

proc sort data=have ;
by travel_account_id travel_agent;
run;

data want1;
set have;
by travel_account_id ;
if first.travel_account_id and last.travel_account_id then delete;
else output want1;
run;

proc sort data=want1;
by cust_account_id;
run;

data want2;
set want1;
by cust_account_id;
if first.cust_account_id and last.cust_account_id then output;
else delete;
run;

I am unable to get the output i want. Could you please share your thoughts on this please??

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Sounds like you want this query:

select * from have
  group by travel_agent , travel_account_id
  having  count(distinct cust_account_id) > 1
;

If you wanted to do it with data step then first sort,  for each group count how many customers and the based on number of customers output the data or not.

proc sort data=have ;
  by travel_agent travel_account_id cust_account_id ;
run;

data want ;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  if num_cust > 1 then output;
end;
run;

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

Can you explain better what you are trying to do?

Also explain what you think the code you posted is going to do and how you think that helps you get to the answer you want. 

buddha_d
Pyrite | Level 9

Tom,

       I am trying to get the records that have the same travel agent making multiple customer account transactions to a single travel account id. For example these below records are perfect for the above description (where multiple customer accounts (klmn, ryan and mall) transactions were done by same travel agent (greg) for the same travel account id (opqr). 

 

05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr

 

At the same time, molly (travel agent) had multiple customer account ids (john and jim)  has single travel account id (matt). But in other records (unique record meaning single customer id and travel account id) need to be deleted and at the same time  dave (travel agent) has same customer id as well as travel id (which I don't want). 

 

So, in my first step (proc sort)  I sorted and deleted unique records, but I am unable to accomplish the second criteria to delete these records in my data (below are the records)

 

1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd

 

thanks 

Tom
Super User Tom
Super User

Sounds like you want this query:

select * from have
  group by travel_agent , travel_account_id
  having  count(distinct cust_account_id) > 1
;

If you wanted to do it with data step then first sort,  for each group count how many customers and the based on number of customers output the data or not.

proc sort data=have ;
  by travel_agent travel_account_id cust_account_id ;
run;

data want ;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  if num_cust > 1 then output;
end;
run;
PGStats
Opal | Level 21

Here is one way to do this (if I understood your question):

 

proc sort data=have; by travel_agent cust_account_id; run;

data want;
do until(last.travel_agent);
	set have; by travel_agent cust_account_id;
	if first.cust_account_id and not first.travel_agent then keep=1;
	end;
do until(last.travel_agent);
	set have; by travel_agent cust_account_id;
	if keep then output;
	end;
drop keep;
run;
PG
mkeintz
PROC Star

 

Let's see if I understand this correctly

 

  1. You want to eliminate any record that represents the only record for a given travel_account_id.  So drop the
      8/17/1998 mnop jason 8521 defg
    record.
  2. But what is the rule you are using to drop these two records?
    1/15/2018 wxyz dave 1000 abcd 
    2/14/2017 wxyz dave 2000 abcd
    Is it because travel_account_id "abcd" has only one customer (2 records for cust_account_id "wxyz")?

  3. Or does this have to somehow involve the travel_agent ("dave") as well?

If I an correct with #1 and #2,  and #3 is irrelevant (i.e. no need to consider travel_agent), then there is only one rule: Drop all records in which a given travel_account_id has only one cust_account_id  (i.e. rule 2 includes rule 1).  If this is the case, then:

 

 

proc sort data=have out=need;
  by travel_account_id ;
run;
data want2 (drop=more_than_one_cust);
  set need (in=firstpass)   need (in=secondpass);
  by travel_account_id;
  retain more_than_one_cust;
  if firstpass then do;
    if cust_account_id^=lag(cust_account_id) then more_than_one_cust='Y';
    if first.travel_account_id then more_than_one_cust='N';
  end;
  if secondpass and more_than_one_cust='Y';
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
buddha_d
Pyrite | Level 9

Thank you all for your replies. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1143 views
  • 0 likes
  • 4 in conversation