Solved: using multiple first. and last.

buddha_d · Posted 01-16-2020 10:28 PM

Hi all,

I am trying to get multiple transactions happened with customer accounts with travel agent to a single travel account. But, I don't want to see the same customer having the same travel account transactions (single or multiple transactions). The data looks like this

data have;
input date mmddyy10. cust_account_id$ travel_agent$ trasfer_amount travel_account_id$;
format date mmddyy10.;
cards;
1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd
8/17/1998 mnop jason 8521 defg
05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
12/25/2013 john molly 2594 matt
10/28/2019 jim molly 5000 matt
12/23/2019 jim molly 10000 matt
;
run;

I want the output dataset to look like

05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr
12/25/2013 john molly 2594 matt
10/28/2019 jim molly 5000 matt
12/23/2019 jim molly 10000 matt

I used following steps:

proc sort data=have ;
by travel_account_id travel_agent;
run;

data want1;
set have;
by travel_account_id ;
if first.travel_account_id and last.travel_account_id then delete;
else output want1;
run;

proc sort data=want1;
by cust_account_id;
run;

data want2;
set want1;
by cust_account_id;
if first.cust_account_id and last.cust_account_id then output;
else delete;
run;

I am unable to get the output i want. Could you please share your thoughts on this please??

Tom · Posted 01-16-2020 11:47 PM

Sounds like you want this query:

select * from have
  group by travel_agent , travel_account_id
  having  count(distinct cust_account_id) > 1
;

If you wanted to do it with data step then first sort, for each group count how many customers and the based on number of customers output the data or not.

proc sort data=have ;
  by travel_agent travel_account_id cust_account_id ;
run;

data want ;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  if num_cust > 1 then output;
end;
run;

View solution in original post

Tom · Posted 01-16-2020 11:27 PM

Can you explain better what you are trying to do?

Also explain what you think the code you posted is going to do and how you think that helps you get to the answer you want.

buddha_d · Posted 01-16-2020 11:38 PM

Tom,

I am trying to get the records that have the same travel agent making multiple customer account transactions to a single travel account id. For example these below records are perfect for the above description (where multiple customer accounts (klmn, ryan and mall) transactions were done by same travel agent (greg) for the same travel account id (opqr).

05/24/1999 klmn greg 8754 opqr
11/30/2000 ryan greg 12548 opqr
06/25/2011 mall greg 1487 opqr

At the same time, molly (travel agent) had multiple customer account ids (john and jim) has single travel account id (matt). But in other records (unique record meaning single customer id and travel account id) need to be deleted and at the same time dave (travel agent) has same customer id as well as travel id (which I don't want).

So, in my first step (proc sort) I sorted and deleted unique records, but I am unable to accomplish the second criteria to delete these records in my data (below are the records)

1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd

thanks

Tom · Posted 01-16-2020 11:47 PM

Sounds like you want this query:

select * from have
  group by travel_agent , travel_account_id
  having  count(distinct cust_account_id) > 1
;

If you wanted to do it with data step then first sort, for each group count how many customers and the based on number of customers output the data or not.

proc sort data=have ;
  by travel_agent travel_account_id cust_account_id ;
run;

data want ;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  num_cust = sum(num_cust,first.cust_account_id);
end;
do until (last.travel_account_id);
  set have;
  by travel_agent travel_account_id cust_account_id ;
  if num_cust > 1 then output;
end;
run;

PGStats · Posted 01-16-2020 11:32 PM

Here is one way to do this (if I understood your question):

proc sort data=have; by travel_agent cust_account_id; run;

data want;
do until(last.travel_agent);
	set have; by travel_agent cust_account_id;
	if first.cust_account_id and not first.travel_agent then keep=1;
	end;
do until(last.travel_agent);
	set have; by travel_agent cust_account_id;
	if keep then output;
	end;
drop keep;
run;

PG

mkeintz · Posted 01-16-2020 11:48 PM

Let's see if I understand this correctly

You want to eliminate any record that represents the only record for a given travel_account_id. So drop the
8/17/1998 mnop jason 8521 defg
record.
But what is the rule you are using to drop these two records?
1/15/2018 wxyz dave 1000 abcd
2/14/2017 wxyz dave 2000 abcd
Is it because travel_account_id "abcd" has only one customer (2 records for cust_account_id "wxyz")?
Or does this have to somehow involve the travel_agent ("dave") as well?

If I an correct with #1 and #2, and #3 is irrelevant (i.e. no need to consider travel_agent), then there is only one rule: Drop all records in which a given travel_account_id has only one cust_account_id (i.e. rule 2 includes rule 1). If this is the case, then:

proc sort data=have out=need;
  by travel_account_id ;
run;
data want2 (drop=more_than_one_cust);
  set need (in=firstpass)   need (in=secondpass);
  by travel_account_id;
  retain more_than_one_cust;
  if firstpass then do;
    if cust_account_id^=lag(cust_account_id) then more_than_one_cust='Y';
    if first.travel_account_id then more_than_one_cust='N';
  end;
  if secondpass and more_than_one_cust='Y';
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

buddha_d · Posted 01-18-2020 05:28 AM

Thank you all for your replies.

using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Re: using multiple first. and last.

Classroom Training Available!