Fluorite | Level 6

## Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Hello Everyone:

Having issues with developing logic around rewarding customers who open new deposit accounts on the same day vs different days. Only one reward is provided per newly opened account.

Scenario........

Have:

If customer A and customer B open a jointly held deposit account on Day 1 and
customer B and customer C open a joint account on Day 1 and
customer A and customer C open a joint account on Day 1 then reward customers A, B, C    (3 rewards paid out)

If customer D and customer E open a joint account: reward either customer D or E    (1 reward paid out)

If customer I and customer J and customer K open a joint account: reward either customer I or J or K    (1 reward paid out)

However;
If customer F and customer G open a jointly held deposit account on Day 1 and
customer G and customer H open a joint account on Day 2 and
customer F and customer H open a joint account on Day 3 then reward customers F or G.    (1 reward paid out)

(spaces added to datalines below to help clarify above groups)

data have;
infile datalines truncover;
input customer_no \$   acctno \$   open_date;

datalines;
A 999 20221201
B 999 20221201
B 888 20221201
C 888 20221201
A 777 20221201
C 777 20221201

D 666 20221201
E 666 20221201

F 555 20221205
G 555 20221205
G 444 20221206
H 444 20221206
F 333 20221207
H 333 20221207

I 222 20221209
J 222 20221209
K 222 20221209
;

proc print data=have;
title 'have';
run;

proc sort data=have     out=temp2;
by customer_no     open_date;
run;
proc print data=temp2(obs=40);
title 'temp2';

run;

data temp3;
set temp2;
by customer_no open_date;

if first.customer_no;

run;
proc print data=temp3(obs=40);
title 'temp3';
run;

Want:

A 999 20221201
B 888 20221201
C 777 20221201

D 666 20221201

F 555 20221205

I 222 20221209

So in the proc print table for dataset 'temp3'; below customers A,B,C,D,F, I are fine. However, I do not want customers E,G,H,J,K. I'm stuck at this point and at a loss how to eliminate those records.

```data have;
infile datalines truncover;
input customer_no \$  acctno \$ open_date;

datalines;
A 999 20221201
B 999 20221201
B 888 20221201
C 888 20221201
A 777 20221201
C 777 20221201
D 666 20221201
E 666 20221201
F 555 20221205
G 555 20221205
G 444 20221206
H 444 20221206
F 333 20221207
H 333 20221207
I 222 20221209
J 222 20221209
K 222 20221209
;

proc print data=have;
title 'have';
run;

proc sort data=have  out=temp2;
by customer_no    open_date;
run;
proc print data=temp2(obs=40);
title 'temp2';

run;

data temp3;
set temp2;
by customer_no   open_date;

if first.customer_no;

run;
proc print data=temp3(obs=40);
title 'temp3';
run;

/* WANT
A 999 20221201
B 888 20221201
C 777 20221201

D 666 20221201

F 555 20221205

I 222 20221209
*/```

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

I think I would approach this with a hash table. I don't have time to try coding it, but I think would approach like:

1. Sort the data by Open_Date Account Customer
2. Read in the data in a DATA step (probably DOW loop) set by Open_Date Account Customer
3. Create a hash table that will hold the Customer IDs of all customers that are co-owners of an account that has triggered a reward.  The hashtable will have Customer and OpenDate.  The hash table starts empty.
4. Read in the customers for the first account by-group.
5. Check each customer for the account, to see if it exists in the hash table, with an open date in the hash table that is earlier that the current account's open date.
6. After reading through all the customers for the account, if none of them were already in the hash table with an earlier open date:
1. This account triggers a reward
2. Add all the customers for this account to the hash table (with the open date for this account)

I think that might do it.

Edit: I don't think the above handles a case where A opens two accounts on the same day.  It would give them double rewards.  Perhaps that could be de-duped in a separate step.  What happens if A opens an account on day 1 and  also on day 1 A and B open an account.  Does B get a reward?  Above algorithm would say this account generates an award.  But would need another step to determine that the award goes to B.  Perhaps the above step would be useful for generating all the accounts that trigger rewards.  Then a separate step could be used to dedup and determine who gets the reward from each account.  This is tricky. : )

BASUG is hosting free webinars ! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.
12 REPLIES 12
Super User

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Hi,

I don't understand the last group:

If customer F and customer G open a jointly held deposit account on Day 1 and

customer G and customer H open a joint account on Day 2 and
customer F and customer H open a joint account on Day 3 then reward customers F or G.    (1 reward paid out)

So if these three accounts were all opened on the same day, you would pay out 3 rewards (which is the first group).  But when they are opened on different days, you only pay 1 reward?  What happens if there is a gap in dates, e.g. F and G open a join account on Jan 1, then G and H open a join account on Jun 1?

I can understand a rule "each account only triggers one reward, and no customer gets two rewards."  But the date part is hurting my head.

BASUG is hosting free webinars ! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.
Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Yes this logic hurts my head too. In any respect; the campaign involves (new) FIRST deposit accounts for the customers. If the account is opened on Jan 1st by customers F & G (their first new account) then any subsequent account openings on later days or weeks or months are disqualified. Only if multiple accounts are opened on the SAME day do they (accounts) qualify. In your example above; the account opened by G and H on Jun 1 is disqualified because it is not their first newly opened account (their first new account was opened previously on Jan 1).
Opal | Level 21

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

@sasasauraus wrote:
Yes this logic hurts my head too. In any respect; the campaign involves (new) FIRST deposit accounts for the customers. If the account is opened on Jan 1st by customers F & G (their first new account) then any subsequent account openings on later days or weeks or months are disqualified. Only if multiple accounts are opened on the SAME day do they (accounts) qualify. In your example above; the account opened by G and H on Jun 1 is disqualified because it is not their first newly opened account (their first new account was opened previously on Jan 1).

What if for a joint account customer A already got an earlier account opening but customer B is new? Does that disqualify the account?

Your WANT data includes the customer. For a joint account that qualifies: Any logic which customer gets the reward?

If a customer opens two accounts on the same day (FIRST deposit) you state that both accounts qualify: Does this mean the customer gets two rewards?

Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

What if for a joint account customer A already got an earlier account opening but customer B is new? Does that disqualify the account?  Correct.  The latter account no longer qualifies as it is now the 2nd opened account.

Your WANT data includes the customer. For a joint account that qualifies: Any logic which customer gets the reward?

Only one payout per opened account will qualify regardless of how many joint holders are on that particular account.  Does not matter who gets the reward per account.  Customers should not be rewarded multiple times.

If a customer opens two accounts on the same day (FIRST deposit) you state that both accounts qualify: Does this mean the customer gets two rewards?  No; this is considered a single held account not joint.  If A  opens acct #111 and A  opens acct# 222 on the same day then A is rewarded once.

Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Adding notes to help clarify matters further:

/* WANT */

A 999 20221201
B 888 20221201
C 777 20221201

D or E          666 20221201

F or G          555 20221205

I or J or K    222 20221209

Notes as per above WANT:
Acct#   555   (F or G qualifies as this is their 1st account - but only 1 customer will be rewarded)

Acct#   666  (D or E qualifies as this is their 1st account - but only 1 customer will be rewarded)

Acct#   222  (I or J or K qualifies as this is their 1st account - but only 1 customer will be rewarded)

*****************************************************

F 555 20221205
G 555 20221205
G 444 20221206   (disqualified as 1st account opened 2022-12-05)
H 444 20221206   (disqualified as joint customer G's 1st account opened 2022-12-05)
F 333 20221207   (disqualified as 1st account opened 2022-12-05)
H 333 20221207  (disqualified as joint customer F's 1st account opened 2022-12-05)

SAS Super FREQ

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Hi:
These "rules" seem to penalize person H for even having a joint account. It seems like if H had set up a single account on 12/6 and ditched G, they would have gotten the award. These rules seem erratic.
Cynthia
Super User

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

I'm still struggling with defining the rule.  Is it:

A new account triggers a reward (for one of the account owners) unless one of the account owners for the new account is also an owner of an account the previously triggered a reward.  (Previously means an account opened on an earlier date).

?

BASUG is hosting free webinars ! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.
Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Yes, that is correct.  Essentially; if customer(s) open any subsequent account(s) any time afterwards, then these records can be tossed out.  Only the very first open accounts are in consideration for reward.  My earlier example:

Customer  Account  Open_Date

------------    ----------   --------------

F               555         20221205 (F or G qualifies as this is their 1st account - reward F or G but not both)
G              555         20221205 (F or G qualifies as this is their 1st account - reward F or G but not both)

G              444         20221206 (does not qualify as G opened a previous acct on 2022105)
H              444         20221206 (does not qualify as joint owner G opened a previous acct on 2022105)
F              333         20221207 (does not qualify as F opened a previous acct on 2022105)
H             333         20221207 (does not qualify as H opened a previous acct on 2022106 with G who opened a previous account on 20221205)

Super User

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

I think I would approach this with a hash table. I don't have time to try coding it, but I think would approach like:

1. Sort the data by Open_Date Account Customer
2. Read in the data in a DATA step (probably DOW loop) set by Open_Date Account Customer
3. Create a hash table that will hold the Customer IDs of all customers that are co-owners of an account that has triggered a reward.  The hashtable will have Customer and OpenDate.  The hash table starts empty.
4. Read in the customers for the first account by-group.
5. Check each customer for the account, to see if it exists in the hash table, with an open date in the hash table that is earlier that the current account's open date.
6. After reading through all the customers for the account, if none of them were already in the hash table with an earlier open date:
1. This account triggers a reward
2. Add all the customers for this account to the hash table (with the open date for this account)

I think that might do it.

Edit: I don't think the above handles a case where A opens two accounts on the same day.  It would give them double rewards.  Perhaps that could be de-duped in a separate step.  What happens if A opens an account on day 1 and  also on day 1 A and B open an account.  Does B get a reward?  Above algorithm would say this account generates an award.  But would need another step to determine that the award goes to B.  Perhaps the above step would be useful for generating all the accounts that trigger rewards.  Then a separate step could be used to dedup and determine who gets the reward from each account.  This is tricky. : )

BASUG is hosting free webinars ! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.
Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Thanks Quentin.  I do not have familiarity with hash tables as I am a novice SAS user.  Nevertheless; I will give your advice a shot.  Also; to answer your question....yes, B would also get a reward as we have two accounts opened on the same day.  One reward per account opened on the same day but do not reward the same customer twice.

Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

Thank you everyone who laid eyes on this dilemma of mine. Special thanks to Quentin (the Proc Star) who provided a segway into a solution. Also; I need to praise a co-worker who took Quentin's option and ran with it. With intestinal fortitude and immense tenacity she managed to pull together Hash table code worth its weight in gold; thus saving my bacon. Just need to figure how to upload the solution....please stay tuned.
Fluorite | Level 6

## Re: Proc Sort and First.variable - uncertain how to eliminate unnecessary rows

%macro printme(in1);
proc print data=&in1(obs=50);
title "%upcase(&in1)";
run;
proc contents data=&in1;
title "%upcase(&in1)";
run;

%mend printme;

data have;
infile datalines truncover;
input customer_no \$ acctno \$ open_date;

datalines;
A 999 20221201
B 999 20221201
B 888 20221201
C 888 20221201
A 777 20221201
C 777 20221201
D 666 20221201
E 666 20221201
F 555 20221205
G 555 20221205
G 444 20221206
H 444 20221206
F 333 20221207
H 333 20221207
I 222 20221209
J 222 20221209
K 222 20221209
;

proc sql;
create table tble1_1 as
select distinct
h1.open_date
, h1.acctno
, h1.customer_no
, h2.open_date as h2_open_date

from have h1
left join
have h2
on h1.customer_no = h2.customer_no

order by h1.open_date, h2.open_date desc, h1.acctno, h1.customer_no
;
quit;
%printme(tble1_1);

proc sql;
create table tble1_base as
select
open_date
, acctno
, customer_no
, min(h2_open_date) As cust_acct_min_open_dt

from tble1_1

group by
open_date
, acctno
, customer_no

order by open_date, acctno, customer_no desc
;

quit;
%printme(tble1_base);

proc sql;
create table tble1_acctno as
select
acctno
, min(h2_open_date) As acct_min_open_dt

from tble1_1

group by acctno
;

quit;
%printme(tble1_acctno);

proc sql;
create table tble1_customer as
select
customer_no
, min(open_date) As cust_min_open_dt

from tble1_1
group by customer_no
;

quit;
%printme(tble1_customer);

proc sql;
create table tble1_2 As
select
base.open_date
, base.acctno
, base.customer_no
/* , base.cust_acct_min_open_dt*/
, acct.acct_min_open_dt
, cust.cust_min_open_dt

from tble1_base base

inner join tble1_acctno acct
on base.acctno = acct.acctno

inner join tble1_customer cust
on base.customer_no = cust.customer_no
;

quit;
%printme(tble1_2);

data _null_;
if _n_ = 1 then
do;
if 0 then set tble1_2; /* Add variables from the view */

/* to the PDV for WORK.NEW */
declare hash ht(ordered: 'a');
ht.definekey ('open_date','acctno');
ht.definedata('open_date','acctno','customer_no','cust_min_open_dt','acct_min_open_dt','ind_candidate');
ht.definedone();

declare hash custno();
custno.definekey('customer_no');
custno.definedone();

end;

do while(not last);
set tble1_2 end=last;

open_date = open_date;
acctno = acctno;
/* customer_no = customer_no;*/

if (open_date = cust_min_open_dt) and (open_date = acct_min_open_dt) then
do;
if ht.find() NE 0
and custno.find() NE 0 then
do;
ind_candidate=1;

end;
end;

if last then
do;
ht.output(dataset:'want'); /* write data using HashSort */
custno.output(dataset:'clients');
end;
end;
run;
Discussion stats
• 12 replies
• 1046 views
• 3 likes
• 4 in conversation