Solved: Re: Loop between two columns

Amr_Alaaeldin · Posted 09-19-2018 03:37 AM

Hello

I need your appreciated assistance in the below case

I have a data set includes three columns Customer ID , Existing Account and New Account , the existing account could be replaced with the new account and the new account will open a new record to be a existing account , and the account may be replaced many times . Like below

Data have ; 
input Cust_ID $3. Exist_Account $3. New_Account $4.; 
datalines;
003 CA CB
002 BB BC
002 BC 
001 AA AB
003 DC DD
001 AC AD
001 AD AE
003 CB CC
003 DD 
003 DB DC
003 DA DB
002 BA BB
001 AE 
001 AB AC
003 CC 
;
Run ;

for example : Existing Account "AA" replaced by new account "AB" and "AB" opened a new record to be existing account then "AB" Replaced by "AC" and "AC" opened a new record to be existing account then "AC" Replaced by "AD" and "AD" opened a new record to be existing account then "AD" Replaced by "AE" and "AE" opened a new record to be existing account .

Cust_ID	Exist_Account	New_Account
003	CA	CB
002	BB	BC
002	BC
001	AA	AB
003	DC	DD
001	AC	AD
001	AD	AE
003	CB	CC
003	DD
003	DB	DC
003	DA	DB
002	BA	BB
001	AE
001	AB	AC
003	CC

the results should be like the below:

Cust_ID	First_Account	Last_Account	Replacement_Count
001	AA	AE	4
002	BA	BC	2
003	CA	CC	2
003	DA	DD	3

Thank you ...

Ksharp · Posted 09-20-2018 08:43 AM

500,000 records is small dataset as my opinion .

You must run into a dead loop like :

003 CA CB
002 CB CA

Try this one : (Edited)


proc sql;
create table ancentor as
select * from have
 where exist_account not in 
 (select new_account from have)
 order by 1;
quit;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'have(where=(exist_account is not missing and 
  new_account is not missing))');
  h.definekey('exist_account');
  h.definedata('new_account');
  h.definedone();
 
  length kk $ 80;
  declare hash k();
  k.definekey('kk');
  k.definedone();
 end;
set ancentor;
fist=exist_account;
count=0;
 do until(rc ne 0 or rx = 0);
  if not missing(exist_account) then do;kk=exist_account; k.replace();end;
  call missing(new_account);
  rc=h.find();
  rx=k.check(key:new_account);
  if rc=0 then do;exist_account=new_account;if rx ne 0 then do;count+1;last=new_account;end;end;
 end;
k.clear();
drop exist_account new_account rc kk rx;
run;

View solution in original post

RW9 · Posted 09-19-2018 04:04 AM

I wasn't sure about the extra split in 003, but this is nearly there, you can add the other restriction, maybe by adding another grouping variable to the data.

data have; 
  input cust_id $3. exist_account $3. new_account $4.; 
datalines;
003 CA CB
002 BB BC
002 BC 
001 AA AB
003 DC DD
001 AC AD
001 AD AE
003 CB CC
003 DD 
003 DB DC
003 DA DB
002 BA BB
001 AE 
001 AB AC
003 CC 
;
run;

proc sort data=have out=want;
  by cust_id exist_account;
run;

data want;
  set want;
  retain replacement_count first_account;
  by cust_id;
  if first.cust_id then do;
    first_account=exist_account;
    if new_account ne "" then replacement_count=sum(replacement_count,1);
  end;
  else if last.cust_id then do;
    last_account=exist_account;
    output;
  end;
  else if new_account ne "" then replacement_count=sum(replacement_count,1);
run;

Amr_Alaaeldin · Posted 09-19-2018 04:31 AM

Thank you .. But still have an issues with Cust_id "003" because this customer has 2 different accounts and every one replaced many times ..
so i am expecting to don't use the customer id .. and use only the existing account and new account .
Thanks a lot

RW9 · Posted 09-19-2018 04:34 AM

Please re-read what I posted:

"

I wasn't sure about the extra split in 003, but this is nearly there, you can add the other restriction, maybe by adding another grouping variable to the data.

"

I do not know about the split in cust 3 - as I suggested you should add some information into the dataset to indicate a new account, maybe something like:
cust_id account exist_account ...

003 1 ...

003 2 ...

Then where I use just by cust_id, you would add in by cust_id account, and do first/last from that.

Ksharp · Posted 09-19-2018 09:18 AM

Are you sure it is one to one match ?

data have; 
  input cust_id $3. exist_account $3. new_account $4.; 
datalines;
003 CA CB
002 BB BC
002 BC 
001 AA AB
003 DC DD
001 AC AD
001 AD AE
003 CB CC
003 DD 
003 DB DC
003 DA DB
002 BA BB
001 AE 
001 AB AC
003 CC 
;
run;
proc sql;
create table ancentor as
select * from have
 where exist_account not in 
 (select new_account from have)
 order by 1;
quit;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'have(where=(exist_account is not missing and 
  new_account is not missing))');
  h.definekey('exist_account');
  h.definedata('new_account');
  h.definedone();
 end;
set ancentor;
fist=exist_account;
count=0;
 do until(rc ne 0);
  call missing(new_account);
  rc=h.find();
  if rc=0 then do;exist_account=new_account;count+1;last=new_account;end;
 end;
drop exist_account new_account rc;
run;
proc print;run;

Amr_Alaaeldin · Posted 09-20-2018 06:08 AM

Thanks .. It works .. but the the performance is very slow because I performed it on big dataset include 500,000 records .. and I started to run it yesterday and it still running till now .

Ksharp · Posted 09-20-2018 08:43 AM

500,000 records is small dataset as my opinion .

You must run into a dead loop like :

003 CA CB
002 CB CA

Try this one : (Edited)


proc sql;
create table ancentor as
select * from have
 where exist_account not in 
 (select new_account from have)
 order by 1;
quit;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'have(where=(exist_account is not missing and 
  new_account is not missing))');
  h.definekey('exist_account');
  h.definedata('new_account');
  h.definedone();
 
  length kk $ 80;
  declare hash k();
  k.definekey('kk');
  k.definedone();
 end;
set ancentor;
fist=exist_account;
count=0;
 do until(rc ne 0 or rx = 0);
  if not missing(exist_account) then do;kk=exist_account; k.replace();end;
  call missing(new_account);
  rc=h.find();
  rx=k.check(key:new_account);
  if rc=0 then do;exist_account=new_account;if rx ne 0 then do;count+1;last=new_account;end;end;
 end;
k.clear();
drop exist_account new_account rc kk rx;
run;

Amr_Alaaeldin · Posted 09-20-2018 01:03 PM

Thanks you Ksharp very much appreciated.. Perfect

novinosrin · Posted 09-20-2018 04:20 PM

Hi @Amr_Alaaeldin if the proc sort is right as did by RW, I think this is pretty straight forward assuming my understanding is right

data have; 
infile cards truncover;
  input cust_id $3. exist_account $3. new_account $4.; 
datalines;
003 CA CB
002 BB BC
002 BC 
001 AA AB
003 DC DD
001 AC AD
001 AD AE
003 CB CC
003 DD 
003 DB DC
003 DA DB
002 BA BB
001 AE 
001 AB AC
003 CC 
004 AA
;
run;

proc sort data=have out=_have;
  by cust_id exist_account;
run;

data want;
set _have;
by cust_id exist_account;
array t(10) $ _temporary_;/*arbitrary subscript, should assign something large like 9999*/
retain First_Account;
if first.cust_id then do; First_Account=exist_account;Replacement_Count=0;end;
if first.cust_id and missing(new_account) then 
	do;
		last_account=exist_account;
		output;
	end;
else if not missing(new_account) and (new_account not in t) then
	do; 
		if Replacement_Count=0 then First_Account=exist_account;
		Replacement_Count+1;
		t(Replacement_Count)=new_account;
	end;
else if   missing(new_account) then 
	do;
		last_account=exist_account;
		output;
		Replacement_Count=0;
		call missing(of t(*));
	end;
keep cust_id First_Account last_account Replacement_Count;
run;

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away