Retaining Dates Against Transactions

Baljit · Posted 05-17-2020 06:49 AM

I have a dataset that contains a large number of customer IDs, a previous customer ID and a variable that represents the first acquisition date. Occasionally the customer ID is changed and the value of previous customer ID is updated accordingly. This can happen multiple times and unfortunately, each time this happens the first acquisition date is updated to the date of the ID change and is not carried over.

Current Dataset Example:

Customer_ID Previous_Customer_ID First_Acquisition_Date

A103420830 01JAN2014

B100453820 A103420830 01JAN2015

C293099201 B100453820 01JAN2016

I need to retain the true first date of acquisition against these ID changes but I am having some difficulties dealing with more than one ID change. Any help would be appreciated.

ed_sas_member · Posted 05-17-2020 07:23 AM

Hi @Baljit

Could you please specify what should be the expected output?

Would something like this be convenient?

data want;
	set have;
	retain First_Acquisition_Date2;
	format First_Acquisition_Date2 date9.;
	if missing(Previous_Customer_ID) then First_Acquisition_Date2=First_Acquisition_Date;
run;

Best,

Baljit · Posted 05-17-2020 07:30 AM

Hi, The expected output is as follows:

Customer_ID Previous_Customer_ID First_Acquisition_Date TRUE_ACQUISITION_DATE

A103420830 01JAN2014 01JAN2014

B100453820 A103420830 01JAN2015 01JAN2014

C293099201 B100453820 01JAN2016 01JAN2014

Baljit · Posted 05-17-2020 07:34 AM

Hi @ed_sas_member,

The expected output would have an additional column called True_Acquisition_Date and for the example above, all values would be 01JAN2014.

Cheers

Baljit · Posted 05-17-2020 07:44 AM

That would be great and it is something I have tried, but in reality the data doesn't come in order of customer and previous customer id. Do you know how I could order it like this?

RichardDeVen · Posted 05-17-2020 08:12 AM

The data represent links in a transactional path. You can store links data in a HASH object and traverse the HASH through the links to the head of the path, which would have the sole first date.

Example:

data have;
length Customer_ID  Previous_Customer_ID $10 First_Acquisition_Date 8;
input 
Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.;
format First_Acquisition_Date date9.;
datalines;
A103420830  .            01JAN2014
B100453820  A103420830   01JAN2015
C293099201  B100453820   01JAN2016
;

data want;
  if _n_ = 0 then set have;

  if _n_ = 1 then do;
    declare hash paths(dataset:'have');
    paths.defineKey('Customer_ID');
    paths.defineData('Previous_Customer_ID', 'First_Acquisition_Date');
    paths.defineDone();
  end;

  set have;

  _H1 = First_Acquisition_Date;
  _H2 = Previous_Customer_ID;

  do _index = 1 to 1000 until (paths.find(key:Previous_Customer_ID) ne 0);
  end;

  if _index = > 1000 then do;
    put 'ERROR: Excessive path length or data loop' Customer_ID=;
    True_Acquisition_Date = .;
  end;
  else do;
    True_Acquisition_Date = First_Acquisition_Date;
  end;

  steps = _index - 1;
  First_Acquisition_Date = _H1;
  Previous_Customer_ID = _H2;

  attrib True_Acquisition_Date format=date9.;

  drop _:;
run;

Output:

ed_sas_member · Posted 05-17-2020 08:23 AM

Hi @Baljit

In this case, you need to use a HASH object.

Please have a look at the following thread, which is exactly the same issue:

https://communities.sas.com/t5/SAS-Programming/Specifying-the-observation-as-in-same-group/m-p/64774...

Best,

ed_sas_member · Posted 05-17-2020 09:54 AM

Here is the application of the method described in the thread (link above) to your use case:

/******************************************************/
/** MACRO (source: cf. link above)					 **/
/******************************************************/

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
	data _null_;
		if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
		length clust 8;
		declare hash nodes(hashexp:&exp);
		nodes.defineKey('node');
		nodes.defineData('node', 'clust');
		nodes.defineDone();
		declare hiter nodeList('nodes');

		do newClust = 1 by 1 while(not endLoop);
			set &arcs end=endLoop;
			call missing(clust); node = &from;
			if 0^=nodes.find() then nodes.add(); 
			fromClust = clust; 
			call missing(clust); node = &to;
			if 0^=nodes.find() then nodes.add(); 
			toClust = clust;
			if n(fromClust, toClust) = 0 then do;
				nodes.replace(key:&from, data:&from, data:newClust);
				nodes.replace(key:&to, data:&to, data:newClust);
			end;
			else if missing(toClust) then 
				nodes.replace(key:&to, data:&to, data:fromClust);
			else if missing(fromClust) then 
				nodes.replace(key:&from, data:&from, data:toClust);
			else if fromClust ne toClust then do;
				rc = nodeList.first();
				do while (rc = 0);
					if clust = fromClust then 
						nodes.replace(key:node, data:node, data:toClust);
					rc = nodeList.next();
				end;
			end;
		end;
		nodes.output(dataset:"&out");
		stop;
	run;
%mend SubGraphs;

/******************************************************/

data have;
	length Customer_ID  Previous_Customer_ID $10 First_Acquisition_Date 8;
	input Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.;
	format First_Acquisition_Date date9.;
	datalines;
A103420830  .            01JAN2014
F293099201  E100453820   01JAN2019
B100453820  A103420830   01JAN2015
C293099201  B100453820   01JAN2016
D103420830  .            01JAN2017
E100453820  D103420830   01JAN2018
;

data have2;
	set have;
	if missing(Previous_Customer_ID) then Previous_Customer_ID=Customer_ID;
	_numobs+1;
run;

%SubGraphs(have2,from=Customer_ID,to=Previous_Customer_ID,out=ref,exp=8);

proc sql;
	create table have3 as
	select a.*, b.clust
	from have2 as a left join ref as b
	on a.Customer_ID = b.node
	order by clust, First_Acquisition_Date;
run;

data have4;
	set have3;
	by clust;
	retain First_Acquisition_Date2;
	format First_Acquisition_Date2 date9.;
	if first.clust then do;
		First_Acquisition_Date2 = First_Acquisition_Date;
		call missing (Previous_Customer_ID);
	end;
run;

proc sort data=have4 out=want (drop=_: clust);
	by _numobs;
run;

Ksharp · Posted 05-18-2020 08:00 AM

Can you post more data and the OUTPUT you are looking for ?

Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Re: Retaining Dates Against Transactions

Registration is open

SAS Training: Just a Click Away