I have a dataset that contains a large number of customer IDs, a previous customer ID and a variable that represents the first acquisition date. Occasionally the customer ID is changed and the value of previous customer ID is updated accordingly. This can happen multiple times and unfortunately, each time this happens the first acquisition date is updated to the date of the ID change and is not carried over.
Current Dataset Example:
Customer_ID Previous_Customer_ID First_Acquisition_Date
A103420830 01JAN2014
B100453820 A103420830 01JAN2015
C293099201 B100453820 01JAN2016
I need to retain the true first date of acquisition against these ID changes but I am having some difficulties dealing with more than one ID change. Any help would be appreciated.
Hi @Baljit
Could you please specify what should be the expected output?
Would something like this be convenient?
data want;
set have;
retain First_Acquisition_Date2;
format First_Acquisition_Date2 date9.;
if missing(Previous_Customer_ID) then First_Acquisition_Date2=First_Acquisition_Date;
run;
Best,
Hi, The expected output is as follows:
Customer_ID Previous_Customer_ID First_Acquisition_Date TRUE_ACQUISITION_DATE
A103420830 01JAN2014 01JAN2014
B100453820 A103420830 01JAN2015 01JAN2014
C293099201 B100453820 01JAN2016 01JAN2014
Hi @ed_sas_member,
The expected output would have an additional column called True_Acquisition_Date and for the example above, all values would be 01JAN2014.
Cheers
The data represent links in a transactional path. You can store links data in a HASH object and traverse the HASH through the links to the head of the path, which would have the sole first date.
Example:
data have; length Customer_ID Previous_Customer_ID $10 First_Acquisition_Date 8; input Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.; format First_Acquisition_Date date9.; datalines; A103420830 . 01JAN2014 B100453820 A103420830 01JAN2015 C293099201 B100453820 01JAN2016 ; data want; if _n_ = 0 then set have; if _n_ = 1 then do; declare hash paths(dataset:'have'); paths.defineKey('Customer_ID'); paths.defineData('Previous_Customer_ID', 'First_Acquisition_Date'); paths.defineDone(); end; set have; _H1 = First_Acquisition_Date; _H2 = Previous_Customer_ID; do _index = 1 to 1000 until (paths.find(key:Previous_Customer_ID) ne 0); end; if _index = > 1000 then do; put 'ERROR: Excessive path length or data loop' Customer_ID=; True_Acquisition_Date = .; end; else do; True_Acquisition_Date = First_Acquisition_Date; end; steps = _index - 1; First_Acquisition_Date = _H1; Previous_Customer_ID = _H2; attrib True_Acquisition_Date format=date9.; drop _:; run;
Output:
Hi @Baljit
In this case, you need to use a HASH object.
Please have a look at the following thread, which is exactly the same issue:
Best,
Here is the application of the method described in the thread (link above) to your use case:
/******************************************************/
/** MACRO (source: cf. link above) **/
/******************************************************/
%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');
do newClust = 1 by 1 while(not endLoop);
set &arcs end=endLoop;
call missing(clust); node = &from;
if 0^=nodes.find() then nodes.add();
fromClust = clust;
call missing(clust); node = &to;
if 0^=nodes.find() then nodes.add();
toClust = clust;
if n(fromClust, toClust) = 0 then do;
nodes.replace(key:&from, data:&from, data:newClust);
nodes.replace(key:&to, data:&to, data:newClust);
end;
else if missing(toClust) then
nodes.replace(key:&to, data:&to, data:fromClust);
else if missing(fromClust) then
nodes.replace(key:&from, data:&from, data:toClust);
else if fromClust ne toClust then do;
rc = nodeList.first();
do while (rc = 0);
if clust = fromClust then
nodes.replace(key:node, data:node, data:toClust);
rc = nodeList.next();
end;
end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;
/******************************************************/
data have;
length Customer_ID Previous_Customer_ID $10 First_Acquisition_Date 8;
input Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.;
format First_Acquisition_Date date9.;
datalines;
A103420830 . 01JAN2014
F293099201 E100453820 01JAN2019
B100453820 A103420830 01JAN2015
C293099201 B100453820 01JAN2016
D103420830 . 01JAN2017
E100453820 D103420830 01JAN2018
;
data have2;
set have;
if missing(Previous_Customer_ID) then Previous_Customer_ID=Customer_ID;
_numobs+1;
run;
%SubGraphs(have2,from=Customer_ID,to=Previous_Customer_ID,out=ref,exp=8);
proc sql;
create table have3 as
select a.*, b.clust
from have2 as a left join ref as b
on a.Customer_ID = b.node
order by clust, First_Acquisition_Date;
run;
data have4;
set have3;
by clust;
retain First_Acquisition_Date2;
format First_Acquisition_Date2 date9.;
if first.clust then do;
First_Acquisition_Date2 = First_Acquisition_Date;
call missing (Previous_Customer_ID);
end;
run;
proc sort data=have4 out=want (drop=_: clust);
by _numobs;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.