BookmarkSubscribeRSS Feed
Baljit
Calcite | Level 5

I have a dataset that contains a large number of customer IDs, a previous customer ID and a variable that represents the first acquisition date. Occasionally the customer ID is changed  and the value of previous customer ID is updated accordingly. This can happen multiple times and unfortunately, each time this happens the first acquisition date is updated to the date of the ID change and is not carried over. 

 

Current Dataset Example:

           Customer_ID           Previous_Customer_ID           First_Acquisition_Date

           A103420830                                                             01JAN2014

           B100453820            A103420830                            01JAN2015

           C293099201            B100453820                            01JAN2016

 

I need to retain the true first date of acquisition against these ID changes but I am having some difficulties dealing with more than one ID change. Any help would be appreciated.

8 REPLIES 8
ed_sas_member
Meteorite | Level 14

Hi @Baljit 

Could you please specify what should be the expected output?

Would something like this be convenient?

data want;
	set have;
	retain First_Acquisition_Date2;
	format First_Acquisition_Date2 date9.;
	if missing(Previous_Customer_ID) then First_Acquisition_Date2=First_Acquisition_Date;
run;

Best,

Baljit
Calcite | Level 5

Hi, The expected output is as follows:

 

           Customer_ID           Previous_Customer_ID           First_Acquisition_Date          TRUE_ACQUISITION_DATE

           A103420830                                                             01JAN2014                           01JAN2014

           B100453820            A103420830                            01JAN2015                            01JAN2014

           C293099201            B100453820                            01JAN2016                           01JAN2014

Baljit
Calcite | Level 5

Hi @ed_sas_member,

 

The expected output would have an additional column called True_Acquisition_Date and for the example above, all values would be 01JAN2014.

 

Cheers

Baljit
Calcite | Level 5
That would be great and it is something I have tried, but in reality the data doesn't come in order of customer and previous customer id. Do you know how I could order it like this?
RichardDeVen
Barite | Level 11

The data represent links in a transactional path.  You can store links data in a HASH object and traverse the HASH through the links to the head of the path, which would have the sole first date.  

 

Example:

data have;
length Customer_ID  Previous_Customer_ID $10 First_Acquisition_Date 8;
input 
Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.;
format First_Acquisition_Date date9.;
datalines;
A103420830  .            01JAN2014
B100453820  A103420830   01JAN2015
C293099201  B100453820   01JAN2016
;

data want;
  if _n_ = 0 then set have;

  if _n_ = 1 then do;
    declare hash paths(dataset:'have');
    paths.defineKey('Customer_ID');
    paths.defineData('Previous_Customer_ID', 'First_Acquisition_Date');
    paths.defineDone();
  end;

  set have;

  _H1 = First_Acquisition_Date;
  _H2 = Previous_Customer_ID;

  do _index = 1 to 1000 until (paths.find(key:Previous_Customer_ID) ne 0);
  end;

  if _index = > 1000 then do;
    put 'ERROR: Excessive path length or data loop' Customer_ID=;
    True_Acquisition_Date = .;
  end;
  else do;
    True_Acquisition_Date = First_Acquisition_Date;
  end;

  steps = _index - 1;
  First_Acquisition_Date = _H1;
  Previous_Customer_ID = _H2;

  attrib True_Acquisition_Date format=date9.;

  drop _:;
run;

Output:

paths.png

ed_sas_member
Meteorite | Level 14

Hi @Baljit 

In this case, you need to use a HASH object.

Please have a look at the following thread, which is exactly the same issue:

https://communities.sas.com/t5/SAS-Programming/Specifying-the-observation-as-in-same-group/m-p/64774...

 

Best,

 

ed_sas_member
Meteorite | Level 14

Here is the application of the method described in the thread (link above) to your use case:

/******************************************************/
/** MACRO (source: cf. link above)					 **/
/******************************************************/

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
	data _null_;
		if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
		length clust 8;
		declare hash nodes(hashexp:&exp);
		nodes.defineKey('node');
		nodes.defineData('node', 'clust');
		nodes.defineDone();
		declare hiter nodeList('nodes');

		do newClust = 1 by 1 while(not endLoop);
			set &arcs end=endLoop;
			call missing(clust); node = &from;
			if 0^=nodes.find() then nodes.add(); 
			fromClust = clust; 
			call missing(clust); node = &to;
			if 0^=nodes.find() then nodes.add(); 
			toClust = clust;
			if n(fromClust, toClust) = 0 then do;
				nodes.replace(key:&from, data:&from, data:newClust);
				nodes.replace(key:&to, data:&to, data:newClust);
			end;
			else if missing(toClust) then 
				nodes.replace(key:&to, data:&to, data:fromClust);
			else if missing(fromClust) then 
				nodes.replace(key:&from, data:&from, data:toClust);
			else if fromClust ne toClust then do;
				rc = nodeList.first();
				do while (rc = 0);
					if clust = fromClust then 
						nodes.replace(key:node, data:node, data:toClust);
					rc = nodeList.next();
				end;
			end;
		end;
		nodes.output(dataset:"&out");
		stop;
	run;
%mend SubGraphs;

/******************************************************/

data have;
	length Customer_ID  Previous_Customer_ID $10 First_Acquisition_Date 8;
	input Customer_ID Previous_Customer_ID First_Acquisition_Date: date9.;
	format First_Acquisition_Date date9.;
	datalines;
A103420830  .            01JAN2014
F293099201  E100453820   01JAN2019
B100453820  A103420830   01JAN2015
C293099201  B100453820   01JAN2016
D103420830  .            01JAN2017
E100453820  D103420830   01JAN2018
;

data have2;
	set have;
	if missing(Previous_Customer_ID) then Previous_Customer_ID=Customer_ID;
	_numobs+1;
run;

%SubGraphs(have2,from=Customer_ID,to=Previous_Customer_ID,out=ref,exp=8);

proc sql;
	create table have3 as
	select a.*, b.clust
	from have2 as a left join ref as b
	on a.Customer_ID = b.node
	order by clust, First_Acquisition_Date;
run;

data have4;
	set have3;
	by clust;
	retain First_Acquisition_Date2;
	format First_Acquisition_Date2 date9.;
	if first.clust then do;
		First_Acquisition_Date2 = First_Acquisition_Date;
		call missing (Previous_Customer_ID);
	end;
run;

proc sort data=have4 out=want (drop=_: clust);
	by _numobs;
run;
Ksharp
Super User
Can you post more data and the OUTPUT you are looking for ?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 796 views
  • 3 likes
  • 4 in conversation