Help using Base SAS procedures

How to retrieve parent-child in a "chain" of relationships without using reflexing joins?

Accepted Solution Solved
Reply
SAS Employee
Posts: 13
Accepted Solution

How to retrieve parent-child in a "chain" of relationships without using reflexing joins?

Hi all,

I have a dataset with three columns with this kind of information:

 

OBS DATE CLIENT_ID OLD_CLIENT_ID
1 27-Sep-16 10 6
2 27-Sep-16 7 5
3 17-Jun-16 6 4
4 10-Sep-16 9 8
5 5-Mar-16 4 2

 

Where "date" column is the date where some client has made a change in his account id, and the other columns are the respective values at that moment.

As you can see in the example dataset posted, many (and unknown) number of relationships could exist in the data history. For example, the client "10" was before client "6" (obs 1) but before was client "4" (obs 3) and before of that was client "2".

Therefore, what I want is a final dataset with the client_id-old_client_id pairs taking account of what was said before, which will be like this:

 

OBS DATE CLIENT_ID OLD_CLIENT_ID
1 27-Sep-16 10 6
2 17-Jun-16 10 4
3 5-Mar-16 10 2
4 27-Sep-16 7 5
5 10-Sep-16 9 8

 

The dataset has ~ 4.000.000 obs.

There is no other variable to identify the client uniquely.

 

Is there a way to accomplish this without using reflexive joins? As I said, number of relationships are unknown so making reflexive joins could be worst-performing.

 

Thanks in advance!


Accepted Solutions
Solution
‎01-20-2017 08:18 AM
Super User
Posts: 17,750

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

There are a couple of solved questions on here with different HASH solutions to this problem.

 

This is based on a macro written by PGStats. You didn't show what you want as your final results, but this will get you closer, if its not what you want as an end result. Note the last parameter controls how 'deep' the search goes so you do need some idea or set it to an arbitrarily high number.

 

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;



data have;
informat obs 8. date date9.;
format date date9.;
input  OBS	DATE	CLIENT_ID	OLD_CLIENT_ID;
cards;
1	27Sep16	10	6
2	17Jun16	10	4
3	5Mar16	10	2
4	27Sep16	7	5
5	10Sep16	9	8
;
run;

%SubGraphs(have, from=old_client_id, to=client_id, out=want,exp=4);

proc sort data=want;
by clust node;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 4,641

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

How is this question different from your previous (solved) question

 

https://communities.sas.com/t5/SAS-Procedures/How-to-retrieve-the-last-value-in-a-quot-chain-quot-of...

PG
SAS Employee
Posts: 13

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Hi PGStats,

the expected result is not the same, as the output of the solved answer were 3 observations, meanwhile, for this question, the expected output has to got 5 observations.

 

Super User
Posts: 17,750

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi


lmignone wrote:

Hi PGStats,

the expected result is not the same, as the output of the solved answer were 3 observations, meanwhile, for this question, the expected output has to got 5 observations.

 


I'm confused. How does the 3/5 change the situation?

Those solutions don't seem dependent on the depth of the linkage. Can you explain with an example of how it doesn't work?

SAS Employee
Posts: 13

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

In the previous solution, the expected output was just the oldest client id for each client_id:

 

OBS

CLIENT_ID

OLD_CLIENT_ID
1 10 2
2 7 5
3 9 8

 

Meanwhile, in this question I'm asking for all the relationships that, for example, client_id 10 has in the input data. The expected output is this:

 

OBS DATE CLIENT_ID OLD_CLIENT_ID
1 27-Sep-16 10 6
2 17-Jun-16 10 4
3 5-Mar-16 10 2
4 27-Sep-16 7 5
5 10-Sep-16 9 8

 

Note that I don't need the relationship between 6-4 or between 4-2, as those ones has a newer client ID (10).

Super User
Posts: 17,750

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

So the solution I provided doesn't work for you?

SAS Employee
Posts: 13

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

It needs a further transformation in order to reach the desired result, but it works...

 

Thanks Reeza!

Solution
‎01-20-2017 08:18 AM
Super User
Posts: 17,750

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

There are a couple of solved questions on here with different HASH solutions to this problem.

 

This is based on a macro written by PGStats. You didn't show what you want as your final results, but this will get you closer, if its not what you want as an end result. Note the last parameter controls how 'deep' the search goes so you do need some idea or set it to an arbitrarily high number.

 

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;



data have;
informat obs 8. date date9.;
format date date9.;
input  OBS	DATE	CLIENT_ID	OLD_CLIENT_ID;
cards;
1	27Sep16	10	6
2	17Jun16	10	4
3	5Mar16	10	2
4	27Sep16	7	5
5	10Sep16	9	8
;
run;

%SubGraphs(have, from=old_client_id, to=client_id, out=want,exp=4);

proc sort data=want;
by clust node;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 242 views
  • 0 likes
  • 3 in conversation