Solved: Re: How to retrieve parent-child in a "chain" of relationships without...

lmignone · Posted 01-19-2017 02:49 PM

Hi all,

I have a dataset with three columns with this kind of information:

OBS	DATE	CLIENT_ID	OLD_CLIENT_ID
1	27-Sep-16	10	6
2	27-Sep-16	7	5
3	17-Jun-16	6	4
4	10-Sep-16	9	8
5	5-Mar-16	4	2

Where "date" column is the date where some client has made a change in his account id, and the other columns are the respective values at that moment.

As you can see in the example dataset posted, many (and unknown) number of relationships could exist in the data history. For example, the client "10" was before client "6" (obs 1) but before was client "4" (obs 3) and before of that was client "2".

Therefore, what I want is a final dataset with the client_id-old_client_id pairs taking account of what was said before, which will be like this:

OBS	DATE	CLIENT_ID	OLD_CLIENT_ID
1	27-Sep-16	10	6
2	17-Jun-16	10	4
3	5-Mar-16	10	2
4	27-Sep-16	7	5
5	10-Sep-16	9	8

The dataset has ~ 4.000.000 obs.

There is no other variable to identify the client uniquely.

Is there a way to accomplish this without using reflexive joins? As I said, number of relationships are unknown so making reflexive joins could be worst-performing.

Thanks in advance!

Reeza · Posted 01-19-2017 03:43 PM

There are a couple of solved questions on here with different HASH solutions to this problem.

This is based on a macro written by PGStats. You didn't show what you want as your final results, but this will get you closer, if its not what you want as an end result. Note the last parameter controls how 'deep' the search goes so you do need some idea or set it to an arbitrarily high number.

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;



data have;
informat obs 8. date date9.;
format date date9.;
input  OBS	DATE	CLIENT_ID	OLD_CLIENT_ID;
cards;
1	27Sep16	10	6
2	17Jun16	10	4
3	5Mar16	10	2
4	27Sep16	7	5
5	10Sep16	9	8
;
run;

%SubGraphs(have, from=old_client_id, to=client_id, out=want,exp=4);

proc sort data=want;
by clust node;
run;

View solution in original post

PGStats · Posted 01-19-2017 03:36 PM

How is this question different from your previous (solved) question

https://communities.sas.com/t5/SAS-Procedures/How-to-retrieve-the-last-value-in-a-quot-chain-quot-of...

PG

lmignone · Posted 01-19-2017 03:42 PM

Hi PGStats,

the expected result is not the same, as the output of the solved answer were 3 observations, meanwhile, for this question, the expected output has to got 5 observations.

Reeza · Posted 01-19-2017 03:52 PM

@lmignone wrote:

Hi PGStats,

the expected result is not the same, as the output of the solved answer were 3 observations, meanwhile, for this question, the expected output has to got 5 observations.

I'm confused. How does the 3/5 change the situation?

Those solutions don't seem dependent on the depth of the linkage. Can you explain with an example of how it doesn't work?

lmignone · Posted 01-19-2017 04:07 PM

In the previous solution, the expected output was just the oldest client id for each client_id:

OBS	CLIENT_ID	OLD_CLIENT_ID
1	10	2
2	7	5
3	9	8

Meanwhile, in this question I'm asking for all the relationships that, for example, client_id 10 has in the input data. The expected output is this:

OBS	DATE	CLIENT_ID	OLD_CLIENT_ID
1	27-Sep-16	10	6
2	17-Jun-16	10	4
3	5-Mar-16	10	2
4	27-Sep-16	7	5
5	10-Sep-16	9	8

Note that I don't need the relationship between 6-4 or between 4-2, as those ones has a newer client ID (10).

Reeza · Posted 01-19-2017 04:46 PM

So the solution I provided doesn't work for you?

lmignone · Posted 01-20-2017 08:17 AM

It needs a further transformation in order to reach the desired result, but it works...

Thanks Reeza!

Reeza · Posted 01-19-2017 03:43 PM

There are a couple of solved questions on here with different HASH solutions to this problem.

This is based on a macro written by PGStats. You didn't show what you want as your final results, but this will get you closer, if its not what you want as an end result. Note the last parameter controls how 'deep' the search goes so you do need some idea or set it to an arbitrarily high number.

%macro SubGraphs(arcs,from=from,to=to,out=Clusters,exp=8);
data _null_;
if 0 then set &arcs(keep=&from rename=(&from=node)); /* get node data type */
length clust 8;
declare hash nodes(hashexp:&exp);
nodes.defineKey('node');
nodes.defineData('node', 'clust');
nodes.defineDone();
declare hiter nodeList('nodes');

do newClust = 1 by 1 while(not endLoop);
	set &arcs end=endLoop;
	call missing(clust); node = &from;
	if 0^=nodes.find() then nodes.add(); 
	fromClust = clust; 
	call missing(clust); node = &to;
	if 0^=nodes.find() then nodes.add(); 
	toClust = clust;
	if n(fromClust, toClust) = 0 then do;
		nodes.replace(key:&from, data:&from, data:newClust);
		nodes.replace(key:&to, data:&to, data:newClust);
	end;
	else if missing(toClust) then 
		nodes.replace(key:&to, data:&to, data:fromClust);
	else if missing(fromClust) then 
		nodes.replace(key:&from, data:&from, data:toClust);
	else if fromClust ne toClust then do;
		rc = nodeList.first();
		do while (rc = 0);
			if clust = fromClust then 
				nodes.replace(key:node, data:node, data:toClust);
			rc = nodeList.next();
		end;
	end;
end;
nodes.output(dataset:"&out");
stop;
run;
%mend SubGraphs;



data have;
informat obs 8. date date9.;
format date date9.;
input  OBS	DATE	CLIENT_ID	OLD_CLIENT_ID;
cards;
1	27Sep16	10	6
2	17Jun16	10	4
3	5Mar16	10	2
4	27Sep16	7	5
5	10Sep16	9	8
;
run;

%SubGraphs(have, from=old_client_id, to=client_id, out=want,exp=4);

proc sort data=want;
by clust node;
run;

How to retrieve parent-child in a "chain" of relationships without using reflexing joins?

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

Re: How to retrieve parent-child in a "chain" of relationships without using reflexing joi

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away