Help using Base SAS procedures

How to retrieve the last value in a "chain" of relationships without using reflexing joins?

Accepted Solution Solved
Reply
SAS Employee
Posts: 13
Accepted Solution

How to retrieve the last value in a "chain" of relationships without using reflexing joins?

Hi all,

I have a dataset with three columns with this kind of information:

 

OBS DATE CLIENT_ID OLD_CLIENT_ID
1 27-Sep-16 10 6
2 27-Sep-16 7 5
3 17-Jun-16 6 4
4 10-Sep-16 9 8
5 5-Mar-16 4 2

 

Where "date" column is the date where some client has made a change in his account id, and the other columns are the respective values at that moment.

As you can see in the example dataset posted, many (and unknown) number of relationships could exist in the data history. For example, the client "10" was before client "6" (obs 1) but before was client "4" and before of that was client "2".

Therefore, what I want is a final dataset with the client_id-old_client_id pair taking account of what was said before, which will be like this:

 

OBS

CLIENT_ID

OLD_CLIENT_ID
1 10 2
2 7 5
3 9 8

 

Is there a way to accomplish this without using reflexive joins? As I said, number of relationships are unknown so making reflexive joins could be worst-performing.

 

Thanks in advance!


Accepted Solutions
Solution
‎09-28-2016 09:30 AM
Grand Advisor
Posts: 9,444

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

As Patrick did.
Assuming One client_id corresponding to One old_client_id.


data chain;
   infile datalines truncover;
   input date date9. client_id old_client_id;
   format date date9.;
   datalines;
27-sep=16 10 6
27-sep-16 7 5
17-jun-16 6 4
10-sep-16 9 8
05-mar-16 4 2
; 
run;
data start;
 if _n_=1 then do;
  if 0 then set chain;
  declare hash h(dataset:'chain');
  h.definekey('old_client_id');
  h.definedone();
 end;
set chain;
if h.check(key:client_id) ne 0;
keep client_id date;
run;
data want;
 if _n_=1 then do;
  if 0 then set chain;
  declare hash h(dataset:'chain');
  h.definekey('client_id');
  h.definedata('old_client_id');
  h.definedone(); 
 end;
set start;
rc=h.find();
do while(rc=0);
 rc=h.find(key:old_client_id);
end;
drop rc;
run;


View solution in original post


All Replies
Super User
Posts: 1,159

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

I have the feeling that some in memory methods are the right mean to solve your problem.

Before tryning to code the program, it is good to know how big is the dataset?

or more specifically:

- how many observations are in the dataset?

- how many clients are expected in the result dataset ?

 

one more question - is there any other variable that can be used to identify client uniquely ?

having such variable will simplify the code.

Super User
Posts: 1,159

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

With a file up to some thousands of clients you can use the attached tested program

Attachment
SAS Employee
Posts: 13

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

Hi Shmuel,

 

Regarding your questions:

 

- The dataset has ~ 4.000.000 obs.

- The clients expected are ~95% of the obs of the dataset, as most of the relationships has one change only.

There is no other variable to identify the client uniquely.

Respected Advisor
Posts: 4,606

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

It can be done with indexed random access:

 

data have;
input OBS	DATE :anydtdte.	CLIENT_ID	OLD_CLIENT_ID;
format date yymmdd10.;
datalines;
1	27-Sep-16	10	6
2	27-Sep-16	7	5
3	17-Jun-16	6	4
4	10-Sep-16	9	8
5	5-Mar-16	4	2
;

proc sql;
create unique index client_id on have(client_id);
create table current_clients as
select client_id as last_id
from have 
where client_id not in (select old_client_id from have);
quit;

data want;
set current_clients;
client_id = last_id;
do i = 1 to 100 until(_error_); /* Prevent cycling */
    set have key=client_id / unique;
    if _error_ then do;
        first_id = client_id;
        output;
        end;
    else client_id = old_client_id;
    end;
_error_ = 0;
keep first_id last_id;
run;
PG
Respected Advisor
Posts: 3,740

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

Below should work as long as you have enough memory for the hashes AND each parent has only one child (so it's a single chain and not a tree with branches).

 

data chain;
   infile datalines truncover;
   input date date9. client_id old_client_id;
   format date date9.;
   datalines;
27-sep=16 10 6
27-sep-16 7 5
17-jun-16 6 4
10-sep-16 9 8
05-mar-16 4 2
05-mar-16 20 
; 
run;

data want(keep=date client_id old_client_id);
  set chain(keep=date client_id old_client_id);
  if _n_=1 then
    do;
      if 0 then set chain(keep=client_id old_client_id rename=(client_id=child old_client_id=parent));
      dcl hash root (dataset:'chain(keep=client_id old_client_id rename=(client_id=child old_client_id=parent))');
      _rc=root.defineKey('child');
      _rc=root.defineData('parent');
      _rc=root.defineDone();
      dcl hash leaf (dataset:'chain(keep=client_id old_client_id rename=(client_id=child old_client_id=parent) where=(parent ne .))');
      _rc=leaf.defineKey('parent');
      _rc=leaf.defineData('child');
      _rc=leaf.defineDone();

    end;
  
  parent=old_client_id;
  do while(root.find(key:parent)=0);
  end;

  child=client_id;
  do while(leaf.find(key:child)=0);
  end;
  
  if client_id=child then 
    do;
      old_client_id=parent;
      output;
    end;
run; 

 

In case your data volumes are high and performance becomes an issue then it would be possible to delete entries from the hashes once we found root and parent - but it would require additional coding and I've tried to keep things simple.

 

Solution
‎09-28-2016 09:30 AM
Grand Advisor
Posts: 9,444

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

As Patrick did.
Assuming One client_id corresponding to One old_client_id.


data chain;
   infile datalines truncover;
   input date date9. client_id old_client_id;
   format date date9.;
   datalines;
27-sep=16 10 6
27-sep-16 7 5
17-jun-16 6 4
10-sep-16 9 8
05-mar-16 4 2
; 
run;
data start;
 if _n_=1 then do;
  if 0 then set chain;
  declare hash h(dataset:'chain');
  h.definekey('old_client_id');
  h.definedone();
 end;
set chain;
if h.check(key:client_id) ne 0;
keep client_id date;
run;
data want;
 if _n_=1 then do;
  if 0 then set chain;
  declare hash h(dataset:'chain');
  h.definekey('client_id');
  h.definedata('old_client_id');
  h.definedone(); 
 end;
set start;
rc=h.find();
do while(rc=0);
 rc=h.find(key:old_client_id);
end;
drop rc;
run;


SAS Employee
Posts: 13

Re: How to retrieve the last value in a "chain" of relationships without using reflexing j

After testing the different solutions that were posted, I have to choose this one due to performance (45% faster processing than using reflexive sql joins).

 

Thanks to all Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 379 views
  • 0 likes
  • 5 in conversation