SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

recursive joins with PROC SQL

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

recursive joins with PROC SQL

[ Edited ]

I have a recursive data I would want to re-structure analysis:

data :

data have;
input id pid ;
cards;
1 0
2 1
3 1
4 2
5 2
6 3
7 0
8 7
9 8
10 8
11 7

run;


 I would like to structure it like this :

id1id2id
124
125
136
789
7810
711.

 

The challenge I have is I don't know how many levels I have in the data set.

I tried the SAS support example (http://support.sas.com/kb/25/437.html) but I keep getting errors (ERROR: Ambiguous reference, column level is in more than one table.)

 

Any help is much appreciated.


Accepted Solutions
Solution
‎11-09-2017 11:46 AM
Super User
Posts: 8,053

Re: recursive joins with PROC SQL

Since this is an interesting issue (at least for me), I invested some brain cycles and came up with this:

data have;
input id pid;
cards;
1 0
2 1
3 1
4 2
5 2
6 3
7 0
8 7
9 8
10 8
11 7
;
run;

/* first, let's create an initial dataset that fits the parameterized SQL's */
proc sql;
create table children1 as
select pid as id, id as id1
from have;
quit;

/* now, we build a macro that iterates recursively until an empty result is created */
%macro recursion;
/* initialize */
%let lev=2;
%let flag=1;

%do %while (&flag.); /* until result is reached */

proc sql noprint;
create table children&lev. as /* create next level */
select a.*, b.id%eval(&lev.-1) as id&lev.
from children%eval(&lev.-1) a full join children%eval(&lev.-1) b
on b.id = a.id%eval(&lev.-1)
where
  a.id not in (0,.) /* remove root lines */ and
  a.id not in (select distinct id%eval(&lev.-1) from children%eval(&lev.-1) where id ne 0)
  /* the sub-select removes all lines that already appear as children of another */
order by a.id
;
/* check if newly created column is empty */
select coalesce(sum(id&lev.),0) into :flag from children&lev.;
/* coalesce needed because SQL sum() creates a missing value if all values are missing */
/* is different from data step sum() */
quit;

  %let lev=%eval(&lev.+1); /* increment */
%end;

/* since lev now points to a not-yet-created dataset, and we want the next-to-last (non-empty) one, reduce by 2 */
/* sort, so it looks pretty */
proc sort
  data=children%eval(&lev.-2)
  out=result
;
by id
%do i = 1 %to %eval(&lev.-2);
  id&i
%end;
;
run;
%mend;
%recursion

proc print data=result noobs;
run;

The result:

id    id1    id2

 1      2      4
 1      2      5
 1      3      6
 7      8      9
 7      8     10
 7     11      .

You will want to test it against larger, more complex data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎11-09-2017 11:46 AM
Super User
Posts: 8,053

Re: recursive joins with PROC SQL

Since this is an interesting issue (at least for me), I invested some brain cycles and came up with this:

data have;
input id pid;
cards;
1 0
2 1
3 1
4 2
5 2
6 3
7 0
8 7
9 8
10 8
11 7
;
run;

/* first, let's create an initial dataset that fits the parameterized SQL's */
proc sql;
create table children1 as
select pid as id, id as id1
from have;
quit;

/* now, we build a macro that iterates recursively until an empty result is created */
%macro recursion;
/* initialize */
%let lev=2;
%let flag=1;

%do %while (&flag.); /* until result is reached */

proc sql noprint;
create table children&lev. as /* create next level */
select a.*, b.id%eval(&lev.-1) as id&lev.
from children%eval(&lev.-1) a full join children%eval(&lev.-1) b
on b.id = a.id%eval(&lev.-1)
where
  a.id not in (0,.) /* remove root lines */ and
  a.id not in (select distinct id%eval(&lev.-1) from children%eval(&lev.-1) where id ne 0)
  /* the sub-select removes all lines that already appear as children of another */
order by a.id
;
/* check if newly created column is empty */
select coalesce(sum(id&lev.),0) into :flag from children&lev.;
/* coalesce needed because SQL sum() creates a missing value if all values are missing */
/* is different from data step sum() */
quit;

  %let lev=%eval(&lev.+1); /* increment */
%end;

/* since lev now points to a not-yet-created dataset, and we want the next-to-last (non-empty) one, reduce by 2 */
/* sort, so it looks pretty */
proc sort
  data=children%eval(&lev.-2)
  out=result
;
by id
%do i = 1 %to %eval(&lev.-2);
  id&i
%end;
;
run;
%mend;
%recursion

proc print data=result noobs;
run;

The result:

id    id1    id2

 1      2      4
 1      2      5
 1      3      6
 7      8      9
 7      8     10
 7     11      .

You will want to test it against larger, more complex data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 32

Re: recursive joins with PROC SQL

Posted in reply to KurtBremser

Thank you

PROC Star
Posts: 1,798

Re: recursive joins with PROC SQL

Posted in reply to KurtBremser

@KurtBremser Not really recursive imho as you don't have a function calling itself.

 

Rather you have a loop that builds something equivalent to this:

 

proc sql;
    create table T as select curr.*, new1.ID as ID1, new2.ID as ID2, new3.ID as ID3, new4.ID as ID4
    from HAVE curr 
           left join 
         HAVE new1 
           on curr.ID=new1.PID
           left join 
         HAVE new2 
            on new1.ID=new2.PID
           left join 
         HAVE new3 
           on new2.ID=new3.PID
           left join 
         HAVE new4 
           on new3.ID=new4.PID
  having PID=0
  order by 1,2,3,4,5;
quit;

Result ( I added a level with value 12 ): 

 

PID ID ID1 ID2 ID3 ID4
0 1 2 4 . .
0 1 2 5 . .
0 1 3 6 . .
0 7 8 9 . .
0 7 8 10 12 .
0 7 11 . . .

 

To me, a truly recursive macro should call itself, and could look like this:

%macro gorecurse(iter);
  %if &iter= %then %do;
    %let iter=1; 
    proc sql;
      create table LEV&iter. as select PID as ID0, ID as ID1 from HAVE;
    quit;
    %let iter=2; 
  %end;
  proc sql;
    create table LEV&iter. as select curr.* , new.ID1 as ID&iter. 
      from LEV%eval(&iter.-1) curr left join LEV%eval(&iter.-1) new 
      on  new.ID0=curr.ID%eval(&iter.-1)
      and curr.ID0=0 ;
    create table MATCHED as select unique ID%eval(&iter.-1) from LEV&iter. where ID&iter. ;
    delete from LEV&iter. where ID0 in (select * from MATCHED); 
  quit;
  %if &sqlobs. %then %gorecurse(%eval(&iter.+1)); 
%mend; 
%gorecurse
ID0 ID1 ID2 ID3 ID4 ID5
0 1 2 5 . .
0 7 8 9 . .
0 7 11 . . .
0 1 2 4 . .
0 1 3 6 . .
0 7 8 10 12 .

 

Just my 2 cents. Smiley Happy

 

 

 

 
Super User
Posts: 8,053

Re: recursive joins with PROC SQL

@ChrisNZ You are right of course. What I've done is the usual transformation of a recursive problem into a iterative algorithm, that one does where true recursive elements are either not present within a language, or where a recursive method might be a hindrance to further maintenance.

 

Nice work on the macro.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 8,174

Re: recursive joins with PROC SQL

There is some specific name for this type of thing, parent-child hierarchies or graphs.  You would find the following interesting reads:

http://support.sas.com/kb/25/968.html

 

https://communities.sas.com/t5/Base-SAS-Programming/Parent-Child-hierarchy/td-p/293028

 

Various solutions provided there.

PROC Star
Posts: 1,798

Re: recursive joins with PROC SQL

@RW9 Yes, but recursions tickle the mind more,,,  Smiley Happy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 448 views
  • 5 likes
  • 4 in conversation