Friends, someone has already done recursion in sas?
example.
source table :
KEY | NAME | FATHER | DATE_START | DATE_END |
---|---|---|---|---|
1 | SÃO PAULO | 2 | 01/01/1900 | 12/31/2015 |
2 | BRASIL | 01/01/1900 | 12/31/2015 | |
3 | SANTOS | 1 | 01/01/1900 | 12/31/2015 |
4 | PRAIA GRANDE | 3 | 01/01/1900 | 12/31/2015 |
5 | ESPIRITO SANTO | 2 | 01/01/1900 | 12/31/2015 |
6 | VITORIA | 5 | 01/01/1900 | 12/31/2015 |
7 | MATA DA PRAIA | 6 | 01/01/1900 | 12/31/2015 |
8 | BAHIA | 2 | 01/01/1900 | 12/31/2015 |
9 | SALVADOR | 8 | 01/01/1900 | 12/31/2015 |
result table:
SK | LEVEL | NAME | KEY | FATHER | DATE_START | DATE_END |
---|---|---|---|---|---|---|
1 | 1 | BRASIL | 2 | 01/01/1900 | 12/31/2015 | |
2 | 2 | -SÃO PAULO | 1 | 2 | 01/01/1900 | 12/31/2015 |
3 | 3 | --SANTOS | 3 | 1 | 01/01/1900 | 12/31/2015 |
4 | 4 | ---PRAIA GRANDE | 4 | 3 | 01/01/1900 | 12/31/2015 |
5 | 2 | -ESPIRITO SANTO | 5 | 2 | 01/01/1900 | 12/31/2015 |
6 | 3 | --VITORIA | 6 | 5 | 01/01/1900 | 12/31/2015 |
7 | 4 | ---MATA DA PRAIA | 7 | 6 | 01/01/1900 | 12/31/2015 |
8 | 2 | -BAHIA | 8 | 2 | 01/01/1900 | 12/31/2015 |
9 | 3 | --SALVADOR | 9 | 8 | 01/01/1900 | 12/31/2015 |
if possible like that ... result table two:
SK | DATE_START | DATE_END | KEY_1 | NAME_1 | KEY_2 | NAME_2 | KEY_3 | NAME_3 | KEY_4 | NAME_4 | LEVEL |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 2 | BRASIL | 2 | BRASIL | 2 | BRASIL | 1 |
2 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 1 | SÃO PAULO | 1 | SÃO PAULO | 1 | SÃO PAULO | 2 |
3 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 1 | SÃO PAULO | 3 | SANTOS | 3 | SANTOS | 3 |
4 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 1 | SÃO PAULO | 3 | SANTOS | 4 | PRAIA GRANDE | 4 |
5 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 5 | ESPIRITO SANTO | 5 | ESPIRITO SANTO | 5 | ESPIRITO SANTO | 2 |
6 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 5 | ESPIRITO SANTO | 6 | VITORIA | 6 | VITORIA | 3 |
7 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 5 | ESPIRITO SANTO | 6 | VITORIA | 7 | MATA DA PRAIA | 4 |
8 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 8 | BAHIA | 8 | BAHIA | 8 | BAHIA | 2 |
9 | 01/01/1900 | 12/31/2015 | 2 | BRASIL | 8 | BAHIA | 9 | SALVADOR | 9 | SALVADOR | 3 |
Thank you!
In proc fcmp and proc ds2 you can define functions, that can call themselves.
link statement in datastep can be also used, but very carefully.
If you want to look for ancestors or descendants or traverse a graph, sometimes you can accomplish this using a loop.
Typically you load such data structures into a hash object but other alternatives techniques are also possible (array, set point=, set key=).
you have any example?
not yet worked out ..
I did not understand
Your question has been addressed a few times in this forum.
One does not need recursion to solve the problem.
2 observations:
1- The Father (or Parent) of a lineage does not show up in the Key column.
2- The "end-of-lineage" child does not show in the Father (or Parent) column.
If a child does not become Father, then the lineage stops. End-of-lineage.
So one finds fairly easily the "end-of-lineage" children and then one goes back to reach the Father.
From what you show as the intended result, you have repeated the "end-of-lineage" child several times to pad records.
The following is a solution (no padding of records). You need to replace
Key -> zChild
Father -> zParent
Level -> T_gen
Your suggested output assumes no more than 4 generations per lineage. My proposed solution assumes no more than 9999 generations per lineage.
In the proposed solution:
1- dataset t_a is the input dataset;
2- dataset t_want is the output dataset.
3- does not pad the records.
4- array GG(10000) covers column KEY of your input; if you want the output to have the names, assign another array, say, HH(10000) to cover the names.
Whatever is done to array GG(), do it to array HH().
/*************************/
/*** proposed solution ***/
/*************************/
data t_want(keep=t_gen P1 G1-G9999);
length c_end 8.;
t_max=0;
array GG(10000) P1 G1-G9999;
if _N_=1 then do;
if 0 then set t_a;
declare hash ha(dataset:'t_a',multidata:'N' );
ha.definekey('zChild');
ha.definedata('zChild','zParent');
ha.definedone();
declare hash hb(dataset:'t_a',multidata:'N' );
hb.definekey('zParent');
hb.definedata('zChild','zParent');
hb.definedone();
declare hash hd(multidata:'N');
hd.definekey('c_end');
declare hiter dIter('hd');
hd.definedone();
end;
/******************************************/
/*** make list of end-of-line children ***/
/******************************************/
do until (zDone);
set t_a(rename=(zchild=zc zparent=zp)) nobs=n_last end=zDone;
zParent=zC;
if hb.find() then do;
c_end=zParent;
if hd.check() then do; hd.add(); end;
end;
end;
/****************************************************/
/*** for each end-of-line child, find the lineage ***/
/****************************************************/
rc = dIter.first();
do while (rc = 0);
aDone=0;
t_gen=0;
do until (aDone);
if (t_gen=0) then do; t_gen+1; GG(t_gen)= c_end; zChild=c_end; end;
else if not(ha.find()) then do; t_gen+1; GG(t_gen)= zParent; zChild=zParent; end;
else do; aDone=1; end;
end;
/********************************/
/*** reverse lineage sequence ***/
/********************************/
do i=1 to int(t_gen/2);
x1= GG(i);
GG(i)= GG(t_gen+1-i);
GG(t_gen+1-i)=x1;
end;
output;
/********************************************************/
/*** with one lineage completed, clear out array GG() ***/
/********************************************************/
call missing(of GG(*));
t_max=max(t_max,t_gen);
rc = dIter.next();
end;
/***************************************************************/
/*** remove the fields with only null values for all records ***/
/***************************************************************/
call execute('data t_want; set t_want(keep=t_gen P1 G1-'||compress('G'||put(t_max-1,6.))||');run;');
run;
- ERROR: Not all variables in the list G1-G0 were found.
- WARNING: The data set WORK.W8B2NR may be incomplete. When this step was stopped there were 0 observations and 0 variables.
- WARNING: Data set WORK.W8B2NR was not replaced because this step was stopped.
/*************************/
/*** proposed solution ***/
/*************************/
data WORK.W8B2NR(keep=t_gen P1 G1-G9999);
length c_end 8.;
t_max=0;
array GG(10000) P1 G1-G9999;
if _N_=1 then do;
if 0 then set ORA_BISI.TEMP_HSETOR;
declare hash ha(dataset:'ORA_BISI.TEMP_HSETOR',multidata:'N' );
ha.definekey('SETOR');
ha.definedata('SETOR','PAISETOR');
ha.definedone();
declare hash hb(dataset:'ORA_BISI.TEMP_HSETOR',multidata:'N' );
hb.definekey('PAISETOR');
hb.definedata('SETOR','PAISETOR');
hb.definedone();
declare hash hd(multidata:'N');
hd.definekey('c_end');
declare hiter dIter('hd');
hd.definedone();
end;
/******************************************/
/*** make list of end-of-line children ***/
/******************************************/
do until (zDone);
set ORA_BISI.TEMP_HSETOR(rename=(SETOR=zc PAISETOR=zp)) nobs=n_last end=zDone;
PAISETOR=zC;
if hb.find() then do;
c_end=PAISETOR;
if hd.check() then do; hd.add(); end;
end;
end;
/****************************************************/
/*** for each end-of-line child, find the lineage ***/
/****************************************************/
rc = dIter.first();
do while (rc = 0);
aDone=0;
t_gen=0;
do until (aDone);
if (t_gen=0) then do; t_gen+1; GG(t_gen)= c_end; SETOR=c_end; end;
else if not(ha.find()) then do; t_gen+1; GG(t_gen)= PAISETOR; SETOR=PAISETOR; end;
else do; aDone=1; end;
end;
/********************************/
/*** reverse lineage sequence ***/
/********************************/
do i=1 to int(t_gen/2);
x1= GG(i);
GG(i)= GG(t_gen+1-i);
GG(t_gen+1-i)=x1;
end;
output;
/********************************************************/
/*** with one lineage completed, clear out array GG() ***/
/********************************************************/
call missing(of GG(*));
t_max=max(t_max,t_gen);
rc = dIter.next();
end;
/***************************************************************/
/*** remove the fields with only null values for all records ***/
/***************************************************************/
call execute('data WORK.W8B2NR; set WORK.W8B2NR(keep=t_gen P1 G1-'||compress('G'||put(t_max-1,6.))||');run;');
run;
Here is an example :
data have; infile datalines truncover; input Key $ Name & $40. Father $; datalines; 1 SO PAULO 2 2 BRASIL 3 SANTOS 1 4 PRAIA GRANDE 3 5 ESPIRITO SANTO 2 6 VITORIA 5 7 MATA DA PRAIA 6 8 BAHIA 2 9 SALVADOR 8 ; run; data ancestor; if _n_ eq 1 then do; if 0 then set have; declare hash h(dataset:'have',hashexp:20); h.definekey('Key'); h.definedone(); end; set have; if h.check(key:Father) ne 0; run; data want (keep = path); if _n_ eq 1 then do; length path _path $ 4000 ; if 0 then set have; declare hash ha(hashexp:20,dataset:'have(where=(Father is not missing and Key is not missing))',multidata:'Y'); ha.definekey('Father'); ha.definedata('Key','Name'); ha.definedone(); declare hash pa(ordered:'Y'); declare hiter hi_path('pa'); pa.definekey('count'); pa.definedata('path'); pa.definedone(); end; set ancestor; count=1; path=catx('|',Name,Key); pa.add(); do while(hi_path.next()=0); _path=path; Father=scan(path,-1,'|'); rc=ha.find(); if rc ne 0 then output; do while(rc=0); if not find(path,strip(Key)) then do; count+1; path=catx('|',path,Name,Key); pa.add(); path=_path; end; rc=ha.find_next(); end; end; pa.clear(); run; data final_want; set want; length sk level 8 Name Key Father $ 80 ; level=0; do i=1 to countw(path,'|') by 2; sk+1;level+1; Name=scan(path,i,'|'); Key=scan(path,i+1,'|'); output; Father=Key; end; drop i path ; run;
Xia Keshan
it is almost friend, need to get those who have no father, and also need to take up the last level information even if she does not have more children.
thank you very much, you are the best!
What else do you need more ? You can process my result to fit your demand later .
Methinks you were too quick in dismissing billfish's suggested code. Other than the dates, which you've never mentioned how you want to treat, the following additions to billfish's code results in the table you said you would like to attain:
libname ora_bsi "c:\temp";
data ora_bsi.temp_hsetor;
informat name $20.;
informat date_start date_end mmddyy10.;
format date_start date_end mmddyy10.;
infile cards dlm='09'x;
input KEY NAME FATHER DATE_START DATE_END;
cards;
1 SÃO PAULO 2 01/01/1900 12/31/2015
2 BRASIL . 01/01/1900 12/31/2015
3 SANTOS 1 01/01/1900 12/31/2015
4 PRAIA GRANDE 3 01/01/1900 12/31/2015
5 ESPIRITO SANTO 2 01/01/1900 12/31/2015
6 VITORIA 5 01/01/1900 12/31/2015
7 MATA DA PRAIA 6 01/01/1900 12/31/2015
8 BAHIA 2 01/01/1900 12/31/2015
9 SALVADOR 8 01/01/1900 12/31/2015
;
data t_a;
set ora_bsi.temp_hsetor (rename=(key=zChild father=zParent));
run;
/*************************/
/*** proposed solution ***/
/*************************/
data t_want(keep=t_gen P1 G1-G9999);
length c_end 8.;
t_max=0;
array GG(10000) P1 G1-G9999;
if _N_=1 then do;
if 0 then set t_a;
declare hash ha(dataset:'t_a',multidata:'N' );
ha.definekey('zChild');
ha.definedata('zChild','zParent');
ha.definedone();
declare hash hb(dataset:'t_a',multidata:'N' );
hb.definekey('zParent');
hb.definedata('zChild','zParent');
hb.definedone();
declare hash hd(multidata:'N');
hd.definekey('c_end');
declare hiter dIter('hd');
hd.definedone();
end;
/******************************************/
/*** make list of end-of-line children ***/
/******************************************/
do until (zDone);
set t_a(rename=(zchild=zc zparent=zp)) nobs=n_last end=zDone;
zParent=zC;
if hb.find() then do;
c_end=zParent;
if hd.check() then do; hd.add(); end;
end;
end;
/****************************************************/
/*** for each end-of-line child, find the lineage ***/
/****************************************************/
rc = dIter.first();
do while (rc = 0);
aDone=0;
t_gen=0;
do until (aDone);
if (t_gen=0) then do; t_gen+1; GG(t_gen)= c_end; zChild=c_end; end;
else if not(ha.find()) then do; t_gen+1; GG(t_gen)= zParent; zChild=zParent; end;
else do; aDone=1; end;
end;
/********************************/
/*** reverse lineage sequence ***/
/********************************/
do i=1 to int(t_gen/2);
x1= GG(i);
GG(i)= GG(t_gen+1-i);
GG(t_gen+1-i)=x1;
end;
output;
/********************************************************/
/*** with one lineage completed, clear out array GG() ***/
/********************************************************/
call missing(of GG(*));
t_max=max(t_max,t_gen);
rc = dIter.next();
end;
/***************************************************************/
/*** remove the fields with only null values for all records ***/
/***************************************************************/
call execute('data t_want; set t_want(keep=t_gen P1 G1-'||compress('G'||put(t_max-1,6.))||');run;');
run;
data ctrl;
set ora_bsi.temp_hsetor (rename=(key=start
name=label)) end=last;
retain fmtname 'names' type 'n';
output;
if last then do;
hlo='O';
label='***ERROR***';
output;
end;
run;
proc format library=work cntlin=ctrl;
run;
data want (keep=key: name: level sk);
set t_want;
array oldkey(*) g:;
array key_(4);
array name_(4) $20.;
retain mainkey;
if _n_ eq 1 then do;
do i=1 to t_gen-1;
key_(i)=oldkey(1);
name_(i)=put(oldkey(1),names.);
end;
mainkey=oldkey(1);
sk=1;
level=1;
output;
end;
else do;
key_(1)=mainkey;
name_(1)=put(key_(1),names.);
level=1;
end;
if 2 le t_gen-1 then do;
do i=2 to 4;
key_(i)=oldkey(2);
name_(i)=put(key_(i),names.);
end;
sk+1;
level+1;
output;
end;
if 3 le t_gen-1 then do;
do i=3 to 4;
key_(i)=oldkey(3);
name_(i)=put(key_(i),names.);
end;
sk+1;
level+1;
output;
end;
if 4 le t_gen-1 then do;
key_(4)=oldkey(4);
name_(4)=put(key_(4),names.);
sk+1;
level+1;
output;
end;
run;
data want;
retain sk key_1 name_1 key_2 name_2 key_3 name_3 key_4 name_4 level;
set want;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.