Re: Hierachy data (Parent Child)

anandbillava · Posted 08-27-2018 01:52 PM

Hi There,

I have a data like this

data a;
a=1; b=5.1; c=5.5; output;
a=1; b=5.2; c=5.5; output;
a=1; b=5.0; c=5.2; output;

a=1; b=4.7; c=5.0; output;
a=1; b=4.6; c=5.2; output;

a=1; b=.; c=4.6; output;

run;

I am trying to create a output in which for a given parent nodie (variable c) it should get all the childs/nodes (variable b) which should not have any child/nodes. for e.g in the above example for parent node with value 5.5 it should output 5.1, 4.7 and 4.6 which do not have any child.

Here is how the output is picked.

5.5 has 5.1 and 5.2 child, then 5.2 has child 5.0, 5.0 has child 4.7 and 5.2 has child 4.6. 4.6, 4.7 and 5.1 do not have any childs.

Please let me know if you need more expalnation.

Thanks in advance for the help.

Anand

.

PGStats · Posted 08-27-2018 02:33 PM

Are we to understand that a child node has a number < parent node number? Why not code the child node number as missing when it is greater than the parent node?

PG

PeterClemmensen · Posted 08-27-2018 02:53 PM

Is your desired result a data set? Can you show us what your desired result looks like?

The DATA to DATA Step Macro
Blog: SASnrd

anandbillava · Posted 08-27-2018 03:39 PM

I this case the output will be

new variable

5.1

4.7

4.6

All this do not have any child nodes or children for the ancestor or parent 5.5.

Basically when user asks variable c with value 5.5 then output should be look like above.

Steps are

1. The program reads c variable where value is 5.5 and then looks for the children which is variable b and the values are 5.1 and 5.2

2. For each of b value which is 5.1 and 5.2 the program looks for children and their children are for 5.1 no children so it will appear in the report and 5.2 has children 5.0... and it goes on for all the childrens...

Ksharp · Posted 08-28-2018 11:05 AM

data have(keep=_start _end);
a=1; b='5.1'; c='5.5'; output;
a=1; b='5.2'; c='5.5'; output;
a=1; b='5.0'; c='5.2'; output;

a=1; b='4.7'; c='5.0'; output;
a=1; b='4.6'; c='5.2'; output;

a=1; b=' '; c='4.6'; output;
rename c=_start b=_end;
run;
proc sql;
create table _have as
 select *
  from have where _start not in (select _end from have);
quit;


data want(keep=path want);
if _n_ eq 1 then do;
length path _path  $ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set _have;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
   
pa.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;   
 _start=scan(path,-1,'|');
 rc=ha.find();
 if rc ne 0 then do; want=_start;output;end;
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add(); 
   path=_path;
 end;
  rc=ha.find_next();
end;
end;
pa.clear();

run;

anandbillava · Posted 08-28-2018 11:30 AM

Thank you very much. Its kind of working. But I need to output this based on value a and c. In this case the variable value for a is 1 and c is 5.5.

data a;
a=1; b=5.1; c=5.5; output;
a=1; b=5.2; c=5.5; output;
a=1; b=5.0; c=5.2; output;
a=1; b=4.7; c=5.0; output;
a=1; b=4.6; c=5.2; output;
a=1; b=.; c=4.6; output;

a=2; b=5.4; c=6.1; output;
a=2; b=5.3; c=6.1; output;
a=2; b=5.0; c=5.3; output;
a=2; b=4.7; c=5.0; output;
a=2; b=4.5; c=4.7; output;
run;

For the second set of record where a=2 and lets say value c is 6.1 the children output must be as below

5.4

4.5

Only these two do not have children.

I am sorry for not mentioning this case earlier.

run;

Ksharp · Posted 08-29-2018 07:41 AM

You could split this table into two tables and running my code separately .

If you have many a, make a macro to go through all of a . That is not too difficulty .

Ksharp · Posted 08-29-2018 07:45 AM

You could split this table into two tables and running my code separately .

If you have many a, make a macro to go through all of a . That is not too difficulty .

Ksharp · Posted 08-30-2018 10:47 AM

OK. Since I have some time. Check the following code .


data have;
a=1; b=5.1; c=5.5; output;
a=1; b=5.2; c=5.5; output;
a=1; b=5.0; c=5.2; output;
a=1; b=4.7; c=5.0; output;
a=1; b=4.6; c=5.2; output;
a=1; b=.; c=4.6; output;

 

a=2; b=5.4; c=6.1; output;
a=2; b=5.3; c=6.1; output;
a=2; b=5.0; c=5.3; output;
a=2; b=4.7; c=5.0; output;
a=2; b=4.5; c=4.7; output;

rename c=_start b=_end;
run;



proc sql;
create table _have as
 select distinct a,_start
  from have where catx(' ',a,_start) not in (select catx(' ',a,_end) from have);
quit;


data want(keep=a _start  want);
if _n_ eq 1 then do;
if 0 then set have;
length key n path 8;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('a','_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

declare hash k();
k.definekey('key');
k.definedone();
end;


set _have;
count=1;n=1;_n=1;
path=_start;pa.add();
key=_start;k.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 rc=ha.find(key:a,key:path);
 if rc ne 0 then do; want=path;output;end;
 do while(rc=0);
  if k.check(key:_end) ne 0 then do;
   count+1;n=count;
   path=_end;
   pa.add();   
   key=_end;
   k.add();
 end;
  rc=ha.find_next();
end;
end;
pa.clear();	k.clear();

run;

anandbillava · Posted 08-30-2018 03:35 PM

Thank you.. I am able to solve this issue with another approach. Thanks for all your help

anandbillava · Posted 08-05-2019 04:15 PM

Can you help me in solving problem below. I am providing here the appropriate ancestor for each of the record. I need to find this ancestor for each of the record by user. Can you please help.

data a;

user=1; base=4.1; cur=4.2; output; /* 4.1 to 4.2 - Ancestor here is 4.4 */

user=1; base=4.2; cur=4.4; output; /* 4.2 to 4.4 - Ancestor here is 4.4 */

user=1; base=4.6; cur=5.5; output; /* 4.6 to 5.5 - Ancestor here is 6.1 */

user=1; base=5.5; cur=6.1; output; /* 5.5 to 6.1 - Ancestor here is 6.1 */

user=2; base=4.2; cur=4.6; output; /* 4.2 to 4.6 - Ancestor here is 6.1 */

user=2; base=4.6; cur=5.5; output; /* 4.6 to 5.5 - Ancestor here is 6.1 */

user=2; base=5.5; cur=6.1; output; /* 5.0 to 6.1 - Ancestor here is 6.1 */

user=3; base=4.0; cur=4.2; output; /* 4.0 to 4.2 - Ancestor here is 5.5 */

user=3; base=4.2; cur=4.6; output; /* 4.2 to 4.6 - Ancestor here is 5.5 */

user=3; base=4.6; cur=5.5; output; /* 4.6 to 6.1 - Ancestor here is 5.5 */

run;

I am able to achieve this if there is only one user in the dataset using code below. But its not working if there are more than one user.

data _tmp1(rename=(base=base1 cur=cur1));

set a;

run;

**Repeat the training module for each of the delta version between base and current module;

data _tmp2;

retain base cur;

set _tmp1; by u;

cur = cur1;

base = base1;

output;

drop base1 cur1 i;

run;

**Get all the base version for the current modue which has training then need to check the

corresponding base version full access;

data _lineage;

set _tmp2; by u;

array hist(6);

count=1;

hist(count)=base;

do i=1 to last while (count<dim(hist)-1);

set _tmp2(rename=(base=base1 cur=cur1)) nobs=last point=i;

if cur=base1 then do;

count+1;

hist(count)=base1;

base=base1;

cur=cur1;

i=1;

end;

count+1;

hist(count)=cur;

ancest = cur;

run;

Ksharp · Posted 08-06-2019 10:06 AM

You can split the table into three tables and run your code separately ? data user1; set a; if user=1; run; ...........

anandbillava · Posted 08-06-2019 11:23 AM

Hi ,

I have lot of records in the dataset not just 3. It will take lot of time to process if i write a macro to subset these. Is there a way i can do it in one step.

Thank you,

andreas_lds · Posted 08-06-2019 12:33 PM

@anandbillava wrote:

Hi ,

I have lot of records in the dataset not just 3. It will take lot of time to process if i write a macro to subset these. Is there a way i can do it in one step.

Thank you,

Creating subsets won't decrease the time required to create the final result. Do you get any errors while processing the complete dataset.

anandbillava · Posted 08-06-2019 01:12 PM

I will have to create more than 4000 datasets which is am already doing and its taking lot of time to process.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!