Solved: Transitive Dependency Finder using code

Swapnil_21 · Posted 03-30-2022 05:06 AM

Hi,

Its plain transitive dependency finder task for me but somehow I am not able to generate the proper code

E.g. A depends on B and B depends on A so this combination is transitive . C depends on D , D depends on E and F , E depends on C so this combination is also transitive. So I need to flag out such records irrespective of how much is the depth

Below is the sample data

data have;
input col1: $1. col2: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C

R

;
run;

Below is the output I want

col1	col2	flag
A	B	NotOk
P	Q	Ok
B	A	NotOk
C	D	NotOk
D	E	NotOk
D	F	NotOk
E	C	NotOk
R		Ok

Any help is really appreciated

RobPratt · Posted 03-30-2022 12:11 PM

Here's an approach that uses the network solver in PROC OPTMODEL:

data have;
   infile datalines truncover;
   input col1: $1. col2: $1.;
   datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;

proc optmodel printlevel=0;
   /* read input data */
   set <str,str> LINKS_ORIG;
   read data have into LINKS_ORIG=[col1 col2];
   set <str> NODES;
   NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
   set <str,str> LINKS;
   LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};

   /* find weakly connected components */
   num component {NODES};
   solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
   set COMPONENTS;
   COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
   set <str> NODES_c {COMPONENTS} init {};
   num compThis;
   for {i in NODES} do;
      compThis = component[i];
      NODES_c[compThis] = NODES_c[compThis] union {i};
   end;

   /* for each component, find a directed cycle if possible */
   num okNode {NODES} init 1;
   cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
      put c=;
      solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
      if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
   end;

   /* create output data set */
   create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;

View solution in original post

PeterClemmensen · Posted 03-30-2022 06:00 AM

I can't quite figure out the logic here.

A points to B and B points to A --> 'Not Ok'.
P points to Q, but Q does not point back to P --> 'Ok'
C points to D, D points to both E and F. Only E points back to C --> 'Not Ok'.

How is the last case different from the second? There is still a loose 'end', as F does not point back to C?

The DATA to DATA Step Macro
Blog: SASnrd

Swapnil_21 · Posted 03-30-2022 06:19 AM

It means if any of the element depends back to parent then it's transitive dependency. So third case has one element which points back to parent so it's not OK. First case does not have any dependency or missing dependency so it's OK

PeterClemmensen · Posted 03-30-2022 06:15 AM

Also, do you have SAS/OR?

The DATA to DATA Step Macro
Blog: SASnrd

Swapnil_21 · Posted 03-30-2022 06:19 AM

No we don't have SAS/OR

PeterClemmensen · Posted 03-30-2022 06:35 AM

The best approach is to use one of the Opt Procedures in SAS/OR to solve the problem. @Rick_SAS provided links to examples.

If you do not have access to SAS/OR, the code written by @PGStats or @Ksharp in this link can probably be tweaked to solve the problem.

The DATA to DATA Step Macro
Blog: SASnrd

Rick_SAS · Posted 03-30-2022 06:31 AM

Seem like each letter represents a node in a directed graph, and the rows indicate directed edges (or links). You want to find out if there is a cycle from one node back to itself by traversing other nodes.

The OPTGRAPH procedure in SAS/OR can solve these types of problems. It looks like there is a problem with the SAS documentation pages this morning, but you should be able (eventually) to get to the example from this page:

SAS OPTGRAPH Procedure | SAS Support

Here is a link to a very old (SAS/OR 14.1) page that shows how to perform transitive closure operations: Example 1.13 Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System...

Matthew_Galati · Posted 03-31-2022 08:53 AM

Just a clarification on Rick's comment. PROC OPTGRAPH is not part of SAS/OR, but PROC OPTNET is:

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/ornoaug/ornoaug_optnet_details40.htm

You can also get transitive closure, cycle, and connected components in Viya in PROC NETWORK (in Visual Data Mining and Machine Learning) or PROC OPTNETWORK (in Optimization).

PeterClemmensen · Posted 03-31-2022 10:44 AM

@Matthew_Galati , learned something new. Thanks! 🙂

The DATA to DATA Step Macro
Blog: SASnrd

Ksharp · Posted 03-30-2022 08:23 AM

You want find a dead loop / circle component?

Here is an example .




data have;
infile datalines truncover;
input _start : $1. _end: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;
run;


data _null_;
if _n_ eq 1 then do;
length path _path  $ 800 ;

if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;

set have(where=(_start is not missing and _end is not missing));
count=1;n=1;_n=1;
path=catx(' ',_start,_end);
pa.add();
do while(hi_path.next()=0);

if n ne 1 then pa.remove(key:_n);_n=n;

_path=path;   
_start=scan(path,-1,' ');
rc=ha.find();
do while(rc=0);
  if not findw(path,strip(_end)) then do;

   if length(path)+length(_end)+1 gt lengthc(path) then do; put path= _end=;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
                                      count+1;n=count;
                                      path=catx(' ',path,_end);
                                      pa.add();
                                      path=_path;
                                     end;
     else do;put path= _end=;putlog 'FOUND:' _end 'is a circle component.';end;
  rc=ha.find_next();
end;
end;
pa.clear();
run;

NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测
      WHERE (_start is not null) and (_end is not null);
path=A B _end=A
FOUND:A is a circle component.
path=B A _end=B
FOUND:B is a circle component.
path=C D E _end=C
FOUND:C is a circle component.
path=D E C _end=D
FOUND:D is a circle component.
path=E C D _end=E
FOUND:E is a circle component.
NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测
      WHERE (_start is not null) and (_end is not null);
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间          0.19 秒
      CPU 时间          0.12 秒

Swapnil_21 · Posted 03-30-2022 09:07 AM

@Ksharp

The code works correctly as per requirement for given sample but it doesnt work for the actual data. Attached is the actual dataset. I should have flag for 21-BSE-CUST_WBC-00001

However, it doesnt show anything . Can you please check where I am doing wrong

Ksharp · Posted 03-31-2022 07:41 AM

I check RobPrat's code result . And found they are all 'Ok'. That means your data don't have any circle component.

You don't see anything in LOG ,that means your data don't have any circle component.(a.k.a My code is right).

if you think 21-BSE-CUST_WBC-00001  is a circle component. Could you show the path in your data ?

RobPratt · Posted 03-30-2022 12:11 PM

Here's an approach that uses the network solver in PROC OPTMODEL:

data have;
   infile datalines truncover;
   input col1: $1. col2: $1.;
   datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;

proc optmodel printlevel=0;
   /* read input data */
   set <str,str> LINKS_ORIG;
   read data have into LINKS_ORIG=[col1 col2];
   set <str> NODES;
   NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
   set <str,str> LINKS;
   LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};

   /* find weakly connected components */
   num component {NODES};
   solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
   set COMPONENTS;
   COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
   set <str> NODES_c {COMPONENTS} init {};
   num compThis;
   for {i in NODES} do;
      compThis = component[i];
      NODES_c[compThis] = NODES_c[compThis] union {i};
   end;

   /* for each component, find a directed cycle if possible */
   num okNode {NODES} init 1;
   cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
      put c=;
      solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
      if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
   end;

   /* create output data set */
   create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;

Swapnil_21 · Posted 03-30-2022 09:07 PM

@RobPratt.. u r GOD... super awesomely working.....

Matthew_Galati · Posted 03-31-2022 12:34 PM

Here is an approach that uses Viya's OPTNETWORK to solve it. This uses the same algorithms that @RobPratt shows through the OPTMODEL interface. It finds the connected components and then runs cycle detection by group (where group = connected component).

For large scale this approach would scale nicely as it does all the heavy work on the server (either shared or distributed memory).

/* Find weakly connected components (ignore singletons) */
proc optnetwork
   links    = sascas1.have(where=(col2 ne ''))
   outLinks = sascas1.ConCompLinks;
   connectedComponents;
   linksVar from=col1 to=col2;
run;

/* Determine if the component is directed acyclic. */ 
proc optnetwork
   direction = directed
   links     = sascas1.ConCompLinks;
   linksVar from=col1 to=col2;
   cycle;
   displayout
      SolutionSummary = SolutionSummary;
   by concomp;
run;

/* For each component that is not acyclic, its nodes are NotOk (and append singletons as Ok). */
proc sql;
   create table tmp as
   select a.*, b.numCycles as hasCycle
   from sascas1.ConCompLinks as a left join sascas1.SolutionSummary as b
   on a.concomp = b.concomp;
quit;
data want;
   set tmp sascas1.have(where=(col2 eq ''));
   if(hasCycle > 0) then flag="NotOk"; else flag="Ok";
run;
proc print data=want; run;

Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code

Re: Transitive Dependency Finder using code