Hi,
Its plain transitive dependency finder task for me but somehow I am not able to generate the proper code
E.g. A depends on B and B depends on A so this combination is transitive . C depends on D , D depends on E and F , E depends on C so this combination is also transitive. So I need to flag out such records irrespective of how much is the depth
Below is the sample data
data have;
input col1: $1. col2: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R
;
run;
Below is the output I want
col1 | col2 | flag |
A | B | NotOk |
P | Q | Ok |
B | A | NotOk |
C | D | NotOk |
D | E | NotOk |
D | F | NotOk |
E | C | NotOk |
R | Ok |
Any help is really appreciated
Here's an approach that uses the network solver in PROC OPTMODEL:
data have;
infile datalines truncover;
input col1: $1. col2: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R
;
proc optmodel printlevel=0;
/* read input data */
set <str,str> LINKS_ORIG;
read data have into LINKS_ORIG=[col1 col2];
set <str> NODES;
NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
set <str,str> LINKS;
LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};
/* find weakly connected components */
num component {NODES};
solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
set COMPONENTS;
COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
set <str> NODES_c {COMPONENTS} init {};
num compThis;
for {i in NODES} do;
compThis = component[i];
NODES_c[compThis] = NODES_c[compThis] union {i};
end;
/* for each component, find a directed cycle if possible */
num okNode {NODES} init 1;
cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
put c=;
solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
end;
/* create output data set */
create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;
I can't quite figure out the logic here.
How is the last case different from the second? There is still a loose 'end', as F does not point back to C?
Also, do you have SAS/OR?
Seem like each letter represents a node in a directed graph, and the rows indicate directed edges (or links). You want to find out if there is a cycle from one node back to itself by traversing other nodes.
The OPTGRAPH procedure in SAS/OR can solve these types of problems. It looks like there is a problem with the SAS documentation pages this morning, but you should be able (eventually) to get to the example from this page:
SAS OPTGRAPH Procedure | SAS Support
Here is a link to a very old (SAS/OR 14.1) page that shows how to perform transitive closure operations: Example 1.13 Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System...
Just a clarification on Rick's comment. PROC OPTGRAPH is not part of SAS/OR, but PROC OPTNET is:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/ornoaug/ornoaug_optnet_details40.htm
You can also get transitive closure, cycle, and connected components in Viya in PROC NETWORK (in Visual Data Mining and Machine Learning) or PROC OPTNETWORK (in Optimization).
@Matthew_Galati , learned something new. Thanks! 🙂
You want find a dead loop / circle component?
Here is an example .
data have;
infile datalines truncover;
input _start : $1. _end: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R
;
run;
data _null_;
if _n_ eq 1 then do;
length path _path $ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set have(where=(_start is not missing and _end is not missing));
count=1;n=1;_n=1;
path=catx(' ',_start,_end);
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,' ');
rc=ha.find();
do while(rc=0);
if not findw(path,strip(_end)) then do;
if length(path)+length(_end)+1 gt lengthc(path) then do; put path= _end=;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
count+1;n=count;
path=catx(' ',path,_end);
pa.add();
path=_path;
end;
else do;put path= _end=;putlog 'FOUND:' _end 'is a circle component.';end;
rc=ha.find_next();
end;
end;
pa.clear();
run;
NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测 WHERE (_start is not null) and (_end is not null); path=A B _end=A FOUND:A is a circle component. path=B A _end=B FOUND:B is a circle component. path=C D E _end=C FOUND:C is a circle component. path=D E C _end=D FOUND:D is a circle component. path=E C D _end=E FOUND:E is a circle component. NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测 WHERE (_start is not null) and (_end is not null); NOTE: “DATA 语句”所用时间(总处理时间): 实际时间 0.19 秒 CPU 时间 0.12 秒
The code works correctly as per requirement for given sample but it doesnt work for the actual data. Attached is the actual dataset. I should have flag for 21-BSE-CUST_WBC-00001
However, it doesnt show anything . Can you please check where I am doing wrong
I check RobPrat's code result . And found they are all 'Ok'. That means your data don't have any circle component. You don't see anything in LOG ,that means your data don't have any circle component.(a.k.a My code is right). if you think 21-BSE-CUST_WBC-00001 is a circle component. Could you show the path in your data ?
Here's an approach that uses the network solver in PROC OPTMODEL:
data have;
infile datalines truncover;
input col1: $1. col2: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R
;
proc optmodel printlevel=0;
/* read input data */
set <str,str> LINKS_ORIG;
read data have into LINKS_ORIG=[col1 col2];
set <str> NODES;
NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
set <str,str> LINKS;
LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};
/* find weakly connected components */
num component {NODES};
solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
set COMPONENTS;
COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
set <str> NODES_c {COMPONENTS} init {};
num compThis;
for {i in NODES} do;
compThis = component[i];
NODES_c[compThis] = NODES_c[compThis] union {i};
end;
/* for each component, find a directed cycle if possible */
num okNode {NODES} init 1;
cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
put c=;
solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
end;
/* create output data set */
create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;
Here is an approach that uses Viya's OPTNETWORK to solve it. This uses the same algorithms that @RobPratt shows through the OPTMODEL interface. It finds the connected components and then runs cycle detection by group (where group = connected component).
For large scale this approach would scale nicely as it does all the heavy work on the server (either shared or distributed memory).
/* Find weakly connected components (ignore singletons) */
proc optnetwork
links = sascas1.have(where=(col2 ne ''))
outLinks = sascas1.ConCompLinks;
connectedComponents;
linksVar from=col1 to=col2;
run;
/* Determine if the component is directed acyclic. */
proc optnetwork
direction = directed
links = sascas1.ConCompLinks;
linksVar from=col1 to=col2;
cycle;
displayout
SolutionSummary = SolutionSummary;
by concomp;
run;
/* For each component that is not acyclic, its nodes are NotOk (and append singletons as Ok). */
proc sql;
create table tmp as
select a.*, b.numCycles as hasCycle
from sascas1.ConCompLinks as a left join sascas1.SolutionSummary as b
on a.concomp = b.concomp;
quit;
data want;
set tmp sascas1.have(where=(col2 eq ''));
if(hasCycle > 0) then flag="NotOk"; else flag="Ok";
run;
proc print data=want; run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.