BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Swapnil_21
Obsidian | Level 7

Hi,

 

Its plain transitive dependency finder task for me but somehow I am not able to generate the proper code

E.g. A depends on B and B depends on A so this combination is transitive . C depends on D , D depends on E and F , E depends on C so this combination is also transitive. So I need to flag out such records irrespective of how much is the depth

Below is the sample data 


data have;
input col1: $1. col2: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C

;
run;

 

 

Below is the output I want

col1 col2 flag
A B NotOk
P Q Ok
B A NotOk
C D NotOk
D E NotOk
D F NotOk
E C NotOk
R   Ok

 

Any help is really appreciated

1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ

Here's an approach that uses the network solver in PROC OPTMODEL:

data have;
   infile datalines truncover;
   input col1: $1. col2: $1.;
   datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;

proc optmodel printlevel=0;
   /* read input data */
   set <str,str> LINKS_ORIG;
   read data have into LINKS_ORIG=[col1 col2];
   set <str> NODES;
   NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
   set <str,str> LINKS;
   LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};

   /* find weakly connected components */
   num component {NODES};
   solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
   set COMPONENTS;
   COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
   set <str> NODES_c {COMPONENTS} init {};
   num compThis;
   for {i in NODES} do;
      compThis = component[i];
      NODES_c[compThis] = NODES_c[compThis] union {i};
   end;

   /* for each component, find a directed cycle if possible */
   num okNode {NODES} init 1;
   cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
      put c=;
      solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
      if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
   end;

   /* create output data set */
   create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;

View solution in original post

14 REPLIES 14
PeterClemmensen
Tourmaline | Level 20

I can't quite figure out the logic here.

 

  • A points to B and B points to A --> 'Not Ok'. 
  • P points to Q, but Q does not point back to P --> 'Ok'
  • C points to D, D points to both E and F. Only E points back to C --> 'Not Ok'. 

How is the last case different from the second? There is still a loose 'end', as F does not point back to C?

Swapnil_21
Obsidian | Level 7
It means if any of the element depends back to parent then it's transitive dependency. So third case has one element which points back to parent so it's not OK. First case does not have any dependency or missing dependency so it's OK
PeterClemmensen
Tourmaline | Level 20

Also, do you have SAS/OR?

Swapnil_21
Obsidian | Level 7
No we don't have SAS/OR
PeterClemmensen
Tourmaline | Level 20

The best approach is to use one of the Opt Procedures in SAS/OR to solve the problem. @Rick_SAS provided links to examples. 

 

If you do not have access to SAS/OR, the code written by @PGStats or @Ksharp in this link can probably be tweaked to solve the problem. 

Rick_SAS
SAS Super FREQ

Seem like each letter represents a node in a directed graph, and the rows indicate directed edges (or links). You want to find out if there is a cycle from one node back to itself by traversing other nodes.

 

The OPTGRAPH procedure in SAS/OR can solve these types of problems. It looks like there is a problem with the SAS documentation pages this morning, but you should be able (eventually) to get to the example from this page:

SAS OPTGRAPH Procedure | SAS Support

Here is a link to a very old (SAS/OR 14.1) page that shows how to perform transitive closure operations: Example 1.13 Transitive Closure for Identification of Circular Dependencies in a Bug Tracking System...

 

Matthew_Galati
SAS Employee

Just a clarification on Rick's comment. PROC OPTGRAPH is not part of SAS/OR, but PROC OPTNET is:

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/ornoaug/ornoaug_optnet_details40.htm

 

You can also get transitive closure, cycle, and connected components in Viya in PROC NETWORK (in Visual Data Mining and Machine Learning) or PROC OPTNETWORK (in Optimization).

PeterClemmensen
Tourmaline | Level 20

@Matthew_Galati , learned something new. Thanks! 🙂

Ksharp
Super User

You want find a dead loop / circle component?

Here is an example .

 




data have;
infile datalines truncover;
input _start : $1. _end: $1.;
datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;
run;


data _null_;
if _n_ eq 1 then do;
length path _path  $ 800 ;

if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;

set have(where=(_start is not missing and _end is not missing));
count=1;n=1;_n=1;
path=catx(' ',_start,_end);
pa.add();
do while(hi_path.next()=0);

if n ne 1 then pa.remove(key:_n);_n=n;

_path=path;   
_start=scan(path,-1,' ');
rc=ha.find();
do while(rc=0);
  if not findw(path,strip(_end)) then do;

   if length(path)+length(_end)+1 gt lengthc(path) then do; put path= _end=;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
                                      count+1;n=count;
                                      path=catx(' ',path,_end);
                                      pa.add();
                                      path=_path;
                                     end;
     else do;put path= _end=;putlog 'FOUND:' _end 'is a circle component.';end;
  rc=ha.find_next();
end;
end;
pa.clear();
run;
NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测
      WHERE (_start is not null) and (_end is not null);
path=A B _end=A
FOUND:A is a circle component.
path=B A _end=B
FOUND:B is a circle component.
path=C D E _end=C
FOUND:C is a circle component.
path=D E C _end=D
FOUND:D is a circle component.
path=E C D _end=E
FOUND:E is a circle component.
NOTE: 从数据集 WORK.HAVE. 读取了 7 个观测
      WHERE (_start is not null) and (_end is not null);
NOTE: “DATA 语句”所用时间(总处理时间):
      实际时间          0.19 秒
      CPU 时间          0.12 秒

Swapnil_21
Obsidian | Level 7

@Ksharp  

The code works correctly as per requirement for given sample but it doesnt work for the actual data. Attached is the actual dataset. I should have flag for 21-BSE-CUST_WBC-00001 

However, it doesnt show anything . Can you please check where I am doing wrong 

Ksharp
Super User
I check RobPrat's code result . And found they are all 'Ok'. That means your data don't have any circle component.

You don't see anything in LOG ,that means your data don't have any circle component.(a.k.a My code is right).

if you think 21-BSE-CUST_WBC-00001  is a circle component. Could you show the path in your data ?
RobPratt
SAS Super FREQ

Here's an approach that uses the network solver in PROC OPTMODEL:

data have;
   infile datalines truncover;
   input col1: $1. col2: $1.;
   datalines;
A B
P Q
B A
C D
D E
D F
E C
R 
;

proc optmodel printlevel=0;
   /* read input data */
   set <str,str> LINKS_ORIG;
   read data have into LINKS_ORIG=[col1 col2];
   set <str> NODES;
   NODES = (union {<i,j> in LINKS_ORIG} {i,j}) diff {''};
   set <str,str> LINKS;
   LINKS = {<i,j> in LINKS_ORIG: {i,j} within NODES};

   /* find weakly connected components */
   num component {NODES};
   solve with network / concomp direction=undirected nodes=(include=NODES) links=(include=LINKS) out=(concomp=component);
   set COMPONENTS;
   COMPONENTS = 1.._OROPTMODEL_NUM_['NUM_COMPONENTS'];
   set <str> NODES_c {COMPONENTS} init {};
   num compThis;
   for {i in NODES} do;
      compThis = component[i];
      NODES_c[compThis] = NODES_c[compThis] union {i};
   end;

   /* for each component, find a directed cycle if possible */
   num okNode {NODES} init 1;
   cofor {c in COMPONENTS: card(NODES_c[c]) > 1} do;
      put c=;
      solve with network / cycle direction=directed links=(include=LINKS) subgraph=(nodes=NODES_c[c]);
      if _OROPTMODEL_NUM_['NUM_CYCLES'] then for {i in NODES_c[c]} okNode[i] = 0;
   end;

   /* create output data set */
   create data want from [col1 col2]=LINKS_ORIG flag=(if okNode[col1] then 'Ok' else 'NotOk');
quit;
Swapnil_21
Obsidian | Level 7
@RobPratt.. u r GOD... super awesomely working.....
Matthew_Galati
SAS Employee

Here is an approach that uses Viya's OPTNETWORK to solve it. This uses the same algorithms that @RobPratt shows through the OPTMODEL interface. It finds the connected components and then runs cycle detection by group (where group = connected component).

For large scale this approach would scale nicely as it does all the heavy work on the server (either shared or distributed memory).

 

/* Find weakly connected components (ignore singletons) */
proc optnetwork
   links    = sascas1.have(where=(col2 ne ''))
   outLinks = sascas1.ConCompLinks;
   connectedComponents;
   linksVar from=col1 to=col2;
run;

/* Determine if the component is directed acyclic. */ 
proc optnetwork
   direction = directed
   links     = sascas1.ConCompLinks;
   linksVar from=col1 to=col2;
   cycle;
   displayout
      SolutionSummary = SolutionSummary;
   by concomp;
run;

/* For each component that is not acyclic, its nodes are NotOk (and append singletons as Ok). */
proc sql;
   create table tmp as
   select a.*, b.numCycles as hasCycle
   from sascas1.ConCompLinks as a left join sascas1.SolutionSummary as b
   on a.concomp = b.concomp;
quit;
data want;
   set tmp sascas1.have(where=(col2 eq ''));
   if(hasCycle > 0) then flag="NotOk"; else flag="Ok";
run;
proc print data=want; run;

 

 

 

 

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1258 views
  • 6 likes
  • 6 in conversation