SAS Programming

ErikLund_Jensen · Posted 09-10-2022 05:00 AM

Hi Community Members.

I have a network analysis problem.

The background is as follows: We have a lot of complicated DI Studio jobs creating multiple output tables. They are a mess of crossing arrows and very difficult to maintain, because the full canvas cannot fit into one screen without scrolling both vertical and horisontal. So we want to split them into smaller jobs if they contain independent structures. A structure is the set of transformations (steps) that contribute to one output table, and all structures that shares at least one transformation will go into one new job.

I have extracted metadata and created the "have" table as input to the analysis. (theoretical example, the real data set has more that 200.000 obs). So far there are no problems.

data have;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7 
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;

The next step is to identify independent structures in this list. To simplify the problem, the transformations can be left out, because they can be merged back after analysis:

proc sql;
  create table w2 as
    select distinct a.JobID, a.Structure, b.Structure as ShareStructure
    from have as a
    left join have as b
    on a.JobID = b.JobID and a.TransID = b.TransID;
quit;

The structure 5 occurs only once, so it could be an independent job, but how to group structures 1-2 and 3-4 as two independent jobs? - there can be up to 50 structures in a job. I am stuck there, though it seems a trivial problem, and I hope somebody can help with a solution that doesn't require products not found in Foundation.

Tre result should be this:

data want;
input JobID$ Structure NewJobID;
datalines;
A 1 1
A 2 1
A 3 2
A 4 2
A 5 3
;

Thanks!

Ksharp · Posted 09-10-2022 07:43 AM

data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7 
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
  create table w2 as
    select distinct a.JobID, a.Structure, b.Structure as ShareStructure
    from temp as a
    left join temp as b
    on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
















data have;
 set w2(rename=(Structure=from ShareStructure=to));
 run;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

View solution in original post

PeterClemmensen · Posted 09-10-2022 05:37 AM

Just to be sure, you don't have an OR license right?

The DATA to DATA Step Macro
Blog: SASnrd

ErikLund_Jensen · Posted 09-10-2022 06:35 AM

@PeterClemmensen

Right. Only Base, Graph and Stat + some acess modules.

I think it can be done with a two-dimensional array to hold Structures in one dimension and all shared structures in the other dimension within a job and a lot of array transversing, but that is complicated to build. So I hope somebody smarter than me can suggest a simple and elegant solution.

Ksharp · Posted 09-10-2022 07:43 AM

data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7 
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
  create table w2 as
    select distinct a.JobID, a.Structure, b.Structure as ShareStructure
    from temp as a
    left join temp as b
    on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
















data have;
 set w2(rename=(Structure=from ShareStructure=to));
 run;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

ErikLund_Jensen · Posted 09-10-2022 09:42 AM

Hi @Ksharp

Thank you very much.

It works perfect for one Job. Due to my poor understanding of the hash object i couldn't figure out how to obtain a separate analysis per JobID and get the JobID through the code to output. Bot I found a workaround by using concatenated keys, so my problem is solved.

Ksharp · Posted 09-11-2022 06:07 AM

"Bot I found a workaround by using concatenated keys, so my problem is solved."

Yes. you are right , concate JobID and Structure , as your wish :

data have;
 set w2;
 length from to $ 80;
 from=catx('|',JobID,Structure);
 to=catx('|',JobID,ShareStructure);
keep from to;
 run;

ErikLund_Jensen · Posted 09-11-2022 10:37 AM

@Ksharp

Thanks again!

Up and running now with a weekly rapport. it seems the time has come to learn the use of hash objects , as I have a bad feeling about using code I don't understand. I also never got round to DS2 or Proc Fedsql, Maybe they too were worth looking into...

Ksharp · Posted 09-12-2022 06:55 AM

" as I have a bad feeling about using code I don't understand. "
Me too. I would test it far more enough before using it.

" I also never got round to DS2 or Proc Fedsql, Maybe they too were worth looking into..."
Me too. I never learned DS2 or Fedsql either, and don't know how to judge them .But if you are in SAS Viya environment, maybe you need them.

BTW, Hash Table is a must-have tool for sas programmer .

SAS Programming

Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Re: Network analysis problem - identifying independent structures

Using Network Analysis and Machine Learning to Identify Virus Spread T...

Chow Test for Structural Breaks

Using Network Analysis and Machine Learning to Identify Virus Spread T...

Time to Mix Up Your Analysis with MIXED Models

Network Analysis in SAS Visual Analytics: Part 3 of Biostats in the T...

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...