Hi Community Members.
I have a network analysis problem.
The background is as follows: We have a lot of complicated DI Studio jobs creating multiple output tables. They are a mess of crossing arrows and very difficult to maintain, because the full canvas cannot fit into one screen without scrolling both vertical and horisontal. So we want to split them into smaller jobs if they contain independent structures. A structure is the set of transformations (steps) that contribute to one output table, and all structures that shares at least one transformation will go into one new job.
I have extracted metadata and created the "have" table as input to the analysis. (theoretical example, the real data set has more that 200.000 obs). So far there are no problems.
data have;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
The next step is to identify independent structures in this list. To simplify the problem, the transformations can be left out, because they can be merged back after analysis:
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from have as a
left join have as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
The structure 5 occurs only once, so it could be an independent job, but how to group structures 1-2 and 3-4 as two independent jobs? - there can be up to 50 structures in a job. I am stuck there, though it seems a trivial problem, and I hope somebody can help with a solution that doesn't require products not found in Foundation.
Tre result should be this:
data want;
input JobID$ Structure NewJobID;
datalines;
A 1 1
A 2 1
A 3 2
A 4 2
A 5 3
;
Thanks!
data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from temp as a
left join temp as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
data have;
set w2(rename=(Structure=from ShareStructure=to));
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Just to be sure, you don't have an OR license right?
Right. Only Base, Graph and Stat + some acess modules.
I think it can be done with a two-dimensional array to hold Structures in one dimension and all shared structures in the other dimension within a job and a lot of array transversing, but that is complicated to build. So I hope somebody smarter than me can suggest a simple and elegant solution.
data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from temp as a
left join temp as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
data have;
set w2(rename=(Structure=from ShareStructure=to));
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Hi @Ksharp
Thank you very much.
It works perfect for one Job. Due to my poor understanding of the hash object i couldn't figure out how to obtain a separate analysis per JobID and get the JobID through the code to output. Bot I found a workaround by using concatenated keys, so my problem is solved.
"Bot I found a workaround by using concatenated keys, so my problem is solved."
Yes. you are right , concate JobID and Structure , as your wish :
data have; set w2; length from to $ 80; from=catx('|',JobID,Structure); to=catx('|',JobID,ShareStructure); keep from to; run;
Thanks again!
Up and running now with a weekly rapport. it seems the time has come to learn the use of hash objects , as I have a bad feeling about using code I don't understand. I also never got round to DS2 or Proc Fedsql, Maybe they too were worth looking into...
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.