- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Community Members.
I have a network analysis problem.
The background is as follows: We have a lot of complicated DI Studio jobs creating multiple output tables. They are a mess of crossing arrows and very difficult to maintain, because the full canvas cannot fit into one screen without scrolling both vertical and horisontal. So we want to split them into smaller jobs if they contain independent structures. A structure is the set of transformations (steps) that contribute to one output table, and all structures that shares at least one transformation will go into one new job.
I have extracted metadata and created the "have" table as input to the analysis. (theoretical example, the real data set has more that 200.000 obs). So far there are no problems.
data have;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
The next step is to identify independent structures in this list. To simplify the problem, the transformations can be left out, because they can be merged back after analysis:
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from have as a
left join have as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
The structure 5 occurs only once, so it could be an independent job, but how to group structures 1-2 and 3-4 as two independent jobs? - there can be up to 50 structures in a job. I am stuck there, though it seems a trivial problem, and I hope somebody can help with a solution that doesn't require products not found in Foundation.
Tre result should be this:
data want;
input JobID$ Structure NewJobID;
datalines;
A 1 1
A 2 1
A 3 2
A 4 2
A 5 3
;
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from temp as a
left join temp as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
data have;
set w2(rename=(Structure=from ShareStructure=to));
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just to be sure, you don't have an OR license right?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Right. Only Base, Graph and Stat + some acess modules.
I think it can be done with a two-dimensional array to hold Structures in one dimension and all shared structures in the other dimension within a job and a lot of array transversing, but that is complicated to build. So I hope somebody smarter than me can suggest a simple and elegant solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data temp;
input JobID$ Structure TransID$;
datalines;
A 1 T11
A 1 T6
A 1 T1
A 2 T16
A 2 T12
A 2 T8
A 2 T7
A 2 T2
A 2 T1
A 3 T13
A 3 T9
A 3 T4
A 3 T3
A 4 T17
A 4 T14
A 4 T9
A 4 T4
A 4 T3
A 5 T18
A 5 T15
A 5 T10
A 5 T5
;
proc sql;
create table w2 as
select distinct a.JobID, a.Structure, b.Structure as ShareStructure
from temp as a
left join temp as b
on a.JobID = b.JobID and a.TransID = b.TransID;
quit;
data have;
set w2(rename=(Structure=from ShareStructure=to));
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ksharp
Thank you very much.
It works perfect for one Job. Due to my poor understanding of the hash object i couldn't figure out how to obtain a separate analysis per JobID and get the JobID through the code to output. Bot I found a workaround by using concatenated keys, so my problem is solved.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"Bot I found a workaround by using concatenated keys, so my problem is solved."
Yes. you are right , concate JobID and Structure , as your wish :
data have; set w2; length from to $ 80; from=catx('|',JobID,Structure); to=catx('|',JobID,ShareStructure); keep from to; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks again!
Up and running now with a weekly rapport. it seems the time has come to learn the use of hash objects , as I have a bad feeling about using code I don't understand. I also never got round to DS2 or Proc Fedsql, Maybe they too were worth looking into...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Me too. I would test it far more enough before using it.
" I also never got round to DS2 or Proc Fedsql, Maybe they too were worth looking into..."
Me too. I never learned DS2 or Fedsql either, and don't know how to judge them .But if you are in SAS Viya environment, maybe you need them.
BTW, Hash Table is a must-have tool for sas programmer .