Hi,
I have a table with Flow Name, Job Name and Dependency as columns. I need to find out whether the same job also appears in the Dependency column.
Below is a sample:
Flow Name | Job Name | Dependency |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | |
CE_B3_STACK | 91-IDM-STK_FW-00001 | |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 |
Flow CE_B3_STACK is perfectly alright. However, CE_BSE_CUST_SGB has two issues: the job name 21-BSE-CUST_SGB-00001 appears twice, and that same job 21-BSE-CUST_SGB-00001 is also listed as its own dependency.
So I need to flag such entries where a job is dependent on itself within a flow. Something like the output below is expected:
Flow Name | Job Name | Dependency | Flag |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | | ok |
CE_B3_STACK | 91-IDM-STK_FW-00001 | | ok |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 | ok |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 | ok |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 | ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | | Not ok |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 | Not ok |
Any help is really appreciated
Please supply example data in usable form (a correctly working DATA step with datalines) so we have something to easily play around with.
Without that, I can only give suggestions:
First, check for doubles in Job Name by sorting (BY "Flow Name" "Job Name") and using the NODUPKEY and DUPOUT= options.
Next, expand the Dependency into separate observations, so you can do a join with the Job Name column to see if there is a match for all entries.
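To illustrate that second suggestion, here is a minimal sketch of the expand-and-join idea. It assumes the table has already been read into a dataset called HAVE (as in the code further below); the dataset names LONG and NO_MATCH and the variable DEP_JOB are just made-up names for this sketch:
/* split each Dependency into one observation per referenced job */
data long (keep=flow_name job_name dep_job);
  set have;
  length dep_job $256;
  do i = 1 to countw(dependency,"|");
    dep_job = scan(dependency,i,"|");
    output;
  end;
run;
/* keep dependency references that have no matching Job_Name within the same flow */
proc sql;
  create table no_match as
  select l.*
  from long l
  where not exists (
    select 1
    from have h
    where h.flow_name = l.flow_name
      and h.job_name  = l.dep_job
  );
quit;
A self-dependency also shows up in LONG as a row where JOB_NAME equals DEP_JOB, so both checks can be done from that expanded table.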
See this code:
data have;
infile datalines delimiter="," truncover; /* TRUNCOVER: an empty Dependency at the end of a line must not flow over to the next line */
input Flow_Name: $256. Job_Name: $256. Dependency: $1024.;
cards;
CE_B3_STACK,90-IDM-STK_FMTS-00001,
CE_B3_STACK,91-IDM-STK_FW-00001,
CE_B3_STACK,92-IDM-STK_PRTY-00001,91-IDM-STK_FW-00001
CE_B3_STACK,93-IDM-STK_PRXY-00001,91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001
CE_B3_STACK,94-IDM-STK_B3_ASSET_CLASS-00001,90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00002,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00003,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00004,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00005,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00006,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00007,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00008,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00009,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00010,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008
CE_BSE_CUST_SGB,21-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008|XXX
;
/* find duplicate job_names */
proc sort
  data=have
  out=dedup /* so we do not destroy "have" */
  dupout=dups (keep=flow_name job_name)
  nodupkey
;
by flow_name job_name;
run;
/* find missing job_names and recursive definitions */
data recursive_miss (keep=flow_name flag);
  set have;
  by flow_name;
  /* key-only hash holds the job_names seen so far in the current flow */
  if _n_ = 1
  then do;
    declare hash f ();
    f.definekey("job_name");
    f.definedone();
  end;
  if first.flow_name then rc = f.clear();
  rc = f.add();
  /* a job that lists itself in its own Dependency is "recursive" */
  if findw(dependency,job_name,"|","t")
  then do;
    flag = "recursive";
    output;
  end;
  /* otherwise, every dependency must be a job_name already seen in the flow */
  else do i = 1 to countw(dependency,"|","t");
    if f.find(key:scan(dependency,i,"|")) ne 0
    then do;
      flag = "missing";
      output;
      leave;
    end;
  end;
run;
/* one record per flow that has a duplicate, recursive or missing job */
data n_ok (keep=flow_name);
  set
    dups
    recursive_miss
  ;
  by flow_name;
  if first.flow_name;
run;
/* flag all rows of a flow as "not ok" when the flow occurs in N_OK */
data want;
  merge
    have
    n_ok (in=n)
  ;
  by flow_name;
  length flag $6; /* avoid the 200-byte default length from IFC */
  flag = ifc(n,"not ok","ok");
run;
I corrected the last dependency from your example data so that a duplicate and recursive job_name is actually created, and then added an additional observation to create a "missing" job.
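If it helps for checking, a quick look at the result (just a PROC PRINT of the WANT dataset created in the last step):
proc print data=want noobs;
  var flow_name job_name dependency flag;
run;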