Hi,
I have a table with Flow Name, Job Name, and Dependency as columns, and I need to find whether a job is listed as a dependency of itself. Below is a sample:
Flow Name | Job Name | Dependency |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | |
CE_B3_STACK | 91-IDM-STK_FW-00001 | |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 |
Flow CE_B3_STACK is perfectly all right. However, CE_BSE_CUST_SGB has two issues: the job name
21-BSE-CUST_SGB-00001 appears twice, and that job is also shown as dependent on itself (21-BSE-CUST_SGB-00001).
So I need to flag such entries, where a job depends on itself within a flow. Something like the output below is expected:
Flow Name | Job Name | Dependency | Flag |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | | ok |
CE_B3_STACK | 91-IDM-STK_FW-00001 | | ok |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 | ok |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 | ok |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 | ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | | Not ok |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 | Not ok |
Any help is really appreciated.
Please supply example data in usable form (a correctly working data step with datalines) so we have something to easily play around with.
Without that, I can only give suggestions:
First, check for duplicates in Job Name by sorting (BY Flow_Name Job_Name) and using the NODUPKEY and DUPOUT= options.
Next, expand Dependency into separate observations, so you can join against the Job Name column and see whether every entry has a match.
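The two suggested checks can be sketched outside SAS as well. Below is a small plain-Python illustration (hypothetical — the function name and sample rows are invented for the sketch) of detecting duplicate job names per flow and expanding the pipe-delimited Dependency for a membership check:

```python
# Sketch (plain Python, not SAS) of the two suggested checks:
# 1) duplicate Job Names within a flow,
# 2) dependencies with no matching Job Name (or pointing at the job itself).
from collections import Counter

rows = [
    ("CE_B3_STACK", "91-IDM-STK_FW-00001", ""),
    ("CE_B3_STACK", "92-IDM-STK_PRTY-00001", "91-IDM-STK_FW-00001"),
    ("CE_BSE_CUST_SGB", "21-BSE-CUST_SGB-00001", "21-BSE-CUST_SGB-00001|XXX"),
    ("CE_BSE_CUST_SGB", "21-BSE-CUST_SGB-00001", ""),
]

def bad_flows(rows):
    flagged = set()
    # duplicates per (flow, job) pair
    counts = Counter((flow, job) for flow, job, _ in rows)
    flagged |= {flow for (flow, job), n in counts.items() if n > 1}
    # expand Dependency and look for entries without a matching Job Name
    jobs_per_flow = {}
    for flow, job, _ in rows:
        jobs_per_flow.setdefault(flow, set()).add(job)
    for flow, job, dep in rows:
        for d in filter(None, dep.split("|")):
            if d == job or d not in jobs_per_flow[flow]:
                flagged.add(flow)
    return flagged
```

Here `bad_flows(rows)` would return only `CE_BSE_CUST_SGB`, since `CE_B3_STACK` has no duplicates, self-references, or unmatched dependencies.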
See this code:
data have;
infile datalines delimiter="," dsd truncover;
input Flow_Name: $256. Job_Name: $256. Dependency: $1024.;
cards;
CE_B3_STACK,90-IDM-STK_FMTS-00001,
CE_B3_STACK,91-IDM-STK_FW-00001,
CE_B3_STACK,92-IDM-STK_PRTY-00001,91-IDM-STK_FW-00001
CE_B3_STACK,93-IDM-STK_PRXY-00001,91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001
CE_B3_STACK,94-IDM-STK_B3_ASSET_CLASS-00001,90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00002,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00003,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00004,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00005,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00006,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00007,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00008,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00009,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00010,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008
CE_BSE_CUST_SGB,21-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008|XXX
;
/* find duplicate job_names per flow */
proc sort
    data=have
    out=dedup /* so we do not destroy "have" */
    dupout=dups (keep=flow_name job_name)
    nodupkey
;
    by flow_name job_name;
run;

/* find missing job_names and recursive (self-referencing) definitions */
data recursive_miss (keep=flow_name flag);
    set have;
    by flow_name;
    if _n_ = 1 then do;
        declare hash f ();
        f.definekey("job_name");
        f.definedone();
    end;
    /* start with an empty hash of job_names for every flow */
    if first.flow_name then rc = f.clear();
    rc = f.add();
    /* a job that lists itself as a dependency is recursive */
    if findw(dependency,job_name,"|","t") then do;
        flag = "recursive";
        output;
    end;
    /* otherwise check that every dependency exists as a job_name */
    else do i = 1 to countw(dependency,"|","t");
        if f.find(key:scan(dependency,i,"|")) ne 0 then do;
            flag = "missing";
            output;
            leave;
        end;
    end;
run;

/* one record per flow with at least one issue */
data n_ok (keep=flow_name);
    set dups recursive_miss;
    by flow_name;
    if first.flow_name;
run;

/* flag all observations of a flow that has issues */
data want;
    merge have n_ok (in=n);
    by flow_name;
    flag = ifc(n,"not ok","ok");
run;
I corrected the last dependency from your example data so that a duplicate and a recursive job_name are actually created, and then added an additional observation to create a "missing" job.
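For anyone wanting to sanity-check the end-to-end flag logic outside SAS, here is a plain-Python sketch (the names and sample rows are invented; unlike the data step, it checks dependencies against the flow's complete job list rather than an incrementally built hash):

```python
# Plain-Python cross-check (not SAS) of the overall flag logic: a flow is
# "not ok" when any of its jobs is duplicated, depends on itself, or depends
# on a job_name that does not exist in the flow.
def flag_rows(rows):
    jobs, seen, bad = {}, set(), set()
    for flow, job, _ in rows:
        jobs.setdefault(flow, set()).add(job)
        if (flow, job) in seen:            # duplicate job_name in the flow
            bad.add(flow)
        seen.add((flow, job))
    for flow, job, dep in rows:
        for d in filter(None, dep.split("|")):
            if d == job or d not in jobs[flow]:   # recursive or missing
                bad.add(flow)
    return [(flow, job, "not ok" if flow in bad else "ok")
            for flow, job, _ in rows]

sample = [
    ("CE_B3_STACK", "91-IDM-STK_FW-00001", ""),
    ("CE_B3_STACK", "92-IDM-STK_PRTY-00001", "91-IDM-STK_FW-00001"),
    ("CE_BSE_CUST_SGB", "20-BSE-CUST_SGB-00001", ""),
    ("CE_BSE_CUST_SGB", "20-BSE-CUST_SGB-00001",
     "20-BSE-CUST_SGB-00001|XXX"),
]
```

With this sample, every CE_B3_STACK row comes back "ok" and every CE_BSE_CUST_SGB row "not ok", matching the per-flow flagging of the `want` step.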