Hi,
I have a table with Flow Name, Job Name and Dependency as columns. I need to find out whether the same job also appears in the Dependency column.
Below is a sample:
Flow Name | Job Name | Dependency |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | |
CE_B3_STACK | 91-IDM-STK_FW-00001 | |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 |
Flow CE_B3_STACK is perfectly alright. However, CE_BSE_CUST_SGB has two issues: the job name 21-BSE-CUST_SGB-00001 appears twice, and that same job 21-BSE-CUST_SGB-00001 is also listed as its own dependency.
So I need to flag such entries where a job is dependent on itself within a flow. Something like the output below is expected:
Flow Name | Job Name | Dependency | Flag |
CE_B3_STACK | 90-IDM-STK_FMTS-00001 | | ok |
CE_B3_STACK | 91-IDM-STK_FW-00001 | | ok |
CE_B3_STACK | 92-IDM-STK_PRTY-00001 | 91-IDM-STK_FW-00001 | ok |
CE_B3_STACK | 93-IDM-STK_PRXY-00001 | 91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001 | ok |
CE_B3_STACK | 94-IDM-STK_B3_ASSET_CLASS-00001 | 90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001 | ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00001 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00002 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00003 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00004 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00005 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00006 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00007 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00008 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00009 | | Not ok |
CE_BSE_CUST_SGB | 20-BSE-CUST_SGB-00010 | | Not ok |
CE_BSE_CUST_SGB | 21-BSE-CUST_SGB-00001 | 20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008 | Not ok |
Any help is really appreciated
Please supply example data in usable form (a correctly working DATA step with datalines) so we have something to easily play around with.
Without that, I can only give suggestions:
First, check for doubles in Job Name by sorting (BY "Flow Name" "Job Name") and using the NODUPKEY and DUPOUT= options.
Next, expand the Dependency into separate observations, so you can do a join with the Job Name column to see if there is a match for all entries.
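To illustrate that second suggestion, here is a minimal sketch of the expand-and-join idea. It assumes the table has already been read into a dataset called HAVE (as in the code further below); the dataset names LONG and NO_MATCH and the variable DEP_JOB are just made-up names for this sketch:
/* split each Dependency into one observation per referenced job */
data long (keep=flow_name job_name dep_job);
  set have;
  length dep_job $256;
  do i = 1 to countw(dependency,"|");
    dep_job = scan(dependency,i,"|");
    output;
  end;
run;
/* keep dependency references that have no matching Job_Name within the same flow */
proc sql;
  create table no_match as
  select l.*
  from long l
  where not exists (
    select 1
    from have h
    where h.flow_name = l.flow_name
      and h.job_name  = l.dep_job
  );
quit;
A self-dependency also shows up in LONG as a row where JOB_NAME equals DEP_JOB, so both checks can be done from that expanded table.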
See this code:
data have;
infile datalines delimiter="," truncover; /* TRUNCOVER: an empty Dependency at the end of a line must not flow over to the next line */
input Flow_Name: $256. Job_Name: $256. Dependency: $1024.;
cards;
CE_B3_STACK,90-IDM-STK_FMTS-00001,
CE_B3_STACK,91-IDM-STK_FW-00001,
CE_B3_STACK,92-IDM-STK_PRTY-00001,91-IDM-STK_FW-00001
CE_B3_STACK,93-IDM-STK_PRXY-00001,91-IDM-STK_FW-00001|92-IDM-STK_PRTY-00001
CE_B3_STACK,94-IDM-STK_B3_ASSET_CLASS-00001,90-IDM-STK_FMTS-00001|92-IDM-STK_PRTY-00001|93-IDM-STK_PRXY-00001
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00002,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00003,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00004,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00005,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00006,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00007,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00008,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00009,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00010,
CE_BSE_CUST_SGB,20-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008
CE_BSE_CUST_SGB,21-BSE-CUST_SGB-00001,20-BSE-CUST_SGB-00001|20-BSE-CUST_SGB-00002|20-BSE-CUST_SGB-00003|20-BSE-CUST_SGB-00004|20-BSE-CUST_SGB-00005|20-BSE-CUST_SGB-00006|20-BSE-CUST_SGB-00007|20-BSE-CUST_SGB-00008|XXX
;
/* find duplicate job_names */
proc sort
  data=have
  out=dedup /* so we do not destroy "have" */
  dupout=dups (keep=flow_name job_name)
  nodupkey
;
by flow_name job_name;
run;
/* find missing job_names and recursive definitions */
data recursive_miss (keep=flow_name flag);
  set have;
  by flow_name;
  /* key-only hash holds the job_names seen so far in the current flow */
  if _n_ = 1
  then do;
    declare hash f ();
    f.definekey("job_name");
    f.definedone();
  end;
  if first.flow_name then rc = f.clear();
  rc = f.add();
  /* a job that lists itself in its own Dependency is "recursive" */
  if findw(dependency,job_name,"|","t")
  then do;
    flag = "recursive";
    output;
  end;
  /* otherwise, every dependency must be a job_name already seen in the flow */
  else do i = 1 to countw(dependency,"|","t");
    if f.find(key:scan(dependency,i,"|")) ne 0
    then do;
      flag = "missing";
      output;
      leave;
    end;
  end;
run;
/* one record per flow that has a duplicate, recursive or missing job */
data n_ok (keep=flow_name);
  set
    dups
    recursive_miss
  ;
  by flow_name;
  if first.flow_name;
run;
/* flag all rows of a flow as "not ok" when the flow occurs in N_OK */
data want;
  merge
    have
    n_ok (in=n)
  ;
  by flow_name;
  length flag $6; /* avoid the 200-byte default length from IFC */
  flag = ifc(n,"not ok","ok");
run;
I corrected the last dependency from your example data so that a duplicate and recursive job_name is actually created, and then added an additional observation to create a "missing" job.
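If it helps for checking, a quick look at the result (just a PROC PRINT of the WANT dataset created in the last step):
proc print data=want noobs;
  var flow_name job_name dependency flag;
run;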