Solved: Re: To find if the group of elements are substring of another string ?

vrushankshah · Posted 07-16-2018 12:16 AM

I want to find if the given elements under a group are the part of another string or no ? If they are then I want the group number of the string where it was the part. They should be part of another string.For example, 'Benefits, verify' is one string under group 1 and that is part of group 2 string (claims,verify,benefits,verify). I also want to count how many times it appears in another string. Additional : Do not include the single words in calculation. For example in the below example do not include benefits in calculation.

For example

Column1 group

Benefits,verify 1 claims,verify,benefits,verify 2 inquiry,type 3 claims,verify 4

Benefits,verify,inquiry,type,Benefits 5

Benefits 6

Benefits 7

Output expected:

column1 Part of group count

Benefits, verify 2 2

Benefits,verify 5 2 claims,verify 2 1 inquiry,type - - claims,verify,benefits,verify - -

Benefits,verify,inquiry,type,Benefits - -

PeterClemmensen · Posted 07-16-2018 04:49 PM

Ah ok. This gives you the final required output

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                           1
claims,verify,benefits,verify             2
inquiry,type                              3
claims,verify                             4
Benefits,verify,inquiry,type,Benefits     5
Benefits                                  6
Benefits                                  7
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
         Part_Of_Group=compGroup;
         count+1;
         output;
      end;
   end;
   if count=0 & countw(Column1, ",")>1 then output;
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 07-16-2018 06:08 AM

So your data looks like this correct?

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;

I take it that you want some case insensitive solution since you want the string "Benefits, verify" to be part of "claims,verify,benefits,verify"? And what about the space after the comma? That does not appear in the string for group 2

The DATA to DATA Step Macro
Blog: SASnrd

vrushankshah · Posted 07-16-2018 03:17 PM

Sorry there was some additional thing which I forgot to mention can you please help me with the update part !

vrushankshah · Posted 07-16-2018 06:11 AM

Yes case sensitive is not needed and even doesn’t matter just the words should match and there won’t be any space after commma.sorry for that I didn’t upload the data properly

PeterClemmensen · Posted 07-16-2018 06:16 AM

Ok. What if the string "verify,claims" were to appear? Does the order of the words separated by comma matter?

The DATA to DATA Step Macro
Blog: SASnrd

vrushankshah · Posted 07-16-2018 06:19 AM

The order matters here so verify,claims is different string

MichaelLarsen · Posted 07-16-2018 06:32 AM

My take on it using PROC SQL

data groups;
  infile cards dlm=' ';
  length Column1 $30 group 8;
  input Column1 group;
cards;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;
run;
proc sql noprint;
  create table test as
    select 
      a.Column1,
      a.Group,
      part.PartOfGroup
    from groups a
    left join
      (select distinct
        a.Column1, 
        a.group,
        case
          when (find(b.Column1,a.Column1,'it')) then b.group
          else .
        end as PartofGroup
      from groups a,
           groups b
      having PartofGroup ne a.group
         and PartofGroup ne . ) part
     on a.Column1 = part.Column1
    ;
quit;

PeterClemmensen · Posted 07-16-2018 06:46 AM

Ok. Here is a simple data step solution

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 and group ne compGroup then do;
         Part_Of_Group=compGroup;
         count+1;
      end;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

vrushankshah · Posted 07-16-2018 12:38 PM

Under the Part_of_group section it just displays one group number but I wanted the multiple groups numbers where it was the part.

vrushankshah · Posted 07-16-2018 03:18 PM

Really Sorry there was some additional thing which I forgot to mention can you please help me with the update part !

PeterClemmensen · Posted 07-16-2018 04:17 PM

Are you sure you want data presented this way?

It is rarely a good idea to represent multiple numeric values in a single variable.

.. Though it can be done fairly simple

The DATA to DATA Step Macro
Blog: SASnrd

vrushankshah · Posted 07-16-2018 04:29 PM

Actually I wanted on different line each group number but I thought it might be difficult hence told to have in same line.

PeterClemmensen · Posted 07-16-2018 04:33 PM

Not difficult. But please specify exactly how you want the output then.. This solution gives you the Part_Of_Group concatenated on one line

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                           1
claims,verify,benefits,verify             2
inquiry,type                              3
claims,verify                             4
Benefits,verify,inquiry,type,Benefits     5
Benefits                                  6
Benefits                                  7
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   length Part_Of_Group $200;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
         Part_Of_Group=catx(",",Part_Of_Group,compGroup);
         count+1;
      end;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

vrushankshah · Posted 07-16-2018 04:39 PM

Final update is done how I want the output to be ! Thank you !

PeterClemmensen · Posted 07-16-2018 04:40 PM

Great. No problem. Please mark the thread as a solution to help other users navigate the forum.

The DATA to DATA Step Macro
Blog: SASnrd

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away