BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vrushankshah
Fluorite | Level 6

I want to find if the given elements under a group are the part of another string or no ? If they are then I want the group number of the string where it was the part. They should be part of another string.For example, 'Benefits, verify' is one string under group 1 and that is part of group 2 string (claims,verify,benefits,verify). I also want to count how many times it appears in another string. Additional : Do not include the single words  in calculation. For example in the below example do not include benefits in calculation.

For example

Column1                                group

Benefits,verify                           1
claims,verify,benefits,verify             2
inquiry,type                              3
claims,verify                             4

Benefits,verify,inquiry,type,Benefits     5

Benefits                                  6

Benefits                                  7

 

Output expected:

column1                            Part of group     count

Benefits, verify                         2             2

Benefits,verify                          5             2
claims,verify                            2             1
inquiry,type                             -             -
claims,verify,benefits,verify            -             -

Benefits,verify,inquiry,type,Benefits    -             -

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Ah ok. This gives you the final required output

 

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                           1
claims,verify,benefits,verify             2
inquiry,type                              3
claims,verify                             4
Benefits,verify,inquiry,type,Benefits     5
Benefits                                  6
Benefits                                  7
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
         Part_Of_Group=compGroup;
         count+1;
         output;
      end;
   end;
   if count=0 & countw(Column1, ",")>1 then output;
run;

View solution in original post

16 REPLIES 16
PeterClemmensen
Tourmaline | Level 20

So your data looks like this correct?

 

 

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;

I take it that you want some case insensitive solution since you want the string "Benefitsverify" to be part of "claims,verify,benefits,verify"? And what about the space after the comma? That does not appear in the string for group 2

 

 

 

vrushankshah
Fluorite | Level 6
Sorry there was some additional thing which I forgot to mention can you please help me with the update part !
vrushankshah
Fluorite | Level 6
Yes case sensitive is not needed and even doesn’t matter just the words should match and there won’t be any space after commma.sorry for that I didn’t upload the data properly
PeterClemmensen
Tourmaline | Level 20

Ok. What if the string "verify,claims" were to appear? Does the order of the words separated by comma matter?

vrushankshah
Fluorite | Level 6
The order matters here so verify,claims is different string
MichaelLarsen
SAS Employee

My take on it using PROC SQL 

 

data groups;
  infile cards dlm=' ';
  length Column1 $30 group 8;
  input Column1 group;
cards;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;
run;
proc sql noprint;
  create table test as
    select 
      a.Column1,
      a.Group,
      part.PartOfGroup
    from groups a
    left join
      (select distinct
        a.Column1, 
        a.group,
        case
          when (find(b.Column1,a.Column1,'it')) then b.group
          else .
        end as PartofGroup
      from groups a,
           groups b
      having PartofGroup ne a.group
         and PartofGroup ne . ) part
     on a.Column1 = part.Column1
    ;
quit;
PeterClemmensen
Tourmaline | Level 20

Ok. Here is a simple data step solution

 

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                  1
claims,verify,benefits,verify    2
inquiry,type                     3
claims,verify                    4
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 and group ne compGroup then do;
         Part_Of_Group=compGroup;
         count+1;
      end;
   end;
run;
vrushankshah
Fluorite | Level 6

Under the Part_of_group section it just displays one group number but I wanted the multiple groups numbers where it was the part.

vrushankshah
Fluorite | Level 6
Really Sorry there was some additional thing which I forgot to mention can you please help me with the update part !
PeterClemmensen
Tourmaline | Level 20

Are you sure you want data presented this way?

 

It is rarely a good idea to represent multiple numeric values in a single variable.

 

.. Though it can be done fairly simple

vrushankshah
Fluorite | Level 6
Actually I wanted on different line each group number but I thought it might be difficult hence told to have in same line.
PeterClemmensen
Tourmaline | Level 20

Not difficult. But please specify exactly how you want the output then.. This solution gives you the Part_Of_Group concatenated on one line

 

data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify                           1
claims,verify,benefits,verify             2
inquiry,type                              3
claims,verify                             4
Benefits,verify,inquiry,type,Benefits     5
Benefits                                  6
Benefits                                  7
;

data want(keep=Column1 Part_Of_Group count);
   set have nobs=nobs;
   length Part_Of_Group $200;
   count=0;
   do _N_=1 to nobs;
      set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
      if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
         Part_Of_Group=catx(",",Part_Of_Group,compGroup);
         count+1;
      end;
   end;
run;
vrushankshah
Fluorite | Level 6
Final update is done how I want the output to be ! Thank you !
PeterClemmensen
Tourmaline | Level 20

Great. No problem. Please mark the thread as a solution to help other users navigate the forum.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 2021 views
  • 2 likes
  • 3 in conversation