- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want to find if the given elements under a group are the part of another string or no ? If they are then I want the group number of the string where it was the part. They should be part of another string.For example, 'Benefits, verify' is one string under group 1 and that is part of group 2 string (claims,verify,benefits,verify). I also want to count how many times it appears in another string. Additional : Do not include the single words in calculation. For example in the below example do not include benefits in calculation.
For example
Column1 group
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
Output expected:
column1 Part of group count
Benefits, verify 2 2
Benefits,verify 5 2
claims,verify 2 1
inquiry,type - -
claims,verify,benefits,verify - -
Benefits,verify,inquiry,type,Benefits - -
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ah ok. This gives you the final required output
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
Part_Of_Group=compGroup;
count+1;
output;
end;
end;
if count=0 & countw(Column1, ",")>1 then output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So your data looks like this correct?
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
I take it that you want some case insensitive solution since you want the string "Benefits, verify" to be part of "claims,verify,benefits,verify"? And what about the space after the comma? That does not appear in the string for group 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. What if the string "verify,claims" were to appear? Does the order of the words separated by comma matter?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My take on it using PROC SQL
data groups;
infile cards dlm=' ';
length Column1 $30 group 8;
input Column1 group;
cards;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
run;
proc sql noprint;
create table test as
select
a.Column1,
a.Group,
part.PartOfGroup
from groups a
left join
(select distinct
a.Column1,
a.group,
case
when (find(b.Column1,a.Column1,'it')) then b.group
else .
end as PartofGroup
from groups a,
groups b
having PartofGroup ne a.group
and PartofGroup ne . ) part
on a.Column1 = part.Column1
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Here is a simple data step solution
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 and group ne compGroup then do;
Part_Of_Group=compGroup;
count+1;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Under the Part_of_group section it just displays one group number but I wanted the multiple groups numbers where it was the part.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are you sure you want data presented this way?
It is rarely a good idea to represent multiple numeric values in a single variable.
.. Though it can be done fairly simple
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not difficult. But please specify exactly how you want the output then.. This solution gives you the Part_Of_Group concatenated on one line
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
length Part_Of_Group $200;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
Part_Of_Group=catx(",",Part_Of_Group,compGroup);
count+1;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Great. No problem. Please mark the thread as a solution to help other users navigate the forum.