I want to find if the given elements under a group are the part of another string or no ? If they are then I want the group number of the string where it was the part. They should be part of another string.For example, 'Benefits, verify' is one string under group 1 and that is part of group 2 string (claims,verify,benefits,verify). I also want to count how many times it appears in another string. Additional : Do not include the single words in calculation. For example in the below example do not include benefits in calculation.
For example
Column1 group
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
Output expected:
column1 Part of group count
Benefits, verify 2 2
Benefits,verify 5 2
claims,verify 2 1
inquiry,type - -
claims,verify,benefits,verify - -
Benefits,verify,inquiry,type,Benefits - -
Ah ok. This gives you the final required output
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
Part_Of_Group=compGroup;
count+1;
output;
end;
end;
if count=0 & countw(Column1, ",")>1 then output;
run;
So your data looks like this correct?
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
I take it that you want some case insensitive solution since you want the string "Benefits, verify" to be part of "claims,verify,benefits,verify"? And what about the space after the comma? That does not appear in the string for group 2
Ok. What if the string "verify,claims" were to appear? Does the order of the words separated by comma matter?
My take on it using PROC SQL
data groups;
infile cards dlm=' ';
length Column1 $30 group 8;
input Column1 group;
cards;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
run;
proc sql noprint;
create table test as
select
a.Column1,
a.Group,
part.PartOfGroup
from groups a
left join
(select distinct
a.Column1,
a.group,
case
when (find(b.Column1,a.Column1,'it')) then b.group
else .
end as PartofGroup
from groups a,
groups b
having PartofGroup ne a.group
and PartofGroup ne . ) part
on a.Column1 = part.Column1
;
quit;
Ok. Here is a simple data step solution
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 and group ne compGroup then do;
Part_Of_Group=compGroup;
count+1;
end;
end;
run;
Under the Part_of_group section it just displays one group number but I wanted the multiple groups numbers where it was the part.
Are you sure you want data presented this way?
It is rarely a good idea to represent multiple numeric values in a single variable.
.. Though it can be done fairly simple
Not difficult. But please specify exactly how you want the output then.. This solution gives you the Part_Of_Group concatenated on one line
data have;
length Column1 $200;
input Column1 $ group;
datalines;
Benefits,verify 1
claims,verify,benefits,verify 2
inquiry,type 3
claims,verify 4
Benefits,verify,inquiry,type,Benefits 5
Benefits 6
Benefits 7
;
data want(keep=Column1 Part_Of_Group count);
set have nobs=nobs;
length Part_Of_Group $200;
count=0;
do _N_=1 to nobs;
set have(rename=(Column1=compColumn1 group=compGroup)) point=_N_;
if find(compColumn1, Column1, 'it') ne 0 & group ne compGroup & countw(Column1, ",")>1 then do;
Part_Of_Group=catx(",",Part_Of_Group,compGroup);
count+1;
end;
end;
run;
Great. No problem. Please mark the thread as a solution to help other users navigate the forum.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.