Hi All,
How can I count the number of times a string occurs in a bigger string in SAS?
Basically a variable with a lot of text and I need to determine how many times a certain thing occurs in each of the rows
Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred.
Similarly for all rows
I have been using prxmatch to know if it occurs or not. Cant figure out how to calculate number of occurrences.
Thanks.
How much further are you taking this analysis?
My suggestion would be to separate each word from a sentence into it's own record and then use PROC FREQ.
Here's an example at finding all two word combinations or how to split a file:
*Create sample data;
data random_sentences;
infile cards truncover;
informat sentence $256.;
input sentence $256.;
cards;
This is a random sentence
This is another random sentence
Happy Birthday
My job sucks.
This is a good idea, not.
This is an awesome idea!
How are you today?
Does this make sense?
Have a great day!
;
;
;
;
*Partition into words;
data f1;
set random_sentences;
id=_n_;
nwords=countw(sentence);
nchar=length(compress(sentence));
do word_order=1 to nwords;
word=scan(sentence, word_order);
output;
end;
run;
proc sql;
create table words2 as
select t1.sentence, lowcase(t1.word) as word1, lowcase(t2.word) as word2
from f1 as t1
cross join f1 as t2
where t1.sentence=t2.sentence
and t1.word_order > t2.word_order
order by t1.sentence, t1.word_order;
quit;
proc freq data=words2 noprint order=freq;
table word1*word2 /list out=want;
run;
count = 0;
do i = 1 to countw(yourvar);
count = count + (scan(yourvar,i) = testvar);
end;
drop i;
Applied to your example,
data want;
testvar = 'is';
yourvar = "This is is a great day";
count = 0;
do i = 1 to countw(yourvar);
count = count + (scan(yourvar,i) = testvar);
end;
drop i;
run;
Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred.
Seems to me the answer is 3. Is that right? Can you confirm? Your title indicates you are counting words, but your first sentence indicates you want to know how many times a STRING appears in a bigger string. These are not the same! We need clarification.
The code provided by others is looking for entire words that match 'is' and would not count the letters 'is' in the word "This".
Hi Paige,
I'm sorry for that error.
I didnt see the is in This.
Yes, the count should be 3.
Thank you.
So it should be
"count how many times a string appears in another string"
Hi Kurt,
Thank you for your reply.
I figured out a way on achieving my problem. Is there a better way of achieving the same?
OpportunityDescription is my variable while "F2F", "face to face" are my test strings.
data out.open_records_opp_desc;
set out.open_records;
Count_1 = count(OpportunityDescription,'F2F','i');
Count_2 = count(OpportunityDescription,'face to face','i');
Count_3 = count(OpportunityDescription,'in person','i');
Count_4 = count(OpportunityDescription,'meeting','i');
Count_5 = count(OpportunityDescription,'met','i');
Count_6 = count(OpportunityDescription,'discussed','i');
Count_7 = count(OpportunityDescription,'one on one','i');
Count_8 = count(OpportunityDescription,'discussed','i');
Count_9 = count(OpportunityDescription,'contact','i');
count_f2f_occurance = sum(Count_1,Count_2,Count_3,Count_4,Count_5,Count_6,Count_7,Count_8,Count_9);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.