DATA Step, Macro, Functions and more

How many times a word occurs in a string?

Reply
Occasional Contributor
Posts: 14

How many times a word occurs in a string?

Hi All,

 

How can I count the number of times a string occurs in a bigger string in SAS?

Basically a variable with a lot of text and I need to determine how many times a certain thing occurs in each of the rows

 

Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred. 

Similarly for all rows

 

I have been using prxmatch to know if it occurs or not. Cant figure out how to calculate number of occurrences.

 

Thanks. 

 

 

Super User
Posts: 23,778

Re: How many times a word occurs in a string?

Posted in reply to silvergrenade

How much further are you taking this analysis?
My suggestion would be to separate each word from a sentence into it's own record and then use PROC FREQ.

 

Here's an example at finding all two word combinations or how to split a file:

 

*Create sample data;
data random_sentences;
	infile cards truncover;
	informat sentence $256.;
	input sentence $256.;
	cards;
This is a random sentence
This is another random sentence
Happy Birthday
My job sucks.
This is a good idea, not.
This is an awesome idea!
How are you today?
Does this make sense?
Have a great day!
;

;
;
;
*Partition into words;
data f1;
	set random_sentences;
	id=_n_;
	nwords=countw(sentence);
	nchar=length(compress(sentence));

	do word_order=1 to nwords;
		word=scan(sentence, word_order);
		output;
	end;
run;

proc sql;
	create table words2 as
		select t1.sentence, lowcase(t1.word) as word1, lowcase(t2.word) as word2
			from f1 as t1
				cross join f1 as t2
			where t1.sentence=t2.sentence 
				and t1.word_order > t2.word_order
			order by t1.sentence, t1.word_order;
quit;

proc freq data=words2 noprint order=freq;
	table word1*word2 /list out=want;
run;
Super User
Posts: 10,280

Re: How many times a word occurs in a string?

Posted in reply to silvergrenade
count = 0;
do i = 1 to countw(yourvar);
  count = count + (scan(yourvar,i) = testvar);
end;
drop i;

Applied to your example,

data want;
testvar = 'is';
yourvar = "This is is a great day";
count = 0;
do i = 1 to countw(yourvar);
  count = count + (scan(yourvar,i) = testvar);
end;
drop i;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Respected Advisor
Posts: 3,066

Re: How many times a word occurs in a string?

[ Edited ]
Posted in reply to silvergrenade

Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred. 

 

 


Seems to me the answer is 3. Is that right? Can you confirm? Your title indicates you are counting words, but your first sentence indicates you want to know how many times a STRING appears in a bigger string. These are not the same! We need clarification.

 

The code provided by others is looking for entire words that match 'is' and would not count the letters 'is' in the word "This".

 

 

--
Paige Miller
Occasional Contributor
Posts: 14

Re: How many times a word occurs in a string?

Posted in reply to PaigeMiller

Hi Paige,

 

I'm sorry for that error.

I didnt see the is in This.

Yes, the count should be 3.

 

Thank you.

Super User
Posts: 10,280

Re: How many times a word occurs in a string?

Posted in reply to silvergrenade

So it should be

"count how many times a string appears in another string"

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 14

Re: How many times a word occurs in a string?

Posted in reply to KurtBremser

Hi Kurt,

 

Thank you for your reply.

I figured out a way on achieving my problem. Is there a better way of achieving the same?

OpportunityDescription is my variable while "F2F", "face to face" are my test strings.

 

 

 

data out.open_records_opp_desc;
set out.open_records;

Count_1 = count(OpportunityDescription,'F2F','i');

Count_2 = count(OpportunityDescription,'face to face','i');
Count_3 = count(OpportunityDescription,'in person','i');
Count_4 = count(OpportunityDescription,'meeting','i');
Count_5 = count(OpportunityDescription,'met','i');
Count_6 = count(OpportunityDescription,'discussed','i');
Count_7 = count(OpportunityDescription,'one on one','i');
Count_8 = count(OpportunityDescription,'discussed','i');
Count_9 = count(OpportunityDescription,'contact','i');

count_f2f_occurance = sum(Count_1,Count_2,Count_3,Count_4,Count_5,Count_6,Count_7,Count_8,Count_9);

run;

 

Ask a Question
Discussion stats
  • 6 replies
  • 204 views
  • 4 likes
  • 4 in conversation