BookmarkSubscribeRSS Feed
Obsidian | Level 7

Hi All,


How can I count the number of times a string occurs in a bigger string in SAS?

Basically a variable with a lot of text and I need to determine how many times a certain thing occurs in each of the rows


Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred. 

Similarly for all rows


I have been using prxmatch to know if it occurs or not. Cant figure out how to calculate number of occurrences.





Super User

How much further are you taking this analysis?
My suggestion would be to separate each word from a sentence into it's own record and then use PROC FREQ.


Here's an example at finding all two word combinations or how to split a file:


*Create sample data;
data random_sentences;
	infile cards truncover;
	informat sentence $256.;
	input sentence $256.;
This is a random sentence
This is another random sentence
Happy Birthday
My job sucks.
This is a good idea, not.
This is an awesome idea!
How are you today?
Does this make sense?
Have a great day!

*Partition into words;
data f1;
	set random_sentences;

	do word_order=1 to nwords;
		word=scan(sentence, word_order);

proc sql;
	create table words2 as
		select t1.sentence, lowcase(t1.word) as word1, lowcase(t2.word) as word2
			from f1 as t1
				cross join f1 as t2
			where t1.sentence=t2.sentence 
				and t1.word_order > t2.word_order
			order by t1.sentence, t1.word_order;

proc freq data=words2 noprint order=freq;
	table word1*word2 /list out=want;
Super User
count = 0;
do i = 1 to countw(yourvar);
  count = count + (scan(yourvar,i) = testvar);
drop i;

Applied to your example,

data want;
testvar = 'is';
yourvar = "This is is a great day";
count = 0;
do i = 1 to countw(yourvar);
  count = count + (scan(yourvar,i) = testvar);
drop i;
Diamond | Level 26

Example: A variable has a value "This is is a great day" for row 1
I need to know how many time 'is' occurred. 



Seems to me the answer is 3. Is that right? Can you confirm? Your title indicates you are counting words, but your first sentence indicates you want to know how many times a STRING appears in a bigger string. These are not the same! We need clarification.


The code provided by others is looking for entire words that match 'is' and would not count the letters 'is' in the word "This".



Paige Miller
Obsidian | Level 7

Hi Paige,


I'm sorry for that error.

I didnt see the is in This.

Yes, the count should be 3.


Thank you.

Obsidian | Level 7

Hi Kurt,


Thank you for your reply.

I figured out a way on achieving my problem. Is there a better way of achieving the same?

OpportunityDescription is my variable while "F2F", "face to face" are my test strings.




data out.open_records_opp_desc;
set out.open_records;

Count_1 = count(OpportunityDescription,'F2F','i');

Count_2 = count(OpportunityDescription,'face to face','i');
Count_3 = count(OpportunityDescription,'in person','i');
Count_4 = count(OpportunityDescription,'meeting','i');
Count_5 = count(OpportunityDescription,'met','i');
Count_6 = count(OpportunityDescription,'discussed','i');
Count_7 = count(OpportunityDescription,'one on one','i');
Count_8 = count(OpportunityDescription,'discussed','i');
Count_9 = count(OpportunityDescription,'contact','i');

count_f2f_occurance = sum(Count_1,Count_2,Count_3,Count_4,Count_5,Count_6,Count_7,Count_8,Count_9);



Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 4 in conversation