mar00390
Fluorite | Level 6

How do I delete prepositions, conjunctions, and auxiliary verbs from a string? My strings have a length of 32,767.

SASKiwi
PROC Star

Base SAS doesn't contain any functionality to identify language components. You are limited to word and character pattern matches.

 

SAS Text Miner probably has more capabilities, but I doubt it can parse grammatical terms.
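
As an illustration of the word pattern matching that Base SAS does support, here is a minimal sketch using PRXCHANGE. The three stop words and the sample sentence are made up for the example; a real list would be much longer, and this is only one possible way to do it.

```sas
/* Hypothetical sample data: one long string (32767 is the maximum character length) */
data have;
    length string $32767;
    string = 'The price of tea And coffee is rising';
run;

data want;
    set have;
    length newstr $32767;
    /* \b anchors on word boundaries, the i flag ignores case, -1 replaces every match */
    /* The stop words and, of, is are placeholders for a real list */
    newstr = prxchange('s/\b(and|of|is)\b\s*//i', -1, string);
run;
```

With this input, newstr should come out as "The price tea coffee rising". The pattern only matches whole words, so the "is" inside "rising" is left alone.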

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @mar00390 

 

You need to create a list of stop words. There might be a list you can download as a starting point, but otherwise it's just hard work. Given the list and a SAS data set with your strings, an easy solution is to use a format to pick out the stop words in the string. The following working code shows the principle.

 

You need to set proper lengths etc. to make it work with your data. Given your word classes it is probably unnecessary to handle uppercase/lowercase words, but it can be done by applying the LOWCASE function to teststr. Also be aware that words in the output string are always separated by a single blank, even if there are several blanks between them in the input string.

 

* Test data;
data stopwords;
	input stopword $20.;
	cards;
abc 
xyz 
;
run;

data have;
	infile cards truncover;
	input string $char200.;
	cards;
aaa abc bbbbbbbbbb c 123 dddd ffff xyz
123 zzzzzzzzzz xyz hhhhhh
;
run;

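* Create a format that maps each stop word to itself and everything else to blank;
data stopfmt; set stopwords end=end;
	retain type 'C' fmtname 'stopfmt';
	start = stopword;
	label = stopword;
	output;
	* After the last stop word, add the OTHER range (HLO value O) with a blank label;
	if end then do;
		hlo = 'O';
		start = '';
		label = '';
		output;
	end;
run;
proc format cntlin=stopfmt;
run;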

* Remove all words defined as stop words from string;
data want (drop=i teststr); set have;
	length newstr $200 teststr $50;
	do i = 1 to countw(string,' ');
		teststr = scan(string,i,' ');
		* The format returns blank for words that are not stop words, so keep those;
		if put(teststr,$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
	end;
run;
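
The two adjustments mentioned above (longer lengths and case-insensitive matching) could look roughly like the following sketch. It is self-contained, so the format is defined directly with a VALUE statement instead of CNTLIN, and the three stop words and the sample string are invented for the example. The stop words are assumed to be stored in lowercase.

```sas
/* Hypothetical stop word format: stop words map to themselves, everything else to blank */
proc format;
    value $stopfmt
        'and' = 'and'
        'of'  = 'of'
        'is'  = 'is'
        other = ' ';
run;

/* Hypothetical sample data with the maximum character length */
data have;
    length string $32767;
    string = 'The price OF tea And coffee is rising';
run;

data want (drop=i teststr);
    set have;
    /* newstr is sized to hold the full input string */
    length newstr $32767 teststr $200;
    do i = 1 to countw(string, ' ');
        teststr = scan(string, i, ' ');
        /* Compare in lowercase, but keep the original spelling in the output */
        if put(lowcase(teststr), $stopfmt.) = '' then newstr = catx(' ', newstr, teststr);
    end;
run;
```

For this input, newstr should be "The price tea coffee rising": "OF" and "And" are dropped despite their capitalization, while the kept words retain their original case.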
ErikLund_Jensen
Rhodochrosite | Level 12
A smart guy with a better command of hash objects would give a more elegant solution without the format step.
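
For what it's worth, a hash version might look something like the sketch below. It is not tested against real data; it reuses the same made-up test data as the format example, and the variable name word and the $50 lengths are just assumptions for the illustration.

```sas
/* Test data (same values as in the format example) */
data stopwords;
    length stopword $50;
    input stopword;
    cards;
abc
xyz
;
run;

data have;
    infile cards truncover;
    input string $char200.;
    cards;
aaa abc bbbbbbbbbb c 123 dddd ffff xyz
123 zzzzzzzzzz xyz hhhhhh
;
run;

data want (drop=i word stopword);
    length newstr $200 word $50;
    /* Never executed, but puts stopword into the PDV with the right attributes */
    if 0 then set stopwords;
    if _n_ = 1 then do;
        /* Load the stop words into an in-memory lookup table once */
        declare hash stops(dataset: 'stopwords');
        stops.defineKey('stopword');
        stops.defineDone();
    end;
    set have;
    do i = 1 to countw(string, ' ');
        word = scan(string, i, ' ');
        /* check() returns 0 when the key is found, so nonzero means keep the word */
        if stops.check(key: word) ne 0 then newstr = catx(' ', newstr, word);
    end;
run;
```

With this test data the two output strings should be "aaa bbbbbbbbbb c 123 dddd ffff" and "123 zzzzzzzzzz hhhhhh", the same as with the format. The lookup is an exact match, so case handling would still need a lowcase on word.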
Reeza
Super User
Pretty sure the OP is using TextMiner though.
mar00390
Fluorite | Level 6

This left the strings unchanged. Is there a reason it wouldn't work?

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @mar00390 

 

Sorry, but I need to know a bit more to answer that.

 

If you ran my example code, then notice that the original string is also kept in output, so you have before/after in each record.

 

If you used your own data, I need an example: at least one stop word and a string in which that stop word occurs. Then I'll look into it.
