mar00390
Fluorite | Level 6

How do I delete prepositions, conjunctions, and auxiliary verbs from a string? My strings have a length of 32,767.

7 REPLIES
SASKiwi
PROC Star

Base SAS doesn't contain any functionality to identify language components. You are limited to word and character pattern matches.
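
As a rough illustration of what a word pattern match can do, here is a minimal sketch using PRXCHANGE. The data set name have_demo, the sample sentence, and the four words in the pattern are hypothetical placeholders, not a real stop word list, and the $200 length would need to be raised for longer strings.

```sas
* Hypothetical test data;
data have_demo;
	infile cards truncover;
	input string $char200.;
	cards;
The cat was on the mat and slept
;
run;

* Remove listed words by pattern match;
data want_prx;
	set have_demo;
	length newstr $200;
	/* \b restricts matches to whole words; the i flag ignores case. */
	/* -1 means replace every occurrence in the string. */
	newstr = prxchange('s/\b(the|was|on|and)\b//i', -1, string);
	/* Remove leading blanks and collapse the gaps left by deleted words. */
	newstr = compbl(strip(newstr));
run;
```

This only removes words that are spelled out in the pattern. It does not know which words are prepositions or conjunctions, so the list still has to be built by hand.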

 

SAS Text Miner probably has more capabilities, but I doubt it can parse grammatical terms.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @mar00390

You need to create a list of stop words. There might be a list to download as a starting point, but otherwise it's just hard work. Given the list and a SAS data set with your strings, an easy solution is to use a format to identify the stop words in each string. The following working code shows the principle.

You need to set proper lengths etc. to make it work with your data. Given your word classes, it is probably unnecessary to handle uppercase/lowercase words, but it can be done by applying the lowcase function to teststr. Be aware that words in the output string are always separated by one blank, even if there are more in the input string.

* Test data;
data stopwords;
	input stopword $20.;
	cards;
abc 
xyz 
;
run;

data have;
	infile cards truncover;
	input string $char200.;
	cards;
aaa abc bbbbbbbbbb c 123 dddd ffff xyz
123 zzzzzzzzzz xyz hhhhhh
;
run;

* Create format;
data stopfmt; set stopwords end=end;
	retain type 'C' fmtname 'stopfmt';
	start = stopword;
	label = stopword;
	output;
	if end then do;
		hlo = 'O';
		start = '';
		label = '';
		output;
	end;

run;
proc format cntlin=stopfmt;
run;

* Remove all words defined as stop words from string;
data want (drop=i teststr); set have;
	length newstr $200 teststr $50;
	do i = 1 to countw(string,' ');
		teststr = scan(string,i,' ');
		if put(teststr,$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
	end;
run;
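
The uppercase/lowercase handling mentioned above could look roughly like this. It is only a sketch: it assumes the have data set and the $stopfmt. format created above already exist and that the stop words were entered in lowercase. The output name want_nocase is made up for this example.

```sas
* Case-insensitive variant of the removal step;
data want_nocase (drop=i teststr); set have;
	length newstr $200 teststr $50;
	do i = 1 to countw(string,' ');
		teststr = scan(string,i,' ');
		/* Look up the lowercased word, but append the word in its original case. */
		if put(lowcase(teststr),$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
	end;
run;
```

With this change, a word such as Abc or XYZ in the input is treated as the stop word abc or xyz.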
ErikLund_Jensen
Rhodochrosite | Level 12
A smart guy with a better command of hash objects would give a more elegant solution without the format step.
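
One possible shape for such a hash object version is sketched below. It is not a tested replacement for the format approach. It assumes the have data set from the example above, and it uses a hypothetical stop word table stopwords_demo whose variable word has the same length as the lookup variable, so that the key lengths match.

```sas
* Hypothetical stop word list, same length as the lookup variable;
data stopwords_demo;
	length word $50;
	input word $;
	cards;
abc
xyz
;
run;

* Remove stop words with a hash lookup instead of a format;
data want_hash (drop=i word);
	length word $50 newstr $200;
	if _n_ = 1 then do;
		/* Load the stop words into memory once, on the first iteration. */
		declare hash stops(dataset: 'stopwords_demo');
		stops.defineKey('word');
		stops.defineDone();
	end;
	set have;
	do i = 1 to countw(string,' ');
		word = scan(string,i,' ');
		/* check() returns 0 when the current value of word is in the hash. */
		if stops.check() ne 0 then newstr = catx(' ',newstr,word);
	end;
run;
```

The lookup logic is the same as in the format version. The difference is that the stop words are read straight from the data set, so the CNTLIN data step and PROC FORMAT are not needed.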
Reeza
Super User
Pretty sure the OP is using TextMiner though.
mar00390
Fluorite | Level 6

This kept it the same. Is there a reason it wouldn't work?

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @mar00390

Sorry, but I need to know a bit more to answer that.

If you ran my example code, note that the original string is also kept in the output, so each record shows both before (string) and after (newstr).

If you used your own data, I need an example: at least one stop word and a string in which that stop word occurs. Then I'll look into it.

