How do I delete prepositions, conjunctions, and auxiliary verbs from a string? My strings have a length of 32,767.
Base SAS doesn't contain any functionality to identify language components. You are limited to word and character pattern matches.
SAS Text Miner probably has more capabilities, but I doubt it can parse grammatical terms.
Hi @mar00390
You need to create a list of stop words. There might be a list to download as a starting point, but otherwise it's just hard work. Given the list and a SAS data set with your strings, an easy solution is to use a format to identify the stop words in each string. The following working code shows the principle.
You need to set proper lengths to make it work with your data. A SAS character variable can hold at most 32,767 bytes, so newstr needs a length of $32767 for strings of that size. Given your word classes it is probably unnecessary to handle uppercase/lowercase words, but it can be done with a lowcase function on teststr. Also be aware that words in the output string are always separated by one blank, even if the input string has more.
* Test data;
data stopwords;
input stopword $20.;
cards;
abc
xyz
;
run;
data have;
infile cards truncover;
input string $char200.;
cards;
aaa abc bbbbbbbbbb c 123 dddd ffff xyz
123 zzzzzzzzzz xyz hhhhhh
;
run;
* Create format;
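data stopfmt;
  set stopwords end=end;
  retain type 'C' fmtname 'stopfmt';
  * Each stop word maps to itself;
  start = stopword;
  label = stopword;
  output;
  * After the last stop word, add an OTHER entry that maps any other value to blank;
  if end then do;
    hlo = 'O';
    start = '';
    label = '';
    output;
  end;
run;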
proc format cntlin=stopfmt;
run;
* Remove all words defined as stop words from string;
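data want (drop=i teststr);
  set have;
  length newstr $200 teststr $50;
  * Check each blank-delimited word and keep it only if it is not a stop word;
  do i = 1 to countw(string,' ');
    teststr = scan(string,i,' ');
    * The format returns blank for any word that is not in the stop word list;
    if put(teststr,$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
  end;
run;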
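If case does matter in your data, the lowcase idea mentioned above could look roughly like this. It is only a sketch: it assumes the have data set and the $stopfmt format from the example, that the stop words were stored in lowercase when the format was built, and that no single word is longer than 50 characters. The data set name want_ci is made up for illustration.

```sas
* Sketch: case-insensitive variant sized for long strings;
* Assumes the stop words in the $stopfmt format are stored in lowercase;
data want_ci (drop=i teststr);
  set have;
  * 32767 is the maximum length of a SAS character variable;
  length newstr $32767 teststr $50;
  do i = 1 to countw(string,' ');
    teststr = scan(string,i,' ');
    * Look up the lowercased word, but keep the original spelling in the output;
    if put(lowcase(teststr),$stopfmt.) = '' then newstr = catx(' ',newstr,teststr);
  end;
run;
```

Only the lookup is lowercased, so the words that remain in newstr keep their original case.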
This kept it the same. Is there a reason it wouldn't work?
Hi @mar00390
Sorry, but I need to know a bit more to answer that.
If you ran my example code, notice that the original string is also kept in the output. The cleaned text is in the new variable newstr, so you have before and after in each record.
If you used your own data, I need an example: at least one stop word and a string in which that stop word occurs. Then I'll look into it.
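A quick check like the following may also help narrow it down. It is only a sketch and assumes the $stopfmt format from my example has already been created; the three test values are made up for illustration. It writes to the log what the format returns for each test word.

```sas
* Sketch: see how the $stopfmt format treats a few test words;
* The test values are made up for illustration;
data _null_;
  length w $50 result $20;
  do w = 'abc', 'ABC', 'abc,';
    result = put(w, $stopfmt.);
    * A blank result means the word is not treated as a stop word;
    put w= result=;
  end;
run;
```

The format lookup is an exact, case-sensitive match, so a word that differs in case from the stop word list, or that still has punctuation attached after scan splits on blanks, is not recognized and stays in the string.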