Text Mining Question RegEx

elainathewonder · Posted 08-08-2019 05:33 PM

Is there a way to group the misspellings of a word before creating a summary table, so that a search of the data set treats every misspelling as the one correct word? I have all the misspellings listed.

Example:

Replacing love, likes, liked, and liker with like

elainathewonder · Posted 08-12-2019 05:38 AM

Hi elainathewonder,

Greetings of the day.

I have put together an example for you. Please check it and let me know whether this is what you mean.



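data test;
 /* Matches a capital L, one or more word characters, then a capital E; */
 /* the o option compiles the pattern only once.                        */
 patternID = prxparse("/L\w+E/o");
 input address $80.;
 /* Get the start and length of the first match (position is 0 if none) */
 call prxsubstr(patternID, address, position, matchlen);

 /* Use the actual match length rather than a hard-coded 5 characters */
 if position ^= 0 then address = tranwrd(address, substr(address, position, matchlen), 'Love');

 datalines;
Zack Johnson, 153 LirsE Str, Chapel Hill, NC27514
Dan Zack, 67891 64th st, Brea, CA
Sally Johns, 4 Moritz LtreE, Duarte, CA 91010
;

run;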

In the example above, the words LirsE and LtreE are replaced with the common word 'Love'. If your misspellings share some similarity like this, you can write a pattern that matches them, replace each match as the example does, and you are done.
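If the misspellings do not share a pattern but are already listed, as in the original question, one option is to put the list directly into the regular expression as alternatives and let PRXCHANGE do the replacement. The following is only a minimal sketch: the data set name want, the variable comment, and the sample rows are made up for illustration, and the word list is taken from the example in the question.

```sas
data want;
  input comment $80.;
  /* Hypothetical list of variants to collapse into "like".          */
  /* \b limits each match to a whole word; the i option ignores case; */
  /* -1 tells PRXCHANGE to replace every match in the string.         */
  comment = prxchange('s/\b(love|likes|liked|liker)\b/like/i', -1, comment);
  datalines;
She liked the new layout
Two likes and one liker so far
;
run;
```

Every listed variant in comment is rewritten to like before any summary table is built, so a later search only has to look for the one word. Note that the replacement text is inserted as written, so a capitalized variant such as Liked would also become lowercase like.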

Please check and let me know if there is any disconnect.
