Hey guys, I need to remove any words (or terms) that contain numbers from a string. I have tried COMPRESS, TRANSLATE, TRANWRD and PRXCHANGE, but with no luck. The code I have used:

data cleaned;
set &SRC_DTST;
NO_NUM_SCHAR=COMPBL(TRANSLATE(upcase(&COLUMN), " " , ".,;:?!-/\+[]%1234567890$@#){}'|^&~*<>("));
NO_NUM_SCHAR=COMPRESS(NO_NUM_SCHAR,,'KAW');
NO_NUM_SCHAR = prxchange('s/\s+/ /oi',-1,trim(NO_NUM_SCHAR));
NO_NUM_SCHAR = TRANWRD(NO_NUM_SCHAR, '09'x, '');
NO_STP_WRD=prxchange('s/\b(JR|SR|III|IV|DECD|THE|A|AN|I|HE|SHE|WE|IT|THEM|TO|AND|AS|OF|FROM|TO|ABOARD|IF|II|IV|OR|NON|ABOUT|HAVE|HAD|HOW|ONE|
NOT|BEEN|ABOVE|ACROSS|AFTER|AGAINST|ALONG|AMID|AMONG|ANTI|AROUND|AS|AT|BEFORE|BEHIND|BELOW|BENEATH|BESIDE|BESIDES|BETWEEN|
BEYOND|BUT|BY|CONCERNING|CONSIDERING|DESPITE|DOWN|DURING|EXCEPT|EXCEPTING|EXCLUDING|FOLLOWING|FOR|FROM|IN|INSIDE|INTO|LIKE|
MINUS|NEAR|OF|OFF|ON|ONTO|OPPOSITE|OUTSIDE|OVER|PAST|PER|PLUS|REGARDING|ROUND|SAVE|SINCE|THAN|THROUGH|TO|TOWARD|TOWARDS|
UNDER|UNDERNEATH|UNLIKE|UNTIL|UP|UPON|VERSUS|VIA|WITH|WITHIN|WITHOUT|FULL|TYPE|NONE|OTHER|MUST|NON|B|C|D|E|F|G|H|I|J|K|L|
M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)\b/ /o',-1,NO_NUM_SCHAR);
cleaned_desc = COMPBL(STRIP(NO_STP_WRD));
DROP NO_NUM_SCHAR NO_STP_WRD;
run;

/* Next step: remove duplicate words */
data cleaned(keep=Concatenated_Categories LOV_LONG_DSC cleaned_desc);
set cleaned;
newstring=scan(cleaned_desc, 1, ' ');
do i=2 to countw(cleaned_desc,' ');
  word=scan(cleaned_desc, i, ' ');
  found=find(newstring, word, 'it');
  if found=0 then newstring=catx(' ', newstring, word);
end;
cleaned_desc=newstring;
DROP newstring;
run;

Input:
ATN1 (atrophin 1) (eg, dentatorubral-pallidoluysian atrophy) gene analysis, evaluation to detect abnormal (eg, expanded) alleles

My output:
ATN ATROPHIN EG DENTATORUBRAL PALLIDOLUYSIAN ATROPHY GENE ANALYSIS EVALUATION DETECT ABNORMAL EXPANDED ALLELES

Expected output:
ATROPHIN EG DENTATORUBRAL PALLIDOLUYSIAN ATROPHY GENE ANALYSIS EVALUATION DETECT ABNORMAL EXPANDED ALLELES

Nice to have: I would also like to remove any two-letter words from the string, such as 'EG' in this case.

Any guidance will be greatly appreciated.
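
For reference, here is a rough sketch of the direction I suspect is needed; it is not a verified fix. In the step above, TRANSLATE turns each digit into a blank, so ATN1 has already become ATN before any word-level removal could run. The sketch instead drops whole tokens that contain a digit first, and only then strips the remaining special characters. The dataset name SAMPLE and the variables DESC and CLEANED are made up for illustration and are not part of my real program, and the one- or two-letter rule in the third PRXCHANGE call is my own assumption about how to handle cases like 'EG'.

```sas
/* Sketch only: SAMPLE, DESC and CLEANED are hypothetical names. */
data sample;
  length desc cleaned $200;
  desc = 'ATN1 (atrophin 1) (eg, dentatorubral-pallidoluysian atrophy) gene analysis';

  /* 1. Remove every word that contains at least one digit.        */
  /*    This must happen before digits are translated to blanks.   */
  cleaned = prxchange('s/\b\w*\d\w*\b/ /', -1, upcase(desc));

  /* 2. Replace every run of non-letter characters with one blank. */
  cleaned = prxchange('s/[^A-Z]+/ /', -1, cleaned);

  /* 3. Optional (assumption): drop one- and two-letter words.     */
  cleaned = prxchange('s/\b[A-Z]{1,2}\b/ /', -1, cleaned);

  /* 4. Collapse repeated blanks and trim the ends.                */
  cleaned = compbl(strip(cleaned));
  put cleaned=;
run;
```

Note that `\w` also matches underscores, so a token such as AB_12 would be removed as a whole. The stop-word and duplicate-word steps would then run on CLEANED in the same way as before.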