Hi @ballardw and SAS users, I am back.
Thank you for your suggestion yesterday. I did quite a bit of searching and ended up with the code below.
First, regarding FINDW versus INDEXW: I prefer FINDW because it is the newer function and, unlike INDEXW, it accepts modifiers.
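To see what the FINDW modifiers used below actually do, here is a small check on one made-up name (the value is hypothetical). As far as I understand the documentation, INDEXW is case-sensitive and has no modifier argument, while FINDW with the i modifier ignores case and with the e modifier returns the word number instead of the character position.

```sas
/* Compare INDEXW and FINDW on one hypothetical company name */
data _null_;
  length name $ 40;
  name = 'ABC Corp pf Shares';
  /* INDEXW is case-sensitive: 'PF' does not match 'pf', so this returns 0 */
  pos_indexw = indexw(name, 'PF');
  /* FINDW, blank as the only delimiter, i = ignore case: returns the character position */
  pos_findw = findw(name, 'PF', ' ', 'i');
  /* e = count words instead of characters: returns the word number */
  num_findw = findw(name, 'PF', ' ', 'ei');
  put pos_indexw= pos_findw= num_findw=;
run;
```

With this value the log should show pos_indexw=0 pos_findw=10 num_findw=3. Since any value greater than 0 means "found", both the position and the word number work for the deletion test.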
So, here is a simple program that deletes every observation whose ENAME contains any of the following words:
data screen1;
set arg_merge2;
/* length 16 so that 'INVESTMENT TRUST' (16 characters) is not truncated to 'INVESTMENT TRUS' */
array delwords {84} $ 16 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF', "'PF'" /*note here*/ ,'WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF', 'FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');
drop i;
do i= 1 to dim(delwords);
/* the array is named delwords, so the reference must be delwords[i]; ' ' = blank delimiter */
if findw(ENAME,delwords[i],' ','eir') >0 then delete;
end;
run;
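A quick way to check the first program before running it on ARG_MERGE2 is to feed it a few made-up names. This sketch uses hypothetical data sets TEST_NAMES and TEST_SCREEN and a shortened three-word list. 'PFIZER INC' should survive because PF is not a whole word there, while 'ABC CORP PFD' should be removed. The 'XYZ INVESTMENT TRUST' row is there so you can confirm in your own session that a two-word entry is matched as intended.

```sas
/* Hypothetical test data: a few made-up ENAME values */
data test_names;
  length ENAME $ 40;
  input ENAME $char40.;
  datalines;
ABC CORP
ABC CORP PFD
PFIZER INC
XYZ INVESTMENT TRUST
XYZ INVESTMENTS
;

data test_screen;
  set test_names;
  /* Shortened list for illustration only; $16 fits 'INVESTMENT TRUST' */
  array delwords {3} $ 16 _temporary_ ('PF', 'PFD', 'INVESTMENT TRUST');
  drop i;
  do i = 1 to dim(delwords);
    /* blank delimiter; r strips the padding blanks from each array element */
    if findw(ENAME, delwords[i], ' ', 'eir') > 0 then delete;
  end;
run;

proc print data=test_screen;
run;
```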
And here is how I try to track which rows were deleted:
data screen1 (drop=row)
deleted (keep=row)
;
set arg_merge2;
row=_n_;
/* length 16 so that 'INVESTMENT TRUST' is not truncated */
array delwords {84} $ 16 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF',"'PF'",'WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');
drop i found;
found=0;
do i= 1 to dim(delwords) until (found); /* stop at the first listed word that is found */
/* e: return the word number, i: ignore case, r: remove leading/trailing delimiters (here blanks) from the search word */
found = (findw(ENAME,delwords[i],' ','eir') >0);
end;
/* write each row exactly once, after the loop has finished */
if found then output deleted;
else output screen1;
run;
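If it also helps to know why a row was removed, a variant of the tracking step can store the list entry that matched. This is a sketch on hypothetical sample data with a shortened list; SAMPLE, KEPT, REMOVED and MATCHED_WORD are made-up names. The UNTIL condition stops the loop at the first match, so each row is written to exactly one data set.

```sas
/* Hypothetical sample data standing in for ARG_MERGE2 */
data sample;
  length ENAME $ 40;
  input ENAME $char40.;
  datalines;
ABC CORP
ABC CORP DUPL
XYZ PFD SHARES
PFIZER INC
;

data kept (drop=row matched_word)
     removed (keep=row ENAME matched_word);
  set sample;
  row = _n_;
  length matched_word $ 16;
  /* Shortened list for illustration; the full 84-word list would go here */
  array delwords {4} $ 16 _temporary_ ('DUPLICATE', 'DUPL', 'PF', 'PFD');
  drop i;
  matched_word = ' ';
  /* leave the loop as soon as one entry has matched */
  do i = 1 to dim(delwords) until (matched_word ne ' ');
    if findw(ENAME, delwords[i], ' ', 'eir') > 0 then matched_word = delwords[i];
  end;
  if matched_word ne ' ' then output removed;
  else output kept;
run;
```

With this sample, REMOVED should contain row 2 (matched DUPL) and row 3 (matched PFD), and KEPT should contain ABC CORP and PFIZER INC.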
I am not sure whether this code is correct.
If it is, may I ask a few further questions?
1. How can I keep the observations that do NOT contain any of these words, rather than deleting the observations that contain any of them (to make this compatible with my previous code)?
2. In the code above, I did not use the UPCASE function because all values of ENAME are already in uppercase. I also did not use the STRIP function because I believe the 'r' modifier in FINDW already removes leading and trailing blanks. Is that correct?
3. Regarding the word 'PF' in the array: can I use double quotation marks to search for it? For example, I have such a case in the data.
Should I search for 'B' or ''B''?
4. I have also attached the data as a SAS data file (.sas7bdat); hopefully this dataset meets your requirements.
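For question 1, deleting the rows that match and keeping the rows that do not match give the same result; a subsetting IF after the loop states the "keep" version directly. For question 3, a value that includes the quote marks can be written as "'PF'" (double quotes around the single quotes) or as '''PF''' (each inner single quote doubled); both are the same 4-character value. The sketch below uses hypothetical data (SAMPLE_Q, KEEP_ONLY) and a two-word list. With a blank as the only delimiter, "'PF'" should match only the quoted token, so listing both 'PF' and "'PF'" may be needed to catch both forms.

```sas
/* Hypothetical sample data; one value contains the quoted token 'PF' */
data sample_q;
  length ENAME $ 40;
  input ENAME $char40.;
  datalines;
ABC CORP
ABC CORP 'PF'
ABC CORP PF
;

data keep_only;
  set sample_q;
  /* "'PF'" is the literal 'PF' including the quote marks */
  array delwords {2} $ 16 _temporary_ ("'PF'", 'DUPL');
  drop i found;
  found = 0;
  do i = 1 to dim(delwords) until (found);
    found = (findw(ENAME, delwords[i], ' ', 'eir') > 0);
  end;
  /* subsetting IF: keep only rows where no listed word was found */
  if not found;
run;
```

With this sample, KEEP_ONLY should contain ABC CORP and ABC CORP PF; only the row with the quoted 'PF' is dropped.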