03-10-2017 04:00 PM
I'm trying to scan a column (using base SAS) that contains notations that agents enter when dealing with customers. The problem is that i am getting many false positives. That is, when coding a string of words to flag, I get records that flag 1, yet without the words being in the value.
I tried using both index function and the prxmatch function, with same result. The column that I am scanning is formatted as $2000.
Example of my query:
data want;set have:
if prxmatch("m/COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS/i",column_have)> 0 then found=1;
If you have a suggestion as an alternate way to flag for presence of words in given field, with better accuracy, please advise.
03-10-2017 04:21 PM
Hi Ballardw, here is an example:
MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.
03-10-2017 04:26 PM
Assuming you issue is finding substrings instead of whole words try adding \b to signify a word boundry on both sides of your term list
03-10-2017 04:58 PM
You can use alternative way.:
data _NULL_; column_have = "MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY."; chk_for ="COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS"; do i=1 to 50; word = scan(chk_for,i,'|'); if word = ' ' then leave; pos = findw(column_have,word); put word= pos=; end; RUN;
none of the words in chk_for variable were found. All have position 0.
03-13-2017 12:40 PM
ok, I think i figured it out. The result I was getting, not true false postives... but is the result of my string |PIN| which flagged word SHOPPING. To avoid hitting this, i could add a space before and after, | PIN |. So i think this closes this message