Hi,
I'm trying to scan a column (using Base SAS) that contains the notes agents enter when dealing with customers. The problem is that I am getting many false positives: when I code a list of words to flag, some records are flagged 1 even though none of the words appear in the value.
I tried both the INDEX function and the PRXMATCH function, with the same result. The column that I am scanning is formatted as $2000.
Example of my query:
data want;
  set have;
  if prxmatch("m/COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS/i", column_have) > 0 then found = 1;
  else found = 0;
run;
If you can suggest an alternative, more accurate way to flag the presence of words in a given field, please advise.
Thank you.
You should also show at least one example of the data that is incorrectly flagged.
Hi Ballardw, here is an example:
MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.
Assuming your issue is that you are matching substrings instead of whole words, try adding \b (a word boundary) on both sides of your term list:
/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i
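For reference, here is a minimal sketch of how that pattern might be dropped into the original data step. It assumes the dataset and variable names from the question (have, want, column_have, found). Collapsing the plurals with an optional S and compiling the pattern once with PRXPARSE are my own choices, not part of the suggestion above. Note that with \b in place, APPROVE no longer matches APPROVED; add that form to the list if you need it.

```sas
/* Sketch only: names (have, want, column_have, found) come from the question. */
data want;
    set have;
    /* Compile the pattern on the first row and keep the id across rows */
    retain re;
    if _n_ = 1 then
        re = prxparse('/\b(COUNTERS?|CW30|REISSUED?|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTIONS?)\b/i');
    /* 1 when any listed term occurs as a whole word, 0 otherwise */
    found = (prxmatch(re, column_have) > 0);
    drop re;
run;
```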
You can also use an alternative approach:
data _null_;
  column_have = "MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.";
  chk_for = "COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS";
  do i = 1 to 50;
    word = scan(chk_for, i, '|');
    if word = ' ' then leave;
    pos = findw(column_have, word);
    put word= pos=;
  end;
run;
None of the words in the chk_for variable were found. All have position 0.
OK, I think I figured it out. The results I was getting were not true false positives; they came from the term PIN in my list, which matched inside the word SHOPPING. To avoid this, I could add a space before and after the term, | PIN |. So I think this closes the thread.
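One caution on padding the term with spaces: it can miss PIN at the very start or end of a note, or next to punctuation. The small check below is only a sketch of that difference. The second test string ("CHANGED PIN.") is a made-up note for illustration, not real data from this thread.

```sas
/* Sketch: compare substring, space-padded, and word-boundary matching. */
data _null_;
    note1 = 'MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471';
    note2 = 'CHANGED PIN.';   /* hypothetical note, for illustration only */

    /* Plain substring search: PIN is found inside SHOPPING */
    substring_hit = (index(note1, 'PIN') > 0);

    /* Space-padded search: PIN is followed by a period, not a space */
    padded_hit = (index(note2, ' PIN ') > 0);

    /* Word-boundary search on the same two notes */
    boundary_hit1 = (prxmatch('/\bPIN\b/i', note1) > 0);
    boundary_hit2 = (prxmatch('/\bPIN\b/i', note2) > 0);

    put substring_hit= padded_hit= boundary_hit1= boundary_hit2=;
run;
```

I would expect substring_hit=1, padded_hit=0, boundary_hit1=0 and boundary_hit2=1, which is why the \b version looks like the safer fix here.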