Getting false positives when flagging for words

Frequent Contributor
Posts: 80

hi 

 

I'm trying to scan a column (using Base SAS) that contains the notes agents enter when dealing with customers. The problem is that I am getting many false positives: when I code a list of words to flag, some records are flagged 1 even though none of the words appears in the value.

 

I tried both the INDEX function and the PRXMATCH function, with the same result. The column that I am scanning has the format $2000.

 

Example of my query:

 

data want;
    set have;
    /* Flag the record if any listed term occurs anywhere in the text (case-insensitive) */
    if prxmatch("m/COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS/i", column_have) > 0 then found = 1;
    else found = 0;
run;

 

If you can suggest an alternative, more accurate way to flag the presence of words in a given field, please advise.

 

thank you

 

Grand Advisor
Posts: 10,043

Re: Getting false positives when flagging for words

You should also show at least one example of the data that is incorrectly flagged.

Frequent Contributor
Posts: 80

Re: Getting false positives when flagging for words

Hi Ballardw, here is an example:

 

MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.

Trusted Advisor
Posts: 1,297

Re: Getting false positives when flagging for words

Assuming your issue is that you are matching substrings instead of whole words, try adding \b on both sides of your term list to mark a word boundary:

 

/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i
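
A minimal sketch of how this pattern could be dropped into the original DATA step, assuming the input data set is named have and the text column is column_have, as in the question. The variable name re is only illustrative.

```sas
data want;
    set have;
    /* Compile the pattern once, on the first iteration, and keep its id */
    retain re;
    if _n_ = 1 then
        re = prxparse('/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i');
    /* PRXMATCH returns the position of the first match, or 0 if there is none */
    found = (prxmatch(re, column_have) > 0);
    drop re;
run;
```

Note that with \b on both sides each term only matches as a whole word, so APPROVE no longer matches APPROVED. If inflected forms should also be flagged, they would have to be listed explicitly, as REISSUED already is.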
Super User
Posts: 1,159

Re: Getting false positives when flagging for words

You can use an alternative approach:

 

data _NULL_;
     column_have = "MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.";
     chk_for ="COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS";
     do i=1 to 50;
        word = scan(chk_for,i,'|');
        if word = ' ' then leave;
        pos = findw(column_have,word); put word= pos=; 
     end;
RUN;

None of the words in the chk_for variable were found; all have position 0.
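
The same idea could be turned into a flag on every record. This is only a sketch: it assumes the data set have and the column column_have from the question, and the delimiter list passed to FINDW is an assumption about how words are separated in the notes.

```sas
data want;
    set have;
    length word $40;
    chk_for = "COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS";
    found = 0;
    do i = 1 to countw(chk_for, '|');
        word = scan(chk_for, i, '|');
        /* 'i' ignores case, 't' trims trailing blanks from the arguments; */
        /* the third argument lists the characters treated as word delimiters */
        if findw(column_have, word, ' ,.;:/', 'it') > 0 then do;
            found = 1;
            leave;   /* one hit is enough to flag the record */
        end;
    end;
    drop i word chk_for;
run;
```

The hyphen is deliberately left out of the delimiter list so that RE-OPEN and RE-ISSUE are treated as single words.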

Frequent Contributor
Posts: 80

Re: Getting false positives when flagging for words

Thanks for the tip... I'll try it on Monday.
Frequent Contributor
Posts: 80

Re: Getting false positives when flagging for words

Thanks... the issue is primarily finding whole-word matches... not sure why I am getting false positives.
Frequent Contributor
Posts: 80

Re: Getting false positives when flagging for words

OK, I think I figured it out. The results I was getting are not truly false positives; they come from the term |PIN| in my pattern, which matched inside the word SHOPPING. To avoid this, I could add a space before and after the term: | PIN |. So I think this closes the thread.
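
For comparison, here is a small sketch of how the three variants behave on the term PIN. The three sample notes are made up for illustration and are not from the real data.

```sas
data _null_;
    length note $60;
    do note = 'SHOPPING FOR X-MAS', 'CUSTOMER FORGOT PIN.', 'PIN RESET REQUESTED';
        substring  = (prxmatch('/PIN/i', note) > 0);      /* also matches inside SHOPPING */
        spaced     = (prxmatch('/ PIN /i', note) > 0);    /* needs a blank on both sides */
        whole_word = (prxmatch('/\bPIN\b/i', note) > 0);  /* word boundary on both sides */
        put note= substring= spaced= whole_word=;
    end;
run;
```

The plain substring pattern flags all three notes, including SHOPPING. The space-padded pattern flags none of them, because PIN is followed by a period in the second note and sits at the very start of the third. The \b version flags only the second and third notes, which is why it may be a safer choice than padding with spaces.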
