BookmarkSubscribeRSS Feed
brulard
Pyrite | Level 9

hi 

 

I'm trying to scan a column (using base SAS) that contains notations that  agents enter when dealing with customers. The problem is that i am getting many false positives. That is, when coding a string of words to flag, I get records that flag 1, yet without the words  being in the value. 

 

I tried using both index function and the prxmatch function, with same result. The column that I am scanning is formatted as $2000.

 

Example of my query:

 

data want;set have:
if prxmatch("m/COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS/i",column_have)> 0 then found=1;
else found=0;RUN;

 

If you have a suggestion as an alternate way to flag for presence of words in given field, with better accuracy, please advise.

 

thank you

 

7 REPLIES 7
ballardw
Super User

You should also show at least one example of the data that is incorrectly flagged.

brulard
Pyrite | Level 9

Hi Ballardw, here is an example:

 

MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.

FriedEgg
SAS Employee

Assuming you issue is finding substrings instead of whole words try adding \b to signify a word boundry on both sides of your term list

 

/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i
Shmuel
Garnet | Level 18

You can use alternative way.:

 

data _NULL_;
     column_have = "MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.";
     chk_for ="COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS";
     do i=1 to 50;
        word = scan(chk_for,i,'|');
        if word = ' ' then leave;
        pos = findw(column_have,word); put word= pos=; 
     end;
RUN;

none of the words in chk_for variable were found. All have position 0.

brulard
Pyrite | Level 9
thanks for the tip... i ll try it on Monday
brulard
Pyrite | Level 9
thanks... issue is primarily finding whole word match... not sure why I am getting false positves
brulard
Pyrite | Level 9

ok, I think i figured it out. The result I was getting, not true false postives... but is the result of my string |PIN| which flagged word SHOPPING. To avoid hitting this, i could add a space before and after, | PIN |. So i think this closes this message

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1235 views
  • 1 like
  • 4 in conversation