brulard
Pyrite | Level 9

hi 

 

I'm trying to scan a column (using Base SAS) that contains the notes agents enter when dealing with customers. The problem is that I am getting many false positives: when I code a list of words to flag, some records are flagged 1 even though none of the words appear in the value.

 

I tried both the INDEX function and the PRXMATCH function, with the same result. The column I am scanning has the format $2000.

 

Example of my query:

 

data want;
    set have;
    if prxmatch("m/COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS/i",column_have) > 0 then found=1;
    else found=0;
run;

 

If you can suggest an alternative way to flag the presence of words in a given field with better accuracy, please advise.

 

thank you

 

7 REPLIES
ballardw
Super User

You should also show at least one example of the data that is incorrectly flagged.

brulard
Pyrite | Level 9

Hi Ballardw, here is an example:

 

MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.

FriedEgg
SAS Employee

Assuming your issue is matching substrings instead of whole words, try adding \b on both sides of your term list to mark a word boundary:

 

/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i
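
A minimal sketch of how that pattern might be dropped into the data step from the question, assuming the same `have` data set and `column_have` variable. Only the pattern changes; the rest of the step is as originally posted.

```sas
data want;
    set have;
    /* \b on both sides of the group requires a word boundary, so a short  */
    /* term such as PIN should not match inside a longer word (SHOPPING).  */
    if prxmatch("/\b(COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS)\b/i", column_have) > 0 then found = 1;
    else found = 0;
run;
```

The parentheses matter here: without them, \b would apply only to the first and last alternatives rather than to every term in the list.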
Shmuel
Garnet | Level 18

You can use an alternative approach:

 

data _NULL_;
     column_have = "MDECLINED LOC INCR TO 4K$ FOR X-MAS SHOPPING, SCORE 684/3/471, EST ON BURO SINCE 1990, NO DEROGS ON BURO, DEBT RATIO HIGH, HIGHEST TRADE 18K$ ALL MAX ON OTHER TRADE, GOOD PYMT HISTORY.";
     chk_for ="COUNTER|COUNTERS|CW30|REISSUE|REISSUED|STRATEGY|RE-OPEN|RE-ISSUE|APPROVE|PIN|SECURITY|TRANSACTION|TRANSACTIONS";
     /* Loop over the |-delimited terms; 50 is only an upper bound on the count */
     do i=1 to 50;
        word = scan(chk_for,i,'|');
        if word = ' ' then leave;          /* no more terms */
        pos = findw(column_have,word);     /* 0 means the term was not found as a word */
        put word= pos=;
     end;
RUN;

None of the words in the chk_for variable were found; all have position 0.

brulard
Pyrite | Level 9
Thanks for the tip... I'll try it on Monday.
brulard
Pyrite | Level 9
Thanks... the issue is primarily finding whole-word matches... I'm not sure why I am getting false positives.
brulard
Pyrite | Level 9

OK, I think I figured it out. The results I was getting were not true false positives; they came from the term |PIN| in my pattern, which flagged the word SHOPPING. To avoid this, I could add a space before and after the term: | PIN |. So I think this closes the thread.
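
One caveat with padding the term with spaces: it only matches when PIN has a literal blank on both sides, so a note that begins with PIN, or has PIN followed by punctuation, would likely be missed. Here is a small sketch comparing the padded search with the `\b` pattern suggested earlier in the thread. The `demo` data set, the `note` variable, and the sample rows are made up for illustration.

```sas
data demo;
    length note $60;
    infile datalines truncover;
    input note $char60.;
    /* Space-padded search: needs a blank on both sides of PIN */
    padded   = (index(upcase(note), ' PIN ') > 0);
    /* Word-boundary search: punctuation and start of string also count as boundaries */
    boundary = (prxmatch('/\bPIN\b/i', note) > 0);
    put note= padded= boundary=;
    datalines;
X-MAS SHOPPING
CLIENT FORGOT PIN.
PIN RESET REQUESTED
;
run;
```

Neither search should flag SHOPPING, but only the `\b` version is expected to catch the second and third rows.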
