Hello All, I have a year's worth of customer transactions (c.11m observations). This dataset shows where customers have spent/transferred their money and the goal of the analysis is to identify behaviours that can be used for customer scoring models. However, the problem I have now is that the merchant names (companies where customers spent their money) vary greatly. After doing a proc summary, there are c.290k types of merchant names. But a lot of them are the same thing, just spelt differently, e.g. Sainsburys vs. JSainsburys vs. J Sainsburys vs. www.sainsburys.com etc. I wanted to do some kind of filter where the SAS program could recognise similar names and make a separate variable with a corrected name. I used the FIND and FINDW functions but it just did not want to work (most likely I made a mistake somewhere). Reading this forum I also found the PRXMATCH function, and have tried it below and there has been some success. Problem is, in the output file I can see that the PRXM is doing something. It correctly identifies merchants that are not in the below IF statements and gives them a 0 PRXM flag. However, the merchants that are listed below, their PRXM flag and new merchant names are just blank. So it's doing something, and on the right track, but now what I want. I read somewhere that the FIND function can only work on a single word/string in a SAS data step, so the below code maybe wouldn't work if I replaced PRXM with FIND. But I'm having a hard time finding the appropriate literature for what I want. in most cases the user just wants to replace a single string in a variable with a different string OR filter out using a WHERE CONTAINS kind of function. I don't want to do that as I am trying to clean up the data rather than cut-up/delete observations. Any help would be greatly appreciated. data clean4; set clean3; length MRCHNT_NM_V4 $30; if prxmatch("m/MCDONALD/oi",MRCHNT_NM_V3) >0 then MRCHNT_NM_V4="MCDONALDS" and prxm_FG=1; else if prxmatch("m/ITUNES/oi",MRCHNT_NM_V3) >0 then MRCHNT_NM_V4="ITUNES" and prxm_FG=1; else if prxmatch("m/ASDA/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="ASDA" and prxm_FG=1; else if prxmatch("m/SAINSBURY/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="SAINSBURYS" and prxm_FG=1; else if prxmatch("m/LIDL/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="LIDL" and prxm_FG=1; else if prxmatch("m/ALDI/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="ALDI" and prxm_FG=1; else if prxmatch("m/AMAZON/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="AMAZON" and prxm_FG=1; else if prxmatch("m/MORRISON/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="MORRISONS" and prxm_FG=1; else if prxmatch("m/NETFLIX/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="NETFLIX" and prxm_FG=1; else if prxmatch("m/FACEBOOK/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="FACEBOOK" and prxm_FG=1; else if prxmatch("m/PAYPAL/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="PAYPAL" and prxm_FG=1; else if prxmatch("m/SPOTIFY/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="SPOTIFY" and prxm_FG=1; else if prxmatch("m/SPAR/oi",MRCHNT_NM_V3) > 0 then MRCHNT_NM_V4="SPAR" and prxm_FG=1; else prxm_FG=0; run;
... View more