BookmarkSubscribeRSS Feed
buddha_d
Pyrite | Level 9

SAS gurus,

            I have 2 data sets that i need to merge. The merging variable is character variable with length of 100. One variable name (name1) in dataset one is similar dataset two , but not exact name (like name2). please check the attachment to see the data. Any suggestions is appreciated.
         I have searched for options online and wasn't clear of what functions yield best results for my data. Please drop a line or two explaining what you are doing and what is the rationale behind it. 

 

thanks in advance.

3 REPLIES 3
foobarbaz
Obsidian | Level 7

Hi,

Look at the use of phonetic encoding algorithms or edit distance style measures which calculate the cost of turning string 1 into string 2.  In SAS these are the soundex function and complev which calculates edit distance.  If you have SAS Data Management you can also use the DQMATCH procedure to create a matchcode in DataFlux which performs a similar phonetically encoded string.  It's typically a good idea to clean your data beforehand as well.  Do things like;

 - make everything the same case

 - Remove punctuation if it is not important

 - Compress spaces  etc. if not important.  Look at the compress function for this.

 - Remove repeating terms and letters using compbl if applicable.

 - Standardise common terms such as company prefixes etc.

 

Once that is done, then the method is to create a new variable using soundex and merge by that new variable then use complev to calculate the similarity.  The lower the score the closer they match.

 

HTH.

Cameron

Regards,
Cameron | Selerity
s_lassen
Meteorite | Level 14

That's not so easy. You basically need to do an outer join, on the condition that one or more words from one name is in the other name, or vice versa. Here is a possible solution that uses a datastep to create the outer product (by reading every observation from the second dataset with POINT=) and compare:

data one;
input name1 $40. /amount1;
cards;
wwwamazoncom
100.5
toysrus
50.25
OLIVE GARDEN
61.85
walMart
86.24
;run;

data two;
input name2 $40. /amount2;
cards;
US AMAZON AR USA
25.68
online toysrus newjersey us
126.98
ORDER olivegarden Washington DC
29.99
us wwwwalmartcom toys texas
75.86
;run;

data want;
  set one;
  score=0;
  length common $60;
  do _N_=1 to nobs;
    set two nobs=nobs point=_N_;
    common=' ';
    score=0;
    do i=1 to countw(name1);
      if length(scan(name1,i))<3 then continue;
      if find(name2,scan(name1,i),'i') then do;
        score=score+1;
        if not findw(common,scan(name1,i),' ','i') then
          call catx(' ',common,scan(name1,i));
        end;
      end;
  do i=1 to countw(name2);
      if length(scan(name2,i))<3 then continue;
      if find(name1,scan(name2,i),'i') then do;
        score=score+1;
        if not findw(common,scan(name2,i),' ','i') then
          call catx(' ',common,scan(name2,i));
        end;
      end;
    if score>0 then output;
    end;
run;

Note that one pair of values were joined just on the word "toys", you may want to increase the length value in the line with "continue".

buddha_d
Pyrite | Level 9

Thank you s_lassen and foobarbaz for replying. The solution given by S-lassen only applies to this sample data and we all know real world data is messy. But, I got valuable information from both of you and Thanks again. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 483 views
  • 0 likes
  • 3 in conversation