Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ;
create table results as
select secid, FUND_NAME_MS, FUND_NAME
from imf_samplenew , Infra_CRSP
where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ;
quit ;
thanks
@sasphd wrote:
Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ; create table results as select secid, FUND_NAME_MS, FUND_NAME from imf_samplenew , Infra_CRSP where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ; quit ;
thanks
Of course not, you are requiring some form of equality for the length of the shortest name. If one name is "ABC Co" and the other is "ABC CO" they do not match.
If your differences are of case, like the above then you could try:
where upcase(imf_samplenew.FUND_NAME_MS) eqt upcase(Infra_CRSP.FUND_NAME) ;
If you think the differences are one name is part of the other you might try
where index(upcase(Longernamevariable),upcase(shorternamevariable))>0
If the differences are more complex I might try
where compged(imf_samplenew.FUND_NAME_MS,Infra_CRSP.FUND_NAME) < 800 ;
COMPGED is one of the functions that will calculate a "spelling distance" based on some internal rules so minor changes "ABC" and "ABc" have low values .If you have too many really different results reduce the 800, if you don't get matches increase it.
@sasphd wrote:
Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ; create table results as select secid, FUND_NAME_MS, FUND_NAME from imf_samplenew , Infra_CRSP where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ; quit ;
thanks
Of course not, you are requiring some form of equality for the length of the shortest name. If one name is "ABC Co" and the other is "ABC CO" they do not match.
If your differences are of case, like the above then you could try:
where upcase(imf_samplenew.FUND_NAME_MS) eqt upcase(Infra_CRSP.FUND_NAME) ;
If you think the differences are one name is part of the other you might try
where index(upcase(Longernamevariable),upcase(shorternamevariable))>0
If the differences are more complex I might try
where compged(imf_samplenew.FUND_NAME_MS,Infra_CRSP.FUND_NAME) < 800 ;
COMPGED is one of the functions that will calculate a "spelling distance" based on some internal rules so minor changes "ABC" and "ABc" have low values .If you have too many really different results reduce the 800, if you don't get matches increase it.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.