Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ;
create table results as
select secid, FUND_NAME_MS, FUND_NAME
from imf_samplenew , Infra_CRSP
where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ;
quit ;
thanks
@sasphd wrote:
Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ; create table results as select secid, FUND_NAME_MS, FUND_NAME from imf_samplenew , Infra_CRSP where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ; quit ;
thanks
Of course not, you are requiring some form of equality for the length of the shortest name. If one name is "ABC Co" and the other is "ABC CO" they do not match.
If your differences are of case, like the above then you could try:
where upcase(imf_samplenew.FUND_NAME_MS) eqt upcase(Infra_CRSP.FUND_NAME) ;
If you think the differences are one name is part of the other you might try
where index(upcase(Longernamevariable),upcase(shorternamevariable))>0
If the differences are more complex I might try
where compged(imf_samplenew.FUND_NAME_MS,Infra_CRSP.FUND_NAME) < 800 ;
COMPGED is one of the functions that will calculate a "spelling distance" based on some internal rules so minor changes "ABC" and "ABc" have low values .If you have too many really different results reduce the 800, if you don't get matches increase it.
@sasphd wrote:
Hello,
I want to match names between tow database. the names are not exactly the same. I want to apprximate and after that make a hand check
I run this program but it gives no match
proc sql ; create table results as select secid, FUND_NAME_MS, FUND_NAME from imf_samplenew , Infra_CRSP where imf_samplenew.FUND_NAME_MS eqt Infra_CRSP.FUND_NAME ; quit ;
thanks
Of course not, you are requiring some form of equality for the length of the shortest name. If one name is "ABC Co" and the other is "ABC CO" they do not match.
If your differences are of case, like the above then you could try:
where upcase(imf_samplenew.FUND_NAME_MS) eqt upcase(Infra_CRSP.FUND_NAME) ;
If you think the differences are one name is part of the other you might try
where index(upcase(Longernamevariable),upcase(shorternamevariable))>0
If the differences are more complex I might try
where compged(imf_samplenew.FUND_NAME_MS,Infra_CRSP.FUND_NAME) < 800 ;
COMPGED is one of the functions that will calculate a "spelling distance" based on some internal rules so minor changes "ABC" and "ABc" have low values .If you have too many really different results reduce the 800, if you don't get matches increase it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.