Here is what I tried, using complev. Seems to work pretty well. You might have to experiment with the best distance limit (10 in this example)
libname t01 excel "&sasforum.\datasets\temp01a.xlsx";
libname t02 excel "&sasforum.\datasets\temp02.xlsx";
proc sql;
create table t01 as select * from t01.temp01a;
create table t02 as select * from t02.temp02;
quit;
proc sql;
create table names as
select ticker, upcase(comnam) as name from t01
union
select ticker, upcase(name) as name from t02
order by ticker;
quit;
data nameEqv;
array n{10} $60;
array e{10} $60;
do i = 1 by 1 until(last.ticker);
set names; by ticker;
n{i} = name;
e{i} = name;
do j = 1 to i-1;
if complev(n{j}, n{i}, 10) < 10 then do;
e{i} = e{j};
leave;
end;
end;
eqvName = e{i};
output;
end;
keep ticker name eqvName;
run;
proc sql;
create table t03 as
select
A.*,
upcase(B.name) as otherName,
B.issuerDescription,
B.startDate,
B.endDate
from
(select T01.*, n01.eqvName
from
T01 inner join
nameEqv as n01
on T01.ticker=n01.ticker and upcase(T01.comnam)=n01.name) as A inner join
(select T02.*, n02.eqvName
from
T02 inner join
nameEqv as n02
on T02.ticker=n02.ticker and upcase(T02.name)=n02.name) as B
on A.ticker=B.ticker and A.eqvName=B.eqvName;
quit;
Thank you for your reply. I never used macro before, could you explain what do the first two rows mean (i.e. what "&sasforum.\datasets "stands for) ?
The two libname statements and the first proc SQL step bring the data from your Excel files into two datasets : t01 and t02. The real work starts on the second proc SQL step where a list of all distinct ticker-name combinations from both datasets is assembled.
Thank you PG! That works! there are about 1,600s out of 2,889 observations matched. For the unmatched ones, do i have to match them manually or there is a better way?
Should there always be a match?
You could play with the edit distance limit (set to 10 above). As you increase the limit you will get more matches, but you risk also getting some false matches.
Cases involving name changes will have to be handled separately.
Isn’t there a data set somewhere that has ticker symbols and the dates they belong to a particular company? That would likely be a better way. Also, you’re going to miss name changes, same ticker but company rebrands under a different name. That’s more likely to occur IME than finding companies reusing the same symbol over time.
I would do a distance calc like suggested and take the nearest match.
For sample data, please try this, i know it’s not perfect but it’s a lot easier for people to work with. You should try to include as many variations of the issue as you can.
@Songchan wrote:
Hi guys,
I'm trying to merge two data sets by ticker symbols, but ticker symbols are not unique to every company, they can be reused by other companies, so I also need to merge two data sets by company names, The problem is names for same company in different data sets may be different in formats (such as up or lower case, inc. or corp, with "-" or " " in two words), how could I solve this problem? Please have a look of below codes
Kind regards,
Songchan
proc sql; create table temp03 as select distinct a.*, b.permno, b.comnam, b.ticker, b.date from temp02 as a left join temp01a as b on a.ticker=b.ticker; quit;
Yes, there is a data set for CRSP data (the first of the OP's two data sets), that allows construction of a date range connecting a particular TICKER to a particular PERMNO (the CRSP id variable).
However the OP might not have such a table for the other source, which only uses company name, not an id variable. As I'm sure you realize, that's why I asked the OP whether the ticker in the 2nd data set is correct for the stated date range, or even the ENDDATE (or STARTDATE).
If the second data set has any date though, it would help with the merge, or at least be better than a text merge.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.