Hi SAS Community,
I have a data set ("WORK.PTMatchUS") and would like to extend it (in a new table "WORK.Merged") with two columns (NewCOMNAM and NewCUSIP) from a different data set (CRSP.STOCKNAMES) as soon as there is a match in the variable CUSIP.
PROC SQL;
CREATE TABLE WORK.Merged AS SELECT Old.* , New.COMNAM AS NewCOMNAM , New.CUSIP AS NewCUSIP
FROM WORK.PTMatchUS as Old
Left JOIN CRSP.STOCKNAMES as New ON Old.CUSIP=New.CUSIP;
QUIT;
Example:
Data set A:
101 // Daniel
211 // Steve
31 // Joe
Data set B:
111 // 175
211 // 180
51 // 182
23 // 165
31 // 177
New table should be then:
101 // Daniel // N/A
211 // Steve // 180
31 // Joe // 177
Problem: I get much more rows in the new created table than it should...My new table should have the same amount of rows as the primary data set "Work.PTMatchUS" but just extended to the two new columns (NewCOMNAM and NewCUSIP) with the information from the other dataset (CRSP.STOCKNAMES)... in other words just like a basic VLOOKUP in Excel.
Any ideas how to change my code in a way that it works?
Thanks in advance.
Jorge
Hi,
I ran your code, with some minor changes and it appears to work for the example data given. See last PROC SQL step in code below.
I also wrote my own SQL query (first SQL step below) before checking yours. Both are creating 3 rows in the output datasets.
I suggest you
I suspect there is something about the 2 datasets that impacts the results, that you are unaware of
data PTMatchUS ;
infile cards ;
input cusip $ name $ ;
cards ;
101 Daniel
211 Steve
31 Joe
;
data stocknames ;
infile cards ;
input cusip $ newcusip $ ;
cards ;
111 175
211 180
51 182
23 165
31 177
;
proc sql ;
create table
merged1 as
select
old.*,
new.newcusip as newcusip
from
PTMatchUS as old
left join
stocknames as new
on
old.cusip=new.cusip
;
quit ;
PROC SQL;
CREATE TABLE WORK.Merged AS SELECT Old.* , New.newcusip AS Newcusip
FROM WORK.PTMatchUS as Old
Left JOIN STOCKNAMES as New ON Old.CUSIP=New.CUSIP;
QUIT;
Hi,
I ran your code, with some minor changes and it appears to work for the example data given. See last PROC SQL step in code below.
I also wrote my own SQL query (first SQL step below) before checking yours. Both are creating 3 rows in the output datasets.
I suggest you
I suspect there is something about the 2 datasets that impacts the results, that you are unaware of
data PTMatchUS ;
infile cards ;
input cusip $ name $ ;
cards ;
101 Daniel
211 Steve
31 Joe
;
data stocknames ;
infile cards ;
input cusip $ newcusip $ ;
cards ;
111 175
211 180
51 182
23 165
31 177
;
proc sql ;
create table
merged1 as
select
old.*,
new.newcusip as newcusip
from
PTMatchUS as old
left join
stocknames as new
on
old.cusip=new.cusip
;
quit ;
PROC SQL;
CREATE TABLE WORK.Merged AS SELECT Old.* , New.newcusip AS Newcusip
FROM WORK.PTMatchUS as Old
Left JOIN STOCKNAMES as New ON Old.CUSIP=New.CUSIP;
QUIT;
Hi AMSAS,
thanks for your help. Apparently my code was right but my description was not clearly enough. My Data set contains several rows with also the same ID (CUSIP) and this is why I got more rows in the resulting table. I could fix the problem by deleting duplicates in the data. Thanks anyway for you help!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.