Dear all,
I have two datasets (attached below).
data pins has 3 vars: permno, year, pin
data names has 5 vars: permno,cusip, coname, year_be, year_end.
I want to add cusip information to my pins data, so I plan to use proc sql.
year_be, year_end indicates during which period, the company is using this name (coname) and the accompanying identifiers (permno,cusip).
considering the possibility of changing permno,cusip over time, what would be the best way to add cusip to data pins?
thanks !
Lan
Assuming that a PERMNO relates to an individual customer . . . .
proc sql ;
create table want as
select p.*, n.cusip, n.COMNAM, n.year_bg, n.year_end
from pins p
, names n
where p.permno = n.permno
and n.year_bg <= p.year <= n.year_end
order by p.permno, n.cusip
;
quit;
Message was edited by: Michael McCormick
One might want to consider a left join if you wish track if any permno doesn't have any cusip for any specific year.
thanks to Fugue and LinusH.
I tried left join(see code below), but the number of observations in data want is greater than that for data pins, how do I correct this?
proc sql ;
create table want as
select p.*, n.cusip, n.comnam, n.year_bg, n.year_end
from pins p left join name1 n
on p.permno = n.permno
and n.year_bg <= p.year <= year_end
order by p.permno, n.cusip
;quit;
This must mean that your names data has overlapping periods for a given permno.
Hypothetical example . . .
permno cusip comnam year_bg year_end
111111 12345 Joe Blow 1993 2012
111111 12346 Jim Blow 2012 2013
The years overlap and this is probably the cause of your excess records . . .
Question: should there be overlapping years in your names data?
Fugue,
I think there are overlapping period, that is, same permno same cusip for multiple periods.
How should we adjust your code to account for this?
thanks!
Lan
Your names1 data definitely has overlapping year records for a given permno. For example:
PERMNO CUSIP COMNAM YEAR_BG YEAR_END
10001 36720410 GREAT FALLS GAS CO 1986 1993
10001 36720410 ENERGY WEST INC 1993 2008
So, YEAR_END on the first record overlaps the YEAR_BG of the second record. In fact, your names1 data has many, many, MANY cases where the dates overlap. Most commonly, this is because the starting year of the next record is the same as the ending year of the previous record.
Other observations:
* There are many PERMNO in NAMES1 that do not exist in PINS (e.g. permno 10012)
* there is one PERMNO in PINS that does not exist in NAMES1 (permno 88394)
* there are multiple year records in PINS for the same PERMNO (e.g. permno 10597 has 2 records for 2001). This is in addition to the "problem" of overlapping records in NAME1.
* many of the name records in NAMES1 for a given PERMNO are identical (e.g., the name is the same on record after record) although the starting and ending dates might change
Hi LanMin,
Code to add 'cusip' to dataset PINS from dataset NAMES :
PROC SQL;
CREATE TABLE FINAL AS
SELECT DISTINCT
A.*
, B.CUSIP
FROM PINS A
, NAMES B
WHERE A.PERMNO = B.PERMNO
;
QUIT;
Explanation :
As per my observation, the dataset NAMES have unique combination of values for variables PERMNO and CUSIP that is PERMNO value and CUSIP value is same even when the CONAME changes therefore we should not consider about putting the condition for year data.
Example :
PERMNO | CUSIP | COMNAM | year_bg | year_end |
10001 | 36720410 | GREAT FALLS GAS CO | 1986 | 1993 |
10001 | 36720410 | ENERGY WEST INC | 1993 | 2008 |
In the above example, the PERMNO and CUSIP have the same value of 10001 and 3670410 respectively where the COMNAM is different.
In the SELECT clause I have used DISTINCT keyword as there are multiple PERMNO in NAMES dataset and our objective is just to add CUSIP variable to the dataset PINS.
Thanks!
@Sanyam
I'm not sure if LanMin's objective was solely to add the CUSIP to the PINS data. If so, that does simplify the query.
However, there remain unanswered questions about the data (like the presence of duplicates in PINS, missing PERMNO, etc) and how LanMin wants to address these will, in my view, determine the coding approach.
I'm at home and don't have access to the data/SAS, but I also seem to recall there were multiple CUSIP for some PERMNO; hence the original need to use YEAR_BG and YEAR_END as a matching criteria.
Thank again to Fugue and Sanyam.
My goal is to add cusip to Pins data.
In Name1 data, year_bg and year_end specify during which time period, the company has a certain name, So in Fugue's example
PERMNO CUSIP COMNAM YEAR_BG YEAR_END
10001 36720410 GREAT FALLS GAS CO 1986 1993
10001 36720410 ENERGY WEST INC 1993 2008
this company changed its name in 1993. my understanding is cusip and permno should be permanent, they are assigned to individual stock as unique identifier.
to Fugue:
should I use distinct key word in proc sql , or it does not matter based on your latest code.
Lan
Question: do you also want to merge the name (COMNAM)?
I do not need COMNAM.
Then, Sanyam's suggestion should work. But first make sure that there is only 1 CUSIP for each PERMNO in NAME1; otherwise, you will still have a many-to-one match between NAME1 and PINS.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.