- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
I have two datasets (attached below).
data pins has 3 vars: permno, year, pin
data names has 5 vars: permno,cusip, coname, year_be, year_end.
I want to add cusip information to my pins data, so I plan to use proc sql.
year_be, year_end indicates during which period, the company is using this name (coname) and the accompanying identifiers (permno,cusip).
considering the possibility of changing permno,cusip over time, what would be the best way to add cusip to data pins?
thanks !
Lan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming that a PERMNO relates to an individual customer . . . .
proc sql ;
create table want as
select p.*, n.cusip, n.COMNAM, n.year_bg, n.year_end
from pins p
, names n
where p.permno = n.permno
and n.year_bg <= p.year <= n.year_end
order by p.permno, n.cusip
;
quit;
Message was edited by: Michael McCormick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One might want to consider a left join if you wish track if any permno doesn't have any cusip for any specific year.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks to Fugue and LinusH.
I tried left join(see code below), but the number of observations in data want is greater than that for data pins, how do I correct this?
proc sql ;
create table want as
select p.*, n.cusip, n.comnam, n.year_bg, n.year_end
from pins p left join name1 n
on p.permno = n.permno
and n.year_bg <= p.year <= year_end
order by p.permno, n.cusip
;quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This must mean that your names data has overlapping periods for a given permno.
Hypothetical example . . .
permno cusip comnam year_bg year_end
111111 12345 Joe Blow 1993 2012
111111 12346 Jim Blow 2012 2013
The years overlap and this is probably the cause of your excess records . . .
Question: should there be overlapping years in your names data?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Fugue,
I think there are overlapping period, that is, same permno same cusip for multiple periods.
How should we adjust your code to account for this?
thanks!
Lan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your names1 data definitely has overlapping year records for a given permno. For example:
PERMNO CUSIP COMNAM YEAR_BG YEAR_END
10001 36720410 GREAT FALLS GAS CO 1986 1993
10001 36720410 ENERGY WEST INC 1993 2008
So, YEAR_END on the first record overlaps the YEAR_BG of the second record. In fact, your names1 data has many, many, MANY cases where the dates overlap. Most commonly, this is because the starting year of the next record is the same as the ending year of the previous record.
Other observations:
* There are many PERMNO in NAMES1 that do not exist in PINS (e.g. permno 10012)
* there is one PERMNO in PINS that does not exist in NAMES1 (permno 88394)
* there are multiple year records in PINS for the same PERMNO (e.g. permno 10597 has 2 records for 2001). This is in addition to the "problem" of overlapping records in NAME1.
* many of the name records in NAMES1 for a given PERMNO are identical (e.g., the name is the same on record after record) although the starting and ending dates might change
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi LanMin,
Code to add 'cusip' to dataset PINS from dataset NAMES :
PROC SQL;
CREATE TABLE FINAL AS
SELECT DISTINCT
A.*
, B.CUSIP
FROM PINS A
, NAMES B
WHERE A.PERMNO = B.PERMNO
;
QUIT;
Explanation :
As per my observation, the dataset NAMES have unique combination of values for variables PERMNO and CUSIP that is PERMNO value and CUSIP value is same even when the CONAME changes therefore we should not consider about putting the condition for year data.
Example :
PERMNO | CUSIP | COMNAM | year_bg | year_end |
10001 | 36720410 | GREAT FALLS GAS CO | 1986 | 1993 |
10001 | 36720410 | ENERGY WEST INC | 1993 | 2008 |
In the above example, the PERMNO and CUSIP have the same value of 10001 and 3670410 respectively where the COMNAM is different.
In the SELECT clause I have used DISTINCT keyword as there are multiple PERMNO in NAMES dataset and our objective is just to add CUSIP variable to the dataset PINS.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Sanyam
I'm not sure if LanMin's objective was solely to add the CUSIP to the PINS data. If so, that does simplify the query.
However, there remain unanswered questions about the data (like the presence of duplicates in PINS, missing PERMNO, etc) and how LanMin wants to address these will, in my view, determine the coding approach.
I'm at home and don't have access to the data/SAS, but I also seem to recall there were multiple CUSIP for some PERMNO; hence the original need to use YEAR_BG and YEAR_END as a matching criteria.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank again to Fugue and Sanyam.
My goal is to add cusip to Pins data.
In Name1 data, year_bg and year_end specify during which time period, the company has a certain name, So in Fugue's example
PERMNO CUSIP COMNAM YEAR_BG YEAR_END
10001 36720410 GREAT FALLS GAS CO 1986 1993
10001 36720410 ENERGY WEST INC 1993 2008
this company changed its name in 1993. my understanding is cusip and permno should be permanent, they are assigned to individual stock as unique identifier.
to Fugue:
should I use distinct key word in proc sql , or it does not matter based on your latest code.
Lan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Question: do you also want to merge the name (COMNAM)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I do not need COMNAM.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then, Sanyam's suggestion should work. But first make sure that there is only 1 CUSIP for each PERMNO in NAME1; otherwise, you will still have a many-to-one match between NAME1 and PINS.