BookmarkSubscribeRSS Feed
LanMin
Fluorite | Level 6

Dear all,

I have two datasets (attached below).

data pins has 3 vars: permno, year, pin

data names has 5 vars: permno,cusip, coname, year_be, year_end.

I want to add cusip information to my pins data, so I plan to use proc sql.

year_be, year_end indicates during which period, the company is using this name (coname) and the accompanying identifiers (permno,cusip).

considering the possibility of changing permno,cusip over time, what would be the best way to add cusip to data pins?

thanks !

Lan

12 REPLIES 12
Fugue
Quartz | Level 8

Assuming that a PERMNO relates to an individual customer . . . .

proc sql ;

     create table want as

     select p.*, n.cusip, n.COMNAM, n.year_bg, n.year_end

     from pins p

          , names n

     where p.permno = n.permno

          and n.year_bg <= p.year <= n.year_end

     order by p.permno, n.cusip

;

quit;

Message was edited by: Michael McCormick

LinusH
Tourmaline | Level 20

One might want to consider a left join if you wish track if any permno doesn't have any cusip for any specific year.

Data never sleeps
LanMin
Fluorite | Level 6

thanks to Fugue and LinusH.

I tried left join(see code below), but the number of observations in data want is greater than that for data pins, how do I correct this?

proc sql ;

     create table want as

     select p.*, n.cusip, n.comnam, n.year_bg, n.year_end

     from pins p left join            name1 n

    on p.permno = n.permno

          and n.year_bg <= p.year <= year_end

     order by p.permno, n.cusip

;quit;

Fugue
Quartz | Level 8

This must mean that your names data has overlapping periods for a given permno.

Hypothetical example . . .

permno     cusip     comnam     year_bg     year_end

111111     12345     Joe Blow     1993          2012

111111     12346     Jim Blow     2012          2013

The years overlap and this is probably the cause of your excess records . . .

Question: should there be overlapping years in your names data?

LanMin
Fluorite | Level 6

Fugue,

I think there are overlapping period, that is, same permno same cusip for multiple periods.

How should we adjust your code to account for this?

thanks!

Lan

Fugue
Quartz | Level 8

Your names1 data definitely has overlapping year records for a given permno. For example:

PERMNO    CUSIP          COMNAM                             YEAR_BG     YEAR_END

10001          36720410     GREAT FALLS GAS CO     1986               1993

10001          36720410     ENERGY WEST INC           1993               2008

So, YEAR_END on the first record overlaps the YEAR_BG of the second record. In fact, your names1 data has many, many, MANY cases where the dates overlap. Most commonly, this is because the starting year of the next record is the same as the ending year of the previous record.

Other observations:

* There are many PERMNO in NAMES1 that do not exist in PINS (e.g. permno 10012)

* there is one PERMNO in PINS that does not exist in NAMES1 (permno 88394)

* there are multiple year records in PINS for the same PERMNO (e.g. permno 10597 has 2 records for 2001). This is in addition to the "problem" of overlapping records in NAME1.

* many of the name records in NAMES1 for a given PERMNO are identical (e.g., the name is the same on record after record) although the starting and ending dates might change

Sanyam
Calcite | Level 5

Hi LanMin,

Code to add 'cusip' to dataset PINS from dataset NAMES :

PROC SQL;

CREATE TABLE FINAL AS

SELECT  DISTINCT

               A.*

             , B.CUSIP

    FROM PINS     A

            , NAMES  B

WHERE A.PERMNO = B.PERMNO

;

QUIT;

Explanation :

As per my observation, the dataset NAMES have unique combination of values for variables PERMNO and CUSIP that is PERMNO value and CUSIP value is same even when the CONAME changes therefore we should not consider about putting the condition for year data.

Example :

PERMNOCUSIPCOMNAMyear_bgyear_end
1000136720410GREAT FALLS GAS CO19861993
1000136720410ENERGY WEST INC19932008

In the above example, the PERMNO and CUSIP have the same value of 10001 and 3670410 respectively where the COMNAM is different. 

In the SELECT clause I have used DISTINCT keyword as there are multiple PERMNO in NAMES dataset and our objective is just to add CUSIP variable to the dataset PINS.

Thanks!

Fugue
Quartz | Level 8

@Sanyam

I'm not sure if LanMin's objective was solely to add the CUSIP to the PINS data. If so, that does simplify the query.

However, there remain unanswered questions about the data (like the presence of duplicates in PINS, missing PERMNO, etc) and how LanMin wants to address these will, in my view, determine the coding approach.

I'm at home and don't have access to the data/SAS, but I also seem to recall there were multiple CUSIP for some PERMNO; hence the original need to use YEAR_BG and YEAR_END as a matching criteria.

LanMin
Fluorite | Level 6

Thank again to Fugue and Sanyam.


My goal is to add cusip to Pins data.


In Name1 data, year_bg and year_end specify during which time period, the company has a certain name, So in Fugue's example


PERMNO    CUSIP          COMNAM                             YEAR_BG     YEAR_END

10001          36720410     GREAT FALLS GAS CO     1986               1993

10001          36720410     ENERGY WEST INC           1993               2008

this company changed its name in 1993. my understanding is cusip and permno should be permanent, they are assigned to individual stock as unique identifier.

to Fugue:

should I use distinct key word in proc sql , or it does not matter based on your latest code.

Lan


Fugue
Quartz | Level 8

Question: do you also want to merge the name (COMNAM)?

LanMin
Fluorite | Level 6

I do not need COMNAM.

Fugue
Quartz | Level 8

Then, Sanyam's suggestion should work. But first make sure that there is only 1 CUSIP for each PERMNO in NAME1; otherwise, you will still have a many-to-one match between NAME1 and PINS.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1436 views
  • 0 likes
  • 4 in conversation