DATA Step, Macro, Functions and more

Merge two datasets

Reply
Frequent Contributor
Posts: 102

Merge two datasets

Dear all,

I have two datasets (attached below).

data pins has 3 vars: permno, year, pin

data names has 5 vars: permno,cusip, coname, year_be, year_end.

I want to add cusip information to my pins data, so I plan to use proc sql.

year_be, year_end indicates during which period, the company is using this name (coname) and the accompanying identifiers (permno,cusip).

considering the possibility of changing permno,cusip over time, what would be the best way to add cusip to data pins?

thanks !

Lan

Super Contributor
Posts: 307

Re: Merge two datasets

Assuming that a PERMNO relates to an individual customer . . . .

proc sql ;

     create table want as

     select p.*, n.cusip, n.COMNAM, n.year_bg, n.year_end

     from pins p

          , names n

     where p.permno = n.permno

          and n.year_bg <= p.year <= n.year_end

     order by p.permno, n.cusip

;

quit;

Message was edited by: Michael McCormick

Super User
Posts: 5,257

Re: Merge two datasets

One might want to consider a left join if you wish track if any permno doesn't have any cusip for any specific year.

Data never sleeps
Frequent Contributor
Posts: 102

Re: Merge two datasets

thanks to Fugue and LinusH.

I tried left join(see code below), but the number of observations in data want is greater than that for data pins, how do I correct this?

proc sql ;

     create table want as

     select p.*, n.cusip, n.comnam, n.year_bg, n.year_end

     from pins p left join            name1 n

    on p.permno = n.permno

          and n.year_bg <= p.year <= year_end

     order by p.permno, n.cusip

;quit;

Super Contributor
Posts: 307

Re: Merge two datasets

This must mean that your names data has overlapping periods for a given permno.

Hypothetical example . . .

permno     cusip     comnam     year_bg     year_end

111111     12345     Joe Blow     1993          2012

111111     12346     Jim Blow     2012          2013

The years overlap and this is probably the cause of your excess records . . .

Question: should there be overlapping years in your names data?

Frequent Contributor
Posts: 102

Re: Merge two datasets

Fugue,

I think there are overlapping period, that is, same permno same cusip for multiple periods.

How should we adjust your code to account for this?

thanks!

Lan

Super Contributor
Posts: 307

Re: Merge two datasets

Your names1 data definitely has overlapping year records for a given permno. For example:

PERMNO    CUSIP          COMNAM                             YEAR_BG     YEAR_END

10001          36720410     GREAT FALLS GAS CO     1986               1993

10001          36720410     ENERGY WEST INC           1993               2008

So, YEAR_END on the first record overlaps the YEAR_BG of the second record. In fact, your names1 data has many, many, MANY cases where the dates overlap. Most commonly, this is because the starting year of the next record is the same as the ending year of the previous record.

Other observations:

* There are many PERMNO in NAMES1 that do not exist in PINS (e.g. permno 10012)

* there is one PERMNO in PINS that does not exist in NAMES1 (permno 88394)

* there are multiple year records in PINS for the same PERMNO (e.g. permno 10597 has 2 records for 2001). This is in addition to the "problem" of overlapping records in NAME1.

* many of the name records in NAMES1 for a given PERMNO are identical (e.g., the name is the same on record after record) although the starting and ending dates might change

Occasional Contributor
Posts: 13

Re: Merge two datasets

Hi LanMin,

Code to add 'cusip' to dataset PINS from dataset NAMES :

PROC SQL;

CREATE TABLE FINAL AS

SELECT  DISTINCT

               A.*

             , B.CUSIP

    FROM PINS     A

            , NAMES  B

WHERE A.PERMNO = B.PERMNO

;

QUIT;

Explanation :

As per my observation, the dataset NAMES have unique combination of values for variables PERMNO and CUSIP that is PERMNO value and CUSIP value is same even when the CONAME changes therefore we should not consider about putting the condition for year data.

Example :

PERMNOCUSIPCOMNAMyear_bgyear_end
1000136720410GREAT FALLS GAS CO19861993
1000136720410ENERGY WEST INC19932008

In the above example, the PERMNO and CUSIP have the same value of 10001 and 3670410 respectively where the COMNAM is different. 

In the SELECT clause I have used DISTINCT keyword as there are multiple PERMNO in NAMES dataset and our objective is just to add CUSIP variable to the dataset PINS.

Thanks!

Super Contributor
Posts: 307

Re: Merge two datasets

@Sanyam

I'm not sure if LanMin's objective was solely to add the CUSIP to the PINS data. If so, that does simplify the query.

However, there remain unanswered questions about the data (like the presence of duplicates in PINS, missing PERMNO, etc) and how LanMin wants to address these will, in my view, determine the coding approach.

I'm at home and don't have access to the data/SAS, but I also seem to recall there were multiple CUSIP for some PERMNO; hence the original need to use YEAR_BG and YEAR_END as a matching criteria.

Frequent Contributor
Posts: 102

Re: Merge two datasets

Thank again to Fugue and Sanyam.


My goal is to add cusip to Pins data.


In Name1 data, year_bg and year_end specify during which time period, the company has a certain name, So in Fugue's example


PERMNO    CUSIP          COMNAM                             YEAR_BG     YEAR_END

10001          36720410     GREAT FALLS GAS CO     1986               1993

10001          36720410     ENERGY WEST INC           1993               2008

this company changed its name in 1993. my understanding is cusip and permno should be permanent, they are assigned to individual stock as unique identifier.

to Fugue:

should I use distinct key word in proc sql , or it does not matter based on your latest code.

Lan


Super Contributor
Posts: 307

Re: Merge two datasets

Question: do you also want to merge the name (COMNAM)?

Frequent Contributor
Posts: 102

Re: Merge two datasets

I do not need COMNAM.

Super Contributor
Posts: 307

Re: Merge two datasets

Then, Sanyam's suggestion should work. But first make sure that there is only 1 CUSIP for each PERMNO in NAME1; otherwise, you will still have a many-to-one match between NAME1 and PINS.

Ask a Question
Discussion stats
  • 12 replies
  • 478 views
  • 0 likes
  • 4 in conversation