Help using Base SAS procedures

problem about joining datasets

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

problem about joining datasets

Hi all,

Please see attachments:

frame is a balanced frame, which is 24 firms (i.e. PERMNO) * 336months=8064 observations.

e1 is also about the same 24 firms but some of them do not have observations during the whole 336-month period, so it is not a balanced dataset.

e2 is only the monthly data for 324 months.

What I want is to let e1 and e2 join frame, because frame is a balanced frame and I want generate a whole balanced panel.

the unique linking variable should be PERMNOYm (when joining e1 and frame) or Ym(when joining e2 and frame).

Can anybody help me to do this, no matter merge or proc sql is used? I find some problems when I did it. Thank you in advance.

Attachment
Attachment
Attachment

Accepted Solutions
Solution
‎03-10-2014 09:24 PM
Respected Advisor
Posts: 4,934

Re: problem about joining datasets

Posted in reply to comeon2012

Try something like this:

proc sql;

create table panel as

select frame.*,

  DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,

  RETX, vwretd, vwretx, ewretd, ewretx, sprtrn

from

  frame natural left join

  e2 left join

  e1 on frame.PERMNOYm=e1.PERMNOYm

order by PERMNOYm;

quit;

PG

PG

View solution in original post


All Replies
Solution
‎03-10-2014 09:24 PM
Respected Advisor
Posts: 4,934

Re: problem about joining datasets

Posted in reply to comeon2012

Try something like this:

proc sql;

create table panel as

select frame.*,

  DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,

  RETX, vwretd, vwretx, ewretd, ewretx, sprtrn

from

  frame natural left join

  e2 left join

  e1 on frame.PERMNOYm=e1.PERMNOYm

order by PERMNOYm;

quit;

PG

PG
Contributor
Posts: 38

Re: problem about joining datasets

Thanks PG

but when I did an experiment and tried to join frame and e1 (I just edited the code based on yours as follows), there is always an error (in bold) :

proc sql;

create table panel as

select frame.*,

  RETX, vwretd, vwretx, ewretd, ewretx, sprtrn

from

  frame natural left join

  e1 on frame.PERMNOYm=e1.PERMNOYm

       --

      22

      76

ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, ',', '.', ANSIMISS,

              AS, CROSS, EXCEPT, FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL,

              NOMISS, ORDER, OUTER, RIGHT, UNION, WHERE.

ERROR 76-322: Syntax error, statement will be ignored.

order by PERMNOYm;

quit;

what is the problem? Thanks again.

Super User
Posts: 7,854

Re: problem about joining datasets

Posted in reply to comeon2012

In a natural join there should be no ON clause, because "natural" implies a join on all matching (name and type) columns.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 38

Re: problem about joining datasets

Hi PG and KurtBremser,

Thank you for your help. One more question if you don't mind:

what is the order that SAS join these datasets?

frame natural left join e2 and generate a temporary file, and this temporary file left join e1 (on frame.PERMNOYm=e1.PERMNOYm)

Am I right? but why is the left hand side PERMNOYm from frame?

Thank you.

Respected Advisor
Posts: 4,934

Re: problem about joining datasets

Posted in reply to comeon2012

Without parentheses, joins are done from left to right, as you understood. In a left join, all records from the left side are kept, so it is better to use variables from the left side table to do further joins. Variables from the right side of a left join can be set to missing and thus cannot be joined-to further.

 

PG

PG
Contributor
Posts: 38

Re: problem about joining datasets

@PGStats

@KurtBremser

and other experts

Similar to the example in my original post in this session, I am trying to join the following 4 datasets and form a new dataset reit_etf_panel

reit_etf_framel

liq

ff

reit_etf

The code is like below:

proc sql noprint;

create table reit_etf_panel as

select reit_etf_framel.*,

  DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,

  dateff, mktrf, smb, hml, rf, umd,

  datereit, SHRCD, SICCD, TICKER, COMNAM, SHRCLS, PERMCO, CUSIP, PRC, VOL, RET, SHROUT, ALTPRC, SPREAD, RETX, vwretd, vwretx, ewretd, ewretx, sprtrn

from

  reit_etf_framel natural left join

  liq natural left join

  ff left join

  reit_etf on reit_etf_framel.PERMNOYm=reit_etf.PERMNOYm

order by PERMNOYm;

quit;

Everything is ok but with one problem that, as you can see, in reit_etf_panel, from line 325 to 336 there are missing values for those variables from both ff and liq. Actually, the problem exists for the last year (i.e. from 201301 to 201312) for each PERMNO.

Can you help me to check what is the reason and how to deal with it? Thanks.

Attachment
Attachment
Attachment
Attachment
Attachment
Respected Advisor
Posts: 4,934

Re: problem about joining datasets

Posted in reply to comeon2012

If you only want records where information is complete then remove the word left from the query

... from

  reit_etf_framel natural join

  liq natural join

  ff inner join

  reit_etf on reit_etf_framel.PERMNOYm=reit_etf.PERMNOYm ...

. - PG

PG
Contributor
Posts: 38

Re: problem about joining datasets

Hi PG,

Thank you for your reply, but sorry that I might not say clearly.

As you can see, reit_etf_framel is a balanced frame, which is 24 firms (i.e. PERMNO) * 336months=8064 observations.

reit_etf is also about the same 24 firms but some of them do not have observations during the whole 336-month period, so it is not a balanced dataset.

ff and liq is only the monthly data for 324 months.

What I want is to let reit_etf, liq, and ff join reit_etf_framel, becausereit_etf_framel is a balanced frame and I want to generate a whole balanced panel.

the unique linking variable should be PERMNOYm (when joining reit_etf and reit_etf_framel) or Ym (when joining liq and ff to the other two).

The way you just mentioned can only generate a balanced dataset consisting only around 1900 observations, which is not what I want.

Could you help me reconsider it? Your help will be highly appreciated.

Respected Advisor
Posts: 4,934

Re: problem about joining datasets

Posted in reply to comeon2012

What would you like to see in the last year (i.e. from 201301 to 201312) for each PERMNO?

PG
Contributor
Posts: 38

Re: problem about joining datasets

There should be information about the variables (such as mktrf, smb, hml, rf, umd) from datasets ff and liq in the last year (i.e. from 201301 to 201312) for each PERMNO.

Actually, I need information about the variables from datasets ff and liq for all available years for each PERMNO. Now the problem is that information about variables from datasets ff and liq for the last year of each PERMNO is missing (for other years, there is no problem). I don't know why and reckon there might be some minor mistake during the joining process.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 1145 views
  • 9 likes
  • 3 in conversation