Hi all,
Please see attachments:
frame is a balanced frame, which is 24 firms (i.e. PERMNO) * 336months=8064 observations.
e1 is also about the same 24 firms but some of them do not have observations during the whole 336-month period, so it is not a balanced dataset.
e2 is only the monthly data for 324 months.
What I want is to let e1 and e2 join frame, because frame is a balanced frame and I want generate a whole balanced panel.
the unique linking variable should be PERMNOYm (when joining e1 and frame) or Ym(when joining e2 and frame).
Can anybody help me to do this, no matter merge or proc sql is used? I find some problems when I did it. Thank you in advance.
Try something like this:
proc sql;
create table panel as
select frame.*,
DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,
RETX, vwretd, vwretx, ewretd, ewretx, sprtrn
from
frame natural left join
e2 left join
e1 on frame.PERMNOYm=e1.PERMNOYm
order by PERMNOYm;
quit;
PG
Try something like this:
proc sql;
create table panel as
select frame.*,
DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,
RETX, vwretd, vwretx, ewretd, ewretx, sprtrn
from
frame natural left join
e2 left join
e1 on frame.PERMNOYm=e1.PERMNOYm
order by PERMNOYm;
quit;
PG
Thanks PG
but when I did an experiment and tried to join frame and e1 (I just edited the code based on yours as follows), there is always an error (in bold) :
proc sql;
create table panel as
select frame.*,
RETX, vwretd, vwretx, ewretd, ewretx, sprtrn
from
frame natural left join
e1 on frame.PERMNOYm=e1.PERMNOYm
--
22
76
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, ',', '.', ANSIMISS,
AS, CROSS, EXCEPT, FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL,
NOMISS, ORDER, OUTER, RIGHT, UNION, WHERE.
ERROR 76-322: Syntax error, statement will be ignored.
order by PERMNOYm;
quit;
what is the problem? Thanks again.
In a natural join there should be no ON clause, because "natural" implies a join on all matching (name and type) columns.
Hi PG and KurtBremser,
Thank you for your help. One more question if you don't mind:
what is the order that SAS join these datasets?
frame natural left join e2 and generate a temporary file, and this temporary file left join e1 (on frame.PERMNOYm=e1.PERMNOYm)
Am I right? but why is the left hand side PERMNOYm from frame?
Thank you.
Without parentheses, joins are done from left to right, as you understood. In a left join, all records from the left side are kept, so it is better to use variables from the left side table to do further joins. Variables from the right side of a left join can be set to missing and thus cannot be joined-to further.
PG
@PGStats
@KurtBremser
and other experts
Similar to the example in my original post in this session, I am trying to join the following 4 datasets and form a new dataset reit_etf_panel
reit_etf_framel
liq
ff
reit_etf
The code is like below:
proc sql noprint;
create table reit_etf_panel as
select reit_etf_framel.*,
DATEliq, PS_LEVEL, PS_INNOV, PS_VWF,
dateff, mktrf, smb, hml, rf, umd,
datereit, SHRCD, SICCD, TICKER, COMNAM, SHRCLS, PERMCO, CUSIP, PRC, VOL, RET, SHROUT, ALTPRC, SPREAD, RETX, vwretd, vwretx, ewretd, ewretx, sprtrn
from
reit_etf_framel natural left join
liq natural left join
ff left join
reit_etf on reit_etf_framel.PERMNOYm=reit_etf.PERMNOYm
order by PERMNOYm;
quit;
Everything is ok but with one problem that, as you can see, in reit_etf_panel, from line 325 to 336 there are missing values for those variables from both ff and liq. Actually, the problem exists for the last year (i.e. from 201301 to 201312) for each PERMNO.
Can you help me to check what is the reason and how to deal with it? Thanks.
If you only want records where information is complete then remove the word left from the query
... from
reit_etf_framel natural join
liq natural join
ff inner join
reit_etf on reit_etf_framel.PERMNOYm=reit_etf.PERMNOYm ...
. - PG
Hi PG,
Thank you for your reply, but sorry that I might not say clearly.
As you can see, reit_etf_framel is a balanced frame, which is 24 firms (i.e. PERMNO) * 336months=8064 observations.
reit_etf is also about the same 24 firms but some of them do not have observations during the whole 336-month period, so it is not a balanced dataset.
ff and liq is only the monthly data for 324 months.
What I want is to let reit_etf, liq, and ff join reit_etf_framel, becausereit_etf_framel is a balanced frame and I want to generate a whole balanced panel.
the unique linking variable should be PERMNOYm (when joining reit_etf and reit_etf_framel) or Ym (when joining liq and ff to the other two).
The way you just mentioned can only generate a balanced dataset consisting only around 1900 observations, which is not what I want.
Could you help me reconsider it? Your help will be highly appreciated.
What would you like to see in the last year (i.e. from 201301 to 201312) for each PERMNO?
There should be information about the variables (such as mktrf, smb, hml, rf, umd) from datasets ff and liq in the last year (i.e. from 201301 to 201312) for each PERMNO.
Actually, I need information about the variables from datasets ff and liq for all available years for each PERMNO. Now the problem is that information about variables from datasets ff and liq for the last year of each PERMNO is missing (for other years, there is no problem). I don't know why and reckon there might be some minor mistake during the joining process.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.