04-26-2013 12:14 PM
I have dataone with variable gvkey which I will use to extract data from another datatwo. Since datatwo is on server, so I plan to use
proc upload data=locallib.dataone out= dataone;run;
my question is how I should write proc sql to get datatwo based on dataone’s gvkey. Below is my code, I want to know
1. If I should use left join
2. If we do not use proc sql, any other better options?
create table WANT as select
A.*, B. gvkey, B.cusip, B.EPSPX
From dataone as A left join datatwo as B
on A.gvkey=B.gvkey and A.year=B.year
order by gvkey, year;
04-26-2013 02:19 PM
Whether you need to use a LEFT JOIN or not depends on your question ("requirement" as LinusH said). I'd suggest that you read the PROC SQL documentation about left vs inner join and if you still have questions, post with samples of A, B, and WANT for us to better help you.
Be careful with the "*" syntax. Since gvkey is in both tables, you will have two variables with the same name in WANT and SAS will give unexpected results.
It looks like you are using the LIBNAME approach to connecting to the database. The only way to guarantee that the query runs on the database (rather than the SAS server) is to use a pass thru query.
There are non-SQL ways to do this (with a DATA step and MERGE), but all the processing would be done in SAS, so there is no value to the UPLOAD.
04-26-2013 06:49 PM
thanks to both of you!
proc upload is relevant, because datatwo is stored on the server, I upload dataone to server, in order to use its gvkey as the linking variable to extract data from datatwo and I only want those observations that have the same gvkey as dataone- left join will serve that purpose.
Doc, you are right about * syntax, since gvkey is in both tables, would you suggest I list all variables to be included in final output data?