I have two data
first one has say like below:
permno1 date
12345 20141111
23456 20141113
the second one has like below:
permno2 date
34567 20141111
45678 20141111
56789 20141111
34567 20141113
67890 20141113
I want to get:
permno1 permno2 date
12345 34567 20141111
12345 45678 20141111
12345 56789 20141111
23456 34567 20141113
23456 67890 20141113
How should I write the code?
Thank you
Just SQL join on date?
you mean?
I am a beginner
proc sql;
create table d as
select
a.*,
b.permno2
from d1 as a, d2 as b
where a.date=b.date;
???
proc sql;
create table d as
select
a.permno1,
b.*
from d1 as a, d2 as b
where a.date=b.date;
quit;
gives you the correct order of columns.
so if I want to get a event window [-10,10]
suppose in d2, there are not only date in d1, but also other dates.
then
proc sql;
create table d as
select
a.permno1,
b.permno2,
b.date
from d1 as a, d2 as b
where a.date-10<=b.date<=a.date+10;
??
If you want to have only dates contained in d1, use the "left join".
And what if I have dataset like below that d1 not change but d2 contain permno1, and d2 contain permno with different date from 1996 till now
first one has say like below:
permno1 date
12345 20141111
23456 20141113
the second one has like below:
permno2 date
12345 20141111
23456 20141111
34567 20141111
45678 20141111
56789 20141111
12345 20141113
23456 20141113
34567 20141113
67890 20141113
I want to get: where permno1<>permno2
permno1 permno2 date
12345 23456 20141111
12345 34567 20141111
12345 45678 20141111
12345 56789 20141111
23456 12345 20141113
23456 34567 20141113
23456 67890 20141113
How should I write the code?
proc sql;
create table d as
select
a.permno1,
b.permno2,
b.date
from d1 as a, d2 as b
where a.date-10<=b.date<=a.date+10 and a.permno1<>b.permno2;
then what if there are 1billion data?
it will take a long time to finish the sql.
SQL can be sub-optimal when joining large datasets.
Sort the datasets with proc sort first (if they are not already sorted) and use a datastep with the merge and by statement.
Be aware that the data step merge cannot build a cartesian product when multiple by values are present in both datasets.
I don't only want data from d1
I only want the date from d1 and other data from d2.
So with a large dataset with say three variables need to match (sic permno eventdate date)
d1 has sic permno1 eventdate
d2 has sic permno2 date
I can use:
rename to make eventdate=date
then:
proc sort data=d1;
set d1;
by sic permno1 date;
proc sort data=d2;
set d2;
by sic permno2 date;
then I want sic are same, permno1<>permno2, and eventdate-10<=date<=eventdate+10
what should I write?
data merge;
merge d1 d1;
by sic date;?
or the order should be
by sic date permno1;
by sic date permno2?
date d1 (rename=(eventdate=date));
set d1;
proc sort data=d1;
set d1;
by sic date permno1;
proc sort data=d2;
set d2;
by sic date permno2;
data merge;
merge d1 d2;
by sic;
where
date>=date-10 and date<=date+10;
Also
If I have two data
first one has say like below:
permno date
12345 20141111
23456 20141113
the second one has like below:
permno date
34567 20141111
45678 20141111
and I want
permno date
12345 20141111
23456 20141113
34567 20141111
45678 20141111
how could I do?
proc sql;
select * from ds1
union
select * from ds2;
quit;
proc sort data=dataset1;
by date;
run;
proc sort data=dataset2;
by date;
run;
data dataset3;
merge dataset1(in=d1) dataset2(in=d2);
by date;
if d1 and d2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.