I have two data
first one has say like below:
permno1 date
12345 20141111
23456 20141113
the second one has like below:
permno2 date
34567 20141111
45678 20141111
56789 20141111
34567 20141113
67890 20141113
I want to get:
permno1 permno2 date
12345 34567 20141111
12345 45678 20141111
12345 56789 20141111
23456 34567 20141113
23456 67890 20141113
How should I write the code?
Thank you
Just SQL join on date?
you mean?
I am a beginner
proc sql;
create table d as
select
a.*,
b.permno2
from d1 as a, d2 as b
where a.date=b.date;
???
proc sql;
create table d as
select
a.permno1,
b.*
from d1 as a, d2 as b
where a.date=b.date;
quit;
gives you the correct order of columns.
so if I want to get a event window [-10,10]
suppose in d2, there are not only date in d1, but also other dates.
then
proc sql;
create table d as
select
a.permno1,
b.permno2,
b.date
from d1 as a, d2 as b
where a.date-10<=b.date<=a.date+10;
??
If you want to have only dates contained in d1, use the "left join".
And what if I have dataset like below that d1 not change but d2 contain permno1, and d2 contain permno with different date from 1996 till now
first one has say like below:
permno1 date
12345 20141111
23456 20141113
the second one has like below:
permno2 date
12345 20141111
23456 20141111
34567 20141111
45678 20141111
56789 20141111
12345 20141113
23456 20141113
34567 20141113
67890 20141113
I want to get: where permno1<>permno2
permno1 permno2 date
12345 23456 20141111
12345 34567 20141111
12345 45678 20141111
12345 56789 20141111
23456 12345 20141113
23456 34567 20141113
23456 67890 20141113
How should I write the code?
proc sql;
create table d as
select
a.permno1,
b.permno2,
b.date
from d1 as a, d2 as b
where a.date-10<=b.date<=a.date+10 and a.permno1<>b.permno2;
then what if there are 1billion data?
it will take a long time to finish the sql.
SQL can be sub-optimal when joining large datasets.
Sort the datasets with proc sort first (if they are not already sorted) and use a datastep with the merge and by statement.
Be aware that the data step merge cannot build a cartesian product when multiple by values are present in both datasets.
I don't only want data from d1
I only want the date from d1 and other data from d2.
So with a large dataset with say three variables need to match (sic permno eventdate date)
d1 has sic permno1 eventdate
d2 has sic permno2 date
I can use:
rename to make eventdate=date
then:
proc sort data=d1;
set d1;
by sic permno1 date;
proc sort data=d2;
set d2;
by sic permno2 date;
then I want sic are same, permno1<>permno2, and eventdate-10<=date<=eventdate+10
what should I write?
data merge;
merge d1 d1;
by sic date;?
or the order should be
by sic date permno1;
by sic date permno2?
date d1 (rename=(eventdate=date));
set d1;
proc sort data=d1;
set d1;
by sic date permno1;
proc sort data=d2;
set d2;
by sic date permno2;
data merge;
merge d1 d2;
by sic;
where
date>=date-10 and date<=date+10;
Also
If I have two data
first one has say like below:
permno date
12345 20141111
23456 20141113
the second one has like below:
permno date
34567 20141111
45678 20141111
and I want
permno date
12345 20141111
23456 20141113
34567 20141111
45678 20141111
how could I do?
proc sql;
select * from ds1
union
select * from ds2;
quit;
proc sort data=dataset1;
by date;
run;
proc sort data=dataset2;
by date;
run;
data dataset3;
merge dataset1(in=d1) dataset2(in=d2);
by date;
if d1 and d2;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.