Hi
I would like to merge two data sets.
The first data set looks like this.
cusip date_crsp ret
000361 1990Q1 0.2
000361 1990Q2 0.12
000361 1990Q3 -0.06
000361 1990Q4 -0.02
The second data set looks like this.
cusip date_crsp sales
000361 1990Q1 1240
000361 1990Q2 4673
000361 1990Q3 3467
The output should look like this.
cusip date_crsp return sales
000361 1990Q1 0.2 1240
000361 1990Q2 0.12 4673
000361 1990Q3 -0.06 3467
000361 1990Q4 -0.02 .
But my output looks like this.
cusip date_crsp ret sales
000361 1990Q1 . 1240
000361 1990Q1 0.2 .
000361 1990Q2 . 4673
000361 1990Q2 0.12 .
000361 1990Q3 . 3467
000361 1990Q3 -0.06 .
000361 1990Q4 -0.02 .
The by variables are cusip and date_crsp.
The cusip in the two data sets have the same length, and the date_crsp have the same format.
My code is also not too complicated, so I do not know where the mistake is. I did also the proc sort for the two data sets for cusip and date_crsp.
I used the following code:
data combined;
merge dataset1 dataset2;
by cusip date_crsp;
run;
Thanks for your help!!
The values are probably NOT the same in the two datasets. Looks like you have date variables formatted to display as year and quarter. So there are about 90 different values the date variables could have to make them look like they have the same value when displayed as quarters.
Try either converting your variables to character strings or convert all values to the first day of the quarter.
The values are probably NOT the same in the two datasets. Looks like you have date variables formatted to display as year and quarter. So there are about 90 different values the date variables could have to make them look like they have the same value when displayed as quarters.
Try either converting your variables to character strings or convert all values to the first day of the quarter.
Hi Tom
Thank you for the hint. Now the merge command works perfectly.
Hi:
If you only need the merged data then try:
data combined;
merge dataset1 (in=a)
dataset2 (in=b)
by cusip date_crsp;
if a and b then output;
else delete;
run;
I prefer to do:
proc sql;
create table combined as
select distinct a.*,
b.sales
from dataset1 as a,
dataset2 as b
where a.cusip=b.cusip and
a.date_crsp=b.date_crsp;
quit;
run;
Nice solution OS, I would like to add that Merge is much faster than SQL but even I prefer SQL because its simpler and more dependable but an inner join instead of this join would have been faster however both will give the same answer
Thanks Njoi
proc sort data=first;
by cusip date_crsp ;
run;
proc sort data=second;
by cusip date_crsp ;
run;
data final;
merge first(in=a) second;
by cusip date_crsp ;
if a;
run;
it works
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.