Hi everyone, I have a merging question. I have two datasets that I want to merge using an identifier and year. But I need to include a condition that if the year is not present in the second dataset, I would like to use the previous year's data entry. Here are examples of the data:
Data 1:
Firm Year
1004 1996
1160 1996
1235 1996
Data 2:
Firm Year RepMeasure
1004 1995 0.36
1004 1996 0.42
1160 1995 0.11
1235 1995 0.78
1235 1996 0.69
For Firms 1004 and 1235, merging by firm and year is fine. For firm 1160 though, I don't have 1996 data so I need to use the 1995 RepMeasure. Let me know if I need to provide any more info. Thanks for your help!
Performancewise, data step hash probably better, but Proc SQL can offer a simple syntax:
Data _1;
input Firm$ Year;
cards;
1004 1996
1160 1996
1235 1996
;
Data _2;
input Firm$ Year RepMeasure;
cards;
1004 1995 0.36
1004 1996 0.42
1160 1995 0.11
1235 1995 0.78
1235 1996 0.69
;
proc sql;
create table _merge as
select a.*, repmeasure
from _1 a
left join _2 b
on a.firm=b.firm
and a.year - b.year in (1,0)
group by a.firm
having a.year - b.year = min(a.year-b.year)
;
quit;
Haikuo
Easiest way might be just to take the Data2 dataset and add rows for missing years, then merge.
Performancewise, data step hash probably better, but Proc SQL can offer a simple syntax:
Data _1;
input Firm$ Year;
cards;
1004 1996
1160 1996
1235 1996
;
Data _2;
input Firm$ Year RepMeasure;
cards;
1004 1995 0.36
1004 1996 0.42
1160 1995 0.11
1235 1995 0.78
1235 1996 0.69
;
proc sql;
create table _merge as
select a.*, repmeasure
from _1 a
left join _2 b
on a.firm=b.firm
and a.year - b.year in (1,0)
group by a.firm
having a.year - b.year = min(a.year-b.year)
;
quit;
Haikuo
Hi Sir,
I would like to seek clarification on the logic /*and a.year - b.year in (1,0)*/ as it seems the program returns the same results without the condition. So is that condition written to make sure to avoid other possible values? I'd appreciate if you can correct my understanding
sql;
create table _merge as
select a.*, repmeasure
from _1 a
left join _2 b
on a.firm=b.firm
/* and a.year - b.year in (1,0)*/
group by a.firm
a.year - b.year = min(a.year-b.year);
;
Yes, it is there to rule out other possibilities. In your initial question, you asked to use "previous year" in case of no-match. Consider the following slightly modified data, run the code with and without the additional condition.
Data _2;
input Firm$ Year RepMeasure;
cards;
1004 1995 0.36
1004 1996 0.42
1160 1995 0.11
1160 1997 100
1235 1995 0.78
1235 1996 0.69
;
Haikuo
Two key questions ...
1. Can we rely on this being the full set of all variables in the two incoming data sets?
2. Which observations do you want in the output? There are at least two ways to imagine what the results should look like.
Hai.kuo that worked well. Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.