you can do thiscouple of ways. this is untested code. i would prefer the second way
proc sql;
select distinct 1.GVKEY, lag_year, meansale
from table1 left join table2
on 1.gvkey=2.gvkey
and 1.lag_year=2.year;
or
proc sql
select a.*, meansale from
(select distinct * from table2)a
left join
(select distinct GVKEY, year, meansale from table 2)b
on a.gvkey=b.gvkey
and a.lag_year=b.year;
I suspect you need another criteria on your join.
It looks like at present you have multiples by your join key variables,
GVKEY and YEAR, in each data set. If this is correct, then for every N1 records in Table1 and N2 records in Table2 for a given GVKEY and YEAR you'll get N1*N2 records.
Does table 2 contain only the two variables - GVKEY and Lag_Year ?
@trungcva112 wrote "2nd dataset (with lag_year = year (in 1st dataset) -1 )" -
does it mean that the 2nd table was derived from table1 ?
If positive, you can create the wanted table directly from table1:
proc sort data=tabl1 out=temp nodupkey ; by gvkey year. run;
data want;
set temp;
by gvkey year.
retain year1; drop year1;
if first.gvkey then year1=year;
lag_year = year - 1;
if lag_year = year1 then meansale = .;
run;
Is there any way to not include the duplicates with PROC SQL (or any other method?)
Why not a simple merge after sorting. If cartesian is not considered an overhead before left join on equality operator to filter your needs, I am sure sort should be acceptable. Else, load your have1 with multidata:yes in a hash object choosing keys gvkey and year. look up from have2 in the set statement with hash.find() method else call missing(meansale). Good night!
data have1;
input GVKEY $ Year Meansale ;
datalines;
1001 1983 10
1001 1983 10
1001 1983 10
1001 1983 10
1001 1984 15
1001 1984 15
1001 1984 15
1001 1984 15
1001 1984 15
;
data have2;
input GVKEY $ Lag_Year;
datalines;
1001 1982
1001 1982
1001 1982
1001 1982
1001 1983
1001 1983
1001 1983
1001 1983
1001 1983
;
data want;
merge have2(in=a) have1(rename=(Year=Lag_Year));
by gvkey lag_year;
if a;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.