- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
you can do thiscouple of ways. this is untested code. i would prefer the second way
proc sql;
select distinct 1.GVKEY, lag_year, meansale
from table1 left join table2
on 1.gvkey=2.gvkey
and 1.lag_year=2.year;
or
proc sql
select a.*, meansale from
(select distinct * from table2)a
left join
(select distinct GVKEY, year, meansale from table 2)b
on a.gvkey=b.gvkey
and a.lag_year=b.year;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I suspect you need another criteria on your join.
It looks like at present you have multiples by your join key variables,
GVKEY and YEAR, in each data set. If this is correct, then for every N1 records in Table1 and N2 records in Table2 for a given GVKEY and YEAR you'll get N1*N2 records.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does table 2 contain only the two variables - GVKEY and Lag_Year ?
@trungcva112 wrote "2nd dataset (with lag_year = year (in 1st dataset) -1 )" -
does it mean that the 2nd table was derived from table1 ?
If positive, you can create the wanted table directly from table1:
proc sort data=tabl1 out=temp nodupkey ; by gvkey year. run;data want;
set temp; by gvkey year. retain year1; drop year1; if first.gvkey then year1=year;
lag_year = year - 1; if lag_year = year1 then meansale = .;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is there any way to not include the duplicates with PROC SQL (or any other method?)
Why not a simple merge after sorting. If cartesian is not considered an overhead before left join on equality operator to filter your needs, I am sure sort should be acceptable. Else, load your have1 with multidata:yes in a hash object choosing keys gvkey and year. look up from have2 in the set statement with hash.find() method else call missing(meansale). Good night!
data have1;
input GVKEY $ Year Meansale ;
datalines;
1001 1983 10
1001 1983 10
1001 1983 10
1001 1983 10
1001 1984 15
1001 1984 15
1001 1984 15
1001 1984 15
1001 1984 15
;
data have2;
input GVKEY $ Lag_Year;
datalines;
1001 1982
1001 1982
1001 1982
1001 1982
1001 1983
1001 1983
1001 1983
1001 1983
1001 1983
;
data want;
merge have2(in=a) have1(rename=(Year=Lag_Year));
by gvkey lag_year;
if a;
run;