Hello,
It has been a while since my last visit.
Miss you guys.
A new issue kinda stuck me. I have two tables.
data have1 ;
input
COUNTRY1 COUNTRY2 $5.;
datalines;
001 101
001 002
001 103
002 201
002 202
002 203
003 203
003 301
003 302
003 303
;
run;
data have2 ;
input
COUNTRY PRODUCT $5.;
datalines;
001 A1
001 A2
001 A3
002 A3
002 A4
002 A5
003 A5
003 A6
003 A7
003 A8
101 A1
103 A3
201 A3
202 A4
203 A5
203 A8
301 A6
302 A4
303 A6
;
run;
HAVE1 has two variables - each data point represent a country code.
HAVE2's COUNTRY variables contains all the HAVE1 data points. Besides, there is a corresponding PRODUCT variable for each COUNTRY.
What I would like to have is, based on the pairs in HAVE1, to compute the number of OVERLAP in PRODUCT between COUNTRY1 and COUNTRY2.
data want;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001 101 1
001 002 1
001 103 1
002 201 1
002 202 1
002 203 1
003 203 2
003 301 1
003 302 0
003 303 1
;
run;
I actually can achieve what I want. But it takes hundreds of codes in a very inefficient way.
Thus I wonder if there is any convenient way. I am exploring RIGHT JOIN or LEFT JOINT. But I don't know if I am right.
Thank you so much.
Would be easier if you combined your by year tables into a single table with a YEAR variable.
data have2 ;
length dsname $50 ;
set have2001-have2003 indsname=dsname ;
length YEAR $4 ;
year = substr(dsname,length(dsname)-3);
run;
Seems pretty simple. Just join the list of pairs the country product list twice.
proc sql ;
create table want as
select a.country1,a.country2,sum(b.product = c.product) as n_overlap
from have1 a
left join have2 b on a.country1=b.country
left join have2 c on a.country2=c.country
group by 1,2
order by 1,2
;
quit;
Obs COUNTRY1 COUNTRY2 n_overlap 1 001 002 1 2 001 101 1 3 001 103 1 4 002 201 1 5 002 202 1 6 002 203 1 7 003 203 2 8 003 301 1 9 003 302 0 10 003 303 1
Thank you. This is exactly what I need.
I actually know it will use JOIN or LEFT JOIN or RIGHT JOIN. But I don't quite understand how those JOIN work. Thus it confuses me sometimes.
Thanks a lot.
I wonder what if I have 20 such tables. How can I merge them in one table according to the date.
Instead of HAVE1, HAVE2. Now I have
data have1 ;
input
COUNTRY1 YEAR COUNTRY2 $5.;
datalines;
001 2000 101
001 2000 002
001 2001 103
002 2000 201
002 2000 202
002 2001 203
003 2001 203
003 2000 301
003 2000 302
003 2001 303
;
run;
data have2000 ;
input
COUNTRY PRODUCT $5.;
datalines;
001 A1
001 A2
001 A3
002 A3
002 A4
002 A5
003 A5
003 A6
003 A7
003 A8
101 A1
103 A3
201 A3
202 A4
203 A5
203 A8
301 A6
302 A4
303 A6
;
run;
data have2001 ;
input
COUNTRY PRODUCT $5.;
datalines;
001 A3
001 A2
001 A1
002 A5
002 A4
002 A3
003 A4
003 A8
003 A7
003 A9
101 A1
103 A3
201 A3
202 A4
203 A6
203 A7
301 A5
302 A3
303 A5
;
run;
What I would like to have is to compute the OVERLAP only for the YEAR in HAVE1.
Shall I use WHERE condition?
Would be easier if you combined your by year tables into a single table with a YEAR variable.
data have2 ;
length dsname $50 ;
set have2001-have2003 indsname=dsname ;
length YEAR $4 ;
year = substr(dsname,length(dsname)-3);
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.