BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yanshuai
Quartz | Level 8

Hello,

 

It has been a while since my last visit.

Miss you guys.

 

A new issue kinda stuck me. I have two tables.

data have1 ;
input
COUNTRY1 COUNTRY2 $5.;
datalines;
001  101
001  002
001  103
002  201
002  202
002  203
003  203
003  301
003  302
003  303
;
run;

data have2 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A1
001  A2
001  A3
002  A3
002  A4
002  A5
003  A5
003  A6
003  A7
003  A8
101  A1
103  A3
201  A3
202  A4
203  A5
203  A8
301  A6
302  A4
303  A6
;
run;

HAVE1 has two variables - each data point represent a country code.

HAVE2's COUNTRY variables contains all the HAVE1 data points. Besides, there is a corresponding PRODUCT variable for each COUNTRY.

 

What I would like to have is, based on the pairs in HAVE1, to compute the number of OVERLAP in PRODUCT between COUNTRY1 and COUNTRY2.

 

data want;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  1
001  002  1
001  103  1
002  201  1
002  202  1
002  203  1
003  203  2
003  301  1
003  302  0
003  303  1
;
run;

I actually can achieve what I want. But it takes hundreds of codes in a very inefficient way.

Thus I wonder if there is any convenient way. I am exploring RIGHT JOIN or LEFT JOINT. But I don't know if I am right.

 

Thank you so much.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Would be easier if you combined your by year tables into a single table with a YEAR variable.

data have2 ;
   length dsname $50 ;
   set have2001-have2003 indsname=dsname ;
   length YEAR $4 ;
   year = substr(dsname,length(dsname)-3);
run;

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

Seems pretty simple.  Just join the list of pairs the country product list twice.

proc sql ;
  create table want as 
    select a.country1,a.country2,sum(b.product = c.product) as n_overlap
    from have1 a 
    left join have2 b on a.country1=b.country
    left join have2 c on a.country2=c.country
    group by 1,2
    order by 1,2
  ;
quit;
Obs    COUNTRY1    COUNTRY2    n_overlap

  1      001         002           1
  2      001         101           1
  3      001         103           1
  4      002         201           1
  5      002         202           1
  6      002         203           1
  7      003         203           2
  8      003         301           1
  9      003         302           0
 10      003         303           1
yanshuai
Quartz | Level 8

Thank you. This is exactly what I need.

 

I actually know it will use JOIN or LEFT JOIN or RIGHT JOIN. But I don't quite understand how those JOIN work. Thus it confuses me sometimes.

 

Thanks a lot.

yanshuai
Quartz | Level 8

I wonder what if I have 20 such tables. How can I merge them in one table according to the date.

Instead of HAVE1, HAVE2. Now I have

data have1 ;
input
COUNTRY1 YEAR COUNTRY2 $5.;
datalines;
001  2000  101
001  2000  002
001  2001  103
002  2000  201
002  2000  202
002  2001  203
003  2001  203
003  2000  301
003  2000  302
003  2001  303
;
run;

data have2000 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A1
001  A2
001  A3
002  A3
002  A4
002  A5
003  A5
003  A6
003  A7
003  A8
101  A1
103  A3
201  A3
202  A4
203  A5
203  A8
301  A6
302  A4
303  A6
;
run;

data have2001 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A3
001  A2
001  A1
002  A5
002  A4
002  A3
003  A4
003  A8
003  A7
003  A9
101  A1
103  A3
201  A3
202  A4
203  A6
203  A7
301  A5
302  A3
303  A5
;
run;

What I would like to have is to compute the OVERLAP only for the YEAR in HAVE1.

Shall I use WHERE condition?

 

Tom
Super User Tom
Super User

Would be easier if you combined your by year tables into a single table with a YEAR variable.

data have2 ;
   length dsname $50 ;
   set have2001-have2003 indsname=dsname ;
   length YEAR $4 ;
   year = substr(dsname,length(dsname)-3);
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1389 views
  • 1 like
  • 2 in conversation