Solved: How to count one variable according to different dataset's two variabl...

yanshuai · Posted 09-10-2018 12:19 AM

Hello,

It has been a while since my last visit.

Miss you guys.

A new issue kinda stuck me. I have two tables.

data have1 ;
input
COUNTRY1 COUNTRY2 $5.;
datalines;
001  101
001  002
001  103
002  201
002  202
002  203
003  203
003  301
003  302
003  303
;
run;

data have2 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A1
001  A2
001  A3
002  A3
002  A4
002  A5
003  A5
003  A6
003  A7
003  A8
101  A1
103  A3
201  A3
202  A4
203  A5
203  A8
301  A6
302  A4
303  A6
;
run;

HAVE1 has two variables - each data point represent a country code.

HAVE2's COUNTRY variables contains all the HAVE1 data points. Besides, there is a corresponding PRODUCT variable for each COUNTRY.

What I would like to have is, based on the pairs in HAVE1, to compute the number of OVERLAP in PRODUCT between COUNTRY1 and COUNTRY2.

data want;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  1
001  002  1
001  103  1
002  201  1
002  202  1
002  203  1
003  203  2
003  301  1
003  302  0
003  303  1
;
run;

I actually can achieve what I want. But it takes hundreds of codes in a very inefficient way.

Thus I wonder if there is any convenient way. I am exploring RIGHT JOIN or LEFT JOINT. But I don't know if I am right.

Thank you so much.

Tom · Posted 09-10-2018 04:38 PM

Would be easier if you combined your by year tables into a single table with a YEAR variable.

data have2 ;
   length dsname $50 ;
   set have2001-have2003 indsname=dsname ;
   length YEAR $4 ;
   year = substr(dsname,length(dsname)-3);
run;

View solution in original post

Tom · Posted 09-10-2018 12:51 AM

Seems pretty simple. Just join the list of pairs the country product list twice.

proc sql ;
  create table want as 
    select a.country1,a.country2,sum(b.product = c.product) as n_overlap
    from have1 a 
    left join have2 b on a.country1=b.country
    left join have2 c on a.country2=c.country
    group by 1,2
    order by 1,2
  ;
quit;

Obs    COUNTRY1    COUNTRY2    n_overlap

  1      001         002           1
  2      001         101           1
  3      001         103           1
  4      002         201           1
  5      002         202           1
  6      002         203           1
  7      003         203           2
  8      003         301           1
  9      003         302           0
 10      003         303           1

yanshuai · Posted 09-10-2018 01:40 PM

Thank you. This is exactly what I need.

I actually know it will use JOIN or LEFT JOIN or RIGHT JOIN. But I don't quite understand how those JOIN work. Thus it confuses me sometimes.

Thanks a lot.

yanshuai · Posted 09-10-2018 03:34 PM

I wonder what if I have 20 such tables. How can I merge them in one table according to the date.

Instead of HAVE1, HAVE2. Now I have

data have1 ;
input
COUNTRY1 YEAR COUNTRY2 $5.;
datalines;
001  2000  101
001  2000  002
001  2001  103
002  2000  201
002  2000  202
002  2001  203
003  2001  203
003  2000  301
003  2000  302
003  2001  303
;
run;

data have2000 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A1
001  A2
001  A3
002  A3
002  A4
002  A5
003  A5
003  A6
003  A7
003  A8
101  A1
103  A3
201  A3
202  A4
203  A5
203  A8
301  A6
302  A4
303  A6
;
run;

data have2001 ;
input
COUNTRY PRODUCT $5.;
datalines;
001  A3
001  A2
001  A1
002  A5
002  A4
002  A3
003  A4
003  A8
003  A7
003  A9
101  A1
103  A3
201  A3
202  A4
203  A6
203  A7
301  A5
302  A3
303  A5
;
run;

What I would like to have is to compute the OVERLAP only for the YEAR in HAVE1.

Shall I use WHERE condition?

Tom · Posted 09-10-2018 04:38 PM

Would be easier if you combined your by year tables into a single table with a YEAR variable.

data have2 ;
   length dsname $50 ;
   set have2001-have2003 indsname=dsname ;
   length YEAR $4 ;
   year = substr(dsname,length(dsname)-3);
run;

How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Catch up on SAS Innovate 2026

How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Re: How to count one variable according to different dataset's two variables

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away