Solved: Re: How to merge many tables and replace variables

yanshuai · Posted 09-10-2018 04:42 PM

Hello,

I have many many tables. And I would like to have them merged together. I would like to put OVERLAP in HAVE2000 in 2000's row in HAVE1. And put OVERLAP in HAVE2001 in 2001 row in HAVE1.

data have1 ;
input
COUNTRY1 YEAR COUNTRY2 $5.;
datalines;
001  2000  101
001  2000  002
001  2001  103
002  2000  201
002  2000  202
002  2001  203
003  2001  203
003  2000  301
003  2000  302
003  2001  303
;
run;

data have2000 ;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  1
001  002  2
001  103  1
002  201  0
002  202  1
002  203  2
003  203  1
003  301  2
003  302  0
003  303  0
;
run;

data have2001 ;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  2
001  002  1
001  103  0
002  201  1
002  202  2
002  203  1
003  203  1
003  301  1
003  302  2
003  303  0
;
run;

Want is like this

data want ;
input
COUNTRY1 YEAR COUNTRY2 OVERLAP $5.;
datalines;
001  2000  101  1
001  2000  002  2
001  2001  103  0
002  2000  201  0
002  2000  202  1
002  2001  203  1
003  2001  203  1
003  2000  301  2
003  2000  302  0
003  2001  303  0
;
run;

Shall I use left join or something?

Thank you so much.

PGStats · Posted 09-10-2018 11:06 PM

If you prefer SQL:

proc sql;
select have1.*, overlap 
from have1 inner join have2000 on
    have1.year=2000 and 
    have1.country1=have2000.country1 and 
    have1.country2=have2000.country2
union all
select have1.*, overlap  
from have1 inner join have2001 on
    have1.year=2001 and 
    have1.country1=have2001.country1 and 
    have1.country2=have2001.country2;
quit;

PG

View solution in original post

novinosrin · Posted 09-10-2018 05:04 PM

data have1 ;
input
COUNTRY1 YEAR COUNTRY2 ;
datalines;
001  2000  101
001  2000  002
001  2001  103
002  2000  201
002  2000  202
002  2001  203
003  2001  203
003  2000  301
003  2000  302
003  2001  303
;
run;

data have2000 ;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  1
001  002  2
001  103  1
002  201  0
002  202  1
002  203  2
003  203  1
003  301  2
003  302  0
003  303  0
;
run;

data have2001 ;
input
COUNTRY1 COUNTRY2 OVERLAP $5.;
datalines;
001  101  2
001  002  1
001  103  0
002  201  1
002  202  2
002  203  1
003  203  1
003  301  1
003  302  2
003  303  0
;
run;

data want;
if 0 then set have1;
dcl hash H (multidata:'y') ;
   h.definekey  ('year','COUNTRY1','COUNTRY2') ;
   h.definedata ('overlap') ;
   h.definedone();
do until(lr);
set have2000 have2001 indsname=name end=lr;
year=input(compress(name,,'kd'),8.);
rc=h.add();
end;
lr=0;
do until(lr);
set have1;
rc=h.find();
output;
end;
stop;
drop rc;
run;

yanshuai · Posted 09-13-2018 01:35 PM

Thank you very much for the answer!

PGStats · Posted 09-10-2018 11:06 PM

If you prefer SQL:

proc sql;
select have1.*, overlap 
from have1 inner join have2000 on
    have1.year=2000 and 
    have1.country1=have2000.country1 and 
    have1.country2=have2000.country2
union all
select have1.*, overlap  
from have1 inner join have2001 on
    have1.year=2001 and 
    have1.country1=have2001.country1 and 
    have1.country2=have2001.country2;
quit;

PG

yanshuai · Posted 09-13-2018 01:35 PM

Thank you.

SQL is always my favorite. So far I just add a YEAR variable to and append all have&year. tables and match by year.

SQL is also working. But since I have 20 tables have&years., should I add UNION ALL 20 times in SQL?

Thank you. If there is a convenient way to repeat UNION ALL steps by year, it will be very greate.

PGStats · Posted 09-13-2018 03:44 PM

It would be a lot simpler to concatenate your yearly datasets (while adding a YEAR variable) before joining with your main (have1) dataset on YEAR, COUNTRY1, and COUNTRY2.

Concatenate with something like:

data haveAll;

set have20: INDSNAME=dsn;

/* get year from dataset name, such as WORK.HAVE2010 */

year = input(substr(scan(dsn, 2), 5), best.);

run;

PG

yanshuai · Posted 09-13-2018 04:08 PM

Thank you very much.

I actually did that. Here is something I use so far. It has been a while since last using....but it seems work out fine.

data haveALL ;
   set have2000-have2017 indsname=dsname ;
   length YEAR $4. ;
   year = substr(dsname,length(dsname)-3);
run;

PGStats · Posted 09-13-2018 04:26 PM

Great! But year is a number in have1 and created as a character variable in haveALL. You can make them more compatible with the input function.

PG

Registration is open

SAS Training: Just a Click Away