Solved: Re: join on multiple variable selecting matched variable only once

Sir_Highbury · Posted 09-13-2016 08:51 AM

Dear experts,

I should merge two data sets on two variables but taking the observation on the left part of the join only once. The output below generated should have for the second and fourth observation a value=""

DATA data1;
INPUT code $2. location $3. location2 $4.;
DATALINES;
61 UK ABC
45 CH DEF
03 CH DEF
11 JP GHI
14 JP GHI
;

DATA data2;
INPUT value $3. location $3. location2 $4.;
DATALINES;
100 UK ABC
200 CH DEF
300 JP GHI
;

proc sql; create table merge as select t0.*, t1.* from data1 t0 left outer join data2 t1
on (t1.location=t0.location and t1.location2=t0.location2) ;quit;

Thanks a lo in advance. BRs, SH

Kurt_Bremser · Posted 09-13-2016 09:08 AM

Try the data step method:

proc sort data=data1;
by location location2;
run;

proc sort data=data2;
by location location2;
run;

data merge2;
merge
  data1 (in=a)
  data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;

You might want to consider additional sort criteria for data1 to put the desired code first.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

LinusH · Posted 09-13-2016 09:04 AM

That'snot how SQL works. You requirement is row (observation) oriented, while SQL is column oriented.

SQL first performs a Cartesian join (in theory) and the filters using the on (or where criteria when used).

You need to use a data step to solve this.

Perhaps you need a better data model so you can describe which is the first observation in the "join group" (in case of resorting for instance), and why.

Data never sleeps

Sir_Highbury · Posted 09-14-2016 05:58 AM

I just showed that I was not able to do it (using sql) and I asked how to get the output I want from the input I have. Where did I write to do it necessarily in sql? The only implicit constrain is using SAS. Pheraphs what I was doing is not related to any data model? Please try to answer accordingly to the question, thanks anyway for your effort.

RW9 · Posted 09-13-2016 09:06 AM

Thats not a logical join though. There is no way of knowing, from the data, that any row is "first". This is information you need to put in before joining, and example given below. Note alo that the use of * causes warnings as variables appear in multiple tables - its not a good idea to use this syntax.

data data1;
  input code $2. location $3. location2 $4. first;
datalines;
61 UK ABC 1
45 CH DEF 1
03 CH DEF 0
11 JP GHI 1
14 JP GHI 0
;
run;

data data2;
  input value $3. location $3. location2 $4. first;
datalines;
100 UK ABC 1
200 CH DEF 1
300 JP GHI 1
; 
run;

proc sql; 
  create table MERGE as 
  select  T0.*, 
          T1.* 
  from    DATA1 T0 
  left join DATA2 T1
  on      T1.LOCATION=T0.LOCATION 
  and     T1.LOCATION2=T0.LOCATION2
  and     T1.FIRST=T0.FIRST;
quit;

Kurt_Bremser · Posted 09-13-2016 09:08 AM

Try the data step method:

proc sort data=data1;
by location location2;
run;

proc sort data=data2;
by location location2;
run;

data merge2;
merge
  data1 (in=a)
  data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;

You might want to consider additional sort criteria for data1 to put the desired code first.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 09-13-2016 09:47 AM

The DATA step is a workable approach here. SQL does not guarantee any "order" to the incoming records.

Here are a couple of variations to consider on the DATA step approach.

Instead of sorting/merging by LOCATION LOCATION2, you could sort/merge by LOCATION2 LOCATION. In the sample data you have provided, that would preserve the current order of the observations (and could conceivably eliminate the need for sorting).

For the final statement, you could consider this variation:

if first.location2=0 then value=' ';

That would keep all records, but reset VALUE to a blank value.

join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Re: join on multiple variable selecting matched variable only once

Registration is open

SAS Training: Just a Click Away