Help using Base SAS procedures

join on multiple variable selecting matched variable only once

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 127
Accepted Solution

join on multiple variable selecting matched variable only once

Dear experts,

 

I should merge two data sets on two variables but taking the observation on the left part of the join only once. The output below generated should have for the second and fourth observation a value=""

 

DATA data1;
INPUT code $2. location $3. location2 $4.;
DATALINES;
61 UK ABC
45 CH DEF
03 CH DEF
11 JP GHI
14 JP GHI
;


DATA data2;
INPUT value $3. location $3. location2 $4.;
DATALINES;
100 UK ABC
200 CH DEF
300 JP GHI
;

proc sql; create table merge as select t0.*, t1.* from data1 t0 left outer join data2 t1
on (t1.location=t0.location and t1.location2=t0.location2) ;quit;

 

Thanks a lo in advance. BRs, SH


Accepted Solutions
Solution
‎09-14-2016 05:51 AM
Super User
Posts: 7,771

Re: join on multiple variable selecting matched variable only once

Posted in reply to Sir_Highbury

Try the data step method:

proc sort data=data1;
by location location2;
run;

proc sort data=data2;
by location location2;
run;

data merge2;
merge
  data1 (in=a)
  data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;

You might want to consider additional sort criteria for data1 to put the desired code first.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 5,426

Re: join on multiple variable selecting matched variable only once

Posted in reply to Sir_Highbury

That'snot how SQL works. You requirement is row (observation) oriented, while SQL is column oriented.

SQL first performs a Cartesian join (in theory) and the filters using the on (or where criteria when used).

You need to use a data step to solve this.

Perhaps you need a better data model so you can describe which is the first observation in the "join group" (in case of resorting for instance), and why.

Data never sleeps
Frequent Contributor
Posts: 127

Re: join on multiple variable selecting matched variable only once

I just showed that I was not able to do it (using sql) and I asked how to get the output I want from the input I have. Where did I write to do it necessarily in sql? The only implicit constrain is using SAS. Pheraphs what I was doing is not related to any data model? Please try to answer accordingly to the question, thanks anyway for your effort.

Super User
Super User
Posts: 7,942

Re: join on multiple variable selecting matched variable only once

Posted in reply to Sir_Highbury

Thats not a logical join though.  There is no way of knowing, from the data, that any row is "first".  This is information you need to put in before joining, and example given below.  Note alo that the use of * causes warnings as variables appear in multiple tables - its not a good idea to use this syntax.

data data1;
  input code $2. location $3. location2 $4. first;
datalines;
61 UK ABC 1
45 CH DEF 1
03 CH DEF 0
11 JP GHI 1
14 JP GHI 0
;
run;

data data2;
  input value $3. location $3. location2 $4. first;
datalines;
100 UK ABC 1
200 CH DEF 1
300 JP GHI 1
; 
run;

proc sql; 
  create table MERGE as 
  select  T0.*, 
          T1.* 
  from    DATA1 T0 
  left join DATA2 T1
  on      T1.LOCATION=T0.LOCATION 
  and     T1.LOCATION2=T0.LOCATION2
  and     T1.FIRST=T0.FIRST;
quit;
Solution
‎09-14-2016 05:51 AM
Super User
Posts: 7,771

Re: join on multiple variable selecting matched variable only once

Posted in reply to Sir_Highbury

Try the data step method:

proc sort data=data1;
by location location2;
run;

proc sort data=data2;
by location location2;
run;

data merge2;
merge
  data1 (in=a)
  data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;

You might want to consider additional sort criteria for data1 to put the desired code first.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,503

Re: join on multiple variable selecting matched variable only once

Posted in reply to KurtBremser

The DATA step is a workable approach here.  SQL does not guarantee any "order" to the incoming records.

 

Here are a couple of variations to consider on the DATA step approach.

 

Instead of sorting/merging by LOCATION LOCATION2, you could sort/merge by LOCATION2 LOCATION.  In the sample data you have provided, that would preserve the current order of the observations (and could conceivably eliminate the need for sorting).

 

For the final statement, you could consider this variation:

 

if first.location2=0 then value=' ';

 

That would keep all records, but reset VALUE to a blank value.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 650 views
  • 6 likes
  • 5 in conversation