Solved: Multiple joins by first row

lydiawawa · Posted 02-25-2020 12:23 PM

Hi,

I have 3 datasets and I hope to join them by first row because they are in different levels.

I assigned the base dataset to be (the other 2 will be left merged to the base dataset):

item  state  county  by_county

Apple MD   aa     4
Apple DC     bb    3
Apple DC     cc     2
Pear   VA     cc     6

2nd dataset:

 Item state  by_state

Apple MD   4
Apple DC    5
Pear   VA    6

3rd dataset:

item        by_item
Apple        9
Pear         6

I'm hoping to join the datasets to have output as the following:

Item   state  county  by_county   by_state  by_item
Apple   MD     aa     4            4        9
Apple   DC     bb     3            5
Apple   DC     cc     2       
Pear    VA     cc     6            6        6

You can order by item,state and county to determine the first row.

Thank you!

Tom · Posted 02-25-2020 12:58 PM

Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys. Then a second step to combine the datasets that use ITEM as the key.

data one;
 input item $ state $ county $  by_county;
cards;
Apple DC     bb    3
Apple DC     cc     2
Apple MD   aa     4
Pear  VA     cc     6
;

data two;
 input item $ state $ by_state;
cards;
Apple DC    5
Apple MD   4
Pear   VA    6
;

data three; 
  input item $  by_item;
cards;
Apple        9
Pear         6
;

data step1;
  merge one two;
  by item state;
run;

data want;
  merge step1 three;
  by item;
run;

Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?

Here is trick to do that.

data step1;
  merge one two;
  by item state;
  output;
  call missing(of _all_);
run;

data want;
  merge step1 three;
  by item;
  output;
  call missing(of _all_);
run;

If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step. But I would NOT use it for anything serious. Just as a check to see if it could work.

data want ;
  merge one two;
  by item state;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:

data want ;
  set one ;
  by item state;
  if first.state then set two;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

View solution in original post

Tom · Posted 02-25-2020 12:58 PM

Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys. Then a second step to combine the datasets that use ITEM as the key.

data one;
 input item $ state $ county $  by_county;
cards;
Apple DC     bb    3
Apple DC     cc     2
Apple MD   aa     4
Pear  VA     cc     6
;

data two;
 input item $ state $ by_state;
cards;
Apple DC    5
Apple MD   4
Pear   VA    6
;

data three; 
  input item $  by_item;
cards;
Apple        9
Pear         6
;

data step1;
  merge one two;
  by item state;
run;

data want;
  merge step1 three;
  by item;
run;

Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?

Here is trick to do that.

data step1;
  merge one two;
  by item state;
  output;
  call missing(of _all_);
run;

data want;
  merge step1 three;
  by item;
  output;
  call missing(of _all_);
run;

If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step. But I would NOT use it for anything serious. Just as a check to see if it could work.

data want ;
  merge one two;
  by item state;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:

data want ;
  set one ;
  by item state;
  if first.state then set two;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

Multiple joins by first row

Re: Multiple joins by first row

Re: Multiple joins by first row

Registration is open

Multiple joins by first row

Re: Multiple joins by first row

Re: Multiple joins by first row

Registration is open

SAS Training: Just a Click Away