BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

Hi,

 

I have 3 datasets and I hope to join them by first row because they are in different levels.

 

I assigned the base dataset to be (the other 2 will be left merged to the base dataset):

 

item  state  county  by_county

Apple MD   aa     4
Apple DC     bb    3
Apple DC     cc     2
Pear   VA     cc     6

2nd dataset:

 Item state  by_state

Apple MD   4
Apple DC    5
Pear   VA    6

3rd dataset:

item        by_item
Apple        9
Pear         6

I'm hoping to join the datasets to have output as the following:

 

Item   state  county  by_county   by_state  by_item
Apple   MD     aa     4            4        9
Apple   DC     bb     3            5
Apple   DC     cc     2       
Pear    VA     cc     6            6        6

You can order by item,state and county to determine the first row.

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys.  Then a second step to combine the datasets that use ITEM as the key.

data one;
 input item $ state $ county $  by_county;
cards;
Apple DC     bb    3
Apple DC     cc     2
Apple MD   aa     4
Pear  VA     cc     6
;

data two;
 input item $ state $ by_state;
cards;
Apple DC    5
Apple MD   4
Pear   VA    6
;

data three; 
  input item $  by_item;
cards;
Apple        9
Pear         6
;

data step1;
  merge one two;
  by item state;
run;

data want;
  merge step1 three;
  by item;
run;

Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?

Here is trick to do that.

data step1;
  merge one two;
  by item state;
  output;
  call missing(of _all_);
run;

data want;
  merge step1 three;
  by item;
  output;
  call missing(of _all_);
run;

If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step.  But I would NOT use it for anything serious. Just as a check to see if it could work.

data want ;
  merge one two;
  by item state;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:

data want ;
  set one ;
  by item state;
  if first.state then set two;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

 

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys.  Then a second step to combine the datasets that use ITEM as the key.

data one;
 input item $ state $ county $  by_county;
cards;
Apple DC     bb    3
Apple DC     cc     2
Apple MD   aa     4
Pear  VA     cc     6
;

data two;
 input item $ state $ by_state;
cards;
Apple DC    5
Apple MD   4
Pear   VA    6
;

data three; 
  input item $  by_item;
cards;
Apple        9
Pear         6
;

data step1;
  merge one two;
  by item state;
run;

data want;
  merge step1 three;
  by item;
run;

Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?

Here is trick to do that.

data step1;
  merge one two;
  by item state;
  output;
  call missing(of _all_);
run;

data want;
  merge step1 three;
  by item;
  output;
  call missing(of _all_);
run;

If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step.  But I would NOT use it for anything serious. Just as a check to see if it could work.

data want ;
  merge one two;
  by item state;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:

data want ;
  set one ;
  by item state;
  if first.state then set two;
  if first.item then set three;
  output;
  call missing(of _all_);
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 600 views
  • 1 like
  • 2 in conversation