Hi,
I have 3 datasets and I hope to join them by first row because they are in different levels.
I assigned the base dataset to be (the other 2 will be left merged to the base dataset):
item state county by_county
Apple MD aa 4
Apple DC bb 3
Apple DC cc 2
Pear VA cc 6
2nd dataset:
Item state by_state
Apple MD 4
Apple DC 5
Pear VA 6
3rd dataset:
item by_item
Apple 9
Pear 6
I'm hoping to join the datasets to have output as the following:
Item state county by_county by_state by_item
Apple MD aa 4 4 9
Apple DC bb 3 5
Apple DC cc 2
Pear VA cc 6 6 6
You can order by item,state and county to determine the first row.
Thank you!
Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys. Then a second step to combine the datasets that use ITEM as the key.
data one;
input item $ state $ county $ by_county;
cards;
Apple DC bb 3
Apple DC cc 2
Apple MD aa 4
Pear VA cc 6
;
data two;
input item $ state $ by_state;
cards;
Apple DC 5
Apple MD 4
Pear VA 6
;
data three;
input item $ by_item;
cards;
Apple 9
Pear 6
;
data step1;
merge one two;
by item state;
run;
data want;
merge step1 three;
by item;
run;
Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?
Here is trick to do that.
data step1;
merge one two;
by item state;
output;
call missing(of _all_);
run;
data want;
merge step1 three;
by item;
output;
call missing(of _all_);
run;
If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step. But I would NOT use it for anything serious. Just as a check to see if it could work.
data want ;
merge one two;
by item state;
if first.item then set three;
output;
call missing(of _all_);
run;
To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:
data want ;
set one ;
by item state;
if first.state then set two;
if first.item then set three;
output;
call missing(of _all_);
run;
Normally you would do that in two steps. One to combine the datasets that use ITEM and STATE as the keys. Then a second step to combine the datasets that use ITEM as the key.
data one;
input item $ state $ county $ by_county;
cards;
Apple DC bb 3
Apple DC cc 2
Apple MD aa 4
Pear VA cc 6
;
data two;
input item $ state $ by_state;
cards;
Apple DC 5
Apple MD 4
Pear VA 6
;
data three;
input item $ by_item;
cards;
Apple 9
Pear 6
;
data step1;
merge one two;
by item state;
run;
data want;
merge step1 three;
by item;
run;
Do you really want the value of BY_ITEM to only appear on one of the multiple rows with that value of ITEM?
Here is trick to do that.
data step1;
merge one two;
by item state;
output;
call missing(of _all_);
run;
data want;
merge step1 three;
by item;
output;
call missing(of _all_);
run;
If (and this is a BIG if) you are positive that THREE has one and only one observation for every value of ITEM in ONE or TWO then you can try using this to combine in one step. But I would NOT use it for anything serious. Just as a check to see if it could work.
data want ;
merge one two;
by item state;
if first.item then set three;
output;
call missing(of _all_);
run;
To take that crazy idea even further if you are sure that TWO has one and only one observation for every ITEM/STATE combination that is in ONE then your code could become:
data want ;
set one ;
by item state;
if first.state then set two;
if first.item then set three;
output;
call missing(of _all_);
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.