Hi,
question on merge. I would like to merge 2 data sets
DATA_1 | |||||||
HOME_a | DESCRIPTION | VEDNOR_a | BRAND_a | DISTRIBUTOR_a | SALES_a | QUANTITY_a | |
HOME 1 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 1 | $519 | 10 | |
HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 2 | $536 | 10 | |
TOTAL | $1,055 | 20 |
DATA_2 | |||||||
HOME_b | DESCRIPTION | VEDNOR_b | BRAND_b | DISTRIBUTOR | QUANTITY_b | SALES_b | REBATES |
HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 2 | 10 | $536 | $17 |
the code I wrote is
data COMB;
merge data_1 (in=a)
data_2(in=b);
by Description
if a=1 and b=1;
run;
I am getting
HOME_a | DESCRIPTION | VEDNOR_a | BRAND_a | DISTRIBUTOR_a | SALES_a | QUANTITY_a | HOME_b | DESCRIPTION_b | VEDNOR_b | BRAND_b | QUANTITY_b | SALES_b | REBATES |
HOME 1 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 1 | $519 | 10 | HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | 10 | $536 | $17 |
HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 2 | $536 | 10 | HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | 10 | $536 | $17 |
What I want is
HOME | DESCRIPTION | VEDNOR | BRAND | DISTRIBUTOR | SALES | QUANTITY | REBATES |
HOME 2 | MUFFINS | VENDOR 1 | BRAND 1 | DIST 2 | $536 | 10 | $17 |
I am not sure why the rebates and sales are getting duplicated.
Thanks,
That's because DESCRIPTION contains duplicate values. You need to merge by other variables as well like HOME_2.
Ok, so it looks like i need to have something unique. I am working with transactional data, therefore I find it a little challenging
DATA 1 has the transactional data with sales only and DATA 2 has transactional data with only rebates. Maybe I need to create a new variable and combine 2 variables to make it unique?
You probably want to INTERLEAVE the observations instead of MERGING them.
Use SET instead of merge and you will get all of the observations from both datasets.
You can then apply your rules to combine them in a way that makes sense.
For example if you wanted to find the total cost after removing the rebates you might do something like to add the COST and subtract the REBATE into a new variable TOTAL and collapse to just one observation per ID group.
data want;
set costs rebates;
by id;
if first.id then total=0;
total+cost-rebate;
if last.id then output;
run;
Looking at your sample data it appears the key for merging/joining needs to be:
HOME | DESCRIPTION | VEDNO | BRAND | DISTRIBUTOR |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.