Re: Merging 2 BY variables

Gladis6680 · Posted 06-07-2019 12:24 PM

Hi, I have a data with Produce as my BY Variable /unique variable

My goal is to give potential Amounts to Produce that does not have an amount.

I started off by creating another data set with Produce that have an amount and then later thought of merging with the main data, but my dilemma is I have 3 Chicken with 2 different Manufacturer and 2 brands giving 3 amounts, similarly for MILK.

Ideally I would like to show amounts of Chicken with the corresponding Manufacturer, Brand and Amount.

Produce	Manufacturer	Brand	Amount
Beans	CLICxxx	Clicxxxx	$0
Kidneys	Pantry	Pantry Shelf	$0
Brussel Sprouts	Limson	Limson	$0
Cabbage	Bear	Bear Creek	$0
Chicken	Grannys	Grannys Poultry Cooperative	$0
Milk	Global	Global Direct	$0
Beans	Ital	Ital-Pasta	$4
Kidneys	Primo	Primo	$5
Brussel Sprouts	Alasko	Alasko	$6
Cabbage	Keybrand	Laurie's	$7
Chicken	Galco	Flamingo	$2
Chicken	Maple		$3
Chicken	Maple	Kitchen Essential	$6
Milk	Agropur	Sealtest	$5
Milk	Saputo	Neilson	$6

Reeza · Posted 06-07-2019 01:55 PM

Imputation for missing values - you do need to specify the rules. Personally, If the brand or manufacture were the same I'd take one of those, likely Brand first, then manufacturer. And then if it was still missing I'd fill in with the Median value for that item instead.

Gladis6680 · Posted 06-07-2019 02:04 PM

Thank you for your response

Can you please explain what you mean with an example

Gladis6680 · Posted 06-09-2019 08:24 AM

Hi, I feel this subject may not be the appropriate title.sorry

I thought I would explain my concern once again. I have changed the data slightly. Produce is my unique variable/by variable

Produce	Manufacturer	Brand	Amount
Beans	CLICxxx	Clicxxxx	$0
Kidneys	Pantry	Pantry Shelf	$0
Brussel Sprouts	Limson	Limson	$0
Cabbage	Bear	Bear Creek	$0
Chicken	Grannys	Grannys Poultry Cooperative	$0
Milk	Global	Global Direct	$0
Beans	Ital	Ital-Pasta	$4
Kidneys	Primo	Primo	$5
Brussel Sprouts	Alasko	Alasko	$6
Cabbage	Keybrand	Laurie's	$7
Chicken	Galco	Flamingo	$2
Chicken	Maple	Sofina	$3
Chicken	Maple	Kitchen Essential	$6
Milk	Agropur	Sealtest	$5
Milk	Saputo	Neilson	$6

What I would like is the table below:

Produce

Manufacturer

Brand

Amount

Manufacturer_1

New_Brand_1

Potential_Amount_1

Manufacturer_2

New_Brand_2

Potential_Amount_2

Manufacturer_3

New_Brand_3

Potential_Amount_3

Beans

CLICxxx

Clicxxxx

$0

Ital

Ital-Pasta

$4

Kidneys

Pantry

Pantry Shelf

$0

Primo

$5

Brussel Sprouts

Limson

$0

Alasko

$6

Cabbage

Bear

Bear Creek

$0

Keybrand

Laurie's

$7

Chicken

Grannys

Grannys Poultry Cooperative

$0

Galco

Flamingo

$2

Maple

Sofina

$3

Maple

Kitchen Essential

$6

Milk

Global

Global Direct

$0

Agropur

Sealtest

$5

Saputo

Neilson

$6

Basically for the same Produce variable that does not have an Amount, I would like to give Potential amounts with the corresponding Manufacturer and Brand and if there are multiple Manufactuers, Brands and Amounts for the Produce variable...in this case Chicken and Milk, I would like to see those potential Manufacturers, Brands, Amounts.

Reeza · Posted 06-09-2019 06:59 PM

You'll have to merge in your data. I'd probably recommend doing all the merges at once and then capture the values into a row. Then using the logic to determine which values you want to use to fill it in.

If you're having trouble with this, rest assured it's not a beginner level problem and tbh the biggest difficulty is setting the logic, not necessarily the programming.

Gladis6680 · Posted 06-09-2019 07:15 PM

Thank you, is it possible for you to help me with a sample code. I think it will be easier for me to understand

Reeza · Posted 06-09-2019 08:17 PM

Untested but this should get you started. You'll end up with a data set, step 3 that you can then decide how to impute the missing data.

Note that you can end up with duplicate matches so you'll need to decide how to filter/limit when you have multiple matches. You may want to add a key to your 'have' table to uniquely identify the rows.

DATA HAVE;
INFILE CARDS DLM=',' TRUNCOVER;
LENGTH PRODUCE MANUFACTURER BRAND $20;
INPUT Produce $ Manufacturer  $  Brand $  Amount;
CARDS;
Beans,   CLICxxx, Clicxxxx,    0
Kidneys, Pantry,  Pantry Shelf,    0
Brussel Sprouts, Limson,  Limson,  0
Cabbage, Bear,    Bear Creek,  0
Chicken, Grannys,     Grannys Poultry Cooperative, 0
Milk,    Global,  Global Direct,   0
Beans,   Ital,    Ital-Pasta,  4
Kidneys, Primo,   Primo,   5
Brussel Sprouts, Alasko,  Alasko,  6
Cabbage, Keybrand,    Laurie's,    7
Chicken, Galco,   Flamingo,    2
Chicken, Maple,    ,   3
Chicken, Maple,   Kitchen Essential,   6
Milk,    Agropur, Sealtest,    5
Milk,    Saputo,  Neilson, 6
;;;;


*MERGE IN BY MANUFACTURER AND BRAND;
proc sql;
create table step1 as
select t1.*, t2.amount as amount1
from have as t1
left join have as t2
on t1.produce = t2.produce 
and t1.manufacturer=t2.manufacturer
and t1.brand=t2.brand;
quit;

*MERGE IN BY BRAND;
proc sql;
create table step2 as
select t1.*, t2.amount as amount2
from step1 as t1
left join have as t2
on t1.produce = t2.produce 
and t1.brand=t2.brand;
quit;

*MERGE IN BY Manufacturer;
proc sql;
create table step3 as
select t1.*, t2.amount as amount3
from step2 as t1
left join have as t2
on t1.produce = t2.produce 
and t1.manufacturer=t2.manufacturer;
quit;


/*insert data step to fill in logic here*/

data want;
set step3;

if amount = 0 the coalesce(amount1, amount2, amount3);

run;

@Gladis6680 wrote:

Thank you, is it possible for you to help me with a sample code. I think it will be easier for me to understand

Gladis6680 · Posted 06-09-2019 08:26 PM

Thank you very much, I forgot to mention, that ..yes i am still figuring out which logic to use first(I may use the min AMOUNT, or max AMOUNT) for duplicates (like for Chicken,which Manufacture and which Brand I should use) .....but for the time being I would like the output to have all the the Chicken - Manufacture, Brand and Amount side by side like Manufacture_1 Brand _1 Manufacture_2 Brand_2.

Also I am so sorry is it also possible to give this to me in a DATA STEP as I not familiar with SQL.

Thank for your help

Filling in missing values