Solved: Can I replace this join?

mrzlatan91 · Posted 09-08-2018 07:04 AM

Hey guys, I have a further problem with SAS and I would really appreciate if someone could help me with this.

I have the following dataset:

data have;
input date $ product $ supermarket $ price $;
datalines;
01JAN01 1 x 1.00
01JAN01 1 y 1.20
01JAN01 2 x 2.00
01FEB01 1 x 3.00
01FEB01 1 b 3.50
01MAR01 1 y 1.00
01MAR01 2 z 1.50
01MAR01 3 z 2.00
;
run;

The dataset contains observations of the prices, some supermarkets have in a specific month for a specific product.

For example, the first row says, that in January, the price for product 1 in supermarket x was 1 Dollar.

Now I want to add an additional column, for example price_of_x which contains the price of supermarket x for the specific product AND the specific month, or missing if no price is available in the data set. This could look like this:

data want;
input date $ product $ supermarket $ price $ price_of_x $;
datalines;
01JAN01 1 x 1.00 1.00
01JAN01 1 y 1.20 1.00
01JAN01 2 x 2.00 2.00
01FEB01 1 x 3.00 3.00
01FEB01 1 b 3.50 3.00
01MAR01 1 y 1.00 .
01MAR01 2 z 1.50 .
01MAR01 3 z 2.00 .
;
run;

My attempt to do this was to create a subset of all observations of supermarket x and then to inner join it with the original dataset. Unfortunately, I have a really big dataset and need to do this join very often. Thus, I want to know if there is a more intelligent (in terms of performance) way to do this?

If it helps, in the next step I add 2 further columns agreement and disagreement. agreement gets the value 1 if the price is the same, otherwise its a disagreement. Maybe this could also be integrated in the solution 🙂

Thanks in advance 🙂

mrzlatan

Ksharp · Posted 09-08-2018 07:19 AM

data have;
input date $ product $ supermarket $ price $;
datalines;
01JAN01 1 x 1.00
01JAN01 1 y 1.20
01JAN01 2 x 2.00
01FEB01 1 x 3.00
01FEB01 1 b 3.50
01MAR01 1 y 1.00
01MAR01 2 z 1.50
01MAR01 3 z 2.00
;
run;
proc sort data=have;
by date product;
run;
data want;
 merge have have( rename=(supermarket=_s price=price_x) where=(_s='x'));
 by date product;
 drop _s;
run;

View solution in original post

Ksharp · Posted 09-08-2018 07:19 AM

data have;
input date $ product $ supermarket $ price $;
datalines;
01JAN01 1 x 1.00
01JAN01 1 y 1.20
01JAN01 2 x 2.00
01FEB01 1 x 3.00
01FEB01 1 b 3.50
01MAR01 1 y 1.00
01MAR01 2 z 1.50
01MAR01 3 z 2.00
;
run;
proc sort data=have;
by date product;
run;
data want;
 merge have have( rename=(supermarket=_s price=price_x) where=(_s='x'));
 by date product;
 drop _s;
run;

mrzlatan91 · Posted 09-08-2018 07:31 AM

Thank you very much, this is exactly what I was looking for

Can I replace this join?

Re: Can I replace this join?

Re: Can I replace this join?

Re: Can I replace this join?

Registration is open

Can I replace this join?

Re: Can I replace this join?

Re: Can I replace this join?

Re: Can I replace this join?

Registration is open

SAS Training: Just a Click Away