Grouping rows and then comparing them with pervious row and next row.

Reply
Contributor
Posts: 20

Grouping rows and then comparing them with pervious row and next row.

Hi,

For the last month I have been working on this project which deals with US consumer data that shows if the consumer did a store switch or a product switch. If the there is a store switch then the stores are not equal if there is a product switch then the barcodes are not equal. I have included the input data and the desired output data.

Input

Consumer_IDShopping_DateBarcodeStore
2212/09/2007123Macy's
2212/09/2007123Macy's
2213/09/2007124Macy's
2214/09/2007256Jc Penny
2214/09/2007256Jc Penny
2214/09/2007256Jc Penny
2215/09/2007356Wallmart
2215/09/2007356Wallmart
2411/09/2007456Wallmart
2412/09/2007145

Wallmart

Desired output

From_StoreTo_StoreFrom_BarcodeTo_BarcodeFrom_StoreProduct SwitchStore SwitchConsumer_ID
Macy'sMacy's1231241 to 1YesNo22
Macy'sJc Penny1242561 to 3YesYes22
Macy'sJc Penny1242562 to 3YesYes22
Macy'sJc Penny1242563 to 3YesYes22
Jc PennyWallmart2563563 to 2YesYes22
Jc PennyWallmart2563563 to 2YesYes22
Jc PennyWallmart2563563 to 2YesYes22
Jc PennyWallmart2563563 to 2YesYes22
Jc PennyWallmart2563563 to 2YesYes22
Jc PennyWallmart2563563 to 2YesYes22
WallmartWallmart4561451 to 1YesYes24

Some details:

We need to show From and To meaning what the consumer went from and what the consumer ended up with. If the transaction occurs on the same date for example the JC penny we group these transactions into one and compare it with pervious transaction or the next transaction. If we hit a new consumer_id we dont compare the pervous consumers data.

I have looked into retain lag but cant write up any code to get that desired output

More explanation

IF store are not equal then store switch

IF bar codes are not equal then product switch

IF we have multiple transactions that occur on the same date we group them and compare them with the previous or next one

Also A person can visit multiple stores on the same date so they can visit JcPenny and Wallmart on the same data, its the same we just group them and treat them as one transaction when comparing.

Pls help guys really really stuck

Respected Advisor
Posts: 4,653

Re: Grouping rows and then comparing them with pervious row and next row.

Wouldn't this be enough?

data have;

Input ID SDate :ddmmyy10. Barcode Store $&;

format SDate ddmmyy10.;

datalines;

22 12/09/2007 123 Macy's

22 12/09/2007 123 Macy's

22 13/09/2007 124 Macy's

22 14/09/2007 256 Jc Penny

22 14/09/2007 256 Jc Penny

22 14/09/2007 256 Jc Penny

22 15/09/2007 356 Wallmart

22 15/09/2007 356 Wallmart

24 11/09/2007 456 Wallmart

24 12/09/2007 145 Wallmart

;

proc sql;

create table havec as

select ID, SDate, store, barcode, count(*) as n

from have

group by ID, SDate, store, barcode;

quit;

data want(keep=ID from: to: switchSmiley Happy;

set havec; by ID;

fromStore = lag(store);

fromBarcode = lag(Barcode);

fromN = lag(n);

if not first.ID then do;

  toStore = store;

  toBarcode = barcode;

  toN = n;

  switchProduct = fromBarcode ne toBarcode;

  switchStore = fromStore ne toStore;

  output;

  end;

run;

PG

PG
Contributor
Posts: 20

Re: Grouping rows and then comparing them with pervious row and next row.

Hi,

your code produces the right output but it does not produce the actual rows. for example JcPenny should have 3 rows I think you just make them into one row. Also does your code account for more Consumer_IDS.

Regards

Respected Advisor
Posts: 4,653

Re: Grouping rows and then comparing them with pervious row and next row.

I don't see why you want repeats in your output and if you want to repeat fromN times or toN times (or the product of the two). Here is how to do it with fromN repeats:

data have;

Input ID SDate :ddmmyy10. Barcode Store $&;

format SDate ddmmyy10.;

datalines;

22 12/09/2007 123 Macy's

22 12/09/2007 123 Macy's

22 13/09/2007 124 Macy's

22 14/09/2007 256 Jc Penny

22 14/09/2007 256 Jc Penny

22 14/09/2007 256 Jc Penny

22 15/09/2007 356 Wallmart

22 15/09/2007 356 Wallmart

24 11/09/2007 456 Wallmart

24 12/09/2007 145 Wallmart

;

proc sql;

create table havec as

select ID, SDate, store, barcode, count(*) as n

from have

group by ID, store, barcode, SDate;

quit;

data want(keep=ID from: to: switchSmiley Happy;

set havec; by ID;

fromStore = lag(store);

fromBarcode = lag(Barcode);

fromN = lag(n);

if not first.ID then do;

  toStore = store;

  toBarcode = barcode;

  toN = n;

  switchProduct = fromBarcode ne toBarcode;

  switchStore = fromStore ne toStore;

  if switchStore or switchProduct then do;

  do i = 1 to fromN;

  output;

  end;

  end;

  end;

run;

PG

PG
Contributor
Posts: 20

Re: Grouping rows and then comparing them with pervious row and next row.

Thank you so much there is one issue your code works but it does not output the like the output I have what I mean by that is JcPenny To Macy's is first it should be Macy's to Mac'ys then Macy to Jcpenny and so on.

Contributor
Posts: 20

Re: Grouping rows and then comparing them with pervious row and next row.

Also your code does not account for JcPenny to Wallmart

Super Contributor
Posts: 307

Re: Grouping rows and then comparing them with pervious row and next row.

Does this question serve any real-world purpose, or is this simply an academic exercise (e.g. project for a course of study)?

Without a standardized time component, you do not know necessarily know the order of events on a given day UNLESS the data is inherently ordered by some (unseen) time variable. As it is, you do not know if person A went to store A first (or store B) if it occurred on the same day. The observations APPEAR to be ordered, but that may not be the case. Similarly, without a time variable or implict time ordering, you do not know if the person purchased product X or product Y first.

You seem to propose that you will collapse all events on the same day for the same person at the same store (so that there is only one observation per day):

* will there be a unique crossing for each person-store-date-product combination, or just person-store-date?

* depending on the above, you will then have to decide how to order the data (since a person may be going back and forth several times between stores, sometimes buying the same product, sometimes not). Which "event" comes before other events once you "collapse" the observations?

I'm also not sure what you are trying to show in the fifth column of your suggested output (e.g. 1 to 3, 2 to 3, 3 to 2). Do these represent store numbers?

Contributor
Posts: 20

Re: Grouping rows and then comparing them with pervious row and next row.

Hi this is a self study thing I really want to learn different aspects about SAS. The 5th column represents how many transaction to how many transactions mening wallmart to wallmart the person switches product so its different product and therefore its a one to one switch but the next row there is a 1 to 3 switch. There are three simples rules to get to the output they are:

1)If different bardcode then product switch

2) If different store then store switch

3) if both different bardcode and store then both store and bardcode switch

4) If the transactions occur on the same date they are treated as one to compare with pervvious and the row after

4)There are more Consumer_IDS and aslo you cant compare consumer data meaning you cant compare pervious consumer with the next consumer,.

Hope that helps. I know this seems very un logical I am doing this as a part of study., The desired output cannot change.

Ask a Question
Discussion stats
  • 7 replies
  • 280 views
  • 7 likes
  • 3 in conversation