BookmarkSubscribeRSS Feed
johnhuang12
Calcite | Level 5

I have some consumer data that tells where the consumer shops the data has more rows with different consumer ids but for testing purposes I have included chunk of that data 

    ConsumerID  Retailer    Product_Code     Shopping_Date   

        1       Wallmart    12345            20090721

        1       Wallmart    12345            20090722

        1       Bestbuy     23456            20090801

        1       Bestbuy            23456            20090801 

        1       Bestbuy            23456            20090801

        1       Bestbuy            23456            20090801

        1       Frys            23444            20090908

       

Basically **if the retailer is not the same and product code is not the same the consumer has switched retailers and products** for example the first row and the second row have the same retailer so they **would not count in the new data set**. **The new data set that I want to create should contain fromretailer toretailer, if the date is the same then that is counted as one occasion**

So my new data should look like this

    ConsumerID  FromRetailer ToRetailer    FromDate        ToDate      FP

            1   Wallmart     BestBuy        20090722       20090801    1/4

            1   Wallmart     BestBuy        20090722       20090801    1/4

            1   Wallmart     BestBuy        20090722       20090801    1/4

            1   Wallmart     BestBuy        20090722       20090801    1/4

            1   Bestbuy      Frys           20090722       20090908    1/4

            1   Bestbuy      Frys           20090722       20090908    1/4

            1   Bestbuy      Frys           20090722       20090908    1/4

            1   Bestbuy      Frys           20090722       20090908    1/4

          

FP is basically how many times the person switched in this case they switched from one time to 4 times this can change the go from 2 times to 4 times which would make the FP 1/8

My main issue is how would I firstly compare the  first row with the second row then the next issue is for example if the second row has the same date then it is classified as one occasion so the consumer goes from wall mart to Best buy 4 times.

12 REPLIES 12
kuridisanjeev
Quartz | Level 8

Hello,

I have not understood your requirement completely.But some how i tried bellow code which compare first obs with second obs and also i have created a flag variable which shows value if customer shifted to one retails to another one.

Data Have;

input Cus_id Ret $ product_code Shoping_date anydtdte10.;

format Shoping_date date9.;

cards;

1 Wallmart 12345 20090721

1 Wallmart 12345 20090722

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Frys     23444 20090908

;

run;

Proc Sort data=have;

by shoping_date ret product_code  ;

run;

Data new;

merge HAve have(rename=(Ret=re1 product_code=product_code1 Shoping_date=Shoping_date1) firstobs=2) ;

if ret ne ret1 and product_code ne product_code1 then flag=1;

run;

Proc print;

run;



johnhuang12
Calcite | Level 5

Thank you so much for the help but your code outputs Wallmart to Wallmart which it shouldnt because the retailer is the same and also the product code is the same. Secondly The buyer goes from Bestbuy to Frys this happens 4 times so basically any transaction that happens on the same date for example best buy is classified as one occasion.

So the output should be

  Wallmart BestBuy 20090722 20090801 1/4
  1 Wallmart BestBuy 20090722 20090801 1/4
  1 Wallmart BestBuy 20090722 20090801 1/4
  1 Wallmart BestBuy 20090722 20090801 1/4
  1 Bestbuy Frys 20090722 20090908 1/4
  1 Bestbuy Frys 20090722 20090908 1/4
  1 Bestbuy Frys 20090722 20090908 1/4
  1 Bestbuy Frys 20090722 20090908 1/4

johnhuang12
Calcite | Level 5

I think you are on the right track but the code you give me for some reason goes from bestbuy to bestbuy which should not happen because its the same retailer I swithc is when retaher and [product Id changes

UrvishShah
Fluorite | Level 6

Hi,

If you want to find the out how many times a particular customer is moving to different retailer for different products and for specific time interval, then you need to include all possible mevements...Here in sample data, there are 3 retailers so you need to find out following:

How Many times customer is moving from WALLMART to BESTBUY for different products

How Many times customer is moving from WALLMART to FRYS for different products

How Many times customer is moving from BESTBUY to FRYS for different products

Here is my try, don't know it meets your requirement...But based on the business logic you mentioned in your very first post, i prepered the SAS Code as follow:

proc sql;

   create table want as

   select a.consumer_id,a.retailer as from_retailer,

          b.retailer as to_retailer,a.product_code,

          a.shopping_date as from_date,b.shopping_date as to_date,

          count(a.consumer_id) as FP

   from have as a,have as b

   where a.Consumer_ID = b.Consumer_ID

   group by 1,2,3,4,5,6;

quit;

data want;

  set want;

  if from_retailer = to_retailer or

  from_date GT to_date then delete;

run;

proc sort data = want;

  by from_retailer to_retailer FP;

run;

data want;

  set want;

  by from_retailer to_retailer FP;

  if last.from_retailer or last.to_retailer and last.FP then output;

run;

proc sort data = want(drop = product_code);

   by from_date to_date FP;

run;

-Urvish

johnhuang12
Calcite | Level 5

You are close but the thing is we go row by row so that means we compare wallmart with bestbuy then bestbuy with frys we only compare rows when there is a switch so firt instance wallmart to wallmart there is no switch but second instance wallmart to bestbuy there is a switch then bestbuy to frys there is a switch so we dont compare wallmart to frys we go down the observations. Keeping in mind if the the consumer goes to the shop in the same date then thats one transcation so from wallmart to bestbuy its 1/4 cause 1 to 4 then from bestbuy to frys its still 1/4 because 4 times into 1 hope that helps

UrvishShah
Fluorite | Level 6


If you want row wise comparision (my output gives more insight) then you can simply eliminate the comparision which is not relevant for your business, from the final dataset...

johnhuang12
Calcite | Level 5

Thanks for much for the help you are output is more in detail. I am still confused because from your code I can compare the first row with the second row and then second row with the third row so fourth my issue is for example if we go from wallmart to bestbuy where bestbuy transactions occur on the same date how would I go about doing this in sas so essentially it should be

From           To

Wallmart     Bestbuy

Wallmart     Bestbuy

Wallmart     Bestbuy

Wallmart     Bestbuy

Bestbuy     Frys

Bestbuy     Frys

Bestbuy     Frys

Bestbuy     Frys

Haikuo
Onyx | Level 15

Probably like others on this forum, I am confused by your rules as well. For one, what is role of product_code? even though it has been mentioned several times, but I couldn't see how it fits into the whole picture. Therefore, I have to assume:

1. You only concern store to store switch, in term of chronically order.

2. The switching counts are determined by the maximum event counts in between two stores.

The following code is based on above assumptions, and it seems to give out what you have been asking for. Please elaborate in more details of your rules and exhaust all of the possibilities if this does not address your purpose.

Data Have;

input Cus_id Ret $ product_code Shoping_date anydtdte10.;

format Shoping_date date9.;

cards;

1 Wallmart 12345 20090721

1 Wallmart 12345 20090722

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Bestbuy  23456 20090801

1 Frys     23444 20090908

1 alkj     2314 20091021

1 alkj     2314 20091021

;

/*Sort by Cus_id and Shopping date, acending*/

PROC SORT DATA=HAVE;

BY Cus_id SHOPING_DATE;

RUN;

data WANT;

KEEP Cus_id FROM_STORE TO_STORE FROM_DATE TO_DATE FP;

do _i=1 by 1 until (last.ret);

set have;

        by Cus_id Ret notsorted;

end;

length from_store to_store $ 20;

retain _from_count

         _to_count

         from_store

         to_store

         from_date

         to_date ;

format from_date

         to_date yymmdd10.;

_from_count=_to_count;

_to_count=_i;

from_store=to_store;

from_date=to_date;

to_store=ret;

to_date=Shoping_date;

_count=max(_to_count,_from_count);

fp=cats('1','/',_count);

if _n_>1 then do _i=1 to _count;

output;

end;

run;

quit;

Haikuo

Edit: Fixed: to sort by Cus_id and date.

johnhuang12
Calcite | Level 5

there are two switches we have store switch and a product switch so your output is right but for example if we are in wallmart and we product that has prduct code of 123456  and then on another we are at wallmart again ad we buy a product of product code 34566 then its a product switch so there are two types of switches store switch and product switch. Product_code is used to determine product_switch so if the product_code is different then we have product switch. Also there are more consumer_ids. Each household should not be compared with each other so when we hit a new household we dont compare the retailer of the previous household.

Haikuo
Onyx | Level 15

Please show some example regarding Product_code switch, both input and wanted output, it is still ambiguous to me how this switch works, especially how to count number of switches, help us help you.

Haikuo

johnhuang12
Calcite | Level 5

        1       Wallmart    12345            20090721

        1       Wallmart    23456            20090722

        1       Wallmart   23456            20090821

  

The output should be

Consumer_ID  From_Store   To_Store          From_Porduct    To_Product  FP

1                    Wallmart         Wallmart        12345                 23456        1

Explanation:

There are two types of switches Product switch and Store switch so the condition for a switch should be if retailer!=retailer2 or productCode!=productCode2 then its a switch (either product or store).

FP is calculated simply by  the switch count that is for example in the example above the consumer goes from wallmart to wallmart that is a one to one switch so therefore FP is one, but in the pervious example the person went from wallmart to besbuy which is 1/4 because they bought 4 items in best buy.

The Shopping_date is important because for examlple a person buys 4 items it becomes on shopping trip but when we display the data we still display 4 transactions the reason why I say it becomes one trip is so that we dont compare bestbuy with bestbuy as they occured on the same shopping_date.

So to sum up everything the data contains more than one consumer_id each consumer_id needs to be compared indicually so we group the Consumer_ID

then secondly we check if its a store switch or product switch if we have a store/product switch we compare the rows if the rows have the shooping date they are classified as one trip but we still display 4 transactions in the final output.

Sample Data

        1       Wallmart    12345            20090721

        1       Wallmart    23456            20090722

        1       Wallmart   23456            20090724

     1    Bestbuy  23456       20090801

     1    Bestbuy  23456       20090801

     1    Bestbuy  23456       20090801

     1    Bestbuy  23456       20090801

      1    Frys     3456        20090903

      2    Frys     12455       20090905

      2    Frys     3456        20090904

      2    Frys     3456        20090904


Output Data

Consumer_ID      From_Store     To_Store    From_Product   To_Product       From_Date        To_Date                                   FP   Type of Switch

1                        Wallmart         Wallmart    12345               23456              20090721              20090724                              1       Product_Switch

1                        Wallart            Bestbuy     23456              23456            20090724             20090801                  1/4  Store Switch

1                        Wallart            Bestbuy     23456              23456            20090724             20090801                  1/4  Store Switch

1                        Wallart            Bestbuy     23456              23456            20090724             20090801                  1/4  Store Switch

1                        Wallart            Bestbuy     23456              23456            20090724             20090801                  1/4  Store Switch

1                        Bestbuy           Frys          23456               3456              20090801        20090903                  1/4   Store Switch

1                        Bestbuy           Frys          23456               3456              20090801        20090903                  1/4   Store Switch

1                        Bestbuy           Frys          23456               3456              20090801        20090903                  1/4   Store Switch

1                        Bestbuy           Frys          23456               3456              20090801        20090903                  1/4   Store Switch

2              Frys         Frys     12455         3456        20090905        20090904                  1   Store_Switch



NOTES: Each Consumer's are treated different we dont compare the consumers transcations with anohter consumer we essentialy group consumers. I hope this helps we dont need type of switch I put it there for understanding

HBruun
Calcite | Level 5

Use the lagfunction to compare with next obs: newVar= lag1(varname);

Sample:

data allseg3 (compress=yes);                                             

set allseg2a;                                                        

if  lag1(team)   = team                                             

and lag1(vagtnr) = vagtnr                                           

and lag1(date)   = date                                             

and lag1(time)   = time                                             

then                                                                

;                                                                 

else output;                                                           

run;       

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 5908 views
  • 11 likes
  • 5 in conversation