- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have some consumer data that tells where the consumer shops the data has more rows with different consumer ids but for testing purposes I have included chunk of that data
ConsumerID Retailer Product_Code Shopping_Date
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
Basically **if the retailer is not the same and product code is not the same the consumer has switched retailers and products** for example the first row and the second row have the same retailer so they **would not count in the new data set**. **The new data set that I want to create should contain fromretailer toretailer, if the date is the same then that is counted as one occasion**
So my new data should look like this
ConsumerID FromRetailer ToRetailer FromDate ToDate FP
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
FP is basically how many times the person switched in this case they switched from one time to 4 times this can change the go from 2 times to 4 times which would make the FP 1/8
My main issue is how would I firstly compare the first row with the second row then the next issue is for example if the second row has the same date then it is classified as one occasion so the consumer goes from wall mart to Best buy 4 times.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have not understood your requirement completely.But some how i tried bellow code which compare first obs with second obs and also i have created a flag variable which shows value if customer shifted to one retails to another one.
Data Have;
input Cus_id Ret $ product_code Shoping_date anydtdte10.;
format Shoping_date date9.;
cards;
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
;
run;
Proc Sort data=have;
by shoping_date ret product_code ;
run;
Data new;
merge HAve have(rename=(Ret=re1 product_code=product_code1 Shoping_date=Shoping_date1) firstobs=2) ;
if ret ne ret1 and product_code ne product_code1 then flag=1;
run;
Proc print;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for the help but your code outputs Wallmart to Wallmart which it shouldnt because the retailer is the same and also the product code is the same. Secondly The buyer goes from Bestbuy to Frys this happens 4 times so basically any transaction that happens on the same date for example best buy is classified as one occasion.
So the output should be
Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Wallmart BestBuy 20090722 20090801 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
1 Bestbuy Frys 20090722 20090908 1/4
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you are on the right track but the code you give me for some reason goes from bestbuy to bestbuy which should not happen because its the same retailer I swithc is when retaher and [product Id changes
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
If you want to find the out how many times a particular customer is moving to different retailer for different products and for specific time interval, then you need to include all possible mevements...Here in sample data, there are 3 retailers so you need to find out following:
How Many times customer is moving from WALLMART to BESTBUY for different products
How Many times customer is moving from WALLMART to FRYS for different products
How Many times customer is moving from BESTBUY to FRYS for different products
Here is my try, don't know it meets your requirement...But based on the business logic you mentioned in your very first post, i prepered the SAS Code as follow:
proc sql;
create table want as
select a.consumer_id,a.retailer as from_retailer,
b.retailer as to_retailer,a.product_code,
a.shopping_date as from_date,b.shopping_date as to_date,
count(a.consumer_id) as FP
from have as a,have as b
where a.Consumer_ID = b.Consumer_ID
group by 1,2,3,4,5,6;
quit;
data want;
set want;
if from_retailer = to_retailer or
from_date GT to_date then delete;
run;
proc sort data = want;
by from_retailer to_retailer FP;
run;
data want;
set want;
by from_retailer to_retailer FP;
if last.from_retailer or last.to_retailer and last.FP then output;
run;
proc sort data = want(drop = product_code);
by from_date to_date FP;
run;
-Urvish
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are close but the thing is we go row by row so that means we compare wallmart with bestbuy then bestbuy with frys we only compare rows when there is a switch so firt instance wallmart to wallmart there is no switch but second instance wallmart to bestbuy there is a switch then bestbuy to frys there is a switch so we dont compare wallmart to frys we go down the observations. Keeping in mind if the the consumer goes to the shop in the same date then thats one transcation so from wallmart to bestbuy its 1/4 cause 1 to 4 then from bestbuy to frys its still 1/4 because 4 times into 1 hope that helps
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want row wise comparision (my output gives more insight) then you can simply eliminate the comparision which is not relevant for your business, from the final dataset...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for much for the help you are output is more in detail. I am still confused because from your code I can compare the first row with the second row and then second row with the third row so fourth my issue is for example if we go from wallmart to bestbuy where bestbuy transactions occur on the same date how would I go about doing this in sas so essentially it should be
From To
Wallmart Bestbuy
Wallmart Bestbuy
Wallmart Bestbuy
Wallmart Bestbuy
Bestbuy Frys
Bestbuy Frys
Bestbuy Frys
Bestbuy Frys
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Probably like others on this forum, I am confused by your rules as well. For one, what is role of product_code? even though it has been mentioned several times, but I couldn't see how it fits into the whole picture. Therefore, I have to assume:
1. You only concern store to store switch, in term of chronically order.
2. The switching counts are determined by the maximum event counts in between two stores.
The following code is based on above assumptions, and it seems to give out what you have been asking for. Please elaborate in more details of your rules and exhaust all of the possibilities if this does not address your purpose.
Data Have;
input Cus_id Ret $ product_code Shoping_date anydtdte10.;
format Shoping_date date9.;
cards;
1 Wallmart 12345 20090721
1 Wallmart 12345 20090722
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 23444 20090908
1 alkj 2314 20091021
1 alkj 2314 20091021
;
/*Sort by Cus_id and Shopping date, acending*/
PROC SORT DATA=HAVE;
BY Cus_id SHOPING_DATE;
RUN;
data WANT;
KEEP Cus_id FROM_STORE TO_STORE FROM_DATE TO_DATE FP;
do _i=1 by 1 until (last.ret);
set have;
by Cus_id Ret notsorted;
end;
length from_store to_store $ 20;
retain _from_count
_to_count
from_store
to_store
from_date
to_date ;
format from_date
to_date yymmdd10.;
_from_count=_to_count;
_to_count=_i;
from_store=to_store;
from_date=to_date;
to_store=ret;
to_date=Shoping_date;
_count=max(_to_count,_from_count);
fp=cats('1','/',_count);
if _n_>1 then do _i=1 to _count;
output;
end;
run;
quit;
Haikuo
Edit: Fixed: to sort by Cus_id and date.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
there are two switches we have store switch and a product switch so your output is right but for example if we are in wallmart and we product that has prduct code of 123456 and then on another we are at wallmart again ad we buy a product of product code 34566 then its a product switch so there are two types of switches store switch and product switch. Product_code is used to determine product_switch so if the product_code is different then we have product switch. Also there are more consumer_ids. Each household should not be compared with each other so when we hit a new household we dont compare the retailer of the previous household.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show some example regarding Product_code switch, both input and wanted output, it is still ambiguous to me how this switch works, especially how to count number of switches, help us help you.
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1 Wallmart 12345 20090721
1 Wallmart 23456 20090722
1 Wallmart 23456 20090821
The output should be
Consumer_ID From_Store To_Store From_Porduct To_Product FP
1 Wallmart Wallmart 12345 23456 1
Explanation:
There are two types of switches Product switch and Store switch so the condition for a switch should be if retailer!=retailer2 or productCode!=productCode2 then its a switch (either product or store).
FP is calculated simply by the switch count that is for example in the example above the consumer goes from wallmart to wallmart that is a one to one switch so therefore FP is one, but in the pervious example the person went from wallmart to besbuy which is 1/4 because they bought 4 items in best buy.
The Shopping_date is important because for examlple a person buys 4 items it becomes on shopping trip but when we display the data we still display 4 transactions the reason why I say it becomes one trip is so that we dont compare bestbuy with bestbuy as they occured on the same shopping_date.
So to sum up everything the data contains more than one consumer_id each consumer_id needs to be compared indicually so we group the Consumer_ID
then secondly we check if its a store switch or product switch if we have a store/product switch we compare the rows if the rows have the shooping date they are classified as one trip but we still display 4 transactions in the final output.
Sample Data
1 Wallmart 12345 20090721
1 Wallmart 23456 20090722
1 Wallmart 23456 20090724
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Bestbuy 23456 20090801
1 Frys 3456 20090903
2 Frys 12455 20090905
2 Frys 3456 20090904
2 Frys 3456 20090904
Output Data
Consumer_ID From_Store To_Store From_Product To_Product From_Date To_Date FP Type of Switch
1 Wallmart Wallmart 12345 23456 20090721 20090724 1 Product_Switch
1 Wallart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Wallart Bestbuy 23456 23456 20090724 20090801 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
1 Bestbuy Frys 23456 3456 20090801 20090903 1/4 Store Switch
2 Frys Frys 12455 3456 20090905 20090904 1 Store_Switch
NOTES: Each Consumer's are treated different we dont compare the consumers transcations with anohter consumer we essentialy group consumers. I hope this helps we dont need type of switch I put it there for understanding
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use the lagfunction to compare with next obs: newVar= lag1(varname);
Sample:
data allseg3 (compress=yes);
set allseg2a;
if lag1(team) = team
and lag1(vagtnr) = vagtnr
and lag1(date) = date
and lag1(time) = time
then
;
else output;
run;