Dears,
I have data for customer visits in mall which the first column is unique user identifier. The second column if indices of a
day when customer have visited a mall. Enumeration is started from some fixed day; the day
with index 1 is Monday (e. g. 7th is Sunday, 8th is again Monday). Indices are in the range from
1 to 1001 which equals to 143 full weeks. You are to predict the first day of the next (144th)
week when user will visit a mall. For example, if a user will visit a mall on Wednesday of the next
week, then your prediction should be equal to 3. So for each user, you need to predict a
number:
● 0: no visit on the next week
● 1: Monday
● 2: Tuesday
● 3: Wednesday
● 4: Thursday
● 5: Friday
● 6: Saturday
● 7: Sunday
the sample of data is:
User_id, Visits
1,30 84 126 135 137 179 242 342 426 456 460 462 483 594 600 604 704 723 744 787 804 886 924 928 946 954 
2,24 53 75 134 158 192 194 211 213 238 251 305 404 418 458 476 493 571 619 731 739 759 761 847 883 943 962 981 983
and I have the actual visits as
user_id, Next_visit_day
1,3
2,4
How to resolve the issue in order to get the best accurate model, which nodes or techniques to handle this type of problem.
I appreciate your support
Your problem represents a complex problem which really combines multiple underlying business questions, so you are unlikely to find a broadly used approach to solving the problem. You can analyze the data in several ways, however, and likely come up with some meaningful insights. For example, your problem seems to be a combination of two different more common problems:
1 - A reliability/survival type problem which tries to identify the number of weeks until a customer is likely to visit the mall and/or likelihood of visiting in the next week
2 - A multiclass problem that predicts which day of the week the customer is most likely to visit (if they visit at all)
I suspect that you will need to be creative in how you organize your data and how you combine the results of the two models. For example, in your sample data you listed the first visit as day 24 for customer 1 but day 30 for customer 2. These are likely in different weeks, but is it better to try and align the weeks so that week 1 corresponds to the first week of visit even if it represents different calendar weeks for different customers? This has the benefit of making our "time till event" predictions meaningful since they are all conditioned off the first recorded visit but we lose the ability to identify any potential seasonal patterns (e.g. people might visit more frequently between Thanksgiving and Christmas). You will likely find it beneficial to arrange the data differently for the different aspects of this problem, and you will then need to combine the results creatively to provide a prediction that is better than just guessing at random.
Hope this helps!
Doug
Your problem represents a complex problem which really combines multiple underlying business questions, so you are unlikely to find a broadly used approach to solving the problem. You can analyze the data in several ways, however, and likely come up with some meaningful insights. For example, your problem seems to be a combination of two different more common problems:
1 - A reliability/survival type problem which tries to identify the number of weeks until a customer is likely to visit the mall and/or likelihood of visiting in the next week
2 - A multiclass problem that predicts which day of the week the customer is most likely to visit (if they visit at all)
I suspect that you will need to be creative in how you organize your data and how you combine the results of the two models. For example, in your sample data you listed the first visit as day 24 for customer 1 but day 30 for customer 2. These are likely in different weeks, but is it better to try and align the weeks so that week 1 corresponds to the first week of visit even if it represents different calendar weeks for different customers? This has the benefit of making our "time till event" predictions meaningful since they are all conditioned off the first recorded visit but we lose the ability to identify any potential seasonal patterns (e.g. people might visit more frequently between Thanksgiving and Christmas). You will likely find it beneficial to arrange the data differently for the different aspects of this problem, and you will then need to combine the results creatively to provide a prediction that is better than just guessing at random.
Hope this helps!
Doug
Hi
Were you able to solve this?
I have similar problem
Hey Can you plz send us working code in python
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.
