Hi All,
I have a dataset of store IDs with each store's performance on customer enrollment.
I am running a reward program for around 150 stores to encourage customer enrollment with proper customer data.
Stores are encouraged to enroll customers in the loyalty program, but many stores didn't collect all the required details, such as mobile number, email ID, communication preference, gender, and date of birth.
A sample data row looks like this:
Store ID: 0001
Count of enrollments: 1200
% of customers with mobile number present: 85%
% with email present: 90%
% with DOB mentioned: 82%
% with gender mentioned: 80%
% with communication preference mentioned: 70%
Total record count (number of stores): 150
So we have to decide what weight to give each variable and then come up with solid logic to rank the stores. The end goal is a leaderboard of the top 10 and bottom 10 stores.
What analysis, logic, or statistical procedures should I be using?
Thanks for reading this. Please reply.
I tried factor analysis, but it's not giving any conclusive result. Can you elaborate on how you meant to use it?
I used enrollment and the other variables in absolute terms.
Did you start with basic linear regression, imputing the missing values with the mean or with random values to get a distribution around your estimate? That's probably a good starting point to get you a baseline.
You will need to standardize your data; in particular, enrollment needs to be standardized to account for population. A store in NY will by default sign up more people than one in Kentucky. So you need to make some decisions on that.
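As a rough sketch of that standardization step in Python (standard library only): the `population` column below is an assumption, since the original dataset does not include it, and the helper name `add_standardized_enrollment` and the rows for stores 0002 and 0003 are made up for illustration. It converts raw enrollment into a rate per 1,000 people and then into a z-score so stores of different sizes are comparable.

```python
from statistics import mean, pstdev

# Hypothetical rows: "population" is an assumed catchment size per store,
# not a field from the original dataset.
stores = [
    {"store_id": "0001", "enrollments": 1200, "population": 60000},
    {"store_id": "0002", "enrollments": 400, "population": 10000},
    {"store_id": "0003", "enrollments": 900, "population": 90000},
]

def add_standardized_enrollment(rows):
    """Add enrollments per 1,000 people and the z-score of that rate."""
    for row in rows:
        row["enroll_per_1000"] = 1000 * row["enrollments"] / row["population"]
    rates = [row["enroll_per_1000"] for row in rows]
    mu, sigma = mean(rates), pstdev(rates)
    for row in rows:
        # Guard against zero spread (all stores with the same rate).
        row["enroll_z"] = (row["enroll_per_1000"] - mu) / sigma if sigma else 0.0
    return rows

for row in add_standardized_enrollment(stores):
    print(row["store_id"], round(row["enroll_per_1000"], 1), round(row["enroll_z"], 2))
```

If no population figure is available, another denominator such as store footfall or transaction count might serve the same purpose; which one is appropriate is a business decision.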
Then decide whether to account for missing values or to score them as zero. I lean towards scoring them as zero because that's a nudge to those teams to increase their data quality, BUT if people are less likely to give up information in certain stores, it seems wrong to penalize those stores. Also, different jurisdictions could have different rules around what you can collect. I have no idea where your stores are, but basically, the context of the problem does matter. Would you still penalize stores for missing data in these cases?
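One way to read "score them as zero": if each missing field on a record counts as 0 and each present field as 1, the per-store average for a field is just its "% present" figure. A weighted average of those percentages then gives a single completeness score to rank on. The sketch below assumes exactly that; the weights are hypothetical placeholders (they would need to come from the business), the short field names and the function names are invented, and stores 0002 and 0003 are made-up rows. Only store 0001 uses the sample values from the question.

```python
# Assumed weights for illustration only; they are not derived from the data.
WEIGHTS = {"mobile": 0.30, "email": 0.25, "dob": 0.15, "gender": 0.10, "comm_pref": 0.20}

def completeness_score(store, weights=WEIGHTS):
    """Weighted average of the field-presence percentages (0 to 100)."""
    total = sum(weights.values())
    return sum(w * store[field] for field, w in weights.items()) / total

def leaderboard(stores, n=10):
    """Return (top n, bottom n) stores; the bottom list starts with the worst."""
    ranked = sorted(stores, key=completeness_score, reverse=True)
    return ranked[:n], ranked[-n:][::-1]

stores = [
    {"store_id": "0001", "mobile": 85, "email": 90, "dob": 82, "gender": 80, "comm_pref": 70},
    {"store_id": "0002", "mobile": 95, "email": 92, "dob": 90, "gender": 88, "comm_pref": 85},
    {"store_id": "0003", "mobile": 60, "email": 55, "dob": 50, "gender": 45, "comm_pref": 40},
]

top, bottom = leaderboard(stores, n=2)
for s in top:
    print("top", s["store_id"], round(completeness_score(s), 1))
for s in bottom:
    print("bottom", s["store_id"], round(completeness_score(s), 1))
```

A standardized enrollment measure could be added as one more weighted term, and stores in jurisdictions where a field cannot be collected could have that field's weight set to zero so they are not penalized for it.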
After linear regression I would probably try PROC PLS next, but it's a bit more complex, so if the accuracy isn't there I'd pick the simpler model.
@Picanion wrote:
I tried factor analysis, but it's not giving any conclusive result. Can you elaborate on how you meant to use it?
I used enrollment and the other variables in absolute terms.
What analysis, logic, or statistical tool should I be using?
(Please suggest or point to code or logic that can be implemented in Python or RStudio.)