Fluorite | Level 6

## Which statistical method to use to create a top 10 leaderboard for store or merchant performance

Hi All,

I have a dataset of store IDs and each store's performance on customer enrollment.

I am running a reward program for around 150 stores to encourage customer enrollment with proper customer data.

Stores are encouraged to enroll customers in the loyalty program, but many stores didn't collect all the required details, such as mobile number, email ID, communication preference, gender, and date of birth.

A sample data row looks like this:

- Store ID: 0001
- Count of enrollments: 1200
- % of customers with mobile number present: 85%
- % with email present: 90%
- % with date of birth mentioned: 82%
- % with gender mentioned: 80%
- % with communication preference mentioned: 70%

Total record count (number of stores): 150

So we have to decide what weight to give each variable and then come up with solid logic to rank the stores. The final goal is a top 10 and bottom 10 leaderboard.
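One simple way to turn this into a leaderboard is a weighted composite score: standardize each column, multiply by a weight, sum, and sort. The sketch below assumes hypothetical weights (enrollment volume 50%, the five completeness fields sharing the rest) and uses random numbers in place of the real 150 stores; choosing the actual weights is the open question in this thread.

```python
import numpy as np

# Random stand-in data for 150 stores (replace with the real dataset).
rng = np.random.default_rng(0)
n_stores = 150
store_ids = np.array([f"{i:04d}" for i in range(1, n_stores + 1)])
enrollments = rng.integers(200, 2000, size=n_stores)
# Columns: % mobile, % email, % date of birth, % gender, % communication preference.
completeness = rng.uniform(50, 100, size=(n_stores, 5))

# Hypothetical weights, in the same column order as `features` below; they sum to 1.
weights = np.array([0.50, 0.15, 0.10, 0.10, 0.05, 0.10])


def zscore(a):
    """Standardize each column to mean 0 and standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


features = np.column_stack([enrollments, completeness])
scores = zscore(features) @ weights  # one composite score per store

order = np.argsort(-scores)             # best store first
top10 = store_ids[order[:10]]
bottom10 = store_ids[order[::-1][:10]]  # worst store first

print("Top 10:", ", ".join(top10))
print("Bottom 10:", ", ".join(bottom10))
```

Standardizing first keeps the enrollment count (in the hundreds or thousands) from swamping the percentage columns; without it, the weights would not mean what they say.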

What analysis, logic, or statistical procedures should I use?

Super User

## Re: Which statistical method to use to create a top 10 leaderboard for store or merchant performance

PCA is usually used to determine the weights.
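A common recipe (one of several) is to take the loadings of the first principal component of the standardized variables and rescale them to sum to 1. The NumPy sketch below does this on random stand-in data. Taking absolute values of the loadings is a simplification that assumes every variable should count positively, and PC1 weights reflect shared variance rather than business importance, so this only makes sense when PC1 explains a large share of the variance. scikit-learn's `PCA` or R's `prcomp` would give the same loadings.

```python
import numpy as np


def pca_weights(features):
    """Derive composite-score weights from the first principal component.

    Uses the correlation matrix, which is equivalent to running PCA on
    standardized columns, so enrollment counts do not swamp the percentages.
    Returns (weights, share of variance explained by PC1).
    """
    corr = np.corrcoef(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                     # loadings of the largest component
    explained = eigvals[-1] / eigvals.sum()
    loadings = np.abs(pc1)                   # an eigenvector's sign is arbitrary
    return loadings / loadings.sum(), explained


# Random stand-in data: five completeness columns driven by one shared factor.
rng = np.random.default_rng(1)
shared = rng.normal(size=150)
features = np.column_stack([shared + rng.normal(scale=0.5, size=150) for _ in range(5)])

weights, explained = pca_weights(features)
print("PC1 weights:", weights.round(3), "variance explained:", round(float(explained), 2))
```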
Fluorite | Level 6

## Re: Which statistical method to use to create a top 10 leaderboard for store or merchant performance

I tried factor analysis, but it's not giving any conclusive results. Can you elaborate on how you meant to use it?

I used enrollment and the other variables in absolute terms.

Super User

## Re: Which statistical method to use to create a top 10 leaderboard for store or merchant performance

Did you start with a basic linear regression, imputing missing values with the mean, or with random values to get a distribution around your estimate? That's probably a good starting point for a baseline.
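A baseline along these lines, sketched with NumPy. It assumes a store-level outcome `y` to regress on; the thread doesn't name one, so `y` here is hypothetical and simulated. It compares mean imputation with repeated random-draw imputation (sampling from each column's observed values) and uses the spread of the coefficients across draws as a rough sense of the uncertainty that imputation adds. This is a simplified take on multiple imputation, not a full implementation.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def random_impute(X, rng):
    """Replace each NaN with a random draw from the observed values in its column."""
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        X[missing, j] = rng.choice(X[~missing, j], size=missing.sum())
    return X


# Simulated stand-in data: 150 stores, 3 predictors, hypothetical outcome y.
rng = np.random.default_rng(2)
X = rng.uniform(50, 100, size=(150, 3))
y = 10 + X @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=150)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # knock out roughly 10% of values

baseline = fit_ols(mean_impute(X_missing), y)
# Refit on many randomly imputed copies to see how much the estimates move.
draws = np.array([fit_ols(random_impute(X_missing, rng), y) for _ in range(200)])
print("mean imputation:", baseline.round(3))
print("random imputation: mean", draws.mean(axis=0).round(3), "sd", draws.std(axis=0).round(3))
```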

You will need to standardize your data; in particular, enrollment needs to be standardized to account for the population each store serves. A store in NY will naturally sign up more people than one in Kentucky, so you need to make some decisions on that.
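A minimal way to do this is to convert enrollments to a rate per 1,000 residents before standardizing. The sketch assumes you can attach a population figure to each store (a hypothetical column, not in the posted data). The numbers are made up; in this toy case the store with the most enrollments ends up ranked last once population is taken into account.

```python
import numpy as np


def enrollment_rate_z(enrollments, population, per=1000):
    """Enrollments per `per` residents, converted to z-scores across stores.

    `population` is a hypothetical input (e.g. residents in each store's
    catchment area); it is not part of the dataset described in the question.
    """
    rate = np.asarray(enrollments, dtype=float) / np.asarray(population, dtype=float) * per
    return (rate - rate.mean()) / rate.std()


# Made-up example: the store with the most sign-ups serves by far the largest population.
z = enrollment_rate_z(enrollments=[1200, 900, 300], population=[400_000, 150_000, 50_000])
print(z.round(3))
```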

Then decide whether to account for missing values or to score them as zero. I lean towards scoring them as zero because that nudges those teams to improve their data quality, BUT if people are less likely to give up information at certain stores, it seems wrong to penalize those stores. Also, different jurisdictions could have different rules about what you can collect. I have no idea where your stores are, but the context of the problem does matter. Would you still penalize stores for missing data in these cases?
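The two options can be compared directly. The sketch below scores each store's data completeness either by counting a missing field as 0% or by dropping it and rescaling the remaining weights. The `allowed` mask for fields a jurisdiction does not permit is a hypothetical input, and the weights are placeholders. The first row reuses the sample store from the question (85/90/82/80/70); the second row is made up.

```python
import numpy as np


def completeness_score(pct_present, weights, policy="zero", allowed=None):
    """Weighted data-completeness score per store, on a 0-100 scale.

    pct_present: (n_stores, n_fields) percentages, NaN where a store has no value.
    weights:     one weight per field, summing to 1 (placeholder values below).
    policy="zero":    a NaN field counts as 0%, which penalizes the store.
    policy="exclude": NaN fields, and fields the store is not allowed to collect,
                      are dropped and the remaining weights rescaled to sum to 1.
    allowed: optional boolean (n_stores, n_fields) mask; a hypothetical input
             marking which fields each store's jurisdiction permits.
    """
    pct = np.asarray(pct_present, dtype=float)
    w = np.broadcast_to(np.asarray(weights, dtype=float), pct.shape)
    if policy == "zero":
        return (np.nan_to_num(pct, nan=0.0) * w).sum(axis=1)
    usable = ~np.isnan(pct)
    if allowed is not None:
        usable &= np.asarray(allowed, dtype=bool)
    w_used = np.where(usable, w, 0.0)
    # A store with no usable fields at all would divide by zero and get NaN.
    return (np.where(usable, pct, 0.0) * w_used).sum(axis=1) / w_used.sum(axis=1)


# Columns: mobile, email, date of birth, gender, communication preference.
pct = np.array([[85, 90, 82, 80, 70],
                [95, np.nan, 90, 88, 75]])
weights = [0.30, 0.30, 0.15, 0.15, 0.10]

zero_scores = completeness_score(pct, weights, policy="zero")
excl_scores = completeness_score(pct, weights, policy="exclude")
print(zero_scores, excl_scores)
```

For the complete sample store both policies give the same score; they only diverge for stores with gaps, which is where the judgment call about penalizing comes in.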

After linear regression, I would probably try PROC PLS (partial least squares) next, but it's a bit more complex, so if it doesn't deliver better accuracy I'd pick the simpler model.
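In Python, a minimal NumPy sketch of single-response partial least squares (the NIPALS PLS1 algorithm) could look like the following. It is a stand-in for PROC PLS, not a port of it; scikit-learn's `PLSRegression` and R's `pls` package are maintained implementations. The data are random and the 100/50 train/holdout split is an assumption; comparing holdout RMSE against plain OLS mirrors the advice to keep the simpler model unless PLS is clearly more accurate.

```python
import numpy as np


def pls1(X, y, n_components):
    """Single-response partial least squares via the NIPALS (PLS1) algorithm.

    Returns (coef, intercept) so that predictions are X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)    # direction of maximum covariance with y
        t = Xk @ w                # component scores
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        q = (yk @ t) / tt         # y loading
        Xk = Xk - np.outer(t, p)  # remove this component from X
        yk = yk - q * t           # and from y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return coef, y_mean - x_mean @ coef


def ols(X, y):
    """Ordinary least squares with an intercept, for comparison."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1:], beta[0]


def rmse(X, y, coef, intercept):
    return float(np.sqrt(np.mean((X @ coef + intercept - y) ** 2)))


# Random stand-in data: 150 stores, 6 predictors, fit on 100 and test on 50.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 6))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=150)
train, test = slice(0, 100), slice(100, 150)

ols_coef, ols_int = ols(X[train], y[train])
pls_coef, pls_int = pls1(X[train], y[train], n_components=2)
print("OLS holdout RMSE:", rmse(X[test], y[test], ols_coef, ols_int))
print("PLS holdout RMSE:", rmse(X[test], y[test], pls_coef, pls_int))
```

With `n_components` equal to the number of predictors, PLS1 reproduces the OLS coefficients, which makes a handy sanity check.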

@Picanion wrote:

I tried factor analysis, but it's not giving any conclusive results. Can you elaborate on how you meant to use it?

I used enrollment and the other variables in absolute terms.

Fluorite | Level 6

## How to create a top 10 leaderboard for store or merchant performance

Hi All,

I have a dataset of store IDs and each store's performance on customer enrollment.

I am running a reward program for around 150 stores to encourage customer enrollment with proper customer data.

Stores are encouraged to enroll customers in the loyalty program, but many stores didn't collect all the required details, such as mobile number, email ID, communication preference, gender, and date of birth.

A sample data row looks like this:

- Store ID: 0001
- Count of enrollments: 1200
- % of customers with mobile number present: 85%
- % with email present: 90%
- % with date of birth mentioned: 82%
- % with gender mentioned: 80%
- % with communication preference mentioned: 70%

Total record count (number of stores): 150

So we have to decide what weight to give each variable and then come up with solid logic to rank the stores. The final goal is a top 10 and bottom 10 leaderboard.

What analysis, logic, or statistical tool should I use?

(Please suggest or point me to code or logic that I can implement in Python or RStudio.)