- Assigning weights to variables to calculate rank/s...

10-20-2016 10:38 PM

I have data on customer purchase history, and I want to score each customer based on their attributes. To do this, I want to assign weights to the variables (e.g., 10% to v1, 20% to v2, 50% to v3, etc.) and then sum the weighted values. The resulting score should tell me how good a customer is. For instance, a score above 500 would mean a good/loyal customer from whom we can expect good sales next time. The threshold can be decided once we have the scores; what I want to know is how to approach the problem.

I decided to run PCA and use the coefficients of the principal components as weights.

For example, if I select the first principal component and take its coefficients,

y1 = 0.5*v1 + 0.8*v2 - 0.2*v3,

then replacing v1, v2, and v3 with the attribute values gives a score for each observation.
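
As a rough illustration of that substitution step, here is a minimal Python sketch. The weights are the example coefficients from the post; the customer records and the names `PC1_WEIGHTS` and `linear_score` are made up for illustration and are not from any real dataset or library.

```python
# Example coefficients of the first principal component, as quoted in the post.
PC1_WEIGHTS = {"v1": 0.5, "v2": 0.8, "v3": -0.2}


def linear_score(row, weights):
    """Return the weighted sum of the attributes named in `weights`."""
    return sum(w * row[name] for name, w in weights.items())


# Hypothetical customers, for illustration only.
customers = [
    {"id": "A", "v1": 2.0, "v2": 1.0, "v3": 5.0},
    {"id": "B", "v1": 0.0, "v2": 3.0, "v3": 1.0},
]

# One score per customer; higher means "better" only if the weights were
# chosen so that larger values are desirable.
scores = {c["id"]: linear_score(c, PC1_WEIGHTS) for c in customers}
ranked_ids = sorted(scores, key=scores.get, reverse=True)
```

Any threshold (such as the 500 mentioned above) would be applied to `scores` afterwards and depends on how the attributes are scaled.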

I am not sure if this is a clever approach. Is there a better way to optimize the weights and calculate the score of each customer? Any thoughts are appreciated.

10-20-2016 11:31 PM

That is a clever approach.

But PCA applies only to continuous variables.

You also left out the second principal component, which may account for a large share of the variance in the data.

You could include the first two or three principal components.

Suppose the first PC accounts for 60% of the variance:

y1 = 0.5*v1 + 0.8*v2 - 0.2*v3

Suppose the second PC accounts for 40% (its coefficients would differ from the first PC's in practice; the same ones are reused here only as a placeholder):

y2 = 0.5*v1 + 0.8*v2 - 0.2*v3

Then the final score could be: Y = 0.6*y1 + 0.4*y2 ?
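
The variance-weighted combination suggested here can be sketched in plain Python. This is only an illustration under stated assumptions: the data are made up, the helper names (`standardize`, `correlation_matrix`, `top_components`, `component_scores`) are hypothetical, and the eigenvectors are found by simple power iteration with deflation rather than by a statistics package. In SAS you would get the same quantities from a PCA procedure instead.

```python
import math


def standardize(col):
    """Z-score one column: subtract the mean, divide by the sample std dev."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
    return [(x - mean) / sd for x in col]


def correlation_matrix(z_cols):
    """Correlation matrix of columns that are already standardized."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_cols]
            for ci in z_cols]


def top_components(matrix, k, iters=1000):
    """Return k (eigenvalue, unit eigenvector) pairs, largest first.

    Uses power iteration with deflation, which is adequate for a small
    symmetric matrix with well-separated leading eigenvalues.
    """
    p = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start
        for _step in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(p))
                    for i in range(p))
        pairs.append((value, vec))
        # Deflate: subtract the component just found before the next pass.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return pairs


# Made-up data: one list per variable, one entry per customer.
v1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
v2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
v3 = [3.0, 1.0, 2.0, 6.0, 4.0, 5.0]

z_cols = [standardize(c) for c in (v1, v2, v3)]
corr = correlation_matrix(z_cols)
(eig1, load1), (eig2, load2) = top_components(corr, k=2)

# Share of total variance per component; for a correlation matrix the
# total variance equals the number of variables.
share1 = eig1 / len(z_cols)
share2 = eig2 / len(z_cols)


def component_scores(loadings):
    """Score every customer on one component: sum of loading * z-value."""
    return [sum(w * col[i] for w, col in zip(loadings, z_cols))
            for i in range(len(z_cols[0]))]


y1 = component_scores(load1)
y2 = component_scores(load2)

# Composite score in the spirit of Y = 0.6*y1 + 0.4*y2, with the weights
# taken from the variance shares instead of being fixed by hand.
composite = [share1 * a + share2 * b for a, b in zip(y1, y2)]
```

Two cautions apply to any version of this idea. The sign of each principal component is arbitrary, so check that a higher component score really means a "better" customer before combining. And because the components are uncorrelated by construction, a weighted sum of them is a modeling choice, not something PCA itself prescribes.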

10-20-2016 11:53 PM

So this is an unsupervised learning problem?

You have no data to calibrate your model with?

10-21-2016 12:12 AM

Yes @Reeza. It is unsupervised.

10-20-2016 11:58 PM

Or you could use a log-linear model.

Check the documentation for PROC CATMOD:

Example 32.4: Log-Linear Model, Three Dependent Variables

Note: remove the non-significant variables before applying your model.

10-21-2016 12:31 AM

Look at PROC VARCLUS.

Also, make sure to standardize the variables. Otherwise, variables measured on larger scales dominate the result.
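
To see why standardizing matters, here is a small Python sketch with made-up numbers. The attribute names `annual_spend` and `store_visits` and the helper functions are hypothetical. It compares an equal-weight sum of raw values with the same sum after z-scoring.

```python
import statistics


def zscore(col):
    """Center a column on its mean and scale by its sample std deviation."""
    mean = statistics.mean(col)
    sd = statistics.stdev(col)
    return [(x - mean) / sd for x in col]


def ranking(scores):
    """Customer indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up attributes on very different scales.
annual_spend = [1200.0, 300.0, 800.0, 500.0]  # dollars
store_visits = [2.0, 30.0, 5.0, 25.0]         # visit counts

# Equal-weight totals, before and after standardizing.
raw_total = [s + v for s, v in zip(annual_spend, store_visits)]
z_total = [s + v for s, v in zip(zscore(annual_spend), zscore(store_visits))]

raw_order = ranking(raw_total)       # driven almost entirely by spend
spend_order = ranking(annual_spend)
z_order = ranking(z_total)           # visits now influence the ranking
```

With these numbers, the raw ranking is identical to ranking by spend alone, while the standardized ranking changes once visits are allowed to contribute on an equal footing.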

10-21-2016 03:22 AM - edited 10-21-2016 03:45 AM

@Reeza,

Very good point. That makes a lot of sense.

Or you could look at a Poisson model, which can handle both categorical and continuous variables.

10-25-2016 03:17 PM

Thanks, @Ksharp.

It looks like a Poisson model is a supervised method. I don't have a target variable in my data that is related to the other variables.

I want each observation to get a score based on the weights of the variables, exactly as in your first answer:

"Suppose for the first PC, which accounts for 60%

y1 = 0.5*v1 + 0.8*v2 - 0.2*v3

Suppose for the second PC, which accounts for 40%

y2 = 0.5*v1 + 0.8*v2 - 0.2*v3

the final score may be: Y = 0.6*y1 + 0.4*y2"

But here, Y is the score for each observation, the X's are my variables, and the coefficients are the weights.

The X's are both categorical and continuous.

10-25-2016 10:06 PM

Did you take a look at the PRINQUAL procedure?

10-25-2016 03:01 PM

@Reeza,

Is there any way we could use proc varclus for all types of variables?

I looked at the documentation and it says it takes all the numerical values by default.

My dataset has both categorical and continuous variables. Also, some of the categorical variables are already coded as 1/0.
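
One possible workaround, offered only as an assumption and not as something the VARCLUS documentation recommends, is to extend the 1/0 coding to the remaining categorical variables so that a numeric-only procedure can read them. A minimal Python sketch of that dummy coding follows; the function name `dummy_code` and the `segment` data are hypothetical.

```python
def dummy_code(values, drop_first=True):
    """Turn a list of category labels into 0/1 indicator columns.

    Returns a dict mapping each retained level to its indicator column.
    With drop_first=True the first level (in sorted order) acts as the
    reference level and gets no column, avoiding a redundant indicator.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return {level: [1 if v == level else 0 for v in values] for level in kept}


# Hypothetical categorical attribute for five customers.
segment = ["gold", "silver", "gold", "bronze", "silver"]
indicators = dummy_code(segment)
```

Whether clustering or PCA on such indicators is appropriate is a modeling judgment. PROC PRINQUAL, mentioned elsewhere in this thread, is aimed at mixed qualitative and quantitative data directly.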

10-21-2016 03:35 AM

Or check this:

Overview: PRINQUAL Procedure

The PRINQUAL procedure performs principal component analysis (PCA) of qualitative, quantitative, or mixed data. PROC PRINQUAL is based on the work of Kruskal and Shepard (1974); Young, Takane, and