Data Normalization

04-10-2015 01:13 PM

I have a problem fitting a coefficient parameter that I find challenging. Here is the data (a simple example):

Customers | Demand | Price |
---|---|---|
A | 50 | 3.3 |
A | 60 | 3.1 |
A | 40 | 3.4 |
B | 300 | 2.6 |
B | 250 | 2.7 |
B | 350 | 2.35 |

The goal is to fit the coefficient A, which describes how demand responds to price changes, in the model Demand = C + A*Price + Error.

My concern is that such data cannot be used directly to fit A, because customers differ greatly in their baseline demand and in the pricing strategies applied to them. These two sources of variation therefore need to be removed by NORMALIZATION before running the regression. Normalization here means putting the demand and price of all customers on the same level. After normalization, the data look like this:

Customers | Normalized Demand | Normalized Price |
---|---|---|
A | 50 × (175/50) = 175 | 2.94 |
A | 60 × (175/50) = 210 | 2.76 |
A | 40 × (175/50) = 140 | 3.03 |
B | 300 × (175/300) = 175 | 2.97 |
B | 250 × (175/300) ≈ 146 | 3.08 |
B | 350 × (175/300) ≈ 204 | 2.68 |

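In the table, 175 is the overall average demand, and 50 and 300 are the average demands of customers A and B. Prices are normalized the same way, by the ratio of the overall average price (about 2.91) to each customer's average price (about 3.27 for A and 2.55 for B).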
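The normalized columns can be reproduced by multiplying each demand by (overall mean demand / that customer's mean demand) and each price by (overall mean price / that customer's mean price). Below is a minimal SAS sketch under that reading; the dataset names `example` and `normalized`, the variable names, and the macro variables `all_demand` and `all_price` are assumptions, and the customer labels are written as Customer_A and Customer_B so they are not confused with coefficient A.

```sas
/* The six example rows from the post */
data example;
  input customer :$10. demand price;
  datalines;
Customer_A 50 3.3
Customer_A 60 3.1
Customer_A 40 3.4
Customer_B 300 2.6
Customer_B 250 2.7
Customer_B 350 2.35
;
run;

proc sql noprint;
  /* Overall means across all customers (175 for demand) */
  select mean(demand) format=best32., mean(price) format=best32.
    into :all_demand, :all_price
    from example;

  /* Scale each row by overall mean / customer mean.          */
  /* GROUP BY with non-summary columns makes SAS remerge the  */
  /* per-customer means back onto every row.                  */
  create table normalized as
    select customer, demand, price,
           demand * &all_demand / mean(demand) as norm_demand,
           price  * &all_price  / mean(price)  as norm_price
      from example
      group by customer;
quit;

proc print data=normalized;
  format norm_demand norm_price 8.2;
run;
```

Printed to two decimals, this gives the normalized values in the table (145.83 and 204.17 appear there rounded to 146 and 204). It also makes explicit that after scaling every customer has a mean demand of 175 and a mean price of about 2.91, so all between-customer differences in level are removed.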

The purpose of normalization is to remove the variation caused by differences in customers' baseline demand and in their prices. At the same time, it preserves each customer's sensitivity of demand to price while putting all customers on the same level. We can then fit parameter A on the normalized data.

My question is: do you think this method is statistically valid? If not, do you know of an existing statistical method for handling this kind of data? I appreciate your help.

04-10-2015 01:48 PM

This obliterates any pre-existing differences in price due to customer group. Why not fit separate slopes for each group, e.g.

Demand = intercept + beta1 * price * (indicator for customer group) + error.

This is pretty commonly done in PROC GLM.

Steve Denham
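A sketch of the separate-slopes model above in PROC GLM, fitted to the example data from the first post. The dataset name `example`, the output dataset `sep_slopes`, and the variable names are assumptions. With CUSTOMER in the CLASS statement, the term `customer*price` gives each customer its own price slope while keeping a single intercept, as in the formula.

```sas
data example;
  input customer :$10. demand price;
  datalines;
Customer_A 50 3.3
Customer_A 60 3.1
Customer_A 40 3.4
Customer_B 300 2.6
Customer_B 250 2.7
Customer_B 350 2.35
;
run;

proc glm data=example;
  class customer;
  /* One common intercept; one price slope per customer */
  model demand = customer*price / solution;
  ods output ParameterEstimates=sep_slopes;
run;
quit;
```

Adding `customer` as a main effect (`model demand = customer customer*price;`) would give each customer its own intercept as well.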

04-10-2015 02:26 PM

We assume that all customers share the same price sensitivity; that is, A is the same for all customers.

04-10-2015 02:30 PM

Then what is the meaning of 'A' and 'B' in the Customers column? If all customers have characteristic A, I can't readily see any need for adjustment.

Steve Denham

04-10-2015 02:37 PM

Sorry for the confusing table. Coefficient A is different from customer A in the table; the A and B in the Customers column should be read as Customer_A and Customer_B.

04-10-2015 02:45 PM

So I assume that you have many customers, and you want a common estimate of the slope. That sounds like a random slopes regression to me. Something like:

    proc mixed data=yourdata;
      class customer;
      model demand=price;
      random intercept price / subject=customer;
    run;

Steve Denham

04-10-2015 02:47 PM

> Do you think whether this method is statistically valid?

I agree with Steve. If you wanted to do this in PROC GLM, you could force the slopes to be the same for each group or let them differ; it's up to you.

But the question isn't "Is this method statistically valid?" To use this method, you would have to justify it with subject-matter reasoning and knowledge, not statistical reasoning. You need to talk to subject-matter experts to see whether they think it is reasonable.
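One way to let the data inform the choice between common and separate slopes, sketched in PROC GLM with the example data from the first post (the dataset name `example`, the output dataset `slope_test`, and the variable names are assumptions): fit customer-specific intercepts and slopes, then look at the Type III F test for `customer*price`, which tests whether the price slopes are equal across customers. Dropping that term forces a common slope. With only three observations per customer, as here, the test has very little power, so this illustrates the syntax rather than settling the question for the example.

```sas
data example;
  input customer :$10. demand price;
  datalines;
Customer_A 50 3.3
Customer_A 60 3.1
Customer_A 40 3.4
Customer_B 300 2.6
Customer_B 250 2.7
Customer_B 350 2.35
;
run;

proc glm data=example;
  class customer;
  /* Separate intercepts (customer) and separate slopes (customer*price); */
  /* the Type III test for customer*price tests equality of the slopes.  */
  model demand = customer price customer*price / ss3;
  ods output ModelANOVA=slope_test;
run;
quit;
```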

04-10-2015 05:48 PM

In PROC GLM, can we set the intercept to be different for different groups?

Another approach I have been considering is to add dummy variables for the groups. Can dummy variables account for the baseline differences between groups?

04-13-2015 09:10 AM

In GLM, putting the group variable in the CLASS statement (and listing it in the MODEL statement) fits a different intercept for each level of the group variable. Use the NOINT option to get these intercepts directly for each group, rather than as deviations from a reference intercept (in GLM's default parameterization, the intercept of the last group level). Including the group variable in the CLASS statement automatically codes dummy variables for the groups, so there is no need to add constructed dummy variables.

Steve Denham
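A sketch of the model described above, fitted to the example data from the first post; the dataset name `example`, the output dataset `common_slope`, and the variable names are assumptions. CUSTOMER in the CLASS and MODEL statements gives each customer its own intercept, PRICE enters once so that all customers share a single slope (the coefficient A the original poster is after), and NOINT reports each customer's intercept directly.

```sas
data example;
  input customer :$10. demand price;
  datalines;
Customer_A 50 3.3
Customer_A 60 3.1
Customer_A 40 3.4
Customer_B 300 2.6
Customer_B 250 2.7
Customer_B 350 2.35
;
run;

proc glm data=example;
  class customer;
  /* Customer-specific intercepts, one common price slope */
  model demand = customer price / noint solution;
  ods output ParameterEstimates=common_slope;
run;
quit;
```

For these six rows the common slope equals the pooled within-customer slope, (-3 - 17.5) / (0.0467 + 0.065), or about -183.6 demand units per unit of price.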