Programming the statistical procedures from SAS

Data Normalization

Contributor
Posts: 52

I am having trouble fitting a regression coefficient. Here is a simplified example of the data:

Customer   Demand   Price
A          50       3.3
A          60       3.1
A          40       3.4
B          300      2.6
B          250      2.7
B          350      2.35

The goal is to estimate the coefficient A, which describes how demand responds to price changes. That is, Demand = C + A*Price + Error.

My point is that the raw data cannot be used directly to fit A, because customers differ greatly in their base level of demand and in the pricing strategy applied to them. These two sources of variation need to be removed by NORMALIZATION before running the regression. Normalization here means rescaling each customer's demand and price to a common level. After normalization, the data looks like this:

Customer   Normalized Demand        Normalized Price
A          50*(175/50)   = 175      2.94
A          60*(175/50)   = 210      2.76
A          40*(175/50)   = 140      3.03
B          300*(175/300) = 175      2.97
B          250*(175/300) = 146      3.08
B          350*(175/300) = 204      2.68

In the table, 175 is the overall demand average.

The purpose of the normalization is to remove the variation that comes from differences in base demand and in pricing across customers, while preserving each customer's sensitivity of demand to price and putting all customers on the same level. We can then fit parameter A on the normalized data.
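For reference, the scaling in the table can be sketched in SAS as follows. This is only an illustration: the data set name `sales` and the variable names are hypothetical, and it assumes each value is multiplied by (overall mean / customer mean). That rule is stated above only for demand; the posted price column agrees with it to rounding, but that is an inference.

```sas
/* Hypothetical data set; values are taken from the example table */
data sales;
  input customer $ demand price;
  datalines;
A 50 3.3
A 60 3.1
A 40 3.4
B 300 2.6
B 250 2.7
B 350 2.35
;
run;

/* Assumed rule: multiply each value by (overall mean / customer mean) */
proc sql;
  create table normalized as
  select s.customer,
         s.demand * (o.avg_demand / c.avg_demand) as norm_demand,
         s.price  * (o.avg_price  / c.avg_price)  as norm_price
  from sales as s,
       (select customer,
               mean(demand) as avg_demand,
               mean(price)  as avg_price
        from sales
        group by customer) as c,
       (select mean(demand) as avg_demand,
               mean(price)  as avg_price
        from sales) as o
  where s.customer = c.customer;
quit;
```

Note that in this example the customer means of demand happen to be 50 and 300, so the factors 175/50 and 175/300 in the table are consistent with that reading.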

My question: is this method statistically valid? If not, is there an established statistical method for handling this kind of data? I appreciate your help.

Respected Advisor
Posts: 2,655

Re: Data Normalization

This obliterates any pre-existing differences in price due to customer group.  Why not fit separate slopes for each group, e.g.

Demand = intercept + beta1 * price * (indicator for customer group) + error.

This is pretty commonly done in PROC GLM.
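A minimal sketch of that model, assuming a data set named `sales` with variables `customer`, `demand`, and `price` (all hypothetical names; the values are the ones posted above):

```sas
/* Hypothetical data set; values are taken from the posted example */
data sales;
  input customer $ demand price;
  datalines;
A 50 3.3
A 60 3.1
A 40 3.4
B 300 2.6
B 250 2.7
B 350 2.35
;
run;

proc glm data=sales;
  class customer;
  /* price*customer fits one price slope per customer,   */
  /* with a single common intercept, as in the formula   */
  model demand = price*customer / solution;
run;
quit;
```

Adding `customer` to the MODEL statement as a main effect would also allow the intercept to differ by customer.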

Steve Denham

Contributor
Posts: 52

Re: Data Normalization

We assume that all customers share the same price sensitivity; that is, A is the same for all customers.

Respected Advisor
Posts: 2,655

Re: Data Normalization

Then what is the meaning of 'A' and 'B' in the customers column?  If all customers have the characteristic A, I can't readily see any need to adjust.

Steve Denham

Contributor
Posts: 52

Re: Data Normalization

Sorry for presenting the table badly. Coefficient A is not the same thing as customer A in the table.  The A and B in the table should read Customer_A and Customer_B.

Respected Advisor
Posts: 2,655

Re: Data Normalization

So I assume that you have many customers, and you want a common estimate of the slope.  That sounds like a random slopes regression to me.  Something like:

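proc mixed data=yourdata;
  class customer;
  /* fixed effects: common intercept and common price slope */
  model demand = price / solution;
  /* customer-level random deviations in intercept and slope */
  random intercept price / subject=customer;
run;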

Steve Denham

Trusted Advisor
Posts: 1,409

Re: Data Normalization

"Do you think whether this method is statistically valid?"

I agree with Steve, and if you wanted to do this in PROC GLM, you could force the slopes to be the same for each group (or not); it's up to you.

But the question isn't "is this method statistically valid?" To use this method, you would have to be able to justify it based on subject matter reasoning and knowledge (not statistical reasoning). You need to talk to the subject matter experts to see if they think this is reasonable.

Contributor
Posts: 52

Re: Data Normalization

In PROC GLM, can we set the intercept to be different for different groups?

Another approach I have been considering is to add a dummy variable for each group.  Can dummy variables account for the base differences between groups?

Respected Advisor
Posts: 2,655

Re: Data Normalization

In GLM, putting the group variable in the CLASS statement and including it in the MODEL statement fits a different intercept for each level of the group variable.  Use the NOINT option to get these intercept estimates directly for each group, rather than as an intercept for a reference level plus deviations from it for the other levels.  Including the group variable in the CLASS statement automatically codes dummy variables for the different groups, so there is no need to add constructed dummy variables.
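As a rough sketch of that setup, assuming a data set named `sales` with variables `customer`, `demand`, and `price` (hypothetical names; the values are the ones from the original post):

```sas
/* Hypothetical data set; values are taken from the posted example */
data sales;
  input customer $ demand price;
  datalines;
A 50 3.3
A 60 3.1
A 40 3.4
B 300 2.6
B 250 2.7
B 350 2.35
;
run;

proc glm data=sales;
  class customer;
  /* customer: one intercept per customer (NOINT drops the overall intercept) */
  /* price:    a single slope shared by all customers                         */
  model demand = customer price / noint solution;
run;
quit;
```

The estimate reported for `price` is the common slope (A in the original notation), and the estimates for the levels of `customer` are the customer-specific intercepts.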

Steve Denham
