04-16-2018 05:41 AM

I would like to perform a basic statistical test to establish whether certain customer segments are more price sensitive than others. For each customer segment (CustomerSegmentId) I have samples of how many units of one specific product were bought (NumberOfUnits) at each price (Price). The data structure is as follows:

CustomerSegmentId Price ProductId NumberOfUnits

Certain customer segments have far fewer observations than others, which makes the design unbalanced. As I understand it, this means that I should use PROC GLM rather than PROC ANOVA, with code along these lines:

```
proc glm data = SomeData;
    class CustomerSegmentId ProductId;
    model NumberOfUnits = Price CustomerSegmentId ProductId;
run;
quit;
```

I know that this community does not exist to answer statistical questions, but the only other site I am aware of, Cross Validated:

https://stats.stackexchange.com/

is not very responsive (please suggest other sites).

Is the above a good starting point? Also, how do I perform post hoc tests to determine whether, for example, CustomerSegmentId=1 is more price sensitive than CustomerSegmentId=2?
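To make the post hoc question concrete, here is an untested sketch of what I imagine such a comparison could look like. It assumes that "price sensitivity" can be read as the slope of NumberOfUnits on Price, and it adds a Price*CustomerSegmentId interaction (not in my code above) so that the slope can differ by segment. The ESTIMATE coefficients further assume that segments 1 and 2 are the first two levels of CustomerSegmentId in sort order; with other level orderings the coefficients would need to be rearranged.

```sas
/* Sketch only: the interaction term and the ESTIMATE statement are
   assumptions added for illustration, not part of the model above. */
proc glm data = SomeData;
    class CustomerSegmentId ProductId;
    /* Price*CustomerSegmentId gives each segment its own price slope */
    model NumberOfUnits = Price CustomerSegmentId ProductId
                          Price*CustomerSegmentId / solution;
    /* Difference in price slopes, segment 1 minus segment 2.
       Assumes 1 and 2 are the first two CLASS levels; coefficients
       for any remaining levels default to zero. */
    estimate 'Price slope: segment 1 vs 2'
             Price*CustomerSegmentId 1 -1;
run;
quit;
```

If this is a sensible reading, the Type III test for Price*CustomerSegmentId would address whether slopes differ across segments at all, and the ESTIMATE line would address the specific pair.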

I also had a look at choice-set approaches, which use, for example, logistic regression. Unfortunately, I only have observational data in this format:

| TargetProductId | ComparableProductId | TargetPriceProductPrice | ComparableProductPrice | CustomerSegmentId | TargetProductBought |
|---|---|---|---|---|---|
| 1 | 2 | 23 | 25 | 1 | 0 |
| 1 | 3 | 23 | 25.50 | 1 | 0 |
| 1 | 4 | 23 | 21 | 2 | 1 |

Here we look at one target product at a time and can establish whether the customer also viewed a comparable product. We know the price of the target product and of the comparable product. We also know whether the target product was bought by the customer, who belongs to a certain segment (TargetProductBought is binary).

Perhaps one could fit a logistic regression model to these product-pair data (with additional independent variables for customer segment, etc.)? I am aware of great publications by Warren F. Kuhfeld, e.g.:

https://support.sas.com/techsup/technote/mr2010f.pdf

but I am not sure whether my data described above could be used.
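As a rough illustration of the logistic regression idea, an untested sketch might look like the following. The data set name PairData and the derived variable PriceDiff are hypothetical names introduced here; the other variable names are the columns listed above. The sketch also treats every row as independent, which may not hold if several rows come from the same customer or purchase occasion.

```sas
/* Sketch only: PairData and PriceDiff are hypothetical names. */
data PairModel;
    set PairData;
    /* Positive when the target product is the more expensive one */
    PriceDiff = TargetPriceProductPrice - ComparableProductPrice;
run;

proc logistic data = PairModel;
    class CustomerSegmentId / param = glm;
    /* The interaction lets the effect of the price gap on the purchase
       probability differ by customer segment */
    model TargetProductBought(event = '1') =
          PriceDiff CustomerSegmentId PriceDiff*CustomerSegmentId;
run;
```

Whether a model like this is a valid substitute for a designed choice experiment is exactly the part I am unsure about.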

Any feedback would be very much appreciated. Thanks!