topic Re: Structuring data for an Alternative Specific Multinomial Regression in Statistical Procedures

Structuring data for an Alternative Specific Multinomial Regression

Errant — Wed, 18 Jul 2018 13:31:43 GMT

Hello All,

I have a data set that looks like this

As you can see obs are on the far left, the following four columns represent whether or not a specific brand was purchased. The next four columns represent prices, the next eight columns indicate whether or not a display and feature was used for each brand.

I would like to do an alternative specific regression in this context to represent each brand. Would the best data structure in this case be to sort and organize observations by brand purchase? Essentially, should I lump all the observations together where the purchase of nabisco is 1, for example?

Thank you!

Re: Structuring data for an Alternative Specific Multinomial Regression

ballardw — Wed, 18 Jul 2018 15:38:42 GMT

If I understand your question I suspect that a complete restructure of the data set to look more like

<id variables> Brand Price Feature Display.

Then in many regressions use the Brand variable a Class variable or perhaps a By variable.

Re: Structuring data for an Alternative Specific Multinomial Regression

Errant — Wed, 18 Jul 2018 22:45:37 GMT

I'm a little confused by your comment, do you mean categorize by id variables? What do you mean by <id variables>?

Re: Structuring data for an Alternative Specific Multinomial Regression

ballardw — Fri, 20 Jul 2018 15:41:40 GMT

@Errant wrote:
I'm a little confused by your comment, do you mean categorize by id variables? What do you mean by <id variables>?

<id variables> is a generic place holder for the information other than the brand information variables, might be client, date, shipping label, survey respondent identification or anything that identifies a specific record in the data if any. In your viewtable I see OBS as the likely one, but you have more variables to the right and I don't know what they might be.

Trust me, you work with data long enough and you will learn the value of identifying records in some unique form.

So some of your data might look like (truncating some of the values as I don't feel like typing 10+ characters just to illustrate)

Obs Brand Price Feature Display

1 Private 0.7099 0 0

1 Sunshine 0.9800 0 0

1 Keebler 0.8799 0 (can't tell as the image is incomplete)

Re: Structuring data for an Alternative Specific Multinomial Regression

Errant — Sat, 21 Jul 2018 18:16:17 GMT

Thank you, that helps a lot. I'm trying to structure the data exactly the way you depicted it, with proc transpose and I'm running into some issues.

So the ideal dataset would look like this, virtually the same thing you depicted.

obs	OBS	Purchase	Brand	Feature	Display	Price
1	1	0	Private	0	0	0.709999979
2	1	0	Keebler	0	0	0.980000019
3	1	1	Nabisco	0	0	0.879999995
4	1	0	Sunshine	0	0	1.199999928

I've tried a few variations of proc transpose, but I haven't been able to mimic that exact structure. Can you tell what I'm missing?

proc transpose data=HW5.crackers_hw5;
   var DisplKeebler  DisplNabisco DisplPrivate DisplSunshine FeatKeebler FeatNabisco FeatPrivate FeatSunshine; 
   by obs KEEBLER NABISCO PRIVATE SUNSHINE;
run;

Re: Structuring data for an Alternative Specific Multinomial Regression

ballardw — Mon, 23 Jul 2018 18:04:48 GMT

Unless the variables follow very specific naming conventions it can be difficult to get proc transpose to realize that 1) you want to get two things from one variable its value and its name, 2) doing some other aligned item.

Are you sure the values you show for price are correct for the example picture data?

I would expect for OBS 1 that Keebler would have the value of PriceKeepler or 0.879999995 and Nabisco would have the value of PriceNabisco or 1.199999928.

If the values are as I suspect then use of arrays in a data step might be a better choice than going through Proc transpose:

data want;
   set have;
   array b PRIVATE      KEEBLER       NABISCO      SUNSHINE;
   array d DisplPrivate DisplKeebler  DisplNabisco DisplSunshine ;
   array f FeatPrivate  FeatKeebler   FeatNabisco  FeatSunshine;
   array p PricePrivate PriceKeebler  PriceNabisco PriceSunshine;
   length Brand $ 8 ;
   do i=1 to dim(b);
      Purchase = b[i];
      Brand=vname( b[i] );
      Display = d[i];
      Feature = f[i];
      Price   = pr[i];
      output;
   end;
   keep obs brand display feature price;
run;

Please note that the order of the variables in the array definitions is critical so that each item references the same product set of values.