BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Errant
Obsidian | Level 7

Hello All,

 

I have a data set that looks like this cracker.jpg

As you can see obs are on the far left, the following four columns represent whether or not a specific brand was purchased. The next four columns represent prices, the next eight columns indicate whether or not a display and feature was used for each brand.  

 

I would like to do an alternative specific regression in this context to represent each brand. Would the best data structure in this case be to sort and organize observations by brand purchase? Essentially, should I lump all the observations together where the purchase of nabisco is 1, for example?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If I understand your question I suspect that a complete restructure of the data set to look more like

<id variables> Brand   Price Feature Display.

 

Then in many regressions use the Brand variable a Class variable or perhaps a By variable.

View solution in original post

5 REPLIES 5
ballardw
Super User

If I understand your question I suspect that a complete restructure of the data set to look more like

<id variables> Brand   Price Feature Display.

 

Then in many regressions use the Brand variable a Class variable or perhaps a By variable.

Errant
Obsidian | Level 7
I'm a little confused by your comment, do you mean categorize by id variables? What do you mean by <id variables>?
ballardw
Super User

@Errant wrote:
I'm a little confused by your comment, do you mean categorize by id variables? What do you mean by <id variables>?

<id variables> is a generic place holder for the information other than the brand information variables, might be client, date, shipping label, survey respondent identification  or anything that identifies a specific record in the data if any. In your viewtable I see OBS as the likely one, but you have more variables to the right and I don't know what they might be.

 

Trust me, you work with data long enough and you will learn the value of identifying records in some unique form.

So some of your data might look like (truncating some of the values as I don't feel like typing 10+ characters just to illustrate)

Obs    Brand      Price     Feature   Display

1        Private     0.7099  0              0

1        Sunshine 0.9800  0              0

1        Keebler    0.8799  0             (can't tell as the image is incomplete)

Errant
Obsidian | Level 7

Thank you, that helps a lot. I'm trying to structure the data exactly the way you depicted it, with proc transpose and I'm running into some issues.

 

So the ideal dataset would look like this, virtually the same thing you depicted. 

 

obsOBSPurchaseBrandFeatureDisplayPrice
110Private 000.709999979
210Keebler000.980000019
311Nabisco000.879999995
410Sunshine001.199999928

 

I've tried a few variations of proc transpose, but I haven't been able to mimic that exact structure. Can you tell what I'm missing?

 

proc transpose data=HW5.crackers_hw5;
   var DisplKeebler  DisplNabisco DisplPrivate DisplSunshine FeatKeebler FeatNabisco FeatPrivate FeatSunshine; 
   by obs KEEBLER NABISCO PRIVATE SUNSHINE;
run;

 

 

ballardw
Super User

Unless the variables follow very specific naming conventions it can be difficult to get proc transpose to realize that 1) you want to get two things from one variable its value and its name, 2) doing some other aligned item.

 

Are you sure the values you show for price are correct for the example picture data?

I would expect for OBS 1 that Keebler would have the value of PriceKeepler or 0.879999995 and Nabisco would have the value of PriceNabisco or 1.199999928.

If the values are as I suspect then use of arrays in a data step might be a better choice than going through Proc transpose:

 

data want;
   set have;
   array b PRIVATE      KEEBLER       NABISCO      SUNSHINE;
   array d DisplPrivate DisplKeebler  DisplNabisco DisplSunshine ;
   array f FeatPrivate  FeatKeebler   FeatNabisco  FeatSunshine;
   array p PricePrivate PriceKeebler  PriceNabisco PriceSunshine;
   length Brand $ 8 ;
   do i=1 to dim(b);
      Purchase = b[i];
      Brand=vname( b[i] );
      Display = d[i];
      Feature = f[i];
      Price   = pr[i];
      output;
   end;
   keep obs brand display feature price;
run;

Please note that the order of the variables in the array definitions is critical so that each item references the same product set of values.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1786 views
  • 1 like
  • 2 in conversation