First off, my apologies if this is a frequently asked or poorly worded question. I have only recently started using SAS, so I'm still trying to grasp the basics. That being said, these are my problems:

Short Version:

1. In PROC REG, can you define a group of variables and only generate regressions using EXACTLY one variable from the group?
2. In PROC REG, can you define a group of variables and only generate regressions using AT MOST one variable from the group?
3. In PROC REG, can you define a group of variables and only generate regressions that use either ALL or NONE of the group?

Long Version:

My data set is sales data for a fictional retail store. My dependent variable is sales, and the data set began with hundreds of variables about each store location, including several describing consumer income near the store, several describing X trait (and Y trait, and Z trait, etc.) of local households, and a set of dummy variables that encodes the geographical region of the store (NE_dummy, SE_dummy, W_dummy, etc.). These variables have been pared down to 20-25 that either correlate significantly with sales or need to be included from a theory/logical perspective.

My current SAS code essentially boils down to this:

PROC REG;
model sales = var1 .... var20 / selection=CP start=6 stop=10 best=100;
run;

1. Since I am essentially hoping to generate a demand function, theory dictates that I include consumer income. However, income has a fairly weak correlation with sales, so just throwing all regressors into PROC REG and generating the 100 best models by Mallows' Cp ("blindly" generating regressions) yields no models with any income variable. I wish to force the model to use exactly one regressor from the group of consumer income variables. Is this possible? Or, if it's not possible for the group case, can you force SAS to only generate regressions that use a given variable (i.e. sales = avgincome_5miles + {any combination of other variables})?

2. Variables about consumer trait X (and Y, Z, etc.) are defined by radius around the given store: 1 radial mile, 5 radial miles, 10 radial miles. "Blindly" generating regressions often yields models with two or three variables from a single group. I want my model to include at most one regressor from X, at most one regressor from Y, and at most one regressor from Z. Is there any way to do this?

3. For the regional dummies (NE_dummy, SE_dummy, W_dummy, etc.), "blindly" generating regressions will yield models with only one or two of the dummies in the group. I want the model to include either ALL of them (excluding the base group, SW_dummy, of course) or NONE of them. Is there any way to do this?

Thank you for reading through my question.
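
For reference, here is a rough sketch of the "exactly one income variable" case as I picture it. As far as I understand the documentation, the INCLUDE=n option on the MODEL statement forces the first n regressors listed into every candidate model, so looping over the income variables and running one selection per variable might do what I want. Everything named here is an assumption for illustration: the data set stores, the macro one_income, the variables avgincome_1mile and avgincome_10miles, and the placeholder list other1-other19 are all hypothetical (only avgincome_5miles is a real variable name from my data).

```sas
/* Sketch only. Hypothetical names: stores, one_income,                */
/* avgincome_1mile, avgincome_10miles, other1-other19.                 */
%macro one_income(incvars);
  %local i inc;
  %let i = 1;
  %let inc = %scan(&incvars, &i, %str( ));
  %do %while (%length(&inc) > 0);
    title "Cp selection with &inc forced into every model";
    proc reg data=stores;
      /* The forced variable is listed first, and INCLUDE=1 keeps it   */
      /* in every model. The other income variables are left out of    */
      /* this run, so each candidate model contains exactly one        */
      /* income variable.                                              */
      model sales = &inc other1-other19
            / selection=cp include=1 start=6 stop=10 best=100;
    run;
    quit;
    /* Move on to the next income variable in the list. */
    %let i = %eval(&i + 1);
    %let inc = %scan(&incvars, &i, %str( ));
  %end;
%mend one_income;

%one_income(avgincome_1mile avgincome_5miles avgincome_10miles)
```

If this is a sensible direction, I assume the ALL-or-NONE case could be handled the same way with two runs: one with all regional dummies listed first and INCLUDE= set to their count, and one with the dummies omitted entirely. I have not worked out how START= and STOP= count the forced variables, or how best to compare Cp values across the separate runs.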