therock
Calcite | Level 5

Hi Everyone,

 

I am using PROC SURVEYREG on an unbalanced panel data set. Here is my model:

 

Y = X1 + X2 + X3 + C + X1*X2 + X1*X3 + X2*X3 + X1*X2*X3 (no intercept)

where:

X1 and X2 are dummy variables

X3 is a continuous variable

C are control variables

 

If I create percentile groups based on X3 (low, medium, and high) and want to estimate the interaction between X1, X2, and X3, how would I do that? More specifically, do I still include the continuous variable X3 and then add dummy variables for the percentile groups, such as LowX3, MedX3, and HighX3? What about the interaction terms?

 

Thanks for your suggestion,

The rock 

alexchien
Pyrite | Level 9

I think you should include X3 in addition to the percentile dummies. You can test whether X3 is still significant after including the percentile dummies. Interactions are also worth considering as predictors if the data show a different slope of X3 for each percentile group.

therock
Calcite | Level 5

So how would you actually model it?

Thanks!

alexchien
Pyrite | Level 9

You would need two dummy variables for the percentile groups: LowX3 and MedX3 (or whichever two you pick). The high percentile group is then the reference level, represented by LowX3 = MedX3 = 0.
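
A minimal sketch of how the two dummies could be built, assuming "percentile" here means terciles of X3. The data set names PANEL, PANEL_RANKED, and PANEL_DUMMIES and the variable X3_GRP are hypothetical. PROC RANK with GROUPS=3 assigns group 0, 1, or 2 from lowest to highest:

```sas
/* Assumed names: PANEL, PANEL_RANKED, PANEL_DUMMIES, X3_GRP are hypothetical */
proc rank data=panel groups=3 out=panel_ranked;
   var x3;
   ranks x3_grp;            /* 0 = low, 1 = medium, 2 = high tercile */
run;

data panel_dummies;
   set panel_ranked;
   /* Leave the dummies missing when X3 is missing, so those rows
      are not silently counted as the high (reference) group */
   if not missing(x3_grp) then do;
      LowX3 = (x3_grp = 0);
      MedX3 = (x3_grp = 1);
   end;
run;
```

The cut points in this sketch are computed over all pooled observations. If the groups should be formed within each period instead, sorting by the period variable and adding a BY statement to PROC RANK would be one way to do it.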

 

Original model

X1 + X2 + X3 + C + X1*X2 + X1*X3 + X2*X3 + X1*X2*X3

 

New model

X1 + X2 + X3 + C + X1*X2 + X1*X3 + X2*X3 + X1*X2*X3 

+ LowX3 + MedX3

+ X1*LowX3*X3 + X1*MedX3*X3 + X2*LowX3*X3 + X2*MedX3*X3

+ X1*X2*X3*LowX3 + X1*X2*X3*MedX3
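
As a rough sketch, the new model above might be passed to PROC SURVEYREG as follows. The names are assumptions: PANEL_DUMMIES is a hypothetical data set that already contains LowX3 and MedX3 as 0/1 variables, FIRM_ID is a hypothetical panel identifier used for clustering, and C1 stands in for the control variables.

```sas
/* Assumed names: PANEL_DUMMIES (input), FIRM_ID (panel unit), C1 (one control) */
proc surveyreg data=panel_dummies;
   cluster firm_id;         /* assumption: cluster on the panel unit */
   model y = x1 x2 x3 c1
             x1*x2 x1*x3 x2*x3 x1*x2*x3
             LowX3 MedX3
             x1*LowX3*x3 x1*MedX3*x3 x2*LowX3*x3 x2*MedX3*x3
             x1*x2*x3*LowX3 x1*x2*x3*MedX3
             / noint solution;
run;
```

No CLASS statement is used, so every variable enters as a numeric column and each crossed term is the plain product of its 0/1 dummies and X3.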

 

 

therock
Calcite | Level 5

Thanks for the quick reply. A question:

 

What would the difference be between the one you mentioned and this one:

 

New model

X1 + X2 + X3 + C + X1*X2 + LowX3 + MedX3

+ X1*LowX3 + X1*MedX3 + X2*LowX3 + X2*MedX3

+ X1*X2*LowX3 + X1*X2*MedX3

 

Which one is better?

 

Thanks so much!

alexchien
Pyrite | Level 9

The one I mentioned would allow different X3 slopes for different X3 percentile groups. Which one is better? I would start with the model you proposed and then add complexity, such as the extra terms in the model I mentioned, to see whether they add significant value in terms of goodness of fit or validation on a holdout data set. Typically, however, the simpler the model, the better.
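
To make the difference concrete, here is the X3 slope implied by each specification, holding the percentile dummies fixed. The coefficient symbols are notation introduced only for this illustration: $\beta$ for the terms of the original model and $\theta$ for the added dummy-by-X3 terms.

```latex
% Model with dummy-by-X3 interactions (the larger one)
\frac{\partial Y}{\partial X_3}
  = \beta_3 + \beta_{13} X_1 + \beta_{23} X_2 + \beta_{123} X_1 X_2
  + \mathrm{LowX3}\,\bigl(\theta_{1L} X_1 + \theta_{2L} X_2 + \theta_{12L} X_1 X_2\bigr)
  + \mathrm{MedX3}\,\bigl(\theta_{1M} X_1 + \theta_{2M} X_2 + \theta_{12M} X_1 X_2\bigr)

% Model with dummy-only interactions (the simpler one)
\frac{\partial Y}{\partial X_3} = \beta_3
```

In the simpler model the percentile groups shift the level of Y (differently for each X1, X2 combination) but the X3 slope is a single constant. In the larger model as written, the slope differs by percentile group only when X1 or X2 equals 1, because there is no stand-alone LowX3*X3 or MedX3*X3 term.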

Cheers  
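
A possible sketch of the holdout comparison, shown for the simpler model. It assumes a hypothetical data set PANEL_DUMMIES with a 0/1 flag HOLDOUT (ideally assigned by panel unit so that the same unit does not appear in both samples), a hypothetical identifier FIRM_ID, and C1 standing in for the controls. The fitted model is saved with STORE and scored on the holdout rows with PROC PLM:

```sas
/* Assumed names: PANEL_DUMMIES, HOLDOUT, FIRM_ID, C1 are hypothetical */
data train test;
   set panel_dummies;
   if holdout = 1 then output test;
   else output train;
run;

proc surveyreg data=train;
   cluster firm_id;
   model y = x1 x2 x3 c1 x1*x2 LowX3 MedX3
             x1*LowX3 x1*MedX3 x2*LowX3 x2*MedX3
             x1*x2*LowX3 x1*x2*MedX3 / noint;
   store work.fit_simple;   /* save the fitted model for scoring */
run;

proc plm restore=work.fit_simple;
   score data=test out=scored_simple predicted=pred;
run;

data scored_simple;
   set scored_simple;
   sq_err = (y - pred)**2;  /* squared prediction error on holdout rows */
run;

proc means data=scored_simple mean;
   var sq_err;              /* mean squared error on the holdout sample */
run;
```

Repeating the same steps with the larger model and comparing the two holdout mean squared errors gives one indication of whether the extra terms are worth keeping.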
