I'm trying to analyze political effects in US Presidential elections.
Red States = Republican wins by >= 5%
Blue States = Democrat wins by >= 5%
Battleground States = in between
Hypotheses:
H1 null (H0): Red vs Battleground is more significant than Blue vs Battleground in affecting Y
H1 alternative (HA): Blue vs Battleground is more significant than Red vs Battleground in affecting Y
H2 null (H0): Political extremism in a state (i.e. EITHER Red or Blue) is more significant than Battleground in affecting Y
H2 alternative (HA): Battleground is more significant than political extremism in a state (i.e. EITHER Red or Blue) in affecting Y
How would I set up dummy or categorical explanatory variables for this type of model? It seems perplexing.
I've thought of 2 possible solutions, but they both seem wrong:
a) I have a dummy variable RED (=1 if state is Red State; =0 if Blue or Battleground),
I have a dummy variable BATTLE (=1 if state is Battleground; =0 if not Battleground).
b) I have a categorical variable RED (=1 if state is Red State; =0 if Battleground; =-1 if Blue State),
I have a dummy variable BATTLE (=1 if state is Battleground; =0 if not Battleground).
What if I instead have variables like these:
c) I have a dummy variable RED (=1 if state is Red State; =0 if Blue or Battleground),
I have a dummy variable REDBLUE (=1 if state is Red or Blue; =0 if IS Battleground).
Then the following combinations of RED,REDBLUE would mean the following:
1,1 Red State
0,1 Blue State
0,0 Battleground
Comparing the significances of the RED coefficient (vs 0) helps us resolve Hypothesis H1;
and of the REDBLUE coefficient, Hypothesis H2.
Am I making the correct conclusion here?
How would I set up dummy or categorical explanatory variables for this type of model? It seems perplexing.
No need to create dummy variables at all. Most times a categorical variable works well. Exactly how you set it up probably doesn't matter, as long as you have three values of the categorical variable. Then just about any PROC in SAS that you choose to do the analysis will be able to work with this categorical variable.
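As a rough sketch of what "one categorical variable with three values" could look like: the data set name ELECTIONS and the variable REP_MARGIN (Republican minus Democratic vote share, in percentage points) are made-up names for illustration, not something from this thread, and reading the 5% threshold as a margin in percentage points is also an assumption.

```sas
/* Hypothetical names: ELECTIONS and REP_MARGIN are assumptions for illustration */
data elections_cat;
    set elections;
    length state_type $6;
    /* test for missing first: a missing numeric sorts below every number in SAS */
    if missing(rep_margin)   then state_type = ' ';
    else if rep_margin >=  5 then state_type = 'Red';     /* Republican wins by >= 5 points */
    else if rep_margin <= -5 then state_type = 'Blue';    /* Democrat wins by >= 5 points   */
    else                          state_type = 'Battle';  /* in between                     */
run;
```

The character values are only labels, so no spacing or ordering between the three groups is implied.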
But I need to have 2 variables to test the 2 hypotheses right? And if one of them is a categorical variable, it will create a constrained effect that will equate the effects of redness and blueness that would be incorrect, right?
@jjsingh04 wrote:
But I need to have 2 variables to test the 2 hypotheses right? And if one of them is a categorical variable, it will create a constrained effect that will equate the effects of redness and blueness that would be incorrect, right?
No. You create one categorical variable, and then you can ask SAS to do both comparisons. For example, if Y is continuous you would use PROC GLM, and you can do both comparisons using two CONTRAST or two ESTIMATE statements. Similarly, if Y is categorical too, you could do the same comparisons in PROC LOGISTIC or PROC GENMOD.
Simple examples in PROC GLM using the ESTIMATE statement: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_glm_syntax07.htm
Y is a dummy actually (1,0) so I do logistic. So you're saying 2 separate logistic regressions, 1 with each variable?
I've read in several places online about the problem of constrained effects with categorical variables too--the implicit assumption of equal spacing and equal magnitude being implied: https://stats.stackexchange.com/questions/278837/numerical-coding-and-constraints-for-categorical-va...
I want to avoid all that 🙂
@jjsingh04 wrote:
I've read in several places online about the problem of constrained effects with categorical variables too--the implicit assumption of equal spacing and equal magnitude being implied: https://stats.stackexchange.com/questions/278837/numerical-coding-and-constraints-for-categorical-va...
I want to avoid all that 🙂
That doesn't apply here. I am not suggesting you use numerical coding at all. What I was talking about was not a "constraint", anyway.
Y is a dummy actually (1,0) so I do logistic. So you're saying 2 separate logistic regressions, 1 with each variable?
I don't see anywhere that I say to do 2 separate logistic regressions, 1 with each variable, which wouldn't make any sense. I said "you can do both comparisons using two CONTRAST or two ESTIMATE statements". You can have multiple CONTRAST or multiple ESTIMATE statements in one regression.
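A minimal sketch of one logistic regression carrying two CONTRAST statements might look like the following. The names ELECTIONS_CAT, STATE_TYPE and Y are hypothetical, and the coefficient order assumes the class levels are the character values 'Battle', 'Blue', 'Red', which sort alphabetically by default; check the Class Level Information table in the output before trusting the order.

```sas
/* Sketch only: ELECTIONS_CAT, STATE_TYPE and Y are hypothetical names */
proc logistic data=elections_cat;
    /* PARAM=GLM gives one parameter per level, so each contrast takes
       three coefficients, in the order Battle, Blue, Red (assumed) */
    class state_type / param=glm;
    model y(event='1') = state_type;
    contrast 'Red vs Battle'  state_type -1  0  1 / estimate=parm;
    contrast 'Blue vs Battle' state_type -1  1  0 / estimate=parm;
run;
```

Both contrasts come from the same fitted model, and the estimates are differences on the log-odds scale.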
I think I now understand. Is this example illustrative of what you mean?:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm
This example contrasts the 3-level categorical variable Treatment: A vs P, B vs P, and A vs B, which in my case is analogous to Red vs Battle, Blue vs Battle, and Red vs Blue.
But I want to also test a hypothesis of (Either Red or Blue) vs (Battle), which I'm not sure how I would set up.
@jjsingh04 wrote:
I think I now understand. Is this example illustrative of what you mean?:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm
This example contrasts the 3-level categorical variable Treatment: A vs P, B vs P, and A vs B, which in my case is analogous to Red vs Battle, Blue vs Battle, and Red vs Blue.
But I want to also test a hypothesis of (Either Red or Blue) vs (Battle), which I'm not sure how I would set up.
Yes, that is it except it uses CONTRAST instead of ESTIMATE, which is fine.
What math would you use to test the hypothesis of "(Either Red or Blue) vs (Battle)"? Don't explain with SAS code, explain with words and simple formulas.
Well, I had been thinking that if there was a separate dummy variable called BATTLE, =1 for Battleground States, and =0 for Red or Blue States, then the significance of that dummy variable would be measure enough to test that hypothesis, no?
And if so, would that go in the same logistic regression as what we were just discussing, or in a separate one?
And what if you didn't use dummy variables at all (because you don't need them)? You just had three categories, Red Blue and Battle? How do you compare Battle to Red and Blue?
Hint: there's an actual SAS example in the link I gave earlier, comparing one category to two other categories.
I'm also trying to get you to stop thinking about dummy variables. They're not helpful here, and not helpful in most cases in SAS. Yes, you learn about them in your university training, but SAS makes them unnecessary in most cases: it computes the dummy variables behind the scenes, so you can think about the categories and the comparisons you want to make between them, and avoid thinking about dummy variables.
Do you mean A1+A2 vs A3 in the discussion of Divisor? Does that work with CONTRAST as well?:
DIVISOR=number
specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integer numerators. For example, you can use
estimate '1/3(A1+A2) - 2/3A3' a 1 1 -2 / divisor=3;
instead of
estimate '1/3(A1+A2) - 2/3A3' a 0.33333 0.33333 -0.66667;
Yes that's it
(1/2) * A1 + (1/2) * A2 - A3 is the comparison, where A1 A2 and A3 are the means for each group.
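estimate 'Mean(A1,A2) - A3' a 1 1 -2 / divisor=2;
which actually doesn't even need the divisor part: dropping it only rescales the estimate by a constant, and the test of whether the comparison is zero is unchanged.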
Also, you can do
(1/2) * A1 + (1/2) * A3 - A2 which compares the means of A1 and A3 to A2, and there's one more comparison that can be done like this.
You can use the CONTRAST statement similarly, that would work fine for this problem.
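For the three-group case in this thread, the comparison (1/2) * Red + (1/2) * Blue - Battle could be sketched in PROC LOGISTIC roughly as follows. This assumes hypothetical names ELECTIONS_CAT, STATE_TYPE and Y, class levels that sort as Battle, Blue, Red, and a SAS/STAT release in which PROC LOGISTIC supports the ESTIMATE statement.

```sas
/* Sketch only: ELECTIONS_CAT, STATE_TYPE and Y are hypothetical names */
proc logistic data=elections_cat;
    class state_type / param=glm;   /* assumed level order: Battle, Blue, Red */
    model y(event='1') = state_type;
    /* (Red + Blue) - 2*Battle: same test as the averaged comparison */
    contrast 'Red or Blue vs Battle' state_type -2 1 1 / estimate=parm;
    /* DIVISOR=2 puts the estimate on the scale (Red + Blue)/2 - Battle */
    estimate 'Mean(Red,Blue) - Battle' state_type -2 1 1 / divisor=2;
run;
```

Since Y is binary, the quantities being compared are log odds rather than group means.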
In my case, which of these would refer to Red States, Blue States, and Battleground States, respectively?
Is it A1, A2, A3 or A1, A3, A2?
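On the ordering, a hedged sketch: A1, A2, A3 in the documentation are simply the levels of A in the order the procedure sorts them, which by default is by formatted value. Assuming a hypothetical character variable STATE_TYPE holding 'Battle', 'Blue', 'Red' (with ELECTIONS_CAT and Y also made-up names), the levels would come out as Battle, Blue, Red, and the Class Level Information table in the output confirms the actual order.

```sas
/* Sketch only: ELECTIONS_CAT, STATE_TYPE and Y are hypothetical names */
proc logistic data=elections_cat;
    /* ORDER=FORMATTED is the default: levels are sorted by formatted value,
       so 'Battle', 'Blue', 'Red' appear in that order */
    class state_type / param=glm order=formatted;
    model y(event='1') = state_type;
    /* coefficients line up as:       Battle Blue Red */
    contrast 'Red vs Blue' state_type   0     -1   1;
run;
```

If the variable were coded differently (for example numerically, or with other labels), the order and therefore the coefficient positions would change.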
