Hi,
I've gone through a few articles and haven't found proper information on how to interpret the chi-square results in SAS EM.
Here are my results
Can someone please give a detailed explanation of how this would be analyzed? I have the gist of it but would like to understand it better.
Can you briefly describe the problem, and what you did to get this far?
Paige Miller
This is just the initial data exploration. I simply imported my file and connected the StatExplore node to the original dataset (https://github.com/washingtonpost/data-police-shootings).
I'm using StatExplore to check for missing values and to look at the chi-square statistics, to see the relationship between each input variable and the target variable. The graphs above were generated as a result.
It is telling you which variables are "good predictors" of the target variable.
If the PROB is < 0.05, then the predictor is statistically significant (in other words, it has some predictive ability). The bigger the Chi-Square value, the better the predictive value of that variable.
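To make the statistic itself concrete, here is a minimal sketch of a Pearson chi-square computed by hand in Python. This is not SAS EM code and not necessarily how StatExplore computes its output; the function name `chi_square` and the counts in `table` are made up for illustration.

```python
def chi_square(table):
    """Return (statistic, df) for a Pearson chi-square test of independence.

    `table` is a list of rows of observed counts: one row per level of the
    input variable, one column per level of the target variable.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if input and target were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: a 2-level input (rows) against a 2-level target (columns)
table = [[30, 10],
         [20, 40]]
statistic, df = chi_square(table)
print(f"chi-square = {statistic:.4f}, df = {df}")
```

For these made-up counts the statistic is well above 3.841, the 5% critical value for 1 degree of freedom, so the corresponding PROB would be below 0.05.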
Paige Miller
For DF, I can see that some of my variables show a large value; for example, city shows 16500. For reference, the race target has 7 levels at the moment, so is this being taken into account for DF?
When I get the chi-square for another target variable that is binary, the DF is significantly smaller.
Since I don't have your data, I don't know. Maybe you should LOOK AT the data with your own eyes and see if you can see a reason why data set 1 has 300 df for variable STATE, while data set 2 has 50 df for variable STATE.
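One thing that can be checked without the data is the arithmetic. For a standard Pearson chi-square on a two-way table, df = (levels of input - 1) × (levels of target - 1). The sketch below assumes StatExplore reports that usual df, which the thread does not confirm; the helper names and the figure of 51 levels for STATE are hypothetical, and missing values may or may not be counted as a level.

```python
def chi_square_df(input_levels, target_levels):
    """Degrees of freedom for a Pearson chi-square on a two-way table."""
    return (input_levels - 1) * (target_levels - 1)


def implied_input_levels(df, target_levels):
    """Work backwards from a reported df to the number of input levels."""
    return df // (target_levels - 1) + 1


state_levels = 51  # hypothetical: 50 states plus DC

df_seven_level_target = chi_square_df(state_levels, 7)   # 50 * 6
df_binary_target = chi_square_df(state_levels, 2)        # 50 * 1
city_levels = implied_input_levels(16500, 7)             # 16500 / 6 + 1

print(df_seven_level_target, df_binary_target, city_levels)
```

Under these assumptions, the same STATE variable gives 300 df against a 7-level target and 50 df against a binary target, and a df of 16500 for city against a 7-level target would correspond to 2751 distinct city values.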
Paige Miller