b_smsha
Obsidian | Level 7

Hi Everyone, 

Over the past few days I have gone through a lot of questions and answers here, and they helped me greatly. I would now like to ask a few more questions.

 

My target variables are Race and Mental Illness. 

 

b_smsha_0-1623008582600.png

b_smsha_1-1623008598777.png

I merged the dataset with additional factors such as county, and preprocessed fields such as region, day, month and year in SAS Studio.

 

At the moment, in SAS Enterprise Miner, I have built a Decision Tree model for each target variable, and I need one more model.

The regression model for M.I. comes out fine (I haven't analyzed it yet, but it runs properly). However, when I produce one for Race, it looks weird and just doesn't look right. I have also gone through countless searches to find a way to model Race using logistic regression, but it seems impossible unless the target has 2 or 3 levels.

 

Can someone please suggest an example of a model I could create in SAS EM that works for both target variables? Please and thank you!

6 REPLIES
PaigeMiller
Diamond | Level 26

@b_smsha wrote:

However, when I produce one for Race, it looks weird and just doesn't look right. I have also gone through countless searches to find a way to model Race using logistic regression, but it seems impossible unless the target has 2 or 3 levels.


Logistic regression is not limited to three levels of the target variable. See https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples03.htm. Maybe E-Miner has such a limitation, I don't know.
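
As a minimal sketch of what that can look like in PROC LOGISTIC (outside E-Miner), a nominal target with more than two levels can be fit with the generalized logit link. The data set name work.shootings, the predictor names, and the reference level 'W' below are hypothetical placeholders, not taken from the original data:

```sas
/* Sketch only: work.shootings, mental_illness, region, age and the
   reference level 'W' are assumed names, not from the original post. */
proc logistic data=work.shootings;
   /* Treat categorical predictors as CLASS variables with reference coding */
   class mental_illness region / param=ref;
   /* LINK=GLOGIT requests a generalized logit (multinomial) model,
      so the nominal target may have more than two levels */
   model race(ref='W') = mental_illness region age / link=glogit;
run;
```

With LINK=GLOGIT, each non-reference level of the target gets its own set of coefficients, each compared against the chosen reference level.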

 

Can someone please suggest example of a model I may create in SAS EM which can work for both target variables.

I'm afraid this question isn't clear to me. Do you want the same model method (logistic, decision tree, neural network, etc.) to work on both target variables? Or do you want the same model fit to apply to both target variables?

--
Paige Miller
b_smsha
Obsidian | Level 7

By "limited to levels of the target variable", I mean that if the target is nominal and has more than 2 levels, SAS Enterprise Miner will not do multinomial regression. According to SAS, it says this:

b_smsha_0-1623014240363.png

 

but the book by Mr. Kattamuri, "Predictive Analytics with SAS Eminer", says this:

b_smsha_1-1623014365664.png

 

 

 

As for your second question:

At the moment I've created a decision tree for each target variable, like this:

b_smsha_2-1623014410667.png

So I believe what I mean is that I want to be able to use the same type of model for each target variable.

PaigeMiller
Diamond | Level 26

That's very unfortunate if E-Miner does not allow more than two levels of a categorical target variable. Especially since PROC LOGISTIC does allow more than two levels. Both are SAS products, but one has a limitation.

 

I find there are a number of shortcomings in E-Miner, which to me don't seem to have an obvious rationale. The other big shortcoming in the E-Miner modeling is that there are no features to handle problems with multiple Y variables, even though other SAS software such as JMP and PROC GLM do allow this.

 

Both of these limitations prevent certain real-world situations from being properly analyzed by E-Miner.

--
Paige Miller
ballardw
Super User

You say:

" when I produce one for Race, it looks weird and just doesn't look right".

 

Why doesn't it look right? Some code (generated or otherwise) might help.

 

If you try to predict race from other variables you might be looking at the equivalent of trying to predict package color from the contents of a package.

 

I can see a model using race as an independent variable, in which case "looking right" can depend a lot on how well "race" and the other data are collected and used. Race should almost never be a dependent variable.

b_smsha
Obsidian | Level 7

Hi,

 

Yes, here are the code and the property settings I used for my Regression node.

b_smsha_6-1623014873339.png

b_smsha_7-1623014884188.png

 

b_smsha_0-1623014584301.png

 

b_smsha_1-1623014600933.png

 

b_smsha_2-1623014610531.png

b_smsha_4-1623014688958.png

b_smsha_5-1623014711991.png

 

b_smsha_3-1623014648672.png

 

 

I've included the last part of the output, the Analysis of Maximum Likelihood Estimates and the summary. It is turning out somewhat like this.

 

 

The reason Race is one of the target variables is that I'm doing a report analyzing how mental illness and race affect police shootings. I'm still a student working under a supervisor, but I'm learning or finding most things on my own, more or less supervising myself. 😕

 

 

PaigeMiller
Diamond | Level 26

@ballardw wrote:

You say:

" when I produce one for Race, it looks weird and just doesn't look right".

 

Why doesn't it look right? Some code (generated or otherwise) might help.

 

If you try to predict race from other variables you might be looking at the equivalent of trying to predict package color from the contents of a package.

 

I can see a model using race as an independent variable, in which case "looking right" can depend a lot on how well "race" and the other data are collected and used. Race should almost never be a dependent variable.


Actually, I don't find this to be a problem at all. For example, suppose you have found an actual skeleton and, by measuring the bones, you want to determine gender, or age, or race. One famous case: bones found on a South Pacific island in 1940, near the known flight path of Amelia Earhart, were determined to likely be from a female of European descent of approximately the same height and age as Earhart, and there are no other known females of European descent who were lost in this area of the South Pacific.

 

These are all real-world problems that use discriminant analysis (PROC DISCRIM) to determine a model which can be used on skeletons found in the future (or past). And of course, the problem isn't really limited to skeletons. Whether it makes sense to do a logistic regression or decision tree or discriminant analysis in the EXACT situation that @b_smsha faces, well I don't know, but I don't have a problem with the concept.
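
For illustration only, a discriminant analysis of that kind might be sketched as follows. The data set work.skeletons, the class variable ancestry, and the three measurement variables are hypothetical names made up for this example:

```sas
/* Sketch only: work.skeletons, ancestry and the measurement
   variables are assumed names used purely for illustration. */
proc discrim data=work.skeletons method=normal pool=yes crossvalidate;
   /* The categorical variable to be predicted */
   class ancestry;
   /* Continuous measurements used to discriminate between classes */
   var femur_length skull_width pelvis_width;
   /* Use the class proportions in the data as prior probabilities */
   priors proportional;
run;
```

The CROSSVALIDATE option adds leave-one-out classification results, which give a more honest picture of how the rule might perform on new observations than resubstitution alone.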

 

Your example of predicting the color of a package by knowing the contents is somewhat spurious because the color of the package is likely uncorrelated with the contents. The race of a skeleton may be (I don't know, I'm not an anthropologist) correlated with the physical dimensions of a skeleton.

--
Paige Miller
