Hi everyone,
I'm doing some predictive modeling (with Logistic Regression) and I'm at the stage of doing some bivariate analysis. I want to visualize this analysis but I don't have access to proc sgplot, just proc gchart.
The situation: I have a column/variable with an applicant's income (one of the predictors), and I want to analyze this variable by comparing it to the response variable (what is being predicted), Loan_Status (Yes or No).
What I have done: I have just imported the train and test data sets. The only difference is that the train data set has the response variable (Loan_Status).
The problem: In the existing train/test data sets, I just have each applicant's income, but what I want to do is create 4 income groups (Low, Average, High, Very High) and then compare them to Loan_Status. I want to see some stacked bars. I've attached an image of what I want. I know I can use proc format to create these bins, but I'm not sure how to then link this format to an existing data set.
For example, I wrote the code below, but I keep getting errors. One error is that no variable named value exists. There might be other errors in the code.
proc format;
value group
1 = 'Low'
2 = 'Average'
3 = 'High'
4 = 'Very High'
;
run;
data income;
set train; (name of initial dataset I imported)
if Applicant_Inc <=5000 then incomeg=1;
if Applicant_Inc <=8000 then incomeg=2;
if Applicant_Inc <=16000 then incomeg=3;
else incomeg=4;
run;
proc gchart data=income;
format Applicant_Inc group.;
vbar Applicant_Inc /
coutline=black
subgroup=Loan_Status
sumvar=value
legend=legend1
type=sum
width=8
maxis=axis1
raxis=axis2
discrete;
run;
If you're getting errors, show the code and log. And you have to fix errors in the order they appear, so if SAS says a variable doesn't exist, it likely doesn't exist.
Here's one place that is definitely incorrect. Your code would create the groups incorrectly: every condition after the first should be an else if. My changes are the added else keywords and the new format statement.
data income;
set train; /* name of initial dataset I imported */
if Applicant_Inc <=5000 then incomeg=1;
else if Applicant_Inc <=8000 then incomeg=2;
else if Applicant_Inc <=16000 then incomeg=3;
else incomeg=4;
format incomeg group.;
run;
You also seem to be applying the format to the original variable, Applicant_Inc, instead of the recoded variable, incomeg. One way to avoid these logical errors is to comment your code, which forces you to think through what you're doing.
I do not have access to GCHART so cannot assist beyond this. A stacked bar chart should be relatively easy though. You can find a lot of examples on Robslink.com; in particular, the one with the Excel version of graphs will help you out.
Edit: Also, not sure how you can have a test set that does not have your outcome to verify it. In that case it doesn't really appear to be a 'test' data set, but a scoring data set or predicted values. You don't actually know how accurate it is.
Thanks, Reeza. Here is the full log. I am not surprised that there is an error, as Loan_Status is part of the train file I imported (it's called train) but I'm not reading that data set at all here. It also says that it can't find the value variable, which is part of the proc format. This is the main problem I'm having: how do I associate this proc format, the if-else logic, and the gchart with the initial data set (train) that I imported?
21 proc format;
22
23 value group
24
25 1 = 'Low'
26
27 2 = 'Average'
28
29 3 = 'High'
30
31 4 = 'Very High'
32 ;
NOTE: Format group output
33 run;
NOTE: Procedure format step took :
real time : 0.003
cpu time : 0.000
34
35 data income;
36 if Applicant_Inc <=5000 then incomeg=1;
37 else if Applicant_Inc <=8000 then incomeg=2;
38 else if Applicant_Inc <=16000 then incomeg=3;
39 else incomeg=4;
40 format incomeg group.;
41 run;
NOTE: Variable "Applicant_Inc" may not be initialized
NOTE: Data set "WORK.income" has 1 observation(s) and 2 variable(s)
NOTE: The data step took :
real time : 0.004
cpu time : 0.000
42
43 proc gchart data=income;
44
45 format incomeg group.;
46
47 vbar Applicant_Inc /
48
49 coutline=black
50
51 subgroup=Loan_Status
^
ERROR: Variable "Loan_Status" not found
52
53 sumvar=value
^
ERROR: Variable "value" not found
54
55 legend=legend1
56
57 type=sum
58
59 width=8
60
61 maxis=axis1
62
63 raxis=axis2
64
65 discrete;
NOTE: Statements not executed because of errors detected
66
67 run;
NOTE: Procedure gchart step took :
real time : 0.002
cpu time : 0.000
68 quit; run;
69 ODS _ALL_ CLOSE;
51 subgroup=Loan_Status
^
ERROR: Variable "Loan_Status" not found
52
53 sumvar=value
^
ERROR: Variable "value" not found
54
Those tell us that the variables Loan_status and Value are not in the data set you imported. Or in another step you have accidentally removed them. Or you intended to use a different variable and are using some example code and forgot to change the variable names.
You have created and referenced the format GROUP but the variable incomeg with that format is not used in your Gchart code. I think you meant to use Incomeg instead of Applicant_inc.
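To make the connection to the imported data explicit, here is a minimal sketch of the three steps chained together. It assumes the imported data set is WORK.TRAIN and contains Applicant_Inc and Loan_Status, as described in the first post. The DATA step in the posted log has no SET statement (hence "1 observation(s) and 2 variable(s)"), so Loan_Status never reached WORK.INCOME. The sumvar=value option is assumed to be left over from example code and is dropped here so the bars show counts.

```sas
/* Sketch only: assumes WORK.TRAIN holds Applicant_Inc and Loan_Status */
proc format;
  value group
    1 = 'Low'
    2 = 'Average'
    3 = 'High'
    4 = 'Very High';
run;

data income;
  set train;   /* read the imported data so all its columns carry over */
  /* note: a missing Applicant_Inc compares as <= 5000 and lands in group 1 */
  if Applicant_Inc <= 5000 then incomeg = 1;
  else if Applicant_Inc <= 8000 then incomeg = 2;
  else if Applicant_Inc <= 16000 then incomeg = 3;
  else incomeg = 4;
  format incomeg group.;   /* attach the labels to the recoded variable */
run;

proc gchart data=income;
  vbar incomeg /           /* chart the recoded variable, not Applicant_Inc */
    discrete
    subgroup=Loan_Status
    type=freq
    coutline=black;
run;
quit;
```

The format is linked to the data simply by naming it in a FORMAT statement for a variable in that data set; the procedure then displays the labels instead of 1 to 4.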
Note that the income "bins" can be created directly with a format such as
proc format library=work;
value incomegrp
0 -< 5000 = 'Low'
5000 -< 8000 = 'Average'
8000 -< 16000 = 'High'
16000 - high = 'Very High'
;
run;
and apply that format to the income variable, Applicant_Inc in this case.
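A short sketch of what applying the format could look like in GCHART, assuming the incomegrp format above has already been run in the same session and that WORK.TRAIN contains Applicant_Inc and Loan_Status. With the DISCRETE option, GCHART should draw one bar per formatted value rather than one per distinct income, so no recoding DATA step is needed.

```sas
/* Sketch only: requires the incomegrp format to be defined beforehand */
proc gchart data=train;
  format Applicant_Inc incomegrp.;   /* bins are defined by the format */
  vbar Applicant_Inc /
    discrete                         /* one bar per formatted value */
    subgroup=Loan_Status             /* stack Yes/No within each bar */
    type=freq;
run;
quit;
```

Note that these ranges use -< (upper bound excluded), so an income of exactly 5000 falls in 'Average' here, whereas the earlier if/else logic with <= puts it in 'Low'.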
Thank you very much. Really useful information.
I have one final question.
Take a look at the image of the stacked bar that I attached in my first email. Look at the Y axis. It says percent. When I choose type=percent on my graph, it doesn't go from 0.1 to 1 but 0% to 100%. Is there an option to make it like the graph I attached?
Also, this might be related to that, but I would like the bars to be the same height (as in that pic), so that I can easily compare the Yes/No split of Loan_Status across the different income bins.
Hope that makes sense.
Use the WIDTH option to control your bar widths.
Use an AXIS statement to control the Yaxis, documented here.
If you are using WPS, I don’t believe you ever answered that question about your SAS version, you’ll need to find the relevant section in their documentation.
Hello,
I have tried multiple things using AXIS but it keeps changing my X axis not Y. Any thoughts?
proc gchart data=combineincome;
format Total_Income incomegrp.;
vbar Total_Income /
coutline=black
subgroup=Loan_Status
legend=legend1
type=percent
width=8
maxis=axis1
raxis=axis2
discrete;
run;
First would be to have an appropriate format assigned to the YAXIS variable. You may have done something in a prior step that assigned a PERCENT format to your y variable. Try adding a FORMAT statement that uses something like F3.1 to show one decimal.
In GCHART you get finer control over which tick marks appear, and the values displayed at them, by creating an AXIS statement with an ORDER= list and assigning that axis with RAXIS= (the response axis; MAXIS= controls the midpoint axis along the bottom). In SGPLOT the YAXIS statement allows setting a list of values, but you would still want an appropriate format to show 1 versus 100%. Both of the axis value lists allow syntax such as 0 to 1 by .1 to create 11 axis tick marks with values of 0, 0.1, 0.2, and so on up to 1.
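One possible way to get equal-height bars on a 0 to 1 scale is to compute the within-bin proportions first and then chart them as sums. The sketch below reuses the names from the GCHART step posted above (combineincome, Total_Income, Loan_Status, incomegrp); the data set pct and the variable prop are made up for illustration, and this has not been tested on WPS.

```sas
/* Sketch only: pct and prop are hypothetical names */
proc freq data=combineincome noprint;
  format Total_Income incomegrp.;
  /* OUTPCT adds PCT_ROW: percent of each Loan_Status within an income bin */
  tables Total_Income * Loan_Status / outpct out=pct;
run;

data pct;
  set pct;
  prop = pct_row / 100;   /* rescale 0-100 to 0-1 */
run;

/* response (Y) axis: ticks at 0, 0.1, ..., 1 */
axis2 order=(0 to 1 by 0.1) label=('Percent');

proc gchart data=pct;
  format Total_Income incomegrp. prop 3.1;
  vbar Total_Income /
    discrete
    subgroup=Loan_Status
    sumvar=prop             /* stack the proportions */
    type=sum
    raxis=axis2             /* RAXIS= is the Y axis; MAXIS= is the X axis */
    width=8
    coutline=black;
run;
quit;
```

Because the proportions within each income bin add up to 1, every bar should reach the same height, which makes the Yes/No split easier to compare across bins.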
There are lots of worked examples, with code, in the online graph gallery at
http://support.sas.com/sassamples/graphgallery/index.html