edasdfasdfasdfa
Quartz | Level 8

Hi everyone,

 

I'm doing some predictive modeling (with Logistic Regression) and I'm at the stage of doing some bivariate analysis. I want to visualize this analysis but I don't have access to proc sgplot, just proc gchart.

 

The situation: I have a column/variable with an applicant's income (one of the predictors), and I want to analyze it against the response variable being predicted, Loan_Status (Yes or No).

 

What have I done: I have just imported the train and test data-sets. The only difference is that the train data-set has the response variable. (Loan_Status)

 

The problem: In the existing train/test datasets I only have each applicant's raw income. What I want is to create four income bins (Low, Average, High, Very High) and compare them against Loan_Status as stacked bars. I've attached an image of what I want. I know I can use proc format to create these bins, but I'm not sure how to link the format to an existing data-set.

 

For example, I wrote the code below, but I keep getting errors. One error says that the variable value does not exist. There might be other errors in the code.

 

proc format;
value group
1 = 'Low'
2 = 'Average'
3 = 'High'
4 = 'Very High'
;
run;

 

data income;
set train; /* name of initial dataset I imported */
if Applicant_Inc <=5000 then incomeg=1;
if Applicant_Inc <=8000 then incomeg=2;
if Applicant_Inc <=16000 then incomeg=3;
else incomeg=4;
run;

 

proc gchart data=income;
format Applicant_Inc group.;
vbar Applicant_Inc /
  coutline=black
  subgroup=Loan_Status
  sumvar=value
  legend=legend1
  type=sum
  width=8
  maxis=axis1
  raxis=axis2
  discrete;
run;

Income.png

Cynthia_sas
Diamond | Level 26
Hi:
What version of SAS are you running? If you have SAS 9.3 or higher then you DO have SGPLOT. In 9.2, SG procedures needed a SAS/GRAPH license. However with SAS 9.3, the SG Procedures and ODS Graphics were provided as part of Base SAS without needing a SAS/GRAPH license. So unless you are running a really, really old version of SAS, like SAS 8 or SAS 9.0 or 9.1, you should have the SG procedures available to you.
Cynthia
Reeza
Super User

If you're getting errors, show the code and the log. You have to fix errors in the order they appear, so if SAS says a variable doesn't exist, it likely doesn't exist.

 

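Here's one place that is definitely incorrect: as written, the IF statements create your groups incorrectly. Without ELSE, each later IF overwrites the earlier result, so every income up to 16000 ends up in group 3. All of the later conditions should be ELSE IF; see my changes below (the ELSE keywords and the FORMAT statement).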

 

data income;
set train; /* name of initial dataset I imported */
if Applicant_Inc <=5000 then incomeg=1;
else if Applicant_Inc <=8000 then incomeg=2;
else if Applicant_Inc <=16000 then incomeg=3;
else incomeg=4;

format incomeg group.;
run;

You also seem to be applying the format to the original variable, Applicant_Inc, instead of the recoded variable, incomeg. One way to avoid these logic errors is to comment your code, which forces you to think through what you're doing.
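A minimal sketch of charting the recoded variable rather than the raw income. It assumes WORK.income was built from train (so it contains both incomeg and Loan_Status) and that the GROUP format already exists. SUMVAR=value is dropped because no variable named value exists in the data; TYPE=FREQ simply counts applicants and needs no SUMVAR.

```sas
/* Chart the recoded variable incomeg, not the raw Applicant_Inc */
proc gchart data=income;
  format incomeg group.;     /* 1-4 are displayed as Low ... Very High */
  vbar incomeg /
    discrete                 /* one bar per distinct incomeg value */
    subgroup=Loan_Status     /* stack the Loan_Status values within each bar */
    type=freq                /* bar height = number of applicants */
    coutline=black;
run;
quit;
```

The AXIS and LEGEND references are left out here on purpose, since axis1, axis2 and legend1 are never defined in the posted code.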

 

I do not have access to GCHART, so I cannot assist beyond this. A stacked bar chart should be relatively easy, though: you can find a lot of examples on Robslink.com, and the section that recreates Excel-style graphs in particular should help you.

 

Edit: Also, not sure how you can have a test set that does not have your outcome to verify it. In that case it doesn't really appear to be a 'test' data set, but a scoring data set or predicted values. You don't actually know how accurate it is. 

 




 

 

edasdfasdfasdfa
Quartz | Level 8

Thanks, Reeza. Here is the full log. I am not surprised that there is an error, since Loan_Status is part of the train file I imported (it's called train) but I'm not reading that dataset at all here. It also says that it can't find the value variable, which is part of the proc format. This is the main problem I'm having: how do I associate the proc format, the if-else logic, and the gchart with the initial dataset (train) that I imported?

 

21 proc format;
22
23 value group
24
25 1 = 'Low'
26
27 2 = 'Average'
28
29 3 = 'High'
30
31 4 = 'Very High'
32 ;
NOTE: Format group output
33 run;
NOTE: Procedure format step took :
real time : 0.003
cpu time : 0.000


34
35 data income;
36 if Applicant_Inc <=5000 then incomeg=1;
37 else if Applicant_Inc <=8000 then incomeg=2;
38 else if Applicant_Inc <=16000 then incomeg=3;
39 else incomeg=4;
40 format incomeg group.;
41 run;
NOTE: Variable "Applicant_Inc" may not be initialized

NOTE: Data set "WORK.income" has 1 observation(s) and 2 variable(s)
NOTE: The data step took :
real time : 0.004
cpu time : 0.000


42
43 proc gchart data=income;
44
45 format incomeg group.;
46
47 vbar Applicant_Inc /
48
49 coutline=black
50
51 subgroup=Loan_Status
^
ERROR: Variable "Loan_Status" not found
52
53 sumvar=value
^
ERROR: Variable "value" not found
54
55 legend=legend1
56
57 type=sum
58
59 width=8
60
61 maxis=axis1
62
63 raxis=axis2
64
65 discrete;
NOTE: Statements not executed because of errors detected
66
67 run;
NOTE: Procedure gchart step took :
real time : 0.002
cpu time : 0.000


68 quit; run;
69 ODS _ALL_ CLOSE;

ballardw
Super User

@edasdfasdfasdfa wrote:



51 subgroup=Loan_Status
^
ERROR: Variable "Loan_Status" not found
52
53 sumvar=value
^
ERROR: Variable "value" not found
54


Those errors tell us that the variables Loan_Status and value are not in the data set being charted. Either they are not in the data set you imported, or another step accidentally removed them, or you adapted some example code and forgot to change the variable names. In the log you posted, the data step that creates income has no SET statement, so WORK.income contains 1 observation and only 2 variables, and neither of them is Loan_Status or value.

You have created and referenced the format GROUP, but the variable incomeg that carries that format is not used in your GCHART code. I think you meant to chart incomeg instead of Applicant_Inc.
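As a hedged illustration of the point about the missing variables, the step below reads train with a SET statement so that Loan_Status and Applicant_Inc are carried into WORK.income. The explicit missing-value branch is an addition of this sketch, not something from the thread: in SAS a missing numeric compares lower than any number, so without it a missing income would be coded as group 1.

```sas
data income;
  set train;                                   /* bring in Applicant_Inc and Loan_Status */
  if missing(Applicant_Inc) then incomeg = .;  /* assumption: keep missing incomes out of 'Low' */
  else if Applicant_Inc <= 5000  then incomeg = 1;
  else if Applicant_Inc <= 8000  then incomeg = 2;
  else if Applicant_Inc <= 16000 then incomeg = 3;
  else incomeg = 4;
  format incomeg group.;
run;

/* Quick check that both variables exist and the bins look sensible */
proc freq data=income;
  tables incomeg*Loan_Status / missing;
run;
```

If the PROC FREQ table runs without a "variable not found" error, the same two variables should also be available to the chart step.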

 

Note that the income "bins" can be created directly with a format such as

proc format library=work;
value incomegrp
    0 -< 5000 = 'Low'
 5000 -< 8000 = 'Average'
 8000 -<16000 = 'High'
16000 -  high = 'Very High'
;
run; 

and then apply that format to the income variable, Applicant_Inc in this case.
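A short sketch of that format-only route, assuming the INCOMEGRP format above has been created and that train contains Applicant_Inc and Loan_Status. With the DISCRETE option, GCHART builds one bar per formatted value, so no recoded variable is needed.

```sas
proc gchart data=train;
  format Applicant_Inc incomegrp.;   /* bins are defined by the format ranges */
  vbar Applicant_Inc /
    discrete                         /* one bar per formatted value */
    subgroup=Loan_Status
    type=freq
    coutline=black;
run;
quit;
```

Two details differ from the IF/ELSE version: an income of exactly 5000 falls in 'Average' here (the range 0 -< 5000 excludes its upper bound), and negative or missing incomes are not covered by any range, so they would show up as their own unformatted bars.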

 

edasdfasdfasdfa
Quartz | Level 8

Thank you very much. Really useful information.

 

I have one final question.

 

Take a look at the image of the stacked bar chart that I attached to my first post, and look at the Y axis. It says percent. When I choose type=percent on my graph, the axis runs from 0% to 100% rather than from 0 to 1. Is there an option to make it like the graph I attached?

 

Also, and this might be related, I would like the bars to be the same size (as in that picture), so that I can easily compare the Yes/No split of Loan_Status across the different income bins.

 

Hope that makes sense.

Reeza
Super User

Use the WIDTH option to control your bar widths. 

https://documentation.sas.com/?docsetId=graphref&docsetVersion=9.4&docsetTarget=p0nse2mlct7rs3n1vam1...

 

Use an AXIS statement to control the Yaxis, documented here.

 

https://documentation.sas.com/?docsetId=graphref&docsetVersion=9.4&docsetTarget=p0rvgwbkch5iqsn1rghq...

 

You never answered the question about your SAS version. If you are using WPS, you'll need to find the relevant section in their documentation.

 




 

edasdfasdfasdfa
Quartz | Level 8

Hello,

 

I have tried multiple things using AXIS but it keeps changing my X axis not Y. Any thoughts?

 

proc gchart data=combineincome;
format Total_Income incomegrp.;
vbar Total_Income /
  coutline=black
  subgroup=Loan_Status
  legend=legend1
  type=percent
  width=8
  maxis=axis1
  raxis=axis2
  discrete;
run;

Reeza
Super User
Show what you've tried. I don't see an axis statement.
edasdfasdfasdfa
Quartz | Level 8
The axis statement is at the bottom; It just messed up my X axis.

proc gchart data=combineincome;
format Total_Income incomegrp.;
vbar Total_Income /
  coutline=black
  subgroup=Loan_Status
  legend=legend1
  type=percent
  width=8
  maxis=axis1
  raxis=axis2
  discrete;
axis order=(0 to 1 by .2) value=(height=3pct c=blue tick=1 "");
run;
Reeza
Super User
The log is ok with that? Axis statements are not part of the PROC and should go before. I don’t think your value options are correct, but again, don’t have it to test.
Reeza
Super User
And you never assign it to an axis either. Review the docs on how to point it to your axis.
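A sketch of the two points above, under a few assumptions: the AXIS and LEGEND statements are global, so they are defined before PROC GCHART, and each one is numbered so the VBAR options can point to it (MAXIS=AXIS1 for the midpoint axis, RAXIS=AXIS2 for the response axis). The label text is illustrative only, and the ORDER= range is 0 to 100 because TYPE=PERCENT reports values on a 0-100 scale.

```sas
/* Global statements: define these before the procedure that uses them */
axis1 label=('Income group');                        /* midpoint (X) axis */
axis2 order=(0 to 100 by 20) label=('Percent');      /* response (Y) axis */
legend1 label=('Loan status');

proc gchart data=combineincome;
  format Total_Income incomegrp.;
  vbar Total_Income /
    discrete
    subgroup=Loan_Status
    type=percent
    coutline=black
    width=8
    maxis=axis1        /* assign AXIS1 to the X axis */
    raxis=axis2        /* assign AXIS2 to the Y axis */
    legend=legend1;
run;
quit;
```

An unnumbered AXIS statement that no MAXIS= or RAXIS= option points to is not a reliable way to target the Y axis, which may explain why the earlier attempt changed the wrong one.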
ballardw
Super User



First would be to have an appropriate format assigned to the YAXIS variable. You may have done something in a prior step that assigned a PERCENT format to your y variable. Try adding a FORMAT statement that uses something like F3.1 to show one decimal.

In GCHART you get finer control over which values are displayed at the tick marks by creating an AXIS statement with an ORDER= list. In SGPLOT the YAXIS statement allows setting a list of values, but you would still want an appropriate format to show 1 rather than 100%. Both kinds of axis value list accept syntax such as 0 to 1 by .1, which creates 11 tick marks with values 0, 0.1, 0.2, and so on up to 1.
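One possible way to get equal-height bars on a 0 to 1 scale is to precompute the within-group proportions and chart them with SUMVAR=. This is a sketch under assumptions rather than a tested recipe: it assumes WORK.income holds incomeg and Loan_Status and that the GROUP format exists, and the names pct and prop are made up for the example. PROC FREQ's OUTPCT option writes PCT_ROW, the percentage within each row level (here, within each income group).

```sas
/* Share of each Loan_Status value within every income group */
proc freq data=income noprint;
  tables incomeg*Loan_Status / outpct out=pct;
run;

data pct;
  set pct;
  prop = pct_row / 100;     /* rescale 0-100 to 0-1 */
run;

axis2 order=(0 to 1 by 0.1) label=('Proportion');

proc gchart data=pct;
  format incomeg group. prop 3.1;   /* one decimal on the response axis */
  vbar incomeg /
    discrete
    subgroup=Loan_Status
    sumvar=prop                     /* stack the precomputed proportions */
    type=sum
    raxis=axis2
    coutline=black;
run;
quit;
```

Because the proportions within each income group add up to 1, every bar reaches the top of the axis. If floating-point rounding pushes a total fractionally above 1 and the log warns about values outside the axis range, dropping ORDER= or extending it slightly should avoid the problem.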

There are lots of worked examples online, at

http://support.sas.com/sassamples/graphgallery/index.html

https://blogs.sas.com/content/graphicallyspeaking/

http://robslink.com/SAS/
