How do I add a new variable and a new Y axis to a line graph using PROC SGPLOT?
The syntax at the bottom of this post produces this line graph:
I am outputting percentages from a PROC FREQ and then graphing the percentage over time, by a grouping variable (i.e., group = cann_use_status) using a format:
VALUE
cann_use_statusfmt
1 = "Never"
2 = "Past Use"
3 = "Current Use, Light (< 5d/past mo)"
4 = "Current, Heavy (≥ 5d/past mo)";
I want a graph that looks like this (see pink dotted line)
In this example, the pink line is the grouping variable, but with a different format (see below). More generally, I would like to learn how to add any variable to a second axis (assuming the units are the same).
VALUE
cann_use_status3fmt
1 = "Never"
2 = "Past Use"
3,4 = "Current Use";
I want to graph the column percent (i.e., the percentage of respondents in that year that identify as current users).
The 1st Y axis is the percentage reporting a prior MI (i.e., heart attack) by the grouping variable (i.e., cann_use_status). I'd like to add cann_use_status to the other Y axis (i.e., percentage of current users, independent of MI status).
Sorry for the long, complicated post. Thanks for any assistance.
PROC FREQ DATA=nhanes.go;
where cann_use_status ne .;
TABLE ever_told_mi*year*cann_use_status / outpct out=freqout;
format cann_use_status cann_use_statusfmt.;
run;
proc sgplot data=freqout ;
where ever_told_mi=1;
series x=year y=pct_row
/ lineattrs=(thickness=2px) markers DATALABEL=pct_row group=cann_use_status;
xaxis
label = "Time in 2-year intervals"
values=(2009-2010 to 2017-2018);
yaxis
label = "Percentage Reporting Prior MI"
values=(0 to 100 by 10)
VALUESFORMAT=f8.1;
format
ever_told_mi yes_nofmt.
cann_use_status cann_use_statusfmt.
pct_row 8.1;
title "Figure 1b. Percentage of Middle-Aged Adults Reporting Prior MI in NHANES Biennial Examinations, by Cannabis Use Status and year (2009-2020)";
run;
This will get you close. I'm not sure how "Current" is calculated, since it doesn't appear in the data, so I just averaged the two current-use values (weighted by count) and added the result as a separate series.
You specify Y2AXIS as an option on the SERIES/SCATTER statement to put that plot on a secondary axis.
You can control the legend with the KEYLEGEND statement. I've left that as an exercise for you.
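As a stripped-down illustration of those two points, here is a minimal sketch that uses the SASHELP.STOCKS sample data rather than the data in this thread. The variables, labels, and plot names ("price", "vol") are chosen only for illustration.

```sas
/* Sketch only: SASHELP.STOCKS stands in for the real data. */
proc sgplot data=sashelp.stocks;
  where stock = "IBM";
  /* This series uses the default (left) Y axis. */
  series x=date y=close / name="price" legendlabel="Closing price";
  /* Y2AXIS on the plot statement assigns this series to the right-hand axis. */
  series x=date y=volume / y2axis name="vol" legendlabel="Volume";
  /* The YAXIS and Y2AXIS statements only style the axes; they do not assign plots to them. */
  yaxis  label="Close";
  y2axis label="Volume";
  /* Listing plot names in KEYLEGEND controls which plots appear in the legend. */
  keylegend "price" "vol" / position=bottom;
run;
```

The Y2AXIS statement by itself does not create the second axis. At least one plot statement has to carry the Y2AXIS option.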
proc means data=freqout noprint nway;
where cann_use_status in (3, 4) and ever_told_mi=1;
class year ever_told_mi;
weight count;
var pct_row;
output out=totals mean = AVG_VALUE;
run;
data freqout2;
set freqout totals;
run;
proc format;
VALUE
cann_use_status3fmt
1 = "Never"
2 = "Past Use"
3,4 = "Current Use";
VALUE
cann_use_statusfmt
1 = "Never"
2 = "Past Use"
3 = "Current Use, Light (< 5d/past mo)"
4 = "Current, Heavy (≥ 5d/past mo)";
run;
proc sgplot data=freqout2 ;
where ever_told_mi=1;
series x=year y=pct_row
/ lineattrs=(thickness=2px) markers DATALABEL=pct_row group=cann_use_status;
xaxis
label = "Time in 2-year intervals";
series x=year y=AVG_VALUE
/ lineattrs=(thickness=2px) markers DATALABEL=AVG_VALUE;
y2axis
label = "Time in 2-year intervals";
*values=(2009-2010 to 2017-2018);
yaxis
label = "Percentage Reporting Prior MI"
values=(0 to 100 by 10)
VALUESFORMAT=f8.1;
y2axis
label = "Percentage Reporting Current Use"
values=(0 to 100 by 10)
VALUESFORMAT=f8.1;
format
ever_told_mi yes_nofmt.
cann_use_status cann_use_statusfmt.
pct_row 8.1;
title "Figure 1b. Percentage of Middle-Aged Adults Reporting Prior MI in NHANES Biennial Examinations, by Cannabis Use Status and year (2009-2020)";
run;
@Reeza Thanks. This helps a lot.
A couple of clarifying questions (See image below):
1. The other Y axis is not represented on the graph. Is there a way to add it?
2. Do you know what the dotted grey line in the legend represents, and how do I get rid of it? (The full PROC SGPLOT syntax is at the end of this post.)
3. The pink line is a good proof of concept, but it is not graphing the correct percentage (my fault, not yours). I want to graph the column percent using a different format (or in the future, a totally different nominal variable). Based on my limited knowledge, I would run a separate PROC FREQ and output a different column percent. This is my attempt to mimic your PROC MEANS approach:
PROC FREQ DATA=nhanes.go;
where cann_use_status ne .;
TABLE ever_told_mi*(cann_use_status)*year / outpct out=freqout_1;
format cann_use_status cann_use_statusfmt.;
run;
PROC FREQ DATA=nhanes.go;
where cann_use_status ne .;
TABLE ever_told_mi*(cann_use_status)*year / outpct out=freqout_3;
format cann_use_status cann_use_status3fmt.;
run;
data freqout_4;
set freqout_1 freqout_3;
run;
The 2nd PROC FREQ uses the different format. The problem is that the column percent (i.e., PCT_COL) has the same name in the outputted data sets (i.e., freqout_1 and freqout_3), so I can't identify it as a separate variable in the PROC SGPLOT (see image below). There is only 1 PCT_COL.
I don't see a way to specify the statistics to output in a PROC FREQ (e.g., row_pct, col_pct, etc.) like you would in a PROC MEANS (e.g., mean, max, etc.). Does it simply output everything? Also, is there a way to specify the name of an outputted variable in PROC FREQ (e.g., PCT_COL = col_pct_2)? This would be similar to "output out=totals mean = AVG_VALUE;" in the PROC MEANS.
Thanks again.
proc sgplot data=freqout2 ;
where ever_told_mi=1;
series x=year y=pct_row
/ lineattrs=(thickness=2px) markers DATALABEL=pct_row group=cann_use_status;
xaxis
label = "Time in 2-year intervals";
yaxis
label = "Percentage Reporting Prior MI"
values=(0 to 60 by 10)
VALUESFORMAT=f8.1;
series x=year y=AVG_VALUE
/ lineattrs=(thickness=4px) markers DATALABEL=AVG_VALUE
legendlabel="Percentage Reporting Current Use";
keylegend / position=bottom ;;
x2axis
label = "Time in 2-year intervals";
*values=(2009-2010 to 2017-2018);
y2axis
label = "Percentage Reporting Current Use"
values=(0 to 60 by 10)
VALUESFORMAT=f8.1;
format
ever_told_mi yes_nofmt.
cann_use_status cann_use_statusfmt.
pct_row 8.1
avg_value 8.1;
title "Figure 1b. Percentage of Middle-Aged Adults Reporting Prior MI in NHANES Biennial Examinations, by Cannabis Use Status and year (2009-2020)";
run;
@_maldini_ wrote:
A couple of clarifying questions (See image below):
1. The other Y axis is not represented on the graph. Is there a way to add it?
You never actually put the Y2AXIS option on the SERIES statement in your code. Recheck it against my code and run it; you should then have the second axis.
2. Do you know what the dotted grey line in the legend represents (i.e., how do I get rid of it. Full PROC SGPLOT syntax at the end of the post)?
I suspect that's the output from your second PROC FREQ.
Look at the structure of the data set I created and compare it to yours; they are not the same. You need a new variable in the data set. You cannot reuse the same variable on a different axis, or at least it doesn't make sense in the context of your question.
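One way to get that structure is sketched below. It assumes the nhanes.go data set, the cann_use_status3fmt format, and the freqout data set from earlier in this thread. The names use_by_year, current_use, plotdata, and pct_current are made up for the example.

```sas
/* Sketch only: percent of respondents in each year who are current users. */
/* With YEAR as the row variable, PCT_ROW is the percentage within each year. */
proc freq data=nhanes.go noprint;
  where cann_use_status ne .;
  tables year*cann_use_status / outpct out=use_by_year;
  format cann_use_status cann_use_status3fmt.;
run;

/* Keep only the "Current Use" rows and store the percentage under its own name. */
data current_use;
  set use_by_year;
  where put(cann_use_status, cann_use_status3fmt.) = "Current Use";
  pct_current = pct_row;
  keep year pct_current;
run;

/* Stack the new rows under the MI percentages; they carry only YEAR and PCT_CURRENT. */
data plotdata;
  set freqout(where=(ever_told_mi=1)) current_use;
run;
```

In the combined data set, pct_row and pct_current are separate variables, so each can go on its own SERIES statement, with Y2AXIS on the second one. The ever_told_mi filter is applied before stacking because the added rows have no ever_told_mi value and a WHERE statement in PROC SGPLOT would drop them.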
The 2nd PROC FREQ uses the different format. The problem is that the column percent (i.e., PCT_COL) has the same name in the outputted data sets (i.e., freqout_1 and freqout_3), so I can't identify it as a separate variable in the PROC SGPLOT (see image below). There is only 1 PCT_COL.
I don't see a way to specify the statistics to output in a PROC FREQ (e.g., row_pct, col_pct, etc.) like you would in a PROC MEANS (e.g., mean, max, etc.). Does it simply output everything? Also, is there a way to specify the name of an outputted variable in PROC FREQ (e.g., PCT_COL = col_pct_2)? This would be similar to "output out=totals mean = AVG_VALUE;" in the PROC MEANS.
You can rename any variable in a data set using the data set options.
In PROC FREQ the statistics are produced by default, and you suppress them in the displayed table with options such as NOROW and NOPERCENT. See the PROC FREQ documentation for details. You cannot specify the names of the output variables, but you can easily rename them using a data set option, either on the OUT= data set or on the SET statement.
data freqout_4;
set freqout_1 freqout_3 (rename=(pct_col=pct_col_y2));
run;
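The same RENAME= option can also go directly on the OUT= data set, which avoids the extra DATA step. A minimal sketch, using SASHELP.CARS as stand-in data (the output name freq_cars is made up for the example):

```sas
/* Sketch only: SASHELP.CARS stands in for the real data. */
proc freq data=sashelp.cars noprint;
  /* OUTPCT adds PCT_ROW and PCT_COL to the OUT= data set.        */
  /* KEEP= is applied before RENAME=, so it lists the original    */
  /* variable name (pct_col), not the new one.                    */
  tables origin*type / outpct
    out=freq_cars(keep=origin type count pct_col
                  rename=(pct_col=pct_col_y2));
run;
```

KEEP= (or DROP=) is also a simple way to limit which statistics end up in the output data set.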
<You never actually put the Y2AXIS option in your code, recheck it against my code and run it. Should have the second axis.>
When I run your code it doesn't add the axis. I copied and pasted it verbatim. This is the graph (below).
From this code
LABEL is not valid within the SERIES statement.
proc sgplot data=freqout2 ;
where ever_told_mi=1;
series x=year y=pct_row
/ markers lineattrs=(thickness=2px) group=cann_use_status DATALABEL=pct_row;
/* The Y2AXIS option assigns this series to the secondary (right) Y axis. */
series x=year y=AVG_VALUE
/ markers lineattrs=(thickness=2px) y2axis DATALABEL=AVG_VALUE;
yaxis
label = "Percentage Reporting Prior MI"
values=(0 to 100 by 10)
VALUESFORMAT=f8.1;
y2axis
label = "Percentage Reporting Current Use"
values=(0 to 100 by 10)
VALUESFORMAT=f8.1;
format
ever_told_mi yes_nofmt.
cann_use_status cann_use_statusfmt.
pct_row 8.1;
run;