I'd like to create a line graph showing the proportion of respondents answering "Yes" to past month tobacco use, by age category, over time. Basically I am trying to create a graph like this one:
These instructions are helpful, but I'm not sure how to manipulate the data before running the PROC SGPLOT.
When I run this syntax using the data as is...
PROC SGPLOT DATA = input;
SERIES X = year Y = past_month_use / group=age_cat;
SERIES X = year Y = past_month_use / group=age_cat;
SERIES X = year Y = past_month_use / group=age_cat;
RUN;
I get this:
How do I need to manipulate these data so that I can produce the desired outcome?
Thank you.
Sample data below:
4 vars:
id (1-20 respondents)
age_cat (1,2,3)
past_month_use (0=no, 1=yes)
year (2018-2021)
You need to summarize the data to create the proportion.
Because past_month_use is coded 1/0, its mean is the proportion of 1 values. I used PROC SUMMARY to create a data set holding that mean for each age_cat and year combination; other methods are possible.
The XAXIS and YAXIS statements make the X values "nicer" and force 0 onto the Y axis.
The format is entirely optional but gives a nicer appearance. Adding labels to your variables would give nicer axis labels.
data have;
input id age_cat past_month_use year;
datalines;
1 1 1 2018
1 1 1 2019
1 1 1 2020
1 2 0 2021
2 2 0 2018
2 2 0 2019
2 2 1 2020
2 2 1 2021
3 3 1 2018
3 3 1 2019
3 3 1 2020
3 3 1 2021
4 2 1 2018
4 3 1 2019
4 3 1 2020
4 3 1 2021
5 1 0 2018
5 1 0 2019
5 1 0 2020
5 1 0 2021
6 3 0 2018
6 3 1 2019
6 3 0 2020
6 3 1 2021
7 2 1 2018
7 2 1 2019
7 2 1 2020
7 2 1 2021
8 2 0 2018
8 2 1 2019
8 3 0 2020
8 3 1 2021
9 3 1 2018
9 3 1 2019
9 3 1 2020
9 3 1 2021
10 1 1 2018
10 1 0 2019
10 1 0 2020
10 1 0 2021
11 2 0 2018
11 2 0 2019
11 2 1 2020
11 2 1 2021
12 1 1 2018
12 1 0 2019
12 1 1 2020
12 2 1 2021
13 3 1 2018
13 3 1 2019
13 3 1 2020
13 3 1 2021
14 2 1 2018
14 2 1 2019
14 2 0 2020
14 3 0 2021
15 1 0 2018
15 1 0 2019
15 1 1 2020
15 1 1 2021
16 2 1 2018
16 2 0 2019
16 2 0 2020
16 2 0 2021
17 2 0 2018
17 3 1 2019
17 3 1 2020
17 3 1 2021
18 2 1 2018
18 2 1 2019
18 2 1 2020
18 2 1 2021
19 3 1 2018
19 3 1 2019
19 3 1 2020
19 3 1 2021
20 1 1 2018
20 2 1 2019
20 2 1 2020
20 2 1 2021
;
proc summary data=have nway;
  class age_cat year;
  var past_month_use;
  output out=summary mean=;
run;
proc sgplot data=summary;
  series x=year y=past_month_use / group=age_cat;
  xaxis values=(2018 to 2021 by 1);
  yaxis values=(0 to 1 by .1);
  format past_month_use percent6.;
run;
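To illustrate the point about labels, here is a minimal sketch that reuses the summary data set created by the PROC SUMMARY step above and adds a LABEL statement to the plot step. The label text is only a placeholder I made up; substitute whatever wording fits your survey.

```sas
/* Assumes the SUMMARY data set from the PROC SUMMARY step above. */
proc sgplot data=summary;
  series x=year y=past_month_use / group=age_cat markers;
  xaxis values=(2018 to 2021 by 1);
  yaxis values=(0 to 1 by .1);
  format past_month_use percent6.;
  /* Placeholder label text: used for the axis and legend titles */
  label past_month_use = "Past-month tobacco use"
        year           = "Survey year"
        age_cat        = "Age category";
run;
```

A LABEL statement inside the procedure applies only to that step, so the stored data set is left unchanged.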
Do you mean something like this:
data have;
input id age_cat past_month_use year;
cards;
1 1 1 2018
1 1 1 2019
1 1 1 2020
1 2 0 2021
2 2 0 2018
2 2 0 2019
2 2 1 2020
2 2 1 2021
3 3 1 2018
3 3 1 2019
3 3 1 2020
3 3 1 2021
4 2 1 2018
4 3 1 2019
4 3 1 2020
4 3 1 2021
5 1 0 2018
5 1 0 2019
5 1 0 2020
5 1 0 2021
6 3 0 2018
6 3 1 2019
6 3 0 2020
6 3 1 2021
7 2 1 2018
7 2 1 2019
7 2 1 2020
7 2 1 2021
8 2 0 2018
8 2 1 2019
8 3 0 2020
8 3 1 2021
9 3 1 2018
9 3 1 2019
9 3 1 2020
9 3 1 2021
10 1 1 2018
10 1 0 2019
10 1 0 2020
10 1 0 2021
11 2 0 2018
11 2 0 2019
11 2 1 2020
11 2 1 2021
12 1 1 2018
12 1 0 2019
12 1 1 2020
12 2 1 2021
13 3 1 2018
13 3 1 2019
13 3 1 2020
13 3 1 2021
14 2 1 2018
14 2 1 2019
14 2 0 2020
14 3 0 2021
15 1 0 2018
15 1 0 2019
15 1 1 2020
15 1 1 2021
16 2 1 2018
16 2 0 2019
16 2 0 2020
16 2 0 2021
17 2 0 2018
17 3 1 2019
17 3 1 2020
17 3 1 2021
18 2 1 2018
18 2 1 2019
18 2 1 2020
18 2 1 2021
19 3 1 2018
19 3 1 2019
19 3 1 2020
19 3 1 2021
20 1 1 2018
20 2 1 2019
20 2 1 2020
20 2 1 2021
;
run;
proc sort data = have;
  by year age_cat past_month_use id;
run;
/* Within each year and age_cat, accumulate the number of "Yes" answers (d)
   and the number of respondents (n), then output their ratio. */
data want;
  set have;
  by year age_cat;
  if first.age_cat then
    do;
      d = 0;
      n = 0;
    end;
  d + past_month_use;
  n + 1;
  if last.age_cat then
    do;
      prop = divide(d,n);
      output; /* also keeps groups whose proportion is 0 */
    end;
  format prop percent10.2;
run;
proc sgplot data = want;
  series x = year y = prop / group=age_cat;
  xaxis integer;
run;
?
Bart
Thank you!
Can you accomplish the same thing using PROC FREQ instead of PROC SUMMARY?
My suggestion is always "try and see".
The main thing with PROC FREQ is that it counts every formatted value, so you end up with counts and percents for 0 as well as 1. PROC FREQ also reports percentages on a 0-100 scale (the proportion multiplied by 100), so the Y axis values change.
proc freq data=have noprint;
  tables age_cat*year*past_month_use / outpct out=freqout;
run;
proc sgplot data=freqout;
  where past_month_use=1;
  series x=year y=pct_row / group=age_cat;
  xaxis values=(2018 to 2021 by 1);
  yaxis values=(0 to 100 by 10);
run;
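If you would rather keep the 0 to 1 scale and the percent format from the PROC SUMMARY version, one option is to rescale pct_row before plotting. This is only a sketch: it assumes the freqout data set from the PROC FREQ step above, and the names freqplot and prop are ones I made up for illustration.

```sas
/* Assumes the FREQOUT data set from the PROC FREQ step above. */
data freqplot;                 /* hypothetical data set name */
  set freqout;
  where past_month_use = 1;    /* keep only the "Yes" rows */
  prop = pct_row / 100;        /* 0-100 percent -> 0-1 proportion */
  format prop percent6.;
run;

proc sgplot data=freqplot;
  series x=year y=prop / group=age_cat;
  xaxis values=(2018 to 2021 by 1);
  yaxis values=(0 to 1 by .1);
run;
```

One thing to watch: an age_cat and year combination with no "Yes" answers has no past_month_use=1 row in the PROC FREQ output, so it has no point on the line.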