Hello SAS users,
I have the following data, for which I am trying to build a specific boxplot.
I am nearly there; I just need to display the difference between the connected means in each panel. Does anyone know how to do this? Many thanks!
Data:
data kccq_data;
length Treatment $ 15;
input ID $ KCCQ_var Treatment $ Timepoint;
datalines;
1 15 Implant 1
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
Code to produce the boxplot so far:
PROC SGPANEL DATA = kccq_data;
PANELBY Treatment / columns = 2 novarname;
VBOX KCCQ_var / category = Timepoint GROUP= Treatment connect=mean;
keylegend / title = "Treatment arm";
ROWAXIS label='KCCQ Overall Summary Score';
RUN;
You can use an INSET statement to display the difference value in the graph. Below is code that shows you how to calculate the difference.
data kccq_data;
length Treatment $ 15;
input ID $ KCCQ_var Treatment $ Timepoint;
datalines;
1 15 Implant 1
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
proc sort data=kccq_data out=kccq_data;
by treatment timepoint;
run;
/* calculate mean for each treatment & timepoint */
proc means data=kccq_data;
var kccq_var;
by treatment timepoint;
output out=mean mean=mean;
run;
proc sort data=mean out=result;
by treatment;
run;
/* Calculate the difference between the two timepoint means within each treatment */
/* This code assumes exactly two treatments with two timepoints each (four rows in RESULT). */
/* You can modify this code to make it more dynamic. */
data new (keep=diff treatment);
label diff='Diff between means is ';
format diff 5.2;
set result;
/* LAG runs on every row, so on rows 2 and 4 it returns that treatment's timepoint 1 mean */
lag1=lag(mean);
if _n_ in (2, 4) then do;
/* timepoint 1 mean minus timepoint 7 mean */
diff=lag1-mean;
output;
end;
run;
data all;
merge new kccq_data;
by treatment ;
run;
proc sgpanel data = all;
panelby treatment / columns = 2 novarname;
vbox kccq_var / category = timepoint group= treatment connect=mean;
keylegend / title = "Treatment Arm";
rowaxis label='KCCQ Overall Summary Score';
inset diff/position=ne;
run;
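The code comment above suggests making this more dynamic. Below is a minimal, untested sketch of one way to do that. It replaces LAG and the hard-coded _N_ values with PROC MEANS (CLASS plus NWAY) and FIRST./LAST. processing, so it works for any number of treatment arms.

It assumes the following:
- The kccq_data data set from the DATA step above exists.
- Within each arm, the difference should be taken between the lowest and the highest Timepoint.
- The data set names CELL_MEANS, MEAN_DIFF and KCCQ_SORTED are illustrative only.

```sas
/* Sketch only: assumes kccq_data from the DATA step above.        */
/* CELL_MEANS, MEAN_DIFF and KCCQ_SORTED are illustrative names.    */

/* One mean of KCCQ_var per Treatment x Timepoint cell (NWAY = no totals) */
proc means data=kccq_data noprint nway;
  class Treatment Timepoint;
  var KCCQ_var;
  output out=cell_means (drop=_type_ _freq_) mean=mean;
run;

/* CELL_MEANS is ordered by Treatment, then Timepoint, so BY works here */
data mean_diff (keep=Treatment diff);
  label diff='Diff between means is ';
  format diff 5.2;
  set cell_means;
  by Treatment;
  retain first_mean;
  /* remember the mean at the lowest Timepoint of each arm */
  if first.Treatment then first_mean = mean;
  /* at the highest Timepoint: first mean minus last mean, as in the LAG version */
  if last.Treatment then do;
    diff = first_mean - mean;
    output;
  end;
run;

/* attach the per-arm difference to every row of that arm */
proc sort data=kccq_data out=kccq_sorted;
  by Treatment;
run;

data all;
  merge kccq_sorted mean_diff;
  by Treatment;
run;

proc sgpanel data=all;
  panelby Treatment / columns=2 novarname;
  vbox KCCQ_var / category=Timepoint group=Treatment connect=mean;
  keylegend / title="Treatment arm";
  rowaxis label='KCCQ Overall Summary Score';
  inset diff / position=ne;
run;
```

The sign convention matches the LAG version (timepoint 1 mean minus timepoint 7 mean). Reverse the subtraction if you would rather show the change from timepoint 1 to timepoint 7.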
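/*
You need to calculate the difference yourself and store it in a variable.
The values 100 and 2000 below are placeholders for those differences.
*/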
data kccq_data;
length Treatment $ 15;
infile cards truncover;
input ID $ KCCQ_var Treatment $ Timepoint mean_diff;
label mean_diff='diff between means is ';
datalines;
1 15 Implant 1 100
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1 2000
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
PROC SGPANEL DATA = kccq_data;
PANELBY Treatment / columns = 2 novarname;
VBOX KCCQ_var / category = Timepoint GROUP= Treatment connect=mean;
keylegend / title = "Treatment arm";
ROWAXIS label='KCCQ Overall Summary Score';
inset mean_diff/position=ne;
RUN;
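For reference, here is the hand calculation for the posted data. It uses the same sign convention as the LAG-based answer: the timepoint 1 mean minus the timepoint 7 mean.
- Implant has 9 rows at timepoint 1 and 11 rows at timepoint 7.
- Control has 10 rows at each timepoint.

These are the numbers that would replace the 100 and 2000 values in mean_diff.

```latex
\begin{aligned}
\bar{x}_{\mathrm{Implant},1} &= 487.5/9 \approx 54.17, \quad \bar{x}_{\mathrm{Implant},7} = 584.5/11 \approx 53.14 \\
\Delta_{\mathrm{Implant}} &= \frac{487.5}{9} - \frac{584.5}{11} = \frac{5362.5 - 5260.5}{99} = \frac{102}{99} \approx 1.03 \\
\bar{x}_{\mathrm{Control},1} &= 527/10 = 52.7, \quad \bar{x}_{\mathrm{Control},7} = 393/10 = 39.3 \\
\Delta_{\mathrm{Control}} &= 52.7 - 39.3 = 13.4
\end{aligned}
```

Shown with the 5.2 format, the inset values are 1.03 for Implant and 13.40 for Control.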