Hello SAS users,
I have the following data, for which I am trying to build a specific boxplot.
I am nearly there; I just need to display the difference between the connected means in each panel. Does anyone know how to do this? Many thanks!
Data:
data kccq_data;
length Treatment $ 15;
input ID $ KCCQ_var Treatment $ Timepoint;
datalines;
1 15 Implant 1
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
Code to produce boxplot so far:
PROC SGPANEL DATA = kccq_data;
PANELBY Treatment / columns = 2 novarname;
VBOX KCCQ_var / category = Timepoint GROUP= Treatment connect=mean;
keylegend / title = "Treatment arm";
ROWAXIS label='KCCQ Overall Summary Score';
RUN;
You can use an INSET statement to display the difference value in the graph. Below is code that shows you how to calculate the difference.
data kccq_data;
length Treatment $ 15;
input ID $ KCCQ_var Treatment $ Timepoint;
datalines;
1 15 Implant 1
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
proc sort data=kccq_data out=kccq_data;
by treatment timepoint;
run;
/* calculate mean for each treatment & timepoint */
proc means data=kccq_data;
var kccq_var;
by treatment timepoint;
output out=mean mean=mean;
run;
proc sort data=mean out=result;
by treatment;
run;
/* Calculate the difference between the two timepoint means within each treatment */
/* This code assumes two treatments with two timepoints each, so rows 2 and 4 of RESULT are the last row of each treatment. You can modify this code to make it more dynamic */
data new (keep=diff treatment) ;
label diff='Diff between means is ';
format diff 5.2;
set result;
lag1=lag(mean);
/* diff = mean at the first timepoint minus mean at the second timepoint */
if _n_ in (2, 4) then do;
diff=lag1-mean;
output;
end;
run;
data all;
merge new kccq_data;
by treatment ;
run;
proc sgpanel data = all;
panelby treatment / columns = 2 novarname;
vbox kccq_var / category = timepoint group= treatment connect=mean;
keylegend / title = "Treatment Arm";
rowaxis label='KCCQ Overall Summary Score';
inset diff/position=ne;
run;
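The comment in the code above notes that the row-number logic could be made more dynamic. One possible way to do that, offered only as a sketch, is to use BY-group processing (`first.` / `last.`) instead of `_n_`, so the number of treatments no longer matters. The data set names `mean_by_cell`, `diff_by_trt`, `kccq_sorted` and `all_dynamic` are made up for this illustration, and the sketch assumes each treatment has at least two timepoints; with more than two, it reports the first timepoint's mean minus the last timepoint's mean.

```sas
/* Sketch only: data set names below are hypothetical, not from the original post. */
/* Mean per treatment and timepoint; NWAY keeps only the full cross-classification. */
proc means data=kccq_data noprint nway;
class treatment timepoint;
var kccq_var;
output out=mean_by_cell mean=mean;
run;

/* One row per treatment: mean at first timepoint minus mean at last timepoint. */
data diff_by_trt (keep=treatment diff);
set mean_by_cell;
by treatment;
retain first_mean;
label diff='Diff between means is ';
format diff 5.2;
if first.treatment then first_mean = mean;
if last.treatment and not first.treatment then do;
diff = first_mean - mean; /* same sign convention as the code above */
output;
end;
run;

proc sort data=kccq_data out=kccq_sorted;
by treatment;
run;

/* Attach the per-treatment difference to every raw observation. */
data all_dynamic;
merge diff_by_trt kccq_sorted;
by treatment;
run;

proc sgpanel data=all_dynamic;
panelby treatment / columns=2 novarname;
vbox kccq_var / category=timepoint group=treatment connect=mean;
keylegend / title="Treatment Arm";
rowaxis label='KCCQ Overall Summary Score';
inset diff / position=ne;
run;
```

Because `diff` is constant within each treatment, the INSET statement should show one value per panel, just as in the code above.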
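/*
You need to calculate it by hand.
The mean_diff values below (100 and 2000) look like placeholders;
replace them with the differences you calculate.
*/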
data kccq_data;
length Treatment $ 15;
infile cards truncover;
input ID $ KCCQ_var Treatment $ Timepoint mean_diff;
label mean_diff='diff between means is ';
datalines;
1 15 Implant 1 100
2 24 Implant 1
3 34 Implant 1
4 44 Implant 1
5 54 Implant 1
6 65 Implant 1
7 73 Implant 1
8 83 Implant 1
9 95.5 Implant 1
10 20 Implant 7
11 31 Implant 7
12 45.5 Implant 7
13 55.5 Implant 7
14 51 Implant 7
15 65.5 Implant 7
16 82 Implant 7
17 82 Implant 7
18 72 Implant 7
19 60 Implant 7
20 20 Implant 7
1 10 Control 1 2000
2 22 Control 1
3 30 Control 1
4 42 Control 1
5 50 Control 1
6 80 Control 1
7 71 Control 1
8 60 Control 1
9 90 Control 1
10 72 Control 1
11 51 Control 7
12 30 Control 7
13 50 Control 7
14 70 Control 7
15 51 Control 7
16 60 Control 7
17 20 Control 7
18 11 Control 7
19 20 Control 7
20 30 Control 7
;
run;
PROC SGPANEL DATA = kccq_data;
PANELBY Treatment / columns = 2 novarname;
VBOX KCCQ_var / category = Timepoint GROUP= Treatment connect=mean;
keylegend / title = "Treatment arm";
ROWAXIS label='KCCQ Overall Summary Score';
inset mean_diff/position=ne;
RUN;