Anita_n
Pyrite | Level 9

Dear all, 

If I have data that looks like this:

data have;
infile datalines;
input Drug_type $44. visit treatment_duration num_of_patients;
datalines;
Cefalexin+Baclofen+Betahistine               1 120 15
Calcipotriol+Baclofen+Betahistine            1 164 10
Folic acid+Baclofen                          1 26  2 
Piriton+Baclofen                             1 30  5 
Fentanyl/Co-beneldopa+Baclofen+Betahistine   1 90  10
Folic acid+Baclofen                          1 26  33
Fentanyl/Co-beneldopa+Baclofen               1 35  50
Folic acid+Baclofen                          1 15  6 
Cefalexin+Baclofen                           1 11  1 
Fentanyl/Co-beneldopa+Baclofen               2 35  14
Allopurinol+Baclofen+Betahistine             2 300 25
Folic acid+Baclofen+Betahistine              2 27  30
Cefalexin+Baclofen                           2 110 17
Allopurinol+Baclofen+Betahistine             2 240 55
Cefalexin+Baclofen                           2 11  17
Folic acid+Baclofen                          2 26  33
Piriton+Baclofen                             3 77  9 
Folic acid+Baclofen+Betahistine              3 27  19
Allopurinol+Baclofen+Betahistine             3 105 16
Piriton+Baclofen                             3 210 8 
Cefalexin+Baclofen                           3 11  32
Piriton+Baclofen                             3 38  11
Cefalexin+Baclofen                           3 11  31
Cefalexin+Baclofen+Betahistine               3 200 20
Cefalexin+Baclofen                           3 11  10
Cefalexin+Baclofen                           4 11  4 
Fentanyl/Co-beneldopa+Baclofen               4 35  18
Diclofenac+Baclofen                          4 66  25
Diclofenac+Baclofen                          4 45  14
Fentanyl/Co-beneldopa+Baclofen               4 115 22
;
run;

I wish to calculate the mean, median and confidence interval of the treatment duration and show them graphically. I was thinking of using a box plot but am not able to display all 4 variables in the data. Can someone help? Maybe there is a better way to do this than using a box plot.

Accepted Solution
Rick_SAS
SAS Super FREQ

I suggest a few things:

  1. Use PROC MEANS to compute the statistics you want
  2. Put the drug types on the Y axis, since the labels are so long.
  3. Use the SCATTER statement with the XERRORLOWER= and XERRORUPPER= options to display the mean and its confidence limits (CL)
  4. Use a second SCATTER statement to display the median

data have;                                          
infile datalines;                                   
input Drug_type $44. visit    treatment_duration  num_of_patients;
datalines; 
Cefalexin+Baclofen+Betahistine               1 120 15
Calcipotriol+Baclofen+Betahistine            1 164 10
Folic acid+Baclofen                          1 26  2 
Piriton+Baclofen                             1 30  5 
Fentanyl/Co-beneldopa+Baclofen+Betahistine   1 90  10
Folic acid+Baclofen                          1 26  33
Fentanyl/Co-beneldopa+Baclofen               1 35  50
Folic acid+Baclofen                          1 15  6 
Cefalexin+Baclofen                           1 11  1 
Fentanyl/Co-beneldopa+Baclofen               2 35  14
Allopurinol+Baclofen+Betahistine             2 300 25
Folic acid+Baclofen+Betahistine              2 27  30
Cefalexin+Baclofen                           2 110 17
Allopurinol+Baclofen+Betahistine             2 240 55
Cefalexin+Baclofen                           2 11  17
Folic acid+Baclofen                          2 26  33
Piriton+Baclofen                             3 77  9 
Folic acid+Baclofen+Betahistine              3 27  19
Allopurinol+Baclofen+Betahistine             3 105 16
Piriton+Baclofen                             3 210 8 
Cefalexin+Baclofen                           3 11  32
Piriton+Baclofen                             3 38  11
Cefalexin+Baclofen                           3 11  31
Cefalexin+Baclofen+Betahistine               3 200 20
Cefalexin+Baclofen                           3 11  10
Cefalexin+Baclofen                           4 11  4 
Fentanyl/Co-beneldopa+Baclofen               4 35  18
Diclofenac+Baclofen                          4 66  25
Diclofenac+Baclofen                          4 45  14
Fentanyl/Co-beneldopa+Baclofen               4 115 22
;

/* https://blogs.sas.com/content/iml/2019/10/09/statistic-error-bars-mean.html */
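proc means data=have noprint nway;        /* NWAY: keep only the per-drug rows, no overall row */
   class drug_type;
   freq num_of_patients;                  /* each row counts as num_of_patients observations */
   var treatment_duration;
   output out=MeanOut Mean=Mean Median=Median lclm=LCLM uclm=UCLM;  /* 95% CLM by default */
run;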

title "Mean and Median Treatment Length";
title2 "95% Confidence Interval for Mean";
proc sgplot data=MeanOut;
scatter y=drug_type x=Mean / xerrorlower=lclm xerrorupper=uclm
        legendlabel="Mean";
scatter y=drug_type x=Median / legendlabel="Median" markerattrs=(Symbol=Diamond);
xaxis label="Treatment Length";
yaxis display=(nolabel);
run;
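
Because the question also asks to calculate the statistics, here is a minimal sketch (not part of the original answer) that prints the numbers behind the plot as a table. It assumes the MeanOut data set created by the PROC MEANS step above; the WHERE clause keeps only the per-drug rows (_TYPE_=1), so it works whether or not that step uses the NWAY option. Keep in mind that with the FREQ statement each row counts as num_of_patients identical observations, so the confidence limits treat every patient as an independent observation.

```sas
/* Sketch: list the statistics that the plot displays.       */
/* Assumes MeanOut was created by the PROC MEANS step above. */
proc print data=MeanOut noobs label;
   where _type_ = 1;                      /* per-drug summary rows only */
   var drug_type Mean Median LCLM UCLM;
   format Mean Median LCLM UCLM 8.1;      /* one decimal place */
   label LCLM = "Lower 95% CL for Mean"
         UCLM = "Upper 95% CL for Mean";
run;
```

The labels assume the default ALPHA=0.05 in PROC MEANS; change them if you request a different confidence level.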


7 replies
PaigeMiller
Diamond | Level 26

@Anita_n wrote:

I wish to calculate the mean, median and confidence interval of the treatment duration and show them graphically. I was thinking of using a box plot but am not able to display all 4 variables in the data. Can someone help? Maybe there is a better way to do this than using a box plot.


I don't know what you mean by "display all 4 variables in the data". What 4 variables are you talking about?

Nevertheless, maybe this is helpful:

/* One box per drug type; FREQ= counts each row num_of_patients times */
proc sgplot data=have;
    vbox treatment_duration / group=drug_type freq=num_of_patients
                              meanattrs=(color=white symbol=asterisk);
run;
/* One box per visit, with the same weighting */
proc sgplot data=have;
    vbox treatment_duration / group=visit freq=num_of_patients
                              meanattrs=(color=white symbol=asterisk);
run;

--
Paige Miller
ballardw
Super User

Something like this, perhaps? Note that I modified the data step so that it runs: some of the spaces in your datalines appear to be tab characters, which did not line up when pasted into my editor and so generated invalid-data errors.

Box plots generally don't display confidence intervals; the lower and upper edges of the box show the interquartile range (the 25th and 75th percentiles) when enough values are present. Is that what you meant? If not, you may want to summarize the data and draw a scatter plot with error bars whose lower and upper limits are the confidence limits.

data have;                                          
infile datalines;                                   
input Drug_type $44. visit    treatment_duration  num_of_patients;
datalines; 
Cefalexin+Baclofen+Betahistine               1 120 15
Calcipotriol+Baclofen+Betahistine            1 164 10
Folic acid+Baclofen                          1 26  2 
Piriton+Baclofen                             1 30  5 
Fentanyl/Co-beneldopa+Baclofen+Betahistine   1 90  10
Folic acid+Baclofen                          1 26  33
Fentanyl/Co-beneldopa+Baclofen               1 35  50
Folic acid+Baclofen                          1 15  6 
Cefalexin+Baclofen                           1 11  1 
Fentanyl/Co-beneldopa+Baclofen               2 35  14
Allopurinol+Baclofen+Betahistine             2 300 25
Folic acid+Baclofen+Betahistine              2 27  30
Cefalexin+Baclofen                           2 110 17
Allopurinol+Baclofen+Betahistine             2 240 55
Cefalexin+Baclofen                           2 11  17
Folic acid+Baclofen                          2 26  33
Piriton+Baclofen                             3 77  9 
Folic acid+Baclofen+Betahistine              3 27  19
Allopurinol+Baclofen+Betahistine             3 105 16
Piriton+Baclofen                             3 210 8 
Cefalexin+Baclofen                           3 11  32
Piriton+Baclofen                             3 38  11
Cefalexin+Baclofen                           3 11  31
Cefalexin+Baclofen+Betahistine               3 200 20
Cefalexin+Baclofen                           3 11  10
Cefalexin+Baclofen                           4 11  4 
Fentanyl/Co-beneldopa+Baclofen               4 35  18
Diclofenac+Baclofen                          4 66  25
Diclofenac+Baclofen                          4 45  14
Fentanyl/Co-beneldopa+Baclofen               4 115 22
;
run;

proc sgplot data=have;
   /* one box per drug type; FREQ= weights each row by its patient count */
   vbox treatment_duration / category=drug_type freq=num_of_patients;
run;
Anita_n
Pyrite | Level 9

@ballardw @PaigeMiller Thanks for your replies. In your example you assigned treatment_duration to VBOX, drug_type to CATEGORY= and num_of_patients to FREQ=. Is it also possible to assign the visits in the same plot, or do I need a plot for each visit?

PaigeMiller
Diamond | Level 26

It makes no sense to have VISITS on the same plot as the others. The vertical scales would not match, distorting the plot. Don't do this.

--
Paige Miller
ballardw
Super User

@Anita_n wrote:

@ballardw @PaigeMiller Thanks for your replies. In your example you assigned treatment_duration to VBOX, drug_type to CATEGORY= and num_of_patients to FREQ=. Is it also possible to assign the visits in the same plot, or do I need a plot for each visit?


Whether it is worth doing depends on what you mean by "assign the visits". Do you mean a separate overlay plot with VBOX treatment_duration and CATEGORY=visit? Or GROUP=visit? Your example data won't have much range of values within each combination of visit and drug, and comparing visits alone across drugs doesn't seem to make much sense, but I'm not a drug expert in any sense.
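
To make the GROUP=visit option concrete, here is a hedged sketch that assumes the `have` data set from this thread. With both CATEGORY= and GROUP=, SGPLOT draws a cluster of boxes (one per visit) for each drug type; HBOX puts the long drug names on the Y axis, as the accepted solution recommends. Because many drug/visit combinations contain a single row, many boxes collapse to a line, which illustrates the limited range of values mentioned above.

```sas
/* Sketch: boxes for each drug type, clustered by visit.  */
/* Assumes the HAVE data set from this thread.            */
proc sgplot data=have;
   hbox treatment_duration / category=drug_type group=visit
                             groupdisplay=cluster      /* side-by-side boxes per drug */
                             freq=num_of_patients;     /* weight rows by patient count */
   yaxis display=(nolabel);
   xaxis label="Treatment duration";
run;
```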

Another option might be to use SGPANEL with visit as the PANELBY variable. That would draw a separate panel for each level of visit.
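
A minimal SGPANEL sketch of that idea, assuming the `have` data set from this thread and its four visits (hence the 2x2 grid). By default the panels share their axes, so a drug type that was not given at a particular visit shows up as an empty row in that panel.

```sas
/* Sketch: one panel per visit, one box per drug type in each panel. */
/* Assumes the HAVE data set from this thread.                      */
proc sgpanel data=have;
   panelby visit / columns=2 rows=2 novarname;   /* 4 visits -> 2x2 grid */
   hbox treatment_duration / category=drug_type freq=num_of_patients;
   rowaxis display=(nolabel);                    /* drug names need no axis label */
   colaxis label="Treatment duration";
run;
```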

Anita_n
Pyrite | Level 9

Thank you. I used ODS LAYOUT GRIDDED to arrange one plot per visit. It's okay now.
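
The exact code for that approach isn't shown in the thread, so the following is only a hypothetical sketch of it. It assumes the `have` data set from this thread and an ODS destination that supports ODS LAYOUT (for example HTML or PDF); the data set name `have_sorted` is made up for illustration. A BY statement produces one graph per visit, and ADVANCE=BYGROUP moves each BY group into the next cell of the grid.

```sas
/* Hypothetical sketch: one HBOX plot per visit in a 2x2 grid. */
/* Assumes the HAVE data set from this thread.                 */
proc sort data=have out=have_sorted;
   by visit;                              /* BY-group processing needs sorted data */
run;

ods graphics / width=4in height=3in;     /* smaller graphs so four fit in the grid */
ods layout gridded columns=2 advance=bygroup;

proc sgplot data=have_sorted;
   by visit;                              /* one graph per visit */
   hbox treatment_duration / category=drug_type freq=num_of_patients;
   yaxis display=(nolabel);
run;

ods layout end;
ods graphics / reset;                    /* restore default graph size */
```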
