_maldini_
Barite | Level 11

Is there a way to produce a box plot using PROC SURVEYREG?

 

I'm using an LSMEANS statement in PROC SURVEYREG to calculate adjusted means for the various levels of the categorical variable in the CLASS statement (reg_use = never, non-regular use, regular use). I'd like to see these adjusted means in a box plot.

 

PROC SURVEYMEANS produces a box plot, but the means are not adjusted for the other variables (Adjusting for age_yrs and sex below).

proc surveyreg data=&dataset;
    STRATUM sdmvstra;
    CLUSTER sdmvpsu;
    WEIGHT &weight;

    class reg_use (ref="Never") sex;

    model htn = reg_use age_yrs sex / solution clparm;
    lsmeans reg_use / pdiff=all adjust=tukey;

    format
        htn      htn_fmt.
        reg_use  reg_usefmt.
        sex      sexfmt.
    ;
run;

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

The doc for the LSMEANS statement indicates that it can produce several kinds of plots. I think the most appropriate plot is the MEANPLOT, which shows the estimated mean and 95% CI for each category on the LSMEANS statement:

lsmeans reg_use / pdiff=all adjust=tukey plots=(meanplot);
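For context, here is a minimal sketch of how that statement might sit inside the full step from the question. It assumes ODS Graphics is not already enabled (hence the `ods graphics on;` line) and adds the `cl` suboption of MEANPLOT to request the confidence limits explicitly; the data set, macro variables, and formats are the ones from the original post.

```sas
/* Sketch only: &dataset, &weight and the formats come from the question */
ods graphics on;

proc surveyreg data=&dataset;
    stratum sdmvstra;
    cluster sdmvpsu;
    weight &weight;

    class reg_use (ref="Never") sex;

    model htn = reg_use age_yrs sex / solution clparm;

    /* meanplot(cl) draws each adjusted mean with its confidence limits */
    lsmeans reg_use / pdiff=all adjust=tukey plots=meanplot(cl);

    format reg_use reg_usefmt. sex sexfmt.;
run;
```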

 


_maldini_
Barite | Level 11

Thank you. 

 

2 additional questions:

  1. Is there a way to control the order of the levels/groups of the variable used in the LSMEANS statement? I set the reference category in the CLASS statement, but that level/group is last in the plot. I'd like it to be first.
    Screenshot 2023-01-17 at 10.26.31 AM.png
  2. Do you know of a resource that I could use to better understand how to use the SAS documentation? That might sound silly, but I struggle to make sense of the documentation sometimes.
Rick_SAS
SAS Super FREQ

The LSMEANS statement uses the same order as the CLASS statement, so look at the documentation of the CLASS statement.

 

By default, the levels of a classification variable are listed alphabetically by their formatted value (order=formatted). The most useful way to change the order is to sort or otherwise arrange the input data set in the order you want, and then use the ORDER=DATA option on the CLASS statement to specify that the order of levels is found in the data set. For your example, sort by STRATUM and then by the YEAR variable. If that doesn't fix your problem, there are other tricks/techniques we can use.

 

Regarding how to use the doc, my approach to learning a new procedure is always to start with the "Overview" and "Getting Started" sections of the doc, then browse the syntax section. I leave the "Details" section for when I need to know the math/statistics behind the computation. 

 

Also, know that there is a section of the SAS/STAT doc devoted to "Shared Concepts." Most procedures link to that section if they support one of the shared statements (like LSMEANS).

_maldini_
Barite | Level 11

< If that doesn't fix your problem, there are other tricks/techniques we can use.>

It didn't, but thanks for the suggestion.

 

It looks like SAS just puts whatever value I put in the REF= option in the CLASS statement last, in both the LSMEANS table and the plot.

Screenshot 2023-01-17 at 8.56.21 PM.png

I can just change the reference level to the last value of year (i.e., REF= "2017-2018") instead of the first   (i.e., REF= "2009-2010"). That solves the problem. I don't think the reference level matters in this particular situation since I am really just looking for the adjusted means. 

CLASS year (ref="2017-2018") sex ;

Screenshot 2023-01-17 at 8.59.18 PM.png

 

PaigeMiller
Diamond | Level 26

@_maldini_ wrote:

 

Is there a way to control the order of the levels/groups of the variable used in the LSMEANS statement? I set the reference category in the CLASS statement, but that level/group is last in the plot. I'd like it to be first.

Screenshot 2023-01-17 at 10.26.31 AM.png.


Assuming that YEAR is a numeric variable with a format (is it?) then ORDER=INTERNAL in the PROC SURVEYREG statement produces the proper sorting.

--
Paige Miller
_maldini_
Barite | Level 11

Yes, it is, but that didn't change the output. 

 

proc format;
    value yearfmt
        6  = "2009-2010"
        7  = "2011-2012"
        8  = "2013-2014"
        9  = "2015-2016"
        10 = "2017-2018";
run;

proc surveyreg data=&dataset order=internal;
    STRATUM sdmvstra;
    CLUSTER sdmvpsu;
    WEIGHT &weight;

    /* HTN prevalence by year */
    class year (ref="2009-2010") sex;
    /* class sex; */

    model htn = year age_yrs sex / solution clparm;
    lsmeans year / pdiff=all adjust=tukey plots=(meanplot);

    format
        htn             htn_fmt.
        race_dum_white
        race_dum_hisp
        race_dum_other  dummy_fmt.
        reg_use         reg_usefmt.
        sex             sexfmt.
        year            yearfmt.
    ;
run;

Screenshot 2023-01-17 at 8.46.23 PM.png

 

FreelanceReinh
Jade | Level 19

@_maldini_ wrote:
  1. Is there a way to control the order of the levels/groups of the variable used in the LSMEANS statement? I set the reference category in the CLASS statement, but that level/group is last in the plot. I'd like it to be first.

Hello @_maldini_,

 

Using the data from Example 122.8 Comparing Domain Statistics with a formatted numeric class variable added, the only simple way I found to change the tick mark position of the reference category was to use the ASCENDING or DESCENDING option of the MEANPLOT request. But then the order of tick marks depends on the LS mean values, which you don't want (e.g., "ASCENDING" would swap '2011-2012' and '2013-2014' in your example).
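As a rough illustration of that MEANPLOT suboption, the request might look like the sketch below. It reuses the variable and format names from the posts above; as noted, the resulting tick order follows the estimated LS-means, not the calendar order of the cycles.

```sas
/* Sketch only: tick marks are ordered by the LS-mean estimates */
ods graphics on;

proc surveyreg data=&dataset;
    stratum sdmvstra;
    cluster sdmvpsu;
    weight &weight;

    class year (ref="2009-2010") sex;
    model htn = year age_yrs sex / solution clparm;

    /* ASCENDING sorts the plotted levels by increasing LS-mean */
    lsmeans year / plots=meanplot(ascending cl);

    format year yearfmt. sex sexfmt.;
run;
```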

 

Two solutions did work for my test data, but were more complicated:

  1. Creating an output dataset containing the mean plot data
    ods output MeanPlot=mpdat;
    (this statement goes before or into the PROC SURVEYREG step) and then using PROC SGPLOT data=mpdat ... (with a SCATTER statement) to create the plot. PROC SGPLOT has an XAXIS statement where you can specify the order of tick marks with the VALUES= option. However, more options would need to be added in order to reproduce the original mean plot from PROC SURVEYREG.

  2. Modifying the ODS graphics template used by PROC SURVEYREG: I copied the PROC TEMPLATE code of Stat.Graphics.MeanPlot (path in the Templates window: Sashelp.Tmplstat\Stat\SurveyReg\Graphics\MeanPlot) from the template browser into the Enhanced Editor, inserted
    discreteopts=(tickvaluelist=(list of quoted tick values))
    into the XAXISOPTS= option of the LAYOUT OVERLAY statement, submitted this modified PROC TEMPLATE code and then the PROC SURVEYREG step.
    However, modifying ODS templates can be risky (see Re: PROC PSMATCH: Is there a way to change the x-axis label ... also for more details).
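
A minimal sketch of the first approach follows, with one substitution stated plainly: it captures the LSMeans table (not the MeanPlot data object), because the column names of the MeanPlot data set are not shown in this thread. It assumes the LSMeans output data set contains the columns Estimate, Lower and Upper (the latter two requested by the CL option) and that the year column keeps its yearfmt. format; running PROC CONTENTS on the data set would confirm both. The data set name lsm is a hypothetical choice.

```sas
ods graphics on;

/* Capture the LS-means table; CL adds confidence limits (assumed columns: Lower, Upper) */
ods output LSMeans=lsm;

proc surveyreg data=&dataset;
    stratum sdmvstra;
    cluster sdmvpsu;
    weight &weight;

    class year (ref="2009-2010") sex;
    model htn = year age_yrs sex / solution clparm;
    lsmeans year / cl;

    format year yearfmt. sex sexfmt.;
run;

/* Redraw the means with an explicit tick order on a discrete x axis */
proc sgplot data=lsm noautolegend;
    scatter x=year y=Estimate / yerrorlower=Lower yerrorupper=Upper;
    xaxis type=discrete
          values=("2009-2010" "2011-2012" "2013-2014" "2015-2016" "2017-2018");
    yaxis label="Adjusted mean of htn";
run;
```

This keeps the reference level in the model unchanged and moves the ordering decision into the plotting step, at the cost of reproducing the look of the original mean plot by hand.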
