This paper was scheduled to be presented at PharmaSUG 2020. The conference was cancelled, but I still wanted to create a page with the paper and macro available for download to share with everyone. This macro has been very helpful to me and my coworkers when doing univariate and multivariate modeling in my meta-analysis database work. The page currently just mirrors the paper, but I would like it to become a resource of examples on how to use the macro.
The research field of clinical oncology heavily relies on the methods of survival analysis and logistic regression. Analyses involve one or more variables within a model, and multiple models are often compared within subgroups. Results are prominently displayed either within a table or graphically with a forest plot. The MVMODELS macro performs every step for a univariate or multivariate analysis: running the analysis, organizing the results into datasets for printing or plotting, and creating the final output as a table or graph. MVMODELS is capable of running and extracting statistics from multiple models at once, performing subgroup analyses, and outputting to most file formats, and it contains a large variety of options to customize the final output. The macro MVMODELS is a powerful tool for analyzing and visualizing one or more statistical models.
Clinical oncology trial endpoints heavily rely on survival or logistic regression analyses to determine whether the trial is positive or negative. These can include endpoints such as overall survival, progression-free survival, and confirmed tumor response status. The analyses are potentially performed within multiple different populations such as protocol defined stratification or descriptive factors. Models are either univariate (consisting of one covariate) or adjusted for other relevant factors in a multivariate model. The macro MVMODELS is a tool designed to handle all of these situations and output the results into either a clean and easy to read table or forest plot. The macro performs the analysis, organizes and combines the results, and outputs the final product all from one macro call. The MVMODELS macro is a powerful tool for any programmer that analyzes clinical trial data.
The data set used in the examples within this paper is randomly generated from the following code:
data random;
    call streaminit(123);
    array u {50};
    do study = 1 to 5; *Studies;
        do i = 1 to 500+floor(rand("Uniform")*500); *Patients;
            do j = 1 to dim(u); *Variables;
                u(j)=rand("Uniform");
            end;
            arm=catx(' ','Arm',1+round(u1,1));
            age=floor(18+62*u2);
            gender=ifc(u3>=0.5,'Male','Female');
            tstage=cats('T',1+floor(4*u4));
            nstage=cats('N',0+floor(3*u5));
            mstage=cats('M',0+floor(2*u6));
            if arm='Arm 1' then response=ifc(u7>0.5,'Response','No Response');
            else if arm='Arm 2' then response=ifc(u7>0.7,'Response','No Response');
            **Follow up to 10 years;
            os_time=ifn(arm='Arm 1',1+floor(120*u13),1+floor(120*u14));
            os_stat=ifn(arm='Arm 1',
                ifn((os_time <=60 and u8>0.35) or
                    (os_time > 60 and u8>0.65),1+floor(3*u9),0),
                ifn((os_time <=60 and u10>0.7) or
                    (os_time > 60 and u11>0.2),1+floor(u12*3),0));
            output;
        end;
    end;
    drop u: i j;
    label study='Study Number' arm='Treatment Arm' age='Age'
          gender='Gender' tstage='T-Stage' nstage='N-Stage'
          mstage='M-Stage' response='Response Status'
          os_time='Overall Survival Time (months)'
          os_stat='Overall Survival Status';
run;
The randomly generated data set is not realistic clinical trial data, but will serve the purpose for the examples in this paper. This data is meant to represent a pooled analysis of five trials that all have the same two treatments (Arm 1 vs Arm 2). Arm 1 represents a treatment that is very aggressive early but is harder for the patient to tolerate, and Arm 2 represents a treatment that is easier on the patient but has less overall efficacy. The data set contains demographic and disease characteristics for use in survival and logistic modeling. The status variable for overall survival has three different events that can happen for competing risks analysis.
The MVMODELS macro performs analysis and outputs the results into either a forest plot or a table. The macro parameters SHOW_TABLE and SHOW_PLOT determine which output is created. The following examples show the flexibility of both the types of analyses that can be performed and the ways the results can be displayed.
A multivariate model requires the flexibility to display different types of covariates in an easy-to-read format. The MVMODELS macro offers multiple ways to display discrete or continuous covariates so that the results are presented in a meaningful format. The following code is an example of running a multivariate model:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat,
CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4,
CONT_STEP=10);
Figure 1. Displays the forest plot from a survival based multivariate model of overall survival based on treatment arm, age, and gender
Figure 1. CAT_DISPLAY=4 will display all levels of the categorical covariate including the reference value.
Table 1. Displays the table from a survival based multivariate model of overall survival based on treatment arm, age, and gender
Table 1. The table follows the same structure as the plot. Alternating row shading is the default. P-value footnotes are automatically created. CONT_STEP changes the units of the continuous variable. Type 1 variables are continuous and type 2 variables are categorical.
Different models are often compared within clinical trials to check the impact of adding one or more adjusting covariates. The MVMODELS macro can run more than one model at a time and can limit the display to one covariate of interest without showing the adjusting factors. This allows an easier comparison of the covariate of interest and a more compact plot or table. The number of models run is controlled by the NMODELS macro option, and different options can be specified for each model by using the pipe symbol as a delimiter (see MODEL_TITLE below). Options without the pipe delimiter are applied to all models (see TIME option below). The following code is an example of running multiple models:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
NMODELS=3, SHOW_ADJCOVARIATES=0,
MODEL_TITLE=Treatment Arm|Treatment Arm*|Treatment Arm**,
FOOTNOTE=*Adjusted for age; **Adjusted for age and gender, HRDIGITS=3,
COVARIATES=arm |arm age |arm age gender, TYPE=2 1 2, CAT_DISPLAY=3,
PVAL_TYPE3=0, HEIGHT=4in);
Figure 2. Displays the results from three different models.
Figure 2. The first model is a univariate model of treatment. The second model adjusts for age, and the third model adjusts for age and gender. CAT_DISPLAY=3 will display the current covariate value without the reference group.
Table 2. Displays the results from three different models.
Table 2. Setting SHOW_ADJCOVARIATES=0 prevents adjusting factors from being displayed allowing the adjusted treatment arm covariate to be compared more easily.
Subgroup analyses are very common in oncology research and meta-analyses. A subgroup analysis involves running the same model within the levels of another variable. An example of this would be comparing treatment arms within different genders. The MVMODELS macro has several options to easily display subgroup analyses in different ways.
The BY parameter allows for one or more variables to be listed. The same model will be run within each level of each variable specified. The order of the BY variable values can be changed with the BYORDER parameter. The following example shows the use of the BY variable:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=1, CONT_DISPLAY=2,
CONT_STEP=10, BOLD_COV_LABEL=0, BY=tstage, SHADING=2, SHOWWALLS=0);
Figure 3. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of T-stage.
Figure 3. SHADING=2 alternates the shading between BY levels to make it easier to visually distinguish groups. CAT_DISPLAY=1 combines two-level covariates into one row to save space. CONT_DISPLAY=2 hides the step size text of the continuous variable label. SHOWWALLS=0 removes the lines bordering the plot area.
Table 3. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of T-stage.
Table 3. The different model outputs can easily be compared between different levels of T-stage.
The ROWBY parameter allows for one variable to be listed. The same model will be run within each level of the variable specified. The order of the ROWBY variable values can be changed with the ROWBYORDER parameter. ROWBY differs from BY in that it further separates the groups into distinct rows that can be separated with lines, and it adds vertical labels at the head of each row. The following example shows the use of the ROWBY variable:
%mvmodels(DATA=random, WHERE=study in(1 2 3), METHOD=survival, TIME=os_time,
CENS=os_stat, CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2,
CAT_DISPLAY=2, CONT_DISPLAY=3,CONT_STEP=10, ROWBY=study, SHOWWALLS=0,
SHADING=0, REFLINE=1, PVAL_COVARIATES=0);
Figure 4. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of study.
Figure 4. SHADING=0 removes the shading. CAT_DISPLAY=2 displays the reference group within the label. CONT_DISPLAY=3 moves the step size text to a new row. REFLINE adds a reference line to the graph to help visually compare estimates. PVAL_COVARIATES=0 disables the covariate level p-values.
Table 4. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of study.
Table 4. ROWBY groups are easily distinguished with separating lines.
The COLBY parameter allows for one variable to be listed. The same model will be run within each level of the variable specified. The order of the COLBY variable values can be changed with the COLBYORDER parameter. COLBY creates one column of summary statistics for each level of COLBY. The following example shows the use of the COLBY variable:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_DISPLAY=2,
CONT_STEP=10, COLBY=response, SHOWWALLS=0,
UNDERLINEHEADERS=1, REFLINE=1, MIN=0, MAX=2, INCREMENT=0.5,
PLOT_DISPLAY=subtitle ev_t hr_plot hr_est_range,
PLOT_COLUMNWEIGHTS=0.2 0.2 0.3 0.3);
Figure 5. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of response.
Figure 5. Each level of COLBY gets a column header and the COLBY label is shown at the top of the graph. UNDERLINEHEADERS underlines the headers of each column label. A vertical line separates each level of COLBY. The axes are set with the MIN, MAX, and INCREMENT parameters. PLOT_DISPLAY determines which summary statistics are shown, and PLOT_COLUMNWEIGHTS manually sets the space of each column.
Table 5. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of response.
Table 5. PLOT_DISPLAY only controls which summary stats are shown in the plot, while TABLE_DISPLAY determines which summary stats are shown in the table. A gap is added between each column.
The GROUPBY parameter allows for one variable to be listed. The same model will be run within each level of the variable specified. The order of the GROUPBY variable values can be changed with the GROUPBYORDER parameter. GROUPBY is useful for creating a very compact graph to compare two or more subgroups side-by-side, such as in case-control comparison graphs. The following example shows the use of the GROUPBY variable:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
BY=study, TIMELIST=36, GROUPBY=arm, MIN=0.70, MAX=1, INCREMENT=0.1);
Figure 6. Displays the 36 month Kaplan-Meier event-free rate for overall survival across each study grouped by arm.
Figure 6. TIMELIST specifies one or more event-free time-points. GROUPBY displays the estimates in the same row but offset and in different colors. A legend is added to identify the different levels of GROUPBY.
Table 6. Displays the 36 month Kaplan-Meier event-free rate for overall survival across each study, grouped by arm.
Table 6. GROUPBY adds a column to identify which level of GROUPBY the estimate belongs to. This option is more visually appealing in the graph, but is still available for the table.
The MVMODELS macro can display more than one graph within the same plot. The graphs must come from the same analysis. The following example shows more than one graph within a plot:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_DISPLAY=2,
CONT_STEP=10, TIMELIST=36, SHOW_MODELSTATS=0, SHOWWALLS=0, MIN=0.8|0.75,
MAX=1.2|0.95, INCREMENT=0.1|0.05, REFLINE=1|, SUMSIZE=9pt,
PLOT_DISPLAY=subtitle ev_t hr_plot hr_est_range km_plot1 km_est_range1);
Figure 7. Graphs hazard ratios and 36 month overall survival event-free rates.
Figure 7. Separate options such as MIN and MAX can be set for each graph. Each graph will have the _PLOT suffix within the PLOT_DISPLAY parameter.
The MVMODELS macro is designed to output the plot and table to multiple destinations and have the same general appearance and style. The following is an example of outputting the table to multiple destinations at the same time:
ods pdf file='~/ibm/test.pdf' notoc bookmarkgen=no startpage=no;
ods excel file='~/ibm/test.xlsx' options (sheet_interval='none');
ods powerpoint file='~/ibm/test.pptx';
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_STEP=10);
ods _all_ close;
Figure 8. Screen shot of the table from the PDF output.
Figure 8. The space between the superscripts and p-values is due to the PDF destination.
Figure 9. Screen shot of the table from the EXCEL output.
Figure 9. The ability to output to EXCEL opens up the freedom of horizontal and vertical space for the table.
ODS POWERPOINT Output
Figure 10. Screen shot of the table from the POWERPOINT output.
Figure 10. The ability to output to POWERPOINT makes it easier to create summary slides from an analysis.
Figure 11. Screen shot of the table from the LISTING output.
Figure 11. The ability to output to the OUTPUT window with ODS LISTING allows for the results to be saved to the .LST file and makes for quick and compact results.
The macro contains a large number of parameters, so it is necessary to have error checking code throughout the macro to try to identify inappropriate macro parameter inputs before they cause errors in the SAS session. The error checking code makes sure variables exist, that required parameters are entered, and that proper values are entered. For instance, if a parameter has a designated list of values, the macro will check whether the user entered an appropriate value. If the user entered a value that does not match the list, then the macro stops, displays an error message, and provides the list of allowed values.
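The check against a designated list of values can be sketched with a small helper macro. This is only an illustration and is not taken from the MVMODELS source: the helper name %CHECK_LIST, the NERRORS counter, and the allowed values shown for METHOD are assumptions made for the example.

```sas
/* Hypothetical helper (not part of MVMODELS): checks a value against a list. */
/* NERRORS is an assumed global counter the calling macro could inspect.      */
%macro check_list(param=, value=, allowed=);
    %local i match;
    %global nerrors;
    %if %length(&nerrors)=0 %then %let nerrors=0;
    %let match=0;
    /* Compare the value with each space-delimited allowed value, ignoring case */
    %do i=1 %to %sysfunc(countw(&allowed, %str( )));
        %if %qupcase(&value)=%qupcase(%scan(&allowed, &i, %str( ))) %then %let match=1;
    %end;
    /* Report the problem and the list of allowed values */
    %if &match=0 %then %do;
        %put ERROR: (MVMODELS: &param) "&value" is not a valid value.;
        %put ERROR: (MVMODELS: &param) Allowed values are: &allowed;
        %let nerrors=%eval(&nerrors+1);
    %end;
%mend check_list;

/* Example call with a misspelled value (the allowed list is assumed) */
%check_list(param=METHOD, value=survivl, allowed=SURVIVAL LOGISTIC);
%put NOTE: Number of parameter errors found: &nerrors;
```

A calling macro could run all of its checks first and then stop before any analysis step if the counter is greater than zero, so the user sees every problem in one pass.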
Creating a forest plot that compares multiple models requires a large amount of code replication. Nearly identical code is repeated to run each model, and nearly identical code is used to extract and combine the results from each model. Programmers not familiar with writing macros will need to spend a great amount of time writing out many of these near-duplicate sections of code to create one forest plot. The chance of a programming error also increases when duplicating the same code, especially if the code needs to be modified. The MVMODELS macro removes the time investment and risk by fully automating two types of analyses: survival analysis and logistic regression. Included within survival analysis are Kaplan-Meier event-free rates, median time-to-event, Cox proportional hazards ratios, and concordance indexes. Included within logistic regression are odds ratios, binomial success rates, and concordance indexes. Each of these analyses is customizable with macro parameters, and the detail of each method is listed in section 4.
The macro generates a data set that is conducive to creating a forest plot or table.
The Graph Template Language (GTL) within the TEMPLATE procedure is used to set up the plot with a combination of the variables in the plot data set and macro variables derived from the plot data set. The actual image is then created using the SGRENDER procedure in combination with ODS Graphics option settings. The image can be a number of file types including PNG, EMF, PDF, JPEG, TIFF, and SVG, and can be embedded into RTF, HTML or PDF destinations.
The REPORT procedure is used to create the output table in combination with a large number of style modifications. The table is formatted to work well in RTF, HTML, PDF, EXCEL, and POWERPOINT destinations.
The macro uses SAS procedures to perform the analyses. The following sections describe which SAS procedures are used to create each available statistic.
The MVMODELS macro can perform regular survival analysis as well as cumulative incidence analysis (SAS 9.4M3+). The methods used to compute statistics differ slightly depending on which survival method is being used.
The MVMODELS macro has the potential to run a large number of statistical models in one macro call with the subgrouping BY, COLBY, ROWBY, and GROUPBY options, so the macro uses methods to streamline how many procedures need to be run to accommodate all of the models. The primary method used is to combine data set duplication with BY statements in the procedures. An example of this would be calculating the five year survival rates in the following example:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
TIMELIST=60, BY=arm gender tstage nstage);
There are eleven five year survival rate estimates that would need to be calculated from this macro call: two for ARM, two for GENDER, four for TSTAGE, and three for NSTAGE. This would normally require four calls to the LIFETEST procedure due to patients existing within multiple BY variables, but the macro bypasses this by duplicating each patient once for every BY variable level that they are in. The data set is set up with the following structure:
Table 7. Shows the duplicated data set structure
Patient ID | BY Variable | BY Variable Level | Survival Time | Survival Status
1 | ARM | 1 | 30 | 0
2 | ARM | 2 | 60 | 1
1 | GENDER | Female | 30 | 0
2 | GENDER | Male | 60 | 1
Table 7. Each unique BY variable and level has its own subgroup within the transformed dataset. Patients exist in multiple subgroups.
The BY variable and BY variable level columns in Table 7 can now be used within the BY statement of the LIFETEST procedure, creating all eleven estimates with only one LIFETEST procedure call instead of four. This method is also used when accommodating any COLBY, ROWBY, and GROUPBY variables as well as when calculating estimates for any categorical covariate levels. All models called by the macro are combined into a single dataset so that there is only one LIFETEST procedure called in the entire macro when calculating survival estimates, number of patients and events, and median time-to-event estimates.
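The duplication technique can be sketched as follows for the four BY variables in the macro call above. This is a minimal illustration and not the macro's internal code: the data set names STACKED and KM60 and the variable names BYVAR and BYLEVEL are assumptions.

```sas
/* Write each patient once per BY variable (BYVAR/BYLEVEL are illustrative names) */
data stacked;
    set random;
    length byvar $32 bylevel $40;
    byvar='ARM';    bylevel=arm;    output;
    byvar='GENDER'; bylevel=gender; output;
    byvar='TSTAGE'; bylevel=tstage; output;
    byvar='NSTAGE'; bylevel=nstage; output;
run;

/* BY-group processing requires sorted data */
proc sort data=stacked;
    by byvar bylevel;
run;

/* A single LIFETEST call estimates the 60-month rate in all eleven subgroups */
proc lifetest data=stacked timelist=60 reduceout outsurv=km60 plots=none;
    by byvar bylevel;
    time os_time*os_stat(0);
run;
```

The stacked data set has four times as many rows as the input, which trades memory for a single procedure call. REDUCEOUT limits the OUTSURV= data set to the time point requested in TIMELIST=.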
This method is modified when running models with the PHREG or LOGISTIC procedures. The difficulty encountered is that multiple models can have varying numbers of covariates included and different stratification factors. The MVMODELS macro accounts for this in three steps:
A visual example of this method is shown in Table 8 below. Model one has Arm and Gender for covariates. Model two only has Arm for a covariate.
Table 8. Shows the data set structure for running the PHREG procedure
Model | Patient ID | Arm | Gender | Covariate 1 | Covariate 2
1 | 1 | 1 | Male | 1 | 1
1 | 2 | 2 | Female | 0 | 0
2 | 1 | 1 | Male | 1 | 0
2 | 2 | 2 | Female | 0 | 0
Table 8. Model 1 uses Female and Arm 2 as reference groups, so the generic covariate variables are assigned values of 0. The other values are the numeric order of the remaining values. Model 2 does not use Gender as a covariate, so all values of the covariate 2 variable are 0.
All models are run with a single PHREG or LOGISTIC procedure call when using this methodology for the categorical covariates, continuous covariates, and strata variables along with the earlier method of duplicating the data set for each subgroup. One note is that all stratified and unstratified models are run separately in the LOGISTIC procedure as different results are created when using a stratification variable with all of the same levels versus not including the stratification variable completely.
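The generic covariate structure from Table 8 can be sketched for the two models described there. This is an illustration under stated assumptions and is not the macro's internal code: the data set name MODELS and the variables MODEL, COV1, and COV2 are hypothetical, and only two-level categorical covariates are shown.

```sas
/* Stack the data once per model; COV1 and COV2 are illustrative generic covariates */
data models;
    set random;
    /* Model 1: arm and gender, with reference levels Arm 2 and Female coded 0 */
    model=1; cov1=(arm='Arm 1'); cov2=(gender='Male'); output;
    /* Model 2: arm only, so the unused generic covariate is held at 0 */
    model=2; cov1=(arm='Arm 1'); cov2=0; output;
run;

proc sort data=models;
    by model;
run;

/* One PHREG call fits both models through BY-group processing */
proc phreg data=models;
    by model;
    model os_time*os_stat(0) = cov1 cov2 / risklimits;
run;
```

Because COV2 is constant within model 2 it carries no information there, which is what allows models with different numbers of covariates to share a single MODEL statement.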
After analyzing the requested models, the MVMODELS macro combines the results into a single dataset that is ideal for graphing and printing. The macro then builds a graph template with the TEMPLATE procedure and prints the table with the REPORT procedure.
The plot dataset is constructed using a combination of the SQL procedure and DATA steps. A DATA step is used to set up a structure based on input macro parameters for merging in the analysis results with the SQL procedure. The SQL procedure is used for merging because of the flexibility the procedure has with being able to use functions and other logic conditions in the merging process.
The variables that make up the plot data set are separated into different subtypes.
Each row of the forest plot has a rowheader, or subtitle, to show the variable label or value. The row header is contained within one variable:
There are numeric variables for each calculated estimate, such as hazard ratio or odds ratio, and the upper and lower confidence limits. These variables are used for making the scatterplot and error bars in the plot.
Multiple estimates are available to be plotted and displayed depending on the method used, so the PREFIX_ term changes depending on which type of estimate is being captured. For example, the hazard ratios have the prefix HR_, odds ratios have the prefix OR_, and concordance indexes have the prefix C_. There are additional variables that are combinations of these variables. These are character variables with pre-specified rounding (set by macro parameters):
The section for number of patients and events has additional variables that can be used in the summary statistics panel with special formats to save space:
There are additional variables when utilizing Cox or logistic modeling that have the prefix REF_. These variables contain the number of patients and/or number of events of just the reference group.
There are two variables allocated for the p-values.
There are three different indicator variables in the plot data set to serve the following purposes:
These variables are automatically calculated by the macro.
There are four different grouping categories that the macro uses: BY variables, a COLBY (column by) variable, a ROWBY (row by) variable, and a GROUPBY (group by) variable. There can be multiple BY variables, but only one COLBY, one ROWBY, and one GROUPBY variable. These are optional and increase the efficiency of doing subgroup analyses.
Variables for the BY groups include:
No variables are added for the COLBY variable. Instead, all of the summary and plot columns are duplicated for each level of the COLBY variable and a suffix (example: TOTAL_1) is added, where the number represents the level of the COLBY variable.
Variables for the ROWBY groups include:
ROWBY_LVL: designates the level of the current ROWBY variable. This is a numeric variable where the nth value corresponds to the nth level of the variable. For example, if gender has the levels 1=Male and 2=Female and ROWBY_LVL=1, then the row represents Male (the 1st level).
Variables for the GROUPBY groups include:
The GROUPBY_LVL variable is used for the discrete attribute map to color the scatter plot and highlow plot if GROUPBY is specified.
The macro uses the Graph Template Language within the TEMPLATE procedure in combination with the SGRENDER procedure to produce the final graph. The template splits the graph into sections or panels: the subtitle panel, the plot panels, and the statistical summary panels using the LATTICE layout within GTL. The lattice creates a plot space that has one column for each item being displayed by the DISPLAY parameter and one row for each value of the ROWBY variable. Each column is considered a section or panel.
As the Graph Template Language has evolved from SAS 9.2 to 9.4+, producing a forest plot has become much easier with new functionality such as AXISTABLE and TEXTPLOT. However, the MVMODELS macro does not use these new features and instead relies primarily on annotation.
SAS 9.3 introduced annotation to the SG procedures and with it the DRAW functions within GTL. The draw functions, DRAWTEXT, DRAWLINE, DRAWARROW, and DRAWPOLYGON, allow the user to manually annotate a graph with text, lines, and other shapes. The creation of these annotations is generally more tedious and much less flexible compared to data driven graphing, but they have a distinct advantage within multi-panel graphs which is why the MVMODELS macro uses the annotation method instead of being entirely data driven.
One of the greatest challenges of designing a forest plot is allocating enough space to each section so that the graph is easy to read and none of the text involved gets cut-off. Annotation text does not get cut-off by the graph space and allows the text to flow across multiple panels. This comes in handy for having long subsection headers, model titles, or longer estimate values.
Figure 12. Displays an example of annotation crossing over panels versus the graph getting cut-off.
Figure 12. The red panels indicate where the layout panels end for each column. The model title is able to stretch across all seven panels in this example, whereas in a normal data-driven graph it would be cut off within the first panel.
Utilizing the macro facility removes the tediousness of writing out each annotation separately and fully automates the process. The MVMODELS macro pulls the values from the plot data set along with the coordinates to plot them and writes each value within the GTL environment without input from the user.
The DRAWTEXT function allows the use of Unicode characters, superscripts and subscripts in the text it creates. Normal data-driven graph elements such as labels and AXISTABLE values do not handle superscripts and subscripts.
The greatest limitation of annotation is that the graph does not allocate space automatically to fit it. This means that the programmer must use other means to define graph space in order to properly show the annotation. This can be as simple as printing a blank space of the same size and font as the DRAWTEXT text in the spot where the user wants the text.
The subtitle panel contains the row headers for the other two panels and includes the model titles, the covariate labels, and the levels of the covariate. The macro creates the forest plot on a row-by-row basis. Each row of the graph will have a row header (or subtitle), at least one possible scatterplot, and optional columns of summary statistics. Multiple models can be run and displayed in the plot, and model titles go before any covariates from a model are listed. Covariate labels are listed before the covariate levels. For example, for gender there would be a row with a label such as "Gender" followed by up to two rows (depending on display options) designating a "Male" row and a "Female" row.
The subtitles are drawn entirely with DRAWTEXT statements, which like ENTRY statements can be left aligned and can have spaces added in front for indentation. Unlike ENTRY statements however, the coordinates for the text can be specified with the DRAWTEXT statement. This allows the text to be aligned with the y-axis from the plot panel. Each subtitle has its own DRAWTEXT statement. DRAWTEXT statements are not bound by the walls of a layout or lattice cell, so longer model titles or variable labels can fit with less white space in the graph.
There are only three components to the plot: the scatterplot, the confidence bounds, and the reference line. The reference line is drawn with a REFERENCELINE statement. The scatterplots are drawn with the SCATTERPLOT statements, and the confidence bounds are drawn with the HIGHLOWPLOT statement. The HIGHLOWPLOT statement is used instead of the error bar options within the SCATTERPLOT statement for two reasons. The first is that when the line thickness is increased the endcaps on the error bars with SCATTERPLOT increase dramatically in vertical height which looks unprofessional. The error bar endcaps stay the same size or increase at a much slower rate when the line thickness is increased with the HIGHLOWPLOT statement. The second reason is that there is an option in the HIGHLOWPLOT statement to point to variables that determine if the error caps are drawn. The option to point to a variable that determines if an error bar endcap is drawn is helpful as the Data step can be used to determine if the bars exceed the MIN or MAX parameters and programmed accordingly with IF/ELSE logic. The macro does not draw the error bar endcaps if the error bars extend beyond the maximum or minimum values of the axis.
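The endcap logic can be sketched with a small stand-alone example. This is an illustration and not the macro's own template: the data set EST, the template name CAPDEMO, the variables LCAP and HCAP, the axis limits, and the estimates in the DATALINES block are all made up for demonstration.

```sas
%let min=0.5;   /* assumed axis minimum for this sketch */
%let max=2;     /* assumed axis maximum for this sketch */

data est;
    input row hr lcl ucl;
    length lcap hcap $5;
    /* Draw an endcap only when the confidence limit lies inside the axis range */
    lcap = ifc(lcl < &min, 'NONE', 'SERIF');
    hcap = ifc(ucl > &max, 'NONE', 'SERIF');
    /* Truncate the bar at the axis limits */
    lcl = max(lcl, &min);
    ucl = min(ucl, &max);
    datalines;
1 0.90 0.70 1.15
2 1.40 0.80 2.60
3 0.60 0.35 1.05
;
run;

proc template;
    define statgraph capdemo;
        begingraph;
            layout overlay / xaxisopts=(linearopts=(viewmin=&min viewmax=&max))
                             yaxisopts=(reverse=true display=none);
                referenceline x=1;
                /* LOWCAP= and HIGHCAP= point to the columns holding the cap type */
                highlowplot y=row low=lcl high=ucl / lowcap=lcap highcap=hcap;
                scatterplot y=row x=hr / markerattrs=(symbol=squarefilled);
            endlayout;
        endgraph;
    end;
run;

proc sgrender data=est template=capdemo;
run;
```

In this made-up data the second row exceeds the axis maximum and the third row falls below the axis minimum, so those bars are truncated and drawn without an endcap on the affected side.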
Each model created by the macro can be colored separately, sized differently, and have unique symbols. This is done by building a discrete attribute map and applying it to the model number variable.
The statistical summary panels are drawn similarly to the subtitle panel, but without the need for indenting or changing font weights. They are centered within their column of the LATTICE layout and drawn with the DRAWTEXT statement.
Reference guides are text that describes what an area of the graph means compared to a reference line (e.g. “Males perform better” or “High-grade perform worse”). These are generally paired up with an arrow to show the direction that the reference guides are referring to. The MVMODELS macro creates these with a combination of DRAWTEXT and DRAWARROW. Additional rows are added into the graph in order to allocate space. The text that is printed is directly input by the user with the REFGUIDELOWER and REFGUIDEUPPER parameters. The following is an example of adding reference guides:
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm, by=gender, TYPE=2, CAT_DISPLAY=5, pval_type3=0,
HEIGHT=4in, REFLINE=1,
REFGUIDELOWER=Favors Arm 1,REFGUIDEUPPER=Favors Arm 2);
Figure 13. Example of adding reference guides to bottom of the plot space
Figure 13. The reference guides originate from the reference line and point towards the minimum and maximum values. The text is manually specified by the user.
The reference guides can also be printed at the top of the graph space, and line breaks are inserted with the ` symbol.
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
COVARIATES=arm, by=gender, TYPE=2, CAT_DISPLAY=5, pval_type3=0,
HEIGHT=4in, REFLINE=1,
REFGUIDEVALIGN=top,REFGUIDEHALIGN=in,REFGUIDELINEARROW=open,
REFGUIDELOWER=Favors`Arm 1,REFGUIDEUPPER=Favors`Arm 2);
Figure 14. Example of adding reference guides to top of the plot space
Figure 14. REFGUIDEVALIGN=top moves the reference guides to the top. REFGUIDEHALIGN=in aligns the reference guides against the reference line. The ` symbol creates line breaks.
The table is printed using the REPORT procedure due to the customizability the procedure offers. The capability to make compute variables on-the-fly, make style modifications on-the-fly, and create spanning headers easily makes the REPORT procedure the go-to procedure for creating high quality analysis output.
The MVMODELS macro is able to output to several ODS destinations including LISTING, RTF, PDF, EXCEL, and POWERPOINT. Each of these destinations has its own programming quirks and challenges, so the macro generalizes wherever possible but ultimately has separate code adjustments for each destination. The macro creates a style template using the TEMPLATE procedure to set up a majority of the output table, and then fine-tunes the final REPORT procedure depending on the destination. One REPORT procedure is run for each destination currently open.
The REPORT procedure allows the creation of COMPUTE variables that are not in the input data set. These variables do not have to contain a value, so the macro uses COMPUTE variables as “dummy” variables. They are assigned a missing value and given a fixed width to create a space between different values of the COLBY variable.
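The spacer technique can be sketched as follows. This is a minimal illustration, not the macro's own code: SASHELP.CLASS stands in for the analysis data, and the column name _gap_ and the 0.5in width are assumptions chosen for the example.

```sas
/* Sketch: a computed "dummy" column used only as a spacer.           */
/* _gap_ is a hypothetical name; it does not exist in the input data. */
proc report data=sashelp.class(obs=5) nowd;
    column name sex _gap_ age height;
    define name   / display 'Name';
    define sex    / display 'Sex';
    /* Blank header and a fixed cell width create the visual gap */
    define _gap_  / computed ' ' style(column)=[cellwidth=0.5in];
    define age    / display 'Age';
    define height / display 'Height';
    compute _gap_ / character length=1;
        _gap_ = ' ';   /* missing (blank) value, so nothing is printed */
    endcomp;
run;
```

The cell width takes effect in ODS destinations such as RTF, PDF, or HTML; the LISTING destination sizes columns from variable widths instead.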
The REPORT procedure allows one variable to be used multiple times and renamed. The macro uses a blank variable as an across variable multiple times in order to add a header across selected variables. These can be nested in order to create multiple spanning headers.
The MVMODELS macro creates an indicator variable for whether the value should be bold or regular font weight, and creates a variable for how many indents the text should have. A COMPUTE block within the REPORT procedure is used to add these style modifications to the SUBTITLE column.
The MVMODELS macro creates an indicator variable for whether a row should be shaded. A COMPUTE block within the REPORT procedure applies a style modification to the entire row depending on this indicator variable.
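A minimal sketch of both style techniques (bold/indent on one column, shading on a whole row) is shown below. It assumes SASHELP.CLASS as stand-in data; the data set _demo and the variables indent and shadeind are hypothetical names, and the rules used to fill the indicators are arbitrary. Only the general CALL DEFINE pattern is the point.

```sas
/* Build demo indicators (all rules here are arbitrary assumptions) */
data _demo;
    set sashelp.class(obs=6);
    boldind  = (mod(_n_, 3) = 1);   /* 1 = print the label in bold  */
    indent   = 1 - boldind;         /* number of indent levels      */
    shadeind = mod(_n_, 2);         /* 1 = shade the entire row     */
run;

proc report data=_demo nowd;
    /* Indicator columns sit to the left so later COMPUTE blocks can read them */
    column boldind indent shadeind name age;
    define boldind  / display noprint;
    define indent   / display noprint;
    define shadeind / display noprint;
    define name     / display 'Name';
    define age      / display 'Age';
    /* Column-level styling: bold and indentation on NAME only */
    compute name;
        if boldind = 1 then
            call define(_col_, 'style/merge', 'style=[font_weight=bold]');
        if indent > 0 then
            call define(_col_, 'style/merge',
                'style=[leftmargin=' || strip(put(indent * 0.12, 8.2)) || 'in]');
    endcomp;
    /* Row-level styling: computed on the last column so all cells are covered */
    compute age;
        if shadeind = 1 then
            call define(_row_, 'style/merge', 'style=[background=cxEFEFEF]');
    endcomp;
run;
```

Using 'style/merge' rather than 'style' lets the bold, indent, and shading attributes accumulate instead of overwriting one another.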
The LISTING destination is a much tougher destination in which to produce easy-to-read tables because the length of each variable matters. Variables are typically created with lengths or formats longer than necessary to hold their contents, and in the LISTING destination this simply adds blank space to a column. The MVMODELS macro therefore creates a separate dataset for printing to the LISTING destination, in which it finds the longest value in each character variable and sets the length of the variable to the length of that value.
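The idea of trimming a character variable to its longest value can be sketched as below. This is an illustration under assumptions, not the macro's implementation: SASHELP.CLASS is stand-in data, and _listing, _name, and the macro variable maxlen are hypothetical names.

```sas
/* Find the longest value actually stored in NAME */
proc sql noprint;
    select max(length(name)) into :maxlen trimmed
    from sashelp.class;
quit;

/* Rebuild NAME with exactly that length.                        */
/* The rename avoids a "multiple lengths" warning on the SET.    */
data _listing;
    set sashelp.class(keep=name age rename=(name=_name));
    length name $&maxlen;
    name = _name;
    drop _name;
run;

proc print data=_listing noobs;
run;
```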
To create lines that cross the whole page, the REPEAT function is used along with the current LINESIZE option value. The REPEAT function repeats a character a specified number of times, so repeating the hyphen according to the LINESIZE value draws a dashed line across the entire page. A similar technique is used to create the underlines in each column header. A dashed line the length of each variable plus four (for the space between columns) is drawn under each column label. Spanning headers are added to the COLUMNS statement with a line drawn underneath whose length is the sum of the lengths of the variables in the spanning header.
The MVMODELS macro is a powerful and flexible tool that was built to handle the large variety of modeling prevalent in clinical oncology trials. The macro outputs both a professional-looking graph and a summary table to a multitude of ODS destinations and has many options to fine-tune the appearance to match the user’s preference. The method used to create the forest plot is applicable to many other types of graphs. MVMODELS is an incredibly useful macro for any programmer performing survival or logistic regression analyses.
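A small DATA step sketch of the line-drawing arithmetic follows. The variable names and the column width of 12 are assumptions for illustration; the macro itself may build these strings differently.

```sas
data _null_;
    length fullline $256 underline $64;
    /* Current LINESIZE option value, converted to a number */
    ls = input(getoption('linesize'), 8.);
    /* REPEAT(s, n) returns the original string plus n more copies, */
    /* so n = ls - 1 yields exactly ls hyphens                      */
    fullline = repeat('-', ls - 1);
    put fullline;
    /* Column underline: variable width plus four for column spacing */
    width = 12;                        /* hypothetical column width */
    underline = repeat('-', width + 4 - 1);
    put underline;
run;
```

The off-by-one in REPEAT is the detail most easily missed: passing the LINESIZE value directly would produce a line one character too long.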
1 Therneau T (2014). _A Package for Survival Analysis in S_. R package version 2.37-7, <URL: http://CRAN.R-project.org/package=survival>.
2 Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36, 1982.
Your comments and questions are valued and encouraged. Contact the author at:
Name: Jeffrey Meyers
Enterprise: Mayo Clinic
Is there a method for adding interaction terms in the logistic regression?
Hello @PharmlyDoc,
I uploaded my most recent version of the macro, which includes the ability to display an interaction p-value. It is the p-value for the interaction between a BY variable and the first covariate (if it's categorical). The DISPLAY keyword is PVAL_INTER. Please see the documentation for other updates.
Thanks, though I'm not understanding how to insert an interaction term.
In proc logistic I'd specify an interaction term as sex*race or sex|race in the model statement, but doing that in the covariates statement does not work.
I like that method because it's simple and outputs estimates on the forest plot at different levels of categorical variables in the interaction term that are written in a detailed manner (e.g. race African American vs Caucasian at sex = female; race African American vs Caucasian at sex = male; sex male vs female at race=African American; sex male vs female at race=Caucasian).
I noticed the macro code has the ParameterEstimates output – so now I'm trying to figure out if I can run a normal proc logistic and output its ParameterEstimates into the macro.
Another reason I don't like using the by= option is because I added a third refline (called trefline) where I have refline=1, srefline=0.5, and trefline=1.5, and using the by= option sends my reference lines to the left side of the forest plot.
proc logistic data=heart plots(only)=(EFFECT ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT ));
class race(ref='white') sex(ref='female') diabetes(ref='no') hypertension(ref='no') /param=ref ;
model heartdisease(event='0')=age race|sex diabetes hypertension LDL CCI median_income / orpvalue clodds=wald lackfit;
oddsratio age / cl=wald;
oddsratio race / cl=wald;
oddsratio sex / cl=wald;
oddsratio diabetes/ cl=wald;
oddsratio hypertension / cl=wald;
oddsratio LDL / cl=wald;
oddsratio CCI / cl=wald;
oddsratio median_income / cl=wald;
ods output ParameterEstimates=dPara;
ods output OddsRatios=odssratio;
ods output ClassLevelInfo=ClassLevelInfo;
ods output OddsRatiosWald=dCIWald;
ods output Nobs=dNum;
ods output ResponseProfile=dEvent;
output out= _tempdsn ;
run;
%mvmodels (DATA=heart,METHOD=LOGISTIC, NMODELS=1,
EVENTCOV=heartdisease,EVENT=0,COVARIATES=age race sex diabetes hypertension LDL CCI median_income , TYPE=1 2 2 2 2 1 1 1 ,
CONT_DISPLAY=2, CAT_DISPLAY=4, cat_ref=white`female`no`no, XAXISTYPE=log, LOGBASE=10, XAXISLABEL=Odds Ratios (log scale),
PVAL_TYPE3=0,
DISPLAY=subtitle or_plot or_est_range , show_adjcovariates=1, refline=1, SREFLINE=0.5, SREFLINEPATTERN=SHORTDASH,
TREFLINE=1.5, TREFLINEPATTERN=SHORTDASH, LOGTICKSTYLE = LOGEXPAND, LOGMINOR=TRUE,
LINECOLOR=blue, REFGUIDELOWER=Favors Heart Disease, REFGUIDEUPPER=Favors No Disease);
Hello @PharmlyDoc. Unfortunately the macro does not display output from interaction models beyond the p-value of the interaction term (between the BY variable and the first categorical covariate). This follows the guidance of the statisticians where I work, since displaying output from interaction models appropriately can be difficult with a macro designed to be as generic as this one.
Hi @JeffMeyers . Is there an option to alternate shading by variable, instead of alternating by row?
Similar to the option you have for %TABLEN where "shading=2" ?
And can you add options to create a third refline and the ability to label the second and third reflines?
I investigated the OUT_PLOT dataset and noticed that the "boldind" variable could be a candidate for shading by variables.
After tweaking your code I was able to get it to work. I just can't display 1 for the refline when I have 1.5 as a tickvalue.
%mvmodels (DATA=Neuralgia, METHOD=LOGISTIC, NMODELS=1,
EVENTCOV=Pain,EVENT=Yes,COVARIATES=Treatment Sex Age Duration, TYPE=2 2 1 1,
CONT_DISPLAY=2, CAT_DISPLAY=4, cat_param=ref,
PVAL_TYPE3=0,
DISPLAY=subtitle or_plot or_est_range pval , show_adjcovariates=1, pval_covariates=1,
XAXISTYPE=log, XAXISLABEL=Odds Ratios (log scale), refline=1, SREFLINE=0.5, SREFLINEPATTERN=SHORTDASH,
TREFLINE=1.5, TREFLINEPATTERN=SHORTDASH,
LOGMINOR=TRUE, tickvalues= 0.001 0.01 0.1 0.5 1.5 10,
LINECOLOR=blue, REFGUIDELOWER=Lower Odds of Pain, REFGUIDEUPPER=Higher Odds of Pain,
shading=2, OUT_PLOT=Plot);
With %TABLEN and shading=2 it shades on varlevel=. which produces the shading scheme I was looking for.
%tablen(
DATA = NEURALGIA,
VAR = SEX AGE DURATION PAIN,
TYPE = 2 1 1 2,
BY = TREATMENT, shading=2, debug=1);
Sorry for the constant questions.
Is there a way to enter cat_ref values that contain commas such as for income?
e.g. cat_ref=male`hispanic`$80,000, cat_param=ref,
PVAL_TYPE3=0, DISPLAY=subtitle or_plot or_est_range pval , etc...
Hello @PharmlyDoc,
That shading option is supposed to be there when SHADING=3, but looks like some other updates I made broke it. I just fixed the code, but before making another update wanted to confirm what you mean by label for the reference lines?
Also, to set a macro parameter with commas (or other punctuation) use the %str() function. For example: cat_ref=male`hispanic`%str($80,000),
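To see why %str() is needed, here is a small self-contained sketch. The macro show_ref is hypothetical and only echoes its parameters; it is not part of MVMODELS.

```sas
/* Hypothetical helper that just prints the values it receives */
%macro show_ref(cat_ref=, cat_param=);
    %put NOTE: cat_ref=&cat_ref;
    %put NOTE: cat_param=&cat_param;
%mend show_ref;

/* %str() masks the comma so $80,000 stays inside CAT_REF */
%show_ref(cat_ref=male`hispanic`%str($80,000), cat_param=ref);

/* Without %str() the comma ends the CAT_REF value and "000" is read  */
/* as a positional parameter, which SAS rejects after a keyword one:  */
/* %show_ref(cat_ref=male`hispanic`$80,000, cat_param=ref);           */
```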
@JeffMeyers Thanks for the help.
By "Label for the reference lines" I mean to put the value of the refline on the x-axis. If I have a second reference line at 0.5 then I want the number 0.5 on the x axis directly below second refline. But I guess I already achieved that desired effect on the forest plot above by stating:
tickvalues= 0.001 0.01 0.1 0.5 1.5 10,
Hello Jeff
Thank you so much for creating all this wonderful time saving code.
I am having trouble getting the correct coding to change the reference category with the macro.
For example, I have these formatted reference categories from my regular PHREG CLASS statement:
class agemo_dx3(ref='Age: 0 to < 36 months') mole_cat2 (ref='Group4') meta_cat3 (ref='M0') ext_res (ref='GTR/NTR') Conv_Chemo_Only_Upfront dx_to_rel_cat2 (ref='12 & longer months') pat_rel(ref='Local Only') rel_Combo_tx2(ref='CSI alone') gtr_surg (ref='GTR')/param=ref;
I have tried several iterations for the macro -
%mvmodels
(DATA=intentmol, METHOD=SURVIVAL, NMODELS=1, TIME=time_ev_rel_yr, CENS=event, CEN_VL=0,
COVARIATES=agemo_dx3 mole_cat2 meta_cat3 ext_res Conv_Chemo_Only_Upfront dx_to_rel_cat2 pat_rel rel_Combo_tx2 gtr_surg ,
TYPE= 2, CAT_REF = ' Age: 0 to < 36 months' Group4 ' M0 ' GTR/NTR ' Yes ' 12 & longer months ' Local Only 'CSI alone ' GTR , SHOW_ADJCOVARIATES=1);
Please advise on how to properly refer to the reference?
Also with your %newsurv MACRO I was wondering if there is a way to remove ' + Censor' from the text box as I do want to show the Log-rank p-value just not this label.
You also reference the 2020 newsurv. Where would it be located?
thank you,
Hello @kjmathes03, thank you for the nice comments. I believe the issues are possibly the following:
1) I can't tell if you're using a backtick (`), the unshifted tilde key, in the CAT_REF statement or quotation marks ('). The font makes them look like quotation marks, and if so you want to use the backtick unless there's a quote in the string.
2) Don't add spaces to the front and back of the reference group text. The macro assumes all spaces are part of the string. (Age: 0 to < 36 months`Group4`M0`GTR/NTR`Yes`12 & longer months`Local Only`CSI alone`GTR).
Depending on the version of NEWSURV that you have, there's an option CENSORMARKERS=2 that keeps the censor markers but removes the legend for them. The macro is available to download on the communities page like this one, and there is a version attached with _2020 that is the version you're referencing. If you e-mail me at meyers.jeffrey@mayo.edu I can send you the newest version I have as well.
Hello Again
I was just wondering if the macro for NEWSURV was capable of incorporating competing risk outcomes when an ID is repeated?
For example the outcome is time to lead failure with each ID able to have more than one lead with a separate lead failure time. Death in this case is the competing risk for lead failure.
I know PHREG uses COVS(AGGREGATE) and an ID statement.
thank you,
Hello again @JeffMeyers ,
Is there a method for adding the "units statement" for continuous variables in the logistic regression?
I'm using the 02/17/2021 version.
Let's say I have a variable for count of admissions (CNT_ADMT) and another for count of medications (CNT_MED). And I want to see the odds ratios for every 2 unit increase in admissions and for every 3 unit increase in medications:
proc logistic data=PATIENTS PLOTS(ONLY MAXPOINTS=NONE)=(EFFECT ROC ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=20 ;
class SEX(ref='male') RACE(ref='non-Hispanic White') / param=ref;
model MI(event='1') = SEX RACE CNT_ADMT CNT_MED / CLODDS=wald orpvalue RSQ lackfit;
oddsratio sex/ cl=wald diff=ref;
oddsratio race/ cl=wald diff=ref;
oddsratio CNT_ADMT/ cl=wald diff=ref;
oddsratio CNT_MED/ cl=wald diff=ref;
UNITS CNT_ADMT = 2;
UNITS CNT_MED = 3;
run;
quit;
/*LOGISTIC Models*/
%if (&&_ncat&i>0 or &&_ncont&i>0) and %sysevalf(%qupcase(&method)=LOGISTIC,boolean) and &_interp=1 %then %do;
proc logistic data=_tempdsn&i.;
/**Splits analysis by current BY variable**/
by modelnum _rowby_lvl _colby_lvl _groupby_lvl _by_num;
%if &&nstrata&i >0 %then %do;
strata %do j=1 %to &&nstrata&i;
_strata_&j
%end; / missing;/**Applies Stratification**/
%end;
class _by_lvl
/**Creates class covariates with reference groups**/
%do j = 1 %to &&_ncat&i;
_cat_val_&j
%end; / param=&cat_param;
/**Runs model statement**/
model _event_val (event='1') /**Time and status variables**/=
%if %qscan(&&type&i,1,%str( ))=1 %then %do; _by_lvl _cont_1*_by_lvl %end;
%else %do; _by_lvl _cat_val_1*_by_lvl %end;
%do j = 1 %to &&_ncat&i;/**Character covariates**/
_cat_val_&j
%end;
%do j = 1 %to &&_ncont&i;/**Continuous covariates**/
_cont_&j
%end;;
/**Outputs temporary datasets**/
ods output modelanova=_interp (where=(find(effect,'*'))) /**Type 3 p-values**/;
run;
%end;
%if (&&_ncat&i>0 or &&_ncont&i>0) and %sysevalf(%qupcase(&method)=LOGISTIC,boolean) and
(&_hr=1 or &_cindex=1 or
(&_pval=1 and (&&pval_covariates&i=1 or &&pval_type3&i=1))) %then %do;
proc logistic data=_tempdsn&i.;
/**Splits analysis by current BY variable**/
by model;
%if &&nstrata&i >0 %then %do;
strata %do j=1 %to &&nstrata&i;
_strata_&j
%end;;/**Applies Stratification**/
%end;
class
/**Creates class covariates with reference groups**/
%do j = 1 %to &&_ncat&i;
_cat_val_&j (ref="0")
%end; / param=&cat_param;
/**Runs model statement**/
model _event_val (event='1') /**Event variables**/=
%do j = 1 %to &&_ncat&i;/**Character covariates**/
_cat_val_&j
%end;
%do j = 1 %to &&_ncont&i;/**Continuous covariates**/
_cont_&j
%end;
/ clodds=wald;
/**Calculate Odds Ratios**/
%do j = 1 %to &&_ncat&i;
oddsratio "_cat_val_&j" _cat_val_&j / diff=ref;
%end;
%do j = 1 %to &&_ncont&i;
oddsratio "_cont_&j" _cont_&j;
%end;
/**Outputs temporary datasets**/
ods output parameterestimates=_parm /**Parameter estimates and wald p-values**/
oddsratioswald=_odds /**Odds Ratios**/
%if &&_ncat&i>0 %then %do; modelanova=_t3 /**Type 3 p-values**/ %end;
globaltests=_gpval /**Global model p-values**/
fitstatistics=_fit /**Model fit statistics**/;
%if &_cindex=1 and &&nstrata&i=0 %then %do;
output out=_preds predicted=_pred_ xbeta=_xbeta_ /**Outputs the betas for C-index calculations**/;
%end;
run;
Hello @PharmlyDoc
Sorry I haven't been able to comment on here in some time. The option you are looking for is CONT_STEP. This allows you to change the step size for a continuous covariate. You can list multiple step sizes for different variables by using a space delimited list (e.g. CONT_STEP=2 3).
Hello @PharmlyDoc @JeffMeyers
I am trying to reproduce Figure 1, but with p-values from log-rank test.
I tried to add pval_by=1, but it did not work. Any suggestions? Thanks.
%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat,
CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2, pval_by=1, CAT_DISPLAY=4,
CONT_STEP=10);
Hello @PKristanto ; The Log-rank test is only available with the BY level variables in this macro. E.g. if BY=ARM and you were getting KM values by arm you could then get a log-rank p-value comparing the arm strata.