
%MVMODELS: a Macro for Survival and Logistic Analysis

Started ‎03-27-2020 by
Modified ‎07-19-2021 by

This paper was scheduled to be presented at PharmaSUG 2020.  The conference was cancelled, but I still wanted to create a page with the paper and the macro available for download to share with everyone.  The macro has been very helpful to me and my coworkers when doing univariate and multivariate modeling in my meta-analysis database work.  The page currently just mirrors the paper, but I would like it to become a resource of examples on how to use the macro.

 

Abstract

The research field of clinical oncology relies heavily on the methods of survival analysis and logistic regression.  Analyses involve one or more variables within a model, and multiple models are often compared within subgroups.  Results are typically displayed either in a table or graphically in a forest plot.  The MVMODELS macro performs every step of a univariate or multivariate analysis: running the analysis, organizing the results into data sets for printing or plotting, and creating the final output as a table or graph.  MVMODELS can run and extract statistics from multiple models at once, perform subgroup analyses, output to most file formats, and customize the final output through a large variety of options.  The MVMODELS macro is a powerful tool for analyzing and visualizing one or more statistical models.

 

Introduction

Clinical oncology trial endpoints rely heavily on survival or logistic regression analyses to determine whether the trial is positive or negative.  These can include endpoints such as overall survival, progression-free survival, and confirmed tumor response status.  The analyses are often performed within multiple populations, such as those defined by protocol-specified stratification factors or descriptive factors.  Models are either univariate (consisting of one covariate) or adjusted for other relevant factors in a multivariate model.  The MVMODELS macro is designed to handle all of these situations and output the results into either a clean, easy-to-read table or a forest plot.  The macro performs the analysis, organizes and combines the results, and outputs the final product, all from one macro call.  The MVMODELS macro is a powerful tool for any programmer who analyzes clinical trial data.

 

Sample Data Set for Examples

The data set used in the examples within this paper is randomly generated from the following code:

data random;
    call streaminit(123);
    array u {50};

    do study = 1 to 5;*Studies;
        do i = 1 to 500+floor(rand("Uniform")*500);*Patients;
            do j = 1 to dim(u);*Variables;
                u(j)=rand("Uniform");
                end;
            arm=catx(' ','Arm',1+round(u1,1));
            age=floor(18+62*u2);
            gender=ifc(u3>=0.5,'Male','Female');
            tstage=cats('T',1+floor(4*u4));
            nstage=cats('N',0+floor(3*u5));
            mstage=cats('M',0+floor(2*u6));
            if arm='Arm 1' then response=ifc(u7>0.5,'Response','No Response');
            else if arm='Arm 2' then  response=ifc(u7>0.7,'Response','No Response');
            **Follow up to 10 years;
            os_time=ifn(arm='Arm 1',1+floor(120*u13),1+floor(120*u14));
            os_stat=ifn(arm='Arm 1',
                ifn((os_time <=60 and u8>0.35) or
                    (os_time > 60 and u8>0.65),1+floor(3*u9),0),
                ifn((os_time <=60 and u10>0.7) or
                    (os_time > 60 and u11>0.2),1+floor(u12*3),0));
            output;
            end;
        end;
    drop u: i j;
    label study='Study Number' arm='Treatment Arm' age='Age'
          gender='Gender' tstage='T-Stage' nstage='N-Stage'
        mstage='M-Stage' response='Response Status'
        os_time='Overall Survival Time (months)'
        os_stat='Overall Survival Status';
run;

The randomly generated data set is not realistic clinical trial data, but it will serve the purpose for the examples in this paper.  The data are meant to represent a pooled analysis of five trials that all have the same two treatments (Arm 1 vs Arm 2).  Arm 1 represents a treatment that is very aggressive early but is harder for the patient to tolerate, and Arm 2 represents a treatment that is easier on the patient but has less overall efficacy.  The data set contains demographic and disease characteristics for use in survival and logistic modeling.  The status variable for overall survival takes the value 0 for censored patients and one of three event codes (1, 2, or 3) otherwise, so that it can also be used for competing risks analysis.

 

Example Output

The MVMODELS macro performs analysis and outputs the results into either a forest plot or a table.  The macro parameters SHOW_TABLE and SHOW_PLOT determine which output is created.  The following examples show the flexibility of both the types of analyses that can be performed and the ways the results can be displayed. 

 

Displaying Results from a Multivariate Model

A multivariate model requires the flexibility to display different types of covariates in an easy to read format.  The MVMODELS macro has multiple ways to display discrete or continuous covariates to best display the results in a meaningful format.  The following code is an example of running a multivariate model:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat,
  CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4,
  CONT_STEP=10);

 Figure 1. Displays the forest plot from a survival based multivariate model of overall survival based on treatment arm, age, and gender

mvmodels_web_figure1.JPG

 Figure 1. CAT_DISPLAY=4 will display all levels of the categorical covariate including the reference value. 

 

Table 1. Displays the table from a survival based multivariate model of overall survival based on treatment arm, age, and gender

mvmodels_web_table1.JPG

Table 1. The table follows the same structure as the plot.  Alternating row shading is the default.  P-value footnotes are automatically created. CONT_STEP changes the units of the continuous variable.  Type 1 variables are continuous and type 2 variables are categorical.  

 

Comparing Results from Multiple Models

Different models are often compared within clinical trials to check the impact of adding one or more adjusting covariates.  The MVMODELS macro can run more than one model at a time and can limit the display to one covariate of interest without showing the adjusting factors.  This allows an easier comparison of the covariate of interest and a more compact plot or table.  The number of models run is controlled by the NMODELS macro option, and different options can be specified for each model by using the pipe symbol as a delimiter (see MODEL_TITLE below).  Options without the pipe delimiter are applied to all models (see TIME option below).  The following code is an example of running multiple models:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
   NMODELS=3, SHOW_ADJCOVARIATES=0,
   MODEL_TITLE=Treatment Arm|Treatment Arm*|Treatment Arm**,
   FOOTNOTE=*Adjusted for age; **Adjusted for age and gender, HRDIGITS=3,
   COVARIATES=arm |arm age |arm age gender, TYPE=2 1 2, CAT_DISPLAY=3,
   PVAL_TYPE3=0, HEIGHT=4in);

Figure 2. Displays the results from three different models.

 mvmodels_web_figure2.JPG

Figure 2. The first model is a univariate model of treatment.  The second model adjusts for age, and the third model adjusts for age and gender.  CAT_DISPLAY=3 will display the current covariate value without the reference group.

 

Table 2. Displays the results from three different models.

mvmodels_web_table2.JPG

Table 2. Setting SHOW_ADJCOVARIATES=0 prevents adjusting factors from being displayed, allowing the adjusted treatment arm covariate to be compared more easily.

 

Quickly Performing Subgroup Analyses

Subgroup analyses are very common in oncology research and meta-analyses.  A subgroup analysis involves running the same model within the levels of another variable.  An example of this would be comparing treatment arms within different genders.  The MVMODELS macro has several options to easily display subgroup analyses in different ways.

Within the Same Cell (BY Parameter)

The BY parameter allows for one or more variables to be listed.  The same model will be run within each level of each variable specified.  The order of the BY variable values can be changed with the BYORDER parameter.  The following example shows the use of the BY parameter:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
  COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=1, CONT_DISPLAY=2,
  CONT_STEP=10, BOLD_COV_LABEL=0, BY=tstage, SHADING=2, SHOWWALLS=0);

Figure 3. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of T-stage.

 mvmodels_web_figure3.JPG

Figure 3. SHADING=2 alternates the shading between BY levels to make it easier to visually distinguish groups.  CAT_DISPLAY=1 combines two-level covariates into one row to save space.  CONT_DISPLAY=2 hides the step size text of the continuous variable label.  SHOWWALLS=0 removes the lines bordering the plot area.

 

Table 3. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of T-stage.

mvmodels_web_table3.JPG

 Table 3. The different model outputs can easily be compared between different levels of T-stage.

 

By Rows (ROWBY)

The ROWBY parameter allows for one variable to be listed.  The same model will be run within each level of the variable specified.  The order of the ROWBY variable values can be changed with the ROWBYORDER parameter.  ROWBY differs from BY in that it further separates the groups into distinct rows that can be separated with lines, and it adds vertical labels at the head of each row.  The following example shows the use of the ROWBY parameter:

%mvmodels(DATA=random, WHERE=study in(1 2 3), METHOD=survival, TIME=os_time,
  CENS=os_stat, CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2,
  CAT_DISPLAY=2, CONT_DISPLAY=3,CONT_STEP=10, ROWBY=study, SHOWWALLS=0,
  SHADING=0, REFLINE=1, PVAL_COVARIATES=0);

Figure 4. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of study.

 mvmodels_web_figure4.JPG

 Figure 4. SHADING=0 removes the shading.  CAT_DISPLAY=2 displays the reference group within the label.  CONT_DISPLAY=3 moves the step size text to a new row.  REFLINE adds a reference line to the graph to help visually compare estimates.  PVAL_COVARIATES=0 disables the covariate level p-values.

 

Table 4. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of study.

mvmodels_web_table4.JPG

 Table 4. ROWBY groups are easily distinguished with separating lines. 

 

By Columns (COLBY)

The COLBY parameter allows for one variable to be listed.  The same model will be run within each level of the variable specified.  The order of the COLBY variable values can be changed with the COLBYORDER parameter.  COLBY creates one column of summary statistics for each level of COLBY.  The following example shows the use of the COLBY parameter:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
  COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_DISPLAY=2,
  CONT_STEP=10, COLBY=response, SHOWWALLS=0,
  UNDERLINEHEADERS=1, REFLINE=1, MIN=0, MAX=2, INCREMENT=0.5,
  PLOT_DISPLAY=subtitle ev_t hr_plot hr_est_range, 
  PLOT_COLUMNWEIGHTS=0.2 0.2 0.3 0.3);

Figure 5. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of response.

mvmodels_web_figure5.JPG

Figure 5. Each level of COLBY gets a column header and the COLBY label is shown at the top of the graph.  UNDERLINEHEADERS underlines the header of each column.  A vertical line separates each level of COLBY.  The axes are set with the MIN, MAX, and INCREMENT parameters.  PLOT_DISPLAY determines which summary statistics are shown, and PLOT_COLUMNWEIGHTS manually sets the space of each column.

 

Table 5. The Cox model of overall survival and treatment arm, age, and gender is computed within each level of response.

mvmodels_web_table5.JPG

Table 5. PLOT_DISPLAY only controls which summary stats are shown in the plot, while TABLE_DISPLAY determines which summary stats are shown in the table.  A gap is added between each column.

 

Multiple Results in the Same Row (GROUPBY)

The GROUPBY parameter allows for one variable to be listed.  The same model will be run within each level of the variable specified.  The order of the GROUPBY variable values can be changed with the GROUPBYORDER parameter.  GROUPBY is useful for creating a very compact graph to compare two or more subgroups side-by-side, for example in case-control comparison graphs.  The following example shows the use of the GROUPBY parameter:

 

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
  BY=study, TIMELIST=36, GROUPBY=arm, MIN=0.70, MAX=1, INCREMENT=0.1);

Figure 6. Displays the 36 month Kaplan-Meier event-free rate for overall survival across each study, grouped by arm.

 

mvmodels_web_figure6.JPG

Figure 6. TIMELIST specifies one or more event-free time-points.  GROUPBY displays the estimates in the same row but offset and in different colors.  A legend is added to identify the different levels of GROUPBY.

 

Table 6. Displays the 36 month Kaplan-Meier event-free rate for overall survival across each study, grouped by arm.

mvmodels_web_table6.JPG

 Table 6. GROUPBY adds a column to identify which level of GROUPBY the estimate belongs to.  This option is more visually appealing in the graph, but is still available for the table.

 

Display Multiple Graphs

The MVMODELS macro can display more than one graph within the same plot.  The graphs must come from the same analysis.  The following example places more than one graph within a plot:

 

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
  COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_DISPLAY=2,
  CONT_STEP=10, TIMELIST=36, SHOW_MODELSTATS=0, SHOWWALLS=0, MIN=0.8|0.75,
  MAX=1.2|0.95, INCREMENT=0.1|0.05, REFLINE=1|, SUMSIZE=9pt,
  PLOT_DISPLAY=subtitle ev_t hr_plot hr_est_range km_plot1 km_est_range1);

Figure 7. Graphs hazard ratios and 36 month overall survival event-free rates.

 

mvmodels_web_figure7.JPG

Figure 7. Separate options such as MIN and MAX can be set for each graph by using the pipe delimiter.  Each graph is requested with an item carrying the _PLOT suffix (here hr_plot and km_plot1) within the PLOT_DISPLAY parameter.

 

Output to Multiple Destinations

The MVMODELS macro is designed to output the plot and table to multiple destinations and have the same general appearance and style.  The following is an example of outputting the table to multiple destinations at the same time:

ods pdf file='~/ibm/test.pdf' notoc bookmarkgen=no startpage=no;
ods excel file='~/ibm/test.xlsx' options (sheet_interval='none');
ods powerpoint file='~/ibm/test.pptx';

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
    COVARIATES=arm age gender, TYPE=2 1 2, CAT_DISPLAY=4, CONT_STEP=10);

ods _all_ close;
 
ODS PDF Output

Figure 8. Screen shot of the table from the PDF output.

mvmodels_web_figure8.JPG

Figure 8. The space between the superscripts and p-values is due to the PDF destination.

 

ODS EXCEL Output

Figure 9. Screen shot of the table from the EXCEL output.

mvmodels_web_figure9.JPG

Figure 9. Outputting to EXCEL removes the horizontal and vertical space constraints on the table.

 

 

ODS POWERPOINT Output

Figure 10. Screen shot of the table from the POWERPOINT output.

mvmodels_web_figure10.JPG

Figure 10. The ability to output to POWERPOINT makes it easier to create summary slides from an analysis.

 

ODS LISTING Output

Figure 11. Screen shot of the table from the LISTING output.

mvmodels_web_figure11.JPG

Figure 11. The ability to output to the OUTPUT window with ODS LISTING allows for the results to be saved to the .LST file and makes for quick and compact results.

 

Macro Process Overview

Step 1: Error Checking

The macro contains a large number of parameters, so it is necessary to have error checking code throughout the macro to identify inappropriate macro parameter inputs before they cause errors in the SAS session.  The error checking code makes sure that variables exist, that required parameters are entered, and that proper values are entered.  For instance, if a parameter has a designated list of values, the macro will check whether the user entered an appropriate value.  If the user entered a value that does not match the list, then the macro stops, displays an error message, and provides the list of allowed values.
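
The value check described above can be sketched as follows.  This is a minimal illustration and not the macro's actual code: the helper name CHECK_VALUE and its parameters are hypothetical, and the allowed list of 1 through 4 for CAT_DISPLAY is assumed only because those are the values used in the examples on this page.  The sketch reports the problem to the log but leaves stopping the calling macro to the caller.

```sas
/* Hypothetical helper, not part of MVMODELS: checks one parameter value */
/* against a space-delimited list of allowed values.                     */
%macro check_value(parm=, value=, allowed=);
    %local i ok;
    %let ok=0;
    /* Compare the value to each allowed entry, ignoring case */
    %do i=1 %to %sysfunc(countw(&allowed,%str( )));
        %if %qupcase(&value)=%qupcase(%scan(&allowed,&i,%str( ))) %then %let ok=1;
    %end;
    /* No match: write an error and the allowed list to the log */
    %if &ok=0 %then %do;
        %put ERROR: (&parm) "&value" is not a valid value;
        %put ERROR: (&parm) Allowed values are: &allowed;
    %end;
%mend check_value;

/* Assumed allowed list, for illustration only */
%check_value(parm=CAT_DISPLAY, value=7, allowed=1 2 3 4);
```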

 

Step 2: Automated Analysis

Creating a forest plot that compares multiple models requires a large amount of code replication.  Nearly identical code is repeated to run each model, and nearly identical code is used to extract and combine the results from each model.  Programmers not familiar with writing macros must spend a great deal of time writing out these near-duplicate sections of code to create one forest plot.  The chance of a programming error also increases when duplicating code, especially if the code later needs to be modified.  The MVMODELS macro removes the time investment and risk by fully automating two types of analyses: survival analysis and logistic regression.  Included within survival analysis are Kaplan-Meier event-free rates, median time-to-event, hazard ratios from Cox proportional hazards models, and concordance indexes.  Included within logistic regression are odds ratios, binomial success rates, and concordance indexes.  Each of these analyses is customizable with macro parameters, and the details of each method are listed in the Analysis Methods section.

 

Step 3: Plot Data Set Construction

The macro generates a data set that is structured for creating a forest plot or table.

 

Step 4: Generate the Plot

The Graph Template Language (GTL) within the TEMPLATE procedure is used to set up the plot with a combination of the variables in the plot data set and macro variables derived from the plot data set.  The actual image is then created using the SGRENDER procedure in combination with ODS Graphics option settings. The image can be a number of file types including PNG, EMF, PDF, JPEG, TIFF, and SVG, and can be embedded into RTF, HTML or PDF destinations.
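
As a rough illustration of this step, the sketch below draws a bare-bones forest plot with the TEMPLATE and SGRENDER procedures.  It is not the template that MVMODELS builds: the template name FOREST_SKETCH and the three rows of estimates are made up for demonstration, and only the variable names (SUBTITLE, HR_EST, HR_LCL, HR_UCL) follow the naming described later on this page.

```sas
/* Made-up estimates for illustration only; not output from MVMODELS */
data plotdata;
    length subtitle $20;
    input subtitle $ hr_est hr_lcl hr_ucl;
    datalines;
Treatment 1.34 0.95 1.89
Age 1.02 0.90 1.15
Gender 0.88 0.60 1.29
;
run;

proc template;
    define statgraph forest_sketch;
        begingraph;
            layout overlay / yaxisopts=(reverse=true display=(tickvalues))
                             xaxisopts=(label="Hazard Ratio (95% CI)");
                /* Point estimates with horizontal confidence limit bars */
                scatterplot y=subtitle x=hr_est /
                    xerrorlower=hr_lcl xerrorupper=hr_ucl
                    markerattrs=(symbol=squarefilled);
                /* Reference line at a hazard ratio of 1 */
                referenceline x=1 / lineattrs=(pattern=shortdash);
            endlayout;
        endgraph;
    end;
run;

/* ODS Graphics settings control the size and file type of the image */
ods graphics / reset width=6in height=3in outputfmt=png;
proc sgrender data=plotdata template=forest_sketch;
run;
```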

 

Step 5: Generate the Table

The REPORT procedure is used to create the output table in combination with a large number of style modifications.  The table is formatted to work well in RTF, HTML, PDF, EXCEL, and POWERPOINT destinations.

 

Analysis Methods

The macro uses SAS procedures to perform the analyses.  The following sections describe which SAS procedures are used to create each available statistic.

 

Survival Methods

The MVMODELS macro can perform regular survival analysis as well as cumulative incidence analysis (SAS 9.4M3+).  The methods used to compute the statistics differ slightly depending on which survival method is being used.

 

Survival and 1-Survival

  • Number of patients and events: ODS OUTPUT statement within the LIFETEST procedure specifying the CENSOREDSUMMARY data set.
  • Kaplan-Meier event-free rates: The TIMELIST option within the LIFETEST procedure is used to specify the time-points, and the OUTSURV option with the REDUCEOUT option is used to output the rates to a data set.
  • Median time-to-event: ODS OUTPUT statement within the LIFETEST procedure specifying the QUARTILES data set.  This data set is then subset down to where PERCENT=50.
  • P-values to compare survival curves (no stratification): ODS OUTPUT specifying the HOMTESTS dataset.  The BY Variable is listed in the STRATA statement and LOGRANK and WILCOXON are specified as options.
  • P-values to compare survival curves (with stratification): ODS OUTPUT specifying the HOMTESTS dataset.  Stratification factors are listed in the STRATA statement and LOGRANK, WILCOXON, and GROUP=BY Variable are specified as options.
  • Hazard ratios (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the PARAMETERESTIMATES data set.  Continuous covariates are only included within the MODEL statement, and discrete covariates are included in both the MODEL and CLASS statements.  The reference group for the categorical covariates is determined by the CAT_REF parameter and specified with the REF= option of the CLASS statement.  The step size for continuous covariates is determined by the CONT_STEP parameter and is applied as a transformation to the hazard ratio after outputting using the following formula:
    • EXP(CONT_STEP*LOG(HAZARDRATIO))
  • Cox model type-III tests (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the MODELANOVA data set.  The TYPE3(ALL) option is specified within the MODEL statement in order to produce the score, Wald, and likelihood-ratio tests.
  • Cox model p-values comparing individual covariate levels (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the PARAMETERESTIMATES data set.
  • Concordance Indexes (stratified or not stratified): The method for calculating concordance indexes follows the survConcordance function [1] from the R survival package developed by Terry Therneau.  It is a widely recognized method, and thus was chosen as the method within this macro.  The method uses a binary tree approach to calculate the weights, sum of squares, and eventual standard error.  The model predicted values used in the binary tree method are taken from the OUTPUT statement within the PHREG procedure defining the XBETA variable.
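
The procedure calls behind the list above can be sketched roughly as follows for the sample data set.  This is an approximation for orientation and not the macro's internal code: the output data set names are arbitrary, the reference levels (Arm 2 and Female) and the 10-year step for age are assumptions chosen to match the examples on this page, and the concordance index calculation is omitted.

```sas
/* Counts, medians, and p-values come from ODS tables; rates from OUTSURV */
ods output censoredsummary=_counts   /* number of patients and events   */
           quartiles=_quartiles      /* median is the PERCENT=50 row    */
           homtests=_pvalues;        /* log-rank and Wilcoxon p-values  */
proc lifetest data=random timelist=36 reduceout outsurv=_kmrates;
    time os_time*os_stat(0);
    strata arm / test=(logrank wilcoxon);
run;

/* Hazard ratios, covariate level p-values, and type-III tests */
ods output parameterestimates=_hr modelanova=_type3;
proc phreg data=random;
    class arm (ref='Arm 2') gender (ref='Female') / param=ref;
    model os_time*os_stat(0) = arm age gender / risklimits type3(all);
    output out=_xbeta xbeta=xbeta;   /* predicted values for the c-index */
run;

/* Rescale the age hazard ratio from a 1-year to a 10-year step */
data _hr_step;
    set _hr;
    where upcase(parameter)='AGE';
    cont_step=10;
    hr_step=exp(cont_step*log(hazardratio));
run;
```

The transformation in the last step is equivalent to raising the one-unit hazard ratio to the power of CONT_STEP.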

 

Cumulative Incidence (CIF)

  • Number of patients and events: ODS OUTPUT statement within the LIFETEST procedure specifying the FAILURESUMMARY data set.  The event of interest is specified with the EVENTCODE option of the TIME statement.
  • Cumulative incidence rates: The TIMELIST option within the LIFETEST procedure is used to specify the time-points, and the OUTCIF option is used to output the rates to a data set.
  • Median time-to-event: The OUTCIF option within the LIFETEST procedure is used to output the cumulative incidence rates across time.  A SQL procedure query is used to pull the median times and confidence intervals from these rates.  For example, the median time is the first time the CIF rate is greater than or equal to 50%.
  • P-values to compare survival curves (no stratification): ODS OUTPUT specifying the GRAYTEST dataset.  The BY Variable is listed in the STRATA statement.
  • P-values to compare survival curves (with stratification): ODS OUTPUT specifying the GRAYTEST dataset.  Stratification factors are listed in the STRATA statement and GROUP=BY Variable is specified as an option.
  • Hazard ratios (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the PARAMETERESTIMATES data set.  The event of interest is specified with the EVENTCODE option in the MODEL statement.  Continuous covariates are only included within the MODEL statement, and discrete covariates are included in both the MODEL and CLASS statements.  The reference group for the categorical covariates is determined by the CAT_REF parameter and specified with the REF= option of the CLASS statement.  The step size for continuous covariates is determined by the CONT_STEP parameter and is applied as a transformation to the hazard ratio after outputting using the following formula:
    • EXP(CONT_STEP*LOG(HAZARDRATIO))
  • Cox model type-III tests (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the MODELANOVA data set.  The TYPE3(ALL) option is specified within the MODEL statement in order to produce the score, Wald, and likelihood-ratio tests.
  • Cox model p-values comparing individual covariate levels (stratified or not stratified): ODS OUTPUT within the PHREG procedure specifying the PARAMETERESTIMATES data set.
  • Concordance Indexes (stratified or not stratified): Not available with CIF method.
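
For comparison with the standard survival calls, a rough sketch of the cumulative incidence versions is shown below.  It is not the macro's internal code: event code 1 is arbitrarily treated as the event of interest, and the data set names and reference levels are assumptions.  As noted above, these options need SAS 9.4M3 or later.

```sas
/* Cumulative incidence rates and Gray's test for event code 1 */
ods output failuresummary=_counts graytest=_gray;
proc lifetest data=random timelist=36 outcif=_cif;
    time os_time*os_stat(0) / eventcode=1;
    strata arm;
run;

/* Competing risks regression for the same event of interest */
ods output parameterestimates=_cifhr;
proc phreg data=random;
    class arm (ref='Arm 2') gender (ref='Female') / param=ref;
    model os_time*os_stat(0) = arm age gender / eventcode=1 risklimits;
run;
```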

 

Logistic Regression Methods

  • Number of patients and events / binomial success rates: The TABLES statement of the FREQ procedure is used with the BIN option to generate the binomial success rates along with the number of patients and events.  A success is considered to be the event of interest.  A data step is utilized beforehand to create a variable that contains the counts for each level of the binomial variable.  This variable is then used in a WEIGHT statement with the ZEROS option (which forces PROC FREQ to include counts of zero) to avoid errors that can arise with zero percent and 100 percent success rates.  The estimates are then output using an OUTPUT statement with the BIN option.  The number of patients and events are output by using the OUT option within the TABLES statement.  The counts and binomial success rates are taken within each level of a BY variable and within all levels of a categorical covariate.
  • P-values to compare BY Variable groups: OUTPUT statement within the FREQ procedure with the CHISQ and/or FISHER options specified in the TABLES and OUTPUT statements as options.
  • Odds ratios (stratified and unstratified): ODS OUTPUT within the LOGISTIC procedure specifying the ODDSRATIOSWALD data set.  One ODDSRATIO statement is created for each covariate.  The event is specified by the EVENT parameter.  Continuous covariates are only included within the MODEL statement, and discrete covariates are included in both the MODEL and CLASS statements.  The reference group for the categorical covariates is determined by the CAT_REF parameter and specified with the REF= option of the CLASS statement.  The step size for continuous covariates is determined by the CONT_STEP parameter and is applied as a transformation to the odds ratio after outputting using the following formula:
    • EXP(CONT_STEP*LOG(ODDSRATIO))
  • Logistic model type-III tests (stratified or not stratified): ODS OUTPUT within the LOGISTIC procedure specifying the MODELANOVA data set.  The Wald test is created with this method.
  • Logistic model p-values comparing individual covariate levels (stratified or not stratified): ODS OUTPUT within the LOGISTIC procedure specifying the PARAMETERESTIMATES data set.
  • Concordance Indexes (not stratified): The method for calculating concordance indexes follows the methods described in a paper by JA Hanley and BJ McNeil [2].  While the concordance index for logistic regression can be automatically output from the LOGISTIC procedure, the standard error is not.  Without a standard error the confidence bounds for the concordance index cannot be calculated.  The paper by Hanley and McNeil provides a method for calculating the standard error that has been commonly used within the Biomedical Statistics and Informatics division at Mayo Clinic.  The model predicted values used in this method are taken from the OUTPUT statement defining the XBETA variable.
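
A rough sketch of the procedure calls behind the logistic regression list is shown below, using the RESPONSE variable from the sample data set.  It is not the macro's internal code: the data set names and reference levels are assumptions, and the pre-counting step with the ZEROS option, the type-III tests, and the concordance index calculation are left out to keep the example short.

```sas
/* Binomial success rate plus counts for the event of interest */
proc freq data=random;
    tables response / binomial(level='Response') out=_counts;
    output out=_binomial binomial;
run;

/* Chi-square and Fisher exact p-values comparing response between arms */
proc freq data=random;
    tables arm*response / chisq fisher;
    output out=_pvalues chisq fisher;
run;

/* Odds ratios: one ODDSRATIO statement per covariate */
ods output oddsratioswald=_or parameterestimates=_pe;
proc logistic data=random;
    class arm (ref='Arm 2') gender (ref='Female') / param=ref;
    model response(event='Response') = arm age gender;
    oddsratio arm;
    oddsratio age;
    oddsratio gender;
    output out=_xbeta xbeta=xbeta;   /* predicted values for the c-index */
run;
```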

 

Streamlining the Analyses

The MVMODELS macro has the potential to run a large number of statistical models in one macro call with the subgrouping BY, COLBY, ROWBY, and GROUPBY options, so the macro uses methods to streamline how many procedures need to be run to accommodate all of the models.  The primary method used is to combine data set duplication with BY statements in the procedures.  An example of this would be calculating the five year survival rates in the following example:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
    TIMELIST=60, BY=arm gender tstage nstage);

There are eleven five-year survival rate estimates that would need to be calculated from this macro call: two for ARM, two for GENDER, four for TSTAGE, and three for NSTAGE.  This would normally require four calls to the LIFETEST procedure because every patient belongs to a level of each BY variable, but the macro bypasses this by duplicating each patient once for every BY variable level that they are in.  The data set is set up with the following structure:

 

Table 7. Shows the duplicated data set structure

Patient ID   BY Variable   BY Variable Level   Survival Time   Survival Status
1            ARM           1                   30              0
2            ARM           2                   60              1
1            GENDER        Female              30              0
2            GENDER        Male                60              1

 

Table 7. Each unique BY variable and level has its own subgroup within the transformed dataset.  Patients exist in multiple subgroups.

 

Now the BY variable and BY variable level columns in Table 7 can be used within the BY statement of the LIFETEST procedure, creating all eleven estimates with only one LIFETEST procedure call instead of four.  This method is also used when accommodating any COLBY, ROWBY, and GROUPBY variables as well as when calculating estimates for any categorical covariate levels.  All models called by the macro are combined into a single dataset so that there is only one LIFETEST procedure called in the entire macro when calculating survival estimates, number of patients and events, and median time-to-event estimates.
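
The duplication idea can be sketched as follows for the four BY variables in the example call.  This is a simplified illustration rather than the macro's own code: the data set and variable names (_STACKED, BYVAR, BYLEVEL, _KMRATES) are hypothetical.

```sas
/* Write each patient once per BY variable, as in Table 7 */
data _stacked;
    set random;
    length byvar $8 bylevel $20;
    byvar='ARM';    bylevel=arm;    output;
    byvar='GENDER'; bylevel=gender; output;
    byvar='TSTAGE'; bylevel=tstage; output;
    byvar='NSTAGE'; bylevel=nstage; output;
run;

proc sort data=_stacked;
    by byvar bylevel;
run;

/* One LIFETEST call returns the 60 month rate for all eleven subgroups */
proc lifetest data=_stacked timelist=60 reduceout outsurv=_kmrates noprint;
    by byvar bylevel;
    time os_time*os_stat(0);
run;
```

The stacked data set is four times the size of the original, which trades memory for a single procedure call.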

This method is modified when running models with the PHREG or LOGISTIC procedures.  The difficulty encountered is that multiple models can have varying numbers of covariates included and different stratification factors.  The MVMODELS macro accounts for this in three steps:

  1. Make a generic numeric variable for each categorical covariate where the numeric value is the order.  For example, gender would be converted from Male/Female to 1/2.
  2. Make the reference value of each categorical covariate equal to 0.  This is so “0” can be specified as the reference group within the CLASS statement.  For the gender example, if males are the reference group then all of the male gender codes would be converted to zeros.
  3. If one model has fewer covariates than another (e.g. model 1 uses arm and gender, model 2 uses arm, gender, and T-stage), then a variable is still created for the covariates that don’t exist in the current model, but all values are set to 0.  In the PHREG procedure a covariate that has all of the same values is thrown out of the model automatically.

A visual example of this method is shown in Table 8 below.  Model one has Arm and Gender for covariates.   Model two only has Arm for a covariate.

Table 8. Shows the data set structure for running the PHREG procedure

Model   Patient ID   Arm   Gender   Covariate 1   Covariate 2
1       1            1     Male     1             1
1       2            2     Female   0             0
2       1            1     Male     1             0
2       2            2     Female   0             0

Table 8. Model 1 uses Female and Arm 2 as reference groups, so the generic covariate variables are assigned values of 0.  The other values are the numeric order of the remaining values.  Model 2 does not use Gender as a covariate, so all values of the covariate 2 variable are 0.

 

All models are run with a single PHREG or LOGISTIC procedure call when using this methodology for the categorical covariates, continuous covariates, and strata variables along with the earlier method of duplicating the data set for each subgroup.  One note is that stratified and unstratified models are run separately in the LOGISTIC procedure, because including a stratification variable whose values are all the same gives different results than omitting the stratification variable entirely.
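
The recoding shown in Table 8 can be sketched as follows for two models, one with arm and gender and one with arm only.  This is a simplified illustration and not the macro's own code: the names _MODELS, MODEL, COV1, and COV2 are hypothetical, and the sketch relies on the behavior described above, in which PHREG drops a covariate whose values are all the same.

```sas
/* Stack one copy of the data per model and build generic covariates */
data _models;
    set random (in=m1) random (in=m2);
    model=ifn(m1,1,2);
    cov1=ifn(arm='Arm 2',0,1);                 /* reference level coded 0 */
    if m1 then cov2=ifn(gender='Female',0,1);  /* model 1 includes gender */
    else cov2=0;                               /* model 2 does not        */
run;

/* One PHREG call fits both models through the BY statement */
ods output parameterestimates=_hr;
proc phreg data=_models;
    by model;
    class cov1 (ref='0') cov2 (ref='0') / param=ref;
    model os_time*os_stat(0) = cov1 cov2 / risklimits;
run;
```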

 

Output Methods

After analyzing the requested models, the MVMODELS macro combines the results into a single dataset that is structured for graphing and printing.  The macro then builds a graph template with the TEMPLATE procedure and prints the table with the REPORT procedure.

 

How the Dataset is Constructed

The plot dataset is constructed using a combination of the SQL procedure and DATA steps.  A DATA step is used to set up a structure based on input macro parameters for merging in the analysis results with the SQL procedure.  The SQL procedure is used for merging because of the flexibility the procedure has with being able to use functions and other logic conditions in the merging process.

 

Variable Components

The variables that make up the plot data set are separated into different subtypes.

 

Row Headers

Each row of the forest plot has a row header, or subtitle, to show the variable label or value.  The row header is contained within one variable:

  • SUBTITLE: Contains the row header such as variable label, variable level, group label, group level, or model title.

 

Estimates and Confidence Limits

There are numeric variables for each calculated estimate, such as a hazard ratio or odds ratio, and for its upper and lower confidence limits.  These variables are used for making the scatterplot and error bars in the plot.

  • PREFIX_EST: Calculated ratio/estimate/rate
  • PREFIX_LCL: Calculated lower 95% confidence limit
  • PREFIX_UCL: Calculated upper 95% confidence limit

Multiple estimates are available to be plotted and displayed depending on the method used, so the PREFIX_ term changes depending on which type of estimate is being captured.  For example, the hazard ratios have the prefix HR_, odds ratios have the prefix OR_, and concordance indexes have the prefix C_.  There are additional variables that are combinations of these variables.  These are character variables with pre-specified rounding (set by macro parameters):

  • PREFIX_ESTIMATE: ESTIMATE.  Example: 1.34
  • PREFIX_RANGE: LCL - UCL.  Example: 0.25-4.20
  • PREFIX_EST_RANGE: ESTIMATE (LCL - UCL).  Example: 1.34 (0.25-4.20)
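The character versions can be built from the numeric ones with PUT and the CAT functions.  The sketch below is an assumption about how such variables could be derived, not the macro's code; the macro variable digits is a hypothetical stand-in for the rounding parameters.

```sas
%let digits = 2;   /* hypothetical stand-in for the rounding parameter */

data est_display;
    hr_est = 1.3449;
    hr_lcl = 0.2512;
    hr_ucl = 4.1987;
    length hr_estimate hr_range hr_est_range $40;
    /* Round by formatting, then remove the leading blanks */
    hr_estimate  = strip(put(hr_est, 12.&digits));
    hr_range     = catx('-', put(hr_lcl, 12.&digits), put(hr_ucl, 12.&digits));
    /* CATX inserts the single space; CATS would strip it */
    hr_est_range = catx(' ', hr_estimate, cats('(', hr_range, ')'));
run;
```

CATX is used for the final step because CATS removes leading and trailing blanks from every argument, which would drop the space before the opening parenthesis.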

 

Number of Patients and Events

The section for number of patients and events has additional variables that can be used in the summary statistics panel with special formats to save space:

  • TOTAL: Total number of patients within model/group
  • EVENTS: Number of events or successes (for binomial) within model/group
  • EV_T: Events/Total.  Example: 245/300
  • PCT: Percentage %.  Example: 60%
  • EV_T_PCT: Events/Total (Percentage %).  Example: 245/300 (81.7%)

There are additional variables when utilizing Cox or logistic modeling that have the prefix REF_.  These variables contain the number of patients and/or number of events of just the reference group.
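As with the estimates, the combined count variables are simple concatenations.  The following is a sketch under the assumption of one decimal place for the percentage; the data set name is hypothetical.

```sas
data n_display;
    total  = 300;
    events = 245;
    length ev_t pct ev_t_pct $40;
    ev_t     = catx('/', events, total);                 /* Events/Total */
    pct      = cats(put(100 * events / total, 8.1), '%');
    ev_t_pct = catx(' ', ev_t, cats('(', pct, ')'));     /* Events/Total (Pct%) */
run;
```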

 

P-Values

There are two variables allocated for the p-values. 

  • PVAL: character variable (applies PVALUE6.4 format to numeric value) that contains the p-value
  • PFOOT_INDEX: contains the index of the p-value for the footnote.  This is a numeric variable that will apply the value as a superscript if AUTOPFOOT=1.
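The PVALUE6.4 format keeps four decimal places and prints very small values with a less-than sign (for example, <.0001) rather than as zero.  A minimal sketch of the conversion, with a hypothetical data set name:

```sas
data pvals;
    input p;
    length pval $6;
    /* Character p-value for display */
    pval = put(p, pvalue6.4);
    datalines;
0.0345
0.00001
;
run;

proc print data=pvals noobs;
run;
```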

 

Plot/Table Indicators

There are three different indicator variables in the plot data set to serve the following purposes:

  • SUBIND: determines the number of indentations of the subtitle
  • BOLDIND: determines if the subtitle is bold or normal weight
  • SHADEIND: determines if the row in the plot has a shaded background when SHADING=1 or 2

These variables are automatically calculated by the macro.

 

Grouping Variables

There are four different grouping categories that the macro uses: BY variables, a COLBY (column by) variable, a ROWBY (row by) variable, and a GROUPBY (group by) variable.  There can be multiple BY variables, but only one column-by variable and one row-by variable.  These are optional and make subgroup analyses more efficient to run.

 

Variables for the BY groups include:

  • BY_NUM: designates which BY variable the current data row represents.  This is a numeric variable where the nth value represents the nth variable listed in the BY parameter.
  • BY_LVL: designates the level of the current BY variable.  This is a numeric variable where the nth value corresponds to the nth level of the variable.  For example, if gender has the levels 1=Male and 2=Female and BY_LVL=1, then the row represents Male (the 1st level).  A value of BY_LVL=0 indicates that the current row has the label for the BY variable in the SUBTITLE column.

No variables are added for the COLBY variable.  Instead, all of the summary and plot columns are duplicated for each level of the COLBY variable, and a suffix (example: TOTAL_1) is added where the number identifies the level of the COLBY variable.

Variables for the ROWBY groups include:

  • ROWBY_LVL: designates the level of the current ROWBY variable.  This is a numeric variable where the nth value corresponds to the nth level of the variable.  For example, if gender has the levels 1=Male and 2=Female and ROWBY_LVL=1, then the row represents Male (the 1st level).

Variables for the GROUPBY groups include:

  • GROUPBY_LVL: designates the level of the current GROUPBY variable.  This is a numeric variable where the nth value corresponds to the nth level of the variable.  For example, if gender has the levels 1=Male and 2=Female and GROUPBY_LVL=1, then the row represents Male (the 1st level).

The GROUPBY_LVL variable is used for the discrete attribute map to color the scatter plot and highlow plot if GROUPBY is specified.

 

Creating the FOREST PLOT

The macro uses the Graph Template Language within the TEMPLATE procedure in combination with the SGRENDER procedure to produce the final graph.  The template splits the graph into sections or panels: the subtitle panel, the plot panels, and the statistical summary panels using the LATTICE layout within GTL.  The lattice creates a plot space that has one column for each item being displayed by the DISPLAY parameter and one row for each value of the ROWBY variable.  Each column is considered a section or panel.

As the Graph Template Language evolved between SAS 9.2 and SAS 9.4, producing a forest plot became much easier with new functionality such as AXISTABLE and TEXTPLOT.  However, the MVMODELS macro does not use these new features and relies primarily on annotation instead.

 

Annotation Versus Data Driven Plotting

SAS 9.3 introduced annotation to the SG procedures and with it the DRAW functions within GTL.  The draw functions, DRAWTEXT, DRAWLINE, DRAWARROW, and DRAWPOLYGON, allow the user to manually annotate a graph with text, lines, and other shapes.  The creation of these annotations is generally more tedious and much less flexible compared to data driven graphing, but they have a distinct advantage within multi-panel graphs which is why the MVMODELS macro uses the annotation method instead of being entirely data driven. 

 

 

Annotation Can Cross Over Panels

One of the greatest challenges of designing a forest plot is allocating enough space to each section so that the graph is easy to read and none of the text gets cut off.  Annotation text is not cut off by the graph space, so it can flow across multiple panels.  This comes in handy for long subsection headers, model titles, or longer estimate values.
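The following is a minimal sketch of the idea, not the macro's template.  The data set, template name, and coordinates are all hypothetical.  A DRAWTEXT statement placed in the narrow first cell of a LATTICE layout is given a bounding box wider than that cell, so the text can extend over the neighboring panel.

```sas
data rows;
    input row hr;
    datalines;
1 .
2 1.34
;
run;

proc template;
    define statgraph drawtext_demo;
        begingraph;
            layout lattice / columns=2 columnweights=(0.3 0.7);
                /* Panel 1: narrow subtitle panel with nothing visible plotted */
                layout overlay / walldisplay=none
                        xaxisopts=(display=none)
                        yaxisopts=(display=none reverse=true
                                   linearopts=(viewmin=0.5 viewmax=2.5));
                    /* Invisible markers so the y-axis matches the plot panel */
                    scatterplot x=row y=row / datatransparency=1;
                    /* Positioned at data row 1; box is wider than the panel */
                    drawtext textattrs=(weight=bold)
                             "Model 1: a title that is wider than the first panel" /
                             x=0 y=1 xspace=wallpercent yspace=datavalue
                             anchor=left width=600 widthunit=pixel;
                    drawtext "Male vs Female" /
                             x=5 y=2 xspace=wallpercent yspace=datavalue
                             anchor=left width=600 widthunit=pixel;
                endlayout;
                /* Panel 2: the plot itself */
                layout overlay / xaxisopts=(label='Hazard Ratio')
                        yaxisopts=(display=none reverse=true
                                   linearopts=(viewmin=0.5 viewmax=2.5));
                    scatterplot x=hr y=row;
                endlayout;
            endlayout;
        endgraph;
    end;
run;

proc sgrender data=rows template=drawtext_demo;
run;
```

Using YSPACE=DATAVALUE ties the text to the same row value as the marker in the plot panel, which keeps the row header aligned with its estimate.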

 

Figure 12. Displays an example of annotation crossing over panels versus the graph getting cut-off.

mvmodels_web_figure12.JPG

Figure 12. The red panels indicate where the layout panels end for each column.  The model title is able to stretch across all seven panels in this example, whereas in a normal data-driven graph it would be cut off within the first panel.

 

 

The Macro Facility Takes Away the Tediousness of Annotation

Utilizing the macro facility removes the tediousness of writing out each annotation separately and fully automates the process.  The MVMODELS macro pulls the values from the plot data set along with the coordinates to plot them and writes each value within the GTL environment without input from the user.

 

 

Annotation Allows the use of Unicode, Superscript and Subscript

The DRAWTEXT function allows the use of Unicode characters, superscripts and subscripts in the text it creates.  Normal data-driven graph elements such as labels and AXISTABLE values do not handle superscripts and subscripts.
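A short sketch of these text items, using SASHELP.CLASS only to have something to plot; the template name and the text itself are hypothetical.  Whether the Unicode character renders depends on the font in use.

```sas
proc template;
    define statgraph drawtext_sup;
        begingraph;
            layout overlay;
                scatterplot x=height y=weight;
                /* Unicode 2264 is the less-than-or-equal sign;
                   SUP raises the footnote index */
                drawtext "p " {unicode '2264'x} " 0.05" {sup "1"} /
                         x=5 y=95 xspace=wallpercent yspace=wallpercent
                         anchor=topleft;
            endlayout;
        endgraph;
    end;
run;

proc sgrender data=sashelp.class template=drawtext_sup;
run;
```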

 

 

Limitations of Annotation

The greatest limitation of annotation is that the graph does not allocate space automatically to fit it.  This means that the programmer must use other means to define graph space in order to properly show the annotation.  This can be as simple as printing blank text of the same size and font as the DRAWTEXT text in the spot where the user wants the text.

 

Graph Section/Panel Descriptions

 

Subtitle Panel

The subtitle panel contains the row headers for the other two panels and includes the model titles, the covariate labels, and the levels of the covariate.  The macro creates the forest plot on a row-by-row basis.  Each row of the graph will have a row header (or subtitle), at least one possible scatterplot, and optional columns of summary statistics.  Multiple models can be run and displayed in the plot, and model titles go before any covariates from a model are listed.  Covariate labels are listed before the covariate levels.  For example, for gender there would be a row with a label such as "Gender" followed by up to two rows (depending on display options) designating a "Male" row and a "Female" row.

The subtitles are drawn entirely with DRAWTEXT statements, which like ENTRY statements can be left aligned and can have spaces added in front for indentation.  Unlike ENTRY statements however, the coordinates for the text can be specified with the DRAWTEXT statement.  This allows the text to be aligned with the y-axis from the plot panel.  Each subtitle has its own DRAWTEXT statement.  DRAWTEXT statements are not bound by the walls of a layout or lattice cell, so longer model titles or variable labels can fit with less white space in the graph.

 

 

Plot Panels

There are only three components to the plot: the scatterplot, the confidence bounds, and the reference line.  The reference line is drawn with a REFERENCELINE statement, the scatterplots with SCATTERPLOT statements, and the confidence bounds with the HIGHLOWPLOT statement.  The HIGHLOWPLOT statement is used instead of the error bar options within the SCATTERPLOT statement for two reasons.  The first is that when the line thickness is increased, the endcaps on SCATTERPLOT error bars grow dramatically in vertical height, which looks unprofessional; with the HIGHLOWPLOT statement the endcaps stay the same size or grow at a much slower rate.  The second is that the HIGHLOWPLOT statement has options that point to variables determining whether each endcap is drawn.  This is helpful because a DATA step can check whether the bars exceed the MIN or MAX parameters and set those variables with IF/ELSE logic.  The macro does not draw an endcap when the error bar extends beyond the minimum or maximum value of the axis.

Each model created by the macro can be colored separately, sized differently, and have unique symbols.  This is done by building a discrete attribute map and applying it to the model number variable.
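The pieces described above can be sketched in a small template.  This is an illustration under stated assumptions rather than the macro's code: the macro variables min and max stand in for the MIN and MAX parameters, and the data set, variable, and attribute map names are hypothetical.  The model number is stored as a character variable so that it matches the attribute map values exactly.

```sas
%let min = 0.1;   /* hypothetical axis limits */
%let max = 4;

data forest;
    length modelnum $1 lowcap highcap $8;
    input modelnum $ row hr lcl ucl;
    /* Drop the endcap and clip the bar when a limit is off the axis */
    if lcl < &min then do; lowcap = 'NONE'; lcl = &min; end;
    else lowcap = 'SERIF';
    if ucl > &max then do; highcap = 'NONE'; ucl = &max; end;
    else highcap = 'SERIF';
    datalines;
1 1 1.34 0.25 4.20
2 2 0.80 0.55 1.16
;
run;

proc template;
    define statgraph forest_demo;
        begingraph;
            /* One set of visual attributes per model number */
            discreteattrmap name='models';
                value '1' / markerattrs=(color=blue symbol=squarefilled)
                            lineattrs=(color=blue pattern=solid);
                value '2' / markerattrs=(color=red symbol=circlefilled)
                            lineattrs=(color=red pattern=solid);
            enddiscreteattrmap;
            discreteattrvar attrvar=modelgrp var=modelnum attrmap='models';

            layout overlay /
                    xaxisopts=(label='Hazard Ratio'
                               linearopts=(viewmin=&min viewmax=&max))
                    yaxisopts=(display=none reverse=true);
                referenceline x=1 / lineattrs=(pattern=shortdash);
                /* LOWCAP= and HIGHCAP= read the cap type from the data */
                highlowplot y=row low=lcl high=ucl /
                    group=modelgrp lowcap=lowcap highcap=highcap;
                scatterplot x=hr y=row / group=modelgrp;
            endlayout;
        endgraph;
    end;
run;

proc sgrender data=forest template=forest_demo;
run;
```

In this example the first model's upper limit (4.20) exceeds the axis maximum, so its bar is clipped at 4 and drawn without an upper endcap.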

 

 

Statistical Summary Panel

The statistical summary panels are drawn similarly to the subtitle panel, but without the need for indenting or changing font weights.  They are centered within their column of the LATTICE layout and drawn with the DRAWTEXT statement. 

 

 

Reference Guides

Reference guides are text that describes what an area of the graph means compared to a reference line (e.g. “Males perform better” or “High-grade perform worse”). These are generally paired up with an arrow to show the direction that the reference guides are referring to. The MVMODELS macro creates these with a combination of DRAWTEXT and DRAWARROW.  Additional rows are added into the graph in order to allocate space.  The text that is printed is directly input by the user with the REFGUIDELOWER and REFGUIDEUPPER parameters.  The following is an example of adding reference guides:

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
    COVARIATES=arm, by=gender, TYPE=2, CAT_DISPLAY=5, pval_type3=0,
    HEIGHT=4in, REFLINE=1,
    REFGUIDELOWER=Favors Arm 1,REFGUIDEUPPER=Favors Arm 2);

 

Figure 13. Example of adding reference guides to bottom of the plot space

mvmodels_web_figure13.JPG

Figure 13. The reference guides originate from the reference line and point towards the minimum and maximum values.  The text is manually specified by the user.

The reference guides can also be printed at the top of the graph space, and line breaks are inserted with the ` symbol.

 

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat, CEN_VL=0,
    COVARIATES=arm, by=gender, TYPE=2, CAT_DISPLAY=5, pval_type3=0,
    HEIGHT=4in, REFLINE=1,
    REFGUIDEVALIGN=top, REFGUIDEHALIGN=in, REFGUIDELINEARROW=open,
    REFGUIDELOWER=Favors`Arm 1, REFGUIDEUPPER=Favors`Arm 2);

Figure 14. Example of adding reference guides to top of the plot space

 

mvmodels_web_figure14.JPG

Figure 14. REFGUIDEVALIGN=top moves the reference guides to the top.  REFGUIDEHALIGN=in aligns the reference guides against the reference line.  The ` symbol creates line breaks.

 

Creating the Table

The table is printed using the REPORT procedure due to the customizability the procedure offers.  The capability to make compute variables on-the-fly, make style modifications on-the-fly, and create spanning headers easily makes the REPORT procedure the go-to procedure for creating high quality analysis output.

 

Designing a Style Template

The MVMODELS macro is able to output to several ODS destinations including LISTING, RTF, PDF, EXCEL and POWERPOINT.  Each of these destinations has its own programming quirks and challenges, so the macro generalizes wherever possible but ultimately has separate code adjustments for each destination.  The macro creates a style template using the TEMPLATE procedure to set up the majority of the output table, and then fine-tunes the final REPORT procedure depending on the destination.  One REPORT procedure is run for each destination currently open.

 

Adding Spaces Between Columns

The REPORT procedure allows the creation of COMPUTE variables which are not actually in the input data set.  These do not actually have to contain a value however, and so the macro uses COMPUTE variables as “dummy” variables.  They are assigned a missing value, and given a certain width to create a space between different values of the COLBY variable.

 

Creating Spanning Headers

The REPORT procedure allows one variable to be used multiple times and renamed.  The macro uses a blank variable as an across variable multiple times in order to add a header across selected variables.  These can be nested in order to create multiple spanning headers.

 

Styling the SUBTITLE Column

The MVMODELS macro creates an indicator variable for whether the value should be bold or regular font weight, and creates a variable for how many indents the text should have.  A COMPUTE block within the REPORT procedure is used to add these style modifications to the SUBTITLE column.

 

Adding Shading to Rows

The MVMODELS macro creates an indicator variable for whether a row should be shaded.  A COMPUTE block within the REPORT procedure applies a style modification to the entire row depending on this indicator variable.
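The COMPUTE block techniques from the preceding subsections (a blank spacer column, bold and indented row headers, and shaded rows) can be sketched together in one REPORT procedure.  This is a simplified illustration with hypothetical data, not the macro's code.  The LEFTMARGIN style attribute used for the indent is an assumption; the attribute that produces an indent can differ by ODS destination.

```sas
data plot;
    length subtitle $30 hr_est_range $20;
    infile datalines dsd truncover;
    input subtitle $ subind boldind shadeind hr_est_range $ total;
    datalines;
Gender,0,1,0,,300
Female,1,0,1,Reference,140
Male,1,0,0,1.34 (0.25-4.20),160
;
run;

proc report data=plot nowd;
    /* Indicator columns come first so later COMPUTE blocks can read them */
    columns subind boldind shadeind subtitle total _gap hr_est_range;
    define subind       / display noprint;
    define boldind      / display noprint;
    define shadeind     / display noprint;
    define subtitle     / display 'Covariate';
    define total        / display 'Total';
    define _gap         / computed ' ' style(column)={cellwidth=0.25in};
    define hr_est_range / display 'HR (95% CI)';

    /* Dummy column: always blank, present only to add space */
    compute _gap / character length=1;
        _gap = ' ';
    endcomp;

    /* Bold and indent the row header based on the indicators */
    compute subtitle;
        if boldind = 1 then
            call define(_col_, 'style/merge', 'style={font_weight=bold}');
        if subind > 0 then
            call define(_col_, 'style/merge',
                        cats('style={leftmargin=', 12 * subind, 'pt}'));
    endcomp;

    /* Shade the entire row; attached to the last column in the report */
    compute hr_est_range;
        if shadeind = 1 then
            call define(_row_, 'style/merge', 'style={background=lightgray}');
    endcomp;
run;
```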

 

Creating the Table in the LISTING Destination

Easy-to-read tables are much harder to produce in the LISTING destination because the length of each variable matters.  Typically variables are given lengths or formats that are longer than necessary to hold all of the information, which in the LISTING destination just adds more blank space to a column.  The MVMODELS macro therefore creates a separate dataset for printing to the LISTING destination in which the length of each character variable is set to the length of its longest value.

In order to create lines that cross the whole page, the REPEAT function is used along with the current LINESIZE option value.  The REPEAT function repeats a character a specified number of times, so combining the hyphen with the LINESIZE value gives a dashed line that crosses the entire page.  A similar technique is used to create the underlines in each column header.  A dashed line the length of each variable plus four (for space between columns) is drawn under each column label.  Spanning headers are added to the COLUMNS statement with a line drawn underneath that is the length of the sum of the variables in the spanning header.
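Both LISTING techniques can be sketched briefly.  The code below is an illustration using SASHELP.CLASS, not the macro's own code, and the data set and macro variable names are hypothetical.  Note that REPEAT(x, n) returns the original string plus n repetitions, so one is subtracted from the line size.

```sas
/* Shrink a character column to the length of its longest value */
proc sql noprint;
    select max(length(name)) into :maxlen trimmed
    from sashelp.class;

    create table listing_ds as
    select * from sashelp.class;

    alter table listing_ds
        modify name char(&maxlen);
quit;

/* Dashed line as wide as the current LINESIZE setting */
%let ls = %sysfunc(getoption(linesize));

data _null_;
    length dashes $256;
    dashes = repeat('-', &ls - 1);
    file print;
    put dashes;
run;
```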

 

Conclusion

The MVMODELS macro is a powerful and flexible tool that was built to handle the large variety of modeling prevalent in clinical oncology trials.  The macro outputs both a professional-looking graph and a summary table to a multitude of ODS destinations and has many options to fine-tune the appearance to match the user’s preference.  The method used to create the forest plot is applicable to many other types of graphs.  MVMODELS is an incredibly useful macro for any programmer performing survival or logistic regression analyses.

 

References

1 Therneau T (2014). _A Package for Survival Analysis in S_. R package version 2.37-7, <URL: http://CRAN.R-project.org/package=survival>.

2 Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36, 1982.

 

Contact Information

Your comments and questions are valued and encouraged. Contact the author at:

Name: Jeffrey Meyers

Enterprise: Mayo Clinic

E-mail: Meyers.jeffrey@mayo.edu / jpmeyers.spa@gmail.com

Comments

Is there a method for adding interaction terms in the logistic regression? 

 

Hello @PharmlyDoc,

   I uploaded my most recent version of the macro, which includes the ability to have an interaction p-value.  It is the p-value between a BY variable and the first covariate (if it's categorical).  The DISPLAY keyword is PVAL_INTER.  Please see the documentation for other updates.

Thanks, though I'm not understanding how to insert an interaction term. 

In proc logistic I'd specify an interaction term as sex*race or sex|race in the model statement, but doing that in the covariates statement does not work. 

I like that method because it's simple and outputs estimates on the forest plot at different levels of categorical variables in the interaction term that are written in a detailed manner (e.g. race African American vs Caucasian at sex = female; race African American vs Caucasian at sex = male; sex male vs female at race=African American;  sex male vs female at race=Caucasian).

I noticed the macro code has the ParameterEstimates output  – so now I'm trying to figure out if I can run a normal proc logistic and output its  ParameterEstimates into the macro. 

 

Another reason I don't like using the by= option is because I added a third refline (called trefline) where I have refline=1, srefline=0.5, and trefline=1.5, and using the by= option sends my reference lines to the left side of the forest plot. 

 

 

proc logistic data=heart  plots(only)=(EFFECT ODDSRATIO(logbase=10   TYPE=HORIZONTALSTAT ));
class race(ref='white') sex(ref='female') diabetes(ref='no') hypertension(ref='no')  /param=ref ;
model heartdisease(event='0')=age race|sex diabetes hypertension LDL CCI median_income / orpvalue clodds=wald  lackfit;
oddsratio age /  cl=wald;
oddsratio race / cl=wald;
oddsratio sex / cl=wald;
oddsratio  diabetes/ cl=wald;
oddsratio  hypertension / cl=wald;
oddsratio  LDL / cl=wald;
oddsratio CCI  / cl=wald;
oddsratio median_income  / cl=wald;
ods output ParameterEstimates=dPara;
ods output OddsRatios=odssratio;
ods output ClassLevelInfo=ClassLevelInfo;
ods output OddsRatiosWald=dCIWald;
ods output Nobs=dNum;
ods output ResponseProfile=dEvent;
output out= _tempdsn ;
run;

%mvmodels (DATA=heart,METHOD=LOGISTIC, NMODELS=1,
    EVENTCOV=heartdisease,EVENT=0,COVARIATES=age race sex diabetes hypertension LDL CCI median_income  , TYPE=1 2 2 2 2 1 1 1 , 
    CONT_DISPLAY=2, CAT_DISPLAY=4, cat_ref=white`female`no`no, XAXISTYPE=log, LOGBASE=10, XAXISLABEL=Odds Ratios (log scale),
    PVAL_TYPE3=0, 
DISPLAY=subtitle or_plot or_est_range , show_adjcovariates=1, refline=1,  SREFLINE=0.5, SREFLINEPATTERN=SHORTDASH, 
 TREFLINE=1.5, TREFLINEPATTERN=SHORTDASH, LOGTICKSTYLE = LOGEXPAND, LOGMINOR=TRUE, 
 LINECOLOR=blue, REFGUIDELOWER=Favors Heart Disease, REFGUIDEUPPER=Favors No Disease);




 

 

 

 

Hello @PharmlyDoc.  Unfortunately the macro does not display output from interaction models outside of the p-value of the interaction term (between the BY variable and first categorical covariate).  This is at the guidance of my statisticians where I work as displaying output from interaction models appropriately can be difficult with a macro designed to be generic as this one is.

Hi @JeffMeyers . Is there an option to alternate shading by variable, instead of alternating by row?  

Similar to the option you have for %TABLEN where "shading=2" ?

 

And can you add options to create a third refline and the ability to label the second and third reflines? 

 

@JeffMeyers 

 

I investigated the OUT_PLOT dataset and noticed that the "boldind" variable could be a candidate for shading by variables.

After tweaking your code I was able to get it to work. I just can't display 1 for the refline when I have 1.5 as a tickvalue.

 

%else %if &shading=2 %then %do;
if boldind=1 then _shade+1;
else if first.by_num and by_lvl=0 then _shade=0;
else if first.by_num and by_lvl>0 then _shade=1;
else if first.by_lvl then _shade+1;
if varnum in(0 0.25) then shadeind=0;
else shadeind=mod(_shade,2);
drop _shade;
retain _shade;
%end;

 

%mvmodels (DATA=Neuralgia, METHOD=LOGISTIC, NMODELS=1,
 EVENTCOV=Pain,EVENT=Yes,COVARIATES=Treatment Sex Age Duration, TYPE=2 2 1 1,
    CONT_DISPLAY=2, CAT_DISPLAY=4, cat_param=ref,  
    PVAL_TYPE3=0, 
 DISPLAY=subtitle or_plot or_est_range pval , show_adjcovariates=1, pval_covariates=1, 
 XAXISTYPE=log,  XAXISLABEL=Odds Ratios (log scale), refline=1,  SREFLINE=0.5, SREFLINEPATTERN=SHORTDASH,
 TREFLINE=1.5, TREFLINEPATTERN=SHORTDASH,
   LOGMINOR=TRUE, tickvalues= 0.001 0.01 0.1 0.5 1.5 10,
 LINECOLOR=blue, REFGUIDELOWER=Lower Odds of Pain, REFGUIDEUPPER=Higher Odds of Pain,
 shading=2, OUT_PLOT=Plot);

 

Screen Shot 2021-08-05 at 8.33.37 PM.png

 

 

 

 

With %TABLEN and shading=2 it shades on varlevel=. which produces the shading scheme I was looking for. 

%tablen(
    DATA = NEURALGIA,
    VAR = SEX AGE DURATION PAIN,
    TYPE = 2 1 1 2,
    BY = TREATMENT, shading=2,  debug=1);

  

@JeffMeyers  

Sorry for the constant questions. 

Is there a way to enter cat_ref values that contain commas such as for income?

 

e.g. cat_ref=male`hispanic`$80,000, cat_param=ref,
PVAL_TYPE3=0, DISPLAY=subtitle or_plot or_est_range pval , etc...

Hello @PharmlyDoc,

    That shading option is supposed to be there when SHADING=3, but looks like some other updates I made broke it.  I just fixed the code, but before making another update wanted to confirm what you mean by label for the reference lines?

 

Also, to set a macro parameter with commas (or other punctuation) use the %str() function.  For example: cat_ref=male`hispanic`%str($80,000),

@JeffMeyers Thanks for the help.

 

By "Label for the reference lines"  I mean to put the value of the refline on the x-axis. If I have a second reference line at 0.5 then I want the number 0.5 on the x axis directly below second refline. But I guess I already achieved that desired effect on the forest plot above by stating:

tickvalues= 0.001 0.01 0.1 0.5 1.5 10,

 

Hello Jeff

 

Thank you so much for creating all this wonderful time saving code. 

I am having troubles with getting the correct coding to change the reference category with the macro.

For example, I have these formatted reference categories from my regular PHREG CLASS statement:

class agemo_dx3(ref='Age: 0 to < 36 months') mole_cat2 (ref='Group4') meta_cat3 (ref='M0') ext_res (ref='GTR/NTR') Conv_Chemo_Only_Upfront dx_to_rel_cat2 (ref='12 & longer months') pat_rel(ref='Local Only') rel_Combo_tx2(ref='CSI alone') gtr_surg (ref='GTR')/param=ref;

 

I have tried several iterations for the macro - 

%mvmodels

(DATA=intentmol, METHOD=SURVIVAL, NMODELS=1, TIME=time_ev_rel_yr, CENS=event, CEN_VL=0,
COVARIATES=agemo_dx3 mole_cat2 meta_cat3 ext_res Conv_Chemo_Only_Upfront dx_to_rel_cat2 pat_rel rel_Combo_tx2 gtr_surg ,
TYPE= 2, CAT_REF = ' Age: 0 to < 36 months' Group4 ' M0 ' GTR/NTR ' Yes ' 12 & longer months ' Local Only 'CSI alone ' GTR , SHOW_ADJCOVARIATES=1);

 

Please advise on how to properly refer to the reference?

 

Also with your %newsurv MACRO I was wondering if there is a way to remove ' + Censor' from the text box as I do want to show the Log-rank p-value just not this label.

You also reference the 2020 newsurv. Where would it be located? 

thank you,

Hello @kjmathes03, thank you for the nice comments.  I believe the issues are possibly the following:

1) I can't tell if you're using backticks (`, the unshifted tilde key) or quotation marks (') in the CAT_REF parameter.  The font makes them look like quotation marks, and if so you want to use the backtick as the delimiter unless there's a quote in the string.

2) Don't add spaces to the front and back of the reference group text.  The macro assumes all spaces are part of the string (CAT_REF=Age: 0 to < 36 months`Group4`M0`GTR/NTR`Yes`12 & longer months`Local Only`CSI alone`GTR,).

 

Depending on the version of NEWSURV that you have, there's an option, CENSORMARKERS=2, that keeps the censor markers but removes the legend for them.  The macro is available to download on the communities page like this one, and there is a version attached with _2020 that is the version you're referencing.  If you e-mail me at meyers.jeffrey@mayo.edu I can send you the newest version I have as well.

Hello Again

I was just wondering if the macro for NEWSURV was capable of incorporating competing risk outcomes when an ID is repeated?

For example the outcome is time to lead failure with each ID able to have more than one lead with a separate lead failure time. Death in this case is the competing risk for lead failure.

I know PHREG uses COVS(AGGREGATE) and an ID statement.

 

thank you,

Hello again @JeffMeyers ,

Is there a method for adding the "units statement" for continuous variables in the logistic regression?

I'm using the 02/17/2021 version.

 

Let's say I have a variable for count of admissions (CNT_ADMT) and another for count of medications (CNT_MED). And I want to see the odds ratios for every 2 unit increase in admissions and for every 3 unit increase in medications:

 

proc logistic data=PATIENTS   PLOTS(ONLY MAXPOINTS=NONE)=(EFFECT ROC ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=20 ;
class SEX(ref='male') RACE(ref='non-Hispanic White') / param=ref;
model MI(event=1) = SEX RACE CNT_ADMT CNT_MED / CLODDS=wald orpvalue RSQ lackfit;
oddsratio sex/ cl=wald diff=ref;
oddsratio race/ cl=wald diff=ref;
oddsratio CNT_ADMT/ cl=wald diff=ref;
oddsratio CNT_MED/ cl=wald diff=ref;
UNITS CNT_ADMT = 2;
UNITS  CNT_MED = 3;
run;
quit;

 

 /*LOGISTIC Models*/
        %if (&&_ncat&i>0 or &&_ncont&i>0) and %sysevalf(%qupcase(&method)=LOGISTIC,boolean) and &_interp=1 %then %do;
            proc logistic data=_tempdsn&i.;
                /**Splits analysis by current BY variable**/
                by modelnum _rowby_lvl _colby_lvl _groupby_lvl _by_num;
                %if &&nstrata&i >0 %then %do;                                   
                    strata %do j=1 %to &&nstrata&i; 
                        _strata_&j
                    %end; / missing;/**Applies Stratification**/
                %end;
                class _by_lvl               
                    /**Creates class covariates with reference groups**/
                    %do j = 1 %to &&_ncat&i;
                        _cat_val_&j
                    %end; / param=&cat_param;
                /**Runs model statement**/
                model _event_val (event='1') /**Time and status variables**/=
                    %if %qscan(&&type&i,1,%str( ))=1 %then %do; _by_lvl _cont_1*_by_lvl %end;
                    %else %do; _by_lvl _cat_val_1*_by_lvl %end;
                    %do j = 1 %to &&_ncat&i;/**Character covariates**/
                        _cat_val_&j
                    %end;
                    %do j = 1 %to &&_ncont&i;/**Continuous covariates**/
                        _cont_&j
                    %end;; 
                /**Outputs temporary datasets**/
                ods output modelanova=_interp (where=(find(effect,'*'))) /**Type 3 p-values**/;
            run;
        %end;
        %if (&&_ncat&i>0 or &&_ncont&i>0) and %sysevalf(%qupcase(&method)=LOGISTIC,boolean) and 
            (&_hr=1 or &_cindex=1 or
            (&_pval=1 and (&&pval_covariates&i=1 or &&pval_type3&i=1))) %then %do;
            proc logistic data=_tempdsn&i.;
                /**Splits analysis by current BY variable**/
                by model;
                %if &&nstrata&i >0 %then %do;                                   
                    strata %do j=1 %to &&nstrata&i; 
                        _strata_&j
                    %end;;/**Applies Stratification**/
                %end;
                class               
                    /**Creates class covariates with reference groups**/
                    %do j = 1 %to &&_ncat&i;
                        _cat_val_&j (ref="0")
                    %end;  / param=&cat_param;
                /**Runs model statement**/
                model _event_val (event='1') /**Event variables**/=
                    %do j = 1 %to &&_ncat&i;/**Character covariates**/
                        _cat_val_&j
                    %end;
                    %do j = 1 %to &&_ncont&i;/**Continuous covariates**/
                        _cont_&j
                    %end;
                    / clodds=wald;
                /**Calculate Odds Ratios**/
                %do j = 1 %to &&_ncat&i;
                    oddsratio "_cat_val_&j" _cat_val_&j / diff=ref;
                %end;
                %do j = 1 %to &&_ncont&i;
                    oddsratio "_cont_&j" _cont_&j;
                %end;   
                /**Outputs temporary datasets**/
                ods output parameterestimates=_parm /**Parameter estimates and wald p-values**/
                       oddsratioswald=_odds /**Odds Ratios**/
                       %if &&_ncat&i>0 %then %do; modelanova=_t3 /**Type 3 p-values**/ %end;
                       globaltests=_gpval /**Global model p-values**/
                       fitstatistics=_fit /**Model fit statistics**/;
                %if &_cindex=1 and &&nstrata&i=0 %then %do;
                    output out=_preds predicted=_pred_ xbeta=_xbeta_ /**Outputs the betas for C-index calculations**/;
                %end;
            run;

 

Hello @PharmlyDoc

   Sorry I haven't been able to comment on here in some time.  The option you are looking for is CONT_STEP.  This allows you to change the step size for a continuous covariate.  You can list multiple step sizes for different variables by using a space delimited list (e.g. CONT_STEP=2 3).

Hello @PharmlyDoc @JeffMeyers 

I am trying to reproduce Figure 1, but with p-values from log-rank test.

I tried to add pval_by=1, but it did not work. Any suggestions? Thanks.

%mvmodels(DATA=random, METHOD=survival, TIME=os_time, CENS=os_stat,
  CEN_VL=0, COVARIATES=arm age gender, TYPE=2 1 2, pval_by=1, CAT_DISPLAY=4,
  CONT_STEP=10);

 

Hello @PKristanto ; The Log-rank test is only available with the BY level variables in this macro.  E.g. if BY=ARM and you were getting KM values by arm you could then get a log-rank p-value comparing the arm strata.

Version history
Last update:
‎07-19-2021 12:35 AM