This was a paper that was supposed to be presented at PharmaSUG 2020 this year, but due to the cancellation of the conference I still wanted to create a page with the paper and macro for download to share with everyone. This is a macro that my coworkers and I use commonly every day to assess the demographic distributions in our oncology trials.
Abstract
Clinical trial publications frequently allocate at least one of the allotted tables to summarize the demographics, stratification factors, and other variables of interest of the patients involved with the study. These tables generally include basic distribution information such as frequencies, means, medians, and ranges while also comparing these distributions across a key factor, such as treatment arm, to determine whether there was any imbalance in patient populations when doing analysis. These distributions are not difficult to compute and combine into a table, but as treatments become more specific to patient characteristics such as genetic biomarkers and tumor stages there is a need to be able to display the distributions in subgroups. The macro %TABLEN is a tool developed to compute distribution statistics for continuous variables, date variables, discrete variables, univariate survival time-to-event variables, and univariate logistic regression variables and combine them into a table for publication. %TABLEN has multiple features for comparisons, subgrouping, and outputting to multiple destinations such as RTF and PDF. The macro %TABLEN is a valuable tool for any programmer summarizing patient level data.
Example of demographic summary table
Clinical manuscripts often lead with a summary table of the demographics of patients included in the study to demonstrate a balance between arms and other subgroups. The format is rather standard, and the TABLEN macro focuses on creating tables in this format:
%tablen(data=example, by=arm,
var=age date_on sex race1 smoke_st num_met,
type=1 3 2, outdoc=~/ibm/example1.rtf);
These tables include demographic background variables along with key study factors and show the distributions in order to assess whether the variables are balanced across study arms.
What makes this macro special?
Given how often these tables are used in clinical manuscripts it is safe to say that multiple data centers already have in house programs (Mayo Clinic sure did) to make these tables, so what differentiates %TABLEN from what might already exist?
Five different variable types
The TABLEN macro has the capability to summarize five different types of variables: continuous, discrete, date, survival and logistic. Each variable type has individual options to customize the output. These are specified with the VAR and TYPE macro options. The VAR option holds the list of variables and the TYPE option specifies which type they are with the following options:
1 = Continuous
2 = Discrete
3 = Date
4 = Survival
5 = Logistic
Numeric and Date type variables
These can include variables such as age, laboratory values, and date of randomization. Both numeric and date type variables are included here because they are summarized the same way, but different formats are applied to the results. The statistics that are shown are controlled by either the CONTDISPLAY or DATEDISPLAY macro options. Any number of the following keywords can be listed in either option, and are displayed in the order they are listed in the output:
N: Number of non-missing records
NMISS: Number of missing records
N_NMISS: Combines N and NMISS as N (NMISS)
MEAN: Mean of the distribution
SD: Standard Deviation of the distribution
MEAN_SD: Combines Mean and SD as MEAN (SD)
MEDIAN: Median of the distribution
RANGE: Range of the distribution as MIN, MAX
IQR: Interquartile range of the distribution as Q1, Q3
MEDIAN_RANGE: Combines Median and Range as MEDIAN (MIN, MAX)
MEDIAN_IQR: Combines Median and IQR as MEDIAN (Q1, Q3)
The macro option DATEFMT controls the format that is applied for date variables, and there are options to control how many significant digits are shown for each variable individually.
%TABLEN(data=example, by=arm,
var=age date_on, type=1 3, pvals=1 5,
contdisplay=n_nmiss mean_sd median_range,
datedisplay=n median range)
P-values are available for either numeric or date type variables by listing one of the following numbers in the PVALS macro option:
If BY variable is present
0 = No p-value
1 = Kruskal-Wallis
2 = Exact Kruskal-Wallis (long calculation times)
3 = Wilcoxon rank sum
4 = Exact Wilcoxon rank sum (long calculation times)
5 = ANOVA F-test
6 = Equal variance two sample t-test
7 = Unequal variance two sample t-test
If BY variable is not present
0 = No p-value
1 = Student T-Test
2 = Sign Rank
Discrete type variables
Examples of these variables include gender, tumor staging, and race. These are variables that have a limited range of values. The TABLEN macro has several key options for controlling the output for discrete variables:
DIS_DISPLAY: Determines if counts (N), percentages (PCT) or both (N_PCT) are shown
DIS_ORDER: Allows the user to change the default order of the variable values. The default order is the sorted unformatted values. The user can specify a numeric list of the ordered values to change them. For example the variable GENDER might have the values 'Male' and 'Female' which would have the default order of Female and then Male. The user can specify DIS_ORDER=2 1 in order to change the order to Male then Female. Multiple variables can be distinguished with the pipe (|) symbol: DIS_ORDER=2 1|3 1 2|2 1 3. This option can be used to limit how many values are shown. For example with gender if DIS_ORDER=1 then only Females will be shown. This can be useful to save vertical space in the table.
DIS_INCMISS: Determines whether missing values are included as a valid value for percentages
DIS_PRINTMISS: Determines if number of missing values are printed
%TABLEN(data=example, by=arm,
var=smoke_st sex dz_g, type=2,
pvals=1 3 2, dis_order=3 1 2|1|)
The following P-values are available with the following codes in PVALS:
0 = No p-value
1 = Chi-square
2 = Fisher's exact
3 = Cochran-Armitage trend test (Either BY or VAR must have 2 levels only)
Survival type variables
The TABLEN macro performs univariate survival analyses where the variable in the VAR list represents the time variable. The following parameters allow the user to determine the rest of the survival model:
SURV_STAT: Matches a survival event variable to the time variable listed in the VAR list
CEN_VL: Determines the value representing the non-event. This must be numeric.
TIMELIST: Specifies time-points for Kaplan-Meier event-free rates. Unique time-points can be specified for each survival type variable. Other parameters allow the time variable to be transformed (TDIVISOR), a unit label to be applied (TIME_UNITS), and the display format (TIMELISTFMT: percentage vs. proportion).
The SURVDISPLAY option controls which statistics are shown and the order with the following keywords:
N: Number of patients
EVENTS: Number of events
EVENTS_N: Combines N and Events as EVENTS/N
MEDIAN: Kaplan-Meier survival median and 95% CI
HR: Cox model hazard ratio and 95% CI
COXPVAL: Cox model Wald p-value comparing parameters
TIMELIST: Kaplan-Meier time-point event-free rates and 95% CI
%TABLEN(data=example, by=arm,
var=fu_time pg_time, type=4, pvals=1 3,
surv_stat=fu_stat pg_stat, cen_vl=1, tdivisor=30.44 30.44,
timelist=6 12|3 6, time_units=months,
survdisplay=events_n median hr timelist)
Different options can be specified for each survival variable with the pipe delimiter as shown in the TIMELIST option in the above example. The following p-values are available with the following values in the PVALS option:
0 = No p-value
1 = Logrank
2 = Wilcoxon
3 = Cox model type-3 score
4 = Cox model type-3 likelihood-ratio
5 = Cox model type-3 Wald
Logistic type variables
The TABLEN macro performs univariate logistic regression analyses when a BY variable is input where the variable in the VAR list represents the event variable. If a BY variable is not input then the binomial success rate is calculated. The following parameters allow the user to determine the rest of the logistic model:
LOG_EVENT: Determines the event value for the binomial variable in the VAR list.
LOG_BINFMT: Determines if the binomial success rate is calculated as a proportion or percentage.
The LOG_DISPLAY option controls which statistics are shown and the order with the following keywords:
N: Sample size
EVENTS: Number of events or successes
ODDSRATIO: Logistic regression odds ratio and 95% CI
WALDPVAL: Logistic regression Wald p-value comparing parameters
BINRATE: Binomial success rate and 95% CI
%TABLEN(data=example, by=arm,
var=gd3, type=5, pvals=1,
log_event=1,
log_display=events_n binrate oddsratio)
The following p-values are available with the specified values in the PVALS option:
When a BY variable is present
0 = No p-value
1 = logistic regression type-3 Wald p-value
When no BY variable is present
0 = No p-value
1 = Chi-square p-value
2 = Fisher's exact p-value
Options for comparison and subgrouping
There are three different options for comparison or subgrouping: BY, COLBY (column BY), and ROWBY (row BY). These options can be used together or separately to account for different situations.
BY variable options
The BY variable creates one set of statistics for each level of the BY variable and allows the comparison of these measures with metrics such as p-values and hazard ratios. The following are key parameters associated with using a BY variable:
BYORDER: Allows the user to change the order of the BY variable levels without modifying the original data. The order by default is the unformatted values.
BY_INCMISS: Determines if missing values are included as a valid value for comparisons
PVALS: Determines which type of p-value is used for comparison. Different p-values are available for each variable type and are set for each variable in the VAR list individually.
REFERENCE: Specifies the reference level when computing hazard ratios or odds ratios
Column BY (COLBY) options
The COLBY option allows the tables to be nested into subgroups in multiple columns. The variable list, BY groupings, and p-values are summarized within each level of the COLBY variable. There is no limit to how many levels the COLBY variable can have, and this feature can work very nicely when combined with the ODS EXCEL destination.
ROW BY (ROWBY)
The ROWBY option allows the tables to be nested into subgroups in multiple rows. The variable list, BY groupings, and p-values are summarized within each level of the ROWBY variable. There is no limit to how many levels the ROWBY variable can have, and this feature is more useful when horizontal space is limited.
Quality of life features
The TABLEN macro is packed with features to improve quality of life of the programmer using the macro. These features include:
Full error checking for every parameter to stop the macro from crashing and ruining a SAS session. The macro cleans up all temporary data sets that it creates and restores any option values that it changed.
Options such as TYPE carry their last value forward so that values don't have to be repeated. For example with the TYPE variable if there are 5 variables that have the first as continuous and the rest discrete the following two uses of the TYPE option are the same: 1 2 2 2 2 and 1 2. In the second list the 2 will be carried forward for the remaining 3 variables if no further values are specified.
Each variable type's options will only be applied to variables of those types. For example if the first survival variable is the third variable in the VAR list, the first option listed in SURV_STAT will still apply to the first listed survival variable. There is no need to account for the other variable types.
The LABELS option allows the user to specify new labels for variables without changing their data set. The macro takes the current variable label by default.
The WHERE option allows the user to subset their data set without making a new one to run with the macro. This will not affect their original data set.
The macro can output to RTF, PDF, EXCEL, HTML, and POWERPOINT destinations while keeping the same look. The WORD destination is also being reviewed currently. The table is also looks the same in SAS Studio. The following image shows the RTF, PDF, EXCEL, SAS Studio, and POWERPOINT destinations in order:
The TABLEN macro also makes a table for the LISTING environment that works well with .LST files:
P-value footnotes are automatically created for the table along with the super scripts. These can also be turned off.
Other style options
The TABLEN macro also has several options for changing the shading, spacing and borders of the final table output.
Table shading
Table shading can either be disabled, shade alternating rows, or shade alternating variables. This is controlled by the SHADING parameter in the TABLEN macro call. The following are examples of disabling the shading and shading by row. The default of SHADING=1 to have shading by alternating row has been displayed in the prior table examples.
Table spacing
Within the table there are three options for the space between variables: adding a blank row, adding a line, and having no space or line between variables. This option is controlled by the SPLIT parameter in the TABLEN macro call. The default of SPLIT=SPACE has been displayed in the prior table examples and will add a space between each variable.
Table borders
The amount of cell borders is controlled by the BORDERDISPLAY parameter in the TABLEN macro call and has four different options. The default of BORDERDISPLAY=1 only shows border lines at the top and bottom of the table and has been used in the prior table examples. The following examples show the other options: BORDERDISPLAY=2 turns on all cell borders, BORDERDISPLAY=3 turns on all cell borders except for title and footnote, and BORDERDISPLAY=4 turns on borders for columns.
Conclusion
The %TABLEN macro is a flexible and powerful tool for creating demographic summary tables, and has the features required to meet the additional challenges put forth by more complicated clinical trials. The ability to handle multiple variable types, output to several ODS destinations, create nested tables, and perform on the fly data set modifications is useful to any programmer in the healthcare industry.
The macro is attached to this article, and the paper and presentation will be added as they are finalized. Any questions or macro related issues can be sent to:
Name: Jeffrey Meyers
Enterprise: Mayo Clinic
E-mail: Meyers.jeffrey@mayo.edu / jpmeyers.spa@gmail.com
... View more