Demographic Table and Subgroup Summary Macro %TABLEN

15 Likes

This was a paper that was supposed to be presented at PharmaSUG 2020 this year, but due to the cancellation of the conference I still wanted to create a page with the paper and macro for download to share with everyone. This is a macro that my coworkers and I use commonly every day to assess the demographic distributions in our oncology trials.

Abstract

Clinical trial publications frequently allocate at least one of the allotted tables to summarize the demographics, stratification factors, and other variables of interest of the patients involved with the study. These tables generally include basic distribution information such as frequencies, means, medians, and ranges while also comparing these distributions across a key factor, such as treatment arm, to determine whether there was any imbalance in patient populations when doing analysis. These distributions are not difficult to compute and combine into a table, but as treatments become more specific to patient characteristics such as genetic biomarkers and tumor stages there is a need to be able to display the distributions in subgroups. The macro %TABLEN is a tool developed to compute distribution statistics for continuous variables, date variables, discrete variables, univariate survival time-to-event variables, and univariate logistic regression variables and combine them into a table for publication. %TABLEN has multiple features for comparisons, subgrouping, and outputting to multiple destinations such as RTF and PDF. The macro %TABLEN is a valuable tool for any programmer summarizing patient level data.

Example of demographic summary table

Clinical manuscripts often lead with a summary table of the demographics of patients included in the study to demonstrate a balance between arms and other subgroups. The format is rather standard, and the TABLEN macro focuses on creating tables in this format:

%tablen(data=example, by=arm, 
     var=age date_on sex race1 smoke_st num_met,
     type=1 3 2, outdoc=~/ibm/example1.rtf);

These tables include demographic background variables along with key study factors and show the distributions in order to assess whether the variables are balanced across study arms.

What makes this macro special?

Given how often these tables are used in clinical manuscripts it is safe to say that multiple data centers already have in house programs (Mayo Clinic sure did) to make these tables, so what differentiates %TABLEN from what might already exist?

Five different variable types

The TABLEN macro has the capability to summarize five different types of variables: continuous, discrete, date, survival and logistic. Each variable type has individual options to customize the output. These are specified with the VAR and TYPE macro options. The VAR option holds the list of variables and the TYPE option specifies which type they are with the following options:

1 = Continuous
2 = Discrete
3 = Date
4 = Survival
5 = Logistic

Numeric and Date type variables

These can include variables such as age, laboratory values, and date of randomization. Both numeric and date type variables are included here because they are summarized the same way, but different formats are applied to the results. The statistics that are shown are controlled by either the CONTDISPLAY or DATEDISPLAY macro options. Any number of the following keywords can be listed in either option, and are displayed in the order they are listed in the output:

N: Number of non-missing records
NMISS: Number of missing records
N_NMISS: Combines N and NMISS as N (NMISS)
MEAN: Mean of the distribution
SD: Standard Deviation of the distribution
MEAN_SD: Combines Mean and SD as MEAN (SD)
MEDIAN: Median of the distribution
RANGE: Range of the distribution as MIN, MAX
IQR: Interquartile range of the distribution as Q1, Q3
MEDIAN_RANGE: Combines Median and Range as MEDIAN (MIN, MAX)
MEDIAN_IQR: Combines Median and IQR as MEDIAN (Q1, Q3)

The macro option DATEFMT controls the format that is applied for date variables, and there are options to control how many significant digits are shown for each variable individually.

%TABLEN(data=example, by=arm, 
    var=age date_on, type=1 3, pvals=1 5,
    contdisplay=n_nmiss mean_sd median_range,
    datedisplay=n median range)

P-values are available for either numeric or date type variables by listing one of the following numbers in the PVALS macro option:

If BY variable is present
- 0 = No p-value
- 1 = Kruskal-Wallis
- 2 = Exact Kruskal-Wallis (long calculation times)
- 3 = Wilcoxon rank sum
- 4 = Exact Wilcoxon rank sum (long calculation times)
- 5 = ANOVA F-test
- 6 = Equal variance two sample t-test
- 7 = Unequal variance two sample t-test
If BY variable is not present
- 0 = No p-value
- 1 = Student T-Test
- 2 = Sign Rank

Discrete type variables

Examples of these variables include gender, tumor staging, and race. These are variables that have a limited range of values. The TABLEN macro has several key options for controlling the output for discrete variables:

DIS_DISPLAY: Determines if counts (N), percentages (PCT) or both (N_PCT) are shown
DIS_ORDER: Allows the user to change the default order of the variable values. The default order is the sorted unformatted values. The user can specify a numeric list of the ordered values to change them. For example the variable GENDER might have the values 'Male' and 'Female' which would have the default order of Female and then Male. The user can specify DIS_ORDER=2 1 in order to change the order to Male then Female. Multiple variables can be distinguished with the pipe (|) symbol: DIS_ORDER=2 1|3 1 2|2 1 3. This option can be used to limit how many values are shown. For example with gender if DIS_ORDER=1 then only Females will be shown. This can be useful to save vertical space in the table.
DIS_INCMISS: Determines whether missing values are included as a valid value for percentages
DIS_PRINTMISS: Determines if number of missing values are printed

%TABLEN(data=example, by=arm, 
    var=smoke_st sex dz_g, type=2,  
    pvals=1 3 2, dis_order=3 1 2|1|)

The following P-values are available with the following codes in PVALS:

0 = No p-value
1 = Chi-square
2 = Fisher's exact
3 = Cochran-Armitage trend test (Either BY or VAR must have 2 levels only)

Survival type variables

The TABLEN macro performs univariate survival analyses where the variable in the VAR list represents the time variable. The following parameters allow the user to determine the rest of the survival model:

SURV_STAT: Matches a survival event variable to the time variable listed in the VAR list
CEN_VL: Determines the value representing the non-event. This must be numeric.
TIMELIST: Specifies time-points for Kaplan-Meier event-free rates. Unique time-points can be specified for each survival type variable. Other parameters allow the time variable to be transformed (TDIVISOR), a unit label to be applied (TIME_UNITS), and the display format (TIMELISTFMT: percentage vs. proportion).

The SURVDISPLAY option controls which statistics are shown and the order with the following keywords:

N: Number of patients
EVENTS: Number of events
EVENTS_N: Combines N and Events as EVENTS/N
MEDIAN: Kaplan-Meier survival median and 95% CI
HR: Cox model hazard ratio and 95% CI
COXPVAL: Cox model Wald p-value comparing parameters
TIMELIST: Kaplan-Meier time-point event-free rates and 95% CI

%TABLEN(data=example, by=arm, 
    var=fu_time pg_time, type=4,  pvals=1 3,
    surv_stat=fu_stat pg_stat, cen_vl=1, tdivisor=30.44 30.44, 
    timelist=6 12|3 6, time_units=months,
    survdisplay=events_n median hr timelist)

Different options can be specified for each survival variable with the pipe delimiter as shown in the TIMELIST option in the above example. The following p-values are available with the following values in the PVALS option:

0 = No p-value
1 = Logrank
2 = Wilcoxon
3 = Cox model type-3 score
4 = Cox model type-3 likelihood-ratio
5 = Cox model type-3 Wald

Logistic type variables

The TABLEN macro performs univariate logistic regression analyses when a BY variable is input where the variable in the VAR list represents the event variable. If a BY variable is not input then the binomial success rate is calculated. The following parameters allow the user to determine the rest of the logistic model:

LOG_EVENT: Determines the event value for the binomial variable in the VAR list.
LOG_BINFMT: Determines if the binomial success rate is calculated as a proportion or percentage.

The LOG_DISPLAY option controls which statistics are shown and the order with the following keywords:

N: Sample size
EVENTS: Number of events or successes
ODDSRATIO: Logistic regression odds ratio and 95% CI
WALDPVAL: Logistic regression Wald p-value comparing parameters
BINRATE: Binomial success rate and 95% CI

%TABLEN(data=example, by=arm, 
    var=gd3, type=5,  pvals=1,
    log_event=1,
    log_display=events_n binrate oddsratio)

The following p-values are available with the specified values in the PVALS option:

When a BY variable is present
- 0 = No p-value
- 1 = logistic regression type-3 Wald p-value
When no BY variable is present
- 0 = No p-value
- 1 = Chi-square p-value
- 2 = Fisher's exact p-value

Options for comparison and subgrouping

There are three different options for comparison or subgrouping: BY, COLBY (column BY), and ROWBY (row BY). These options can be used together or separately to account for different situations.

BY variable options

The BY variable creates one set of statistics for each level of the BY variable and allows the comparison of these measures with metrics such as p-values and hazard ratios. The following are key parameters associated with using a BY variable:

BYORDER: Allows the user to change the order of the BY variable levels without modifying the original data. The order by default is the unformatted values.
BY_INCMISS: Determines if missing values are included as a valid value for comparisons
PVALS: Determines which type of p-value is used for comparison. Different p-values are available for each variable type and are set for each variable in the VAR list individually.
REFERENCE: Specifies the reference level when computing hazard ratios or odds ratios

Column BY (COLBY) options

The COLBY option allows the tables to be nested into subgroups in multiple columns. The variable list, BY groupings, and p-values are summarized within each level of the COLBY variable. There is no limit to how many levels the COLBY variable can have, and this feature can work very nicely when combined with the ODS EXCEL destination.

ROW BY (ROWBY)

The ROWBY option allows the tables to be nested into subgroups in multiple rows. The variable list, BY groupings, and p-values are summarized within each level of the ROWBY variable. There is no limit to how many levels the ROWBY variable can have, and this feature is more useful when horizontal space is limited.

Quality of life features

The TABLEN macro is packed with features to improve quality of life of the programmer using the macro. These features include:

Full error checking for every parameter to stop the macro from crashing and ruining a SAS session. The macro cleans up all temporary data sets that it creates and restores any option values that it changed.
Options such as TYPE carry their last value forward so that values don't have to be repeated. For example with the TYPE variable if there are 5 variables that have the first as continuous and the rest discrete the following two uses of the TYPE option are the same: 1 2 2 2 2 and 1 2. In the second list the 2 will be carried forward for the remaining 3 variables if no further values are specified.
Each variable type's options will only be applied to variables of those types. For example if the first survival variable is the third variable in the VAR list, the first option listed in SURV_STAT will still apply to the first listed survival variable. There is no need to account for the other variable types.
The LABELS option allows the user to specify new labels for variables without changing their data set. The macro takes the current variable label by default.
The WHERE option allows the user to subset their data set without making a new one to run with the macro. This will not affect their original data set.
The macro can output to RTF, PDF, EXCEL, HTML, and POWERPOINT destinations while keeping the same look. The WORD destination is also being reviewed currently. The table is also looks the same in SAS Studio. The following image shows the RTF, PDF, EXCEL, SAS Studio, and POWERPOINT destinations in order:

The TABLEN macro also makes a table for the LISTING environment that works well with .LST files:

P-value footnotes are automatically created for the table along with the super scripts. These can also be turned off.

Other style options

The TABLEN macro also has several options for changing the shading, spacing and borders of the final table output.

Table shading

Table shading can either be disabled, shade alternating rows, or shade alternating variables. This is controlled by the SHADING parameter in the TABLEN macro call. The following are examples of disabling the shading and shading by row. The default of SHADING=1 to have shading by alternating row has been displayed in the prior table examples.

Table spacing

Within the table there are three options for the space between variables: adding a blank row, adding a line, and having no space or line between variables. This option is controlled by the SPLIT parameter in the TABLEN macro call. The default of SPLIT=SPACE has been displayed in the prior table examples and will add a space between each variable.

Table borders

The amount of cell borders is controlled by the BORDERDISPLAY parameter in the TABLEN macro call and has four different options. The default of BORDERDISPLAY=1 only shows border lines at the top and bottom of the table and has been used in the prior table examples. The following examples show the other options: BORDERDISPLAY=2 turns on all cell borders, BORDERDISPLAY=3 turns on all cell borders except for title and footnote, and BORDERDISPLAY=4 turns on borders for columns.

Conclusion

The %TABLEN macro is a flexible and powerful tool for creating demographic summary tables, and has the features required to meet the additional challenges put forth by more complicated clinical trials. The ability to handle multiple variable types, output to several ODS destinations, create nested tables, and perform on the fly data set modifications is useful to any programmer in the healthcare industry.

The macro is attached to this article, and the paper and presentation will be added as they are finalized. Any questions or macro related issues can be sent to:

Name: Jeffrey Meyers

Enterprise: Mayo Clinic

E-mail: Meyers.jeffrey@mayo.edu / jpmeyers.spa@gmail.com

beloko · ‎11-29-2022

Hello @JeffMeyers and the tablen users community,

I'm getting something off with the "DIS_ORDER" parameter.

In the example below: for the sex variable, "(Male)" appears if I order it before a numeric variable,

but if ordered after a numeric variable, it does not.

(When including more variables before and after the numeric variable, the same happens.)

%tablen(
  DATA = sample_data, 
  VAR = sex index_age sex, 
  TYPE = 2 1 2,
  DIS_ORDER= 1 | 1, 
  SPLIT=NONE);

Any insight would be greatly appreciated.

I opened up the macro to see if I could figure it out, but dissecting the genius within will take eons 🙂.

beloko · ‎12-01-2022

Hello @JeffMeyers and tablen users.

I think hiccup is highlighted when the _tester_ dataset is created (line 2658).

Between lines 2712 and 2718, there is logic preventing the display the variable in "factor".

Possibly related to the value of the "_ndisplay_&i._(1)" variable.

The pink text is what ~should be happening, but a condition is not quite being met when the type 2 variable is ordered after the type 1 variable:

                    %else %do;
                        factor="%superq(label&i)"; 
                        if %scan(&type,&i,%str( ))^=2 and dim(_ndisplay_&i._)=1 then factor=strip(factor)||', '||strip(_ndisplay_&i._(1));
                        else if %scan(&type,&i,%str( ))=2 and dim(_ndisplay_&i._)=2 and 
                            %sysfunc(countw(%scan(%superq(dis_order),&i,|,m),%str( )))=1 and %sysfunc(find(%scan(%superq(dis_order),&i,|,m),freq,i))=0 then 
                                factor=strip(factor)|| ' ('||strip(_ndisplay_&i._(1))||')';
                    %end;

Tracing further back, this is probably connected to the creation of "_ndisplay_&i" variable first seen in line 1710 for type 2 variables.

Any insight or assistance would be greatly appreciated.

Thanks!

JeffMeyers · ‎12-01-2022

Hello everyone. I wanted to apologize for not being to reply to anyone as of late. My new role no longer uses some of the macros I've created so I don't have as much opportunity or need to make updates, and my current life situation with young children has kept me from making replies/updates in my spare time. I am planning to try to catch up if I can this weekend. I am uploading my most recent version of the macro to this article which I will use to test some of these issues.

JeffMeyers · ‎12-01-2022

Hello @beloko ,

I tested your scenario with the current version of the macro and I don't see the issue happening there. When the article updates with the new version attached (tablen_05102022.sas) I would try with that version:

%tablen(data=sashelp.cars,var=type horsepower type,type=2 1 2,by=origin,dis_order=1|1);

JeffMeyers · ‎12-01-2022

Hello @sskt . I have an update to the article up for review with a new version of the macro attached. When the review is finished it should be available for download.

beloko · ‎12-01-2022

@JeffMeyers Many many thanks!

User04 · ‎03-01-2023

Hello @JeffMeyers. Are there any news about the update yet? When is the new version of the macro available? It would be really nice if you could make the macro available soon. Many many thanks!

JeffMeyers · ‎03-07-2023

Hello @User04 . I submitted an update to this page with the new version some time ago (and for another macro of mine) but it is still waiting for review. I think I need to contact support for this.

@ChrisHemedinger Do you know why my updates submitted a few months ago are still marked as waiting for review? I haven't been able to post the updated version to a couple macros.

ChrisHemedinger · ‎03-07-2023

@JeffMeyers Just published both of these. I thought you could Publish the items yourself (no need to submit for review first, as you're the author), but let me know if that's not the case.

JeffMeyers · ‎03-07-2023

Thank you @ChrisHemedinger . I went back in to see options. I have a button for Save and a button for "Submit for Review." I tried clicking the Submit for Review (sorry it made another waiting for review version) to see if there is a pop up to skip to publish, but it sent it out for review again.

User04 · ‎03-09-2023

Thank you for your answer @JeffMeyers. But is it possible that you can update the post again in the menu and then just click save? Thanks again for your fantastic macro!

user021 · ‎05-13-2023

@JeffMeyers

Can you add option to calculate 95% CI of the mean for continuous variable, and 95% CI of the proportion for categorical variable?

JeffMeyers · ‎05-24-2023

Hello @user021 . Unfortunately I haven't had time to update this macro lately. The company I work for doesn't currently use this macro and I've been busy with my family and haven't had as much time to work on my personal macros. I can try to take a look at adding the 95% CI of the mean (I think that's just +/- 1.96 STD correct?). I think adding 95% CI for a proportion would be quite a bit more work given how the macro is constructed. I think that would require getting the binomial CI for each level individually unless there's an option in PROC FREQ that I'm missing.

Smokeman75 · ‎07-28-2023

Hi @JeffMeyers ,

I would like to use the macro, and I wonder I a documentation about the macro validation is available somewhere,

or any information that shows functionnalities have been tested ?

Thank you for your reply and best regards,

Damien

beloko · ‎07-31-2023

Hello Damien (@Smokeman75),

Welcome to the tablen users community!

As a tablen user, I would recommend comparing the raw output that you get with your current coding methods with the tablen output. Then you will have the certainty of your own tested methods combined with the elegancy of the tablen output.

If you'd like to add some automation to that process, there is an output dataset from tablen called work._out which is output when you set the parameter debug=1. You can write some code to do the comparing between output from your methods and output from tablen.

Until SAS adopts this macro as its own and provides its stamp of "tested and approved", we are in the wild west of open source software. I will say, though, that if you take a look under the hood of this macro, you will find it to be quite elegant, transparent, methodical, and well documented, which are some of the hallmarks of a good open source software. It also provides intermediate datasets for checking/debugging/understanding. Also, @JeffMeyers is quite responsive to the questions in this forum (while quite busy with life outside this forum), which is another sign of a good open source software.

Please do test tablen against your current methods and let us know what you find. If it is anything like my experience, it will be a worthwhile exercise for you and a good service to the tablen users community.

Thanks and good luck!

jrb182 · ‎12-03-2023

Hi @beloko

I saw an earlier comment that you were working on adding standardized differences for comparing to groups as an option. Did you have any success or was there an update to the macro?

This would be very helpful for something I am working on

Best!

Sara614 · ‎04-15-2024

I just wanted to say thank you so much for making this macro!!! I was wondering is there a way to suppress cell counts of less than 11 within the macro statement?

rmorgenstern · ‎04-16-2024

Your macro, %TABLEN, and it is an absolute game changer for me. I wanted to check if you know how I can display the row percent rather than column percent for discrete variables?

Thank you in advance for your time!

JeffMeyers

Hello @rmorgenstern, This option can be set with the PCTDISPLAY parameter. Set this to either COL or ROW to get the percentages you want.

JeffMeyers

Hello @Smokeman75, This macro is considered open source and is not validated. It has been used extensively during my time at Mayo Clinic where results have been double checked for manuscripts and analyses, but there's no official validation documentation.

JeffMeyers

Hello @Sara614, This macro does not have a functionality to show the top X% of results unfortunately.