This was a paper that was supposed to be presented at PharmaSUG 2020 this year, but due to the cancellation of the conference I still wanted to create a page with the paper and macro for download to share with everyone. This is a macro that my coworkers and I use commonly every day to assess the demographic distributions in our oncology trials.
Clinical trial publications frequently allocate at least one of the allotted tables to summarize the demographics, stratification factors, and other variables of interest of the patients involved with the study. These tables generally include basic distribution information such as frequencies, means, medians, and ranges while also comparing these distributions across a key factor, such as treatment arm, to determine whether there was any imbalance in patient populations when doing analysis. These distributions are not difficult to compute and combine into a table, but as treatments become more specific to patient characteristics such as genetic biomarkers and tumor stages there is a need to be able to display the distributions in subgroups. The macro %TABLEN is a tool developed to compute distribution statistics for continuous variables, date variables, discrete variables, univariate survival time-to-event variables, and univariate logistic regression variables and combine them into a table for publication. %TABLEN has multiple features for comparisons, subgrouping, and outputting to multiple destinations such as RTF and PDF. The macro %TABLEN is a valuable tool for any programmer summarizing patient level data.
Clinical manuscripts often lead with a summary table of the demographics of patients included in the study to demonstrate a balance between arms and other subgroups. The format is rather standard, and the TABLEN macro focuses on creating tables in this format:
%tablen(data=example, by=arm,
var=age date_on sex race1 smoke_st num_met,
type=1 3 2, outdoc=~/ibm/example1.rtf);
These tables include demographic background variables along with key study factors and show the distributions in order to assess whether the variables are balanced across study arms.
Given how often these tables are used in clinical manuscripts it is safe to say that multiple data centers already have in house programs (Mayo Clinic sure did) to make these tables, so what differentiates %TABLEN from what might already exist?
The TABLEN macro has the capability to summarize five different types of variables: continuous, discrete, date, survival and logistic. Each variable type has individual options to customize the output. These are specified with the VAR and TYPE macro options. The VAR option holds the list of variables and the TYPE option specifies which type they are with the following options:
These can include variables such as age, laboratory values, and date of randomization. Both numeric and date type variables are included here because they are summarized the same way, but different formats are applied to the results. The statistics that are shown are controlled by either the CONTDISPLAY or DATEDISPLAY macro options. Any number of the following keywords can be listed in either option, and are displayed in the order they are listed in the output:
The macro option DATEFMT controls the format that is applied for date variables, and there are options to control how many significant digits are shown for each variable individually.
%TABLEN(data=example, by=arm,
var=age date_on, type=1 3, pvals=1 5,
contdisplay=n_nmiss mean_sd median_range,
datedisplay=n median range)
P-values are available for either numeric or date type variables by listing one of the following numbers in the PVALS macro option:
Examples of these variables include gender, tumor staging, and race. These are variables that have a limited range of values. The TABLEN macro has several key options for controlling the output for discrete variables:
%TABLEN(data=example, by=arm,
var=smoke_st sex dz_g, type=2,
pvals=1 3 2, dis_order=3 1 2|1|)
The following P-values are available with the following codes in PVALS:
The TABLEN macro performs univariate survival analyses where the variable in the VAR list represents the time variable. The following parameters allow the user to determine the rest of the survival model:
The SURVDISPLAY option controls which statistics are shown and the order with the following keywords:
%TABLEN(data=example, by=arm,
var=fu_time pg_time, type=4, pvals=1 3,
surv_stat=fu_stat pg_stat, cen_vl=1, tdivisor=30.44 30.44,
timelist=6 12|3 6, time_units=months,
survdisplay=events_n median hr timelist)
Different options can be specified for each survival variable with the pipe delimiter as shown in the TIMELIST option in the above example. The following p-values are available with the following values in the PVALS option:
The TABLEN macro performs univariate logistic regression analyses when a BY variable is input where the variable in the VAR list represents the event variable. If a BY variable is not input then the binomial success rate is calculated. The following parameters allow the user to determine the rest of the logistic model:
The LOG_DISPLAY option controls which statistics are shown and the order with the following keywords:
%TABLEN(data=example, by=arm,
var=gd3, type=5, pvals=1,
log_event=1,
log_display=events_n binrate oddsratio)
The following p-values are available with the specified values in the PVALS option:
There are three different options for comparison or subgrouping: BY, COLBY (column BY), and ROWBY (row BY). These options can be used together or separately to account for different situations.
The BY variable creates one set of statistics for each level of the BY variable and allows the comparison of these measures with metrics such as p-values and hazard ratios. The following are key parameters associated with using a BY variable:
The COLBY option allows the tables to be nested into subgroups in multiple columns. The variable list, BY groupings, and p-values are summarized within each level of the COLBY variable. There is no limit to how many levels the COLBY variable can have, and this feature can work very nicely when combined with the ODS EXCEL destination.
The ROWBY option allows the tables to be nested into subgroups in multiple rows. The variable list, BY groupings, and p-values are summarized within each level of the ROWBY variable. There is no limit to how many levels the ROWBY variable can have, and this feature is more useful when horizontal space is limited.
The TABLEN macro is packed with features to improve quality of life of the programmer using the macro. These features include:
The TABLEN macro also has several options for changing the shading, spacing and borders of the final table output.
Table shading can either be disabled, shade alternating rows, or shade alternating variables. This is controlled by the SHADING parameter in the TABLEN macro call. The following are examples of disabling the shading and shading by row. The default of SHADING=1 to have shading by alternating row has been displayed in the prior table examples.
Within the table there are three options for the space between variables: adding a blank row, adding a line, and having no space or line between variables. This option is controlled by the SPLIT parameter in the TABLEN macro call. The default of SPLIT=SPACE has been displayed in the prior table examples and will add a space between each variable.
The amount of cell borders is controlled by the BORDERDISPLAY parameter in the TABLEN macro call and has four different options. The default of BORDERDISPLAY=1 only shows border lines at the top and bottom of the table and has been used in the prior table examples. The following examples show the other options: BORDERDISPLAY=2 turns on all cell borders, BORDERDISPLAY=3 turns on all cell borders except for title and footnote, and BORDERDISPLAY=4 turns on borders for columns.
The %TABLEN macro is a flexible and powerful tool for creating demographic summary tables, and has the features required to meet the additional challenges put forth by more complicated clinical trials. The ability to handle multiple variable types, output to several ODS destinations, create nested tables, and perform on the fly data set modifications is useful to any programmer in the healthcare industry.
The macro is attached to this article, and the paper and presentation will be added as they are finalized. Any questions or macro related issues can be sent to:
Name: Jeffrey Meyers
Enterprise: Mayo Clinic
E-mail: Meyers.jeffrey@mayo.edu / jpmeyers.spa@gmail.com
Hello @JeffMeyers and the tablen users community,
I'm getting something off with the "DIS_ORDER" parameter.
In the example below: for the sex variable, "(Male)" appears if I order it before a numeric variable,
but if ordered after a numeric variable, it does not.
(When including more variables before and after the numeric variable, the same happens.)
%tablen(
DATA = sample_data,
VAR = sex index_age sex,
TYPE = 2 1 2,
DIS_ORDER= 1 | 1,
SPLIT=NONE); Any insight would be greatly appreciated.
I opened up the macro to see if I could figure it out, but dissecting the genius within will take eons 🙂.
Hello @JeffMeyers and tablen users.
I think hiccup is highlighted when the _tester_ dataset is created (line 2658).
Between lines 2712 and 2718, there is logic preventing the display the variable in "factor".
Possibly related to the value of the "_ndisplay_&i._(1)" variable.
The pink text is what ~should be happening, but a condition is not quite being met when the type 2 variable is ordered after the type 1 variable:
%else %do;
factor="%superq(label&i)";
if %scan(&type,&i,%str( ))^=2 and dim(_ndisplay_&i._)=1 then factor=strip(factor)||', '||strip(_ndisplay_&i._(1));
else if %scan(&type,&i,%str( ))=2 and dim(_ndisplay_&i._)=2 and
%sysfunc(countw(%scan(%superq(dis_order),&i,|,m),%str( )))=1 and %sysfunc(find(%scan(%superq(dis_order),&i,|,m),freq,i))=0 then
factor=strip(factor)|| ' ('||strip(_ndisplay_&i._(1))||')';
%end;Tracing further back, this is probably connected to the creation of "_ndisplay_&i" variable first seen in line 1710 for type 2 variables.
Any insight or assistance would be greatly appreciated.
Thanks!
Hello everyone. I wanted to apologize for not being to reply to anyone as of late. My new role no longer uses some of the macros I've created so I don't have as much opportunity or need to make updates, and my current life situation with young children has kept me from making replies/updates in my spare time. I am planning to try to catch up if I can this weekend. I am uploading my most recent version of the macro to this article which I will use to test some of these issues.
Hello @beloko ,
I tested your scenario with the current version of the macro and I don't see the issue happening there. When the article updates with the new version attached (tablen_05102022.sas) I would try with that version:
%tablen(data=sashelp.cars,var=type horsepower type,type=2 1 2,by=origin,dis_order=1|1);
Hello @sskt . I have an update to the article up for review with a new version of the macro attached. When the review is finished it should be available for download.
@JeffMeyers Many many thanks!
Hello @JeffMeyers. Are there any news about the update yet? When is the new version of the macro available? It would be really nice if you could make the macro available soon. Many many thanks!
Hello @User04 . I submitted an update to this page with the new version some time ago (and for another macro of mine) but it is still waiting for review. I think I need to contact support for this.
@ChrisHemedinger Do you know why my updates submitted a few months ago are still marked as waiting for review? I haven't been able to post the updated version to a couple macros.
@JeffMeyers Just published both of these. I thought you could Publish the items yourself (no need to submit for review first, as you're the author), but let me know if that's not the case.
Thank you @ChrisHemedinger . I went back in to see options. I have a button for Save and a button for "Submit for Review." I tried clicking the Submit for Review (sorry it made another waiting for review version) to see if there is a pop up to skip to publish, but it sent it out for review again.
Thank you for your answer @JeffMeyers. But is it possible that you can update the post again in the menu and then just click save? Thanks again for your fantastic macro!
Can you add option to calculate 95% CI of the mean for continuous variable, and 95% CI of the proportion for categorical variable?
Hello @user021 . Unfortunately I haven't had time to update this macro lately. The company I work for doesn't currently use this macro and I've been busy with my family and haven't had as much time to work on my personal macros. I can try to take a look at adding the 95% CI of the mean (I think that's just +/- 1.96 STD correct?). I think adding 95% CI for a proportion would be quite a bit more work given how the macro is constructed. I think that would require getting the binomial CI for each level individually unless there's an option in PROC FREQ that I'm missing.
Hi @JeffMeyers ,
I would like to use the macro, and I wonder I a documentation about the macro validation is available somewhere,
or any information that shows functionnalities have been tested ?
Thank you for your reply and best regards,
Damien
Hello Damien (@Smokeman75),
Welcome to the tablen users community!
As a tablen user, I would recommend comparing the raw output that you get with your current coding methods with the tablen output. Then you will have the certainty of your own tested methods combined with the elegancy of the tablen output.
If you'd like to add some automation to that process, there is an output dataset from tablen called work._out which is output when you set the parameter debug=1. You can write some code to do the comparing between output from your methods and output from tablen.
Until SAS adopts this macro as its own and provides its stamp of "tested and approved", we are in the wild west of open source software. I will say, though, that if you take a look under the hood of this macro, you will find it to be quite elegant, transparent, methodical, and well documented, which are some of the hallmarks of a good open source software. It also provides intermediate datasets for checking/debugging/understanding. Also, @JeffMeyers is quite responsive to the questions in this forum (while quite busy with life outside this forum), which is another sign of a good open source software.
Please do test tablen against your current methods and let us know what you find. If it is anything like my experience, it will be a worthwhile exercise for you and a good service to the tablen users community.
Thanks and good luck!
Hi @beloko
I saw an earlier comment that you were working on adding standardized differences for comparing to groups as an option. Did you have any success or was there an update to the macro?
This would be very helpful for something I am working on
Best!
I just wanted to say thank you so much for making this macro!!! I was wondering is there a way to suppress cell counts of less than 11 within the macro statement?
Your macro, %TABLEN, and it is an absolute game changer for me. I wanted to check if you know how I can display the row percent rather than column percent for discrete variables?
Thank you in advance for your time!
Hello @rmorgenstern, This option can be set with the PCTDISPLAY parameter. Set this to either COL or ROW to get the percentages you want.
Hello @Smokeman75, This macro is considered open source and is not validated. It has been used extensively during my time at Mayo Clinic where results have been double checked for manuscripts and analyses, but there's no official validation documentation.
Hello @Sara614, This macro does not have a functionality to show the top X% of results unfortunately.
Thank you very much @JeffMeyers this is extremely useful
Hello, I face an Issue regarding TABLEN macro.
I am reporting some data regarding hematology tests, for which internationnal units have to be displayed due to submission to health authorities.
One kind of unit used for as internationnal unit use is 10^9/L, with this syntax (for Leukocyte count for example).
TABLEN macro displays this char as "10/L", as ODS escape char is set to "^" in the code of TABLEN macro.
Is there a way to display this character in the reports (either titles header or body of the report), in order to avoid to use equivalent unit that are not the official unit (such as /mm3...) ?
Thanks in advance,
Damien
Hello @JeffMeyers and the %TABLEN users community,
It would be very helpful to know which version(s) of SAS the %TABLEN macro was developed and tested on (M6 or M7?) in order to assess its compatibility with recent (M8) or future (M9) versions.
Does anyone have an answer or information on this? Has anyone had any issues with the M8 version?
Thanks in advance for your help!
Hello @Sylvain2 .
I developed this macro originally in SAS 9.3 I believe, and used it at my prior job through SAS 9.4m5. To my knowledge nothing significant changed in ods or output tables from 9.4m5 to 9.4m7 so it should work the same. My new job since then uses a different table generator for TFLs so I have not used or updated this macro in some years now. Breaking the macro would generally require variables changing names in output or ODS tables, so that is that I would watch out for.
Hi @JeffMeyers
I've been using this recently and it's working very well. I love this.
It's so useful that I'll forget how to use the original Freq and Means procedure syntax.
I have the same question as @Vladislav27 (05-03-2021 04:40 PM).
Could you tell me if this is possibility?
1) Can I set multiple variables?
2) Can I set a column that is all 0?
1) 2 variables example : "factor" and "ff" doesn't work;
data x;
do arm= 1 to 2 ;
ff='ff' ; nx =1 ;
factor='Test 1'; n=5; output;
factor='Test 2'; n=4; output;
factor='Test 3'; n=0; output;
end ;
run;
%tablen(data=x,by = arm , var=factor ff ,type=2,freq=n|nx , showpval=0 );
2) column : arm=2 is all 0 : doesn't work;
data x ; set x ;
if arm=2 and ff ^='ff' then n=0 ;
run ;
%tablen(data=x,by = arm , var=factor ff ,type=2,freq=n|nx , showpval=0 );
Hi @Smokeman75 (10-23-2024 08:10 AM)
How about this.
data aaa ;
do i = 1 to 20 ;
wbc=ranuni(0) ; output ;
end ;
label WBC="WBC【10^{unicode '00B3'x}/μL】" ;
run ;
%tablen ( data=aaa , var=wbc, type=1) ;
Error in the selection of the correct exact p-values
Hello everyone,
I have just discovered that when calculating the exact Wilcoxon rank test (P_VALS=4 for continuous variables), unfortunately the exact p-value is not displayed. The asymptotic two-tailed p-value P2_WIL is selected instead of the exact p-value, which is called XP2_WIL in the generated data set of the npar1way procedure. The same applies when calculating the exact Kruskal-Wallis test. The asymptotic p-value P_KW is also selected, rather than the exact p-value XP_KW. It would probably be necessary to check whether the other exact p-values have been selected correctly.
Unfortunately, I can't find all the places in MACRO where something needs to be changed so that the entire MACRO runs.
It would be nice, if the MACRO could be corrected or if anybody can say where to correct the MACRO.
Thanks!
Hello @User04. I saw this message and apologize for the inconvenience. I work somewhere now that doesn't use this macro anymore so I haven't been making updates to it unfortunately. Looking through the code the only place I see that would need to be modified would be here:
This looks to be at line 2851. The macro checks if the p-value codes for exact Wilcoxon or Kruskal Wallis test are being used and outputs to a separate dataset if so (referenced by p1e in the SQL query). This is the spot where variables from that specific dataset are being used. I don't know if at one time the variable names were the same and an update happened in a new SAS version and it changed, or if it's an oversite.
Hello @JeffMeyers .Thank you very much. I had already found that section.
However, when I change the values there, the MACRO no longer runs and I get the follwoing error messages:
In the SAS dataset _num_p1e the variables exists. I would be grateful if you could provide further suggestions as to what the errors might be.
Hello @User04 . Run the macro with DEBUG=1 to keep the temporary datasets and look at the variables in the dataset to make sure it's outputting as you expect. It looks like the variable names you're looking for are not in the output dataset. Here is the code run earlier in the macro to create the dataset:
proc npar1way data=_numeric noprint;
by _colbylvl_ _rowbylvl_ _numvar_;
class _bylvl_;
var _numval_;
exact wilcoxon kw;
output out=_num_p1e wilcoxon anova;
where exact=1;
freq _freq_;
run;
Thank you. I have looked in the tempory dataset num_p1e. Both variables exist in the temporary data.
Therefore, I am surprised by the error.
@User04 In your screenshot with the error you have XP_KW and not XP2_KW in your dataset screenshot.
Hi @JeffMeyers ,
Thanks so much for sharing this! It has made my life so much easier! Wondering - is there a way to apply suppression criteria to the macro? E.g. leave blank any values less than 11 in the output? Thank you!
@caitlinmb - as you probably know, this is a hilariously tricky problem to automate, mainly because it's so context-dependent. I have developed various versions of a table 1 macro over the years, and I've frequently toyed with the idea of adding this kind of thing (in particular, to comply with Medicare rules re: suppression). The conclusion I've come to is that, because of the risk involved in accidentally missing a count between 1 and 10, the only thing that really makes sense is to just add a simple feature that flags any rows with these counts and leave it to the user to manually suppress as needed. For anyone wondering, here's an illustration of the issue the frequently comes up:
The problem of course comes from the total row and total column. In order to suppress the 5, the best approach in this case is probably to (also) suppress the 44 above it and the 22 next to it (and the 46 at the intersection of the 44 and 22). But that might not always be the case depending on the categories - for instance, if one of the categories reflects unknown / missing data, you might preferentially suppress that. In order to automate / generalize this, you'd need some pretty complex parameterization of the macro and/or some complicated and possibly fragile logic. Just my $0.02.
Yes you are right. In my case the exact p-value of the Kruskal-Wallis test was not in the output, because my group variable had only two categories. Then only the assymptotic p-value P_KW is calculated. Only if I the variable has more than two categories the exact Kruskal-Wallis p-value XP_KW is calculated and added to the temporary dataset _num_p1e.
If the group variable has only two categories, the p-value XP2_WIL of the exact Wilcoxon Test is added to the temporary dataset _num_p1e. The assymptotic Wilcoxon p-value is P2_WIL.
I think the solution is to add a condition when you want the exact Kruskal-Wallis test and use the correct variable names for the exact Wilcoxon and Kruskal-Wallis test p-values:
%if &_nexact=1 %then %do;
%if &nby > 2 %then %do;
when 2 then p1e.xp_kw
%end;
%if &nby=2 and &_nbymiss=0 %then %do;
when 4 then p1e.xp2_wil
%end;
%end;
@quickbluefish Thank you! That makes total sense- I was playing around with it last night and realized that adding suppression criteria would be a pretty huge undertaking (at least, beyond my programming capabilities). Would you be open to sharing how you added the flag?
@burbanca - for fun, I tried to implement this, and it seems to work, but I haven't even remotely stress-tested it 🙂 It's just showing the stripped down logic using an example dataset - this is not incorporated into my table1.sas macro (in the same Github folder). https://github.com/Jeremy-Smith5/CEP-public/blob/main/SAS/suppression_logic.sas
Not sure that I will actually add this to the macro - it's sort of a fun problem, but there are just too many potential pitfalls with applying this to a real table with sensitive data. Would love to see a solution by someone who actually understands optimization problems (which is not me...).
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.