Solved: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for researc...

Wolverine · Posted 05-09-2024 09:57 AM

I've been looking for a good, efficient method to create "Table 1", which has a more or less standard format in research journals:

Variable Name	Category 1	Category 2	P-value
Variable 1
1	N(%)	N(%)	Chi-sq or Fisher's
2	N(%)	N(%)
3	N(%)	N(%)
Variable 2	Mean (SD)	Mean (SD)	t-test
	Median (IQR)	Median (IQR)

I found this PDF paper. It seems to work pretty well, but I'm having 2 issues. The first is that the chi-square results are saved into a file and then merged into the main file. But the value of the "variable" variable is the var label in 1 file and the var name in the other, so they don't merge.

The second issue is that Proc Report does not produce a table and instead gives me this: NOTE: Groups are not created because the usage of levels is DISPLAY. To avoid this note, change all GROUP variables to ORDER variables.

WARNING: A GROUP, ORDER, or ACROSS variable is missing on every observation.

data have;
  infile datalines dsd dlm=',' truncover;
  input DEM_AGE DEM_SEX cohort_flag TM_group;
datalines;
3,1,1,0
2,1,1,1
3,2,1,1
3,2,1,1
3,2,1,0
2,2,1,1
2,1,1,1
3,1,1,1
2,1,1,1
3,2,1,0
2,1,1,0
2,2,1,1
3,2,1,0
2,2,0,
3,2,1,1
3,2,1,1
3,1,1,0
3,2,1,0
2,1,1,0
3,1,1,1
3,2,1,1
3,2,1,0
3,2,1,1
3,2,1,1
3,2,1,1
; RUN;
/*Load formats from existing file in temp folder*/
options fmtsearch=(temp.formats);
/*DEM_SEX	sex	1 Male 2 Female*/
/*DEM_AGE	AGE2GRP 1:Age Group <65 2:Age Group [65,75) 3:Age Group >=75*/
/*TM_group	yesno	1 Yes 2 No*/

/*Generate descriptive statistics*/
proc means data = temp.have noprint n sum mean;
	class DEM_AGE DEM_SEX;
	var TM_group /*MA_group*/;
	ways 1;
	output out = temp.expl_PreTable n =
	sum =
	mean = / autoname;
	WHERE cohort_flag = 1; 
run;

/*Format descriptive stats*/
data temp.expl_Table (keep = variable levels TM_group_N TM_group_sum
		TM_group_mean pct ExpPct indexvar);	set temp.expl_PreTable;

	length variable $ 20; 	/* These four variables */;
	length levels $ 20; 	/* will describe the first */;
	length pct $ 8; 	/* four columns of the table */;
	length ExpPct $ 15;

	if DEM_AGE ne . then do; /*Building "variable" and "Levels" columns for "DEM_AGE"*/;

	variable = 'Age category';
	levels = put(DEM_AGE, age2grp.);
	IndexVar = 1; /*This index is included just in case the order of data presentation needs to be changed*/;
end;

	if DEM_SEX ne . then do; /*Building "variable" and "Levels" columns for "DEM_SEX"*/;
	variable = 'Sex';
	levels = put(DEM_SEX,sex.);
	IndexVar = 2;
end;

	pct = put(TM_group_mean*100,4.1); /*Calculate % exposed */;
	ExpPct = compress(put(TM_group_sum,comma4.),' ')
		||' '||'('||compress(pct,' ')||')'; /*creating data in the form of "count (%)" */;
run;

/*Run chi-square significance tests and use ODS to create a dataset of these results*/
ods trace on;
ods output chisq = temp.expl_ChiData;

proc freq data = temp.have;
	table TM_group*DEM_AGE / chisq;
	table TM_group*DEM_SEX / chisq;
	WHERE cohort_flag = 1; 
run;
ods trace off;

/*Rearrange chi-square dataset so it can be merged with descriptive stats table*/
data temp.expl_ChiData2 (keep = variable prob);
	set temp.expl_ChiData (where = (statistic = 'Chi-Square'));
	length variable $ 20;
	variable = scan(table,-1,' '); /* Returns the last word in a character value from the "table" variable*/
run;

/*Sort both tables so they will merge*/
PROC SORT data=temp.expl_Table OUT=temp.expl_Table_sort; BY variable; RUN;
PROC SORT data=temp.expl_ChiData2 OUT=temp.expl_ChiData2_sort; BY variable; RUN;

/**************MERGE DOES NOT WORK BECAUSE The value of the "variable" variable is the var label in 1 file
and the var name in the other. DUE TO THE 'IF a' STATEMENT, NOTHING FROM THE CHIDATA2 FILE IS MERGED IN****/


/*Merge descriptive stats table with chi-square table*/
DATA temp.expl_TableData;
	MERGE temp.expl_Table_sort (in = a) temp.expl_ChiData2_sort (in = b);
	BY variable;
		IF a;
RUN;

/*Use PROC REPORT to create final output table*/
proc report data = temp.expl_TableData nowd;
	column variable levels TM_group_N ExpPct prob;
	define variable / "Variable" group format = $variable.;
	define levels / " " ;
	define TM_group_N / "TM" /*format = comma5.*/;
	define ExpPct / "TM Group/n (%)";
	define prob / "p-value" group format = pvalue6.4;
	
	Title "Table 1. Descriptive characteristics of individuals in the sample";
RUN;

Reeza · Posted 05-09-2024 02:58 PM

Have you looked into the TableN macro? The author does post on here.

https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

View solution in original post

ballardw · Posted 05-09-2024 10:16 AM

We can't test your code because you use formats that you have not provided definitions for: SEX and Age2Grp.

Part of your 'problem' with Variable is you assign values like 'Sex' and 'Age Category' explicitly in the data step creating your Expl_table. So if you want to change that to allow merging on this variable reassign the values you expect to match in the merge.

OR provide a different variable to hold that text to appear in the "report" and another to hold the value to merge.

Note that you can have a custom format for Variable that will display the value of DEM_AGE as "Age Group" and DEM_SEX as "Sex" so the values do not have to contain the actual "label" text.

Wolverine · Posted 05-09-2024 11:03 AM

proc format library=temp;


value age2grp
1='1:Age Group <65'
2='2:Age Group [65,75)'
3='3:Age Group >=75'
.='Inapplicable/Missing'

value sex
.='Inapplicable/Missing'
1='1:Male'
2='2:Female'
value yesfmt
1='1:Yes'
2='2:No'
.='Inapplicable/Missing'
;
RUN;

Here is the code for the formats.

I tried rewriting the data step for Expl_table, but now it puts the numbers 1, 2, and 3 (corresponding to the levels of the variables) in the 'variable' column instead. So it still won't merge.

	if DEM_AGE ne . then do; /*Building "variable" and "Levels" columns for "DEM_AGE"*/;

	variable = DEM_AGE;
	levels = put(DEM_AGE, age2grp.);
	IndexVar = 1; /*This index is included just in case the order of data presentation needs to be changed*/;
end;

	if DEM_SEX ne . then do; /*Building "variable" and "Levels" columns for "DEM_SEX"*/;
	variable = DEM_SEX;
	levels = put(DEM_SEX,sex.);
	IndexVar = 2;
end;

ballardw · Posted 05-09-2024 11:20 AM

@Wolverine wrote:

I tried rewriting the data step for Expl_table, but now it puts the numbers 1, 2, and 3 (corresponding to the levels of the variables) in the 'variable' column instead. So it still won't merge.
	if DEM_AGE ne . then do; /*Building "variable" and "Levels" columns for "DEM_AGE"*/;

	variable = DEM_AGE;
	levels = put(DEM_AGE, age2grp.);
	IndexVar = 1; /*This index is included just in case the order of data presentation needs to be changed*/;
end;

	if DEM_SEX ne . then do; /*Building "variable" and "Levels" columns for "DEM_SEX"*/;
	variable = DEM_SEX;
	levels = put(DEM_SEX,sex.);
	IndexVar = 2;
end;

You replaced an assignment of literal text:

variable ='Sex';

with that of the value of a variable.

to place the TEXT "DEM_SEX" into variable for matching

variable='DEM_SEX';

Wolverine · Posted 05-09-2024 11:55 AM

@ballardw wrote:
variable ='Sex';
with that of the value of a variable.

to place the TEXT "DEM_SEX" into variable for matching
variable='DEM_SEX';

Thanks! I made that correction and the table is now being produced.

However, there are two remaining issues. In the expl_pretable file, the TM_group_N and TM_group_Sum variables have the same value for each level. So, for example, the percentage of males in the TM group is 100%, and the percentage of females in the TM group is also 100%. It seems like the TM_group_Sum variable should have the total number of cases in the TM group.

The second issue is that in the final table, the TM Group n (%) column is using scientific notation rather than displaying the total n. In the actual data, there are n=14,248 (45.76%) males in the TM group, but that column displays "14E3 (100)". I'm sure the (100) part will be corrected when issue 1 is addressed, but I still need to fix the 14E3 part.

ballardw · Posted 05-09-2024 01:17 PM

Note your proc format code is missing some semicolons.

I'm not sure what you are expecting to use for a denominator in your percentages.

The TM_GROUP_MEAN value is the percentage of 1's for TM_group in the expl_PreTable by gender or age group(your example data doesn't have enough ages for all groups to be represented) but is in decimal form: i.e. .555 = 55.5 %. If you are getting 100 percent for each of your gender values you aren't using the correct observation from the output of proc means. When I run your code with your example data I do not get the same anything for the genders. I get percentages of 55.6 and 66.7 in the TM Group N percentages in the report table after adjusting the values of Variable to "DEM_SEX" and "DEM_AGE" as discussed.

Can't observe the behavior of the TM_group_N . Your description makes me think that somewhere you may have attached a BEST4. or F4. or 4. format to create that text value. Perhaps something inherited from code with fewer observations. Note that this may restrict what happens if the sum is larger than 999 as 1,000 takes 5 characters to display/

compress(put(TM_group_sum,comma4.)

Example:

data junk;
   TM_group_sum= 12345;
   ExpPct = compress(put(TM_group_sum,comma4.),' ');
run;

which shows the result as 12E3. So if you see this in your log it is a big flag that you are attempting to display a value too large for the designated format. If you expect values in the 12,000 range the format would have to be Comma6.

NOTE: At least one W.D format was too small for the number to be printed. The decimal may be
      shifted by the "BEST" format.

Note: Dataset options can be applied to the target data sets in ODS OUTPUT statement if desired.

You may want to learn about the CAT, CATT, CATS and CATX functions for combing text values.

Wolverine · Posted 05-09-2024 02:05 PM

@ballardw wrote:

I'm not sure what you are expecting to use for a denominator in your percentages.

The TM_GROUP_MEAN value is the percentage of 1's for TM_group in the expl_PreTable by gender or age group(your example data doesn't have enough ages for all groups to be represented) but is in decimal form: i.e. .555 = 55.5 %. If you are getting 100 percent for each of your gender values you aren't using the correct observation from the output of proc means. When I run your code with your example data I do not get the same anything for the genders. I get percentages of 55.6 and 66.7 in the TM Group N percentages in the report table after adjusting the values of Variable to "DEM_SEX" and "DEM_AGE" as discussed.

Can't observe the behavior of the TM_group_N . Your description makes me think that somewhere you may have attached a BEST4. or F4. or 4. format to create that text value. Perhaps something inherited from code with fewer observations. Note that this may restrict what happens if the sum is larger than 999 as 1,000 takes 5 characters to display/
compress(put(TM_group_sum,comma4.)

The denominator should be the total number of observations in TM_group. For example, there are n=14,248 males and n=31,134 total cases in TM_group. 14,248 / 31,134 = 45.76%, so the final output should look like this: 14,248 (45.76%).

I removed the noprint option on Proc Means and looked at the table. For some reason, the Mean is 1.0000 for all entries. The Sum is N * Mean, or effectively N * 1.0000. And the percent is calculated as N / Sum, so it's always 100. I tried looking for an option in Proc Means that would produce the total N in TM_group, but I couldn't find anything.

On a side note, the cohort is restricted to ages 65+. That's why DEM_AGE = 1 never shows up in the tables.

ballardw · Posted 05-09-2024 02:20 PM

The proc means output for the example data looks like

DEM_AGE	DEM_SEX	_TYPE_	_FREQ_	TM_group_N	TM_group_Sum	TM_group_Mean
.	1	1	9	9	5	0.55556
.	2	1	15	15	10	0.66667
2	.	2	7	7	5	0.71429
3	.	2	17	17	10	0.58824

Which does not show anything related to Tm_group N, Sum or Mean the same. So your real data must have some problem you are not sharing. The only way to get a TM_GROUP_MEAN = 1 with a 1/0 coded variable is to have all values = 1. So I strongly suggest checking your TM_GROUP variable before the proc means step.

Since your proc means is using Where cohort_flag=1 I suggest running something like this as a check: (using your data of course).

Proc freq data=have;
   tables cohort_flag * dem_sex*tm_group / list missing;
run;

The list option makes the output more compact and easier to see which levels of Cohort or sex may be having an issue with tm_group values.

Wolverine · Posted 05-09-2024 02:27 PM

Sorry, I think I copied my code, then realized I needed to correct my where statements, and then pasted without re-copying the code. The where statements should be Where cohort_flag = 1 and TM_group = 1;

That explains why the mean is always 1.0000 - it's the mean of TM_group, not the mean of the DEM_SEX or DEM_AGE variables.

ballardw · Posted 05-09-2024 03:09 PM

@Wolverine wrote:

Sorry, I think I copied my code, then realized I needed to correct my where statements, and then pasted without re-copying the code. The where statements should be Where cohort_flag = 1 and TM_group = 1;

That explains why the mean is always 1.0000 - it's the mean of TM_group, not the mean of the DEM_SEX or DEM_AGE variables.

Perhaps you don't want Proc Means with that Where statement at all. Maybe Proc Freq makes more sense if you want the percentage of Male/Female where that condition is true.

You already have Proc Freq code. Add Tables Dem_sex Dem_age; and ask for the ODS output to include Onewayfreqs.

Reeza · Posted 05-09-2024 02:58 PM

Have you looked into the TableN macro? The author does post on here.

https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

Wolverine · Posted 05-10-2024 09:59 AM

@Reeza wrote:
Have you looked into the TableN macro? The author does post on here.

https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

That looks like the answer to my prayers! Although, I'm still having some trouble with it. I'll create a new topic...

Ksharp · Posted 05-09-2024 10:41 PM

data have;
 set sashelp.heart(obs=200);
run;



/*
dsn=     analysis dataset name
treat=   the variable name of treat (column variable)
var=     the category variable name for analysis 
chiq_fisher=  1 means using chisq test,0 means using fisher test.

NOTE:This macro is for category variable.
*/
%macro var_category(dsn=,treat=,var=,chiq_fisher=);
proc freq data=&dsn. noprint;
table &var.*&treat./out=_count  %if &chiq_fisher.=1 %then %do;chisq%end;%else %do;fisher%end; ;
output out=_test %if &chiq_fisher.=1 %then %do;chisq%end;%else %do;fisher%end; ;
run;

proc sql;
create table _report as
select distinct 1 as id, " &var" as row length=80,' '||&treat. as col length=80,'09'x as value length=80 
 from _count
  where &treat. is not missing
union 
select 2,'09'x||&var. ,' '||&treat. ,cats(COUNT,'(',put(PERCENT,8.2),')%') 
 from _count
  where &var. is not missing and &treat. is not missing
union
select 2,'09'x||&var.,'P-value',
%if &chiq_fisher.=1 %then %do;
(select put(P_PCHI,pvalue8.2) from _test)
%end;
%else %do;
(select put(XP2_FISH,pvalue8.2) from _test)
%end; 
 from _count(where=( &var. is not missing and &treat. is not missing ) obs=1)
;
quit;
proc append base=want data=_report force;run;
%mend;


/*
dsn=     analysis dataset name
treat=   the variable name of treat (column variable)
var=     the continuous variable name for analysis 

NOTE:This macro is for continuous variable.
*/
%macro var_continuous(dsn=,treat=,var=);
ods select none;
ods output Equality=Equality  TTests=ttests;
proc ttest data=&dsn.;
class &treat.;
var &var.;
run;
proc univariate data=&dsn. outtable=_summary;
class &treat.;
var &var.;
run;
ods select all;

proc sql;
create table _report as
select distinct 4 as id, " &var" as row length=80,' '||&treat. as col length=80,
 cats(put(_MEAN_,8.2),'(',put(_STD_,8.3),')') as value length=80 
 from _summary
  where &treat. is not missing
union 
select distinct 4, " &var",'P-value',
case when (select ProbF<0.05 from Equality) then (select put(Probt,pvalue8.2) from ttests(firstobs=2 obs=2))
 else (select put(Probt,pvalue8.2) from ttests(obs=1))
end 
from _summary(obs=1)
union 
select distinct 5, "09"x ,' '||&treat. ,
 cats(put(_MEDIAN_,8.2),'(',put(_QRANGE_,8.3),')') 
 from _summary
  where &treat. is not missing
;
quit;
proc append base=want data=_report force;run;
%mend;







/*Finally, put them all together.*/
options nodsnferr;
proc delete data=want;run;
%var_category(dsn=have,treat=status,var=bp_status,chiq_fisher=1)
%var_continuous(dsn=have,treat=status,var=weight)

ods rtf file='c:\temp\temp.rtf' style=journal ;
proc report data=want nowd style={outputwidth=100%};
columns id row value,col;
define id/group noprint;
define row/group 'Variable Name';
define col/across '';
define value/group '';
run;
ods rtf close;

Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal article

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

Re: Using Proc Report and Proc Freq/chi-sq to create "Table 1" for research journal articl

SAS Innovate 2025: Call for Content

Classroom Training Available!