I set out to create a summary demographics table for a dataset with 6 variables: groups, sex, age, race, ethnicity, and status as below.
|
|
Total |
Group A |
Group B |
Group C |
Group D |
Group E |
||||||
|
|
N |
% |
N |
% |
N |
% |
N |
% |
N |
% |
N |
% |
Sex |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Female |
|
|
|
|
|
|
|
|
|
|
|
|
|
Male |
|
|
|
|
|
|
|
|
|
|
|
|
Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<20 |
|
|
|
|
|
|
|
|
|
|
|
|
|
20-30 |
|
|
|
|
|
|
|
|
|
|
|
|
|
≥30 |
|
|
|
|
|
|
|
|
|
|
|
|
Etc... |
|
|
|
|
|
|
|
|
|
|
|
|
|
Total |
|
|
|
|
|
|
|
|
|
|
|
|
Using the proc tabulate code below I am pretty close:
proc tabulate data=c;
class groups sex age race ethnicity status;
classlev sex age race ethnicity status /s=[JUST=R];
var newcount;
table sex="Sex" age="Age" race="Race" ethnicity="Ethnicity" status="Status" ALL="TOTAL", ALL="TOTAL"*(newcount="N"*f=8.0 colpctn="%"*f=8.1) groups*(newcount="N"*f=8,0 colpctn="%"*f=8.1);
run;
But, I am having a few issues:
1) There is a "Sum" cell below my "N". How can I remove?
N | % | N | % | N | % | N | % | N | % | N | % |
---|---|---|---|---|---|---|---|---|---|---|---|
Sum | Sum | Sum | Sum | Sum | Sum |
2) Though my numbers are right, my column percents are not accurate. I am not sure where they are coming from, but I imagine it has to do with my proc summary or another step below.
As a test, I ran a proc freq of sex*group with weight by newcount, which gets me what should be the correct column percentages (although my frequencies are no longer the table N). I tried weighting my proc tabulate by newcount but that only significantly increases the N without changing the %.
To retrace my steps:
1) I ran a proc summary for the variables of interest as below:
proc summary data=a;
class groups sex age race ethnicity status;
var count;
output out=b n= /autoname;
run;
2) I ran a data step to get the monthly average since the original data is a year long and I want to present the average monthly demographics.
data c;
set b;
newcount=count_N/12;
format newcount 8.0 ;
drop count_n _freq_ ;
run;
3) I ran the proc tabulate as above.
I know this probably has a simple solution, but I've been struggling with it for several hours and just can't see what I'm missing. Thanks in advance!
Since your analysis variable 's implied statistic is SUM ,not N , therefore you should use COLPCTSUM (not COLPCTN) , and N should be replaced with SUM .
libname x v9 "C:\Users\xiakeshan\Downloads\"; options nofmterr; proc tabulate data=x.c; class groups sex age race ethnicity status; classlev sex age race ethnicity status /s=[JUST=R]; var newcount; table sex="Sex" age="Age" race="Race" ethnicity="Ethnicity" status="Status" ALL="TOTAL",
ALL="TOTAL"*(newcount=''*(sum="N"*f=8.0 colpctsum="%"*f=8.1))
groups*(newcount=''*(sum="N"*f=8.0 colpctsum="%"*f=8.1)); run;
You need to post some data to present your problem.
You need KEYLABEL statement to get rid of 'SUM' label.
Here is an example:
proc tabulate data=sashelp.heart; class sex status bp_status; var ageatstart; table sex="Sex" status="Status" ALL="TOTAL", ALL="TOTAL"*(ageatstart="N"*f=8.0 colpctn="%"*f=8.1) bp_status=''*(ageatstart="N"*f=8.0 colpctn="%"*f=8.1); keylabel sum=' '; run;
Sorry for the slow response. The keylabel statement did the trick for the "sum" label. Thank you!
I've attached a sample randomized dataset I created off of plausible values from the dataset I am using. It is formatted the same way, and produces the same problem (column % does not equal the true percentage%).
This is part of the table produced from the below code, and percentages appear to be off in all columns.
I repaste the proc tabulate code I am using here because I noticed an errant comma in my first post.
proc tabulate data=c;
class groups sex age race ethnicity status;
classlev sex age race ethnicity status /s=[JUST=R];
var newcount;
table sex="Sex" age="Age" race="Race" ethnicity="Ethnicity" status="Status" ALL="TOTAL", ALL="TOTAL"*(newcount="N"*f=8.0 colpctn="%"*f=8.1) groups*(newcount="N"*f=8.0 colpctn="%"*f=8.1);
run;
Can you identify what is happening with the proc tabulate statement to calculate the wrong %? Thank you!
Since your analysis variable 's implied statistic is SUM ,not N , therefore you should use COLPCTSUM (not COLPCTN) , and N should be replaced with SUM .
libname x v9 "C:\Users\xiakeshan\Downloads\"; options nofmterr; proc tabulate data=x.c; class groups sex age race ethnicity status; classlev sex age race ethnicity status /s=[JUST=R]; var newcount; table sex="Sex" age="Age" race="Race" ethnicity="Ethnicity" status="Status" ALL="TOTAL",
ALL="TOTAL"*(newcount=''*(sum="N"*f=8.0 colpctsum="%"*f=8.1))
groups*(newcount=''*(sum="N"*f=8.0 colpctsum="%"*f=8.1)); run;
Thank you! The colpctn was the problem and your changes made it work.
I really appreciate your help!
You might want to try the %TABLEN macro for this type of table.
Thanks @PaigeMiller ! I had tried looking into that macro previous to this post and face similar but different issues in applying it to this data set due to how my data is summarized. It appears the macro runs off of "normally" organized long data.
The reason I used proc summary on my data was due to:
1) a count variable in the dataset which basically allowed a single observation to represent multiple people with the same characteristics (i.e., where count was 3 that meant 3 individuals had that set of characteristics)
2) to present the average monthly sample (from the fluctuating year of data I had). So, once summarized by count I divided the count by 12 to get a monthly average.
I'm having trouble thinking of how I could get this result in a long data set, but if you have any ideas then that macro would probably work perfectly.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.