Hi I come across a task that requires the final output of the data to be one big summary table (all the rows are categorical variables, want to see the distribution break down; and the columns reflect different population group, I want to see the count), it looks like:
If there is a missing variable, report missing as well.
Patients who got vaccine | Patients who didn't get vaccine | |||
Number | % | Number | % | |
Gender | ||||
Male | ||||
Female | ||||
Race | ||||
American Indian | ||||
Asian | ||||
Black | ||||
White | ||||
Multi-Race | ||||
NA | ||||
Ethinicity | ||||
Hispanic | ||||
Non-Hispanic | ||||
Age | ||||
18-39 | ||||
40-64 | ||||
65+ |
The input data looks like:
patient_ID | Race | Gender | Ethnicity | Age Group | Followup_Flag | Vaccine_Flag |
1 | American Indian | F | Hispanic | 18-39 | 1 | 1 |
2 | Asian | M | Non-Hispanic | 18-39 | 0 | 0 |
3 | Black | F | NA | 40-64 | 0 | 1 |
4 | White | Hispanic | 65+ | 1 | 1 | |
5 | Multi-Race | F | Non-Hispanic | 18-39 | 0 | 1 |
6 | NA | M | Hispanic | 18-39 | 1 | 1 |
7 | American Indian | F | Non-Hispanic | 40-65 | 0 | 0 |
8 | Asian | F | 65+ | 1 | 0 | |
9 | Black | M | Non-Hispanic | 18-39 | 0 | 0 |
10 | American Indian | F | NA | 18-39 | 0 | 1 |
11 | Asian | M | Hispanic | 40-66 | 1 | 1 |
12 | Black | F | Non-Hispanic | 65+ | 0 | 1 |
The methods I know, proc report, proc mean, etc, only give me summary statistics for one category at a time, and I am not interested in how different categories intersects with each other, don't need something like Race*Age. The real data I work with have way more different population categories like the two big columns, but the rows that need to be reported are the same across population groups.
I am wondering if there is any way to systematically 'stack' summary statistics of each row category, so that I won't need to stack them by hand for my real data (10 row categories, 20 column population categories).
Any suggestion is greatly appreciated.
Thank you!
Try using PROC TABULATE.
If you have difficulties with it - please:
1) post your data in a input datalines format
2) post the code you tried (and log if needed) to point where is your issue.
PROC REPORT will do this.
Better yet is to do the calculations in PROC SUMMARY and then have PROC REPORT make the final report in the format you want.
I have a macro that does some of this. You can find it here. You'll likely need to customize your output further to get exactly what you need.
https://gist.github.com/statgeek/b308ac2cfc9b4db0ee3d793567627af0
Please describe your problem in terms of your variables. Your output has two column headings but it is not clear what role the variable followup_flag may play, or which "%" you want: row, column or table.
If you want the two levels of Vaccine_flag to generate the output, with 1=got vaccine and 2= didn't then something like this may be a start:
proc format library=work; format vacc 1="Patients who got vaccine" 0="Patients who didn't get vaccine" ; run; proc tabulate data=have; class gender race ethnicity age; class vaccine_flag /descending; format vaccine_flag vacc.; table gender race ethnicity age, vaccine_flag=''*(n='Number' RowPctN='%') /misstext=' '; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.