Hi everyone!
I'm currently working on a medical study and have large amounts of data which I want to tabulate efficiently. My data set has approx 120'000 observations.
PROC TABULATE offers a fantastic tabulating function with great flexibility but leaves me with the following problem: when I tabulate several categorical variables (with varying missing for different observations) i loose observations in my table. This only happens with categorical variables.
To clearify this:
* If I tabulate continuous variables, there are no problems; SAS calculates means and std dev for the variables without deleting any observations (despite varying missing values for different variables).
* If I tabulate ONE categorical variable there are no problems; SAS calculates numbers, percentages etc for that variable. I can choose to present missing values or not in the table.
* If I try to tabulate several categorical variables with varying missing values for different variables, those observations are deleted from the entire table. When trying to tabulate all my 35 variables (continuous and categorical) at the same time i end up with 600 patients (starting with 120'000).
Now there are a number of suggested solutions (se Art Carpenters SUGI article "Proc tabulate - beyond bascis").
* I can add the MISSING statement in the proc (e.g proc tabulate data=DATASET missing;) However this includes the missing values in the calculated percentages which makes no sense.
* It is suggested that one can add the "preloadfmt exclusive" (after defining a format without the missing value; se Art Carpenters article) but this does not work.
* etc
I have been trying to fix this for three days without progress.
My first script, which doesnt work, looks something like this:
PROC TABULATE data=DATASETNAME;
CLASS sex ethnicity previous_heart_failure heart_failure_betablocker previous_coronary_heart coronary_heart_betablocker; /*Betablocker is a drug*/
VAR age bloodpressure glucose_level;
TABLES
all='All' *(all sex=' '*(pctn<sex>='%'))
age *(mean*f=3.1 std*f=3.1)
bloodpressure *(mean*f=3.1 std*f=3.1)
glucose_level *sex=' '*(mean*f=4.1 STD*f=4.1)
previous_heart_failure *(n pctn<previous_heart_failure>='%')
heart_failure_betablocker *(n pctn<heart_failure_betablocker>='%')
previous_coronary_heart *(n pctn<previous_coronary_heart>='%')
coronary_heart_betablocker *(n pctn<coronary_heart_betablocker>='%')
,
all='All'
ethnicity='Ethnicity/race'
;
FORMAT sex sex_. ETCETERA...;
RUN;
Anyone got a solution?
Thanks in advance!
Adam
I doubt you will find a simple solution. What you are describing is the way PROC TABULATE is designed to work. Any observation with a missing value for a CLASS variable is removed from all tables, whether that particular CLASS variable is used in that particular table or not. The reasoning for this is that it guarantees that all your tables are built from the same set of observations.
To circumvent this feature, I see two choices. The easiest one is to run separate PROC TABULATEs, each with a different CLASS statement. If that takes too long to run, a more difficult alternative would be to run a PROC SUMMARY first, and save the summary data set. Then feed the summary data set into as many PROC TABULATEs as you would like ... they will all run quickly. That approach would require changing some of the syntax in PROC TABULATE. Instead of getting N and PCTN statistics, you would likely be asking for SUM and PCTSUM statistics.
Good luck.
but if he formats the class variable and uses preloadftm+completetypes those obs still wont be included in the report?
Maybe some of the offending class variables could be made analysis variables. previous_heart_failure looks like it might be a yes/no variable that if coded 0/1 you could summarize with mean to get percent, proportion actually.
Thanks for Your time Astounding, data _null and Tal, however the problem was not solved.
It would be great if this problem could be solved. I'm sure there are many SAS users out there dealing with this problem.
My solution right now is executing one proc tabulate per categorical variable. However this is very time consuming since tables must be unioned later in a text editor.
Perhaps there isn't a fix for this, other than going around an output via proc freq/summary.
:smileyplain:
It can be "fixed". Post some example data and expected result.
I have this issue as well. The way I get around it is I make the dataset vertical, with "category" and "response" variables (or whatever makes sense), only outputting rows where the category variable has a nonmissing value. Then use CATEGORY and RESPONSE as the class, or CATEGORY could be a BY variable if that makes more sense (with the PCTNs).
IE
data for_tab;
set sashelp.class;
array analysis_vars age height weight;
array char_Vars sex; *more of course if you have more;
do _t = 1 to dim(analysis_vars);
category=vname(analysis_vars[_t]);
response=analysis_vars[_t];
output;
end;
*do same thing for char_Vars;
run;
proc tabulate data=for_tab;
class category response;
tables category, response*colpctn; *or whatever you want stat-wise;
run;
If you have statistics you want sometimes but not other times, just ask for them and then drop them when you transpose back to horizontal post-analysis.
values separated by space.
period (.) indicates missing value.
first row = variable names.
observation sex glucose_level kidney_disease physical_activit coronary_disease heart_failure treated_for_coronary treated_for_heart_failure
1 1 . . . 0 0 . .
2 1 100 0 3 0 0 . .
3 1 38 0 2 1 1 . .
4 1 50 . 1 0 0 . .
5 1 48 0 4 0 0 . .
6 2 50 0 5 0 0 . .
7 2 54 0 . 1 0 1 .
8 2 48 0 3 1 0 0 .
9 2 51 0 5 0 0 . .
10 2 . 1 . 0 0 . .
11 1 41 0 3 0 0 . .
12 1 43 . 4 1 0 1 .
13 1 40 0 3 0 0 . .
14 2 59 . 5 0 0 . .
15 1 54 0 4 0 0 . .
16 1 48 0 . 1 1 1 1
17 1 47 0 5 0 0 . .
18 1 44 0 2 0 0 . .
19 1 51 0 2 0 0 . .
20 1 54 0 5 1 0 1 .
21 2 51 0 5 0 0 . .
22 2 51 0 4 0 0 . .
23 2 51 1 1 1 0 1 .
24 1 . 0 2 0 0 . .
25 2 48 0 2 0 0 . .
26 1 45 0 5 0 0 . .
27 1 . . 1 1 0 0 .
28 1 40 0 . 0 0 . .
29 2 44 0 5 0 0 . .
30 2 . . . 1 0 1 .
31 2 47 0 . 0 0 . .
32 1 52 0 3 0 0 . .
33 1 . . 4 1 1 1 1
34 2 97 0 5 0 0 . .
35 2 43 . . 1 1 1 1
36 1 50 0 3 0 0 . .
37 2 99 0 3 0 0 . .
38 2 35 0 5 0 0 . .
39 1 69 1 3 0 0 . .
40 2 47 1 . 0 0 . .
41 1 53 . . 0 0 . .
42 1 49 0 4 0 0 . .
43 1 53 0 2 0 0 . .
44 2 41 . 5 0 0 . .
45 1 42 0 1 1 0 0 .
46 1 78 0 5 0 0 . .
47 1 33 1 4 0 0 . .
48 1 64 0 5 0 0 . .
49 2 47 0 5 0 0 . .
50 1 38 0 5 0 0 . .
Hi again
Still havent solved the problem despite Your instructions. Any other suggestions? :smileyplain:
Thanks in advance
You replied to yourself so to whom is your comment addressed. I suggest you post a complete working example with data, tabulate code, and expected output.
Example data:
Variables: (underscore = continous variable, otherwise categorical).
Observation_number Ethnicity Sex Glucose Kidney_disease Exercise Heart_disease Heart_failure Heart_disease_treated Heart_Failure_treated
1 | Sweden | 1 | . | . | . | 0 | 0 | . | . |
---|---|---|---|---|---|---|---|---|---|
2 | Sweden | 1 | 100 | 0 | 3 | 0 | 0 | . | . |
3 | Sweden | 1 | 38 | 0 | 2 | 1 | 1 | . | . |
4 | Sweden | 1 | 50 | . | 1 | 0 | 0 | . | . |
5 | Nordic countries | 1 | 48 | 0 | 4 | 0 | 0 | . | . |
6 | Sweden | 2 | 50 | 0 | 5 | 0 | 0 | . | . |
7 | Sweden | 2 | 54 | 0 | . | 1 | 0 | 1 | . |
8 | Sweden | 2 | 48 | 0 | 3 | 0 | 0 | . | . |
9 | Sweden | 2 | 51 | 0 | 5 | 0 | 0 | . | . |
10 | Sweden | 2 | . | 1 | . | 0 | 0 | . | . |
11 | Sweden | 1 | 41 | 0 | 3 | 0 | 0 | . | . |
12 | Sweden | 1 | 43 | . | 4 | 1 | 0 | 1 | . |
13 | Sweden | 1 | 40 | 0 | 3 | 0 | 0 | . | . |
14 | Sweden | 2 | 59 | . | 5 | 0 | 0 | . | . |
15 | Sweden | 1 | 54 | 0 | 4 | 0 | 0 | . | . |
16 | Sweden | 1 | 48 | 0 | . | 1 | 1 | 1 | 1 |
17 | Sweden | 1 | 47 | 0 | 5 | 0 | 0 | . | . |
18 | Sweden | 1 | 44 | 0 | 2 | 0 | 0 | . | . |
19 | Sweden | 1 | 51 | 0 | 2 | 0 | 0 | . | . |
20 | Sweden | 1 | 54 | 0 | 5 | 1 | 0 | 1 | . |
21 | Latin America & Caribbean | 2 | 51 | 0 | 5 | 0 | 0 | . | . |
22 | Sweden | 2 | 51 | 0 | 4 | 0 | 0 | . | . |
23 | Sweden | 2 | 51 | 1 | 1 | 1 | 0 | 1 | . |
24 | Nordic countries | 1 | . | 0 | 2 | 0 | 0 | . | . |
25 | Sweden | 2 | 48 | 0 | 2 | 0 | 0 | . | . |
26 | Sweden | 1 | 45 | 0 | 5 | 0 | 0 | . | . |
27 | Sweden | 1 | . | . | 1 | 1 | 0 | 0 | . |
28 | Europe (Low income), Russia and Centr | 1 | 40 | 0 | . | 0 | 0 | . | . |
29 | Sweden | 2 | 44 | 0 | 5 | 0 | 0 | . | . |
30 | Sweden | 2 | . | . | . | 1 | 0 | 1 | . |
31 | Sweden | 2 | 47 | 0 | . | 0 | 0 | . | . |
32 | Sweden | 1 | 52 | 0 | 3 | 0 | 0 | . | . |
33 | Sweden | 1 | . | . | 4 | 1 | 1 | 1 | 1 |
34 | Sweden | 2 | 97 | 0 | 5 | 0 | 0 | . | . |
35 | Europe (Low income), Russia and Centr | 2 | 43 | . | . | 1 | 1 | 1 | 1 |
36 | Sweden | 1 | 50 | 0 | 3 | 0 | 0 | . | . |
37 | Sweden | 2 | 99 | 0 | 3 | 0 | 0 | . | . |
38 | Sweden | 2 | 35 | 0 | 5 | 0 | 0 | . | . |
39 | Sweden | 1 | 69 | 1 | 3 | 0 | 0 | . | . |
40 | Sweden | 2 | 47 | 1 | . | 0 | 0 | . | . |
41 | Sweden | 1 | 53 | . | . | 0 | 0 | . | . |
42 | Sweden | 1 | 49 | 0 | 4 | 0 | 0 | . | . |
43 | Sub-Saharan Africa | 1 | 53 | 0 | 2 | 0 | 0 | . | . |
44 | Sweden | 2 | 41 | . | 5 | 0 | 0 | . | . |
45 | Sweden | 1 | 42 | 0 | 1 | 0 | 0 | . | . |
46 | Sweden | 1 | 78 | 0 | 5 | 0 | 0 | . | . |
47 | Sweden | 1 | 33 | 1 | 4 | 0 | 0 | . | . |
48 | Sweden | 1 | 64 | 0 | 5 | 0 | 0 | . | . |
49 | Sweden | 2 | 47 | 0 | 5 | 0 | 0 | . | . |
50 | Europe (Low income), Russia and Centr | 1 | 38 | 0 | 5 | 0 | 0 | . | . |
Expected output:
| Ethnicity | |||||
| sex | Europe (Low income), Russia and Centr | Latin America & Caribbean | Nordic countries | Sub-Saharan Africa | Sweden |
N | 1 | 2 | 0 | 2 | 1 | 25 |
| 2 | 1 | 1 | 0 | 0 | 18 |
Glucose (mean) | 40,333333333 | 51 | 48 | 53 | 53 | |
Kidney disease |
|
|
|
|
| |
0 | 2 | 1 | 2 | 1 | 29 | |
1 | 0 | 0 | 0 | 0 | 5 | |
Exercise |
|
|
|
|
| |
1 | 0 | 0 | 0 | 0 | 4 | |
2 | 0 | 0 | 1 | 1 | 4 | |
3 | 0 | 0 | 0 | 0 | 8 | |
4 | 0 | 0 | 1 | 0 | 6 | |
5 | 1 | 1 | 0 | 0 | 13 | |
Heart_disease |
|
|
|
|
| |
0 | 2 | 1 | 2 | 1 | 34 | |
1 | 1 | 0 | 0 | 0 | 9 | |
Heart_failure |
|
|
|
|
| |
0 | 2 | 1 | 2 | 1 | 40 | |
1 | 1 | 0 | 0 | 0 | 3 | |
Heart_disease_treated |
|
|
|
|
| |
0 | 0 | 0 | 0 | 0 | 1 | |
1 | 1 | 0 | 0 | 0 | 7 | |
Heart_failure_treated |
|
|
|
|
| |
1 | 1 | 0 | 0 | 0 | 2 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.