BookmarkSubscribeRSS Feed
Adam1
Calcite | Level 5

Hi everyone!

I'm currently working on a medical study and have large amounts of data which I want to tabulate efficiently. My data set has approx 120'000 observations.

PROC TABULATE offers a fantastic tabulating function with great flexibility but leaves me with the following problem: when I tabulate several categorical variables (with varying missing for different observations) i loose observations in my table. This only happens with categorical variables.

To clearify this:

* If I tabulate continuous variables, there are no problems; SAS calculates means and std dev for the variables without deleting any observations (despite varying missing values for different variables).

* If I tabulate ONE categorical variable there are no problems; SAS calculates numbers, percentages etc for that variable. I can choose to present missing values or not in the table.

* If I try to tabulate several categorical variables with varying missing values for different variables, those observations are deleted from the entire table. When trying to tabulate all my 35 variables (continuous and categorical) at the same time i end up with 600 patients (starting with 120'000).

Now there are a number of suggested solutions (se Art Carpenters SUGI article "Proc tabulate - beyond bascis").

* I can add the MISSING statement in the proc (e.g proc tabulate data=DATASET missing;) However this includes the missing values in the calculated percentages which makes no sense.

* It is suggested that one can add the "preloadfmt exclusive" (after defining a format without the missing value; se Art Carpenters article) but this does not work.

* etc

I have been trying to fix this for three days without progress.

My first script, which doesnt work, looks something like this:

PROC TABULATE data=DATASETNAME;

CLASS sex ethnicity previous_heart_failure heart_failure_betablocker previous_coronary_heart coronary_heart_betablocker; /*Betablocker is a drug*/

VAR age bloodpressure glucose_level;

TABLES

all='All'                                     *(all sex=' '*(pctn<sex>='%'))

age                                           *(mean*f=3.1 std*f=3.1)

bloodpressure                          *(mean*f=3.1 std*f=3.1)

glucose_level                           *sex=' '*(mean*f=4.1 STD*f=4.1)

previous_heart_failure             *(n pctn<previous_heart_failure>='%') 

heart_failure_betablocker        *(n pctn<heart_failure_betablocker>='%')

previous_coronary_heart          *(n pctn<previous_coronary_heart>='%')

coronary_heart_betablocker     *(n pctn<coronary_heart_betablocker>='%')

,

all='All'

ethnicity='Ethnicity/race'

;

FORMAT sex sex_. ETCETERA...;

RUN;

Anyone got a solution?

Thanks in advance!

Adam

10 REPLIES 10
Astounding
PROC Star

I doubt you will find a simple solution.  What you are describing is the way PROC TABULATE is designed to work.  Any observation with a missing value for a CLASS variable is removed from all tables, whether that particular CLASS variable is used in that particular table or not.  The reasoning for this is that it guarantees that all your tables are built from the same set of observations.

To circumvent this feature, I see two choices.  The easiest one is to run separate PROC TABULATEs, each with a different CLASS statement.  If that takes too long to run, a more difficult alternative would be to run a PROC SUMMARY first, and save the summary data set.  Then feed the summary data set into as many PROC TABULATEs as you would like ... they will all run quickly.  That approach would require changing some of the syntax in PROC TABULATE.  Instead of getting N and PCTN statistics, you would likely be asking for SUM and PCTSUM statistics.

Good luck.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

but if he formats the class variable and uses preloadftm+completetypes those obs still wont be included in the report?

data_null__
Jade | Level 19

Maybe some of the offending class variables could be made analysis variables.  previous_heart_failure looks like it might be a yes/no variable that if coded 0/1 you could summarize with mean to get percent,  proportion actually.

Adam1
Calcite | Level 5

Thanks for Your time Astounding, data _null and Tal, however the problem was not solved.

It would be great if this problem could be solved. I'm sure there are many SAS users out there dealing with this problem.

My solution right now is executing one proc tabulate per categorical variable. However this is very time consuming since tables must be unioned later in a text editor.

Perhaps there isn't a fix for this, other than going around an output via proc freq/summary.

:smileyplain:

data_null__
Jade | Level 19

It can be "fixed".  Post some example data and expected result.

snoopy369
Barite | Level 11

I have this issue as well.  The way I get around it is I make the dataset vertical, with "category" and "response" variables (or whatever makes sense), only outputting rows where the category variable has a nonmissing value.  Then use CATEGORY and RESPONSE as the class, or CATEGORY could be a BY variable if that makes more sense (with the PCTNs).

IE

data for_tab;

set sashelp.class;

array analysis_vars age height weight;

array char_Vars sex; *more of course if you have more;

do _t = 1 to dim(analysis_vars);

category=vname(analysis_vars[_t]);

response=analysis_vars[_t];

output;

end;

*do same thing for char_Vars;

run;

proc tabulate data=for_tab;

class category response;

tables category, response*colpctn; *or whatever you want stat-wise;

run;

If you have statistics you want sometimes but not other times, just ask for them and then drop them when you transpose back to horizontal post-analysis.

Adam1
Calcite | Level 5

values separated by space.

period (.) indicates missing value.

first row = variable names.

observation sex glucose_level kidney_disease physical_activit coronary_disease heart_failure treated_for_coronary treated_for_heart_failure  

1 1 . . . 0 0 . .  

2 1 100 0 3 0 0 . .  

3 1 38 0 2 1 1 . .  

4 1 50 . 1 0 0 . .  

5 1 48 0 4 0 0 . .  

6 2 50 0 5 0 0 . .  

7 2 54 0 . 1 0 1 .  

8 2 48 0 3 1 0 0 .  

9 2 51 0 5 0 0 . .  

10 2 . 1 . 0 0 . .  

11 1 41 0 3 0 0 . .  

12 1 43 . 4 1 0 1 .  

13 1 40 0 3 0 0 . .  

14 2 59 . 5 0 0 . .  

15 1 54 0 4 0 0 . .  

16 1 48 0 . 1 1 1 1  

17 1 47 0 5 0 0 . .  

18 1 44 0 2 0 0 . .  

19 1 51 0 2 0 0 . .  

20 1 54 0 5 1 0 1 .  

21 2 51 0 5 0 0 . .  

22 2 51 0 4 0 0 . .  

23 2 51 1 1 1 0 1 .  

24 1 . 0 2 0 0 . .  

25 2 48 0 2 0 0 . .  

26 1 45 0 5 0 0 . .  

27 1 . . 1 1 0 0 .  

28 1 40 0 . 0 0 . .  

29 2 44 0 5 0 0 . .  

30 2 . . . 1 0 1 .  

31 2 47 0 . 0 0 . .  

32 1 52 0 3 0 0 . .  

33 1 . . 4 1 1 1 1  

34 2 97 0 5 0 0 . .  

35 2 43 . . 1 1 1 1  

36 1 50 0 3 0 0 . .  

37 2 99 0 3 0 0 . .  

38 2 35 0 5 0 0 . .  

39 1 69 1 3 0 0 . .  

40 2 47 1 . 0 0 . .  

41 1 53 . . 0 0 . .  

42 1 49 0 4 0 0 . .  

43 1 53 0 2 0 0 . .  

44 2 41 . 5 0 0 . .  

45 1 42 0 1 1 0 0 .  

46 1 78 0 5 0 0 . .  

47 1 33 1 4 0 0 . .  

48 1 64 0 5 0 0 . .  

49 2 47 0 5 0 0 . .  

50 1 38 0 5 0 0 . . 

Adam1
Calcite | Level 5

Hi again

Still havent solved the problem despite Your instructions. Any other suggestions? :smileyplain:

Thanks in advance

data_null__
Jade | Level 19

You replied to yourself so to whom is your comment addressed.  I suggest you post a complete working example with data, tabulate code, and expected output.

Adam1
Calcite | Level 5

Example data:

Variables: (underscore = continous variable, otherwise categorical).

Observation_number Ethnicity Sex Glucose Kidney_disease Exercise Heart_disease Heart_failure Heart_disease_treated Heart_Failure_treated

            
1Sweden 1...00..
2Sweden 11000300..
3Sweden 1380211..
4Sweden 150.100..
5Nordic countries 1480400..
6Sweden 2500500..
7Sweden 2540.101.
8Sweden 2480300..
9Sweden 2510500..
10Sweden 2.1.00..
11Sweden 1410300..
12Sweden 143.4101.
13Sweden 1400300..
14Sweden 259.500..
15Sweden 1540400..
16Sweden 1480.1111
17Sweden 1470500..
18Sweden 1440200..
19Sweden 1510200..
20Sweden 15405101.
21Latin America & Caribbean 2510500..
22Sweden 2510400..
23Sweden 25111101.
24Nordic countries 1.0200..
25Sweden 2480200..
26Sweden 1450500..
27Sweden 1..1100.
28Europe (Low income), Russia and Centr1400.00..
29Sweden 2440500..
30Sweden 2...101.
31Sweden 2470.00..
32Sweden 1520300..
33Sweden 1..41111
34Sweden 2970500..
35Europe (Low income), Russia and Centr243..1111
36Sweden 1500300..
37Sweden 2990300..
38Sweden 2350500..
39Sweden 1691300..
40Sweden 2471.00..
41Sweden 153..00..
42Sweden 1490400..
43Sub-Saharan Africa 1530200..
44Sweden 241.500..
45Sweden 1420100..
46Sweden 1780500..
47Sweden 1331400..
48Sweden 1640500..
49Sweden 2470500..
50Europe (Low income), Russia and Centr1380500..

Expected output:

Ethnicity

sex

Europe (Low income), Russia and Centr

Latin America & Caribbean

Nordic countries

Sub-Saharan Africa

Sweden

N

1

2

0

2

1

25

2

1

1

0

0

18

Glucose (mean)

40,333333333

51

48

53

53

Kidney disease

0

2

1

2

1

29

1

0

0

0

0

5

Exercise

1

0

0

0

0

4

2

0

0

1

1

4

3

0

0

0

0

8

4

0

0

1

0

6

5

1

1

0

0

13

Heart_disease

0

2

1

2

1

34

1

1

0

0

0

9

Heart_failure

0

2

1

2

1

40

1

1

0

0

0

3

Heart_disease_treated

0

0

0

0

0

1

1

1

0

0

0

7

Heart_failure_treated

1

1

0

0

0

2

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1193 views
  • 8 likes
  • 5 in conversation