Help using Base SAS procedures

TABULATE NOT COUNTING CORRECTLY

Reply
N/A
Posts: 0

TABULATE NOT COUNTING CORRECTLY

Hi

I have a table with 376k records. I am using the following PROC Tabulate to count the number of records per a particular variable. The following is the results of the log;

2682 PROC TABULATE DATA=PICreditHire missing;
2683 CLASS FIRST_PI_RESDATE;
2684 VAR CreditHireTotal;
2685 TABLE FIRST_PI_RESDATE,CreditHireTotal*(N='NbrOfClaims');
2686 Title;
2687 RUN;
NOTE: There were 376842 observations read from the data set

However when I sum up the 'NbrOfClaims' field in the resulting output I only get 347k claims, despire the fact that the Tabulate has (correctly) read 376k claims from the source dataset. I've messed round with the MISSING options and eyeballed the source dataset and can't understand why Tabulate is missing some of the records. I've also tried using the PROC FREQ (below) and that has given me the correct results (376k records) as has taking the dataset into MS Access and running a count from there.

PROC FREQ DATA-PICreditHire;
TABLES FIRST_PI_RESDATE;
RUN;

Any ideas why PROC TABULATE isn't picking up all the records in the count? I can't use PROC FREQ because I need to have some other analysis in the report (i.e. SUM) I've just reduced it to a Count on this post to make it easier to read. The problem appears to be that the TABULATE is not including some of the values of 'CreditHireTotal' (Number, Length 8), which is why the PROC freq works as that is just a count against the FIRST_PI_RESDATE. From eyeballing I can't find any values that look strange and if I use CreditHireTotal as the Class variable and count against that I get a full count of the records (376k).

Message was edited by: RobH
SAS Super FREQ
Posts: 8,862

Re: TABULATE NOT COUNTING CORRECTLY

Posted in reply to deleted_user
Hi:
By default, when you cross an analysis variable with a statistic, missing values are ignored in the calculation of statistics. I'm guessing that you have several observations with missing values for the "CreditHireTotal" variable.

If you run the code below, you will see that a copy of SASHELP.CLASS has been altered so that the value the SEX variable for 2 people has been changed and 2 heights have been turned to missing. The MISSING option reveals the 2 missing obs for the SEX variable, but has no effect on the count when it's based on (crossed with) height. The TABULATE program shows the difference between asking for the simple N and asking for the N crossed with the analysis variable.

cynthia
[pre]

data class;
set sashelp.class;
if name = 'Alfred' or name = 'Alice'
then sex = ' ';
else if _n_ gt 17 then height=.;
run;

proc print data=class;
title 'note missing values';
run;

proc freq data=class;
tables sex height / nocum nopercent;
run;

PROC TABULATE DATA=class missing;
CLASS sex;
VAR height;
TABLE sex='Gender' all,
height*(N='NbrOfStudents') / box='Ex 1';

table sex='Gender' all,
n='NbrOfStudents' / box='Ex 2';

table sex='Gender' all,
n='Total count' height*N='non missing heights'
/ box='Ex 3';
Title;
RUN;
[/pre]
Ask a Question
Discussion stats
  • 1 reply
  • 104 views
  • 0 likes
  • 2 in conversation