BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi

I have a table with 376k records. I am using the following PROC Tabulate to count the number of records per a particular variable. The following is the results of the log;

2682 PROC TABULATE DATA=PICreditHire missing;
2683 CLASS FIRST_PI_RESDATE;
2684 VAR CreditHireTotal;
2685 TABLE FIRST_PI_RESDATE,CreditHireTotal*(N='NbrOfClaims');
2686 Title;
2687 RUN;
NOTE: There were 376842 observations read from the data set

However when I sum up the 'NbrOfClaims' field in the resulting output I only get 347k claims, despire the fact that the Tabulate has (correctly) read 376k claims from the source dataset. I've messed round with the MISSING options and eyeballed the source dataset and can't understand why Tabulate is missing some of the records. I've also tried using the PROC FREQ (below) and that has given me the correct results (376k records) as has taking the dataset into MS Access and running a count from there.

PROC FREQ DATA-PICreditHire;
TABLES FIRST_PI_RESDATE;
RUN;

Any ideas why PROC TABULATE isn't picking up all the records in the count? I can't use PROC FREQ because I need to have some other analysis in the report (i.e. SUM) I've just reduced it to a Count on this post to make it easier to read. The problem appears to be that the TABULATE is not including some of the values of 'CreditHireTotal' (Number, Length 8), which is why the PROC freq works as that is just a count against the FIRST_PI_RESDATE. From eyeballing I can't find any values that look strange and if I use CreditHireTotal as the Class variable and count against that I get a full count of the records (376k).

Message was edited by: RobH
1 REPLY 1
Cynthia_sas
SAS Super FREQ
Hi:
By default, when you cross an analysis variable with a statistic, missing values are ignored in the calculation of statistics. I'm guessing that you have several observations with missing values for the "CreditHireTotal" variable.

If you run the code below, you will see that a copy of SASHELP.CLASS has been altered so that the value the SEX variable for 2 people has been changed and 2 heights have been turned to missing. The MISSING option reveals the 2 missing obs for the SEX variable, but has no effect on the count when it's based on (crossed with) height. The TABULATE program shows the difference between asking for the simple N and asking for the N crossed with the analysis variable.

cynthia
[pre]

data class;
set sashelp.class;
if name = 'Alfred' or name = 'Alice'
then sex = ' ';
else if _n_ gt 17 then height=.;
run;

proc print data=class;
title 'note missing values';
run;

proc freq data=class;
tables sex height / nocum nopercent;
run;

PROC TABULATE DATA=class missing;
CLASS sex;
VAR height;
TABLE sex='Gender' all,
height*(N='NbrOfStudents') / box='Ex 1';

table sex='Gender' all,
n='NbrOfStudents' / box='Ex 2';

table sex='Gender' all,
n='Total count' height*N='non missing heights'
/ box='Ex 3';
Title;
RUN;
[/pre]

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 720 views
  • 0 likes
  • 2 in conversation