Help using Base SAS procedures

Proc Tabulate reporting incorrect data

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Proc Tabulate reporting incorrect data

Hello,

 

I have an issue using proc tabulate. My dataset has three tiers with sample sizes of 12742, 12680, and 8227, respectively. When I run a proc means, it reports the entire sample and the correct means for variables within each of the three tiers. But when I run a proc tabulate, most of the sample is not reported. Also, while the variable "x" is missing for all respondents and the proc means clearly shows this (I use this variable to put a line in my tabulated reports), the proc tabulate is reporting a mean of 16.55 for x. I would really appreciate help getting to the bottom of this one.

 

Code is below and outputs are attached. I'm on SAS Enterprise Guide 6.1 (64 bit) and have SAS 9.4

 

Thanks!

 

 

CODE:


proc tabulate data = dsn3;
class tier ;
var &profvar;
tables &profvar, tier * (n mean);
run;

 

proc means data=dsn3 n mean; var &profvar; class tier; run;

 

 

 

means.PNGtab.PNG

Accepted Solutions
Solution
‎01-23-2017 12:34 PM
New Contributor
Posts: 4

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

Thank you everyone for your assistance in helping me to resolve this issue.

 

I was able to do a bit of research over the weekend, and I discovered the cause of the issue I was having. The dataset I am working with is very large, and the number of variables was causing the proc tabulate to break. I added a keep statement to reduce the dataset to just those variables I need, and the proc tabulate is now working correctly!

 

Thanks again!

View solution in original post


All Replies
Super Contributor
Posts: 252

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

Without having access to your data, it's not simply a case of using:

proc tabulate data=dsn3 missing;

is it?

Trusted Advisor
Posts: 1,018

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

I don't see any syntax that would suggest this is an issue of proc means vs proc tabulate.  Is EG really giving you the same dataset for both procs?    How about showing us the log?

SAS Super FREQ
Posts: 8,862

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27
I don't understand the need to use X to put a line in the report. There are many other ways to highlight a Total line or to get either REPORT or TABULATE to create a TOTAL for you.
cynthia
Super User
Posts: 19,770

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

Loook at the difference in Ns across the reports. 

Note that proc tabulate deletes records listwise if any variable being used is missing.

 

I agree with @LaurieF that this is highly likely an issue of missing values.  

New Contributor
Posts: 4

Re: Proc Tabulate reporting incorrect data

[ Edited ]

Thanks everyone for your responses.

 

I re-ran the program adding "missing" to the proc tabulate statement as LaurieF suggested, but got the same exact table with the issues.

 

mkeintz, the log is attached - there is only one dataset "dsn3" so I don't know how EG could be pointing to different datasets, and it finds all the variables and everything, but just shows values that cannot be accurate.

 

Reeza, it's interesting that for d2: age, the sample sizes are correct, though, right? FYI, I have run this program hundreds of times on dozens of datasets and never had this issue, and it has always included that variable "x" that is always missing for all observations.

 

FYI Cynthia_sas, I create the variable "x" and label it "<---->". I do this to create a blank line, because it ends up being a really long report and it helps me break out each variable into sections.

 

Thanks again, all!

Super Contributor
Posts: 252

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

For a start, I suggest:

option nofmterr;

That way we don't get 2,000 pages of format errors, when it could have been shown in one page!

 

Can you give us access to the dataset? It's hard to work on it if we can only guess.

SAS Super FREQ
Posts: 8,862

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27
Hi, there are a lot of different ways to break up TABULATE reports without putting your own divider line in. BY group processing or PAGE processing is one way. Using ALL is another way. Or, you could move to PROC REPORT which allows you to use a LINE statement to insert a blank line between logical groups.

cynthia
Trusted Advisor
Posts: 1,018

Re: Proc Tabulate reporting incorrect data

I had also at first thought this might be a listwise deletion issue upon encountering a missing value.  But then, If listwise deletion is operating, why is D2 (presumably listed in macrovar PROFVAR) not effected in the tabulate?  It shows within-tier N's exactly equal in the proc tabulate vs proc means.

 

And PROC MEANS output shows a constant N for all vars within each of the TIER classes.  And those constant N's alway equal the "NOBS" reported by proc means for the given tier. Doesn't this suggest that the observations are all complete (i.e. no missing vars)?

 

@drumphil27: But just to be sure: How many total records do you have (I can't read your log)?  Please run a proc freq;  tables tier;run;

Trusted Advisor
Posts: 1,018

Re: Proc Tabulate reporting incorrect data

@drumphil27: I've now succeeded in opening your log file, and I see that DSN3 has 33,649 obs, which is exactly the total obs for the tiers as reported by the proc means.  And since the proc means also reports the same number of obs in each tier for all the variables  (except X) in &PROFVAR, it looks like you only have complete records - no missing values in the TIER variable or the &profvar variables, suggesting the problem is NOT a missing values issue.

 

But wait!  The log you provided does not show the same proc tabulate code as your initial topic note:  Instead of the expected

 

proc tabulate data = dsn3 missing;
    class tier ;
    var &profvar;
    tables &profvar, tier * (n mean);
run;            

 

 It has

 

192        proc tabulate data = dsn3 missing;
193          class everyone tier ;
194          var &profvar;
195         tables &profvar, everyone * (n mean) tier * (n mean);
196        run;            

 

Now I do see that you have an EVERYONE=1 statement in a data step that appears to be applied to every record, so it shouldn't reduce the N of observations reproted by the tabulate procedure, but I have these questions:

  1. Why is the EVERYONE class not shown in the proc tabulate output you provided?
  2. Is EVERYONE actually=1 in every record?  Run a proc freq to confirm.
  3. You apparently want to use everyone to get a total over all tiers.  If so, in the future don't bother creating this new variable.  Instead use the keyword ALL in proc tabulate, as in
        tables &profvar, (ALL='All tiers')*(n mean)   tier*(n mean);

To repeat: just to eliminate it as the culprit, I would take a look at the class variable EVERYONE (which appears only in the log of the proc tabulate) to see whether it can explain the problem.

 

BTW, your program includes a proc transpose with a

   PROC TRANSPOSE DATA=DD OUT=DD;

statement.  I would strongly suggest you NOT use the same name for a transposed dataset as the untransposed version.

Solution
‎01-23-2017 12:34 PM
New Contributor
Posts: 4

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

Thank you everyone for your assistance in helping me to resolve this issue.

 

I was able to do a bit of research over the weekend, and I discovered the cause of the issue I was having. The dataset I am working with is very large, and the number of variables was causing the proc tabulate to break. I added a keep statement to reduce the dataset to just those variables I need, and the proc tabulate is now working correctly!

 

Thanks again!

Trusted Advisor
Posts: 1,018

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

Are you saying you added a keep statement, but the CLASS and VARS statement in proc tabulate were unchanged?  And that made the problem go away?

 

I'm glad your problem disappered, but the memory consumption of tabulate should be much more sensitive to the number of analysis vars, and the number of categories of the class vars than to the number of vars in the source data set.

 

Could you test this?  Add an OPTIONS FULLSTIMER; statement to your program, and run the proc tabulate twice, once with the keep and once without.  The log should show the amount of memory consumed.  I'd be interested in the comparison.

 

regards,

Mark

New Contributor
Posts: 4

Re: Proc Tabulate reporting incorrect data

Mark,

 

Yes, I did not change the class or vars statements, only added a keep statement (and added sgio=yes), and then the proc tabulate N's all matched the correct sample sizes shown in the proc means output I had shared.

 

Unfortunately, I spent far too many hours trying to resolve that issue and I have a deadline to meet on this project, so perhaps at a later time I can come back and try to run your test.

 

Thanks again for your help!

Super User
Posts: 19,770

Re: Proc Tabulate reporting incorrect data

Posted in reply to drumphil27

If this occurred as stated, without an error message in the log, I would HIGHLY suggest you report the issue to SAS SUPPORT (support.sas.com) so they can verify this issue and issue a hotfix if required.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 405 views
  • 1 like
  • 5 in conversation