BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
drumphil27
Fluorite | Level 6

Hello,

 

I have an issue using proc tabulate. My dataset has three tiers with sample sizes of 12742, 12680, and 8227, respectively. When I run a proc means, it reports the entire sample and the correct means for variables within each of the three tiers. But when I run a proc tabulate, most of the sample is not reported. Also, while the variable "x" is missing for all respondents and the proc means clearly shows this (I use this variable to put a line in my tabulated reports), the proc tabulate is reporting a mean of 16.55 for x. I would really appreciate help getting to the bottom of this one.

 

Code is below and outputs are attached. I'm on SAS Enterprise Guide 6.1 (64 bit) and have SAS 9.4

 

Thanks!

 

 

CODE:


proc tabulate data = dsn3;
class tier ;
var &profvar;
tables &profvar, tier * (n mean);
run;

 

proc means data=dsn3 n mean; var &profvar; class tier; run;

 

 

 

means.PNGtab.PNG
1 ACCEPTED SOLUTION

Accepted Solutions
drumphil27
Fluorite | Level 6

Thank you everyone for your assistance in helping me to resolve this issue.

 

I was able to do a bit of research over the weekend, and I discovered the cause of the issue I was having. The dataset I am working with is very large, and the number of variables was causing the proc tabulate to break. I added a keep statement to reduce the dataset to just those variables I need, and the proc tabulate is now working correctly!

 

Thanks again!

View solution in original post

13 REPLIES 13
LaurieF
Barite | Level 11

Without having access to your data, it's not simply a case of using:

proc tabulate data=dsn3 missing;

is it?

mkeintz
PROC Star

I don't see any syntax that would suggest this is an issue of proc means vs proc tabulate.  Is EG really giving you the same dataset for both procs?    How about showing us the log?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Cynthia_sas
SAS Super FREQ
I don't understand the need to use X to put a line in the report. There are many other ways to highlight a Total line or to get either REPORT or TABULATE to create a TOTAL for you.
cynthia
Reeza
Super User

Loook at the difference in Ns across the reports. 

Note that proc tabulate deletes records listwise if any variable being used is missing.

 

I agree with @LaurieF that this is highly likely an issue of missing values.  

drumphil27
Fluorite | Level 6

Thanks everyone for your responses.

 

I re-ran the program adding "missing" to the proc tabulate statement as LaurieF suggested, but got the same exact table with the issues.

 

mkeintz, the log is attached - there is only one dataset "dsn3" so I don't know how EG could be pointing to different datasets, and it finds all the variables and everything, but just shows values that cannot be accurate.

 

Reeza, it's interesting that for d2: age, the sample sizes are correct, though, right? FYI, I have run this program hundreds of times on dozens of datasets and never had this issue, and it has always included that variable "x" that is always missing for all observations.

 

FYI Cynthia_sas, I create the variable "x" and label it "<---->". I do this to create a blank line, because it ends up being a really long report and it helps me break out each variable into sections.

 

Thanks again, all!

LaurieF
Barite | Level 11

For a start, I suggest:

option nofmterr;

That way we don't get 2,000 pages of format errors, when it could have been shown in one page!

 

Can you give us access to the dataset? It's hard to work on it if we can only guess.

Cynthia_sas
SAS Super FREQ
Hi, there are a lot of different ways to break up TABULATE reports without putting your own divider line in. BY group processing or PAGE processing is one way. Using ALL is another way. Or, you could move to PROC REPORT which allows you to use a LINE statement to insert a blank line between logical groups.

cynthia
mkeintz
PROC Star

I had also at first thought this might be a listwise deletion issue upon encountering a missing value.  But then, If listwise deletion is operating, why is D2 (presumably listed in macrovar PROFVAR) not effected in the tabulate?  It shows within-tier N's exactly equal in the proc tabulate vs proc means.

 

And PROC MEANS output shows a constant N for all vars within each of the TIER classes.  And those constant N's alway equal the "NOBS" reported by proc means for the given tier. Doesn't this suggest that the observations are all complete (i.e. no missing vars)?

 

@drumphil27: But just to be sure: How many total records do you have (I can't read your log)?  Please run a proc freq;  tables tier;run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

@drumphil27: I've now succeeded in opening your log file, and I see that DSN3 has 33,649 obs, which is exactly the total obs for the tiers as reported by the proc means.  And since the proc means also reports the same number of obs in each tier for all the variables  (except X) in &PROFVAR, it looks like you only have complete records - no missing values in the TIER variable or the &profvar variables, suggesting the problem is NOT a missing values issue.

 

But wait!  The log you provided does not show the same proc tabulate code as your initial topic note:  Instead of the expected

 

proc tabulate data = dsn3 missing;
    class tier ;
    var &profvar;
    tables &profvar, tier * (n mean);
run;            

 

 It has

 

192        proc tabulate data = dsn3 missing;
193          class everyone tier ;
194          var &profvar;
195         tables &profvar, everyone * (n mean) tier * (n mean);
196        run;            

 

Now I do see that you have an EVERYONE=1 statement in a data step that appears to be applied to every record, so it shouldn't reduce the N of observations reproted by the tabulate procedure, but I have these questions:

  1. Why is the EVERYONE class not shown in the proc tabulate output you provided?
  2. Is EVERYONE actually=1 in every record?  Run a proc freq to confirm.
  3. You apparently want to use everyone to get a total over all tiers.  If so, in the future don't bother creating this new variable.  Instead use the keyword ALL in proc tabulate, as in
        tables &profvar, (ALL='All tiers')*(n mean)   tier*(n mean);

To repeat: just to eliminate it as the culprit, I would take a look at the class variable EVERYONE (which appears only in the log of the proc tabulate) to see whether it can explain the problem.

 

BTW, your program includes a proc transpose with a

   PROC TRANSPOSE DATA=DD OUT=DD;

statement.  I would strongly suggest you NOT use the same name for a transposed dataset as the untransposed version.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
drumphil27
Fluorite | Level 6

Thank you everyone for your assistance in helping me to resolve this issue.

 

I was able to do a bit of research over the weekend, and I discovered the cause of the issue I was having. The dataset I am working with is very large, and the number of variables was causing the proc tabulate to break. I added a keep statement to reduce the dataset to just those variables I need, and the proc tabulate is now working correctly!

 

Thanks again!

mkeintz
PROC Star

Are you saying you added a keep statement, but the CLASS and VARS statement in proc tabulate were unchanged?  And that made the problem go away?

 

I'm glad your problem disappered, but the memory consumption of tabulate should be much more sensitive to the number of analysis vars, and the number of categories of the class vars than to the number of vars in the source data set.

 

Could you test this?  Add an OPTIONS FULLSTIMER; statement to your program, and run the proc tabulate twice, once with the keep and once without.  The log should show the amount of memory consumed.  I'd be interested in the comparison.

 

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
drumphil27
Fluorite | Level 6

Mark,

 

Yes, I did not change the class or vars statements, only added a keep statement (and added sgio=yes), and then the proc tabulate N's all matched the correct sample sizes shown in the proc means output I had shared.

 

Unfortunately, I spent far too many hours trying to resolve that issue and I have a deadline to meet on this project, so perhaps at a later time I can come back and try to run your test.

 

Thanks again for your help!

Reeza
Super User

If this occurred as stated, without an error message in the log, I would HIGHLY suggest you report the issue to SAS SUPPORT (support.sas.com) so they can verify this issue and issue a hotfix if required.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3451 views
  • 1 like
  • 5 in conversation