The LST tabulate output shows the n and pctn variable as "COUNT" and "COUNT_PCT", respectively, as expected.
However, the output dataset (tabulateout) shows the corresponding variable names as just "N" and "PctN_00000" instead of "COUNT" and "COUNT_PCT".
Why this discrepancy? Is there any way to force PROC TABULTE to name the output dataset variables as specified in the table statement? The problem with the PctN_00000 variable name is that it is not static. The "00000" actually corresponds to the number of class variables. So if I added var6 and var7 to the class statement, the variable name would change to PctN_0000000. This is very annoying and inconvenient. Please advise.
What you have in the TABLE statement only names the table columns in the "printed" output file or ODS output file -- it has no effect on what goes in the OUTPUT dataset (as a result of OUTPUT=). The usage of an '=' after a variable name in a TABLE statement only serves the purpose of providing a text string to act as a label for the column header in the "printed" or displayed output table. Normally, the string after the equal sign (=) is quoted, which serves to make the label function of the = more clear.
However, you certainly can use the RENAME= dataset option to rename your output variables as shown in the program below.
Thanks for your response. I am already using the RENAME= option in the PROC TABULATE statement. The problem is that the default column names in the output datset for certain statistic variables are NOT static/fixed. PCTN is a good example of this. SAS utilizes the following nomenclature for the PCTN column name in the output dataset:
where # represents the number of zeroes that correspond to the number of class variables. Here are examples:
3 class variables -> PctN_000
5 class variables -> PctN_00000
10 class variables -> PctN_0000000000
15 class variables -> PctN_000000000000000
N class variables -> PctN_[N zeroes]
So every time I add or subtract a variable to the class statement, I also have to update the number of zeroes in PctN_# in the RENAME= statement. This is very annoying/inconvenient, especially since the "N" statistic is NOT affected by this (the default column name is always "N" which does not require updating the RENAME= option every time).
Is this expected naming behavior for the PCTN column in the output dataset? What is the reason for adding zeroes at the end instead of simply using "PCTN" as the column name in the output dataset?
The number of observations in the output data set depends on the number of categories of data that are used in the tables and the number of subtables that are generated.
And then this note http://support.sas.com/kb/14/884.html
In some cases TABULATE needs to generate multiple output data set
variables for percentage statistics. That is because there can be
multiple denominator crossings, just like there can be multiple
numerator (table) crossings.
This means that merely using the statistic name or analysis variable
plus the statistic name will not guarantee a unique data set variable
I believe the documentation explains why the statistic names for the output dataset cannot be static/fixed with PROC TABULATE.
However, since you are only getting simple percents, without any complicated crossings, you might like the way that PROC FREQ creates the simple N and PCTN statistics in the output dataset. N is labelled FREQUENCY and PCTN is labelled PERCENT.
"If you use multiple table requests in a TABLES statement, the contents of the OUTPUT data set correspond to the last table request. ."
I was not aware that the ODS OUTPUT statement could bypass this limitation and would be able to capture the entire FREQ output regardless of the number of tables requests. It's something new I have just learned today and I intend to explore the ODS features in more detail. Thank you again!