I apologize ahead of time but what I have provided is ALL that I can provide given the confidential nature of my data.
Alrighty, What I am trying to do is produce a report BY year, counties/district, demographics, so on and so forth. I have explored a few different avenues between Proc Freq, Proc Report and Proc Tabulate. I have come very close using REPORT and TABULATE but both leave me a step of two shy of my ultimate goal for different reasons.
TABULATE:
This would be my first choice provided that the syntax in the table statement allows me to state multiple variables in one statement which helps with automation. My trouble comes in that I am trying to Calculate the PERCENTAGE of a BYGROUP. The groups are appropriately defined by "DESCRIPTION" and is calculated using PCTN. My data is weighted and I need the percentage to be calculated based on "OVERALL" not N. Speaking of "OVERALL" I can not get a format I created to work on the variable that takes any number under 5 and reports it as "<5". BONUS points if you can suppress the output of the Description column while keeping its value for grouping purposes intact.
REPORT:
The very same format that will not work in TABULATE for some reason works here and I have the flexibility to tag the "DESCRIPTION" column with a NOPRINT option in the DEFINE statement which makes the report look cleaner. However my troubles begin with that I can not figure out how to group the Percentages together using the "DESCRIPTION" group, nor can I get the Percentage to be applied to the OVERALL variable which has been weighted. The PCTN is applied the the unweighted variable N which is the total of the entire report. This report is not my first choice because the syntax will make it much harder to automate when adding additional variable on the fly.
I have attached OUTPUT for both reports, a small example of the data that is used as input for the procedures as well as the exact code I am running.
Help me Obi Wan, you are my only hope.
See if this comes closer. I am not sure I exactly understand what your denominator and numerator are for the percent but I think this may be it. Part of the issue is that PCTN doesn't really do what you want with summing a weighted variable.
You may not like the data step that I think is necessary because of the way tabulate will not do sums of class variables but should not be much more difficult.
data work.wttransposed; set tmp1.transposed_2; wtoverall=mi_weight*overall; dummy=1; run; proc tabulate data=work.wttransposed ; class location s race Descprition continuum11 / order=unformatted; var overall mi_weight wtoverall dummy; table location*(s race)*Descprition=' '*continuum11, wtoverall='Overall'*sum=''*f=censor. dummy=''*sum='n'*f=best5. wtoverall='Weighted percentage'*pctsum<continuum11>=''; /* weight mi_weight;*/ /* format continuum11 cascade. s sex. Descprition Desc. overall censor. ;*/ run;
I might be a bit surer if this solution "works" if an example of the expected output for that data example were provided.
Note I did not have your libraries so used mine.
I understand the nature of confidential data. However it really helps to have some data to test code against to see what is actually needed.
It need not be actual data just variables with values that will replicate the similar behavior. And demonstrate the expected behavior for that example data. OR use a SAS supplied data set and modify code to mimic your example. I am not going to retype data from a picture.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Since it appears that you "sensitive" data is related to FIPs I think you might be able to anonymize it by 1) changing the variable name (though kind of late) , 2) not use a format related to displaying your values and 3) replacing the values with different value
I don't believe either of the outputs shown were directly created by Proc tabulate with the code attached. The row variable labels appear in unexpected places.
Generally a variable label such as your "Description" can be suppressed using something like Description=' ' or override the default label for one part of the code with Variable="Some label text". In complex tables you would usually also want the table option after the statistics and such / row=float to make the table appear cleaner because of the blank cells that the suppressed label is likely to leave.
You do not show where you attempted to use your custom format. I am going to guess from the way you show "< 5" that you want to apply it to a statistic such as N. To apply a format to a statistic you use : n*F=formatname. : if you are specifying a label with the statistic that would look like : n='Count'*F=formatname. It may also be a good idea to actually provide the format definition.
And sometimes you have to manually summarize data with other procedures like Proc Summary and a data step especially with weighted data if you want to mix weighted and non-weighted values in a calculation.
Ok.. A version of the same type of dataset is attached for your pleasure. Changing "PRIMARY_COUNTY_FIPS" to "LOCATION" should do. I can assure you that the pictures of the output I provided came directly from the code that is in the original post. If you have not see that type of output from Proc Tabulate before I would suggest running the code on this dataset.
The format is a used on the sum of the groups. As I said before it works with Proc Report but I am curious why it wont with Tabulate. I have included that code for you too.
Proc Format;
value censor
. = ' "
0-4.9 = "<5"
other = [8.];
run;
Also, I think you may misunderstand what I am doing with the DESCRIPTION variable with the REPORT NOPRINT option. Report can suppress a column completely, Tabulate, as far as I know, can not. I was attempting to see if someone knew how to do the same Proc Report option in Proc Tabulate.
@rcleven2 wrote:
Ok.. A version of the same type of dataset is attached for your pleasure. Changing "PRIMARY_COUNTY_FIPS" to "LOCATION" should do. I can assure you that the pictures of the output I provided came directly from the code that is in the original post. If you have not see that type of output from Proc Tabulate before I would suggest running the code on this dataset.
The format is a used on the sum of the groups. As I said before it works with Proc Report but I am curious why it wont with Tabulate. I have included that code for you too.
Proc Format; value censor . = ' " 0-4.9 = "<5" other = [8.]; run;
The format as shown has syntax error for the missing value, mismatched quotes, which is going to mess with a lot of other code.
-_- I appreciate the help with the grammar, but in the grand scheme of things that is an easy fix for both of us. What I need help with are the problems I pointed earlier, twice now. If you don't have anything to offer on the real problems then please "help" someone else.
See if this comes closer. I am not sure I exactly understand what your denominator and numerator are for the percent but I think this may be it. Part of the issue is that PCTN doesn't really do what you want with summing a weighted variable.
You may not like the data step that I think is necessary because of the way tabulate will not do sums of class variables but should not be much more difficult.
data work.wttransposed; set tmp1.transposed_2; wtoverall=mi_weight*overall; dummy=1; run; proc tabulate data=work.wttransposed ; class location s race Descprition continuum11 / order=unformatted; var overall mi_weight wtoverall dummy; table location*(s race)*Descprition=' '*continuum11, wtoverall='Overall'*sum=''*f=censor. dummy=''*sum='n'*f=best5. wtoverall='Weighted percentage'*pctsum<continuum11>=''; /* weight mi_weight;*/ /* format continuum11 cascade. s sex. Descprition Desc. overall censor. ;*/ run;
I might be a bit surer if this solution "works" if an example of the expected output for that data example were provided.
Note I did not have your libraries so used mine.
I am not familiar with Pro Tabulate at all but seeing something work and then tweaking it here and there and seeing the changes helps. Going back to the data step you used before, is there a way to add in a BLANK variable that can be used to give blank lines to the report between the DESCRIPTION Classes?
Hi:
You can add blank lines between rows or between groups using PROC REPORT. It is not really possible to customize the break lines (such as adding an "extra" blank line) using PROC TABULATE.
Adding a blank row to the data in PROC TABULATE won't achieve what you want. I may have an example of inserting blank lines. It's not clear to me whether you mean a blank column though or a blank row between Description classes.
However, here's an example of adding a line of text in PROC REPORT between each country. The COMPUTE block is how you add extra lines between grouping or ordering variables. In this case, I used COUNTRY.
But, could have written just a blank line as shown in the annotation on the screen shot. You can have a COMPUTE block like this for any of the ORDER or GROUP usage variables.
Here's a paper that discusses COMPUTE blocks and break processing in more detail: https://support.sas.com/resources/papers/proceedings17/SAS0431-2017.pdf
Cynthia
Proc Tabulate does things that I can not get Proc Report to do. Those problems have been addressed above and was my main issue, figuring out which set of problems were easier to overcome. I have seen examples of blank lines with the use of dummy variables before in Tabulate, I just need to fit it to the code provided to me above.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.