I have a bit of an odd problem. I just started with my company and am currently working update some old code. It uses a lot of Proc Freqs to help us check out the validity of our data. We have a LOT of variables, but most of them share between three or four valid value ranges. It was asked if it would be possible to condense the proc freqs into a table with a list of variables with the same valid value range comprising the rows, and the values that they could take across the top (and any invalid values that may be in there). All we want is a display of the counts of each different response for each variable.
I have been working on this by myself for a while now and haven't been able to get very far. I have had a few ideas on how to attempt this, but so far none of them have worked. I've been focusing mainly on proc report as we use it in other sections of the code, but just recently began to look at proc tabulate as well.
I apologize if this is unclear, if more information is requested I would be more than happy to provide it. Thanks for any help!
I'm not sure that will solve my problem. With it I can change each variable into an observation, but then I have 100k new variables from all of the observations. I would need some way of consolidating all of those variables into a smaller range of variables, one variable containing a count for each specific value or value range that appeared, correct? And even if I do that, how do I account for the invalid values? In general, I can't create a generic variable that contains invalid counts, as there are times when it is important to know what the invalid values actually are.
I guess my original question may have been this. Is it possible to use proc report or proc tabulate to create a table where each row is a different variable and the columns are predefined ranges and any invalid values in the variables (the predefined ranges I believe would come from a proc format)? Each cell on the table would then contain a count of how many observations contained that value or range for that variable. The output from running proc freq on 100-200 similar variables (3-4 times) is lengthy and cumbersome to analyze. The hope was that we could just use another procedure to consolidate the information, and to save paper.
I apologize, I am fairly new to SAS. I had great success teaching myself macros and a lot of other things, but this seems like it should be easy and is really giving me a difficult time. Thanks again for any help!
It would help to illustrate in a post-reply exactly the INPUT side and desired OUTPUT side which you would like to see happen in your SAS application/program.
Consider that you also have full-control with DATA step programming to generate your output statistics/summary information.
I understood that you had many observations, for example, with a variable ID, and you would like to generate a "collapsed" summary report showing counts of ID, represented as columns (and possibly to consider other key-variables with separate rows/observations). With this objective, I encourage you to explore PROC TRANSPOSE to transform your vertical data to horizontal, then summarize, if needed, and then you can use either PROC PRINT or if desired PROC REPORT to externalize your data in a report.
If I am understanding your problem I believe that proc tabulate may have a good shot at what you need.
A skeleton program would look like:
Proc tabulate data=;
/*the missing is important as otherwise the default behavior for tabulate
will only count the records that all have nonmissing values for the class
/*one table per variable*/
table n, ;
table n, ;
/* or one row with all of the variables across the page */
table n, ;
Depending on the number of categories for each variable I would look at how many columns you want across a page and select the number of variables per row. It may be that you want to use tabulate with a noprint option to create an output dataset and then proc transpose to arrange data for proc print.
I apologize for disappearing for a while. We had a lot of work all of a sudden.
To answer the earlier questions:
Desired output would be 1 row per variable.
Also, all of the variables that I would put in one table would have similar values. I would put all the numeric continuous in one report, all of the discrete in another, and all of the character variables with the same valid values in another.
I will once again look into proc tabulate (and the data step/proc transpose ideas) as I did not have much time to look at it before my attention was needed elsewhere. Thanks for all the help!