10-04-2013 08:52 AM
I have a request to create a report that is basically a statistical report. The report is to show counts of 'how many' observations there are for particular columns based on the value of the column. For example: data column 'TType', this column can have up to 5 different distinct values, values (1,2,3,4, &5). I need to get a count for how many observations there are in the table for each value. I have done this by using a proc freq. I also have to do counts on 20 other columns based on some criteria, for example, count the number of observations where a particular column is not null. I am not sure the direction to go with for this. I thought there would be a better way to code this than having 20 separate 'select count' statments. I also need to all the counts to a data set. This report has requests for the averages on some columns as well as counts for some columns. Would appreciate any suggestions!
10-04-2013 09:14 AM
For basic statistics and grouping, you should look into proc tabulate and/or proc report. Proc report syntax is slightly more complicated but it allows you to do some additionnal computations as opposed to proc tabulate where you are really stuck with the statistics provided by the procedure.
There are many other ways depending on how much you wish to manipulate the data (calculate the statistics) yourself and simply use a proc print with some ODS STYLE-ing.
Without a far more detailed data structure and desired output layout it is difficult to tell you which way is the best. With that said, the above 2 are designed to create tables with grouping (or class) variables like TTYPE. Tabulate does provide all the basic statistics that proc freq provides (default options). The condition on what is considered a missing value for the count is what might make it not workout.
10-04-2013 12:07 PM
Here are some examples of using PROC TABULATE to get both counts and statistics. Each TABLE statement produces 1 table. The 4th table uses a numeric variable in order to request MEAN statistic for MPG_CITY and MPG_HIGHWAY for each MAKE and TYPE, just to show a cross-tabulate. SASHELP.CARS has a LOT of car brands in the file, so I limited the MAKE variable to only 5 values to loosely correspond to your description.
ods listing close;
ods html file='c:\temp\car_info.html';
proc tabulate data=sashelp.cars;
where make in ('Acura', 'Audi', 'Toyota', 'Volkswagen', 'Saturn');
class make type ;
var mpg_city mpg_highway msrp;
table make all,
n / box='1 Make in Rows';
table make='2 Make as Columns' all ;
table type all,
make*n all*n/ box='3 Cross-Tab';
table type all,
make*(n='Count' mpg_city*mean mpg_highway*mean) / box='4 Average MPG';
ods html close;