turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- ODS and Base Reporting
- /
- Calculate mean of class variable using Tabulate

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2013 10:15 AM

I am just learning to use Tabulate and I have a table of college student enrollments and have created a table that calculates the number of enrollments in each curriculum for each of the past 5 academic years using code similar to this:

Proc Tabulate data=test;

class curr acadyear;

tables curr all, acadyear*(N all);

run;

and it gives me a table like this:

Curr20 | 200708 | 200809 | 200910 | 201011 | 201112 |
---|---|---|---|---|---|

101 | |||||

201 | |||||

301 | |||||

What I want to add is another column that calculates the mean for each curriculum for the 5 years. I've tried several things (most of which gave me the error message associated with not having an analysis variable). Is there a way to trick Tabulate to give me the mean of a class variable?

Thanks,

Brian

Accepted Solutions

Solution

02-06-2013
04:44 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 04:44 PM

Brian,

Your current data set will not support the table you want. But you're not that far from it. Try this preparatory step:

proc freq data=test;

tables curr * acadyear / noprint out=summarized (keep=curr acadyear count);

run;

Then you should be able to use the data set SUMMARIZED as the input to PROC TABULATE. Include a VAR COUNT; statement, and ask for the MEAN statistic instead of the N statistic:

proc tabulate data=summarized;

class curr acadyear;

var count;

tables curr all, (acadyear all) * count * mean;

run;

That's untested code, but it should be pretty close to what you need.

Good luck.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2013 11:01 AM

You don't have a var statement specifying what variable to summarize. And when you say 'MEAN' do you mean the average number of enrollments over the 5 years? Or something else?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2013 02:04 PM

Reeza,

Thanks for the response!

Yes, I want the average enrollments over the 5 year period. This seems like it should be very simple to do. I'm not sure there is a variable I can specify in a Va statement. The number of enrollments is based on the class variable acadyear. Can I somehow create a new variable that I could calculate the 5-year average for each curriculum?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2013 10:24 PM

Hi:

I guess I don't understand what you mean, when you say that the "number of enrollments is based on the class variable ACADYEAR" -- don't you have a separate variable for enrollments? What is the structure of your input data?

I don't understand how that TABLE statement is giving you that output? Where is the BOX area that TABULATE puts at the intersection of the row headers and the column headers? Where is your extra column underneath every ACADYEAR value for the ALL?

So, here's some code to try. Look at the structure of SASHELP.SHOES. Where I have REGION, you have CURR. Where I have PRODUCT, you have ACADYEAR. Report #1 shows what you should be getting based on the code you posted. Report #2 is what I think you want IF you have a separate ENROLLMENT variable that you can put in a VAR statement (such as where I have SALES).

cynthia

**ods listing close;**

**ods html file='c:\temp\show_mean.html';**

**Proc Tabulate data=sashelp.shoes;**

** where region in ('Asia', 'Canada', 'Pacific');**

** title '1) You should be getting this';**

** class region product;**

** tables region all,**

** product*(N all);**

**run;**

**Proc Tabulate data=sashelp.shoes;**

** where region in ('Asia', 'Canada', 'Pacific');**

** title '2) What I think you want';**

** class region product;**

** var sales;**

** tables region=' ' all,**

** product*(N=' ') all='Tot Cnt'*n=' ' sales='Avg Sales'*mean=' ' / box=region;**

**run;**

**ods html close;**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 03:07 PM

Cynthia,

Thanks very much for responding!

Unfortunately I am very dense when it comes to Proc Tabulate. Here's the exact code I'm using, which works great to give me a table with curriculum and acadplan going down the first two columns and then the next 5 columns contain enrollments for the academic years from 2007 to 2012. What I want is a last column that calculates the 5-year average enrollment. But my table doesn't really have an analysis variable to put into a Var statement. It seems I need some type of derived variable but I'm having trouble figuring out what.

**proc** **tabulate** data = enr.AnnFiveYearCombined;

class CURR ay acadplan;

table CURR=''*(acadplan='' all='Subtotal') all='Total', ay=' '*N='';

title 'Enrollments for Programs 2007-08 through 2011-12';

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 04:05 PM

Hi:

No worries, it's just that this recently posted TABULATE code is significantly different from what you originally posted and with TABULATE, that makes a difference to understanding what the results will look like.

So, AY cannot be a YEAR value and an ENROLLMENT value. Unless you mean that the TOTAL COUNT of all the AY values divided by 5 is what you want to see in your final column. This is where seeing some of your data would be helpful.

Here's what AY*N does inside TABULATE. Every unique value of AY will get a column header. Then the number in the cell will be the count of observations for the intersection of the CURR*ACADPLAN with that value of AY. The N in the cell is the statistic that SAS calculates based on the number of obs that TABULATE would place in that cell. Once TABULATE plops a number in a cell coming from the N statistic, it doesn't "remember" what that number was when it moves onto the next column or cell. When you use ALL, TABULATE "recounts" all the obs that fall into that category.

To really help you, the structure of your data is critical. What does** enr.AnnFiveYearCombined** look like??? I know that you might not be able to post the EXACT data, but can you make some data with FAKE values for CURR and ACADPLAN so that it is easier to make a recommendation on how to get what you want. Some data manipulation may be required. Hard to predict without seeing the structure of what you are starting with.

cynthia

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 04:42 PM

Cynthia,

Thanks again for the reply!

Each record in the data table contains student information. The class variables Curr, AcadPlan, and ay are all non-numeric character data. There are a lot of other variables, most of which are non-numeric character data. The data table is like this:

Curr AcadPlan AY StudentID

101 11 200708 1111111

101 21 200708 2222222

201 11 200708 3333333

Hope this better explains what I'm working with.

Brian

Solution

02-06-2013
04:44 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 04:44 PM

Brian,

Your current data set will not support the table you want. But you're not that far from it. Try this preparatory step:

proc freq data=test;

tables curr * acadyear / noprint out=summarized (keep=curr acadyear count);

run;

Then you should be able to use the data set SUMMARIZED as the input to PROC TABULATE. Include a VAR COUNT; statement, and ask for the MEAN statistic instead of the N statistic:

proc tabulate data=summarized;

class curr acadyear;

var count;

tables curr all, (acadyear all) * count * mean;

run;

That's untested code, but it should be pretty close to what you need.

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-06-2013 07:58 PM

Astounding,

Thanks for the reply. I'll try that out in the morning!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-07-2013 01:13 PM

Astounding,

That's the ticket! Thanks,

Brian Adams