Weighted Frequency Table in EG producing slightly ...

02-04-2015 11:16 AM

I am a Base SAS user who has been learning Enterprise Guide (6.1) over the last couple of months. As I've been converting some of my Base SAS programs to EG as a learning exercise, I've come across a problem: I get slightly different results for a weighted frequency depending on which program I use. My data is a survey sample of 3,508 observations that includes a weight variable to extrapolate the information to approximately 535,000 cases.

In Base SAS, I'm running a simple Proc Freq table on a single variable and weighting it with the "WEIGHT" statement. I get a total of 316,666 observations for the conditions set.
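For reference, the Base SAS step described here would look roughly like the sketch below. The names `work.survey`, `q1`, and `wt` are hypothetical stand-ins, and the filtering conditions are not shown because the post does not give them.

```sas
/* Hypothetical names: WORK.SURVEY (the survey sample), Q1 (the variable
   being tabulated), and WT (the sampling weight). None of these come
   from the original post. */
proc freq data=work.survey;
  /* a WHERE statement for "the conditions set" would go here */
  tables q1;
  weight wt;   /* PROC FREQ uses the full, non-integer value of WT */
run;
```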

When I run a Summary Tables task on the SAME dataset, and assign the weight variable to the "Frequency Count" task role, I get a total of 315,461 observations for the same conditions.

Am I missing something in how to weight data properly in Enterprise Guide?

Accepted Solutions

Solution

Posted in reply to brittneykp

02-04-2015 11:44 AM

Summary Tables is using PROC TABULATE behind the scenes. You might see if there are nuances in how TABULATE and FREQ use WEIGHT, or treat MISSING differently. For a better comparison, you might try the Table Analysis task instead -- it uses PROC FREQ for n-way frequency analysis.

Chris

All Replies

Posted in reply to brittneykp

02-04-2015 11:41 AM

The Frequency Count role likely assumes that the values are exactly that: counts, which would be integers. So the fractional part of your weights is probably being discarded. As an exercise, split your weight variable into its integer and fractional parts and sum them separately. See if you get values close to 315,461 and 1,205 (the difference between your two totals).
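A minimal sketch of that check, assuming a dataset called `work.survey` and a weight variable called `wt` (both names are placeholders, not taken from the thread):

```sas
/* Hypothetical names: WORK.SURVEY and WT are assumptions. */
data work.wt_parts;
  set work.survey;         /* apply the same conditions used in the tables */
  wt_int  = int(wt);       /* integer part of the weight */
  wt_frac = wt - wt_int;   /* fractional part of the weight */
run;

/* Sum the full weight and each part separately */
proc means data=work.wt_parts sum maxdec=2;
  var wt wt_int wt_frac;
run;
```

If the sum of `wt_int` lands near the Summary Tables total and the sum of `wt` lands near the PROC FREQ total, truncation of the weights is the likely explanation.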

Posted in reply to brittneykp

02-04-2015 12:19 PM

I've hit this before. I can't remember the exact details, but in one of the SAS procedures a particular option truncates the weight to an integer, while a different option uses the full value of the weight. Some experimentation should clear it up for you.

Tom

Posted in reply to brittneykp

02-04-2015 12:36 PM

Thanks for the help! Looks like PROC TABULATE's "FREQ" statement does indeed truncate the weights; when I choose the One-Way Frequency task instead of Summary Tables task, I get the correct values. I guess I'll just have to be very careful which task I use when creating frequencies for my sample data.
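The difference can be reproduced on a toy dataset. This is only an illustrative sketch: the dataset `work.demo` and its values are made up, and the TABULATE step is an assumption about roughly what the Summary Tables task generates when the weight is placed in the Frequency Count role.

```sas
/* Toy data (assumption): three rows whose weights sum to 5.25 */
data work.demo;
  input grp $ wt;
  datalines;
A 1.5
A 2.5
B 1.25
;
run;

/* WEIGHT in PROC FREQ keeps the fractional part:
   A totals 1.5 + 2.5 = 4, B totals 1.25 */
proc freq data=work.demo;
  tables grp;
  weight wt;
run;

/* FREQ in PROC TABULATE truncates each value to an integer first:
   A counts as 1 + 2 = 3, B counts as 1 */
proc tabulate data=work.demo;
  class grp;
  freq wt;
  table grp, n;
run;
```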

Posted in reply to brittneykp

02-04-2015 03:17 PM

You can use Summary Tables, just make sure you use the Relative Weight instead of the Frequency Count. There are a few things that Summary Tables does that can't be done easily any other way.

Posted in reply to TomKari

02-04-2015 03:58 PM

I didn't find that assigning the weight to Relative Weight worked in Summary Tables - my output still only included the actual number of observations when I did that and not the weighted values. My understanding is that Relative Weight only works when computing statistics on other variables, not on frequency counts.

Posted in reply to brittneykp

02-04-2015 06:12 PM

Sorry, I was only looking at this thread with half an eye, and didn't notice that the discussion was about frequency counts. To get a weighted frequency count out of Summary Tables, use the Relative Weight, and the statistic that you want is Sum of Weights instead of N. I just tested it, and it works fine. Let me know if you have any problems.

Posted in reply to TomKari

02-04-2015 06:19 PM

I tried to do this myself and it won't let me use the SumWgt statistic - I get a crossed out symbol when I try to drop it on the table. The other stats work fine, and I have my weight variable set to the Relative Weight role. What am I missing?

Posted in reply to brittneykp

02-04-2015 07:11 PM

Although you're conceptually only doing a frequency count, I think that EG wants you to apply the Sum of Weights on an analysis variable. Try putting any numeric variable into your table request (make sure there are no missing values), and try applying the sum of weights statistic on that.

One thing to be aware of with PROC FREQ: of the three, it is the quickest to fail on a large result set. PROC TABULATE copes better, and PROC MEANS is basically impossible to break.

Posted in reply to TomKari

02-05-2015 10:59 AM

Well, I won't be able to use Summary Tables then, because most of the variables I am evaluating are character variables. Thanks for the help!

Posted in reply to brittneykp

02-05-2015 01:31 PM

Just add a numeric variable named "DummyVar" with a value of 1 on every record to your table, and you're good to go!

Tom
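In code terms, the dummy-variable approach could be sketched as follows. The names `work.survey`, `q1`, and `wt` are hypothetical, and the TABULATE step is an assumption about what the Summary Tables task would generate with the weight in the Relative Weight role, not output copied from EG.

```sas
/* Hypothetical names: WORK.SURVEY, Q1 (a character class variable),
   and WT (the sampling weight). */
data work.survey_dv;
  set work.survey;
  DummyVar = 1;            /* constant numeric analysis variable, never missing */
run;

proc tabulate data=work.survey_dv;
  class q1;                /* character variables are fine as CLASS variables */
  var DummyVar;            /* SUMWGT needs a numeric analysis variable */
  weight wt;               /* full, non-integer weights are used */
  table q1, DummyVar*sumwgt;
run;
```

Because `DummyVar` is never missing, the sum of weights in each cell covers every row in that category, which gives the weighted frequency.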