Hypothesis testing with ASE?

01-24-2016 10:01 AM

Hello,

I am analysing texts with Text Mining in SAS EM and have already built a decision tree and a logistic regression.

As a result I get, for example, an average squared error (ASE). Is this indicator sufficient to accept a hypothesis?

Or are there better indicators?

Thanks in advance; I hope you have an idea.

Kind regards,

Benjamin

Accepted Solutions

01-25-2016 10:44 AM

Hi Benjamin,

ASE is an estimate of the model's mean squared error and is not used directly for hypothesis testing. In predictive modeling, for example, you can compare your model's ASE on the training and validation data to get an indication of overfitting. If your goal is an overall hypothesis test (between the target and all of the input variables), look at the overall p-value in the ANOVA table (available as an output of the Regression node). The total variance explained by the model, the model's ASE, and the model's degrees of freedom are used together to calculate this p-value.

Hope this helps!

Funda
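To make the relationship between ASE and the overall p-value concrete, here is a minimal sketch of the overall F-test. It is not the Regression node's implementation; for a real analysis, read the p-value from the node's ANOVA table. It assumes that ASE is the error sum of squares divided by the number of observations (so SSE = ASE × N) and that the model has an intercept. The function names and the example numbers are hypothetical, and only the Python standard library is used.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step of the continued fraction.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_value, df1, df2):
    """Upper-tail probability P(F > f_value) for an F(df1, df2) distribution."""
    if f_value <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_value))


def overall_f_test(ase, n_obs, n_inputs, ss_total):
    """Overall F statistic and p-value from ASE (assumes SSE = ASE * n_obs)."""
    df_model = n_inputs
    df_error = n_obs - n_inputs - 1  # one extra degree of freedom for the intercept
    sse = ase * n_obs
    ssr = ss_total - sse  # variation explained by the model
    if df_error <= 0 or sse <= 0.0:
        raise ValueError("need positive error degrees of freedom and SSE")
    f_value = (ssr / df_model) / (sse / df_error)
    return f_value, f_sf(f_value, df_model, df_error)


# Hypothetical numbers: 100 documents, 5 text topics as inputs.
f_value, p_value = overall_f_test(ase=3.0, n_obs=100, n_inputs=5, ss_total=500.0)
print(f"F = {f_value:.3f}, p-value = {p_value:.3g}")
```

A small p-value here would speak against the null hypothesis that none of the inputs is related to the target. ASE alone cannot say this, because it carries no information about the total variation or the degrees of freedom.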

All Replies

01-25-2016 06:25 AM

Hi Benjamin,

More context please.

What does your Text Mining flow look like? And what are you trying to predict?

A general walk-through of your data and your goal would be nice too!

Thanks,

01-25-2016 06:55 AM

Hi Migual,

The Text Mining flow contains nothing special. In detail, it contains the Text Parsing, Text Filter, Text Topic, Text Cluster, Data Partition, Decision Tree, and Regression nodes. I am trying to determine whether the text topics (independent variables) can explain a metric target variable (e.g. company size). A null hypothesis could be: there is no correlation between the text topics and the company size.

Hope this makes things clearer.
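As a rough illustration of this setup, the sketch below fits a straight line of a metric target on a single topic weight using a training partition, then computes ASE on both the training and the validation partition. All numbers are made up, the function names are hypothetical, and a one-input least-squares line stands in for the Regression node only to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def average_squared_error(xs, ys, intercept, slope):
    """ASE: mean of squared differences between target and prediction."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Made-up data: topic weight per document (x) and company size (y).
train_x = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
train_y = [12.0, 15.0, 21.0, 22.0, 30.0, 35.0]
valid_x = [0.3, 0.6, 0.8]
valid_y = [16.0, 27.0, 31.0]

# Fit on the training partition only, then score both partitions.
intercept, slope = fit_line(train_x, train_y)
train_ase = average_squared_error(train_x, train_y, intercept, slope)
valid_ase = average_squared_error(valid_x, valid_y, intercept, slope)
print(f"training ASE = {train_ase:.3f}, validation ASE = {valid_ase:.3f}")
```

A validation ASE that is much larger than the training ASE would hint at overfitting. Neither value on its own accepts or rejects the null hypothesis.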

01-25-2016 12:35 PM

Hi Funda,

This sounds very good to me; the p-value is what I have been looking for. I will try this.

Thank you.

Kind regards,

Benjamin

01-26-2016 12:57 PM

Hi,

I tried this in SAS EM, but unfortunately I couldn't find the ANOVA table (with the p-value) in the Regression node.

Where can I find this table?

Best regards,

Benjamin

01-26-2016 01:59 PM

The ANOVA (Analysis of Variance) table is available as an output of the Regression node. The p-value is shown by the pink arrow.

01-26-2016 02:10 PM

OK, I was in the wrong node (Decision Tree). Now I've found it.

Thanks again.