Correlations between Input and Target Variables

11-03-2010 04:39 AM

Hi there,

First of all: hello to the SAS Community. I'm new to the data mining business and am working with SAS Enterprise Miner 6.1.

I want to find out what correlation there is between the monthly temperatures (degrees Celsius) in Germany and the monthly sales output (capacity in tons) of a specific food product.

I know that many more parameters influence the sales output, but to begin with I only want to find out how strong the influence of temperature is.

The result I want looks like this: if the monthly temperature rises by 1 degree Celsius, the sales output will rise by between 1.3 and 1.9 tons.

My intention is to have a "temperature adjustment" for forecasting the monthly sales output.

My course of action right now is to take historical sales output and temperature data (in an Excel file) and import it into Enterprise Miner with the File Import node. Then I correct some "critical" sales output values (using the Replacement and Impute nodes) and run the Regression node. But the result isn't very satisfying.

So do you have any tips and tricks for me to get a better result?

PS: I know that there is a correlation between the temperature and the sales output, but I don't know how strong it is.

Thanks

BlackCan
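
The kind of statement asked for above (a change in tons per degree, together with a range) corresponds to the slope of a simple linear regression and its confidence interval. The following is a minimal Python sketch of that idea, not Enterprise Miner output. The function name `slope_with_interval`, the sample numbers, and the normal approximation used for the interval (instead of the exact t quantile) are assumptions made for illustration only.

```python
import math
from statistics import NormalDist, mean


def slope_with_interval(x, y, level=0.95):
    """Least-squares slope of y on x with an approximate confidence interval.

    Assumption: the interval uses a normal quantile, which is slightly
    too narrow for small samples compared with the exact t quantile.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long series with at least 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * std_err, slope + z * std_err


if __name__ == "__main__":
    # Hypothetical monthly values, invented for this example.
    temperature_c = [1.0, 3.5, 8.0, 12.5, 16.0, 19.0, 21.0, 20.5, 16.5, 11.0, 6.0, 2.0]
    sales_tons = [410, 415, 421, 430, 433, 441, 442, 440, 436, 427, 418, 412]
    slope, low, high = slope_with_interval(temperature_c, sales_tons)
    print(f"tons per degree: {slope:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

With only a few years of monthly data, a t quantile with n - 2 degrees of freedom would be the more careful choice for the interval.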

Posted in reply to BlackCan

11-03-2010 09:27 AM

Hi there. Can you tell us more about what you mean by the result not being satisfactory: poor model fit, unacceptable interpretation, bad prediction for a 1-degree change, or something else?

Posted in reply to David_Duling

11-03-2010 10:36 AM

The result is not satisfactory because my Average Squared Error is 10996.45 for the training data and 12450.12 for the validation data.

My properties for the regression node are:

Regression Type: Logistic Regression

Link Function: Logit

...

Selection Model: Stepwise

Selection Criterion: Validation Error

By the way: do I have to partition my data? At the moment I split it into 50% training and 50% validation data. The more I think about it, the more I believe I don't have to...

PS: I'm missing information such as Odds Ratio Estimates in the regression output. How do I get Enterprise Miner to produce it? Is it missing because I import the data with the File Import node rather than from a data source?

I'm sorry for asking so many questions :\
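
For reference, Average Squared Error is the mean of the squared differences between actual and predicted target values, so it is expressed in squared target units (here, tons squared) and is easier to judge next to a simple baseline. The sketch below is a minimal Python illustration; the function name `average_squared_error` and all numbers are hypothetical, and it does not reproduce how the Regression node scores a model.

```python
import math
from statistics import mean


def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical validation data, invented for this example.
    actual_tons = [410, 421, 433, 442, 436, 418]
    model_tons = [412, 419, 431, 445, 433, 420]
    # Baseline: always predict the mean of the actual values.
    baseline_tons = [mean(actual_tons)] * len(actual_tons)

    model_ase = average_squared_error(actual_tons, model_tons)
    baseline_ase = average_squared_error(actual_tons, baseline_tons)
    # The square root brings the error back to tons.
    print(f"model ASE {model_ase:.2f}, typical miss {math.sqrt(model_ase):.2f} tons")
    print(f"baseline ASE {baseline_ase:.2f}")
```

A model ASE close to the baseline ASE would suggest that the inputs add little beyond predicting the average.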

Posted in reply to BlackCan

11-08-2010 10:52 AM

Regarding the data source and the odds ratios: the answer is no, the File Import node is not the reason. Enterprise Miner uses PROC DMREG rather than PROC LOGISTIC or PROC REG, and DMREG produces a somewhat different set of output statistics, chosen for data mining needs such as scalability.

Regarding the model fit, it will of course depend on your data and your function. I suggest you call tech support and ask for assistance; they would probably enjoy that interaction.

Posted in reply to BlackCan

11-19-2010 09:37 AM

Hello BlackCan,

You can use PROC NPAR1WAY to test the correlation between your target variable and the predictor variables and find the best predictor before moving to EM.

Sincerely,

SPR

11-19-2010 10:14 AM

The StatExplore node will compute correlations between interval inputs and interval targets, and the VarClus (Variable Clustering) node will compute variable-to-variable correlations. The StatExplore node will also choose predictors.
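
As an illustration of what a correlation between an interval input and an interval target measures, here is a minimal Python sketch of the Pearson correlation coefficient. It is not the StatExplore node's implementation; the function name `pearson_correlation` and the sample values are assumptions made for this example.

```python
import math
from statistics import mean


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical monthly values, invented for this example.
    temperature_c = [1.0, 3.5, 8.0, 12.5, 16.0, 19.0]
    sales_tons = [410, 415, 421, 430, 433, 441]
    print(f"r = {pearson_correlation(temperature_c, sales_tons):.3f}")
```

The coefficient describes only the strength of a linear relationship; it does not by itself say how many tons correspond to one degree.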

Posted in reply to David_Duling

11-19-2010 03:01 PM

Hello David Duling,

PROC NPAR1WAY measures differences in the empirical distributions of predictors for the Event and Not Event classes of a binary target and provides several statistics to estimate these differences. This approach can also be applied to continuous targets by using a suitable format to split the target into several intervals. I found this approach very convenient for preliminary analysis of predictors.

Sincerely,

SPR
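
The idea described above, comparing a predictor's empirical distribution between classes of the target, can be sketched with a two-sample Kolmogorov-Smirnov statistic, one of the empirical distribution function statistics that PROC NPAR1WAY reports. This Python sketch is only an illustration under stated assumptions: the function names, the split of the continuous target at its median, and the sample values are hypothetical, and no p-value is computed.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, v) / len(a_sorted)
            - bisect_right(b_sorted, v) / len(b_sorted))
        for v in points
    )


def split_by_target(predictor, target, cutoff):
    """Group predictor values by whether the target is at most or above cutoff."""
    low = [p for p, t in zip(predictor, target) if t <= cutoff]
    high = [p for p, t in zip(predictor, target) if t > cutoff]
    return low, high


if __name__ == "__main__":
    # Hypothetical monthly values, invented for this example.
    temperature_c = [1.0, 3.5, 8.0, 12.5, 16.0, 19.0, 21.0, 20.5, 16.5, 11.0, 6.0, 2.0]
    sales_tons = [410, 415, 421, 430, 433, 441, 442, 440, 436, 427, 418, 412]
    low_t, high_t = split_by_target(temperature_c, sales_tons, median(sales_tons))
    print(f"KS statistic: {ks_statistic(low_t, high_t):.2f}")
```

A value near 1 means the predictor's distributions in the two target groups barely overlap; a value near 0 means they are almost identical.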

11-22-2010 08:37 AM

SPR - it sounds like you should write an extension tool that we all can use.

http://support.sas.com/documentation/onlinedoc/miner/em61/ext_nodes.pdf

Cheers.
