Correlations between Input and Target Variables

New Contributor
Posts: 2

Hi there,

First of all, hello to the SAS Community. I'm new to data mining and am working with SAS Enterprise Miner 6.1.

I want to find out how strongly the monthly temperatures (in degrees Celsius) in Germany are correlated with the monthly sales output (in tons) of a specific food product.
I know that many other parameters influence the sales output, but to begin with I only want to find out how strong the effect of temperature is.

The result I want looks like this: if the monthly temperature rises by 1 degree Celsius, the sales output rises by between 1.3 and 1.9 tons.
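
A statement of this form corresponds to a confidence interval for the slope of a simple linear regression of sales on temperature. The following is a minimal sketch in plain Python, not Enterprise Miner output. The function name `slope_interval` and the sample figures are made up for illustration, and the interval uses a normal approximation, which is too narrow for short series (a t quantile would be more appropriate there).

```python
from math import sqrt
from statistics import NormalDist, mean


def slope_interval(x, y, level=0.95):
    """Least-squares slope of y on x with an approximate confidence interval.

    Assumption: a normal quantile is used instead of a t quantile, so the
    interval is only a rough guide for small samples.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares; the variance estimate uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * se_slope, slope + z * se_slope


# Hypothetical monthly data: temperature in degrees Celsius, sales in tons.
temperature = [1.0, 3.5, 7.0, 11.0, 15.5, 18.0, 20.0, 19.5, 15.0, 10.0, 5.0, 2.0]
sales = [101.0, 106.5, 110.0, 118.0, 124.5, 127.0, 133.0, 130.5, 123.0, 116.0, 109.0, 102.5]

slope, low, high = slope_interval(temperature, sales)
print(f"slope = {slope:.2f} tons per degree, approx. 95% interval [{low:.2f}, {high:.2f}]")
```

The slope is the estimated change in tons per degree, and the two bounds play the role of the "between 1.3 and 1.9 tons" range in the sentence above.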

My intention is to have a "temperature adjustment" for forecasting the monthly sales output.


My current approach is to take historical sales output and temperature data (in an Excel file) and import it into Enterprise Miner with the File Import node. Then I correct some "critical" sales output values (using the Replacement and Impute nodes) and run the Regression node. But the result isn't very satisfying.

So do you have any tips and tricks for me to get a better result?

PS: I know that there is a correlation between temperature and sales output, but I don't know how strong it is.


Thanks
BlackCan
SAS Employee
Posts: 35

Hi there. Can you tell us more about what you mean by the result not being satisfactory: poor model fit, unacceptable interpretation, bad prediction for a 1 degree change, or something else?
New Contributor
Posts: 2

Posted in reply to David_Duling
The result is not satisfactory because my Average Squared Error is 10996.45 for the training data and 12450.12 for the validation data.
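
Average squared error is measured in squared units of the target, so its square root is easier to judge against the size of the monthly sales. Below is a small sketch of that arithmetic, assuming the target is the sales output in tons and reading the two figures above with a decimal point. `ase_to_rmse` is an illustrative name, not an Enterprise Miner function.

```python
from math import sqrt


def ase_to_rmse(ase):
    """Convert an average squared error into a root mean squared error."""
    if ase < 0:
        raise ValueError("average squared error cannot be negative")
    return sqrt(ase)


# The two figures reported for the Regression node.
for label, ase in [("train", 10996.45), ("validation", 12450.12)]:
    print(f"{label}: typical error of about {ase_to_rmse(ase):.1f} tons")
```

Whether an error of that size is large depends on the typical monthly sales volume, which the thread does not state.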

My properties for the regression node are:
Regression Type: Logistic Regression
Link Function: Logit
...
Selection Model: Stepwise
Selection Criterion: Validation Error

By the way: do I have to partition my data? At the moment I split it into 50% training and 50% validation data. The more I think about it, the more I believe I don't have to...

PS: I'm missing information such as Odds Ratio Estimates in the regression output. How do I get Enterprise Miner to produce it? Is it missing because I import the data with the File Import node rather than with a data source?


I'm sorry for asking so many questions :\
SAS Employee
Posts: 35

Regarding the data and the odds ratio: the answer is no. Enterprise Miner uses PROC DMREG rather than PROC LOGISTIC or PROC REG, and it produces a somewhat different set of output statistics, based on data mining needs including scalability.

Regarding the model fit, it will of course depend on your data and your function. I suggest you call Technical Support and ask for assistance; they would probably enjoy that interaction.
Super Contributor
Posts: 365

Hello BlackCan,

You can use PROC NPAR1WAY to test the correlation between your target variable and your predictor variables and find the best predictor before moving to Enterprise Miner.

Sincerely,
SPR
SAS Employee
Posts: 35

The StatExplore node will compute correlations between interval inputs and interval targets, and the Variable Clustering (varclus) node will compute variable-to-variable correlations. The StatExplore node will also choose predictors.
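
For reference, the correlation between one interval input and an interval target is the Pearson coefficient. The sketch below shows that calculation in plain Python. It is an illustration of the statistic only, not the node's implementation, and the `pearson` helper and sample figures are invented for the example.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least 2 paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical monthly values: temperature (degrees Celsius) and sales (tons).
temperature = [2.0, 6.0, 11.0, 17.0, 20.0, 14.0, 8.0, 3.0]
sales = [100.0, 108.0, 115.0, 127.0, 131.0, 122.0, 110.0, 103.0]
print(f"r = {pearson(temperature, sales):.3f}")
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.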
Super Contributor
Posts: 365

Posted in reply to David_Duling
Hello David Duling,

PROC NPAR1WAY measures differences in the empirical distributions of predictors for the Event and Non-Event classes of a binary target and provides several statistics to quantify these differences. This approach can also be applied to continuous targets by using a suitable format to split the target into several intervals. I found this approach very convenient for preliminary analysis of predictors.
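
The idea can be sketched outside SAS as follows: split the continuous target into intervals (here just two, at its median, which is an assumption made for brevity) and compare the empirical distributions of a predictor between the groups. The two-sample Kolmogorov-Smirnov statistic is one such measure. The helper names `split_by_target` and `ks_statistic` and the data are hypothetical, and this is not a reproduction of PROC NPAR1WAY output.

```python
from bisect import bisect_right
from statistics import median


def split_by_target(predictor, target):
    """Split predictor values by whether the target is above its median."""
    cut = median(target)
    low = [p for p, t in zip(predictor, target) if t <= cut]
    high = [p for p, t in zip(predictor, target) if t > cut]
    return low, high


def ks_statistic(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for v in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, v) / len(a_sorted)
        fb = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical monthly values: temperature (degrees Celsius) and sales (tons).
temperature = [2.0, 6.0, 11.0, 17.0, 20.0, 14.0, 8.0, 3.0]
sales = [100.0, 108.0, 115.0, 127.0, 131.0, 122.0, 110.0, 103.0]

low_sales_temps, high_sales_temps = split_by_target(temperature, sales)
print(f"KS statistic = {ks_statistic(low_sales_temps, high_sales_temps):.2f}")
```

A statistic close to 1 means the predictor's values barely overlap between the low and high target groups; a value close to 0 means the predictor is distributed the same way in both groups.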

Sincerely,
SPR
SAS Employee
Posts: 35

SPR - it sounds like you should write an extension tool that we all can use.

http://support.sas.com/documentation/onlinedoc/miner/em61/ext_nodes.pdf

Cheers.