BlackCan
Calcite | Level 5
Hi there,

First of all, hello to the SAS Community. I'm new to data mining and am working with SAS Enterprise Miner 6.1.

I want to find out what correlation there is between the monthly temperatures (degrees Celsius) in Germany and the monthly sales output (in tons) of a specific food product.
I know that many other parameters influence the sales output, but to begin with I only want to find out how strong the effect of temperature is.

The result I want looks like this: if the monthly temperature rises by 1 degree Celsius, the sales output rises by between 1.3 and 1.9 tons.

My intention is to have a "temperature adjustment" for forecasting the monthly sales output.
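
A statement of this form ("one degree more means between 1.3 and 1.9 tons more") corresponds to the slope of a simple linear regression of sales on temperature, reported with a confidence interval. The following is a minimal Python sketch of that calculation, not Enterprise Miner output. The function name `slope_with_interval` and the monthly values are made up for illustration, and the interval uses a normal approximation, which is somewhat too narrow for short series.

```python
import math
from statistics import NormalDist


def slope_with_interval(x, y, level=0.95):
    """Fit y = a + b*x by least squares and return (b, lower, upper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    # Normal quantile as an approximation; a t quantile is wider for small n.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b, b - z * se_b, b + z * se_b


# Hypothetical monthly data: temperature in degrees Celsius, sales in tons.
temperature_c = [1.0, 3.0, 6.0, 10.0, 14.0, 17.0, 19.0, 18.0, 15.0, 10.0, 5.0, 2.0]
sales_tons = [101.0, 105.5, 109.0, 116.5, 122.0, 127.5, 130.0, 128.0, 125.5, 115.0, 108.5, 104.0]

slope, lower, upper = slope_with_interval(temperature_c, sales_tons)
print(f"1 degree warmer: {slope:.2f} tons more (about {lower:.2f} to {upper:.2f})")
```

The slope is the "tons per degree" figure, and the interval bounds play the role of the "between ... and ..." range.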


My approach right now is to take historical sales output and temperature data (in an Excel file) and import it into Enterprise Miner with the File Import node. Then I correct some "critical" sales output values (using the Replacement and Impute nodes) and run the Regression node. But the result isn't very satisfying.

So do you have any tips and tricks for me to get a better result?

PS: I know that there is a correlation between the temperature and the sales output, but I don't know how strong it is.


Thanks
BlackCan
7 REPLIES
David_Duling
SAS Employee
Hi there. Can you tell us more about what you mean by the result not being satisfactory: poor model fit, unacceptable interpretation, bad prediction for a 1-degree change, or something else?
BlackCan
Calcite | Level 5
The result is not satisfactory because my Average Squared Error is 10996.45 for the training data and 12450.12 for the validation data.

My properties for the regression node are:
Regression Type: Logistic Regression
Link Function: Logit
...
Selection Model: Stepwise
Selection Criterion: Validation Error

By the way: do I have to partition my data? At the moment I split it into 50% training and 50% validation data. The more I think about it, the more I believe I don't have to...
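
The effect of a 50/50 partition can be checked directly: fit on one half of the rows and compare the average squared error on both halves. This is a minimal Python sketch under assumptions, not the Enterprise Miner Data Partition or Regression nodes. The helper names, the random seed, and the data are hypothetical, and the average squared error is computed here as the mean of the squared residuals, so it is in squared target units (tons squared) and its size depends on the scale of the sales figures.

```python
import random


def fit_line(rows):
    """Least-squares intercept and slope for (x, y) rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def average_squared_error(rows, intercept, slope):
    """Mean of squared residuals, in squared target units."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows)


def split_half(rows, seed=0):
    """Shuffle a copy of the rows and split it 50/50."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical (temperature, sales) rows, for illustration only.
rows = [(1.0, 101.0), (3.0, 105.5), (6.0, 109.0), (10.0, 116.5),
        (14.0, 122.0), (17.0, 127.5), (19.0, 130.0), (18.0, 128.0),
        (15.0, 125.5), (10.0, 115.0), (5.0, 108.5), (2.0, 104.0)]

train, validation = split_half(rows)
intercept, slope = fit_line(train)
print("train ASE:", average_squared_error(train, intercept, slope))
print("validation ASE:", average_squared_error(validation, intercept, slope))
```

With only a handful of monthly observations each half is small, so the two error values can differ noticeably from one split to the next.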

PS: Information such as the Odds Ratio Estimates is missing from the regression output. How do I get Enterprise Miner to produce it? Is it missing because I import the data with the File Import node rather than from a data source?


I'm sorry for asking so many questions 😕
David_Duling
SAS Employee
Regarding the data and the odds ratio, the answer is no. Enterprise Miner uses PROC DMREG rather than PROC LOGISTIC or PROC REG, and DMREG produces a somewhat different set of output statistics, chosen for data mining needs such as scalability.
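
Where an odds ratio is wanted and only coefficient estimates are available, it can be derived by hand: for a binary target fitted with a logit link, the odds ratio for a one-unit increase in an input is the exponential of that input's coefficient. The Python sketch below assumes such a model; the function name `odds_ratio` and the coefficient value are hypothetical and do not come from this thread.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Odds ratio for a change of `delta` units in an input of a logit model."""
    return math.exp(coefficient * delta)


# Hypothetical logit coefficient for temperature, for illustration only.
temperature_coefficient = 0.12
print("odds ratio per degree:", odds_ratio(temperature_coefficient))
print("odds ratio per 5 degrees:", odds_ratio(temperature_coefficient, delta=5.0))
```

This only applies to a binary target; for an interval target such as tons sold, the regression slope itself is the quantity to interpret.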

Regarding the model fit, it will of course depend on your data and your function. I suggest you call tech support and ask for assistance. They would probably enjoy that interaction.
SPR
Quartz | Level 8
Hello BlackCan,

You can use PROC NPAR1WAY to test the correlation between your target variable and the predictor variables and to find the best predictor before moving to EM.

Sincerely,
SPR
David_Duling
SAS Employee
The StatExplore node will compute correlations between interval inputs and interval targets, and the varclus node will compute variable-variable correlations. The StatExplore node will also choose predictors.
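
The correlation between an interval input and an interval target can be sketched as a Pearson coefficient. This is a Python illustration only, not the StatExplore node's implementation; the function name `pearson_r` and the monthly values are made up. The squared coefficient gives the share of the variance in sales that a straight-line relationship with temperature accounts for.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly data: temperature in degrees Celsius, sales in tons.
temperature_c = [1.0, 3.0, 6.0, 10.0, 14.0, 17.0, 19.0, 18.0, 15.0, 10.0, 5.0, 2.0]
sales_tons = [101.0, 105.5, 109.0, 116.5, 122.0, 127.5, 130.0, 128.0, 125.5, 115.0, 108.5, 104.0]

r = pearson_r(temperature_c, sales_tons)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```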
SPR
Quartz | Level 8
Hello David Duling,

PROC NPAR1WAY measures differences in the empirical distributions of predictors for the Event and Not Event classes of a binary target and provides several statistics to estimate these differences. This approach can also be applied to continuous targets by using a suitable format to split the target into several intervals. I found this approach very convenient for preliminary analysis of predictors.
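
One common measure of the difference between two empirical distributions is the two-sample Kolmogorov-Smirnov distance, the largest gap between the two empirical distribution functions. The Python sketch below illustrates the idea and is not PROC NPAR1WAY output. The helper names, the single cutoff used to split the continuous target into two classes, and the data are all assumptions made for illustration.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    # Both functions are step functions, so the maximum occurs at a data point.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def split_by_target(predictor, target, cutoff):
    """Group predictor values by whether the target is above the cutoff."""
    low = [p for p, t in zip(predictor, target) if t <= cutoff]
    high = [p for p, t in zip(predictor, target) if t > cutoff]
    return low, high


# Hypothetical monthly data: temperature in degrees Celsius, sales in tons.
temperature_c = [1.0, 3.0, 6.0, 10.0, 14.0, 17.0, 19.0, 18.0, 15.0, 10.0, 5.0, 2.0]
sales_tons = [101.0, 105.5, 109.0, 116.5, 122.0, 127.5, 130.0, 128.0, 125.5, 115.0, 108.5, 104.0]

low_sales, high_sales = split_by_target(temperature_c, sales_tons, cutoff=116.0)
print("KS distance:", ks_distance(low_sales, high_sales))
```

A distance near 1 means the predictor's values barely overlap between the two target classes; a distance near 0 means the predictor is distributed almost identically in both.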

Sincerely,
SPR
David_Duling
SAS Employee
SPR - it sounds like you should write an extension tool that we all can use.

http://support.sas.com/documentation/onlinedoc/miner/em61/ext_nodes.pdf

Cheers.
