First of all: hello to the SAS Community! I'm new to data mining and working with SAS Enterprise Miner 6.1.
I want to find out what correlation there is between monthly temperatures (degrees Celsius) in Germany and the monthly sales output (in tons) of a specific food product.
I know there are many more parameters that influence the sales output, but to begin with I only want to find out how powerful the temperature parameter is.
The result I want is something like: if the monthly temperature rises by 1 degree Celsius, the sales output will rise by between 1.3 and 1.9 tons.
My intention is to have a "temperature adjustment" for forecasting the monthly sales output.
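That kind of statement ("between 1.3 and 1.9 tons per degree") is a regression slope together with its confidence interval. A minimal sketch of how such a slope and interval are computed, in plain Python rather than the Miner; the monthly numbers below are invented for illustration and are not from the post:

```python
import math

# Hypothetical monthly data (my assumption, not the poster's data):
# mean monthly temperature in degrees Celsius and sales output in tons.
temps = [2.1, 3.4, 7.8, 12.3, 16.9, 20.1, 22.4, 21.7, 17.2, 11.8, 6.5, 3.0]
sales = [48.0, 50.1, 57.2, 64.0, 71.5, 76.3, 80.2, 79.0, 72.1, 62.9, 55.0, 49.8]

n = len(temps)
mx = sum(temps) / n
my = sum(sales) / n
sxx = sum((x - mx) ** 2 for x in temps)
sxy = sum((x - mx) * (y - my) for x, y in zip(temps, sales))

slope = sxy / sxx                 # tons of sales per degree Celsius
intercept = my - slope * mx

# Residual sum of squares and the standard error of the slope
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temps, sales))
se_slope = math.sqrt(sse / (n - 2) / sxx)

t_crit = 2.228                    # tabulated t(0.975, df = n - 2 = 10)
lo, hi = slope - t_crit * se_slope, slope + t_crit * se_slope
print(f"slope = {slope:.2f} t/degC, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The confidence interval (lo, hi) is exactly the "between X and Y tons" range you are after; the Regression node reports the same slope and standard error as parameter estimates.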
My current course of action is to take past sales output data and temperature data (in an Excel file) and import them into Enterprise Miner with the File Import node. Then I correct some "critical" sales output values (using the Replacement and Impute nodes) and run the Regression node. But the result isn't very satisfying.
So do you have any tips and tricks for me to get a better result?
PS: I know there is a correlation between temperature and sales output, but I don't know how strong it is.
The result is not satisfactory: my Average Squared Error is 10996.45 for the training data and 12450.12 for the validation data.
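For reference, the Average Squared Error that Enterprise Miner reports is just the mean of the squared prediction errors, so it lives on the squared scale of the target. A minimal sketch (the function name and the toy numbers are mine, not from the post):

```python
def average_squared_error(actual, predicted):
    """Mean squared difference between actual and predicted values,
    the same quantity Enterprise Miner reports as ASE."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Toy numbers for illustration only: errors of 2, -2, and 3
ase = average_squared_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(ase)
```

Taking the square root puts the error back on the target's own scale: the square root of 10996.45 is roughly 105, so the model's typical miss is on the order of 105 tons, which you can judge against the usual size of your monthly sales.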
My properties for the regression node are:
Regression Type: Logistic Regression
Link Function: Logit
Selection Model: Stepwise
Selection Criterion: Validation Error
By the way: do I have to partition my data? At the moment I split it into 50% training and 50% validation data. The more I think about it, the more I believe I don't have to...
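For context, the Data Partition node simply assigns rows at random to the requested fractions. A minimal sketch of what such a 50/50 split does (the function name, seed, and placeholder rows are mine):

```python
import random

def partition(rows, train_frac=0.5, seed=42):
    """Shuffle the rows and split them into train/validation sets,
    mirroring what the Data Partition node does with its fractions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed for a repeatable split
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

data = list(range(100))                 # placeholder rows
train, valid = partition(data)
```

With only a few years of monthly observations, a 50/50 split leaves very little data on each side, which is worth keeping in mind when judging the validation error.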
PS: I'm missing information like the Odds Ratio Estimates in the regression output. How do I get the Miner to create this information? Is it missing because I import the data with the File Import node instead of a data source?
Regarding the data and the odds ratios: the answer is no. Enterprise Miner uses PROC DMREG rather than PROC LOGISTIC or PROC REG, which produces a somewhat different set of output statistics geared toward data mining needs, including scalability.
Regarding the model fit, it will of course depend on your data and your function; I suggest you call tech support and ask for assistance. They would probably enjoy that interaction.
The StatExplore node will compute correlations between interval inputs and interval targets, and the Variable Clustering node will compute variable-to-variable correlations. The StatExplore node can also choose predictors.
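As a quick cross-check outside the Miner, the interval-input/interval-target correlation that StatExplore reports is the ordinary Pearson coefficient. A minimal sketch (the function name and sample values are mine):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between an interval input and an interval
    target, the statistic the StatExplore node reports."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Nearly linear toy data, so r should be close to 1
r = pearson([1, 2, 3, 4], [2, 4, 6, 8.5])
```

A value of r near +1 or -1 indicates a strong linear relationship; note that r alone says nothing about the size of the effect in tons, which is what the regression slope provides.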
PROC NPAR1WAY measures differences in the empirical distributions of predictors between the Event and Non-event classes of a binary target and provides several statistics to estimate these differences. This approach can also be applied to continuous targets by using a suitable format to split the target into several intervals. I have found this approach very convenient for preliminary analysis of predictors.
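One of the statistics PROC NPAR1WAY provides is the two-sample Kolmogorov-Smirnov D, the maximum distance between the empirical CDFs of a predictor in the two classes. A minimal sketch of that statistic, with a continuous target split at its median to mimic the binary-target setup described above (the helper names and the toy data are mine):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the maximum absolute distance
    between the empirical CDFs of samples a and b."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    pts = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in pts)

# Split a continuous target (sales) at its median and compare the
# predictor's (temperature's) distribution across the two halves.
temps = [2, 3, 8, 12, 17, 20, 22, 21, 17, 12, 7, 3]
sales = [48, 50, 57, 64, 72, 76, 80, 79, 72, 63, 55, 50]
med = sorted(sales)[len(sales) // 2]
low  = [t for t, s in zip(temps, sales) if s <  med]
high = [t for t, s in zip(temps, sales) if s >= med]
d = ks_statistic(low, high)
```

A D close to 1 means the predictor's distribution is almost completely separated between low-sales and high-sales months, i.e. temperature discriminates strongly; a D near 0 means it carries little information about the target.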