Sam_zirak
Fluorite | Level 6

Hi,
I am trying to predict an interval target (customer spend in $) using a GLM in SAS EM 13.2. I have 80K observations and 300 variables.
The problem with the final model is that its minimum prediction for the target is around $50, whereas 11K customers in the training dataset spent less than $50 and 1% of customers spent less than $1.

Any thoughts on why this is happening or how to fix it?
The probability distribution I used in the GLM is Gamma with a log link function. I have also tried other probability distributions, such as Tweedie and inverse Gaussian, as well as other link functions, but Gamma with a log link produced the smallest ASE. The distribution of the target variable is highly right-skewed: plenty of customers spend low amounts and only a few spend more than $1000.

Any thoughts are highly appreciated,
Thanks,
Sam

2 REPLIES
JBerry
Quartz | Level 8
Sometimes you are dealing with two different sets of underlying drivers, so something that might work is to first see whether you can identify the customers who spend less than $50 using a binary regression model. If you can predict them (meaning you get a strong model), run the binary model first and then fit a separate GLM for each of the two groups. I'll bet you can see which variables differ by comparing the results of the two GLMs.
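
The two-step idea above can be sketched roughly as follows. This is only an illustration in plain Python, not SAS EM code: the data are simulated, the single predictor `x` and every coefficient are hypothetical, and each segment's "GLM" is reduced to an intercept-only model, whose fitted mean is simply the segment's sample mean. Only the $50 cut-off comes from the thread.

```python
import math
import random

random.seed(0)
THRESHOLD = 50.0  # dollar cut-off taken from the thread


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data: one predictor x; small x makes a low spend more likely.
data = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    if random.random() < sigmoid(-2.0 * x):
        y = random.gammavariate(1.0, 15.0)    # low-spend regime, mean 15
    else:
        y = random.gammavariate(2.0, 150.0)   # high-spend regime, mean 300
    data.append((x, y))

# Step 1: binary model for P(spend < THRESHOLD | x),
# a logistic regression fitted by plain gradient ascent.
b0, b1 = 0.0, 0.0
for _ in range(300):
    g0 = g1 = 0.0
    for x, y in data:
        err = (1.0 if y < THRESHOLD else 0.0) - sigmoid(b0 + b1 * x)
        g0 += err
        g1 += err * x
    b0 += 0.5 * g0 / len(data)
    b1 += 0.5 * g1 / len(data)

# Step 2: one model per segment. Assumption: intercept-only, so the
# fitted mean of each segment is its sample mean.
low = [y for _, y in data if y < THRESHOLD]
high = [y for _, y in data if y >= THRESHOLD]
mean_low = sum(low) / len(low)
mean_high = sum(high) / len(high)


def predict(x):
    """Route the case through the binary model, then use that segment's mean."""
    p_low = sigmoid(b0 + b1 * x)
    return mean_low if p_low >= 0.5 else mean_high


print(predict(-2.0), predict(2.0))
```

In a real project each segment would get its own full GLM with its own predictors, and comparing the two sets of coefficients is what would reveal the different drivers.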
Sam_zirak
Fluorite | Level 6

Thank you JBerry,

I really like your idea, but I am still confused about why the GLM is unable to predict those small spends without first segmenting the whole population.

I have already segmented the population based on customers' credit limit (CL): CL ≤ $500 and CL > $500.

Thanks,

Sam
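
One possible reading of the floor in the predictions: a GLM predicts the conditional mean of spend given the inputs, not an individual customer's spend, so with a strongly right-skewed target the smallest fitted mean can sit well above the smallest observed values. The sketch below illustrates this with simulated data only. The coefficients `B0` and `B1`, the gamma shape and the single predictor are all hypothetical, and the true conditional mean stands in for a perfectly fitted Gamma/log-link model.

```python
import math
import random

random.seed(1)

B0, B1 = math.log(200.0), 0.5   # hypothetical log-link coefficients
SHAPE = 1.0                     # hypothetical gamma shape (strong right skew)

xs = [random.gauss(0.0, 1.0) for _ in range(5000)]

# Conditional mean under a log link: mu = exp(B0 + B1 * x).
# This is what an ideal GLM would return as its prediction.
mus = [math.exp(B0 + B1 * x) for x in xs]

# Observed spend: gamma draws whose mean is mu (shape * scale = mu).
ys = [random.gammavariate(SHAPE, mu / SHAPE) for mu in mus]

min_pred = min(mus)
min_obs = min(ys)
share_below = sum(y < min_pred for y in ys) / len(ys)

print(f"smallest prediction:  {min_pred:.2f}")
print(f"smallest observation: {min_obs:.2f}")
print(f"share of observations below the smallest prediction: {share_below:.1%}")
```

Even with a correctly specified model, a sizeable share of observations falls below the smallest prediction, which is consistent with the two-step approach suggested earlier in the thread.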
