Proc Logistic: Rebuild models when errors in build data are discovered?

Frequent Contributor
Posts: 98

Hi all,

I’m using SAS/STAT and PROC LOGISTIC to build some basic retail product propensity models. My questions have less to do with the procedure itself than with a problem in the data behind some of these “on the shelf” logistic regression models. I thought this was a good place to get some initial advice on how to handle it.

 

In a nutshell, the customer IDs on which I base the model build samples do not accurately represent customers. A number of customers were assigned more than one customer ID (more than one email, more than one address, and similar issues). So two customer IDs could actually be one “customer”.

 

Since I built my propensity models on customer ID, each record captures only part of a customer's behavior, and some customers appear in the build sample more than once.
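To make the problem concrete, here is a minimal sketch (in Python rather than SAS, standard library only) of how IDs that belong to the same person might be linked. Everything in it is an assumption for illustration: the `(customer_id, email, address)` layout, the sample records, and the rule that two IDs are the same customer when they share an email or an address after trimming and lowercasing. A shared address could also link members of one household, so the rule would need checking against the real data.

```python
def consolidate_ids(records):
    """Map every customer ID to a consolidated customer key.

    records: iterable of (customer_id, email, address) tuples (hypothetical layout).
    IDs are linked when they share a non-empty email or address.
    The consolidated key is the smallest ID in each linked group.
    """
    parent = {}

    def find(x):
        # Return the root of x's group, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge two groups; the smaller ID stays the root.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, normalized value) -> first ID seen with that value
    for cust_id, email, address in records:
        find(cust_id)  # register the ID even if it links to nothing
        for field, value in (("email", email), ("address", address)):
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(cust_id, first_seen[key])
            else:
                first_seen[key] = cust_id

    return {cust_id: find(cust_id) for cust_id in parent}


# Made-up records: C001, C002 and C004 are one person, C003 is another.
records = [
    ("C001", "ann@example.com", "1 Main St"),
    ("C002", "ann.work@example.com", "1 main st "),
    ("C003", "bob@example.com", "9 Oak Ave"),
    ("C004", "ANN.WORK@example.com", "22 Elm Rd"),
]
customer_key = consolidate_ids(records)
```

In this made-up example C004 is tied to C001 only indirectly, through the email it shares with C002, which is why the linking has to be transitive rather than a simple pairwise match.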

 

Here are my questions:

  • Should I rebuild the models, once the data is corrected?
  • Should I do some validation work on the existing models now, by combining all associated customer IDs, creating a new identifier, and creating gains charts?
  • If the data issue cannot be corrected, is there a way to take these duplicate IDs into account when I rebuild or create new models?
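For the validation idea in the second question, this is roughly what I have in mind, sketched in Python rather than SAS. It assumes a consolidated customer key already exists for every scored ID. The roll-up rule is my own assumption, not an established method: a customer's score is the highest score among their IDs, and a customer counts as a responder if any of their IDs responded. The function name, field names, and sample rows are hypothetical.

```python
def gains_table(scored, n_bins=10):
    """Build a cumulative gains table at the consolidated-customer level.

    scored: iterable of (customer_key, score, responded), one row per original ID.
    Returns one dict per bin, with bins ordered from highest to lowest score.
    """
    # Roll up to one row per customer: max score, responder if any ID responded.
    rolled = {}
    for key, score, responded in scored:
        best, resp = rolled.get(key, (float("-inf"), 0))
        rolled[key] = (max(best, score), max(resp, int(bool(responded))))

    # Rank customers from highest to lowest score.
    ranked = sorted(rolled.values(), key=lambda row: row[0], reverse=True)
    n = len(ranked)
    total_responders = sum(row[1] for row in ranked)

    rows, cum_responders = [], 0
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        cum_responders += sum(row[1] for row in ranked[lo:hi])
        rows.append({
            "bin": b + 1,
            "cum_customers": hi,
            "cum_responders": cum_responders,
            # Share of all responders captured up to and including this bin.
            "cum_capture": cum_responders / total_responders if total_responders else 0.0,
        })
    return rows


# Made-up scores: customer "A" appears under two IDs.
scored = [
    ("A", 0.9, 1),
    ("A", 0.4, 0),
    ("B", 0.8, 0),
    ("C", 0.7, 1),
    ("D", 0.2, 0),
]
table = gains_table(scored, n_bins=2)
```

Comparing this table with the one built on raw customer IDs would show how much of the existing models' apparent lift depends on the duplicated IDs.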

Any feedback will be greatly appreciated! Thanks!

Grand Advisor
Posts: 16,926

Re: Proc Logistic: Rebuild models when errors in build data are discovered?

An assumption for regression is independence between observations. 

The ID issue violates this assumption, so yes, you should fix it, if possible.

 

 

Frequent Contributor
Posts: 98

Re: Proc Logistic: Rebuild models when errors in build data are discovered?

Thanks! If the issue cannot be corrected, do you have any suggestions on how to take it into account when building new models based on this data?

Respected Advisor
Posts: 2,655

Re: Proc Logistic: Rebuild models when errors in build data are discovered?

Please don't think this is a flippant answer, but if you cannot set up the IDs as independent, then I would strongly suggest that you not build new models from the data, but rather spend your available time and money on collecting usable data.

 

However, if that really can't be done, then some sort of hierarchical modeling might be attempted to account for the multiple IDs per unique customer. If the unique identifier can be found, then you might treat the multiple IDs as repeated measures on the individual. From there, it gets considerably murkier, as model selection procedures in the mixed model realm are not easily defined. You will have to depend on subject knowledge more than you may want.

 

Steve Denham
