Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How to define positive and negative samples well?

Contributor
Posts: 36

I am building a model to predict which customers with a total account balance below 30 thousand dollars are most likely to buy stock within the next seven days. To build the training/validation set, I am considering several different approaches:

(1)

Positive samples are selected from all buying records, without deduplicating customers (from 2016-09 until now, about 110 thousand records); negative samples are customers who are not in the positive samples (same number of records).

(2)
Positive samples are selected from all buying records from the past six months; negative samples are customers who are not in the positive samples (the negative samples may include customers who bought fund stock before).

(3)
Positive samples are selected from all buying records from the past year, month by month; negative samples are customers who are not in the positive samples for that month (so someone may be positive last month, but negative this year).

When we define negative and positive samples, what are the key principles or tricks? If anyone has time to offer suggestions, they would be appreciated. (My English is not very good; I am sorry if anything is unclear, and please ask.)

SAS Employee
Posts: 45

Re: how to define positive and negative samples well?

Hi, Geo-,

Generalities are risky because there are usually data that will prove a principle wrong. That said, people's buying patterns change with time. Therefore, the last six months of data would be more representative of buying patterns in the near future than the last year's data. So, if there are enough data in the last six months, use those. Perhaps you can explore how long buying patterns remain stable.
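
One rough way to start exploring stability is to count distinct buying customers per calendar month and look at how far back the counts stay similar. The sketch below is only an illustration in Python, not SAS code from this thread; the `(customer_id, buy_date)` record layout and the function name `monthly_buyer_counts` are assumptions.

```python
from collections import defaultdict
from datetime import date

def monthly_buyer_counts(buy_records):
    """Count distinct buying customers per (year, month)."""
    buyers = defaultdict(set)
    for cust, buy_date in buy_records:
        buyers[(buy_date.year, buy_date.month)].add(cust)
    # Sort by (year, month) so the result reads chronologically.
    return {month: len(custs) for month, custs in sorted(buyers.items())}

# Hypothetical toy records: (customer_id, buy_date).
records = [
    ("A", date(2017, 1, 5)),
    ("A", date(2017, 1, 20)),  # same customer, same month: counted once
    ("B", date(2017, 1, 9)),
    ("B", date(2017, 2, 2)),
]
counts = monthly_buyer_counts(records)
print(counts)  # {(2017, 1): 2, (2017, 2): 1}
```

A large shift in these counts from one month to the next would suggest that older months are less representative; the same grouping could be applied to other summaries of buying behavior.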

 

With regard to (3), in which someone may be positive last month but negative this year, the negative examples should be every customer who is not positive. (I assume it is the customer who is either positive or negative, not an individual transaction.) Retraining the model every month or so is prudent because models get stale as people change their buying patterns. When retraining, a customer who was positive when training the earlier model could be negative when training the new model.
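
To make the customer-level labeling concrete, here is a minimal Python sketch under some assumptions of my own: labels are built relative to a snapshot date, a customer is positive if they buy within the seven days after that date, and every other eligible customer is negative. The function name `label_customers`, the snapshot-date idea, and the record layout are hypothetical, not something taken from the data described above.

```python
from datetime import date, timedelta

def label_customers(eligible_ids, buy_records, snapshot, horizon_days=7):
    """Return {customer_id: 1 or 0} for one snapshot date.

    A customer is positive (1) if they have at least one buy in
    (snapshot, snapshot + horizon_days]; every other eligible
    customer is negative (0).
    """
    window_end = snapshot + timedelta(days=horizon_days)
    buyers = {
        cust for cust, buy_date in buy_records
        if snapshot < buy_date <= window_end
    }
    return {cust: int(cust in buyers) for cust in eligible_ids}

# Hypothetical toy data: (customer_id, buy_date) pairs.
buys = [
    ("A", date(2017, 3, 3)),
    ("B", date(2017, 4, 5)),
]
eligible = ["A", "B", "C"]  # e.g. customers under the balance threshold

march = label_customers(eligible, buys, date(2017, 3, 1))
april = label_customers(eligible, buys, date(2017, 4, 1))
print(march)  # {'A': 1, 'B': 0, 'C': 0}
print(april)  # {'A': 0, 'B': 1, 'C': 0}
```

Customer A is positive for the March snapshot and negative for the April one, which is the situation described in (3): the label belongs to the customer at a given point in time, so it can change when the model is retrained.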

 

I might not understand your question correctly, and I certainly do not know the data or concerns as well as you do, so I might have missed the mark here.

 

Good luck,
Padraic
