Geo-
Quartz | Level 8

I am building a model to predict which customers whose total account balance is less than 30 thousand dollars are most likely to buy stock within the next seven days. To build the training/validation set, I am considering several different approaches:

(1)

Positive samples are selected from all buying records, without deduplicating customers (from 2016-09 until now, about 110 thousand records); negative samples are customers who are not in the positive samples (the same number of them).

(2)
Positive samples are selected from all buying records from the past six months; negative samples are customers who are not in the positive samples (the negative samples may include customers who bought fund stock before).

(3)
Positive samples are selected from all buying records from the past year, month by month; negative samples are customers who are not in the positive samples for that month (so someone may be positive last month but negative this year).

When we define negative and positive samples, what are the key principles or tricks? Any suggestions would be appreciated. (My English is not very good; I am sorry if anything is unclear, so please ask.)

1 REPLY 1
PadraicGNeville
SAS Employee

Hi, Geo-,

Generalities are risky because there are usually data that will prove a principle wrong. That said, people's buying patterns change with time. Therefore, the last six months of data would be more representative of buying patterns in the near future than the last year's data. So, if there are enough data in the last six months, use those. Perhaps you can explore for how long buying patterns remain stable.
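One rough way to look at stability is to track the share of customers who buy in each calendar month and see where that share starts to drift. The sketch below is only an illustration in Python, not part of any SAS procedure. The function name `monthly_buyer_share`, the `(customer_id, purchase_date)` record layout, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def monthly_buyer_share(purchases, n_customers):
    """Share of customers with at least one purchase, per calendar month.

    purchases   -- iterable of (customer_id, purchase_date) pairs (hypothetical layout)
    n_customers -- size of the customer base used as the denominator
    """
    buyers = defaultdict(set)
    for customer_id, day in purchases:
        # A set counts each customer once per month, however often they bought.
        buyers[(day.year, day.month)].add(customer_id)
    return {month: len(ids) / n_customers for month, ids in sorted(buyers.items())}


# Toy data, invented for illustration only.
purchases = [
    ("a", date(2017, 1, 5)),
    ("a", date(2017, 1, 20)),
    ("b", date(2017, 1, 9)),
    ("a", date(2017, 2, 3)),
]
shares = monthly_buyer_share(purchases, n_customers=4)
for month, share in shares.items():
    print(month, share)
```

If the monthly share (or the mix of who buys) shifts noticeably beyond some point in the past, that may be a reasonable place to cut the training window.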

 

With regard to (3), in which someone may be positive last month but negative this year, the negative examples should be every customer who is not positive. (I assume it is the customer who is either positive or negative, not an individual transaction.) Retraining the model every month or so is prudent because models get stale (because people change their buying patterns). When retraining, a customer who was positive when training the earlier model could be negative when training the new model.
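To make the customer-level labeling concrete, here is a minimal Python sketch under some assumptions of mine: a label is assigned per customer as of a snapshot date, a customer is positive if they buy at least once in the seven days after that date, and every other eligible customer is negative. The names `label_customers`, `eligible`, and the toy records are hypothetical, and how eligibility (balance under 30 thousand dollars) is determined is left outside the sketch.

```python
from datetime import date, timedelta


def label_customers(eligible_ids, purchases, snapshot, horizon_days=7):
    """Return {customer_id: 1 or 0} for one snapshot date.

    Positive (1): at least one purchase in (snapshot, snapshot + horizon_days].
    Negative (0): every other eligible customer.
    """
    window_end = snapshot + timedelta(days=horizon_days)
    buyers = {cid for cid, day in purchases if snapshot < day <= window_end}
    return {cid: int(cid in buyers) for cid in eligible_ids}


# Hypothetical: customers who meet the balance criterion on the snapshot date.
eligible = ["a", "b", "c"]
# Toy purchase records, invented for illustration only.
purchases = [
    ("a", date(2017, 3, 3)),
    ("a", date(2017, 3, 5)),   # repeat purchases still yield one label per customer
    ("b", date(2017, 4, 4)),
    ("c", date(2017, 2, 20)),  # bought before the March snapshot, so not positive for it
]

march = label_customers(eligible, purchases, date(2017, 3, 1))
april = label_customers(eligible, purchases, date(2017, 4, 1))
print(march)
print(april)
```

Customer "a" is positive for the March snapshot and negative for the April one, which is the situation described in (3). Features for each snapshot would need to be computed only from data available on or before the snapshot date, so that the model does not see the future.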

 

I might not understand your question correctly, and I certainly do not know the data or concerns as well as you do, so I might have missed the mark here.

 

Good luck,
Padraic

