Buaskes
Calcite | Level 5

Hi there.

I am working on an online fraud classification case with 364 variables and 50,000 observations. 300 of these variables are binary product indicators, each showing whether the purchase included a specific product. I suspect there is useful information hidden in these variables, but I can't figure out a good way of dealing with them. Does anyone have an idea?

Thanks

4 REPLIES
Reeza
Super User

MBA (market basket analysis): which products are likely to be bought together?
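To make the market basket idea concrete, here is a minimal sketch in plain Python rather than SAS. It assumes each observation is a list of 0/1 product flags; the names `pair_lift`, `names`, and `rows` and the toy data are hypothetical. It computes support and lift for every product pair that occurs together at least once.

```python
from collections import Counter
from itertools import combinations


def pair_lift(rows, names):
    """Return {(a, b): (support, lift)} for product pairs bought together at least once."""
    n = len(rows)
    single = Counter()  # how many baskets contain each product
    pair = Counter()    # how many baskets contain each pair of products
    for row in rows:
        bought = sorted(names[i] for i, flag in enumerate(row) if flag)
        single.update(bought)
        pair.update(combinations(bought, 2))
    result = {}
    for (a, b), count in pair.items():
        support = count / n
        # lift > 1 means the pair co-occurs more often than independence would predict
        lift = support / ((single[a] / n) * (single[b] / n))
        result[(a, b)] = (support, lift)
    return result


# Hypothetical toy data: 4 purchases, 3 product flags
names = ["prod_a", "prod_b", "prod_c"]
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
stats = pair_lift(rows, names)
for key, (support, lift) in sorted(stats.items()):
    print(key, round(support, 3), round(lift, 3))
```

With 300 products there are up to 44,850 pairs, so in practice you would probably filter on a minimum support before looking at lift.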

If only one of the 300 is filled for every observation, then change the data structure to have a single product variable instead.
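A rough sketch of that restructuring, again in plain Python as an illustration and not as SAS code. The function `collapse_one_hot` and the toy data are hypothetical. Rows that do not have exactly one flag set come back as `None`, so you can check whether the "only one product per observation" assumption really holds before relying on it.

```python
def collapse_one_hot(rows, names):
    """Return one product name per row, or None when the row does not have exactly one flag set."""
    products = []
    for row in rows:
        hits = [names[i] for i, flag in enumerate(row) if flag]
        products.append(hits[0] if len(hits) == 1 else None)
    return products


# Hypothetical toy data: the last two rows violate the one-product assumption
names = ["prod_a", "prod_b", "prod_c"]
rows = [
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
products = collapse_one_hot(rows, names)
n_violations = sum(p is None for p in products)
print(products, n_violations)
```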

Ksharp
Super User

As Reeza pointed out, maybe you could encode that categorical variable as a numeric variable with PROC GLMSELECT, then fit it in the model.

M_Maldonado
Barite | Level 11

There is a data mining approach for rare events that is often used to flag fraud. Give it a try without transforming or rejecting variables just yet. Try clustering your data, and if you have a few flagged or confirmed fraud cases, you can train a predictive model for each cluster. You are hoping that your fraudsters have different patterns than the rest of your customers, so that certain clusters have a higher concentration of fraudsters.
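To illustrate the "higher concentration of fraudsters in certain clusters" check, here is a small plain-Python sketch. It assumes the clustering has already been run by whatever procedure you choose, so it only takes one cluster label and one fraud flag per observation. The function name and the toy labels are hypothetical.

```python
from collections import defaultdict


def fraud_rate_by_cluster(cluster_ids, is_fraud):
    """Return {cluster: (n_obs, n_fraud, fraud_rate)} from parallel lists of labels and flags."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [n_obs, n_fraud]
    for cluster, fraud in zip(cluster_ids, is_fraud):
        counts[cluster][0] += 1
        counts[cluster][1] += int(fraud)
    return {c: (n, k, k / n) for c, (n, k) in counts.items()}


# Hypothetical toy data: 10 observations in 3 clusters, 3 confirmed fraud cases
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
is_fraud = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

rates = fraud_rate_by_cluster(cluster_ids, is_fraud)
overall = sum(is_fraud) / len(is_fraud)
# Clusters whose fraud rate is above the overall rate are candidates for their own model
enriched = sorted(c for c, (n, k, rate) in rates.items() if rate > overall)
print(rates, overall, enriched)
```

If no cluster stands out from the overall rate, the clustering is probably not separating fraudsters from regular customers and may need different inputs or a different number of clusters.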


Make sure your clusters make sense, and decide whether you need to standardize or otherwise tweak your clustering. Your 300 binary variables do not need standardizing, but do standardize any other inputs that are on very different scales.
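As a sketch of that selective standardization, the plain-Python snippet below z-scores only the columns you name and leaves the binary flags untouched. The function `standardize_columns`, the column layout, and the toy data are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize_columns(rows, columns):
    """Return a copy of rows with the given column indexes z-scored; other columns are unchanged."""
    out = [list(r) for r in rows]
    for j in columns:
        values = [r[j] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in out:
            # A constant column carries no information, so map it to 0.0 instead of dividing by zero
            r[j] = (r[j] - mu) / sigma if sigma > 0 else 0.0
    return out


# Hypothetical toy data: two binary product flags plus a purchase amount in column 2
rows = [
    [1, 0, 10.0],
    [0, 1, 20.0],
    [1, 1, 30.0],
]
scaled = standardize_columns(rows, columns=[2])
print(scaled)
```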

presented this approach in SAS Global Forum 2015. Take a look at his paper

     SAS® Does Data Science: How to Succeed in a Data Science Competition

     http://support.sas.com/resources/papers/proceedings15/SAS2520-2015.pdf

Compare this approach to Reeza's and Xia's suggestions.

Good luck,

Miguel

Buaskes
Calcite | Level 5

Thanks a lot for the replies, guys. I have enough to move forward now.
