Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How can I treat 300 binary product variables in a classification case?

New Contributor
Posts: 2

Hi there.

I am working on an online fraud classification case with 364 variables and 50,000 observations. 300 of these variables are binary product variables, each indicating whether the purchase included a specific product. I think there must be some information hidden in these variables, but I can't figure out a good way of dealing with them. Does anyone have an idea?

Thanks

Super User
Posts: 17,785

Re: How can i treat 300 binary product variables in classification case?

Try MBA (market basket analysis): which products are likely to be batched together?
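To make the market basket idea concrete, here is a minimal sketch in plain Python (not SAS) that counts how often pairs of products appear in the same purchase and computes support and lift for each pair. The function name `pair_lift`, the product names, and the toy rows are all made up for illustration; a real analysis would normally use a dedicated association-rules tool.

```python
from collections import Counter
from itertools import combinations


def pair_lift(rows, names):
    """Return {(a, b): (support, lift)} for every product pair bought together at least once."""
    n = len(rows)
    single = Counter()  # purchases containing each product
    pair = Counter()    # purchases containing both products of a pair
    for row in rows:
        bought = [names[j] for j, flag in enumerate(row) if flag == 1]
        single.update(bought)
        pair.update(combinations(bought, 2))
    result = {}
    for (a, b), count in pair.items():
        support = count / n
        # lift > 1: the pair co-occurs more often than independence would predict
        lift = support / ((single[a] / n) * (single[b] / n))
        result[(a, b)] = (support, lift)
    return result


# Toy data with hypothetical product names: one row per purchase, one 0/1 flag per product
names = ["prod_a", "prod_b", "prod_c"]
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
stats = pair_lift(rows, names)
for pair_key, (support, lift) in sorted(stats.items()):
    print(pair_key, round(support, 3), round(lift, 3))
```

Pairs with high lift could then be turned into a handful of new "bundle" indicators, which is one possible way to shrink the 300 columns.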

If only one of the 300 is set for every observation, you could change the data structure to have a single product variable instead.
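A rough sketch of that restructuring, again in plain Python rather than SAS: it first checks that every row has exactly one indicator set and only then collapses the indicators into one label per row. `collapse_one_hot` and the product names are hypothetical.

```python
def collapse_one_hot(rows, names):
    """Collapse 0/1 indicator columns into one product label per row.

    Returns None if any row does not have exactly one indicator set,
    because a single product variable would then lose information.
    """
    labels = []
    for row in rows:
        hits = [names[j] for j, flag in enumerate(row) if flag == 1]
        if len(hits) != 1:
            return None
        labels.append(hits[0])
    return labels


names = ["prod_a", "prod_b", "prod_c"]          # hypothetical product names
single_item = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  # one product per purchase
multi_item = [[1, 1, 0], [0, 0, 1]]              # first purchase has two products

product = collapse_one_hot(single_item, names)
not_collapsible = collapse_one_hot(multi_item, names)
print(product)
print(not_collapsible)
```

If the check fails, the purchases contain several products each and the market basket route is probably the better fit.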

Super User
Posts: 9,676

Re: How can i treat 300 binary product variables in classification case?

As Reeza pointed out, maybe you could encode that categorical variable as a numeric variable with PROC GLMSELECT, then fit it in the model.

Super Contributor
Posts: 336

Re: How can i treat 300 binary product variables in classification case?

There is a data mining approach for rare events that is often used to flag fraud. Try it without transforming or rejecting variables just yet. Cluster your data and, if you have a few flagged or confirmed fraud cases, train a predictive model for each cluster. The hope is that your fraudsters show different patterns from the rest of your customers, so that certain clusters hold a higher concentration of fraudsters.
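As a small illustration of the first half of this idea, the sketch below clusters the binary rows and then reports the fraud rate per cluster. The post does not name a clustering algorithm, so k-means with a deterministic farthest-point start is an assumption made here, written in plain Python rather than Enterprise Miner. All function names and the toy data are hypothetical, and the per-cluster predictive models are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(rows, k):
    """Deterministic start: first row, then repeatedly the row farthest from the chosen centers."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per row."""
    centers = init_centers(rows, k)
    assign = [0] * len(rows)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def fraud_rate_by_cluster(assign, is_fraud):
    """Share of flagged fraud cases inside each cluster."""
    totals, frauds = {}, {}
    for a, f in zip(assign, is_fraud):
        totals[a] = totals.get(a, 0) + 1
        frauds[a] = frauds.get(a, 0) + f
    return {a: frauds[a] / totals[a] for a in totals}


# Toy data: 4 product indicators per purchase, plus a known fraud flag
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
is_fraud = [0, 0, 0, 1, 1, 0]

assign = kmeans(rows, k=2)
rates = fraud_rate_by_cluster(assign, is_fraud)
print(assign, rates)
```

Clusters whose fraud rate sits well above the overall rate are the ones where a separate model would be worth training.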


Make sure your clusters make sense, and decide whether you need to standardize the inputs or tweak the clustering. Your 300 binary variables do not need standardizing, but do standardize any other inputs that are on very different scales.
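One possible way to apply that rule, sketched in plain Python: z-score every column that holds anything other than 0/1 and leave the indicator columns alone. Detecting binary columns by their values is an assumption of this sketch, and `standardize_non_binary` is a made-up name.

```python
from statistics import mean, pstdev


def standardize_non_binary(rows):
    """Z-score every column that is not purely 0/1; leave indicator columns unchanged."""
    out_cols = []
    for col in zip(*rows):
        if set(col) <= {0, 1}:
            out_cols.append(list(col))  # binary indicator: keep as is
            continue
        mu, sd = mean(col), pstdev(col)
        # a constant column carries no information, so map it to 0.0
        out_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


# Toy data: two product indicators and one amount-like column on a larger scale
rows = [
    [1, 0, 10.0],
    [0, 1, 20.0],
    [1, 1, 30.0],
]
scaled = standardize_non_binary(rows)
print(scaled)
```

Without this step, a large-scale input such as a purchase amount would dominate the distance calculation and the product indicators would barely affect the clusters.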

This approach was presented at SAS Global Forum 2015. Take a look at the paper:

     SAS® Does Data Science: How to Succeed in a Data Science Competition

     http://support.sas.com/resources/papers/proceedings15/SAS2520-2015.pdf

Compare this approach to Reeza's and Xia's suggestions.

Good luck,

Miguel

New Contributor
Posts: 2

Re: How can i treat 300 binary product variables in classification case?

Thanks a lot for the replies, guys. I have enough to move forward now.
