Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How to find characteristics of Target and non-target customer?

Contributor
Posts: 20


Dear all experts,

Hi, I am new to the data mining field. I have been assigned to a project whose objective is to find the characteristics of good customers and of bad ones.

Therefore I designed my experiment by following these steps:

1) Handle missing values with the "Replacement" node. (Some fields appear as missing only because no record was found in the database. For example, for the bankruptcy field, a customer with a past bankruptcy record is shown as "Yes" and otherwise as ".", so this kind of missing value should be replaced with "No".)

2) Drop missing values by using the "Impute" node

3) Oversample the data (good and bad customers should be in equal proportion, 50:50)

4) Reduce multicollinearity and identify the potential factors to use in clustering later, by using the "Regression" node

5) Cluster the data

Do you see any problems with my experimental design? Please let me know if there are any steps I should change. I also still have a problem with step 5, clustering the data. I know that the target variable cannot be used in a clustering technique. So, after step 4, should I separate the data into 2 groups, Good and Bad, and apply the clustering technique to each group? I am not sure whether my design is correct. Are there any examples or literature reviews I could look at?



Thanks in advance for all your help. I look forward to hearing from you soon.


Best regards,

Ros

Super User
Posts: 10,041

Re: How to find characteristics of Target and non-target customer?

Posted in reply to cmajorros

I suggest PROC LOGISTIC.
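For readers unfamiliar with the idea, here is a minimal sketch in Python of what a logistic regression does. It is not SAS code and not PROC LOGISTIC itself. The toy data, the input names (`late_payments`, `bankrupt`) and the learning-rate settings are all made up for illustration.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, xi):
    """Probability of target = 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by plain batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: [late_payments, bankrupt (0/1)]; target 1 = bad customer
X = [[0, 0], [1, 0], [0, 0], [4, 1], [5, 0], [3, 1], [1, 0], [6, 1]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

weights = fit_logistic(X, y)
print("intercept and coefficients:", weights)
print("P(bad) for [6, 1]:", predict_proba(weights, [6, 1]))
```

The sign and size of each fitted coefficient indicate how an input relates to being a bad customer, which is the kind of "characteristic" the original question asks about.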

Contributor
Posts: 20

Re: How to find characteristics of Target and non-target customer?

I have never tried this before. Is there a demonstration that shows how to use it and how it works?

Trusted Advisor
Posts: 1,228

Re: How to find characteristics of Target and non-target customer?

Posted in reply to cmajorros

Hi,

There are a few things that might be helpful in your design.

1.     If your variables have a lot of missing values, say 50% or more, it is better to drop those variables from further analysis. We can't always assume that missing means 'No'.

2.     I am not sure what you mean by dropping missing values using the Impute node.

3.     After oversampling, your data are already grouped by your target variable. You can run a cluster analysis with a two-cluster solution based on the independent variables and then compare target and non-target customers within each cluster. This will give you an idea of how well the independent variables separate target and non-target customers.
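Point 3 can be sketched roughly as follows, in Python rather than Enterprise Miner. This is a bare-bones k-means on made-up data, with centroids simply initialised from the first `k` rows; it only illustrates clustering on the inputs and then cross-tabulating clusters against the target.

```python
from collections import Counter

def kmeans(points, k=2, iters=20):
    """Very small k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def crosstab(labels, target):
    """Count (cluster, target) pairs; the target is used only after clustering."""
    return Counter(zip(labels, target))

# Hypothetical inputs (two independent variables) and target labels
points = [[1, 1], [9, 8], [1, 2], [2, 1], [8, 9], [9, 9]]
target = ["good", "bad", "good", "good", "bad", "good"]

labels, centroids = kmeans(points, k=2)
table = crosstab(labels, target)
for (cluster, outcome), count in sorted(table.items()):
    print(cluster, outcome, count)
```

If one cluster is dominated by target customers and the other by non-target customers, the inputs used for clustering carry information about the target. In practice the inputs would usually be standardised first, which this sketch skips.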

Contributor
Posts: 20

Re: How to find characteristics of Target and non-target customer?

Dear stat@sas,

Thanks for your reply. I first used the Replacement node for the factors whose value should be 0 rather than missing; they show as missing only because no record was found when I mapped the data. For the other factors, if the missing values exceed 50 percent, the factor is eliminated by the Impute node.
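The two rules described here can be sketched as below. This is plain Python on made-up rows, not the Replacement or Impute node; the column names and the `recode_defaults` argument are hypothetical, and `"."` stands for a missing value as in the thread.

```python
MISSING = "."

def missing_fraction(rows, column):
    """Share of rows in which the column is missing."""
    return sum(1 for r in rows if r[column] == MISSING) / len(rows)

def clean(rows, recode_defaults, max_missing=0.5):
    """Recode 'no record found' fields, then drop columns that are mostly missing."""
    # Step 1: for the listed columns, missing means a known default value.
    recoded = [
        {c: (recode_defaults[c] if v == MISSING and c in recode_defaults else v)
         for c, v in r.items()}
        for r in rows
    ]
    # Step 2: keep only columns whose missing share does not exceed the threshold.
    keep = [c for c in recoded[0] if missing_fraction(recoded, c) <= max_missing]
    return [{c: r[c] for c in keep} for r in recoded]

# Hypothetical customer rows
rows = [
    {"bankrupt": "Yes", "income": "50", "fax": "."},
    {"bankrupt": ".", "income": ".", "fax": "."},
    {"bankrupt": ".", "income": "42", "fax": "."},
    {"bankrupt": ".", "income": "61", "fax": "1"},
]

cleaned = clean(rows, recode_defaults={"bankrupt": "No"})
print(cleaned)
```

Doing the recode first matters: otherwise a field like `bankrupt`, which is "missing" for most customers by construction, would be dropped by the 50 percent rule.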

I have one more question. I have more than 100 factors in my experiment. I think I need to eliminate the factors that are not related to being a good or bad customer by using the Regression node. Is it OK if I run the regression before separating the data into 2 groups (Good and Bad)?

Trusted Advisor
Posts: 1,228

Re: How to find characteristics of Target and non-target customer?

Posted in reply to cmajorros

Yes, this is step 4 in your design.

N/A
Posts: 1

Re: How to find characteristics of Target and non-target customer?

Posted in reply to cmajorros

Use the Variable Selection node to eliminate variables that have missing values and low correlation with the target.

You can also enable AOV16 to transform interval variables into categorical ones.
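As a rough idea of what these two options do, here is a Python sketch. It is not the node's actual algorithm: it keeps inputs whose squared correlation with the target passes a threshold, and bins an interval variable into 16 equal-width groups in the spirit of AOV16. The data, the input names and the `min_r2` value are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_by_r2(inputs, target, min_r2=0.1):
    """Keep input names whose squared correlation with the target is at least min_r2."""
    return [name for name, values in inputs.items()
            if pearson_r(values, target) ** 2 >= min_r2]

def bin_equal_width(values, n_bins=16):
    """Replace each value by the index (0..n_bins-1) of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical inputs and a 0/1 target (1 = bad customer)
inputs = {
    "late_payments": [0, 1, 0, 4, 5, 3, 1, 6],
    "shoe_size": [40, 41, 42, 40, 41, 42, 41, 41],
}
target = [0, 0, 0, 1, 1, 1, 0, 1]

selected = select_by_r2(inputs, target)
bins = bin_equal_width([0, 5, 10, 15, 16])
print(selected, bins)
```

Binning lets a simple correlation-style screen pick up non-linear relationships between an interval input and the target, at the cost of turning one variable into a categorical one with up to 16 levels.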

Contributor
Posts: 20

Re: How to find characteristics of Target and non-target customer?

Thanks, but what is the difference between Variable Selection and Regression? I think both methods are able to eliminate correlation. I have never used Variable Selection before. How does it work?
