About cmajorros

cmajorros · 10-31-2016

Dear All,

I am currently running a system stability report for a credit scoring model, and I have found a problem.

During the modelling process, I had data from Jan 2009 to Jul 2015 for building the model. After checking the performance window, I found that the duration for bad behavior to appear is 18 months, so I used data from Jan09 to Feb14 to build the model. After building the model, I tested the characteristics of the portfolio by selecting data from Jan09 to Jul15 and checking the stability of the characteristics. All characteristics had an acceptable index value (less than 0.1).

A year later (July 16), I started using the model. Three months after that, I ran the system stability report and the characteristic report, using data from Jul16 to Sep16 as the Actual data and comparing it with the Expected data (Jun09 - Feb14). I found that the characteristics of the customers have shifted.

From this step, I think I have a problem with how I select data for the stability reports. I need your help with these questions:

1) Should the Expected data include only the data that was used in the model (Jun09 - Feb14)?
2) After building the model, should the data used for testing the characteristics be from Mar14 to Jul15 only, or should I also include the modelling data (Jan09 - Jul15)?
3) After applying the new model, which range should be used as the Actual data for the system stability report?
A. Jan09 - Sep16
B. Mar14 - Sep16
C. After launching the new model (Jul16 - Sep16)

Someone said it should be C. I am quite sure that C will show a shift in the model and produce a very high index value (> 0.1). Personally, I think the test data should include the data that was used in modelling, but I am not really sure. I am new to modelling and really need your suggestions.

Thanks in advance.
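The index value in a stability report is commonly computed as the population stability index (PSI), so a minimal sketch may help frame the question, assuming that is the index meant here. The code is Python for illustration only (the thread itself concerns SAS). The function name `stability_index` and the score-band counts are hypothetical, and every band is assumed to have a nonzero count in both samples.

```python
import math

def stability_index(expected_counts, actual_counts):
    """PSI between two binned distributions (same bins, nonzero counts assumed)."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = e / expected_total   # share of the Expected sample in this band
        a_pct = a / actual_total     # share of the Actual sample in this band
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi

# Hypothetical counts per score band (not taken from the post)
expected = [200, 300, 300, 200]   # e.g. the development sample
same_mix = [20, 30, 30, 20]       # Actual sample with the same proportions
shifted = [10, 20, 30, 40]        # Actual sample that has moved to higher bands

print(round(stability_index(expected, same_mix), 4))
print(round(stability_index(expected, shifted), 4))
```

Because the index depends only on the proportions in each band, the choice of date range for Expected and Actual changes the result directly: mixing the development period into the Actual sample pulls the two distributions together and lowers the index.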

cmajorros · 06-08-2016

Dear All,

I am building a behavioral scoring model for a financial product. Basically, the data is separated into 2 parts (observation period and outcome period). For this project, I want to predict which customers with an overdue amount equal to 1 installment will become worse (move to an overdue amount of 2 installments) within the next 3 months.

Are there any theories or practices that address the following questions?

How many months or years of data should be used in modelling (12 months, 5 years, or 7 years)? Is it related to my product life cycle? If customers mostly have a contract of 72 months (6 years), or 20 years as with a mortgage loan, should the data collected for the model cover 6 years or 20 years? If yes, data for a mortgage loan would take a very long time to collect, and customer characteristics may change in the meantime. Or should the collection period be 7 years, which is the duration of the business life cycle? Or are 12 months enough?

Another question is how long the outcome period should be. Is there any calculation to support the choice?

Regards,
Ros

cmajorros · 12-08-2015

Dear All,

I have an issue with selecting factors for a credit scorecard. During the approval process, some conditions such as down payment, financing amount, etc. might change. For example, Customer A intended to pay a down payment of 15%, but the bank might then ask him/her to pay 20%.

To implement the model, we need historical data for the analysis. Which amount should be used: the first down payment (15%) or the last down payment (20%)? The first down payment is what the customer intended to pay, while the last down payment is what was actually paid. So my question is, which one is better?

Looking forward to hearing from you soon.

Ros

cmajorros · 08-20-2015

Thank you so much for your answer. It is very helpful.

cmajorros · 08-14-2015

Hi All,

Is it possible to find the position of a character in a text, as in these examples?

Text                     Need_Position_of_4   Result
123412344444             3                    9
1112222223400120044444   2                    18

For the first example I need the 3rd 4, and for the second one I need the 2nd 4.
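The logic being asked for (the 1-based position of the n-th occurrence of a character) can be sketched as follows. This is Python for illustration only, not the SAS solution from the thread, and `nth_position` is a hypothetical helper name. It returns 0 when the text has fewer than n occurrences, which is an assumption about the desired behavior.

```python
def nth_position(text, char, n):
    """Return the 1-based position of the n-th occurrence of char, or 0 if there is none."""
    pos = -1
    for _ in range(n):
        # Continue searching just after the previous match
        pos = text.find(char, pos + 1)
        if pos == -1:
            return 0
    return pos + 1

print(nth_position("123412344444", "4", 3))
print(nth_position("1112222223400120044444", "4", 2))
```

Applied to the two example rows, the helper gives the positions listed in the Result column.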

cmajorros · 06-12-2015

Dear All,

I have a problem with the Thai language when I export a file from E-guide in Excel format and link it to Oracle DBMS, which imports the exported file into a table automatically.

The problem is that the exported file sometimes shows Thai properly and sometimes shows unreadable text. When it is linked to Oracle DBMS, the import process is not a problem if the original file was exported with readable Thai. If not, an error is shown and the import process fails.

Is it possible to export a CSV file and make it permanently readable in Thai, without configuring the RDBMS every time? Do I have to change any setting in EG?

cmajorros · 12-01-2014

Dear Statisticians,

I am building a predictive model using SAS E-Miner Credit Scoring. I used data from 2007 to 2012. We have a factor called "Terms" whose current values (2013-2014) have changed significantly. The average term in the modelling data is around 30 terms. Two years later, terms are getting longer (average 48 terms) due to changes in some practices. If I continue using "Term" in my model, I think it will no longer be useful and I will have to rebuild the model very soon. For example:

=< 12 terms = 10 points
13-24 = 5 points
25-36 = 0 points
>=36 = -5 points

With this score, all new customers will receive -5 points, because nobody has a term of less than 36. The proportion of the other groups will be 0.

So I decided to standardize my factor, using one of 2 methods:

1) To find Z we need the SD, the number of observations, and the average. I fix the average for all records by using the average of the data in the model. This means all records will have the same N and average.
2) N, average and SD are not fixed. They vary by date.

Below are examples of the two methods.

1)
Contract  Term  Date       AVG    SD     Z
1         12    01-Jan-07  23.25  12.65  (0.89)
2         18    02-Jan-07  23.25  12.65  (0.42)
3         20    03-Jan-07  23.25  12.65  (0.26)
4         24    04-Jan-07  23.25  12.65  0.06
5         12    05-Jan-07  23.25  12.65  (0.89)
6         16    06-Jan-07  23.25  12.65  (0.57)
7         48    07-Jan-07  23.25  12.65  1.96
8         36    08-Jan-07  23.25  12.65  1.01

2)
Contract  Term  Date       AVG    SD     Z
1         12    01-Jan-07  12.00  -      -
2         18    02-Jan-07  15.00  4.24   0.71
3         20    03-Jan-07  16.67  4.16   0.80
4         24    04-Jan-07  18.50  5.00   1.10
5         12    05-Jan-07  17.20  5.22   (1.00)
6         16    06-Jan-07  17.00  4.69   (0.21)
7         48    07-Jan-07  21.43  12.47  2.13
8         36    08-Jan-07  23.25  12.65  1.01

In the above example, the Z values of the two methods differ, which can affect the score range. I need your suggestions on which method of calculating the Z value is better. My background is not in statistics or mathematics, but I am really interested in data modelling and really need your support.

Thanks in advance.

Ros
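The two methods can be reproduced with a short sketch. It is Python for illustration only (the thread itself concerns SAS), and `fixed_z` and `expanding_z` are hypothetical names. It assumes the sample standard deviation (n - 1 denominator), which matches the SD column in both tables, and it returns `None` for the first row of the second method, where an SD cannot be computed from a single value.

```python
from statistics import mean, stdev

# Term values from the example tables, in date order
terms = [12, 18, 20, 24, 12, 16, 48, 36]

def fixed_z(values):
    """Method 1: one average and one SD, taken over all records."""
    avg = mean(values)
    sd = stdev(values)
    return [(v - avg) / sd for v in values]

def expanding_z(values):
    """Method 2: average and SD recomputed from all records up to each row."""
    result = []
    for i, v in enumerate(values):
        window = values[: i + 1]
        if len(window) < 2:
            result.append(None)  # sample SD is undefined for one value
            continue
        result.append((v - mean(window)) / stdev(window))
    return result

print([round(z, 2) for z in fixed_z(terms)])
print([None if z is None else round(z, 2) for z in expanding_z(terms)])
```

With the second method the same term value can receive different Z values depending on when it occurs (rows 1 and 5 both have a term of 12), which is the instability the question is about.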

cmajorros · 11-24-2014

Dear all,

I really need your help. I have a data set like the one below:

Contract  Contract_Date  Contract_Amount
1         01-Jan-13      10000
2         02-Jan-13      20000
3         01-Feb-14      30000
4         02-Feb-14      25000
5         10-Jun-14      22000
6         11-Jun-14      22000

I want to create a new column named "AVG_Contract_AMT". For each row, it should hold the average of Contract_Amount over all rows whose Contract_Date <= that row's Contract_Date, like this:

Contract  Contract_Date  Contract_Amount  AVG_Contract_AMT
1         01-Jan-13      10000            10000
2         02-Jan-13      20000            15000
3         01-Feb-14      30000            20000
4         02-Feb-14      25000            21250
5         10-Jun-14      22000            21400
6         11-Jun-14      22000            21500

I have never done any coding like this in EG before. Are there any examples?

In addition, I want to find the Z value, (X - AVG) / SD, for each contract, where the AVG and SD are again computed only from rows with Contract_Date <= that row's Contract_Date, like below:

Contract  Contract_Date  Contract_Amount  AVG_Contract_AMT  SD        Z
1         01-Jan-13      10000            10000             100       0
2         02-Jan-13      20000            15000             100       50
3         01-Feb-14      30000            20000             100       100
4         02-Feb-14      25000            21250             79.05694  47.43
5         10-Jun-14      22000            21400             66.3325   9.045
6         11-Jun-14      22000            21500             60.55301  8.257

Please suggest a solution.

Thank you in advance.

Best Regards,
Ros
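The AVG_Contract_AMT column is a running (cumulative) average, which can be sketched with a running total. This is Python for illustration only, not EG code, and `running_average` is a hypothetical name. The sketch assumes the rows are already sorted by Contract_Date and that no two rows share a date, as in the example data.

```python
# (Contract, Contract_Date, Contract_Amount), already in date order
rows = [
    (1, "01-Jan-13", 10000),
    (2, "02-Jan-13", 20000),
    (3, "01-Feb-14", 30000),
    (4, "02-Feb-14", 25000),
    (5, "10-Jun-14", 22000),
    (6, "11-Jun-14", 22000),
]

def running_average(sorted_rows):
    """Append the average of all amounts up to and including each row."""
    total = 0
    result = []
    for count, (contract, date, amount) in enumerate(sorted_rows, start=1):
        total += amount                      # running sum of Contract_Amount
        result.append((contract, date, amount, total / count))
    return result

for row in running_average(rows):
    print(row)
```

Keeping a running sum and count avoids re-reading earlier rows for every contract; a running SD would need the running sum of squares as well.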

cmajorros · 09-12-2014

Thanks, but what is the difference between Variable Selection and Regression? I think both methods are able to eliminate correlation. I have never used Variable Selection before. How does it work?

cmajorros · 09-10-2014

I have never tried this before. Are there any demonstrations that show how to use it and how it works?

cmajorros · 09-10-2014

Dear stat@sas,

Thanks for your reply. I first used the Replacement node to handle missing values for some factors whose value should be 0 rather than missing. These missing values arise because no record was found when I mapped the data. For other kinds of factors, if the missing values exceed 50 percent, the factor is eliminated by the Impute node.

I have one more question. I have more than 100 factors in my experiment. I think I need to eliminate the factors that are not related to being a good/bad customer by using the Regression node. Is it OK if I run the regression before separating the data into 2 groups (Good and Bad)?

cmajorros · 09-10-2014

Dear all experts,

Hi, I am a rookie in the data mining field and have been assigned to a project. The objective of my project is to find the characteristics of good customers and bad ones. I therefore designed my experiment with the following steps:

1) Manage missing values using the Replacement node. (Some factors are shown as missing because they are not found in the database. For example, for bankrupted customers, if a customer has been in the bankruptcy records before, this field is shown as "Yes"; if not, it is shown as ".". This kind of missing value should therefore be replaced with "No".)
2) Drop missing values using the "Impute" node.
3) Oversample the data (good and bad should have the same proportion, 50:50).
4) Reduce multicollinearity and find the potential factors to be used in clustering later, using the "Regression" node.
5) Cluster the data.

Do you see any problems with my experimental design? Please tell me if there are any steps I should change.

Besides, I still have a problem with step 5, clustering the data. I know that the target variable cannot be used in a clustering technique. Therefore, after step 4, should I separate the data into 2 groups, Good and Bad, and apply the clustering technique to each group? I am not sure whether my design is correct. Are there any examples or literature reviews?

Thanks for all your help in advance. I look forward to hearing from you soon.

Best regards,
Ros

cmajorros · 09-06-2014

Thanks so much. I am so new to this field. Your answer is so helpful.

cmajorros · 09-05-2014

How should I manage a factor that has a high Information Value (> 0.1) but whose WOE graph looks like this: [image: WOE graph for this factor]

In practice, the WOE graph should have a linear slope like this: [image: WOE graph with a linear slope]

In my case, if I regroup the attributes in "Interactive Grouping" so that the WOE graph follows a linear line like the one above, it results in a low Information Value.

Besides, if I continue to use this result, it will affect my score result in the Scorecard node. For example:

Percentage of down payment
<15% = 10
15=<DP<30 = 20
30=<DP<50 = 15
>=50 = -5

From this result, I don't think we can use this factor. Customers who make a high down payment should have a high score. I have 9 potential factors with an IV exceeding 0.1, but almost all of them show the result I described. Do you have any ideas for tackling this problem?

Thanks for all your help in advance.

Best Regards,
Ros
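The trade-off described here can be made concrete with the standard definitions WOE = ln(%good / %bad) per band and IV = sum of (%good - %bad) x WOE. The sketch below is Python for illustration only and does not reproduce what the Interactive Grouping node does internally. The good/bad counts are hypothetical, not the poster's data, and `woe_iv` and `is_monotonic` are hypothetical helper names. Every band is assumed to contain at least one good and one bad customer.

```python
import math

def woe_iv(goods, bads):
    """Return (WOE per band, total IV) from good/bad counts per band."""
    total_good = sum(goods)
    total_bad = sum(bads)
    woes = []
    iv = 0.0
    for g, b in zip(goods, bads):
        pct_good = g / total_good
        pct_bad = b / total_bad
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv

def is_monotonic(values):
    """True if the values never change direction from band to band."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Hypothetical counts for four down-payment bands
goods = [200, 400, 250, 150]
bads = [30, 20, 25, 25]
woes, iv = woe_iv(goods, bads)

# Same data after merging bands 1-2 and bands 3-4
merged_woes, merged_iv = woe_iv([600, 400], [50, 50])

print(is_monotonic(woes), round(iv, 4))
print(is_monotonic(merged_woes), round(merged_iv, 4))
```

In this made-up example, merging bands makes the WOE pattern monotonic but also lowers the IV, because the merge averages out the differences between bands that produced the high IV in the first place.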

cmajorros · 08-29-2014

Thank you so much, Windy. I appreciate all your kindness. You all did a very good job. Thanks again.


system stability report (what range of data used for Actual data)

How long of the data for Behavioral Modelling

Factors for credit scoring

Re: How can I find the position of a character in a text.

How can I find the position of a character in a text.

Change language when export to CSV

Standardize Factor

How to find Average if by using EG

Re: How to find characteristics of Target and non-target customer?

Re: How to find characteristics of Target and non-target customer?

Re: How to manage factors which have IV <0.1,but the score are fluctua...

Re: How to find characteristics of Target and non-target customer?

How to find characteristics of Target and non-target customer?

How to manage factors which have IV <0.1,but the score are fluctuate

Re: How to set minimum credit scoring point not less than 0