What type of analysis can be done with the following variables? E.g. chi-square test, hypothesis testing.
Name, address, city, state, zip, balance, product, account id, open date
Basically I need to give insights to a bank based on their data.
I tried a chi-square test between balance and product, but I was not successful.
Could someone give a skeleton for any analysis you propose?
Customer profitability requires account transaction data such as fees and interest, as well as the bank's cost of lending, and is very complex to work out. If you are only starting out, I would start with simpler questions.
To measure the risk of customers going into default and not repaying their loans, you need to track customer account behaviour over years of data, looking at repayment history and loan balances. If you don't have historical data going back several years, then you cannot do this type of analysis. You then need to identify the accounts that weren't paid back and look at the account behaviour prior to this. You are talking many weeks of work to come up with any meaningful results. If you have never done this type of work before, then again starting with simpler questions might be better.
Have you started off with the basics - summaries?
Average account balance
Distribution of account balance
# of accounts by state/zip/city
# of unique individuals
# of accounts/individual
% of accounts/individuals vs population data for state/city/zip - could be used for deciding where business can expand.
% of account balance by state/city/zip
Age of customers by state/city/zip
For many of the count (#) tables above you could probably run a chi-square test.
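As a rough skeleton for a few of the summaries listed above, something like the following could be a starting point. The dataset name WORK.ACCOUNTS and the variable names are assumptions based on the variable list in the question, and name plus address is only a stand-in for a real customer key.

```sas
/* Assumed: WORK.ACCOUNTS has one row per account with variables
   name, address, city, state, zip, balance, product, account_id, open_date.
   Adjust the names to match the real data. */

/* Average and distribution of account balance */
proc means data=work.accounts n mean median min p25 p75 max;
   var balance;
run;

/* Number of accounts by state, and a state-by-product table
   with a chi-square test of association */
proc freq data=work.accounts;
   tables state / nocum;
   tables state*product / chisq norow nocol;
run;

/* Number of unique individuals and accounts per individual.
   name + address is used as the customer key, which is an assumption. */
proc sql;
   select count(*)        as n_individuals,
          avg(n_accounts) as avg_accounts_per_individual
   from (select name, address, count(*) as n_accounts
         from work.accounts
         group by name, address) as per_person;
quit;
```

Swapping state for zip or city in the TABLES statements gives the other breakdowns, though tables with many sparse cells will make the chi-square test unreliable.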
If you want to run something like chi-square for balances, you could create ranges of values that mean something to the bank, such as 0, 1 to 10,000, 10,001 to 25,000, more than 25,000 or similar. Ideally the boundaries for the groups would mark points where the bank treats the customer differently in some manner, such as rate changes for loans/deposits, additional offers, reminders, tax effects or similar.
One way to do that would be to create custom format(s) and apply those where you are doing the chi-square. The formatted values will be used to create groups based on the range.
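A minimal sketch of the custom-format approach, assuming a dataset WORK.ACCOUNTS with numeric BALANCE and a PRODUCT variable. The format name BALBAND and the cut points are illustrative only; the real boundaries should come from how the bank treats customers.

```sas
/* Hypothetical balance bands; replace the cut points with ones
   that are meaningful to the bank. */
proc format;
   value balband
      low - 0         = '0 or less'
      0 <- 10000      = 'Over 0 to 10,000'
      10000 <- 25000  = 'Over 10,000 to 25,000'
      25000 <- high   = 'More than 25,000';
run;

/* PROC FREQ groups BALANCE by its formatted value, so the
   chi-square test is run on product by balance band. */
proc freq data=work.accounts;
   tables product*balance / chisq;
   format balance balband.;
run;
```

Because the grouping lives in the format, the underlying BALANCE values are unchanged and different bands can be tried by editing only the PROC FORMAT step.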
It may be that each product should use different ranges. Also, I would expect different behaviors between deposit products, loans and brokerage. So I would expect the simple "is there a difference in the distribution of values between products" answer that a chi-square provides to be not greatly informative.
Total of balance across accounts for individuals might indicate "valued customer" or similar status.
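One possible way to get the total balance per individual, again assuming WORK.ACCOUNTS and treating name plus address as the customer key (an assumption; a proper customer id would be better if one exists). The output table name is hypothetical.

```sas
/* Total balance and number of accounts per individual */
proc sql;
   create table work.customer_totals as
   select name, address,
          count(*)     as n_accounts,
          sum(balance) as total_balance
   from work.accounts
   group by name, address
   order by total_balance desc;
quit;

/* Upper percentiles of total balance, as a rough guide to where
   a "valued customer" cut-off might sit */
proc means data=work.customer_totals n mean median p90 p95 p99 max;
   var total_balance;
run;
```

Where to draw the line for "valued customer" is a business decision; the percentiles only show what the data looks like.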
If the data were captured at different times rather than as a single snapshot, then changes over time would likely be informative.
What is the focus of your analysis? Is it customer behaviour or understanding customers? Or is it something else? Without giving us more guidance it is difficult to point you in the right direction. Reeza's list is a good starting point.
I actually work in a bank and there are hundreds if not thousands of ways you can look at bank data. Usually there is some end game in mind such as marketing (like upselling customers), or it could be financial (finding out who your profitable customers are), or it could be risk-related (who is most likely to not repay their loans).
I need to understand customer behavior, as you said. I wish to find out who the profitable customers are, or, on the risk side, who is most likely to not repay their loans.
Chi-square analysis is just an example that I know. I am open to doing any analysis you propose that can be done via EG.
Then you should take a look at:
Logistic regression - proc logistic
Poisson regression - proc genmod (can be applied to multi-dimensional contingency tables, whereas chi-square is usually applied to two dimensions)
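As a hedged illustration of the proc genmod idea on a contingency table, the sketch below counts accounts in each state-by-product cell and fits a Poisson log-linear model to those counts. WORK.ACCOUNTS and WORK.CELL_COUNTS are assumed names, and this main-effects model is just one simple choice.

```sas
/* Build the cell counts; SPARSE keeps cells with zero accounts */
proc freq data=work.accounts noprint;
   tables state*product / out=work.cell_counts(drop=percent) sparse;
run;

/* Poisson log-linear model with main effects only.
   A large deviance relative to its degrees of freedom suggests
   state and product are not independent. */
proc genmod data=work.cell_counts;
   class state product;
   model count = state product / dist=poisson link=log;
run;
```

Further classification variables can be added to the TABLES request and to the CLASS and MODEL statements to go beyond two dimensions.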
Could you suggest the dependent and independent variables for these regression analyses? Any additional statements or options are most welcome.
It is a long story. Actually I am not an expert on this, although my major is Economics and Finance.
Make a binary variable:
0 - fraud
1 - not fraud
Use proc logistic to find which variables most strongly influence fraud. Of course, it also gives a predicted probability score.
And a Poisson rate regression can produce a fraud score for each type of customer. Here is a paper:
24188 - Modeling rates and estimating rates and rate ratios (with confidence intervals)
Xia Keshan
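A possible skeleton for the logistic regression described above. Note that the posted variable list has no fraud or default indicator, so FRAUD_FLAG and the dataset WORK.ACCOUNTS_MODEL are hypothetical; the sketch only applies once such an outcome has been obtained and merged in. OPEN_DATE is assumed to be a SAS date value.

```sas
/* Hypothetical input: WORK.ACCOUNTS_MODEL = the account data plus
   FRAUD_FLAG coded 0 = fraud, 1 = not fraud (not in the posted data). */
data work.model_input;
   set work.accounts_model;
   /* Account age in months as a candidate predictor */
   tenure_months = intck('month', open_date, today());
run;

proc logistic data=work.model_input;
   class product state / param=ref;
   /* event='0' models the probability of fraud under the coding above */
   model fraud_flag(event='0') = balance tenure_months product state;
   /* PRED_PROB holds the predicted probability for each account */
   output out=work.scored p=pred_prob;
run;
```

The predictors shown are simply the ones available in the question; whether they carry any signal for fraud or default is something only the data can answer.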
Based on your description of your data you don't seem to have variables that would reflect profitable customers or risk related customers.
May I request you to tell me the other analysis that can be performed on my data apart from chi-square test?
Any analysis pertinent to the question. You're going at it backwards - come up with questions and then figure out how to analyze the data.