## Statistical Analysis on Banking data

Super Contributor
Posts: 499

What types of analysis can be done with the following variables (e.g. a chi-square test or other hypothesis tests)?

Name, address, city, state, zip, balance, product, account ID, open date

Basically, I need to give the bank insights from the data they have provided.

I tried a chi-square test between balance and product, but without success.

Could someone provide a skeleton for any analysis they would propose?

Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 3,381

## Re: Statistical Analysis on Banking data

Customer profitability requires account transaction data, such as fees and interest, as well as the bank's cost of lending, and is very complex to work out. If you are only starting out, I would start with simpler questions.

To measure the risk of customers defaulting and not repaying their loans, you need to track customer account behaviour over years of data, looking at repayment history and loan balances. If you don't have historical data going back several years, then you cannot do this type of analysis. You then need to identify the accounts that weren't paid back and look at the account behaviour prior to that point. You are talking about many weeks of work to come up with any meaningful results. If you have never done this type of work before, then again, starting with simpler questions might be better.
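
A minimal Python sketch of that two-step idea (flag the accounts that defaulted, then summarise their behaviour beforehand). It assumes a hypothetical monthly history per account (month, loan balance, missed-payment flag), which is not among the variables in the question; the "three consecutive missed payments" default rule and the three-month window are illustrative assumptions, not a bank standard.

```python
from statistics import mean

# Hypothetical monthly history per account: (month, loan_balance, missed_payment).
# None of these fields exist in the variables listed in the question.
history = {
    "A1": [(1, 5000, False), (2, 4800, False), (3, 4700, True), (4, 4700, True), (5, 4700, True)],
    "A2": [(1, 3000, False), (2, 2800, False), (3, 2600, False), (4, 2400, False), (5, 2200, False)],
}


def default_month(rows, consecutive_misses=3):
    """Month in which the account reaches N consecutive missed payments, else None.

    The N-consecutive-misses rule is an illustrative assumption.
    """
    run = 0
    for month, _balance, missed in rows:
        run = run + 1 if missed else 0
        if run >= consecutive_misses:
            return month
    return None


def lookback_features(rows, cutoff, window=3):
    """Summarise behaviour in the `window` months strictly before `cutoff`."""
    prior = [r for r in rows if cutoff - window <= r[0] < cutoff]
    return {
        "avg_balance": mean(r[1] for r in prior) if prior else None,
        "missed_count": sum(r[2] for r in prior),
    }


# One modelling row per account: a 0/1 default flag plus pre-cutoff behaviour.
dataset = {}
for acct, rows in history.items():
    d = default_month(rows)
    # Defaulters: the months before default; others: the latest months observed.
    cutoff = d if d is not None else rows[-1][0] + 1
    dataset[acct] = {"defaulted": int(d is not None), **lookback_features(rows, cutoff)}

print(dataset)
```

One design caveat: for account A1 the window still overlaps the missed payments that define default, so a real model would usually end the window at an earlier observation date to keep the outcome from leaking into the inputs.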

All Replies
Super User
Posts: 20,731

## Re: Statistical Analysis on Banking data

Have you started off with the basics - summaries?

- Average account balance
- Distribution of account balance
- Number of accounts by state/zip/city
- Number of unique individuals
- Number of accounts per individual
- % of accounts/individuals relative to population data for state/city/zip; this could be used to decide where the business can expand.
- % of account balance by state/city/zip
- Age of customers by state/city/zip
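
A small Python sketch of several of these summaries on made-up rows (in EG they would more typically come from PROC MEANS and PROC FREQ). The sample values are hypothetical, and using the customer name as the individual key is an assumption forced by the variable list, which has no customer ID.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical rows using a subset of the variables listed in the question.
accounts = [
    {"account_id": 1, "name": "Customer A", "state": "NY", "product": "Checking", "balance": 1200.0},
    {"account_id": 2, "name": "Customer A", "state": "NY", "product": "Savings", "balance": 15000.0},
    {"account_id": 3, "name": "Customer B", "state": "NJ", "product": "Checking", "balance": 300.0},
    {"account_id": 4, "name": "Customer C", "state": "NY", "product": "Loan", "balance": 22000.0},
]

balances = [a["balance"] for a in accounts]
avg_balance = mean(balances)
median_balance = median(balances)  # one simple view of the distribution

accounts_by_state = Counter(a["state"] for a in accounts)

# Name is the customer key only because no customer ID is listed;
# name + address would be a safer (still imperfect) key.
accounts_per_customer = Counter(a["name"] for a in accounts)
unique_customers = len(accounts_per_customer)

# Share of total balance held in each state.
balance_by_state = defaultdict(float)
for a in accounts:
    balance_by_state[a["state"]] += a["balance"]
total_balance = sum(balances)
pct_balance_by_state = {s: 100 * b / total_balance for s, b in balance_by_state.items()}

print(avg_balance, median_balance, accounts_by_state, unique_customers, pct_balance_by_state)
```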

For many of the count tables above, you could probably run a chi-square test.
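
For a two-way count table such as state by product, this is Pearson's chi-square test of independence; in SAS that is PROC FREQ with `tables state*product / chisq;`. The Python sketch below computes the same statistic on hypothetical counts. The hand-written `chi2_sf` p-value helper stands in for a library function such as `scipy.stats.chi2.sf` and is only meant for moderate statistics; the usual rule of thumb is also that expected counts should mostly be at least 5.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of the chi-square distribution.

    Uses the series for the regularised lower incomplete gamma function;
    adequate for a sketch with moderate x, not a production routine.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson chi-square test of independence for a 2-D table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical (state, product) pairs, one per account, cross-tabulated into counts.
pairs = (
    [("NY", "Checking")] * 10 + [("NY", "Savings")] * 20
    + [("NJ", "Checking")] * 20 + [("NJ", "Savings")] * 10
)
counts = Counter(pairs)
states, products = ["NY", "NJ"], ["Checking", "Savings"]
table = [[counts[(s, p)] for p in products] for s in states]

stat, df, p_value = chi_square_independence(table)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```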

Super User
Posts: 11,810

## Re: Statistical Analysis on Banking data

If you want to run something like a chi-square test on balances, you could create ranges of values that mean something to the bank, such as 0, 1 to 10,000, 10,001 to 25,000, and more than 25,000, or similar. Ideally, the group boundaries would mark points where the bank treats the customer differently in some way, such as rate changes for loans/deposits, additional offers, reminders, tax effects or similar.

One way to do that would be to create custom format(s) and apply those where you are doing the chi-square. The formatted values will be used to create groups based on the range.
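
In SAS that grouping is what a PROC FORMAT VALUE statement with ranges provides. A rough Python equivalent, assuming the example boundaries above (the labels and the inclusive upper bounds are assumptions), maps each balance to a band and then counts band by product:

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of every band except the last, following the example
# ranges in the post; the exact boundaries are assumptions the bank would set.
DEFAULT_BOUNDS = [0, 10_000, 25_000]
DEFAULT_LABELS = ["0 or less", "Over 0 to 10,000", "Over 10,000 to 25,000", "Over 25,000"]


def balance_band(balance, bounds=DEFAULT_BOUNDS, labels=DEFAULT_LABELS):
    """Map a balance to a band label, much as a range format groups values."""
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    # bisect_left puts a value equal to a bound into the band that bound closes.
    return labels[bisect_left(bounds, balance)]


# Banded balance by product counts, ready for a chi-square test of independence.
accounts = [("Checking", 0.0), ("Checking", 950.0), ("Savings", 12_500.0), ("Savings", 40_000.0)]
band_counts = Counter((product, balance_band(bal)) for product, bal in accounts)
print(band_counts)
```

Since each product may need its own ranges, a product-specific `bounds`/`labels` pair can be passed in instead of the defaults.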

Each product may also need different ranges. I would also expect different behaviors between deposit products, loans and brokerage, so I would not expect the simple question a chi-square test answers (is there a difference in the distribution of values between products?) to be very informative.

The total balance across an individual's accounts might indicate "valued customer" or similar status.
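
A sketch of that roll-up in Python. The 100,000 threshold, the tuple layout and the optional product filter are hypothetical; the filter is there because adding loan balances to deposit balances probably should not count towards "valued" status.

```python
from collections import defaultdict


def valued_customers(rows, threshold=100_000, products=None):
    """Total balance per customer; keep customers at or above `threshold`.

    rows: (customer_key, product, balance) tuples.
    products: optional set of products to include (e.g. deposits only).
    """
    totals = defaultdict(float)
    for customer, product, balance in rows:
        if products is None or product in products:
            totals[customer] += balance
    return {c: t for c, t in totals.items() if t >= threshold}


rows = [
    ("C1", "Savings", 80_000), ("C1", "Checking", 30_000),
    ("C2", "Savings", 5_000), ("C2", "Loan", 200_000),
]
print(valued_customers(rows, products={"Savings", "Checking"}))
```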

If the data were collected at different times rather than as a single snapshot, then changes over time would likely be informative.
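
If two snapshots were available, a first look could be as simple as matching accounts between dates. A Python sketch with hypothetical snapshots (account ID mapped to balance):

```python
def balance_changes(old, new):
    """Compare two snapshots (account_id -> balance) taken at different dates.

    Returns account_id -> (status, change in balance); a missing side counts as 0.
    """
    result = {}
    for acct in old.keys() | new.keys():
        if acct not in old:
            status = "opened"
        elif acct not in new:
            status = "closed"
        else:
            status = "existing"
        result[acct] = (status, new.get(acct, 0) - old.get(acct, 0))
    return result


# Hypothetical snapshots at two dates.
jan = {"A": 1_000, "B": 500}
jun = {"A": 1_500, "C": 200}
print(balance_changes(jan, jun))
```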

Super User
Posts: 3,381

## Re: Statistical Analysis on Banking data

What is the focus of your analysis? Is it customer behaviour or understanding customers? Or is it something else? Without giving us more guidance it is difficult to point you in the right direction. Reeza's list is a good starting point.

I actually work in a bank, and there are hundreds, if not thousands, of ways you can look at bank data. Usually there is some end goal in mind: it could be marketing, like upselling to customers; it could be financial, such as finding out who your profitable customers are; or it could be risk-related, such as who is most likely not to repay their loans.

Super Contributor
Posts: 499

## Re: Statistical Analysis on Banking data

As you said, I need to understand customer behavior. I wish to find out who the profitable customers are, or, on the risk side, who is most likely not to repay their loans.

The chi-square test is just one example that I know. I am open to any analysis you propose that can be done in EG (SAS Enterprise Guide).

Super User
Posts: 10,210

## Re: Statistical Analysis on Banking data

Then you should take a look at

Logistic regression - proc logistic

Poisson regression - proc genmod (can be applied to multi-dimensional contingency tables, whereas the chi-square test is usually applied to two-dimensional ones)
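
To illustrate the contingency-table point: a Poisson regression with a log link on cell counts (a log-linear model) can test independence across three classification variables at once. The Python sketch below fits the mutual-independence model to a hypothetical state by product by balance-band table, using the closed-form maximum-likelihood fitted counts that exist for this particular model. In PROC GENMOD, `model count = state product band / dist=poisson link=log;` with those (hypothetical) variables in a CLASS statement should fit the same model, and its deviance should match G2.

```python
import math
from itertools import product


def mutual_independence_fit(table):
    """Fit the log-linear (Poisson) mutual-independence model to an I x J x K table.

    For this model the maximum-likelihood fitted counts have the closed form
    n_i.. * n_.j. * n_..k / n**2, so no iterative fitting is needed.
    Returns (fitted counts, likelihood-ratio statistic G2, degrees of freedom).
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    cells = list(product(range(I), range(J), range(K)))
    n = sum(table[i][j][k] for i, j, k in cells)
    row = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    col = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    lay = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    fitted = [[[row[i] * col[j] * lay[k] / n**2 for k in range(K)]
               for j in range(J)] for i in range(I)]
    # Deviance: 2 * sum O * ln(O / E); empty cells contribute zero.
    g2 = 2 * sum(table[i][j][k] * math.log(table[i][j][k] / fitted[i][j][k])
                 for i, j, k in cells if table[i][j][k] > 0)
    df = I * J * K - I - J - K + 2
    return fitted, g2, df


# Hypothetical counts: state (NY, NJ) x product (Checking, Savings) x band (low, high).
counts = [
    [[30, 10], [12, 18]],
    [[25, 5], [8, 12]],
]
fitted, g2, df = mutual_independence_fit(counts)
print(f"G2 = {g2:.2f} on {df} df")
```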

Super Contributor
Posts: 499

## Re: Statistical Analysis on Banking data

Could you suggest the dependent and independent variables for these regression analyses? Any additional statements or options are most welcome.

Super User
Posts: 10,210

## Re: Statistical Analysis on Banking data

It is a long story. Actually, I am not an expert on it, although my major is economics and finance.

Make a binary variable:

0 - fraud

1 - not fraud

Use proc logistic to find which variables have the most influence on fraud. Of course, it can also give a forecast score (predicted probability) for each customer.
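
A dependency-free Python sketch of the idea, on hypothetical data with a single made-up risk indicator and the thread's 0 = fraud / 1 = not fraud coding. PROC LOGISTIC maximises the likelihood with Fisher scoring or Newton-Raphson; this sketch swaps in plain gradient descent on the same log-likelihood purely to stay short, so treat it as an illustration rather than a replacement. With several standardised inputs, comparing coefficient sizes gives a rough sense of which variables matter most.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, xi):
    """Predicted probability that y = 1 (the score)."""
    return sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept + coefficients by batch gradient descent on the log-loss.

    X: list of feature lists; y: 0/1 outcomes. Returns [intercept, b1, b2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical data: one made-up risk indicator per customer, coded
# 0 = fraud, 1 = not fraud, so a negative coefficient means more fraud risk.
X = [[0], [0], [1], [1], [2], [2], [3], [3], [4], [4]]
y = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
w = fit_logistic(X, y)
print("coefficients:", [round(v, 3) for v in w])
print("score for indicator = 4:", round(predict(w, [4]), 3))
```

Note that this sketch models P(y = 1), whereas PROC LOGISTIC by default models the lower ordered response value (here 0 = fraud) unless the EVENT= option says otherwise, so coefficient signs can flip between the two.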

A Poisson rate regression can also produce a fraud score (rate) for each customer type. Here is a paper:

24188 - Modeling rates and estimating rates and rate ratios (with confidence intervals)
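
The core quantity behind rate modelling can be sketched in a few lines of Python: a rate ratio between two groups with a Wald confidence interval on the log scale. This is what a Poisson model with a log link, one group indicator and log(exposure) as an offset (the OFFSET= option in PROC GENMOD) estimates; the fraud counts and account-years below are hypothetical.

```python
import math


def rate_ratio_ci(events_a, exposure_a, events_b, exposure_b, z=1.96):
    """Rate ratio of group A vs group B with a Wald confidence interval on the log scale.

    Matches exp(beta) and its Wald interval from a Poisson model with a log link,
    a single A-vs-B indicator and log(exposure) as an offset.
    """
    rate_a = events_a / exposure_a
    rate_b = events_b / exposure_b
    rr = rate_a / rate_b
    se_log_rr = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rate_a, rate_b, rr, (lower, upper)


# Hypothetical: fraud cases and account-years of exposure for two customer types.
rate_a, rate_b, rr, (lo, hi) = rate_ratio_ci(30, 10_000, 10, 8_000)
print(f"rates {rate_a:.4f} vs {rate_b:.5f}, rate ratio {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```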

Xia Keshan

Super User
Posts: 20,731

## Re: Statistical Analysis on Banking data

Based on your description of your data, you don't seem to have variables that would identify profitable or risk-related customers.

Super Contributor
Posts: 499

## Re: Statistical Analysis on Banking data

Could you tell me what other analyses could be performed on my data apart from the chi-square test?

Super User
Posts: 20,731

## Re: Statistical Analysis on Banking data

Any analysis pertinent to the question. You're going at it backwards - come up with questions and then figure out how to analyze the data.
