Babloo
Rhodochrosite | Level 12

What type of analysis can be done with the following variables? For example, a chi-square test or other hypothesis tests.

Name, address, city, state, zip, balance, product, account id, open date

Basically, I need to give the bank insights from the data they have provided.

I tried a chi-square test between balance and product, but I was not successful.

Could someone provide a skeleton for any analysis you propose?

Accepted Solutions
SASKiwi
PROC Star

Customer profitability requires account transaction data such as fees and interest, as well as the bank's cost of lending, and is very complex to work out. If you are only starting out, I would start with simpler questions.

To measure the risk of customers going into default and not repaying their loans, you need to track account behaviour over years of data, looking at repayment history and loan balances. If you don't have historical data spanning several years, you cannot do this type of analysis. You then need to identify the accounts that weren't paid back and look at the account behaviour prior to default. You are talking about many weeks of work to come up with any meaningful results. If you have never done this type of work before, then again, starting with simpler questions might be better.

12 REPLIES
Reeza
Super User

Have you started off with the basics - summaries?

Average account balance

Distribution of account balance

# of accounts by state/zip/city

# of unique individuals

# of accounts/individual

% of accounts/individuals vs population data for state/city/zip - could be used for deciding where business can expand.

% of account balance by state/city/zip

Age of customers by state/city/zip

For many of the # tables above, you could probably run a chi-square test.
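As a rough illustration of the summaries listed above, here is a minimal sketch in Python (standard library only). The sample rows and field layout are made up for illustration and are not the poster's data; in SAS, PROC MEANS and PROC FREQ would be the natural tools for the same summaries.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample rows: (name, state, product, balance)
accounts = [
    ("Ann", "NY", "savings", 1200.0),
    ("Ann", "NY", "loan", 8000.0),
    ("Bob", "CA", "savings", 300.0),
    ("Cid", "CA", "checking", 4500.0),
    ("Dee", "NY", "checking", 0.0),
]

# Average account balance
avg_balance = mean(row[3] for row in accounts)

# Number of accounts by state
accounts_by_state = Counter(row[1] for row in accounts)

# Number of accounts per individual, and number of unique individuals
# (assumes the name alone identifies a person, which real data may not allow)
accounts_per_person = Counter(row[0] for row in accounts)
unique_individuals = len(accounts_per_person)

# Percentage of total balance held in each state
balance_by_state = defaultdict(float)
for _, state, _, balance in accounts:
    balance_by_state[state] += balance
total_balance = sum(balance_by_state.values())
pct_balance_by_state = {
    state: 100 * amount / total_balance
    for state, amount in balance_by_state.items()
}

print(avg_balance, dict(accounts_by_state), unique_individuals)
print(pct_balance_by_state)
```

The same pattern extends to city or zip by swapping the grouping field.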

ballardw
Super User

If you want to run something like chi-square for balances, you could create ranges of values that mean something to the bank, such as 0, 1 to 10000, 10001 to 25000, more than 25000, or similar. Ideally, the boundaries for the groups would mark points where the bank treats the customer differently in some manner, such as rate changes for loans/deposits, additional offers, reminders, tax effects, or similar.

One way to do that would be to create custom format(s) and apply those where you are doing the chi-square. The formatted values will be used to create groups based on the range.
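To make the grouping idea concrete, here is a minimal sketch in Python (standard library only) that bands balances and computes the Pearson chi-square statistic for product by balance band. The band edges follow the ranges suggested above, adjusted to be non-overlapping, which is an assumption; the sample data and function names are hypothetical. In SAS, a custom format from PROC FORMAT applied in PROC FREQ with the CHISQ option would play the same role.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical band edges; each edge is the upper bound of its band.
EDGES = [0, 10_000, 25_000]
LABELS = ["0", "1 to 10000", "10001 to 25000", "more than 25000"]

def balance_band(balance):
    """Map a non-negative balance to its band label."""
    return LABELS[bisect_left(EDGES, balance)]

def chi_square(pairs):
    """Pearson chi-square for (row, column) pairs.

    Returns (statistic, degrees_of_freedom).
    """
    observed = Counter(pairs)
    row_totals = Counter()
    col_totals = Counter()
    for (row, col), count in observed.items():
        row_totals[row] += count
        col_totals[col] += count
    n = sum(observed.values())
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            expected = row_totals[row] * col_totals[col] / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Made-up accounts: (product, balance)
sample = (
    [("savings", 500.0)] * 30
    + [("savings", 30_000.0)] * 10
    + [("loan", 500.0)] * 10
    + [("loan", 30_000.0)] * 30
)
pairs = [(product, balance_band(balance)) for product, balance in sample]
stat, dof = chi_square(pairs)
print(stat, dof)
```

For one degree of freedom, a statistic above 3.841 would be significant at the 5% level. Bands with very small expected counts make the test unreliable, so sparse bands may need to be merged.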

It may be that each product could use different ranges. I would also expect different behaviors between deposit products, loans and brokerage. So I would expect the simple "is there a difference in the distribution of values between products" answer that a chi-square provides to be not greatly informative.

Total of balance across accounts for individuals might indicate "valued customer" or similar status.
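A minimal sketch of that idea in Python, assuming a customer can be identified by name plus zip and that a single cutoff defines "valued"; both the key and the threshold are hypothetical choices, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical cutoff for "valued customer" status
VALUED_THRESHOLD = 50_000

# Made-up deposit accounts: (name, zip, product, balance)
deposits = [
    ("Ann", "10001", "savings", 42_000.0),
    ("Ann", "10001", "brokerage", 15_000.0),
    ("Bob", "94105", "checking", 2_500.0),
    ("Bob", "94105", "savings", 20_000.0),
]

# Total balance across all accounts held by the same individual
total_by_customer = defaultdict(float)
for name, zip_code, _, balance in deposits:
    total_by_customer[(name, zip_code)] += balance

# Customers whose combined balance reaches the cutoff
valued = sorted(
    key for key, total in total_by_customer.items()
    if total >= VALUED_THRESHOLD
)
print(valued)
```

Loan balances are left out of the sample on purpose, since money owed and money deposited probably should not be summed together.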

If the data were captured at different times rather than as a single snapshot, then changes over time would likely be informative.

SASKiwi
PROC Star

What is the focus of your analysis? Is it customer behaviour or understanding customers? Or is it something else? Without giving us more guidance it is difficult to point you in the right direction. Reeza's list is a good starting point.

I actually work in a bank, and there are hundreds, if not thousands, of ways you can look at bank data. Usually there is some end game in mind, such as marketing (like upselling customers), or it could be financial (finding out who your profitable customers are), or it could be risk-related (who is most likely to not repay their loans).

Babloo
Rhodochrosite | Level 12

I need to understand customer behavior, as you said. I wish to find out who the profitable customers are, or, on the risk side, who is most likely to not repay their loans.

Chi-square analysis is just an example that I know. I am open to any analysis you propose that can be done via EG.

Ksharp
Super User

Then you should take a look at

Logistic Regression - proc logistic

Poisson Regression - proc genmod (can be applied to multi-dimensional contingency tables, whereas chi-square is usually applied to two dimensions)

Babloo
Rhodochrosite | Level 12

Could you suggest the dependent and independent variables for these regression analyses? Any additional statements or options are most welcome.

Ksharp
Super User

It is a long story. Actually, I am not an expert on this, although my major is Economics and Finance.

Make a binary variable:

0 - fraud

1 - not fraud

Use proc logistic to find which variables most influence fraud. Of course, it also gives a predicted probability score.

Poisson rate regression can also produce a fraud rate score for each type of customer. Here is a paper.

24188 - Modeling rates and estimating rates and rate ratios (with confidence intervals)
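To show what the binary-target approach does mechanically, here is a minimal sketch of a one-predictor logistic regression fitted by gradient descent in plain Python. Everything in it is an assumption for illustration: the data are made up, the predictor `xs` is hypothetical, and the flag is coded 1 for the event of interest (the reverse of the 0/1 coding above). It is not what proc logistic does internally, only the same model fitted in a simple way.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit intercept b0 and slope b1 by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y
            grad0 += error
            grad1 += error * x
        b0 -= lr * grad0 / n
        b1 -= lr * grad1 / n
    return b0, b1

# Made-up data: one hypothetical predictor and a 0/1 event flag
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)

def predict(x):
    """Predicted probability of the event for predictor value x."""
    return sigmoid(b0 + b1 * x)

print(b0, b1, predict(0.5), predict(4.0))
```

The sign and size of the slope indicate how the predictor relates to the event, and `predict` gives the probability score mentioned above. A real analysis would need an actual event flag in the data, which the variable list in the question does not include.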

Xia Keshan

Reeza
Super User

Based on your description of your data, you don't seem to have variables that would reflect profitable customers or risk-related customers.

Babloo
Rhodochrosite | Level 12

Could you tell me what other analyses can be performed on my data, apart from the chi-square test?

Reeza
Super User

Any analysis pertinent to the question. You're going at it backwards - come up with questions and then figure out how to analyze the data.
