## Need help on correlation and regression

Regular Contributor
Posts: 168


My data is as below.

cust     account         balance

Smith  checking        \$1,000.00
Smith  Savings        \$4,000.00
Smith  mortgage      \$150,000.00
Smith  credit_card    \$500.00
Jones  checking       \$973.78
Jones  savings         \$2,613
Jones  Mortgage      .
Jones  credit_card   \$140.48

I need to convert it to the layout below via proc transpose.

cust   checking   saving         mortgage       credit_card

smith \$1,000.00  \$4,000.00  \$150,000.00  \$500.00
Jones \$973.78   \$2,613         .                   \$140.48

Then I need to check the correlation between accounts, that is, how 'checking' sold compared with the other accounts (savings, mortgage and credit_card). Basically, I am interested in whether 'checking' account customers are also interested in a 'savings' account, whether 'savings' account customers are interested in a 'credit_card' account, and so on.

Here I need suggestions on how to find the correlation (positive or negative) between the accounts. I've then been instructed to use a regression technique to predict details of the person.

Since I'm new to analytics, I need some input on correlation and regression to proceed with this assignment. Let me know if you need any other details.

Thanks.

Super User
Posts: 10,766

## Re: Need help on correlation and regression

Plot a correlation matrix graph, or get the correlation coefficients via proc corr.
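As a rough illustration of that suggestion, the sketch below asks proc corr for both the coefficients and a scatter-plot matrix. It assumes a transposed data set named `wide` with one row per customer and numeric columns `checking`, `savings`, `mortgage` and `credit_card`; those names are assumptions based on the layout in the question, not something the poster confirmed.

```sas
ods graphics on;

/* WIDE is assumed to hold one row per customer with numeric columns
   CHECKING, SAVINGS, MORTGAGE and CREDIT_CARD (hypothetical names). */
proc corr data=wide pearson plots=matrix(histogram);
    var checking savings mortgage credit_card;
run;
```

By default proc corr computes each coefficient from the customers who have both values present; adding the `nomiss` option would instead drop any customer with a missing balance in any of the listed columns.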

Super User
Posts: 23,668

## Re: Need help on correlation and regression

You may want to also consider Market Basket Analysis, i.e. if a person purchases one product are they likely to purchase another.

On the other hand, if you have only 4 types you may want to change the balances to 1/0 to indicate the presence of an account type and then run some categorical analysis.

I'm not sure how to automatically account for the absence of an account when calculating correlation.
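The 1/0 idea could look something like the sketch below. It reuses the eight sample rows from the original question and treats a non-missing balance as "has the account", which is an assumption: the data alone does not say whether a missing balance means no account. The `has_` variable names are made up for illustration.

```sas
/* Sample rows copied from the original question. ACCOUNT is upcased on
   input so that 'Savings' and 'savings' end up in the same column. */
data cust;
    infile cards expandtabs;
    input cust $ account :$upcase16. balance :comma12.;
    cards;
Smith  checking        $1,000.00
Smith  Savings        $4,000.00
Smith  mortgage      $150,000.00
Smith  credit_card    $500.00
Jones  checking       $973.78
Jones  savings         $2,613
Jones  Mortgage      .
Jones  credit_card   $140.48
;
run;

/* One row per customer, one balance column per account type. */
proc transpose data=cust out=wide(drop=_name_);
    by cust notsorted;
    id account;
    var balance;
run;

/* Assumption: a non-missing balance means the customer holds the account. */
data flags;
    set wide;
    has_checking    = not missing(checking);
    has_savings     = not missing(savings);
    has_mortgage    = not missing(mortgage);
    has_credit_card = not missing(credit_card);
run;

/* Cross-tabulate two account types and request a chi-square test. */
proc freq data=flags;
    tables has_checking*has_savings / chisq;
run;
```

With only two customers the cross-tabulation is degenerate, so the test is meaningful only once many more customers are loaded.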

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

Thanks Reeza. I want to work on that as well (if a person purchases one product, are they likely to purchase another?), e.g. if a person takes an auto loan, he may be likely to take a consumer loan. In this case, too, my data is as posted in my initial post.

May I ask you to explain briefly, with an example, how to accomplish this task, or just give me a rough outline to proceed?

Thanks.

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

Could someone please point me in the right direction on my assignment?

Thanks.

Super User
Posts: 23,668

## Re: Need help on correlation and regression

1. proc transpose

2. proc corr

3. Market basket analysis (MBA) via the MBA macro from SUGI 28: A SAS(r) Market Basket Analysis Macro: The Poor Man's ...

```
proc sort data=have; by cust account; run;

proc transpose data=have out=want;
    by cust;
    id account;
    var balance;
run;

proc corr data=want;
run;
```

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

Thanks Reeza. After reading the document I added code to set the macro variables, but I don't see any difference in my output. It seems we are only creating macro variables and have not explicitly applied any functions or procs to predict the customer.

```
data cust;
infile cards expandtabs;
input cust $ account :$upcase16.   balance :comma12.;
format balance dollar12.2;
cards;
Smith  checking        $1,000.00
Smith  Savings        $4,000.00
Smith  mortgage      $150,000.00
Smith  credit_card    $500.00
Jones  checking       $973.78
Jones  savings         $2,613
Jones  Mortgage      .
Jones  credit_card   $140.48
;;;;
run;
proc print;
run;
proc transpose out=wide;
by cust notsorted;
id account;
var balance;
run;
proc print;
run;

proc corr data=wide;
run;

%let lib=work;
%let set=wide;
%let analysis_unit= checking;
```

Please advise on this, with a code snippet if possible.

Thanks again.

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

I'm still hoping for a response.

Thanks.

Super User
Posts: 10,766

## Re: Need help on correlation and regression

I need some more data. The sample size is too small to run proc corr.

Super User
Posts: 23,668

## Re: Need help on correlation and regression

You never called the macro. Re-read the paper.

There should be a line something along the lines of

%marketanalysis(parameter1, parameter2, etc...);

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

Is there any document on implementing market basket analysis with Base SAS?

Super User
Posts: 10,766

## Re: Need help on correlation and regression

Your sample size is too small for proc corr and basket analysis. Can you post some more data?

Regular Contributor
Posts: 168

## Re: Need help on correlation and regression

Hi Ksharp,

Please see my sample data below.

John     Funded      Mortgage        12,willington road      612001

James   In-Progess  Consumer      2a,swann house          652345

Gordon   Cancelled  Home            172,haymart road        623459

John     Funded      Consumer        12,willington road      612001

Gordon   Funded   Auto            172,haymart road        623459

James    Funded     Mortgage        2a,swann house          652345

James    Funded     Home            2a,swann house          652345

With this data, I need to keep only the 'Funded' customers and find out which loan products each of them is likely to purchase, e.g. a person who bought a consumer loan may be interested in an auto loan or any other loan we sell.

On the whole, we need to sell the loan products based on the current status, i.e. funded. Please note that we sell only four loan products (auto, consumer, mortgage and home) and we need to cross-sell only to funded customers.
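One way to approach the funded-only requirement is sketched below: keep the funded rows, then count how many customers hold each pair of products. The data step is a simplified copy of the sample rows above (only customer, status and product are kept; the address and numeric code columns are omitted). The names `loans`, `pairs`, `product_a`, `product_b` and `n_customers` are made up for illustration, and customers are assumed to be uniquely identified by name.

```sas
/* Simplified copy of the sample rows: customer, status and product only. */
data loans;
    input cust :$10. status :$12. product :$10.;
    cards;
John    Funded      Mortgage
James   In-Progess  Consumer
Gordon  Cancelled   Home
John    Funded      Consumer
Gordon  Funded      Auto
James   Funded      Mortgage
James   Funded      Home
;
run;

/* For every pair of products, count the customers who hold both as
   funded loans. The a.product < b.product condition keeps each pair once. */
proc sql;
    create table pairs as
    select a.product as product_a,
           b.product as product_b,
           count(distinct a.cust) as n_customers
    from loans as a
         inner join loans as b
         on a.cust = b.cust and a.product < b.product
    where upcase(a.status) = 'FUNDED'
      and upcase(b.status) = 'FUNDED'
    group by a.product, b.product
    order by n_customers desc;
quit;

proc print data=pairs noobs;
run;
```

In these seven rows only John and James hold two funded products each, so the counts say very little; the same query becomes informative once the full customer file is loaded.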

Super User
Posts: 10,766

## Re: Need help on correlation and regression

The sample is still too small.

```
data cust;
infile cards expandtabs;
input cust $ account :$upcase16.   balance :comma12.;
format balance dollar12.2;
cards;
Smith  Auto        $1,000.00
Smith  Consumer        $4,000.00
J  mortgage      $150,000.00
Smith  Mortgage    $500.00
Jones  Auto       $973.78
J  Auto         $2,613
Jones  Mortgage      .
Jones  Consumer   $140.48
John          Mortgage          612001
James     Consumer              652345
Gordon     Mortgage                   623459
John           Consumer           612001
Gordon      Auto                  623459
James         Mortgage               652345
Smi  mortgage      $50,000.00
James  Auto    $5020.00
Jon  Auto         $12,613
Jon  Mortgage      23456
Jon  Consumer   $1140.48
;;;;
run;

proc sort data=cust ;by cust;run;
proc transpose out=wide;
by cust ;
id account;
var balance;
run;
/* Check which two variables have the highest correlation coefficient */
proc corr data=wide;
var _numeric_;
run;

/* Basket analysis: for each customer, output every pair of account types
   with non-missing balances, then pick the most frequent pair */
data temp;
    set wide;
    array x{*} _numeric_;
    do i=1 to dim(x)-1;
        do j=i+1 to dim(x);
            if not missing(x{i}) and not missing(x{j}) then do;
                tag=catx(' ',vname(x{i}),vname(x{j}));
                output;
            end;
        end;
    end;
    keep tag;
run;
proc freq data=temp order=freq;
table tag/out=want nopercent nocum;
run;
proc print noobs;run;
```

Xia Keshan
