Solved
Contributor
Posts: 49

# How to segment variable based on correlation matrix?

Hi All,

I need to segment variables based on a correlation matrix, shown below. If the correlation coefficient between any variables is larger than 0.75 or smaller than -0.75, those variables are classified into one subgroup and stored in one dataset, e.g. dataset_1, and are then removed from the correlation matrix. The same rule is applied to the remaining matrix: variables with correlation coefficients above 0.75 or below -0.75 form another subgroup, are stored in another dataset, e.g. dataset_2, and are removed in turn. This repeats until no remaining variables have correlation coefficients above 0.75 or below -0.75. The variables left over are stored in a final dataset, dataset_n.

| _NAME_ | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 1 | 0.795283 | 0.648228 | 0.702434 | 0.814356 | 0.898034 | 0.72141 | 0.545573 | 0.410562 | 0.67573 | 0.79505 |
| V2 | 0.795282885 | 1 | 0.785185 | 0.852621 | 0.830391 | 0.75202 | 0.817556 | 0.682524 | 0.517509 | 0.8671 | 0.995331 |
| V3 | 0.648227521 | 0.785185 | 1 | 0.711466 | 0.690316 | 0.60089 | 0.716786 | 0.552231 | 0.50897 | 0.695645 | 0.824341 |
| V4 | 0.702433756 | 0.852621 | 0.711466 | 1 | 0.846026 | 0.662565 | 0.748602 | 0.701647 | 0.457522 | 0.757802 | 0.853448 |
| V5 | 0.81435639 | 0.830391 | 0.690316 | 0.846026 | 1 | 0.817881 | 0.772293 | 0.523626 | 0.398964 | 0.778101 | 0.829205 |
| V6 | 0.898034025 | 0.75202 | 0.60089 | 0.662565 | 0.817881 | 1 | 0.707077 | 0.457641 | 0.329565 | 0.679927 | 0.750266 |
| V7 | 0.721409554 | 0.817556 | 0.716786 | 0.748602 | 0.772293 | 0.707077 | 1 | 0.604067 | 0.517373 | 0.788003 | 0.819051 |
| V8 | 0.545573147 | 0.682524 | 0.552231 | 0.701647 | 0.523626 | 0.457641 | 0.604067 | 1 | 0.585093 | 0.584004 | 0.681971 |
| V9 | 0.410562159 | 0.517509 | 0.50897 | 0.457522 | 0.398964 | 0.329565 | 0.517373 | 0.585093 | 1 | 0.485006 | 0.522883 |
| V10 | 0.675730408 | 0.8671 | 0.695645 | 0.757802 | 0.778101 | 0.679927 | 0.788003 | 0.584004 | 0.485006 | 1 | 0.860323 |
| V11 | 0.795050139 | 0.995331 | 0.824341 | 0.853448 | 0.829205 | 0.750266 | 0.819051 | 0.681971 | 0.522883 | 0.860323 | 1 |

Is there a way to do this?

MT.

Accepted Solutions
Solution
‎08-07-2012 08:21 AM
Super User
Posts: 10,761

## Re: How to segment variable based on correlation matrix?

Are you doing some analysis like a decision tree?

I picked 0.2 as the correlation cutoff for this example.

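    /* Simulate 100 observations of 20 independent uniform variables. */
    data x;
      array x{*} a1-a20;
      do j=1 to 100;
        do i=1 to dim(x);
          x{i}=ranuni(1234);
        end;
        output;
      end;
      drop i j;
    run;

    %let corr=0.2;  /* absolute correlation cutoff */

    /* Start with every variable in WORK.X. */
    proc sql noprint;
      select name into :list separated by ' '
        from dictionary.columns
        where libname='WORK' and memname='X';
    quit;

    %macro decision;
      %let i=1;
      %do %while(1);
        /* Correlation matrix of the variables not yet assigned to a group. */
        proc corr data=x outp=person(where=(_TYPE_='CORR')) noprint;
          var &list;
        run;

        /* Scan each row up to the diagonal and output both members
           of every pair whose absolute correlation reaches the cutoff. */
        data temp;
          set person;
          array x{*} a: ;
          length var $ 40;
          do i=1 to dim(x);
            /* Stop at the diagonal; comparing names is safer than testing for 1. */
            if upcase(vname(x{i})) eq upcase(_name_) then leave;
            else if abs(x{i}) ge &corr then do;
              corr=x{i};
              var=_name_; output;
              var=vname(x{i}); output;
            end;
          end;
          keep corr var;
        run;

        proc sql noprint;
          create table data_&i as select distinct var from temp;
          /* No pair reaches the cutoff: store the remaining variables and stop. */
          %if &sqlobs = 0 %then %do;
            create table data_&i as select _name_ as var from person;
            quit;
            %return;
          %end;
          /* Keep only the variables that were not just assigned. */
          select name into :list separated by ' '
            from dictionary.columns
            where libname='WORK' and memname='PERSON'
              and name not in (select distinct var from temp)
              and name not in ('_NAME_' '_TYPE_');
          /* Nothing left: stop, because &list would otherwise keep its old value. */
          %if &sqlobs = 0 %then %do;
            quit;
            %return;
          %end;
        quit;
        %let i=%eval(&i+1);
      %end;
    %mend decision;

    %decision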
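
To use the cutoff from the original question, something like the following might work. It is only a sketch: it assumes the %decision macro and the WORK.X data set above already exist in the session, and the GROUPS table and the SOURCE and GROUP variable names are made up for illustration. It reruns the macro at 0.75 and then stacks the resulting DATA_n tables into one lookup table.

```sas
/* Assumes %decision and WORK.X from the code above are already defined. */

/* Rebuild the starting list, because &list is overwritten by each run. */
proc sql noprint;
  select name into :list separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='X';
quit;

%let corr=0.75;   /* cutoff from the original question */
%decision

/* Stack every DATA_n table into one table (GROUPS is a hypothetical name).
   INDSNAME= holds the name of the data set each row was read from. */
data groups;
  length group $ 32;
  set work.data_: indsname=source;
  group = scan(source, 2, '.');   /* e.g. DATA_1 */
  keep var group;
run;

proc print data=groups noobs;
run;
```

Note that `data_:` picks up every table in WORK whose name starts with DATA_, so tables left over from an earlier run with a different cutoff should be deleted first.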

Ksharp

