Ghabek
Calcite | Level 5

Hello,

I am still searching for a good dataset that showcases variable clustering in VDMML, ideally one that yields three or four clusters of variables, to demonstrate reducing multicollinearity among numeric variables.

Any assistance would be appreciated!

6 REPLIES
chmedi
SAS Employee

Hi Ghabek. You can get three clusters with the data set pva_raw_data, which is used in the EM Applied Analytics course. I had to adjust the Variable Clustering node properties to get the three clusters. My changes from the defaults: deselect the "Use default maximum number of variables per cluster" checkbox, set "Number of variables per cluster upper threshold" to 8, and set "Clustering Rho value" to 0.7.

Ghabek
Calcite | Level 5
Great! Can you send me the dataset, please?
Ghabek
Calcite | Level 5

Hello,

I ran it with your settings and still get only one cluster for the variables. I'm using Average Gift as the interval target to run the pipeline with the Variable Clustering node and your settings.

 

[Screenshots attached: Ghabek_0-1591660147136.png, Ghabek_1-1591660197701.png, Ghabek_2-1591660242634.png]

I got the dataset from the web; it has 7600 records.

Any ideas why I can't get three clusters?

Thanks!

chmedi
SAS Employee

Yours is a different data set than that provided in the SAS course.  If you could, please contact SAS to request the data.

Ghabek
Calcite | Level 5
I'm a SAS partner.

How do I do that? Who do I talk to?

I used to work in education but left SAS.
chmedi
SAS Employee
Since you're a SAS Partner, why don't you work through your Partner channels? In the meantime, I have some simulated data that produces three clusters.

Log in to SAS Studio on the same system as Model Studio, and submit the following code to generate the data in Public:

cas;
caslib _ALL_ assign;

/* Generate random regressors: nCont continuous (x1-xN), nClass class (c1-cN) */
%macro makeRegressorData(nBy=1,nByFixedSize=1,nObs=100,nCont=4,
                         nClass=3,nLev1=3,nLev2=5,nLev3=7);
  data testdata;
    drop i j;
    %if &nCont>0 %then %do; array x{&nCont} x1-x&nCont; %end;
    %if &nClass>0 %then %do; array c{&nClass} c1-c&nClass; %end;

    do by=1 to &nBy;
      /* BY groups beyond nByFixedSize get a random size (minimum 10) */
      if by > &nByFixedSize then
        nObsInBy = floor(2*ranuni(1111155)*&nObs);
      else nObsInBy = &nObs;
      if nObsInBy < 10 then nObsInBy = 10;

      do i = 1 to nObsInBy;
        /* Continuous inputs: independent uniform(0,1) draws */
        %if &nCont>0 %then %do;
          do j = 1 to &nCont;
            x{j} = ranuni(1);
          end;
        %end;

        /* Class inputs: binomial draws, cycling through nLev1, nLev2, nLev3 */
        %if &nClass > 0 %then %do;
          do j = 1 to &nClass;
            if mod(j,3) = 0 then
              c{j} = ranbin(1,&nLev3,.6);
            else if mod(j,3) = 1 then
              c{j} = ranbin(1,&nLev1,.5);
            else if mod(j,3) = 2 then
              c{j} = ranbin(1,&nLev2,.4);
          end;
        %end;

        weight = 1 + ranuni(1);
        freq = 1 + mod(i,3);

        *** if ( i = 11 ) then x{2} = .;
        *** if ( i = 12 ) then c{1} = .;

        output;
      end;
    end;
  run;
%mend;

/* Add a response y = modelRHS + normal noise (not used below) */
%macro AddDepVar(modelRHS =,errorStd = 1);
  data testdata;
    set testdata;
    y = &modelRHS + &errorStd * rannor(1);
  run;
%mend;

%makeRegressorData(nBy=1,nByFixedSize=1,nObs=435,nCont=14,
                   nClass=11,nLev1=1,nLev2=3,nLev3=3)
/* %AddDepVar(modelRHS=2*c1 - 0.5*x1,errorStd = 1); */

/* Overwrite some inputs with linear combinations of others to induce collinearity */
data testdata2(drop=x105 x106);
  set testdata;
  /* Save the original x5, x6, and x9 before they are overwritten */
  x105 = x5;
  x106 = x6;
  x109 = x9;
  x5 = x2 + 0.49147127158235 * x3 - 0.5868629022532 * x4 - 1.7447414951792 * x105;
  x6 = x1 - 0.13846905761895 * x3 + 0.6800302223946 * x4 - 0.45238025995657 * x105;
  x4 = x106 + 0.00146547818375 * x1 + 0.03017986135363 * x2 - 0.00820230703132 * x3 +
       0.03457381414136 * x4 + 0.03738296997610 * x105;

  x9 = x7 + 0.7138912 * x8 - 0.45238025995657 * x109;
  x10 = x8 / 30 + 0.931261126 * x7 + 0.03738296997610 * x109;
run;

/* Load the ten interval inputs and the target c1 into the Public caslib */
data public.test009 (promote=yes keep=x1-x10 c1);
  set testdata2;
run;


Then, switch to Model Studio (Build Models) and create a project with test009 (you will need to refresh the list of available tables). Set Target=c1. For the Variable Clustering node, the only property to change is Maximum number of iterations (under Advanced Options): set it to 30.
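
Before switching to Model Studio, it may help to confirm in SAS Studio that the simulated inputs really are collinear. The following is a minimal sketch, not part of the original reply. It assumes the WORK table testdata2 created by the code above still exists in the same session and that PROC VARCLUS (SAS/STAT) is available on the system. Its output will not necessarily match the Variable Clustering node, which uses its own algorithm and settings.

```sas
/* Optional sanity check: run in the same SAS Studio session as the code above. */

/* Pairwise correlations among the ten interval inputs. */
proc corr data=work.testdata2 nosimple;
  var x1-x10;
run;

/* Assumption: PROC VARCLUS (SAS/STAT) is licensed on this system.   */
/* MAXCLUSTERS=3 lets the procedure split into at most three clusters. */
/* SHORT trims the printed output.                                     */
proc varclus data=work.testdata2 maxclusters=3 short;
  var x1-x10;
run;
```

If the correlation matrix shows no groups of strongly related inputs, the data generation step is the place to look before adjusting node properties.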
