Create normalized distances from Euclidean distances

Contributor
Posts: 62

Hello,

I've created Euclidean distances with PROC DISTANCE:

proc distance data=have out=want method=euclid;
   var interval(w_:); /* weights start with w_ */
   id bank_id;
   by _rate_ year;
run;

I want to transform the Euclidean distances into normalized ones (i.e., distances that range between 0 and 1).

I've read on the website that I can add the norm option directly to PROC DISTANCE to normalize. How does that work? My data are already weights, ready to "enter" PROC DISTANCE, so I don't see how to include a norm option there. I probably have to "normalize" my data in an earlier table. That earlier table has 11 main columns: an identification number and 10 columns of amounts, one for each industry. I sum each amount by industry, identification number and year. I then compute a total, from which I derive the ten weights.
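One possible way to get distances in [0,1] without any standardization, assuming the W_ variables are nonnegative shares that sum to 1 for each lender (as they are built later in this thread): for two lenders a and b, d(a,b)^2 = sum(a_k^2) + sum(b_k^2) - 2*sum(a_k*b_k) <= 1 + 1 - 0 = 2, so d(a,b) <= sqrt(2), with equality when the two lenders lend entirely to different industries. Dividing every distance by sqrt(2) therefore maps it into [0,1]. This is a sketch, not a PROC DISTANCE option; it holds only for distances computed on the raw shares (not after STD=), and the output name pf15_7 is hypothetical, with pf15_6 assumed to be the OUT= data set carrying the BY variables _rate_ and year.

```sas
/* Sketch: rescale PROC DISTANCE output to [0,1], ASSUMING every W_ variable */
/* is a nonnegative share and the shares sum to 1 for each lender.          */
data pf15_7;                       /* hypothetical output name */
   set pf15_6;                     /* OUT= data set from PROC DISTANCE */
   array d{*} _numeric_;           /* every numeric column read from pf15_6 */
   do i = 1 to dim(d);
      /* leave the BY variables alone; rescale only the distance columns */
      if upcase(vname(d{i})) not in ('_RATE_', 'YEAR') then
         d{i} = d{i} / sqrt(2);    /* maximum possible distance is sqrt(2) */
   end;
   drop i;
run;
```

Missing cells (for example the upper triangle of a lower-triangular OUT= matrix) simply stay missing.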

Alternatively, I've tried to create plots of "normal" variables and to normalize with PROC STANDARD, ... but I have not succeeded this way.

How can I incorporate the norm option to normalize my distances? It seems to be the simplest way to proceed.

Thank you in advance.


Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

Hi,

Just add /std=std to the VAR statement.

proc distance data=have out=want method=euclid;
   var interval(w_: / std=std); /* weights start with w_ */
   id bank_id;
   by _rate_ year;
run;
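Note that STD=STD turns each W_ variable into a z-score (mean 0, standard deviation 1 within each BY group); it does not by itself bound the resulting distances between 0 and 1. If the aim is to put the inputs on a common [0,1] scale instead, PROC DISTANCE also accepts STD=RANGE, sketched below with the same placeholder names (have, want). A variable that is constant within a BY group has a range of 0, so it would presumably trigger the same kind of scale-estimator warning.

```sas
/* Sketch: range standardization inside PROC DISTANCE.                     */
/* STD=RANGE subtracts the minimum and divides by the range, so each W_    */
/* variable lies in [0,1] within a BY group before distances are computed. */
proc distance data=have out=want method=euclid;
   var interval(w_: / std=range);
   id bank_id;
   by _rate_ year;
run;
```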

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

Hi,

Thanks for your answer.

I get this warning in the log:

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

I compared the two outputs (the first from my initial code, the second with your suggestion) and got exactly the same distances.

Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

Not sure. What does the W_PUBLIC variable contain? Is it an interval variable?

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

I'll show you what I've done with my data so far. I think it will make things clearer.

1) I have a table that looks like this:

_rate_ year LenderID Industrie_AG Industrie_MIN Industrie_CON Industrie_TRAN Industrie_WHOLE Industrie_RE Industrie_FIN Industrie_SER Industrie_PUBLIC Industrie_Other
0,11995235000000000000000
0,119954507868969000000000
0,1199530000760000000000
0,11995500000890655550000000
0,11995677000590000000000000

So it's a table of loans (only one lender lends to each borrower; the table contains bilateral conventional loans). I have ten industries, eighteen years and eight sampling rates (for a subsequent sampling step). The variable LenderID identifies each lender.

2) I applied this code to compute the sums:

proc means data=pf15 noprint;
   var Industrie_AG Industrie_MIN Industrie_CON Industrie_TRAN Industrie_WHOLE Industrie_RE Industrie_FIN Industrie_SER Industrie_PUBLIC Industrie_Other;
   output out=pf15_1(drop=_type_ _freq_)
      sum(Industrie_AG Industrie_MIN Industrie_CON Industrie_TRAN Industrie_WHOLE Industrie_RE Industrie_FIN Industrie_SER Industrie_PUBLIC Industrie_Other)=sum_AG sum_MIN sum_CON sum_TRAN sum_WHOLE sum_RE sum_FIN sum_SER sum_PUBLIC sum_Other;
   by _rate_ year LenderID; /* pf15 must be sorted by these variables */
run;

This sums each amount by _rate_, year and LenderID, so I get aggregate amounts by industry.
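The BY statement in step 2 requires pf15 to be sorted by _rate_, year and LenderID. A hedged alternative that avoids the sort is CLASS with the NWAY option, which keeps only the rows for each full _rate_/year/LenderID combination; the SUM= names are matched to the VAR variables in order. One difference to be aware of: CLASS drops rows with a missing _rate_, year or LenderID unless the MISSING option is added.

```sas
/* Sketch: same sums as step 2, using CLASS + NWAY instead of a sorted BY. */
proc means data=pf15 noprint nway;
   class _rate_ year LenderID;
   var Industrie_AG Industrie_MIN Industrie_CON Industrie_TRAN Industrie_WHOLE
       Industrie_RE Industrie_FIN Industrie_SER Industrie_PUBLIC Industrie_Other;
   output out=pf15_1(drop=_type_ _freq_)
          sum=sum_AG sum_MIN sum_CON sum_TRAN sum_WHOLE
              sum_RE sum_FIN sum_SER sum_PUBLIC sum_Other;
run;
```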

3) I've created a total (first code) and weights (second code).

proc sql;
   create table pf15_2 as
   select *, sum(sum_AG, sum_MIN, sum_CON, sum_TRAN, sum_WHOLE, sum_RE, sum_FIN, sum_SER, sum_PUBLIC, sum_Other) as sum_tot
   from pf15_1;
quit;

proc sql;
   create table pf15_3 as
   select distinct _rate_, year, LenderID,
          (sum_AG/sum_tot) as W_AG, (sum_MIN/sum_tot) as W_MIN, (sum_CON/sum_tot) as W_CON, (sum_TRAN/sum_tot) as W_TRAN,
          (sum_WHOLE/sum_tot) as W_WHOLE, (sum_RE/sum_tot) as W_RE, (sum_FIN/sum_tot) as W_FIN, (sum_SER/sum_tot) as W_SER,
          (sum_PUBLIC/sum_tot) as W_PUBLIC, (sum_Other/sum_tot) as W_Other
   from pf15_2;
quit;
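The two PROC SQL steps can also be combined into a single DATA step with arrays. In this sketch a lender whose total is 0 or missing gets missing weights, which matches what the division would produce but avoids the division-by-zero notes; that handling is an assumption, not part of the original code. Because PROC MEANS with BY already outputs one row per _rate_/year/LenderID combination, SELECT DISTINCT should not be needed here.

```sas
/* Sketch: total and weights in one pass (same names as the SQL steps). */
data pf15_3;
   set pf15_1;
   array s{10} sum_AG sum_MIN sum_CON sum_TRAN sum_WHOLE
               sum_RE sum_FIN sum_SER sum_PUBLIC sum_Other;
   array w{10} W_AG W_MIN W_CON W_TRAN W_WHOLE
               W_RE W_FIN W_SER W_PUBLIC W_Other;
   sum_tot = sum(of s{*});                          /* total over the ten industries */
   do i = 1 to 10;
      if sum_tot > 0 then w{i} = s{i} / sum_tot;   /* industry share */
      else w{i} = .;                               /* no lending: leave missing */
   end;
   keep _rate_ year LenderID W_:;
run;
```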

4) This brings us to the code I presented initially. I create a character ID from the variable LenderID (first code) and then run PROC DISTANCE (second code).

/* step 1: create a character id */
data pf15_5;
   set pf15_4;
   bank_id = "_" || strip(put(LenderID, best8.)); /* strip() drops the leading blanks that BEST8. produces */
run;

proc distance data=pf15_5 out=pf15_6 method=euclid;
   var interval(w_:); /* use w_: to represent the group of all variables starting with w_ */
   id bank_id;
   by _rate_ year;
run;

That is how I transformed my data for PROC DISTANCE. I guess I must normalize in an earlier table.

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

A small error: the first step should read:

data pf15_4;
   set pf15_3;
   bank_id = "_" || strip(put(LenderID, best8.)); /* strip() drops the leading blanks that BEST8. produces */
run;

Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

Thanks for presenting the problem in detail. Let's go back to the main question: feeding standardized variables to PROC DISTANCE. Information on the W_PUBLIC variable (the one causing the warning) would help to understand why PROC DISTANCE issues that warning. PROC STDIZE will generate the same warning if W_PUBLIC cannot be standardized. I would suggest running PROC MEANS on W_PUBLIC to get the following:

proc means data=pf15_3 n nmiss mean std;
   var W_PUBLIC;
run;
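Since W_PUBLIC may not be the only variable that is constant in some groups, the same check can be run on all ten W_ variables at once. This sketch lists every _rate_/year group in which a W_ variable has a standard deviation less than or equal to 0 (missing is flagged too, since SAS treats missing as smaller than any number, e.g. in a group with a single lender). The data set names pf15_3s, w_std and zero_var are hypothetical.

```sas
/* Sketch: find every BY group where a W_ variable cannot be standardized. */
proc sort data=pf15_3 out=pf15_3s;
   by _rate_ year;
run;

/* Standard deviation of each W_ variable per group, stored as s1-s10 */
proc means data=pf15_3s noprint;
   by _rate_ year;
   var W_AG W_MIN W_CON W_TRAN W_WHOLE W_RE W_FIN W_SER W_PUBLIC W_Other;
   output out=w_std(drop=_type_ _freq_) std=s1-s10;
run;

/* One output row per (group, variable) with a zero or missing std */
data zero_var;
   set w_std;
   length variable $32;
   array s{10} s1-s10;
   array names{10} $32 _temporary_ ('W_AG' 'W_MIN' 'W_CON' 'W_TRAN' 'W_WHOLE'
                                   'W_RE' 'W_FIN' 'W_SER' 'W_PUBLIC' 'W_Other');
   do i = 1 to 10;
      if s{i} <= 0 then do;
         variable = names{i};
         output;
      end;
   end;
   keep _rate_ year variable;
run;

proc print data=zero_var noobs;
run;
```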

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

These are exactly the warnings I get in the log; see below.

proc distance data=pf15_5 out=pf15_6 method=euclid;
   var interval(w_: / std=std); /* use w_: to represent the group of all variables starting with w_ */
   id bank_id;
   by _rate_ year;
run;

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.1 Year=2005

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.1 Year=2006

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.1 Year=2009

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.1 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.25 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.33333 Year=2005

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.33333 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.5 Year=2005

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.5 Year=2006

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.5 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.66667 Year=2012

 

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.75 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=0.9 Year=2012

WARNING: The scale estimator for variable W_PUBLIC is less than or equal to 0.  W_PUBLIC will not be standardized.

WARNING: Because some variables can not be standardized, PROC DISTANCE will not compute the distance matrix. Choose other standardization methods for those variables.

NOTE: The above message was for the following BY group: _RATE_=1 Year=2012

NOTE: OUT= data set is not created.

NOTE: PROCEDURE DISTANCE used (Total process time):

      real time  0.69 seconds

      cpu time  0.39 seconds

_rate_ takes the values 0.1, 0.25, 0.33333, 0.5, 0.66667, 0.75, 0.9 and 1. It comes from the sampling step (_rate_ is the sampling rate).

So the problems come from W_PUBLIC, for every sampling rate (0.1, 0.25, 0.33333, 0.5, 0.66667, 0.75, 0.9 and 1) but in different years:

_rate_=0.1 => 2005, 2006, 2009 and 2012

_rate_=0.25 => 2012

_rate_=0.33333 => 2005 and 2012

_rate_=0.50 => 2005, 2006 and 2012

_rate_=0.66667 => 2012

_rate_=0.75 => 2012

_rate_=0.90 => 2012

_rate_=1 => 2012

Because W_PUBLIC gives warnings for different sampling rates (_rate_) and not always in the same years, I ran your code by _rate_ and year:

proc means data=pf15_3 n nmiss mean std;
   var W_PUBLIC;
   by _rate_ year;
run;

 

The output is long; it is shown below.

The SAS System
The MEANS Procedure
Analysis Variable : W_PUBLIC

_RATE_   Year  N   N Miss  Mean         Std Dev
0.1      1995  6   0       0.0081510    0.0167272
0.1      1996  6   0       0.0251548    0.0340587
0.1      1997  6   0       0.0197715    0.0209811
0.1      1998  6   0       0.0014156    0.0027333
0.1      1999  6   0       0.0095375    0.0183058
0.1      2000  6   0       0.0125907    0.0205180
0.1      2001  6   0       0.0016666    0.0031349
0.1      2002  6   0       0.0371616    0.0744343
0.1      2003  6   0       0.0173120    0.0424055
0.1      2004  6   0       0.0051048    0.0125040
0.1      2005  6   0       0            0
0.1      2006  6   0       0            0
0.1      2007  6   0       0.0151700    0.0300848
0.1      2008  6   0       0.0283977    0.0429333
0.1      2009  6   0       0            0
0.1      2010  6   0       0.0081268    0.0126904
0.1      2011  6   0       0.0082858    0.0135355
0.1      2012  6   0       0            0

0.25     1995  6   0       0.0147012    0.0184258
0.25     1996  6   0       0.0163410    0.0275292
0.25     1997  6   0       0.0133464    0.0154777
0.25     1998  6   0       0.0021127    0.0039143
0.25     1999  6   0       0.0049996    0.0062379
0.25     2000  6   0       0.0013248    0.0020659
0.25     2001  6   0       0.0074248    0.0100632
0.25     2002  6   0       0.0067791    0.0087785
0.25     2003  6   0       0.0017136    0.0039769
0.25     2004  6   0       0.0397329    0.0598700
0.25     2005  6   0       0.0219861    0.0538548
0.25     2006  6   0       0.0149599    0.0366442
0.25     2007  6   0       0.0071786    0.0098935
0.25     2008  6   0       0.0094042    0.0115324
0.25     2009  6   0       0.0533573    0.1149143
0.25     2010  6   0       0.0240404    0.0337330
0.25     2011  6   0       0.0027859    0.0037937
0.25     2012  6   0       0            0

0.33333  1995  6   0       0.0104859    0.0125634
0.33333  1996  6   0       0.0035480    0.0068107
0.33333  1997  6   0       0.0165512    0.0205497
0.33333  1998  6   0       0.0019953    0.0037752
0.33333  1999  6   0       0.0011410    0.0018540
0.33333  2000  6   0       0.0051117    0.0112255
0.33333  2001  6   0       0.0027246    0.0057535
0.33333  2002  6   0       0.0066935    0.0068624
0.33333  2003  6   0       0.000339390  0.000625208
0.33333  2004  6   0       0.0061084    0.0094970
0.33333  2005  6   0       0            0
0.33333  2006  6   0       0.0128083    0.0313739
0.33333  2007  6   0       0.0065704    0.0057434
0.33333  2008  6   0       0.0215321    0.0189579
0.33333  2009  6   0       0.0013898    0.0020662
0.33333  2010  6   0       0.0065392    0.0084608
0.33333  2011  6   0       0.0019434    0.0017686
0.33333  2012  6   0       0            0

0.5      1995  6   0       0.0406940    0.0492116
0.5      1996  6   0       0.0131005    0.0177264
0.5      1997  6   0       0.0508180    0.0715886
0.5      1998  6   0       0.0035499    0.0075979
0.5      1999  6   0       0.0048653    0.0075474
0.5      2000  6   0       0.0078813    0.0091853
0.5      2001  6   0       0.0266625    0.0485095
0.5      2002  6   0       0.0200520    0.0356918
0.5      2003  6   0       0.0046599    0.0088813
0.5      2004  6   0       0.0238068    0.0439375
0.5      2005  6   0       0            0
0.5      2006  6   0       0            0
0.5      2007  6   0       0.0124609    0.0092889
0.5      2008  6   0       0.0183643    0.0157922
0.5      2009  6   0       0.0235013    0.0542259
0.5      2010  6   0       0.0175879    0.0126084
0.5      2011  6   0       0.0050698    0.0061390
0.5      2012  6   0       0            0

0.66667  1995  6   0       0.0337743    0.0313972
0.66667  1996  6   0       0.0143465    0.0130083
0.66667  1997  6   0       0.0451269    0.0536985
0.66667  1998  6   0       0.0034471    0.0042790
0.66667  1999  6   0       0.0055626    0.0057445
0.66667  2000  6   0       0.0075782    0.0088977
0.66667  2001  6   0       0.0059808    0.0097095
0.66667  2002  6   0       0.0163921    0.0257660
0.66667  2003  6   0       0.0054184    0.0070632
0.66667  2004  6   0       0.0069423    0.0081281
0.66667  2005  6   0       0.0101080    0.0247594
0.66667  2006  6   0       0.0102748    0.0126101
0.66667  2007  6   0       0.0120285    0.0112328
0.66667  2008  6   0       0.0182268    0.0079895
0.66667  2009  6   0       0.0031703    0.0021238
0.66667  2010  6   0       0.0183829    0.0180565
0.66667  2011  6   0       0.0056804    0.0049754
0.66667  2012  6   0       0            0

0.75     1995  6   0       0.0350459    0.0391202
0.75     1996  6   0       0.0183494    0.0124532
0.75     1997  6   0       0.0375901    0.0455889
0.75     1998  6   0       0.0030824    0.0054692
0.75     1999  6   0       0.0059878    0.0058065
0.75     2000  6   0       0.0056177    0.0063520
0.75     2001  6   0       0.0243148    0.0283375
0.75     2002  6   0       0.0049738    0.0062295
0.75     2003  6   0       0.0052701    0.0057973
0.75     2004  6   0       0.0186519    0.0248487
0.75     2005  6   0       0.0097732    0.0239392
0.75     2006  6   0       0.0105775    0.0131181
0.75     2007  6   0       0.0135003    0.0143191
0.75     2008  6   0       0.0194731    0.0191022
0.75     2009  6   0       0.0206636    0.0430237
0.75     2010  6   0       0.0166414    0.0132600
0.75     2011  6   0       0.0039618    0.0050535
0.75     2012  6   0       0            0

0.9      1995  6   0       0.0296786    0.0270912
0.9      1996  6   0       0.0161963    0.0130608
0.9      1997  6   0       0.0356798    0.0405470
0.9      1998  6   0       0.0037732    0.0052966
0.9      1999  6   0       0.0056002    0.0049333
0.9      2000  6   0       0.0065381    0.0074156
0.9      2001  6   0       0.0185660    0.0195838
0.9      2002  6   0       0.0118602    0.0160331
0.9      2003  6   0       0.0083679    0.0093528
0.9      2004  6   0       0.0160113    0.0187704
0.9      2005  6   0       0.0096588    0.0236592
0.9      2006  6   0       0.0089604    0.0110970
0.9      2007  6   0       0.0094993    0.0037775
0.9      2008  6   0       0.0182513    0.0128379
0.9      2009  6   0       0.0186634    0.0346475
0.9      2010  6   0       0.0152428    0.0102962
0.9      2011  6   0       0.0041323    0.0036972
0.9      2012  6   0       0            0

1        1995  6   0       0.0298176    0.0260602
1        1996  6   0       0.0162599    0.0115162
1        1997  6   0       0.0342501    0.0369118
1        1998  6   0       0.0036560    0.0051358
1        1999  6   0       0.0055724    0.0049559
1        2000  6   0       0.0065372    0.0075367
1        2001  6   0       0.0178894    0.0189897
1        2002  6   0       0.0120096    0.0145428
1        2003  6   0       0.0079442    0.0087244
1        2004  6   0       0.0147892    0.0175250
1        2005  6   0       0.0095688    0.0234388
1        2006  6   0       0.0089033    0.0109716
1        2007  6   0       0.0128167    0.0115824
1        2008  6   0       0.0174303    0.0124328
1        2009  6   0       0.0190555    0.0356186
1        2010  6   0       0.0150235    0.0102565
1        2011  6   0       0.0042406    0.0032645
1        2012  6   0       0            0

Each warning in the log corresponds to a BY group in which the mean and the standard deviation of W_PUBLIC are both 0. In some years, depending on the sampling rate, I apparently do not have a single loan in the public industry.

It's now obvious why I had multiple warning messages. Can we get around this problem?

Solution
‎08-04-2014 01:16 PM
Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

I would say: just exclude from the analysis those variables whose mean and standard deviation are equal to 0.
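Because the variables to exclude differ from one BY group to another, one possible way to apply this advice in a single PROC DISTANCE run is to standardize manually. A variable that is constant within a group adds exactly 0 to every Euclidean distance in that group, so setting it to 0 there gives the same distances as dropping it. This sketch assumes pf15_4 (which carries bank_id, as in the corrected step earlier in the thread) as input and introduces the hypothetical names w_stats, pf15_4z, m1-m10 and s1-s10. For the non-constant variables it is intended to mirror STD=STD, assuming both use the default n-1 divisor.

```sas
/* Sketch: z-score the W_ variables per BY group, but set a variable to 0 */
/* in any group where it is constant, so it adds nothing to the distances. */
proc sort data=pf15_4;
   by _rate_ year;
run;

/* Per-group mean (m1-m10) and standard deviation (s1-s10) of each W_ variable */
proc means data=pf15_4 noprint;
   by _rate_ year;
   var W_AG W_MIN W_CON W_TRAN W_WHOLE W_RE W_FIN W_SER W_PUBLIC W_Other;
   output out=w_stats(drop=_type_ _freq_) mean=m1-m10 std=s1-s10;
run;

data pf15_4z;
   merge pf15_4 w_stats;
   by _rate_ year;
   array w{10} W_AG W_MIN W_CON W_TRAN W_WHOLE W_RE W_FIN W_SER W_PUBLIC W_Other;
   array m{10} m1-m10;
   array s{10} s1-s10;
   do i = 1 to 10;
      if s{i} > 0 then w{i} = (w{i} - m{i}) / s{i};  /* z-score, like STD=STD */
      else w{i} = 0;  /* constant in this group: contributes 0 to every distance */
   end;
   drop i m1-m10 s1-s10;
run;

/* No STD= here: the inputs are already standardized, and the original run */
/* without STD= produced no standardization warnings.                      */
proc distance data=pf15_4z out=pf15_6 method=euclid;
   var interval(W_:);
   id bank_id;
   by _rate_ year;
run;
```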

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

Thank you so much for your time.

And lastly, where should I put the UNDEF option in PROC DISTANCE?

(I ask because it's sometimes surprising how much time it takes me to correctly add a simple option to existing code that I wrote. Ah, the learning curve in coding!)

Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

In the PROC DISTANCE statement options.
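For reference, UNDEF= is written among the options of the PROC DISTANCE statement itself, before its semicolon. Per the SAS/STAT documentation quoted later in the thread, it replaces undefined distances with a numeric constant; it does not change how variables are standardized, so it is not expected to remove the scale-estimator warnings. The value 0 below is only a placeholder.

```sas
/* Sketch: UNDEF= goes on the PROC DISTANCE statement (0 is a placeholder). */
proc distance data=pf15_5 out=pf15_6 method=euclid undef=0;
   var interval(w_: / std=std);
   id bank_id;
   by _rate_ year;
run;
```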

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

Thank you!

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

Sorry again.

I've added the UNDEF option and I still get the same warnings in the log. Maybe the UNDEF option needs a specific number to work?

Trusted Advisor
Posts: 1,228

Re: Create normalize distances from euclidean distances

Why are you using UNDEF option?

Contributor
Posts: 62

Re: Create normalize distances from euclidean distances

I thought this option would solve my problem because the UNDEF option is described as "specifies the numeric constant used to replace undefined distances".

My data have no missing values, so the NOMISS, REPLACE and REPONLY options are not relevant.

SAS/STAT(R) 9.22 User's Guide

I was looking through the PROC DISTANCE <options> for a way to exclude the variables whose mean and variance are equal to 0.
