Rick,
Thank you for your persistence. I will describe my problem as best I can so that you may detect any errors in my reasoning.
I am applying a fuzzy similarity measure from Lukasiewicz logic called a "normal Lukasiewicz structure". I have attached a partially completed draft of a paper describing this concept; the structure itself is defined in reference [6] of that draft.
I am trying to classify a dataset whose columns of features contain measures of a process, in this case the well-known Wisconsin Breast Cancer data downloaded from the UCI KDD ML archives. Each column has been linearly standardized into [0,1]. I have composed an objective function based on the Lukasiewicz similarity structure to compute the parameters m and p in equation (6) of the draft. Essentially, the Lukasiewicz structure is the generalized mean of the sum of the Minkowski distances between each feature value and the class mean of that feature, where the class means are computed from the subsets of the data grouped by class value. Equation (6) gives the similarity between feature vector j and class mean vector i; similarity values lie in the interval [0,1].
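To make the two ingredients concrete, here is a generic Python sketch of a Minkowski distance between a feature vector and a class mean, and of a generalized (power) mean. This is not equation (6) itself, which is only in the attached draft; the function names and the way the pieces would be combined are illustrative assumptions, posted in Python only because it is easy to run inline.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between vectors x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def generalized_mean(values, m):
    """Power mean with exponent m of a list of positive values."""
    n = len(values)
    return (sum(v ** m for v in values) / n) ** (1.0 / m)

# A feature vector and one class mean vector, rounded from the inputs
# shown later in this post:
x = [0.2574, 1.0, 0.1562, 1.0, 0.0796]
y = [0.0617, 0.0947, 0.0587, 0.0640, 0.0260]

d = minkowski(x, y, 2.0)  # p = 2 reduces to the ordinary Euclidean distance
```

With p = 2 and m = 1 these reduce to the familiar Euclidean distance and arithmetic mean, which is a quick sanity check on any implementation of equation (6).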
I use differential evolution (DE) to minimize an objective function based on the Lukasiewicz normal structure. The DE algorithm computes a pair (m, p) for each of a set of scenarios. Once all of the scenarios have been run under the DE parameter values I specified, I choose the (m, p) pair that produces the maximum Enhanced Matthews Correlation Coefficient. Using this optimal pair, I score a validation dataset and use the performance metrics to measure the efficacy of the algorithm.
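For readers unfamiliar with the optimizer, the loop I am describing can be sketched as a minimal pure-Python DE/rand/1/bin minimizer. The objective below is a toy quadratic standing in for my Lukasiewicz-based objective, and all names, bounds, and control parameters (F, CR, population size) are illustrative, not the ones in my SAS code.

```python
import random

def de_minimize(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=300, seed=1):
    """Minimal DE/rand/1/bin minimizer over box bounds [(lo, hi), ...]."""
    random.seed(seed)
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct donors, none equal to the target index i:
            a, b, c = random.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = random.randrange(dim)  # force at least one mutated coordinate
            trial = []
            for j in range(dim):
                if random.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            tc = f(trial)
            if tc <= cost[i]:  # greedy selection
                pop[i], cost[i] = trial, tc
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]

# Toy stand-in for the Lukasiewicz objective: minimum at (m, p) = (0.5, 3).
(m_best, p_best), best_cost = de_minimize(
    lambda v: (v[0] - 0.5) ** 2 + (v[1] - 3.0) ** 2,
    bounds=[(1e-6, 1.0), (0.1, 10.0)])
```

In my actual workflow the scenario loop repeats this minimization under several DE settings and then picks the (m, p) pair with the best Enhanced MCC on the training data.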
My difficulties arise when the computed parameter values m and p force the Minkowski differences in equation (6) very close to 0 or to 1.
I have attached a SAS program, compute_similarity.sas, that computes the similarity between a feature vector and one or more class mean vectors. It is invoked by the module ffss_classify_score_test.sas (also attached), which performs preliminary processing before invoking compute_similarity. I also include a PDF file containing the output of ffss_classify_score_test, which contains numerous examples of the case in which
ndx_pred_class = loc( sim_mat[ j, ] = max( sim_mat[ j, ] )) ;
returns two indices, indicating a failure to produce a unique maximum value. This result is not due to a fault in the loc() function, because sim_mat[] really does contain {0 0} or {1 1} whenever two indices are returned. Rather, it is the process producing the nonunique classifier output that interests me; to wit, how can I overcome the tendency of values like
.9999995284293320000
.0947296688581390000
.9999999949998210000
.0640424310483060000
.9999999999892080000
.9999995534842310000
.1050134801112600000
.9999999936572220000
.1031645294170150000
.9999999992193940000
which are the contents of the dist[] matrix in compute_similarity, to be converted to exact 1's or 0's in subsequent computations?
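The collapse to exact 1's is consistent with how IEEE double precision behaves near 1: once a similarity is within about 1e-16 of 1, it rounds to exactly 1.0, and two columns then tie at the maximum. A small Python illustration of the effect, and of the usual remedy of carrying the complement (or working through log1p-style functions) instead of the similarity itself; whether SAS/IML offers direct analogues of these helpers is something I would have to check in the documentation.

```python
import math

# Doubles carry ~16 significant digits, so quantities within ~1e-16 of 1
# collapse to exactly 1.0, and distinct similarities become indistinguishable:
a = 1.0 - 1e-17
b = 1.0 - 2e-17
print(a == 1.0, a == b)  # both round to exactly 1.0

# Carrying the small complement eps = 1 - sim instead keeps full precision:
eps_a, eps_b = 1e-17, 2e-17
print(eps_a == eps_b)    # False: the two values remain distinguishable

# log1p evaluates log(1 + x) accurately for tiny x, whereas forming 1 + x
# first destroys the information:
print(math.log1p(1e-17))        # ~1e-17
print(math.log(1.0 + 1e-17))    # exactly 0.0, because 1 + 1e-17 rounds to 1.0
```

If equation (6) can be rearranged so that the quantity stored is 1 - dist rather than dist, the near-1 values above stay distinct and the argmax tie disappears.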
These are the inputs to compute_similarity:
x
.2574131800000000000 1.000000000000000000 .1562455800000000000 1.000000000000000000 .0796267300000000000
y
.0616844800000000100 .0947296700000000000 .0587415400000000100 .0640424500000000000 .0259968300000000000
.1864923200000000000 .1050134800000000000 .1709497800000000000 .1031645300000000000 .1276107500000000000
m
.0000170200000000000
p
9.106516920000000000
The dist matrix contains the Lukasiewicz similarities between each element of the feature vector and the corresponding elements of the two class means to which the feature is being compared.
dist
.9999995284293320000 .0947296688581390000 .9999999949998210000 .0640424310483060000 .9999999999892080000
.9999995534842310000 .1050134801112600000 .9999999936572220000 .1031645294170150000 .9999999992193940000
The rslt vector contains the similarities of the feature vector to each class mean. In this case there appears to be no similarity at all, which I surmise is due to the limitations of finite-precision arithmetic.
rslt
0.000000000000000000 0.000000000000000000
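One plausible mechanism for the exact zeros, offered as a hypothesis rather than a reading of equation (6): if the final aggregation raises a quantity below 1 to a power involving 1/m, then with m = .00001702 the exponent is roughly 58,754, and the result silently underflows the double-precision range (smallest positive normal number is about 2.2e-308) to exactly 0. A Python sketch of the effect and of the log-space remedy:

```python
import math

m = 0.00001702   # the optimized m reported above
base = 0.9       # any aggregate value even modestly below 1

# Direct exponentiation with exponent 1/m ~ 58,754 underflows to exactly 0.0:
direct = base ** (1.0 / m)
print(direct)    # 0.0

# The same quantity is perfectly representable in log space (about -6190):
log_result = (1.0 / m) * math.log(base)
print(log_result)

# Classification only needs an argmax, so the comparison can be done on the
# log values directly; exponentiate only at the very end, if at all.
```

If this is indeed what happens inside compute_similarity, comparing log-similarities instead of similarities would leave the classifier's ranking intact while avoiding the {0 0} result entirely.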