About rbettinger

rbettinger · ‎08-14-2025

You wrote: "My suspicion is that the one of the stored modules is calling ffss_classify with the old syntax. But that error should only appear if the ffss_classify is called or if a second function that calls ffss_classify is being stored." I checked all of my modules and found one that called FFSS_CLASSIFY with 5 parameters and not 6. I updated it and tried to run my code suite again. Then I said to myself, "No more Mr. Nice Guy. It's time for a cold boot" and I ran all of my modules again to update all of them in the catalog. Voila! Things began to work properly again. So I am moving forward. Thank you so much! Ross
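For readers who hit the same stale-module problem, the "cold boot" described above can be approximated along the following lines. This is only a sketch under stated assumptions: it assumes the library and catalog are featclus.featclus as in the earlier posts, that every module can be regenerated from source, and the %include file names are hypothetical placeholders for the programs that define and store each module.

```sas
proc iml ;
   reset storage=featclus.featclus ;
   show storage ;             /* list the matrices and modules currently stored */
   remove module=_all_ ;      /* drop every stored module, including stale callers */
quit ;

/* Re-run each program that defines and stores a module.        */
/* The file names below are hypothetical; substitute your own.  */
%include "ffss_classify.sas" ;
%include "ffss_classify_score_test.sas" ;
```

Clearing the catalog first means that no module compiled against the old 5-parameter signature can be picked up by a later LOAD MODULE=_ALL_ statement.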

rbettinger · ‎08-13-2025

Update to previous message: I changed the name of the IML catalog to featclus.featclus_new from featclus.featclus and submitted the code below. Mirabile dictu! Here is the log of the interpreted modules:

    OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
    options nonotes nosymbolgen ;
    options nonotes ;

    proc iml ;
    reset storage=featclus.featclus_test ;
    load module=_all_ ;

    start ffss_classify( data      /* matrix containing feature population, class variable */
                       , m         /* power parameter m for generalized mean */
                       , p         /* power parameter p for Minkowski distance function */
                       , dsn_out   /* output dataset of feature data, actual and pred class values */
                       , names_in  /* names of features + class_label in data matrix */
                       ) ;

       return results ;
    finish ffss_classify ;

    store module=ffss_classify ;
    quit ;

    proc iml ;
    reset storage=featclus.featclus_test ;
    load module=_all_ ;

    start ffss_classify( data        /* matrix containing feature population, class variable */
                       , m           /* power parameter m for generalized mean */
                       , p           /* power parameter p for distance function */
                       , dsn_ideals  /* input dataset containing ideal vectors */
                       , dsn_out     /* output dataset of feature data, actual and pred class values */
                       , names_in    /* names of features + class_label in data matrix */
                       ) ;
       return results ;
    finish ffss_classify ;

    /* store module=ffss_classify ; */
    quit ;

    OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

So if all that I did was to cause the modules to be stored in a new catalog, does that action suggest that there is something wrong in the IML function that updates modules? I tried "remove module=ffss_classify" before defining the 6-parameter module, but to no avail. Please tell me where I have gone wrong. This error has never happened to me before. Thanks, Ross

rbettinger · ‎08-13-2025

Rick, I have attached two SAS programs, each one defining a SAS/IML module. The first one has 5 parameters and has no errors--it is merely the empty body of the module. The second one has 6 parameters and one error: the "required argument is missing". But the second module is not invoked! So I am at a loss to understand why there is any interpretation error. Does SAS/IML check modules for compatibility prior to invocation? If not, why the error? Also, if I remove a module from a SAS/IML catalog (remove module=ffss_classify) and then define it again, is any trace of the old module left behind that might in some way influence the definition of a new module with the same name? For your lectorial convenience, here are the modules as embedded code in this question:

    options nonotes ;

    proc iml ;
    reset storage=featclus.featclus ;
    load module=_all_ ;

    start ffss_classify( data      /* matrix containing feature population, class variable */
                       , m         /* power parameter m for generalized mean */
                       , p         /* power parameter p for Minkowski distance function */
                       , dsn_out   /* output dataset of feature data, actual and pred class values */
                       , names_in  /* names of features + class_label in data matrix */
                       ) ;
       return results ;
    finish ffss_classify ;

    store module=ffss_classify ;
    quit ;

---------------------------------------------------------------------------------------------------

Log of the second program:

    options nonotes ;

    proc iml ;
    reset storage=featclus.featclus ;
    load module=_all_ ;
    ERROR: Too many arguments for function FFSS_CLASSIFY.

    start ffss_classify( data        /* matrix containing feature population, class variable */
                       , m           /* power parameter m for generalized mean */
                       , p           /* power parameter p for distance function */
                       , dsn_ideals  /* input dataset containing ideal vectors */
                       , dsn_out     /* output dataset of feature data, actual and pred class values */
                       , names_in    /* names of features + class_label in data matrix */
                       ) ;

       return results ;
    finish ffss_classify ;

    /* store module=ffss_classify ; */
    quit ;

rbettinger · ‎08-12-2025

I defined an IML module named FFSS_CLASSIFY. It originally had much more code than shown below, but I removed everything except the start/finish framework and the reset, load module, and store module environment to simplify my question. The question is: why am I getting the error "ERROR: A required argument is missing in the call to function FFSS_CLASSIFY."? The code below does not generate any errors, but neither does it do anything useful. Once I start to uncomment the parameters, errors show up and I cannot understand why. If I uncomment the reset, load, or store statements, errors show up. Please tell me that there is an easy solution so that I can watch my fingers dance on the keyboard once again. TYVM, Ross

    proc iml ;
    /* reset storage=featclus.featclus ; */
    /* load module=_all_ ; */

    start ffss_classify( data
                       /* , m */
                       /* , p */
                       /* , dsn_ideals */
                       /* , dsn_out */
                       /* , names_in */
                       ) ;
       /* empty function body */
       return results ;
    finish ffss_classify ;

    /* store module=ffss_classify ; */
    quit ;

rbettinger · ‎06-29-2025

Thank you, StatDave, for putting so much time and effort into producing a complete answer to my question. Ross

rbettinger · ‎06-26-2025

Here, in three tables, is what I am trying to do: These tables were created by my classifier. I want to compare the classifier performance to the logistic regression performance, so I thought that by computing LR results for "1 species vs the other two", I can form a meaningful comparison. Am I explaining myself clearly?

rbettinger · ‎06-25-2025

Thank you, StatDave, for asking me to refine my question. My goal is to compare the performance of a classifier algorithm to classification by logistic regression in the multinary case. When I have a class variable that has n categories, the classification matrix will have n rows and columns. When n = 2, we have the usual classification matrix, and we have the usual TP, FP, FN, TN frequency counts and corresponding statistics like accuracy, precision, recall, and specificity. But when n > 2, life becomes more interesting. I don't know of any corresponding statistics for n > 2 as for when n = 2, so if I compute classification matrices for, e.g., class1 vs aggregated (class2 and class3), I have reduced the multinary problem to n = 2 and can compute the usual stats. I can do the same thing for class2 vs aggregated (class1 and class3), etc. In this context, "aggregation" means that frequency counts for class1 are compared to frequency counts for class2 and class3 grouped into a single category, so we have the expression "class1 vs the rest" to describe the n = 2 classification matrix created. If I have muddled the statistical waters with the word "power", I apologize. There is no power of test involved here. I just want to simplify the problem to n = 2 using logistic regression to summarize the relationship between the designated class and any other classes.

rbettinger · ‎06-25-2025

Thank you for a prompt reply, Eduardo. I am trying to compare the performance of the indicated class (Class1, Class2, Class3) with all of the other classes that are "noise" compared to the "signal" that the indicated class represents. There are > 2 classes and I want to represent one of them as the target value, e.g., (Class1 = ( species = 1 )), and the data for the other two species are then grouped into "noise" so that the logistic regression algorithm will find the patterns in the "signal" and extract information from the "noise" of the other two classes.

rbettinger · ‎06-25-2025

I want to convert a dataset containing a class variable with > 2 levels into a dataset containing a class variable with only 2 levels. For example,

    data two_class ;
       set SASHELP.iris ;
       /* Species in SASHELP.iris takes the values Setosa, Versicolor, Virginica */
       class1 = upcase( species ) = 'SETOSA' ;
       class2 = upcase( species ) = 'VERSICOLOR' ;
       class3 = upcase( species ) = 'VIRGINICA' ;
    run ;

    proc logistic data=two_class ;
       model class1( event='1' ) = SepalWidth SepalLength PetalWidth PetalLength ;
    run ;

    /* same code but different model statement */
    proc logistic data=two_class ;
       model class2( event='1' ) = SepalWidth SepalLength PetalWidth PetalLength ;
    run ;

    /* and for class3 */
    proc logistic data=two_class ;
       model class3( event='1' ) = SepalWidth SepalLength PetalWidth PetalLength ;
    run ;

Is this good practice? Is there a better way of finding the one-vs-the-rest power of a single species? Thanks for your suggestions, Ross

rbettinger · ‎05-07-2025

Rick, Thank you for your persistence. I will describe my problem as best as I can so that you may detect any errors in my reasoning.

I am applying a fuzzy similarity measure from Lukasiewicz logic called a "normal Lukasiewicz structure". I have enclosed a partially-completed draft of a paper describing this concept. It is stated in reference [6] in the draft that is attached to this posting. I am trying to classify a set of data represented as columns of features that contain measures of a process, in this case, the well-known Wisconsin Breast Cancer data downloaded from the UCI KDD ML archives. Each column has been linearly standardized into [0,1].

I have composed an objective function based on the Lukasiewicz similarity structure to compute the parameters m and p in equation (6) in the draft. Essentially, the Lukasiewicz structure is the generalized mean of the sum of the Minkowski distances between each feature value and the class mean of that feature computed from each subset of the data that are classified into groups by their class values. Equation (6) describes the similarity between feature vector j and class mean vector i. Similarity values lie in the interval [0,1].

I use differential evolution (DE) to minimize an objective function based on the Lukasiewicz normal structure. The DE algorithm computes pairs m and p for a set of scenarios. I have specified parameter values for the DE algorithm and once all of the scenarios have been run, I choose the set of parameters that produce the maximum Enhanced Matthews Correlation Coefficient. Using this optimal pair of m and p, I score a validation dataset and use the performance metrics to measure the efficacy of the algorithm.

My difficulties arise when I compute the Lukasiewicz normal structure using parameter values m and p that force the Minkowski differences computed in equation (6) to be very close to 0 or to 1.

I have attached a SAS program, compute_similarity.sas, that computes the similarity between a feature vector and one or more class mean vectors. It is invoked by the module, ffss_classify_score_test.sas (also attached), which performs preliminary processing before invoking compute_similarity. I also include a PDF file containing the output of ffss_classify_score_test, which contains numerous examples of the case when

    ndx_pred_class = loc( sim_mat[ j, ] = max( sim_mat[ j, ] )) ;

returns two values, indicating a failure to produce a unique maximum value. This result is not due to a fault in the loc() function, because sim_mat[] contains {0 0} or {1 1} when two values are produced. Rather, the process resulting in the nonuniqueness of the output of the classifier is of interest, to wit, how can I overcome the tendency of values like

    .9999995284293320000 .0947296688581390000 .9999999949998210000 .0640424310483060000 .9999999999892080000
    .9999995534842310000 .1050134801112600000 .9999999936572220000 .1031645294170150000 .9999999992193940000

which are the contents of the compute_similarity dist[] matrix, to be converted to 1’s or 0’s in subsequent computation?

These are the inputs to compute_similarity:

    x
    .2574131800000000000 1.000000000000000000 .1562455800000000000 1.000000000000000000 .0796267300000000000

    y
    .0616844800000000100 .0947296700000000000 .0587415400000000100 .0640424500000000000 .0259968300000000000
    .1864923200000000000 .1050134800000000000 .1709497800000000000 .1031645300000000000 .1276107500000000000

    m
    .0000170200000000000

    p
    9.106516920000000000

The dist matrix represents the Lukasiewicz similarities between each element of the feature and the respective elements of the two class means to which the feature is being compared.

    dist
    .9999995284293320000 .0947296688581390000 .9999999949998210000 .0640424310483060000 .9999999999892080000
    .9999995534842310000 .1050134801112600000 .9999999936572220000 .1031645294170150000 .9999999992193940000

The rslt vector contains the similarities of the feature vector to each class mean. In this case, there appears to be no similarity, which I surmise is due to the limitations of finite precision arithmetic.

    rslt
    0.000000000000000000 0.000000000000000000
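One possible way to recover a unique maximum in a case like this is to compare the similarities on the log scale instead of raising the row means to the power 1/m. The sketch below is an illustration, not the approach taken in compute_similarity.sas: it reuses the dist values and m quoted above, and it assumes m > 0 and strictly positive row means. The names row_mean, sim, and log_sim are hypothetical. Because log is monotone increasing, the index of the largest log similarity is the index of the largest similarity, even when the similarities themselves underflow to 0.

```sas
proc iml ;
   /* dist matrix copied from the post: 2 class means x 5 features */
   dist = { .999999528429332 .094729668858139 .999999994999821 .064042431048306 .999999999989208,
            .999999553484231 .105013480111260 .999999993657222 .103164529417015 .999999999219394 } ;
   m = .00001702 ;

   row_mean = dist[ , : ] ;               /* mean of each row (one value per class mean) */
   sim      = row_mean` ## ( 1 / m ) ;    /* direct computation, as in the post           */
   log_sim  = log( row_mean` ) / m ;      /* log of the same quantity; assumes m > 0      */

   /* choose the class on the log scale */
   ndx_pred_class = loc( log_sim = max( log_sim )) ;

   print sim, log_sim, ndx_pred_class ;
quit ;
```

With these inputs the two row means differ (the second row is larger in the second and fourth columns), so the log similarities are distinct large negative numbers and the comparison would be expected to select a single class, whereas the direct powers are both reported as 0.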

rbettinger · ‎05-05-2025

Thank you for replying. In the interests of using my time well, I am going to avoid trying to solve this problem by rewriting the code that produces it.

rbettinger · ‎05-05-2025

Thank you for replying. In the interests of using my time well, I am going to avoid trying to solve this problem by rewriting the code that produces the problem.

rbettinger · ‎05-05-2025

Thank you for replying. In the interests of using my time well, I am going to avoid trying to solve this problem by rewriting the code that produces the problem. Let's say that I am using Captain Kirk's Kobayashi Maru solution, e.g., he reprogrammed the simulation that was designed to cause him to fail.

rbettinger · ‎05-04-2025

I am trying to use SAS/IML to perform computations with what in some cases are very small numbers, e.g., less than constant('maceps'), and am stymied by my inability to make the expression

    ndx = loc( vector = max( vector ))

return only one value instead of > 1. When, for example, vector = {8.372E-26 4.63E-103}, the variable ndx will contain {0 0} because the two values in vector are smaller than constant('maceps'), which is 2.2e-16. I am enclosing the code of an IML module which demonstrates what I want to do, and have included a listing of the inputs and outputs to and from the module for your perusal.

For small values of m and p, e.g., .01, computing ( x ## p - y ## p ) ## ( 1/p ) produces values close to 1, and I see that the results include 0 in the output, or some very small number such that computing 1 + number > 1 results in False, e.g., 0, and not True, e.g., 1. I would appreciate any suggestions because I have tried strategies such as scaling the x and y values by constant('maceps') or constant('small') but to no avail. Perturbing x[ j ] and y[ i, j ] by adding a uniform random variate in [0,1] # constant('maceps') similarly fails when 1/p is large, e.g., 1/p >> 1.

For example,

    proc iml ;
    a = 8.372E-26 ;
    b = 4.63E-103 ;
    p = .01 ;
    c = ( a ## p + b ## p ) ## ( 1/p ) ;
    print c ;
    quit ;

---------- Output: ---------

    c
    4.985E-19

TIA, Ross

    options nonotes ;

    proc iml ;
    reset storage=featclus.featclus ;
    load module=_all_ ;

    start compute_similarity( x, y, m, p ) ;
       /* compute unweighted Lukasiewicz ("normal" (unweighted) Lukasiewicz structure) similarity
        *
        * purpose: compute similarity between feature vector j, class mean(s) i
        *
        * parameters:
        * x ::= 1 x n_feature vector of features, each column normalized into [0, 1]
        *
        * y ::= n_class x n_feature matrix of class means, produced from normalized x features
        */

       if nrow( x ) ^= 1 then run error( 'compute_similarity', 'Feature vector x must be row vector' ) ;

       n_classes = nrow( y ) ;   /* # of class means in class mean matrix */
       n_feat    = ncol( x ) ;   /* # of features in x vector */

       dist       = j( n_classes, n_feat, . ) ;
       similarity = j( 1, n_classes, . ) ;

       do i = 1 to n_classes ;
          do j = 1 to n_feat ;
             /* compute similarity btwn x, y for each feature element x[ j ], class mean value y[ i, j ] */
             dist[ i, j ] = ( 1 - abs( x[ j ] ## p - y[ i, j ] ## p )) ## ( 1 / p ) ;
          end ; /* j */
       end ; /* i */

       /* compute similarity btwn x, y for each class mean
        * dist[ i , : ]` is mean of row vector for class mean i
        */
       similarity = dist[ , : ]` ## ( 1 / m ) ;

       return similarity ;
    finish compute_similarity ;

    x = { .1 .2, .3 .4, .5 .6, .7 .8 } ;
    if any( x = 0 ) then x[ loc( x=0 ) ] = constant( 'maceps' ) ;
    y = { .15 .35, .351 .45 } ;
    print x y ;

    rslt = compute_similarity( x[1,], y, 1, 1 ) ;
    print rslt ;
    rslt = compute_similarity( x[2,], y, .5, .5 ) ;
    print rslt ;
    rslt = compute_similarity( x[3,], y, .001, .01 ) ;
    print rslt ;
    rslt = compute_similarity( x[4,], y, .05, .05 ) ;
    print rslt ;
    rslt = compute_similarity( x[1,], y, 0.000024943605419, 4.148552428951820 ) ;
    print rslt ;

    ***store module=compute_similarity ;
    quit ;

---------- Output: ----------

    x            y
    0.1  0.2     0.15   0.35
    0.3  0.4     0.351  0.45
    0.5  0.6
    0.7  0.8

    rslt
    0.9        0.7495

    rslt
    0.6600432  0.8439022

    rslt
    0          6.03E-139

    rslt
    2.496E-10  3.9649E-6

    rslt
    8.372E-26  4.63E-103
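As a side note on the indexing question raised in this post, SAS/IML also offers the subscript reduction operator <:> (index of the maximum), which yields a single index rather than the list of all matching positions that loc() produces. The fragment below is a minimal sketch using the example vector from the post; the names ndx_all and ndx_one are hypothetical, and how ties are resolved should be confirmed against the SAS/IML documentation.

```sas
proc iml ;
   vector = { 8.372E-26 4.63E-103 } ;

   ndx_all = loc( vector = max( vector )) ;  /* every position equal to the maximum */
   ndx_one = vector[ <:> ] ;                 /* a single index of the maximum        */

   print ndx_all ndx_one ;
quit ;
```

This does not address the underflow itself: if both similarities have already collapsed to exactly 0 or exactly 1, any single index is an arbitrary choice between tied values.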

rbettinger · ‎03-17-2025

Your DO-Loop posting is a very clear, complete walk-through (traceback? ;-| ) of debugging an IML module. Thanks, Rick!
