Rick,
Thank you for your persistence. I will describe my problem as best I can so that you may detect any errors in my reasoning.
I am applying a fuzzy similarity measure from Lukasiewicz logic called a "normal Lukasiewicz structure". I have attached a partially completed draft of a paper describing this concept; the structure itself is defined in reference [6] of that draft.
I am trying to classify a dataset whose columns of features contain measures of a process, in this case the well-known Wisconsin Breast Cancer data downloaded from the UCI KDD ML archives. Each column has been linearly standardized into [0,1]. I have composed an objective function based on the Lukasiewicz similarity structure to compute the parameters m and p in equation (6) of the draft. Essentially, the Lukasiewicz structure is the generalized mean of the sum of the Minkowski distances between each feature value and the class mean of that feature, where the class means are computed from the subsets of the data grouped by class value. Equation (6) gives the similarity between feature vector j and class mean vector i; similarity values lie in the interval [0,1].
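To make the two ingredients concrete, here is a generic Python sketch of a Minkowski distance between a feature vector and a class mean, and of a generalized (power) mean. This is not equation (6) itself, which is only in the attached draft; the function names and the way the pieces would be combined are illustrative assumptions, posted in Python only because it is easy to run inline.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between vectors x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def generalized_mean(values, m):
    """Power mean with exponent m of a list of positive values."""
    n = len(values)
    return (sum(v ** m for v in values) / n) ** (1.0 / m)

# A feature vector and one class mean vector, rounded from the inputs
# shown later in this post:
x = [0.2574, 1.0, 0.1562, 1.0, 0.0796]
y = [0.0617, 0.0947, 0.0587, 0.0640, 0.0260]

d = minkowski(x, y, 2.0)  # p = 2 reduces to the ordinary Euclidean distance
```

With p = 2 and m = 1 these reduce to the familiar Euclidean distance and arithmetic mean, which is a quick sanity check on any implementation of equation (6).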
I use differential evolution (DE) to minimize an objective function based on the Lukasiewicz normal structure. The DE algorithm computes a pair (m, p) for each of a set of scenarios. Once all of the scenarios have been run under the DE parameter values I specified, I choose the (m, p) pair that produces the maximum Enhanced Matthews Correlation Coefficient. Using this optimal pair, I score a validation dataset and use the performance metrics to measure the efficacy of the algorithm.
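For readers unfamiliar with the optimizer, the loop I am describing can be sketched as a minimal pure-Python DE/rand/1/bin minimizer. The objective below is a toy quadratic standing in for my Lukasiewicz-based objective, and all names, bounds, and control parameters (F, CR, population size) are illustrative, not the ones in my SAS code.

```python
import random

def de_minimize(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=300, seed=1):
    """Minimal DE/rand/1/bin minimizer over box bounds [(lo, hi), ...]."""
    random.seed(seed)
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct donors, none equal to the target index i:
            a, b, c = random.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = random.randrange(dim)  # force at least one mutated coordinate
            trial = []
            for j in range(dim):
                if random.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            tc = f(trial)
            if tc <= cost[i]:  # greedy selection
                pop[i], cost[i] = trial, tc
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]

# Toy stand-in for the Lukasiewicz objective: minimum at (m, p) = (0.5, 3).
(m_best, p_best), best_cost = de_minimize(
    lambda v: (v[0] - 0.5) ** 2 + (v[1] - 3.0) ** 2,
    bounds=[(1e-6, 1.0), (0.1, 10.0)])
```

In my actual workflow the scenario loop repeats this minimization under several DE settings and then picks the (m, p) pair with the best Enhanced MCC on the training data.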
My difficulties arise when the computed parameter values m and p force the Minkowski differences in equation (6) very close to 0 or to 1.
I have attached a SAS program, compute_similarity.sas, that computes the similarity between a feature vector and one or more class mean vectors. It is invoked by the module ffss_classify_score_test.sas (also attached), which performs preliminary processing before invoking compute_similarity. I also include a PDF file containing the output of ffss_classify_score_test, which contains numerous examples of the case in which
ndx_pred_class = loc( sim_mat[ j, ] = max( sim_mat[ j, ] )) ;
returns two indices, indicating a failure to produce a unique maximum value. This result is not due to a fault in the loc() function, because sim_mat[] really does contain {0 0} or {1 1} whenever two indices are returned. Rather, it is the process producing the nonunique classifier output that interests me; to wit, how can I overcome the tendency of values like
.9999995284293320000
.0947296688581390000
.9999999949998210000
.0640424310483060000
.9999999999892080000
.9999995534842310000
.1050134801112600000
.9999999936572220000
.1031645294170150000
.9999999992193940000
which are the contents of the dist[] matrix in compute_similarity, to be converted to exact 1's or 0's in subsequent computations?
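The collapse to exact 1's is consistent with how IEEE double precision behaves near 1: once a similarity is within about 1e-16 of 1, it rounds to exactly 1.0, and two columns then tie at the maximum. A small Python illustration of the effect, and of the usual remedy of carrying the complement (or working through log1p-style functions) instead of the similarity itself; whether SAS/IML offers direct analogues of these helpers is something I would have to check in the documentation.

```python
import math

# Doubles carry ~16 significant digits, so quantities within ~1e-16 of 1
# collapse to exactly 1.0, and distinct similarities become indistinguishable:
a = 1.0 - 1e-17
b = 1.0 - 2e-17
print(a == 1.0, a == b)  # both round to exactly 1.0

# Carrying the small complement eps = 1 - sim instead keeps full precision:
eps_a, eps_b = 1e-17, 2e-17
print(eps_a == eps_b)    # False: the two values remain distinguishable

# log1p evaluates log(1 + x) accurately for tiny x, whereas forming 1 + x
# first destroys the information:
print(math.log1p(1e-17))        # ~1e-17
print(math.log(1.0 + 1e-17))    # exactly 0.0, because 1 + 1e-17 rounds to 1.0
```

If equation (6) can be rearranged so that the quantity stored is 1 - dist rather than dist, the near-1 values above stay distinct and the argmax tie disappears.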
These are the inputs to compute_similarity:
x
.2574131800000000000 1.000000000000000000 .1562455800000000000 1.000000000000000000 .0796267300000000000
y
.0616844800000000100 .0947296700000000000 .0587415400000000100 .0640424500000000000 .0259968300000000000
.1864923200000000000 .1050134800000000000 .1709497800000000000 .1031645300000000000 .1276107500000000000
m
.0000170200000000000
p
9.106516920000000000
The dist matrix contains the Lukasiewicz similarities between each element of the feature vector and the corresponding elements of the two class means to which the feature is being compared.
dist
.9999995284293320000 .0947296688581390000 .9999999949998210000 .0640424310483060000 .9999999999892080000
.9999995534842310000 .1050134801112600000 .9999999936572220000 .1031645294170150000 .9999999992193940000
The rslt vector contains the similarities of the feature vector to each class mean. In this case there appears to be no similarity at all, which I surmise is due to the limitations of finite-precision arithmetic.
rslt
0.000000000000000000 0.000000000000000000
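One plausible mechanism for the exact zeros, offered as a hypothesis rather than a reading of equation (6): if the final aggregation raises a quantity below 1 to a power involving 1/m, then with m = .00001702 the exponent is roughly 58,754, and the result silently underflows the double-precision range (smallest positive normal number is about 2.2e-308) to exactly 0. A Python sketch of the effect and of the log-space remedy:

```python
import math

m = 0.00001702   # the optimized m reported above
base = 0.9       # any aggregate value even modestly below 1

# Direct exponentiation with exponent 1/m ~ 58,754 underflows to exactly 0.0:
direct = base ** (1.0 / m)
print(direct)    # 0.0

# The same quantity is perfectly representable in log space (about -6190):
log_result = (1.0 / m) * math.log(base)
print(log_result)

# Classification only needs an argmax, so the comparison can be done on the
# log values directly; exponentiate only at the very end, if at all.
```

If this is indeed what happens inside compute_similarity, comparing log-similarities instead of similarities would leave the classifier's ranking intact while avoiding the {0 0} result entirely.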