About Rick_SAS

Rick_SAS · ‎06-17-2024

Also, if you are performing unconstrained optimization, you do not need to define the CON matrix. You can omit that argument instead of defining it as a matrix of missing values.
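As a minimal sketch of what omitting the argument looks like, the following SAS IML program minimizes a simple quadratic with the Newton-Raphson optimizer and passes no constraint matrix. The objective function f, the initial guess x0, and the choice of NLPNRA are assumptions made for illustration; they are not taken from the original thread.

```sas
proc iml;
/* hypothetical objective: minimum at (1, -2) */
start f(x);
   return( (x[1]-1)##2 + (x[2]+2)##2 );
finish;

x0  = {0 0};     /* initial guess */
opt = {0 0};     /* opt[1]=0: minimize; opt[2]=0: no printed output */

/* no constraint (BLC) argument: the problem is unconstrained */
call nlpnra(rc, xOpt, "f", x0, opt);
print rc xOpt;
quit;
```

The call simply stops after the options vector, so there is no need to build a matrix of missing values.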

Rick_SAS · ‎06-17-2024

I think you are saying that you want the symbols for the markers to be different for the different groups? If so, consider using ATTRPRIORITY=NONE:

   ods graphics / attrpriority=none;
   proc sgplot data=sashelp.cars(where=(origin^="USA"));
      vbox mpg_city / group=origin;
   run;

For more on this topic, see "The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics," which describes all the ways that you can get the group attributes to vary.

Rick_SAS · ‎06-12-2024

For those who do not know about silhouette analysis, see "What is the silhouette statistic in cluster analysis?" As of May 2023, there was not a built-in procedure in SAS that computed the silhouette statistic. I wrote some SAS IML functions to compute the silhouette statistic and to create silhouette plots. You can see my blog post for details and examples. The code is freely available on GitHub. I don't know whether it will serve your needs, but you are welcome to it.

Rick_SAS · ‎06-11-2024

I assume X is sorted by the first column?

1. Unless you have some very extreme outliers, the maximum likelihood estimates for 4 million observations are going to be very similar to the estimates for a smaller subset of the data. If possible, use a smaller subset (like 8K-10K observations) to obtain preliminary estimates, then use those estimates as an initial guess for the MLE applied to the whole data set.

2. I think your logic is wrong in the line

      if LL < 1/(2**500) then LL = 1/(2**500);

   LL is a vector, which means that the statement is interpreted as

      if ALL(LL < 2##(-500)) then ...

   See "IF-THEN logic with matrix expressions" on The DO Loop (sas.com).

3. The following parameters only need to be computed once, prior to calling the optimizer. They do not need to be computed during each call to the log-likelihood (LL) function.

      sess_credits  = max(x[idx,5]`);
      sl_fe_scroll  = max(x[idx,6]`);
      sl_pos_scroll = max(x[idx,7]`);
      max_pos       = max(max(x[idx,8]`),3);
      wk            = max(x[idx,9]`);

4. I don't think you need to subset the data every time you call the LL. For example, if you are analyzing 3 groups, you could subset X into x1, x2, x3 before calling the optimizer. You would use x1, x2, and x3 on the GLOBAL clause instead of X. If you are analyzing many groups (or you don't know how many groups you will be analyzing), you can build a list of subsets L such that L$i is the i_th subset. I suspect that precomputing the subsets of x will improve the performance.
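Regarding point 2, here is a minimal sketch of an elementwise lower bound, which is presumably what the original statement intended. The example vector LL and the name eps are made up for illustration; only the threshold 2##(-500) comes from the post.

```sas
proc iml;
LL  = {0.5, 0, 1e-200};      /* hypothetical likelihood values */
eps = 2##(-500);             /* lower bound, about 3E-151 */

/* Option 1: find the elements that are too small and replace them */
idx = loc(LL < eps);
if ncol(idx) > 0 then LL[idx] = eps;

/* Option 2: the elementwise maximum operator does the same in one step */
LL2 = {0.5, 0, 1e-200} <> eps;

print LL LL2;
quit;
```

Both forms act on each element separately, whereas an IF-THEN statement on a vector makes a single true/false decision for the whole vector.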

Rick_SAS · ‎06-10-2024

What is the purpose of your study? The statistic itself is sometimes reported differently in different software. This has been discussed before. There are several different statistics that can be used for the signed rank test. See "On the computation of the Wilcoxon signed rank statistic." Regarding the p-values and continuity corrections, there is a modification to the test statistic due to Pratt, which will affect the p-values. For a discussion of that and other issues, see "Modifications of the Wilcoxon signed rank test and exact p-values." Both articles contain references.

Rick_SAS · ‎06-06-2024

I think what many people do is to use Unicode characters, such as the "RIGHTWARDS ARROW" symbol, which has the Unicode code point U+2192. But HOW you use it depends on what you are trying to do. Put it in a title or footnote? Use it as the value of an observation in a data set? I recommend that you perform an internet search for your terms, such as "use unicode characters in SAS reports" or something similar. There have been several papers and articles on this topic. Here is one that came up when I did a search: Useful Tips for Handling and Creating Special Characters in SAS® (pharmasug.org)
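For the title or footnote case, one common approach is the ODS escape sequence with the unicode function. The following is only a sketch: the title text is invented, and whether the arrow renders depends on the ODS destination and the font in use.

```sas
/* (*ESC*) is the default ODS escape sequence; '2192'x is RIGHTWARDS ARROW */
title "Before (*ESC*){unicode '2192'x} After";
proc print data=sashelp.class(obs=3);
run;
title;   /* clear the title */
```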

Rick_SAS · ‎06-05-2024

When you use PROC FASTCLUS, use the OUT= option to create an output data set. The output data set contains all the original observations and some new variables. Among the new variables, the CLUSTER variable specifies the cluster to which each observation is assigned. In your example, the CLUSTER variable will contain the values 1-4. So, for example, if you want to analyze the text terms that are in the first cluster, you can use a WHERE statement (or clause) such as WHERE CLUSTER=1; The documentation for PROC FASTCLUS contains several examples. I suggest starting with the Getting Started example.
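A minimal sketch of that workflow follows. It uses the sashelp.iris data as a stand-in because the original data were not posted; the output data set name Clus is an arbitrary choice.

```sas
/* cluster into 4 groups and save the assignments */
proc fastclus data=sashelp.iris maxclusters=4 out=Clus noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* how many observations are in each cluster? */
proc freq data=Clus;
   tables Cluster;
run;

/* analyze only the observations in the first cluster */
proc means data=Clus;
   where Cluster=1;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```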

Rick_SAS · ‎06-04-2024

For a discussion of the DATA step and which functions can run in CAS, see the article "A list of SAS DATA step functions that do not run in CAS."

Rick_SAS · ‎06-02-2024

If I understand your question, I believe you are referring to a phenomenon known as "label switching." The FMM documentation for the PARTIAL= option states:

   "In a model in which label switching is a problem, you can sometimes avoid switching by assigning just a few observations to categories. For example, in a three-component model, switching might be prevented if you assign the observation that has the smallest response value to the first component and the observation that has the largest response value to the last component."

Alternatively, you can try to post-process the labels by looking at the ParameterEstimates table. If the Intercept1 term is greater than the Intercept2 term, then switch the labels for the parameters and for the mixing probability.

I don't think you want to use the RESTRICT statement. That will impose constraints on the parameters for the first and second components, but it does not affect which components are assigned as first and second.

Rick_SAS · ‎05-31-2024

The First Component:
1. For observations for which group="group 1", the conditional distribution of dk_cons is f_11 = N(Intercept_1 + beta_1, 1.0489), where the second argument is the estimated Variance
2. For observations for which group="group 2", the conditional distribution of dk_cons is f_12 = N(Intercept_1, 1.0489)

The Second Component:
1. For observations for which group="group 1", the conditional distribution of dk_cons is f_21 = N(Intercept_2 + beta_2, 1.0197), where the second argument is the estimated Variance
2. For observations for which group="group 2", the conditional distribution of dk_cons is f_22 = N(Intercept_2, 1.0197)

Let p = 0.2573 be the mixing probability. Then the full model is:
- For observations for which group="group 1", the conditional distribution is p*f_11 + (1-p)*f_21
- For observations for which group="group 2", the conditional distribution is p*f_12 + (1-p)*f_22
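If it helps to see the two mixture densities, the following DATA step sketches how you could evaluate and plot them. The mixing probability and the variances are the values quoted above, but the intercepts and the beta terms are HYPOTHETICAL placeholders because those estimates do not appear in this post; substitute the values from your ParameterEstimates table. Note that the PDF function expects a standard deviation, so the code takes the square root of each variance.

```sas
data MixDensity;
   p  = 0.2573;                    /* mixing probability (from the post) */
   v1 = 1.0489;  v2 = 1.0197;      /* component variances (from the post) */
   Intercept_1 = 0;  beta_1 = 1;   /* HYPOTHETICAL placeholder values */
   Intercept_2 = 3;  beta_2 = 1;   /* HYPOTHETICAL placeholder values */
   do x = -4 to 8 by 0.05;
      f_11 = pdf("Normal", x, Intercept_1 + beta_1, sqrt(v1));
      f_12 = pdf("Normal", x, Intercept_1,          sqrt(v1));
      f_21 = pdf("Normal", x, Intercept_2 + beta_2, sqrt(v2));
      f_22 = pdf("Normal", x, Intercept_2,          sqrt(v2));
      Group1 = p*f_11 + (1-p)*f_21;   /* density for group="group 1" */
      Group2 = p*f_12 + (1-p)*f_22;   /* density for group="group 2" */
      output;
   end;
   keep x Group1 Group2;
run;

proc sgplot data=MixDensity;
   series x=x y=Group1;
   series x=x y=Group2;
   yaxis label="Mixture density";
run;
```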

Rick_SAS · ‎05-31-2024

The image you posted looks like a summary of the descriptive statistics for variables. You want to use the ParameterEstimates table to interpret the model. In the upper left corner of the screen is a drop-down menu that says PostSummaries. Click on that and see if ParameterEstimates is another option to view. In addition, for the FMM procedure, you will need to output the MixingProbs table, so add the statement

   ods output ParameterEstimates=ParameterEstimates MixingProbs=MixingProbs;

The parameter estimates give the parameters for each component. You will see a column 'Component' that has values 1 and 2, and parameter estimates for "Intercept", the relative increment for each class level, and the variance for each component. The MixingProbs table will have one mixing probability. Use 1-probability as the second mixing probability.
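To show where the ODS OUTPUT statement goes, here is a sketch of a two-component PROC FMM call. The data set name Have, the response dk_cons, and the classification variable group are assumptions, since the original program is not shown here; replace them with your own names.

```sas
proc fmm data=Have;                 /* Have is a hypothetical data set name */
   class group;                     /* assumed classification variable */
   model dk_cons = group / k=2;     /* two-component normal mixture */
   ods output ParameterEstimates=ParameterEstimates
              MixingProbs=MixingProbs;
run;

/* inspect the saved tables */
proc print data=ParameterEstimates; run;
proc print data=MixingProbs; run;
```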

Rick_SAS · ‎05-30-2024

I cannot visualize what you are trying to do, but consider whether you need to use a second SCATTER statement to get the symbols that you need. The first SCATTER statement uses the GROUP= option, so the marker colors and symbols are tied to the group variable (i). You might need to structure the data differently and use a second scatter plot overlay to achieve your result.

Rick_SAS · ‎05-26-2024

I think you can plot a density estimate of the failures and successes and look for peaks (modes) of the density estimates. I'd use PROC UNIVARIATE and a CLASS statement for the failure flag. If both KDEs are approximately constant, then the quality does not exhibit "clusters" in the way that you describe. I don't work with time-of-day data very often, but the following code might help you get started:

   proc univariate data=have;
      class fail;
      histogram time / kernel(lower='0:00't upper='24:00't) nobars
                       endpoints=('0:00't to '24:00't by '1:00't);
      /* how to format the X axis??? */
   run;

It isn't clear to me whether you should let the procedure choose the kernel bandwidth based on the data or figure out a bandwidth that you will always use. For example, compare the output of the above to the output from the KERNEL(C=0.5) option.

Rick_SAS · ‎05-23-2024

In many cases, you can simply use a SET statement to vertically concatenate the data from different data sets. You might need to rename variables if two data sets have variables with the same name. For an example, see "How to overlay custom curves with PROC SGPLOT."
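As a rough illustration of the idea, the following sketch overlays a custom curve on a scatter plot by concatenating two data sets. The curve and the names Curve, All, cx, and cy are invented for this example; the curve variables are given new names so that they do not collide with the variables in the original data.

```sas
/* a custom curve, evaluated on a grid */
data Curve;
   do cx = 1 to 8 by 0.1;
      cy = 40 - 8*cx + 0.7*cx**2;   /* arbitrary example curve */
      output;
   end;
run;

/* vertically concatenate the original data and the curve */
data All;
   set sashelp.cars(keep=EngineSize MPG_City) Curve;
run;

proc sgplot data=All noautolegend;
   scatter x=EngineSize y=MPG_City;
   series  x=cx y=cy;
run;
```

The concatenated data set contains missing values for cx and cy in the original rows and for EngineSize and MPG_City in the curve rows, so each plot statement draws only its own observations.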

Rick_SAS · ‎05-22-2024

First, use PROC GLM to get the parameter estimates for the regression model:

   data Have;
      set sashelp.cars;
      x = EngineSize;
      y = MPG_City;
   run;

   proc glm data=Have;
      model y = x x*x / solution;
   quit;

Then you have two choices:
1. Display the regression coefficients in a table as name-value pairs
2. Display the regression line as an equation

For the first case (the table), here is an example:

   proc sgplot data=Have noautolegend;
      reg Y=y X=x / degree=2;
      inset ("Intercept" = "39.26" "x" = "-8.65" "x^2" = "0.74") / title="Regression Coefficients" opaque border;
   run;

For the second case (an equation), here is an example:

   proc sgplot data=Have noautolegend;
      reg Y=y X=x / degree=2;
      inset "y = 39.26 - 8.65*x + 0.74*x^2" / opaque border;
   run;

If necessary, there are also ways to get an inline superscript instead of x^2.
