About lvm

lvm · ‎11-10-2015

Correction: it took about 11 sec of real time on 50 M records.

lvm · ‎11-10-2015

Out of curiosity, I just simulated 50 million observations and determined the median and quartiles with PROC HPSUMMARY. No problem. I did this with "only" 8 GB of memory and a slow processor. Took less than 1/2 second of real time. It is important to use the P2 method of quantile estimation (an approximation). The default (OS) requires internal ordering of the observations, which is a challenge with so many observations.

    proc hpsummary data=a qmethod=p2;
        var y;
        output out=out q1=q1 q3=q3 median=median mean=mean;
    run;
    proc print data=out; run;
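The post does not show how the 50 million observations were generated. A minimal sketch of one way to build such a test data set follows. The data set name a and the variable y match the HPSUMMARY call in the post; the standard normal distribution and the seed are assumptions made only for illustration.

```sas
/* Sketch only: the distribution and seed are assumptions, not from the post */
data a;
    call streaminit(2015);          /* fix the random stream so the run is repeatable */
    do i = 1 to 50000000;           /* 50 million observations */
        y = rand('normal', 0, 1);   /* standard normal draw */
        output;
    end;
    drop i;
run;
```

With a symmetric distribution like this one, the median and mean reported by HPSUMMARY should be close to each other, which gives a quick sanity check on the P2 approximation.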

lvm · ‎11-09-2015

There are several "HP" (high performance) procedures now. They run on a single machine or in distributed mode. There is no HPMEANS, but there is HPSUMMARY. This might work for your purpose of getting quantiles with very large data sets. With 40 M observations, quantiles will be difficult to get without the tricks of large-scale computing. At some point, you would need to get the distributed computing products. For some procedures, SAS 9.4 won't allow the (non-HP) procedure to run if it will take too much time. This is frustrating. I have 9.3 and 9.4 on my desktop, and I can fit a mixed model on a large data set with 9.3 (taking many hours), but in 9.4 I just get a message that it would take too long to run.

lvm · ‎11-09-2015

I presume you mean PROC MIXED. You should use HPMIXED. This should handle your memory problems. Most of the syntax is the same, although there are far fewer options with HPMIXED. http://support.sas.com/resources/papers/proceedings09/256-2009.pdf The mixed model equations can consist of some very large matrices; inverting them takes a great deal of time and memory when there are many random effects. I highly recommend that you figure out how to use HPMIXED.
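As a rough illustration of how similar the syntax is, here is a minimal HPMIXED sketch for a random-intercept model. The data set name (big) and the variables (y, trt, x, subject) are hypothetical, since the original question's model is not shown here; check the HPMIXED documentation for which MIXED options carry over.

```sas
/* Hypothetical data set and variable names; not taken from the original thread */
proc hpmixed data=big;
    class subject trt;
    model y = trt x / solution;            /* fixed effects, print estimates */
    random intercept / subject=subject;    /* one random intercept per subject */
run;
```

The same CLASS, MODEL, and RANDOM statements would be accepted by PROC MIXED, so a model can often be moved over by changing the procedure name and dropping any unsupported options.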

lvm · ‎11-09-2015

I can't tell if you are fitting the right model since you don't give the full simulation code. Don't know what group and ID are, based on your code. I assume the right model. You should note that when you use a 'residual' for binary or binomial data, you are getting quasi-likelihoods, not true likelihoods. It is difficult to interpret an R-side var-covar matrix with binary data, other than it is giving you desired inflation of the SEs of the fixed effect parameters (and adjustments for test statistics). But this is just background. GENMOD and GLIMMIX use different estimation methods, even for working correlation matrix models, such as the one you are using. The actual algorithm is different. So, one may converge more readily than the other for certain types of data sets and models.

lvm · ‎11-09-2015

Those last columns in the output are often known as "predictive values" (false positive predictive values). These are estimates of posterior probabilities. You can get your false positive and false negative percentages by simply subtracting specificity and sensitivity, respectively (both in the table), from 100.
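To make the distinction concrete, here are the standard definitions in terms of the 2x2 classification counts (TP, FP, TN, FN are notation introduced here, not names from the output). How the table's last columns map onto the predictive-value quantities is my reading of the post, so verify it against the documentation for your SAS release.

```latex
\begin{align*}
\text{sensitivity} &= \frac{TP}{TP + FN}, &
\text{specificity} &= \frac{TN}{TN + FP}, \\[4pt]
\text{false positive rate} &= \frac{FP}{FP + TN} = 1 - \text{specificity}, &
\text{false negative rate} &= \frac{FN}{FN + TP} = 1 - \text{sensitivity}, \\[4pt]
1 - \text{PPV} &= \frac{FP}{TP + FP}, &
1 - \text{NPV} &= \frac{FN}{TN + FN}.
\end{align*}
```

The rates in the middle row condition on the true status, while the predictive-value quantities in the last row condition on the predicted status, which is why the two sets of numbers generally differ.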

lvm · ‎11-04-2015

I recommend that you check out PROC POWER. It should have what you need. Also, read: http://support.sas.com/kb/48/616.html

lvm · ‎11-04-2015

You can get the EBLUPs easily enough with a solution option on a random statement:

    random hospital / s;

You could store these with the appropriate ODS OUTPUT statement (look up the name for the random effect predictions). You can get the inverse link for each subject using ESTIMATE statements. For instance, assume you have one predictor variable X; the predictions for hospitals one and three are:

    class hospital .....<other terms> ...;
    model y = X / dist=binomial link=logit;
    random hospital / s;
    estimate 'x 1 hosp 1' int 1 X 1 | hospital 1 / ilink;
    estimate 'x 1 hosp 3' int 1 X 1 | hospital 0 0 1 / ilink;

My code gets the prediction for X=1 (if X is continuous) or the first level of X if it is a factor. The ILINK option uses the delta method for the SEs. I have never heard of predictive margins, and have no idea why one assumes that all observations belong, in turn, to each subject. I assume there are good reasons. But based on what you wrote, you would need to write code in a data step (or multiple data steps) to do what you want. It would take some time.
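For reference, this is what the ILINK option computes for a logit link, written as a short sketch. The symbols are notation introduced here: b0 is the intercept, b1 the slope for X, and u_h the predicted random effect for hospital h.

```latex
\begin{align*}
\hat{\eta}_h &= \hat{b}_0 + \hat{b}_1 x + \hat{u}_h
  && \text{(linear predictor on the logit scale)} \\[4pt]
\hat{p}_h &= \frac{1}{1 + \exp(-\hat{\eta}_h)}
  && \text{(inverse link)} \\[4pt]
\operatorname{SE}(\hat{p}_h) &\approx \hat{p}_h\,(1 - \hat{p}_h)\,\operatorname{SE}(\hat{\eta}_h)
  && \text{(delta method, since } dp/d\eta = p(1-p)\text{)}
\end{align*}
```

Because the delta-method SE is a first-order approximation, it is least reliable when the predicted probability is close to 0 or 1.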

lvm · ‎10-30-2015

There have been several posts, with answers, on the coding and meaning of interactions of factors and continuous variables.

lvm · ‎10-30-2015

I tried the file upload from two different computers, with the same failed result. But I pasted the full program within the post, so there is no urgency.

lvm · ‎10-30-2015

I tried again to attach a file, with the lower case sas extension. It still said that the contents didn't match. So I am pasting the entire program here. First linear and then nonlinear, for three quantiles (the quartiles). A work in progress (no guarantees).

    /* Linear QR with Asymmetrical Laplace (AL) distribution */
    /* (may or may not be working correctly. No warranties.) */
    title 'Quantile regression with Asymmetrical Laplace';
    data t;
        call streaminit(12345);
        do x = 0 to 100 by 10;
            do j = 1 to 10 by 1;
                e = rand('norm',0)*12;
                mu = 10 + 1.5*x;
                y = mu + e;
                output;
            end;
        end;
    run;

    proc quantreg data=t;
        model y = x / quantile = 0.25, 0.5, 0.75;
    run;

    proc nlmixed data=t tech=nmsimp maxiter=1000 ;
        title2 'median (tau=0.5)';
        *---must do initial search of parameter estimates to get close;
        parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
        tau = 0.5; *<--for median. change to 0.75 for 75th quantile, etc.;
        *---in example, nu is intercept, and beta is slope;
        eta = nu + beta*x; *<--change this to any function (linear or nonlinear);
        *---rest does not change;
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    proc nlmixed data=t tech=nmsimp maxiter=10000 ;
        title2 '75% quantile (tau=0.75)';
        parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
        tau = 0.75;
        eta = nu + beta*x;
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd75 ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd75;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    proc nlmixed data=t tech=nmsimp maxiter=1000 ;
        title2 '25% quantile (tau=0.25)';
        parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
        tau = 0.25; *<--define the desired quantile;
        eta = nu + beta*x;
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd25 ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd25;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    *---merge files with predicted values;
    data predco;
        merge prd25(rename=(pred=pred25)) prd(rename=(pred=pred50)) prd75(rename=(pred=pred75));
    run;
    proc sgplot data=predco;
        title2 'view three predicted quantiles';
        scatter y=y x=x;
        series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
        series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
        series y=pred75 x=x / lineattrs=(pattern=3 color=blue thickness=2);
    run;

    /* Now for a possible nonlinear case */
    title 'Quantile regression for nonlinear model example, using AL distribution';
    data l;
        call streaminit(12345);
        do x = 0 to 100 by 10;
            do j = 1 to 10 by 1;
                e = rand('norm',0)*12;
                logit = -5 + .08*x;
                mu = 100/(1+exp(-logit));
                y = mu + e;
                output;
            end;
        end;
    run;
    *proc print data=l;run;
    proc sgplot data=l;
        scatter y=y x=x;
    run;

    proc nlmixed data=l tech=nmsimp maxiter=1000 ;
        title2 'median (tau=0.5)';
        parms nu =-12 to 6 by 1 beta = .0 to .4 by .05 sigma = 1 to 8 by 1 ;
        tau = 0.5;
        eta = 100/(1+exp(-(nu + beta*x)));
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    proc nlmixed data=l tech=nmsimp maxiter=10000 ;
        title2 '75% quantile (tau=0.75)';
        parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;
        tau = 0.75;
        eta = 100/(1+exp(-(nu + beta*x)));
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd75 ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd75;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    proc nlmixed data=l tech=nmsimp maxiter=10000 ;
        title2 '25% quantile (tau=0.25)';
        parms nu =-12 to 6 by .5 beta = .0 to .4 by .025 sigma = 1 to 6 by .5 ;
        tau = 0.25;
        eta = 100/(1+exp(-(nu + beta*x)));
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd25 ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd25;
        scatter y=y x=x;
        series y=pred x=x;
    run;

    data predco;
        merge prd25(rename=(pred=pred25)) prd(rename=(pred=pred50)) prd75(rename=(pred=pred75));
    run;
    ods html style=analysis;
    proc sgplot data=predco;
        scatter y=y x=x;
        series y=pred25 x=x / lineattrs=(pattern=1 color=red thickness=2);
        series y=pred50 x=x / lineattrs=(pattern=2 color=black thickness=2);
        series y=pred75 x=x / lineattrs=(pattern=3 color=blue thickness=2);
    run;

lvm · ‎10-30-2015

Rick, thanks for the comments. I am well aware of the line-crossing issue. I just like when it doesn't happen, but I know it does. Another good source on the AL distribution is Geraci and Bottai (Stat. Comput. 2014. 24: 461-479). This is for quantile mixed models, which is why I learned about it. But section 2 of the article is a good synopsis of the application of this distribution. You can read about the link between the distribution and the L1 norm that is commonly used for quantile regression. As I wrote last night, my coding for this problem is a work in progress. I make no guarantees at this point. I will try posting my full program in a following post.

lvm · ‎10-29-2015

The website won't let me attach the sas file. It keeps telling me that content does not match file type (which is ridiculous). I can cut and paste the total contents in a message if you want it.

lvm · ‎10-29-2015

If you truly want quantile regression (QR) for models that are nonlinear in the parameters, then Rick is right, there is no dedicated procedure. However, if you are knowledgeable in statistics and know how to program in NLMIXED, there may be hope. There have been several developments in "parametric" approaches to the QR problem. See articles by Geraci and colleagues (maybe start with a 2008 Biostatistics article). Their approach is for linear models, and most recently for linear mixed quantile models, but the following points should still apply to nonlinear.

The idea is to pretend that the data have an asymmetric Laplace (AL) distribution. This is a 3-parameter distribution, where one of the parameters, eta, is the tau-th quantile. That is, tau defines the desired quantile and is fixed by the user; e.g., if one specifies tau=0.5, then eta is the median; with tau=0.75, eta is the upper quartile; and so on. Eta could either be a single number, or expanded, so that eta = f(X, other parameters). One can fit this model, in principle, by defining the likelihood in NLMIXED. I say "in principle" because I am not yet convinced that I have worked out the details. I have played with this a few times over the last couple of years, but never got far enough.

There are several problems with my effort so far, but you may find this helpful. The code below runs, but it doesn't necessarily work. That is, I generate data for a linear model (as an example), and then fit a linear model for the tau-th quantile using NLMIXED (changing one line of code could make it appropriate for a nonlinear model). It converges, but the Hessian has a negative eigenvalue, which is a pathological problem for the standard errors. But I don't think the SEs would be valid anyway, because one is only using the AL as a crank to get the quantiles, not for inference. One probably should do bootstrapping for measures of uncertainty.

It is critical that the starting values for the parameters are CLOSE to the true values before the optimization starts. Thus, you need a dense grid search specified in the parms statement. If initial guesses are not close, the final estimates will be far from the correct ones. I found it helpful to use an optimization method that does not use derivatives (nmsimp).

You should only treat my code as a guide. I am not convinced that I am doing this correctly, so others may find problems with my logic or my coding. I have played with fitting a nonlinear model. I can get it to work, but results can be a bit strange with nonlinear quantile models. The different prediction lines can actually cross. I am also attaching example sas code to do three different quantiles for the linear case, and also three quantiles for a nonlinear example. You should know that the results do NOT duplicate the results from PROC QUANTREG for the linear case (they may be close). Based on Geraci, this is not unexpected.

    data t;
        call streaminit(12345);
        do x = 0 to 100 by 10;
            do j = 1 to 10 by 1;
                e = rand('norm',0)*12;
                mu = 10 + 1.5*x;
                y = mu + e;
                output;
            end;
        end;
    run;

    proc nlmixed data=t tech=nmsimp maxiter=1000 ;
        title2 'median (tau=0.5)';
        *---must do initial search of parameter estimates to get close;
        parms nu =0 to 20 by 2 beta = 0.5 to 3.5 by .5 sigma = 1 to 6 by 1 ;
        tau = 0.5; *<--for median. change to 0.75 for 75th quantile, etc.;
        *---in example, nu is intercept, and beta is slope;
        eta = nu + beta*x; *<--change this to any function (linear or nonlinear);
        *---rest does not change;
        diff = y - eta;
        rho = diff*(tau - (diff < 0));
        ll = log(tau*(1-tau)) - log(sigma) - rho/sigma ;
        model y ~ general(ll); *--Tell NLMIXED that ll (defined above) is the user-specified log-likelihood;
        estimate 'var(quant)' sigma**2;
        predict eta out=prd ;
    run;
    *proc print data=prd;run;
    proc sgplot data=prd;
        scatter y=y x=x;
        series y=pred x=x;
    run;
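For readers following the NLMIXED statements, the ll line corresponds to the log-density of the asymmetric Laplace distribution as it is usually written. The notation below (the check function written as rho_tau, the indicator I) is introduced here to match the code's variables eta, sigma, tau, diff, and rho.

```latex
\begin{align*}
\rho_\tau(u) &= u\,\bigl(\tau - I(u < 0)\bigr)
  && \text{(check function; code: rho, with } u = \texttt{diff} = y - \eta\text{)} \\[4pt]
f(y \mid \eta, \sigma, \tau) &= \frac{\tau(1-\tau)}{\sigma}
  \exp\!\left\{-\rho_\tau\!\left(\frac{y - \eta}{\sigma}\right)\right\},
  \qquad \sigma > 0,\; 0 < \tau < 1 \\[4pt]
\log f(y \mid \eta, \sigma, \tau) &= \log\bigl(\tau(1-\tau)\bigr) - \log\sigma
  - \frac{\rho_\tau(y - \eta)}{\sigma}
  && \text{(code: ll)}
\end{align*}
```

For a fixed sigma, maximizing the sum of these log-densities over the parameters in eta is the same as minimizing the sum of check-function losses, which is the objective used in ordinary quantile regression. That is the sense in which the AL likelihood serves only as a device for obtaining the quantile.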

lvm · ‎10-16-2015

You don't need to correct for this -- it is automatically taken care of. Do not use the 'random _residual_' statement for this purpose (this is for overdispersion: higher variability than one would obtain with a binomial distribution).
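A short sketch of why this works, assuming the question concerns binomial counts y_i out of n_i trials with different n_i per group (the symbols n_i, p_i, and phi are notation introduced here):

```latex
\begin{align*}
\text{binomial:} \quad & \operatorname{Var}(y_i) = n_i\,p_i\,(1 - p_i) \\[4pt]
\text{overdispersed:} \quad & \operatorname{Var}(y_i) = \phi\,n_i\,p_i\,(1 - p_i), \qquad \phi > 1
\end{align*}
```

The group size n_i is already part of the binomial variance, so groups with more trials automatically carry more weight in the fit. The multiplicative scale parameter phi addresses a different issue, namely variability beyond what the binomial allows.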

