About halladje

halladje · 01-08-2020

Hi there, I am still experiencing issues with pooling ICCs. Please let me know if you have any further suggestions. Thanks so much, Jillian

seeff · 12-10-2019

1. There are a lot of resources around the web that explain how ML handles missing data, but here is a nice paper from a SAS forum: http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf
2. FIML and ML are different terms for the same thing. "Missingness" in the sense in which you are using it refers to the outcome, not the predictor variables. So, yes, PROC MIXED deletes an observation if a predictor variable is missing, but retains an observation if the predictors are valid but the outcome is missing.

StatDave · 11-25-2019

GENMOD does not estimate multilevel models. Only one REPEATED statement is allowed.

PaigeMiller · 10-11-2019

@halladje wrote:
> Hello there, 3 level explanation: students (level 1) clustered in classrooms (level 2) clustered in schools (level 3). I have ~30,000 students in ~2,000 classrooms in ~200 schools.

I would use the word "nested", not "clustered". Yes, GLIMMIX can handle nesting.

> upper level effects explanation: using teacher reported variables (class level) and student aggregated variables to the school level (school level variables) as independent variables in the model.

Yes, you can aggregate the data to any level you think is appropriate.

> Can GENMOD also model random intercept models? What would be the benefit of using one over the other? Can both calculate an accurate ICC for continuous and binary outcome data?

I don't know.
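A minimal Python sketch (an illustration, not code from the thread) of what aggregating a student-level variable to the school level amounts to: compute each school's mean over its students, skipping missing values, then attach that mean back to every student record so it can enter the model as a school-level predictor. The field names `school`, `classroom`, `score` and the toy records are hypothetical.

```python
from collections import defaultdict


def aggregate_to_level(records, group_key, var):
    """Return {group: mean of var}, ignoring records where var is missing (None)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        value = rec.get(var)
        if value is None:  # skip missing values, as PROC MEANS/SUMMARY does by default
            continue
        sums[rec[group_key]] += value
        counts[rec[group_key]] += 1
    return {g: sums[g] / counts[g] for g in sums}


def attach_group_means(records, group_key, var, new_name):
    """Copy each record and add the group-level mean of var under new_name."""
    means = aggregate_to_level(records, group_key, var)
    return [dict(rec, **{new_name: means.get(rec[group_key])}) for rec in records]


# Hypothetical student-level records (level 1) nested in classrooms and schools.
students = [
    {"school": "A", "classroom": "A1", "score": 10.0},
    {"school": "A", "classroom": "A2", "score": 14.0},
    {"school": "B", "classroom": "B1", "score": 7.0},
    {"school": "B", "classroom": "B1", "score": None},
]
with_means = attach_group_means(students, "school", "score", "school_mean_score")
```

In SAS the same step is commonly done with PROC SUMMARY (or PROC SQL) using the school identifier as the class variable, followed by a merge back onto the student-level data.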

pau13rown · 10-02-2019

For what it's worth, I stumbled upon a couple of sources of info:
- "SAS Performance Tuning Techniques": https://www.phuse.eu/blog/sas%c2%ae-performance-tuning-techniques [you need to be a PHUSE member to access it; most recommendations are common sense]
- Someone on this message board asks a similar question, and ultimately the recommendation is to buy more RAM because it's cheap: https://www.developpez.net/forums/d1490127/logiciels/solutions-d-entreprise/business-intelligence/sas/sas-stat/proc-phreg-fragilite-effet-aleatoire-p-processus-comptage/

Haris · 09-24-2019

It's been two years since this inquiry. SAS: any progress to report?

FreelanceReinh · 09-24-2019

Hi @halladje,

If I understand your requirements correctly, you want to modify one existing dataset (by setting a number of variables to missing). So your probabilities (0.2782, 0.3497, etc.) are actually expected relative frequencies in that dataset (after the modification).

The main issue is that most of the probabilities you've specified are marginal probabilities, but constraints such as "a pre-specified number to have all items deleted (9.17% missing all items)" or "it is not possible for all 5 columns to =1" imply that the Bernoulli random variables you're trying to simulate are statistically dependent. This means you can't simply use RAND('bern',0.2782), RAND('bern',0.3497), etc. (or RAND('bern',0.1865), RAND('bern',0.258), etc., for that matter).

There may be an additional issue: the relative frequencies would most likely differ from the specified probabilities due to random fluctuations. For example, if you select from 1000 individuals using independent RAND('bern',0.3497) values, more than one in ten such selections will, on average, contain >368 individuals (although the expected number is about 350). Given the precision of the specified probabilities, you might not be happy with the results.

Here's an outline of how you could avoid both of these issues:
1. There are 2**5 - 2 = 30 different combinations of five zeros and ones after excluding "00000" (="no item missing") and "11111" (="all items missing"). Denote the relative frequencies to be determined for the 30 combinations "00001" (="only item 1 missing"), ..., "11110" (="only item 1 nonmissing") with x1, ..., x30.
2. Write down the constraints for the xi (besides xi>=0). These are linear equations. Examples: The constraint that 9.17% of the observations are to have all items missing translates to x1+x2+...+x30=0.9083 (=1-0.0917). The constraint that 23.72% of the observations are to have item 5 missing, but not all items missing, translates to x16+x17+...+x30=0.2372 (see the first digit of 16, 17, ..., 30 in the binary system).
3. Solve the resulting system of linear equations (SAS/IML?). There will be many free parameters in the solution. Think of reasonable values for these parameters (or specify more constraints in step 2).
4. Compute the corresponding absolute frequencies from the solution obtained in step 3: If your dataset contains N individuals, determine n1, ..., n30 by ni=floor(xi*N) or ni=ceil(xi*N), and similarly n31=floor(0.0917*N) or n31=ceil(0.0917*N), so that n1+...+n31=N.
5. Use PROC SURVEYSELECT with the GROUPS=(n1 ... n31) option to assign the individuals randomly to the 31 groups (numbered 1, ..., 31).
6. In a DATA step, use the 1st, ..., 5th digit of the respective individual's group number in BINARY5. format (i.e. "00001", ..., "11111") to determine which of the items 5, 4, 3, 2, 1 need to be set to missing, and perform this operation in a DO loop (1 to 5).
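The later steps of this outline can be sketched outside SAS. The following Python sketch is an illustration under stated assumptions, not the poster's code: largest-remainder rounding is one way to choose floor or ceil for each group so the counts sum to N, a shuffle of fixed-size group labels stands in for PROC SURVEYSELECT's GROUPS= option, and the binary digits of the group number decide which items become missing. It also computes the exact binomial tail behind the remark about selections containing >368 individuals. The placeholder frequencies, the item values, and all function names are hypothetical, and solving the linear system itself is not shown.

```python
import math
import random

N_ITEMS = 5  # five items, so groups 1..31 cover every pattern except "00000"


def pattern(group):
    """Group number as a 5-digit binary string (the BINARY5. analogue)."""
    return format(group, "0{}b".format(N_ITEMS))


def group_sizes(freqs, n_total):
    """Counts that sum to n_total; each is floor(f*N) or ceil(f*N).

    Leftover units go to the groups with the largest fractional parts
    (largest-remainder rounding).
    """
    raw = [f * n_total for f in freqs]
    sizes = [math.floor(r) for r in raw]
    shortfall = n_total - sum(sizes)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:shortfall]:
        sizes[i] += 1
    return sizes


def assign_groups(sizes, seed=None):
    """Randomly assign individuals 0..N-1 to groups 1..len(sizes) with fixed sizes."""
    labels = [g for g, size in enumerate(sizes, start=1) for _ in range(size)]
    random.Random(seed).shuffle(labels)
    return labels


def apply_missing(row, group):
    """Set item k (1..5) to None if its digit is 1; the rightmost digit is item 1."""
    bits = pattern(group)
    out = list(row)
    for k in range(1, N_ITEMS + 1):
        if bits[N_ITEMS - k] == "1":
            out[k - 1] = None
    return out


def prob_more_than(k, n, p):
    """Exact P(X > k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))


# Placeholder frequencies: x1..x30 split evenly, 9.17% in group 31 ("11111").
# They do NOT satisfy the thread's other constraints; use the step-3 solution instead.
freqs = [(1 - 0.0917) / 30] * 30 + [0.0917]
sizes = group_sizes(freqs, 1000)
labels = assign_groups(sizes, seed=2020)
data = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(1000)]  # hypothetical item values
masked = [apply_missing(row, g) for row, g in zip(data, labels)]
```

Because the group sizes are fixed before the random assignment, the realized frequencies match the targets up to rounding, which is what the outline gains over independent RAND('bern', ...) draws.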

PaigeMiller · 07-19-2019

I would use PROC SUMMARY and not PROC SQL to do this, but in either case, in order to advise you properly, we would need to see a small portion of your input data set, provided according to these directions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712

FreelanceReinh · 07-18-2019

Thanks for your detailed reply. I haven't checked all the details, but it seems that you're already selecting 0-1 patterns of missingness and then setting variables to missing accordingly, which is good. (And, yes, the probabilities for all possible patterns must add to 1.) A few things could be improved. For example, the variable MissInd is character, so it should be compared to character literals such as '0010000', not to numbers without quotes (see the notes about automatic type conversion in the log). Statements like "else j_pared2=j_pared2;" are redundant.

@halladje wrote:
> My post was initially asking about how to deal with the subsequent patterns 8 (missing 1 or 2 variables but not in a pattern >=1% of the sample) and 9 (missing 3+ variables but not in a pattern). Is there a different more efficient way to do this?

At the end of the day, you are effectively simulating the random vector I mentioned in my earlier post. Maybe the most transparent approach would be:
1. Specify the 127(+1) probabilities determining the multivariate distribution of the random vector. Many of them will be 0 if you don't want to get (most of the) patterns with more than two or three missings, which simplifies things.
2. Use the "Table" distribution with the non-zero values (or all but one) among those 128 probabilities to generate the sequence number of a pattern.
3. Set variables to missing according to the randomly selected pattern.

As to step 1 above, you seem to have the probabilities already for the most frequent individual patterns. So what remains is to define probabilities for the "collapsed" patterns you've mentioned. For example, if there are n distinct patterns (outside the set of the "most frequent" patterns) containing exactly 2 missings and their total probability should be, say, 0.04567, then you have to decide on the distribution: Would you deem 0.04567/n appropriate for the probability of each of the n patterns? Or rather the observed individual relative frequencies (assuming their sum is 0.04567)? Or perhaps the (scaled) expected relative frequencies assuming independence (using the marginal probabilities in the calculation)? Or something else?
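A minimal Python sketch of this three-step approach, under stated assumptions: seven variables (so 2**7 = 128 patterns, written as 7-character strings such as '0010000', like a character MissInd), hypothetical marginal missingness rates and a hypothetical set of "most frequent" patterns, and the third option above (independence-based probabilities scaled to the collapsed total) for filling in the collapsed groups. Python's `random.choices` with weights plays the role of SAS's RAND('TABLE', ...). Here the leftmost digit is variable 1, an arbitrary convention.

```python
import itertools
import random

N_VARS = 7  # 2**7 = 128 possible 0/1 missingness patterns


def indep_prob(pat, marginals):
    """Probability of pattern `pat` if the variables were missing independently."""
    prob = 1.0
    for bit, p in zip(pat, marginals):
        prob *= p if bit == "1" else 1.0 - p
    return prob


def fill_collapsed(marginals, excluded, n_missing, total):
    """Spread `total` over all patterns with exactly `n_missing` ones that are
    not in `excluded`, proportionally to their independence-based probabilities."""
    candidates = [
        "".join(bits)
        for bits in itertools.product("01", repeat=N_VARS)
        if bits.count("1") == n_missing and "".join(bits) not in excluded
    ]
    weights = [indep_prob(pat, marginals) for pat in candidates]
    scale = total / sum(weights)
    return {pat: w * scale for pat, w in zip(candidates, weights)}


def draw_patterns(table, n, seed=None):
    """Draw n pattern strings from the discrete ('table') distribution in `table`."""
    pats = list(table)
    return random.Random(seed).choices(pats, weights=[table[p] for p in pats], k=n)


def apply_pattern(row, pat):
    """Set variable i to None where digit i (leftmost = variable 1) is '1'."""
    return [None if bit == "1" else value for value, bit in zip(row, pat)]


# Hypothetical inputs: marginal missingness rates and the "most frequent" patterns.
marginals = [0.05, 0.10, 0.08, 0.12, 0.03, 0.07, 0.09]
frequent = {"0000000": 0.70, "0010000": 0.10, "0000100": 0.08, "0010100": 0.03}

table = dict(frequent)
table.update(fill_collapsed(marginals, frequent, 1, 0.04433))  # other 1-missing patterns
table.update(fill_collapsed(marginals, frequent, 2, 0.04567))  # other 2-missing patterns
# Patterns with 3+ missings keep probability 0, so they are simply left out.

draws = draw_patterns(table, 10, seed=1)
masked = [apply_pattern(list(range(1, 8)), pat) for pat in draws]
```

Swapping in the equal split (0.04567/n) or observed relative frequencies only changes the weights passed to `fill_collapsed`; the sampling and masking steps stay the same.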

halladje · 07-15-2019

Hmm, I also have a categorical variable for which I want to compare only particular categories in each t-test. For example, my chi-square code is as follows:

    proc freq data=lpa;
      by _imputation_;
      tables other*class_mem / chisq nocol nopercent;
      where class_mem in (2 4);
      output out=ChiSqData n nmiss pchi lrchi;
    run;

Can you do this in a t-test as well? How do you specify both the categories to be compared and the imputation statement?

Thanks so much,
Jillian
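The BY _imputation_ plus WHERE class_mem in (2 4) idea carries over to a two-group comparison: compare only the two selected categories within each imputation, then combine the per-imputation results with Rubin's rules, which is what PROC MIANALYZE implements. The Python sketch below illustrates that logic only; the row layout (dicts with `_imputation_`, `class_mem`, and a hypothetical outcome `score`), the equal-variance standard error, and the toy data are assumptions, and the degrees-of-freedom adjustment that MIANALYZE also reports is omitted.

```python
import math
from statistics import mean, variance


def diff_and_se(a, b):
    """Mean difference a - b and its equal-variance (pooled) standard error."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return mean(a) - mean(b), math.sqrt(sp2 * (1 / na + 1 / nb))


def per_imputation(rows, var, group_var, groups):
    """One (estimate, se) per imputation, using only rows whose group_var is in `groups`."""
    g1, g2 = groups
    results = []
    for imp in sorted({r["_imputation_"] for r in rows}):
        sub = [r for r in rows if r["_imputation_"] == imp]
        a = [r[var] for r in sub if r[group_var] == g1]
        b = [r[var] for r in sub if r[group_var] == g2]
        results.append(diff_and_se(a, b))
    return results


def rubin_pool(results):
    """Rubin's rules: pooled estimate and SE from T = W + (1 + 1/m) * B."""
    m = len(results)
    estimates = [q for q, _ in results]
    w = mean([se ** 2 for _, se in results])  # average within-imputation variance
    b = variance(estimates)                   # between-imputation variance (m - 1 denominator)
    return mean(estimates), math.sqrt(w + (1 + 1 / m) * b)


# Toy data: 3 imputations; class 1 rows are present but excluded by `groups`.
rows = []
for imp, shift in [(1, 0.0), (2, 1.0), (3, 2.0)]:
    rows += [{"_imputation_": imp, "class_mem": 2, "score": v + shift} for v in (1.0, 2.0, 3.0)]
    rows += [{"_imputation_": imp, "class_mem": 4, "score": v} for v in (4.0, 5.0, 6.0)]
    rows.append({"_imputation_": imp, "class_mem": 1, "score": 100.0})

results = per_imputation(rows, "score", "class_mem", (2, 4))
estimate, se = rubin_pool(results)
```

In SAS, the analogous route would likely be PROC TTEST with a CLASS statement, BY _imputation_, and the same WHERE statement, with the per-imputation mean differences and standard errors then passed to PROC MIANALYZE.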

halladje · 07-09-2019

Great! That worked thank you so much. Jillian

Rick_SAS · 07-05-2019

I'm pretty sure the answer is "yes" to all your questions, although I have not done it and have no code to contribute. I suggest you attempt to program a solution to these questions on a small data set such as Sashelp.Heart that we all can access. If you run into problems, open a new thread and show us your initial attempts.

DarthPathos · 07-04-2019

Hi Jillian,

Take a look at this paper; it may give you some ideas, and it talks about a macro called MMI_IMPUTE. Having said that, I've poked around here and found a thread asking a very similar question, in which the author of the paper, who as of 2017 worked for SAS, posted:

"...I no longer recommend using MMI_IMPUTE. MMI_IMPUTE uses an imputation algorithm called PAN, which was developed a while back by Joseph Schafer. Unfortunately, PAN is a bit outdated, and is less flexible than newer algorithms (e.g., the algorithm can't incorporate random effects between incomplete variables; all incomplete variables are required to be normally distributed). I recommend that you instead use a standalone software package called Blimp. Blimp was written by Brian Keller and Craig Enders at UCLA. Unlike MMI_IMPUTE, Blimp can handle random effects between incomplete variables and can also handle some non-normal incomplete variables (e.g., binary variables). The software and associated documentation are available at http://www.appliedmissingdata.com/multilevel-imputation.html. Blimp is free, and the website contains scripts for using it from SAS."

I don't know Blimp at all, but as missing data and imputation are topics I'm interested in, I will definitely be taking a look. Please post back if you have any further questions; I'm also happy to contact you via email if that's easier.

Chris

halladje · 06-22-2019

Is there a paper you would recommend that discusses how PAN and MMI_IMPUTE work? Is there anything published regarding what you said re: outdated? Also, can MMI_IMPUTE handle 3-level models, or only 2-level? I am working on a paper comparing different missing data strategies in multilevel models (including Blimp), and I am wondering whether I should also include MMI_IMPUTE for comparison purposes, or if something has already been published that I can reference. Thanks so much!

ballardw · 08-17-2017

If you frequently use value labels in SPSS, you will need to become familiar with PROC FORMAT. Formats are the equivalent of value labels, and for some purposes (binning or grouping) that suffices. When data is read, either from an external file or from an existing variable, an INFORMAT is one way to accomplish bulk recoding without a large number of IF/THEN/ELSE statements. I do sympathize with having to make that adjustment; I had to go the other way, to SPSS, after using SAS for about 13 years.

Date Last Visited 07-30-2020 02:38 PM

Re: Pooling ICCs after/within MIANALYZE

3-level GENMOD Exchangeable working correlations

Re: Pooling ICCs after/within MIANALYZE

Re: Pooling ICCs after/within MIANALYZE

Pooling ICCs after/within MIANALYZE

PROC MIXED Method=ML and missing data (FIML?)

Re: GENMOD or GLIMMIX for 3-level random effects models

GENMOD or GLIMMIX for 3-level random effects models

Re: Generating an array of random bernoulli variables with a minimum a...

Re: Generating an array of random bernoulli variables with a minimum a...

Re: Multilevel Imputation

Re: Pooling ICCs after/within MIANALYZE

Re: PROC MIXED Method=ML and missing data (FIML?)

Re: 3-level GENMOD Exchangeable working correlations

Re: GENMOD or GLIMMIX for 3-level random effects models

Re: issues with proc GLIMMIX glogit multinomial models

Re: FIML in GLIMMIX & GLIMMIX code

Re: Generating an array of random bernoulli variables with a minimum a...

Re: Aggregating dataset to an upper level

Re: Rand function to select 2-3 values in an array/table

Re: independent ttest with multiply imptued data

Re: Multiple Imputation in GLIMMIX with continuous and categorical pre...

Re: Saving missing data patterns

Re: Multilevel Missing techniques

Re: Multilevel Imputation

Re: recoding data