About toneill

toneill · ‎07-21-2015

Looking for some interpretation help with a GLIMMIX output (as most of the documentation is from logistic models). The very basic bivariable model is as follows:

    proc glimmix data=have noclprint noitprint method=quad gradient ;
      class study_id ;
      model score=race / solution dist=mult link=clogit ;
      random intercept / type=un subject=study_id ;
      covtest 'Need Random Intercept?' 0 ;
      where race ^in(.,88,99) ;
    run ;

I have 5500 observations, with a total of 1795 individuals (study_id) observed over time. Participants have multiple scores (range: 0-4) over time. I have made no assumption about the distance between adjacent levels, but they are ordered (i.e. 0=not present, 4=very bad levels present). Race is ethnic race (i.e. Caucasian, African-American etc.) with 5 levels. I am modeling the probabilities of levels of score having lower ordered values in the response profile table.

Once run, I have 4 intercepts (0-3), and 5 estimates for race. Numbers below are just made up.

    Effect     Score  Race   Estimate
    Intercept  0             0.5
    Intercept  1             0.6
    Intercept  2             0.7
    Intercept  3             0.8
    Race              Race1  0.09
    Race              Race2  0.1
    Race              Race3  0.11
    Race              Race4  0.12
    Race              Race5  0.13

I would like to be able to interpret this (eventually) in a multivariable model as well, then present it in a logical and meaningful way. Any direction would be appreciated.

toneill · ‎06-11-2015

Thank you both. You have given me much to ponder over the next few days. I will look into both suggestions: the multinomial distribution, as well as adding a constant (with considerations about the distribution as described). Your input has been much appreciated. When I figure out what approach to take and the outcome(s) I estimate, I'll re-post for those who may have a similar problem.

toneill · ‎06-09-2015

I have a conundrum that I just can't figure out... (epidemiologist and not a bio/statistician)

I have a continuous outcome with a range of 0-5.0. This data was generated from a questionnaire from a Likert-like item as a score (i.e. 0=no event, 1=event+severity level 1, 2=event+severity level 2, etc). The data is right-skewed.

I want to model the data using <proc glimmix> with gamma log-link (where my distribution is exponential, r=1 -- such that my gamma density specializes to the exponential density). This assumes a constant coefficient of variation over the range of X, a random variable. From my understanding, GLMM with gamma should be able to model a gamma function with limits (0,infinity). So why does proc glimmix restrict to interval data?

I can't model this data as a count process (i.e. Poisson, etc), so that is not an alternative option. I could use proc genmod, but intuitively, a GLMM is more appropriate than GEE models for my given research question.

Or am I just modeling the data as an exponential with log-link since r=1? In which case, can I then model my 0's? Alternatively, I could add an interval to each observation (e.g. +1) to avoid the issue of zeros in the data, but I don't know how this would impact my beta estimates (if at all).

Thanks!
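For reference, the statement that the gamma density reduces to the exponential when r=1 can be written out as below. This is only a sketch: the mean/shape parameterization (mean \mu, shape r) is an assumed notation, not something given in the post.

```latex
\begin{align}
f(y;\mu,r) &= \frac{1}{\Gamma(r)\,y}\left(\frac{r\,y}{\mu}\right)^{r}
              \exp\!\left(-\frac{r\,y}{\mu}\right), \qquad y > 0 \\
\mathrm{E}[Y] &= \mu, \qquad \mathrm{Var}[Y] = \frac{\mu^{2}}{r}, \qquad
\mathrm{CV} = \frac{\sqrt{\mathrm{Var}[Y]}}{\mathrm{E}[Y]} = \frac{1}{\sqrt{r}} \\
r = 1:\quad f(y;\mu) &= \frac{1}{\mu}\exp\!\left(-\frac{y}{\mu}\right)
\end{align}
```

The coefficient of variation depends only on r, which is the "constant CV" property mentioned above. The log density contains the term (r-1) log y, which is undefined at y=0 unless r=1; that is one way to see why zeros are awkward for a general gamma model.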

toneill · ‎06-01-2015

I actually found the error in my code seconds after posting this -- by using <put qcount=;> -- funny how that happens (after spending HOURS trying to figure out my error!). I had 13 dates (only for 2 individuals out of 1800...)

As for the example data, where:

study_id = the individual's unique identifier
onMeds = on medication at the time of the interview (this doesn't exist in my dataset, but it is easier to understand my problem with this variable), 0=yes/1=no
interviewdt = interview date

Variables wanted:

*medNaive* = on medication at current interview (medNaive=onMeds), 0=yes/1=no
*medEver* = if EVER on meds, even if not at current interview, 0=never/1=ever

    Study_id  onMeds  interviewdt  *medNaive*  *medEver*
    1         1       01jan2007    1           1
    1         0       01jan2008    0           1
    1         0       01jan2009    0           1
    2         0       02feb2008    0           0
    2         1       02feb2009    1           1
    2         0       02feb2010    0           1
    3         0       03mar2008    0           0
    3         0       03mar2009    0           0
    3         1       03mar2010    1           1
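One possible way to get the "ever on meds" flag from the example rows is a RETAIN that is reset at the start of each participant. This is a minimal sketch, not a tested solution for the real data: the dataset names `have` and `want` are hypothetical, and it assumes onMeds=1 means "on medication", which is how the example rows read.

```sas
/* Hypothetical example data mirroring the rows listed in the post */
data have ;
  input study_id onMeds interviewdt :date9. ;
  format interviewdt date9. ;
  datalines ;
1 1 01jan2007
1 0 01jan2008
1 0 01jan2009
2 0 02feb2008
2 1 02feb2009
2 0 02feb2010
3 0 03mar2008
3 0 03mar2009
3 1 03mar2010
;
run ;

/* The carry-forward only makes sense in date order within participant */
proc sort data=have ;
  by study_id interviewdt ;
run ;

data want ;
  set have ;
  by study_id ;
  retain medEver ;                       /* keep the value from one row to the next */
  if first.study_id then medEver = 0 ;   /* reset for each new participant */
  medNaive = onMeds ;                    /* status at the current interview (assumed coding) */
  if onMeds = 1 then medEver = 1 ;       /* once exposed, stays 1 for all later rows */
run ;
```

Because medEver is retained and only ever switched from 0 to 1 within a participant, a later interview with onMeds=0 does not undo an earlier exposure.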

toneill · ‎06-01-2015

Two code problems that I was hoping to get some further insight on:

(1) Trying my hand at arrays and I continue to get an error: "Error: Array subscript out of range at line X column Y". I am trying to merge (i) questionnaire and (ii) medication data (both longitudinal, sorted by participant study_id). There are a maximum of 8 interview dates per participant, but tens of medications possible corresponding to a given interview date.

(2) I also want to "carry forward" a variable (mednaive) but am unsure how to do this. In this particular variable, a participant can be on a medication at one questionnaire (mednaive=0) and then off at another (mednaive=1). However, I want to capture "ever exposed" to the medication such that, if an individual is not on a medication at a given questionnaire but was in the past, then they will ALWAYS be mednaive=0 (even if they are mednaive=1 at this particular questionnaire).

    data qSummary ;
      set dataset ;
      by study_id ; /*sorted by study_id*/
      retain qCount qdate1-qdate11 ; /*variables created for interview dates and number of completed questionnaires*/
      mednaive=. ; /*new variable to identify those naive vs those experienced to a particular medication class*/
      if ((drug_class='X') or (drug_class='Y') or (drug_class='Z')) then mednaive=0 ; /*on medication at current questionnaire*/
      else mednaive=1 ; /*not on medication at current questionnaire*/
      format mednaive mednaive. ;
      /*NEWVAR needed to address "carry forward" concept noted above -- unsure of how to do this*/
      array qdate {*} qdate1-qdate11 ;
      format qdate1-qdate11 date9. ;
      if first.study_id then do ; /*if first questionnaire per person*/
        qCount=1 ; /*count of questionnaire (max=11 questionnaires)*/
        qdate1=interviewdt ; /*the first questionnaire date is equal to the first interviewdate variable already defined in the dataset*/
        do i=2 to 11 ;
          qdate{i}=. ;
        end ;
      end ;
      else do ; /*if not first questionnaire per person*/
        qcount=sum(qcount,1) ;
        qdate{qcount}=interviewdt ; /* <-- ERROR: ARRAY SUBSCRIPT OUT OF RANGE AT (THIS) LINE COLUMN X*/
      end ;
      if last.study_id then output ;
    run ;

toneill · ‎05-18-2015

I need to estimate the difference between interview dates (<interviewdt>), delete those interviews completed <300 days apart, and re-code questionnaire numbers to reflect the new dataset. This is longitudinal data.

Variables:
1. study_id = unique id per patient
2. qnum = questionnaire number
3. interviewdt = date interview was conducted

    data datediff ;
      input study_id qnum interviewdt :date9. ;
      format interviewdt date9. ;
      datalines ;
    1 1 01Jan2007
    1 2 04Jan2007
    1 3 07Jul2008
    2 1 15Feb2009
    2 2 03Mar2009
    2 3 30Mar2010
    3 1 20Dec2012
    3 2 15Feb2013
    ;
    run ;

Data should look like:

    Obs  study_id  qnum  interviewDt
    1    1         1     01Jan2007
    2    1         2     04Jan2007
    3    1         3     07Jul2008
    4    2         1     15Feb2009
    5    2         2     03Mar2009
    6    2         3     30Mar2010
    7    3         1     20Dec2012
    8    3         2     15Feb2013

I need to understand how to build a program that would:
(1) Compute the difference between dates within an individual (i.e. from qnum=1 to qnum=2, and from qnum=2 to qnum=3, etc.)
(2) Delete observations where an interview was completed <300d after the previous one (i.e. a participant should only complete one questionnaire approximately every 12 months)
(3) Renumber QNUM (e.g. in the above data, I would delete Obs 2 and Obs 8 -- so I want to create a new variable for Qnum reflecting the new interview number -- such that, for Obs 2 where Qnum=2, this observation would be deleted and Obs 3 where Qnum=3 would become NewQnum=2).

I am not familiar with <proc SQL> and would prefer to avoid it (I know you all love SQL on the discussion boards!)

** ====================================================== ;

This is what I tried but it didn't work:

    data dateDiff ;
      set old ;
      by study_id ;
      diff_IntDt = dif(interviewDt) ; /*Compute difference between interviewDt*/
      if not first.study_id then output ;
    run ;

    proc print data=dateDiff ;
      var interviewYr qnum ;
    run ;

PROBLEM: I ended up deleting ALL my first questionnaires (since I asked for <if not first.study_id then output>). If I don't include this condition, however, I don't get a dataset computing the difference between sequential <interviewdt>.

The next step I would imagine is:

    data new ;
      set dateDiff ;
      by study_id qnum ;
      if diff_intDt <300 then delete ; /*Delete any observation whose interview date occurs <300 days from the previous date*/
      qCount = 0 ; /*Want to re-label QNUM to reflect the deleted observations*/
      if qnum=1 then qCount=qnum+1 ;
      label qCount = 'Number of Completed Questionnaires' ;
    run ;
    quit ;

    proc print data = dateDiff ;
      var interviewYr qCount ;
    run ;
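The three steps described above (difference, delete, renumber) could be combined in a single data step along the following lines. This is a hedged sketch rather than a definitive answer: the names `have`, `want`, `lastKeptDt` and `newQnum` are hypothetical, and it assumes the 300-day gap is measured from the last interview that was kept, not from the last row read.

```sas
/* Hypothetical copy of the example data from the post */
data have ;
  input study_id qnum interviewdt :date9. ;
  format interviewdt date9. ;
  datalines ;
1 1 01Jan2007
1 2 04Jan2007
1 3 07Jul2008
2 1 15Feb2009
2 2 03Mar2009
2 3 30Mar2010
3 1 20Dec2012
3 2 15Feb2013
;
run ;

proc sort data=have ;
  by study_id interviewdt ;
run ;

data want ;
  set have ;
  by study_id ;
  retain lastKeptDt newQnum ;
  if first.study_id then do ;
    lastKeptDt = . ;      /* no earlier interview for a new participant */
    newQnum    = 0 ;
  end ;
  /* days since the last interview that was kept (missing on the first row) */
  if not missing(lastKeptDt) then diff_IntDt = interviewdt - lastKeptDt ;
  /* always keep the first interview; keep later ones only if >= 300 days apart */
  if first.study_id or diff_IntDt >= 300 then do ;
    newQnum    = newQnum + 1 ;
    lastKeptDt = interviewdt ;
    output ;
  end ;
  drop lastKeptDt ;
run ;
```

Writing rows with an explicit OUTPUT inside the condition keeps every participant's first questionnaire, which is the part that went wrong with <if not first.study_id then output>.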

toneill · ‎03-10-2015

Here is an example:

(1) Data1

    data Data1 ;
      input study_id qnum interviewdt :date9. ; /*where study_id=participant, qnum=questionnaire, interviewdt=date of interview*/
      format interviewdt date9. ;
      datalines ;
    1 1 14FEB2008
    1 2 12FEB2009
    1 3 01MAR2010
    2 1 04SEP2012
    3 1 19MAR2008
    3 2 12OCT2010
    3 3 17NOV2011
    ;
    run ;

(2) Data2

    /*where medstartdate=date that prescription was started, atc_code=medication code, drug_class=type of drug*/
    /*Note: drug_class is missing for atc_code ABC because it is a non-relevant drug for my study*/
    data Data2 ;
      infile datalines truncover ;
      input study_id medstartdate :date9. atc_code :$10. drug_class :$6. ;
      format medstartdate date9. ;
      datalines ;
    1 01MAY1996 ABC
    1 01NOV1996 CBA A
    1 01NOV1996 CBB B
    1 01NOV1996 CBC A
    2 31MAR1999 ABC
    2 01JUN1999 CBD C
    2 01JUN1999 CBE A
    2 01MAY2003 ABC
    3 17FEB1999 CBA A
    3 17FEB1999 CBA A
    3 17FEB1999 CBB B
    3 01MAR2000 ABC
    ;
    run ;

--> What I would like: As above, a table that only shows one record per participant (study_id) at baseline questionnaire (where qnum=1). I am ONLY interested in those participants who have an outcome for drug_class (i.e. those with non-relevant atc_codes should not be included in the calculation of a new variable, as I am only interested in those with drug_class reported). What I am TRYING to do is estimate the proportion of the population who is medication naive vs. medication experienced at his/her first questionnaire (qnum=1).

toneill · ‎03-10-2015

Unfortunately, I don't have any input datasteps; I was provided with two separate data files and need to merge them together (as aforementioned). I'll give your code a try and let you know what happens.

toneill · ‎03-10-2015

Hi all -- I am a clinical researcher (not a trained analyst/statistician), so I'll do my best to be very clear about the problem I am currently having. I will use generic names for datasets and variables for illustrative purposes.

I have 2 data sets (data1 and data2). Both are sorted on study_id.

Data1 - Has multiple records per individual (i.e. study_id #1 has 6 records, representing 6 completed questionnaires -- therefore, "study_id" 1 is listed 6x, and the "qnum" variable is 1-6). Period of observation: 01-01-07 to 31-12-13

Data2 - Also has multiple records per individual (i.e. study_id #1 has 72 records, representing 72 medications he/she has been prescribed). The "qnum" variable is NOT included in Data2. Period of observation: 01-01-85 to 31-12-13. However, if a participant only completed a single questionnaire (Data1), he/she still has ALL previous prescriptions "linked" to that questionnaire (i.e. study_id=2 has only 1 questionnaire in 2010, but he/she has prescription records back to 1999 linked to qnum=1 when I attempted to merge the data below).

Goal: A merged data set that has one record per "questionnaire" (similar to Data1), and a new variable ("newvar") based on the prescribing of a drug reported in Data2.

To be more clear on "newvar": In Data1, there is a questionnaire interview date ("interviewdt") and in Data2 there is a prescription date ("medstartdate"). This "newvar" should be binary, such that IF a participant has started a particular prescription (represented by a particular code) (Data2) on or before the interview date (Data1), then "newvar"=1 (i.e. yes), ELSE "newvar"=0 (i.e. no).

Here is what I've done:

    data Data3 ;
      merge Data1 (keep=study_id qnum othervariables in=in_data1 where=(interviewdt^=.))
            Data2 (in=in_data2 where=(medstartdate^=. and (prescriptioncode="A" or prescriptioncode="B"))) ;
      by study_id ;
      if in_data1 and in_data2 ;
      dtdiff=interviewdt - medstartdate ; /*ONLY want "interviewdt" where qnum=1 (i.e. first questionnaire/baseline) - I think this is part of my problem; how to only select qnum=1?*/
      MedStartDtDiff = abs(dtdiff) ;
      if dtdiff^=. then MedStartDtDiff=(-10220<=(interviewdt - medstartdate)<=10220) ; /*10220 represents the maximum number of days from the last day of observation, 31-12-13, to the first day, 01-01-85*/
      format MedStartDtDiff yesno. ;
    run ;

    proc sort data=Data3 (where =(MedStartDtDiff^=.)) ;
      by study_id MedStartDtDiff ;
      where qnum=1 ;
    run ;

    proc print data=Data3 (obs=30) ;
      var study_id MedStartDtDiff qnum ;
      where qnum=1 ;
    run ;

SO this is what I get as an output -->

    study_id  MedStartDtDiff  MedStartDate  qnum
    1         YES             01NOV1996     1
    2         YES             01JUN1999     1
    2         YES             01JUN1999     1
    2         YES             07NOV2006     1

--> The row for study_id 1 is correct; study_id 1 has 6 questionnaires (with qnum=1 when interviewdt=14Feb2008 AND started med BEFORE interview date; therefore, MedStartDt is YES).

--> The rows for study_id 2 are NOT correct; study_id 2 should only have a single entry. So this is my major issue! Why am I getting multiple rows for each individual?

To further elaborate on study_id 2 -- this participant only completed a single questionnaire (Data1), but he/she still has ALL previous prescriptions "linked" to that questionnaire (i.e. study_id=2 has a baseline/only questionnaire in 2010, but he/she has prescription records back to 1999 linked to qnum=1). I really only want one line per study_id.

What I'd like to see:

    study_id  MedStartDtDiff  MedStartDate  qnum  InterviewDt
    1         YES             01NOV1996     1     14FEB2008
    2         YES             01JUN1999     1     6JUN2010
    3         YES             15JUN1989     1     20AUG2008
    4         NO              07NOV2010     1     11NOV2011
    etc etc....

I am happy to elaborate further or clarify if this isn't 100% clear.
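The repeated rows come from the merge itself: a BY merge of one questionnaire row against many prescription rows writes one output row per prescription. One way to collapse this to a single baseline row per participant is sketched below. It is a sketch under assumptions, not a tested fix for the real files: the miniature Data1/Data2 are made up, `firstMedDt` is a hypothetical helper variable, and participants with no qualifying prescription are kept with newvar=0.

```sas
/* Hypothetical miniature versions of Data1 and Data2, sorted by study_id */
data Data1 ;
  input study_id qnum interviewdt :date9. ;
  format interviewdt date9. ;
  datalines ;
1 1 14FEB2008
1 2 12FEB2009
2 1 06JUN2010
3 1 19MAR2008
;
run ;

data Data2 ;
  input study_id medstartdate :date9. prescriptioncode $ ;
  format medstartdate date9. ;
  datalines ;
1 01NOV1996 A
2 01JUN1999 A
2 01JUN1999 B
2 07NOV2006 A
3 01MAR2010 B
;
run ;

data Data3 ;
  /* baseline questionnaire only, relevant prescriptions only */
  merge Data1 (in=in_data1 where=(qnum=1 and interviewdt^=.))
        Data2 (in=in_data2 where=(medstartdate^=. and prescriptioncode in ('A','B'))) ;
  by study_id ;
  if in_data1 ;
  retain newvar firstMedDt ;
  if first.study_id then do ;
    newvar     = 0 ;   /* assume naive until a qualifying prescription is seen */
    firstMedDt = . ;
  end ;
  /* prescription started on or before the baseline interview */
  if in_data2 and medstartdate <= interviewdt then do ;
    newvar     = 1 ;
    firstMedDt = min(firstMedDt, medstartdate) ;   /* MIN ignores missing values */
  end ;
  if last.study_id then output ;   /* one row per participant */
  format firstMedDt date9. ;
  keep study_id qnum interviewdt newvar firstMedDt ;
run ;
```

Restricting Data1 to qnum=1 leaves one questionnaire row per participant, and the explicit OUTPUT on last.study_id writes a single summary row after all of that participant's prescriptions have been checked.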

toneill · ‎12-08-2014

Thank you Hai.kuo

Just one minor correction on the parentheses:

    title X ;
    proc ttest data = X (where=(Y in ('yes','no'))) ;
      class Y ;
      var A ;
      where Z = 1 ;
    run ;

Works perfectly! Thanks for helping me learn another SAS trick!

toneill · ‎12-08-2014

Basic question for a new user: I have a class variable (Y) that has >2 levels (yes/no/don't know/refused). I would like to perform a 2-sample t-test to compare population means on a survey at baseline (Z = 1). However, I know that my class variable will return an error of too many levels. How do I exclude "don't know" and "refused" observations (thus leaving only those participants who responded "yes" or "no")?

    title X ;
    proc ttest data = X ;
      class Y ;
      var A ;
      where Z = 1 ;
    run ;

Thank you.

toneill · ‎07-30-2014

I have two variables; the second dependent upon the answer of the first. There are 5 variables (Q1) each with its own severity scale (Q2).

Q1. Do you have the symptom? y/n/don't know/refused (1/2/88/99)
--> If a participant answered Y(1) to Q1, they should go to Q2.
Q2. Severity: 1/2/3/4/88/99

I would like to make a data set with no inappropriate missing data (where 'missing'=./don't know/refused). So far, I have been able to subset my data for Q1:

    data newdata ;
      set olddata ;
      if cmiss(of var1 var2 var3 var4 var5) then delete ;
    run ;

This worked well and the expected number of observations were removed. However, I am unsure of how to move forward. If I ask SAS to remove the missing observations from Q2, then all those that answered N(2) to Q1 (and therefore have appropriately missing data in Q2) would be removed. I only want to remove those individuals who answered Y(1) to Q1 and subsequently have missing data for Q2 (Q2 is conditioned on Q1). Any direction would be appreciated.
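The conditional rule described above (drop a row only when Q1 is "yes" and the matching Q2 is missing, don't know or refused) could be written with paired arrays. This is a minimal sketch with made-up names: `q1_1`-`q1_2` and `q2_1`-`q2_2` are hypothetical stand-ins for the symptom and severity variables, and only two pairs are shown to keep the example short.

```sas
/* Hypothetical data: two symptom questions (q1_*) with their severity items (q2_*) */
data olddata ;
  input id q1_1 q2_1 q1_2 q2_2 ;
  datalines ;
1 1 3 2 .
2 1 . 2 .
3 2 . 1 88
4 2 . 2 .
5 1 2 1 4
;
run ;

data newdata ;
  set olddata ;
  array q1 {*} q1_1-q1_2 ;   /* symptom present? 1=yes 2=no 88/99=dk/refused */
  array q2 {*} q2_1-q2_2 ;   /* severity, only asked when q1=1 */
  do i = 1 to dim(q1) ;
    /* Q1 itself missing, don't know or refused */
    if q1{i} in (., 88, 99) then delete ;
    /* Q1 = yes but severity missing, don't know or refused */
    if q1{i} = 1 and q2{i} in (., 88, 99) then delete ;
  end ;
  drop i ;
run ;
```

A row with Q1 = no and a missing Q2 passes both checks, so appropriately skipped severity items are left alone. Extending the array ranges to five pairs would cover all five symptoms.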

toneill · ‎06-10-2014

Yes - I plan to use alternative names; I just used them as there are some privacy issues regarding the data, so they needed to be "general"! Thanks again.

toneill · ‎06-10-2014

New, and slightly reluctant, SAS user (previously STATA user). SAS dates have me scratching my head a bit. I have a longitudinal dataset, but I am currently only interested in baseline (e.g. first questionnaire) data, for which I have created a new dataset.

Variables:
(a) Date of Baseline Questionnaire: var1 = DDMMYYYY (SAS Informat: DDMMYYYY8)
(b) Known Disease Exposure?: var2: yes(1), no(0), don't know(88), refused(99)
--> ONLY those that responded YES had the next question asked:
(c) Date of Disease Exposure: var3 = MM (0-12, 0=unknown, 1=January etc.), var4 = YYYY

Goal: Determine "Time Since Exposure" from baseline questionnaire

What I'd like to do/think I should be doing:
1. Only analyse those individuals at baseline who responded "YES(1)" to var2
2. Assign/impute the middle of each month (e.g. 15th day) as a new variable to every individual (create: var5)
3. Create a new variable: 'DateExp' (var6) -- this will combine var3-5 into a single variable represented as: DDMMYYYY (equivalent to: var5-var3-var4) - I believe I need to use the MDY function, but the SAS examples are not easily understood (again, being a new user!)
4. Calculate time since exposure (TimeSinceExp=(var1-var6)/365.25) with an output in years, rounded to a single decimal (e.g. 5.4 years)

Any assistance would be appreciated. I am pondering this away today and will check in tomorrow to see how close (or far off!) my own code is.... so far about 20% of my more complicated coding works (which I would say isn't too bad for 2nd week of use).

Best & many thanks in advance.
-Tyler
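The four steps listed above might look roughly like this with the MDY function. It is a sketch under assumptions: the dataset names `baseline` and `exposed` and the three example rows are made up, var1 is read here with DATE9. purely for convenience (the real variable is described as already being a SAS date), and an unknown exposure month (var3=0) is simply left missing.

```sas
/* Hypothetical baseline records: var1=questionnaire date, var2=exposure (1=yes),
   var3=exposure month (0=unknown), var4=exposure year */
data baseline ;
  input var1 :date9. var2 var3 var4 ;
  format var1 ddmmyy10. ;
  datalines ;
15JUN2013 1 3 2008
01FEB2012 1 0 2005
20SEP2011 0 . .
;
run ;

data exposed ;
  set baseline ;
  where var2 = 1 ;                     /* step 1: only those who answered YES */
  var5 = 15 ;                          /* step 2: impute mid-month day */
  /* step 3: MDY takes month, day, year in that order */
  if 1 <= var3 <= 12 and not missing(var4) then DateExp = mdy(var3, var5, var4) ;
  /* step 4: SAS dates are day counts, so the difference is in days */
  if not missing(DateExp) then
    TimeSinceExp = round((var1 - DateExp) / 365.25, 0.1) ;
  format DateExp ddmmyy10. ;
run ;
```

The guard on var3 avoids calling MDY with month 0, which would otherwise return a missing value with a note in the log.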

Date Last Visited	‎09-01-2015 07:11 AM

GLIMMIX for repeated measures and multinomial (ordered) response

Re: PROC GLIMMIX with Gamma Log-Link and 0 counts

PROC GLIMMIX with Gamma Log-Link and 0 counts

Re: Array error "ERROR: Array subscript out of range at line/column. "

Array error "ERROR: Array subscript out of range at line/column. "

Estimate datdiff from a single variable

Re: Merge 2 data sets results in multiple observations per participant

Re: Merge 2 data sets results in multiple observations per participant

Merge 2 data sets results in multiple observations per participant

Re: Omit observation for proc ttest

Omit observation for proc ttest

Subsetting data

Re: MDY & datdiff functions (longitudinal data set)

MDY & datdiff functions (longitudinal data set)