Can you send me the Log and Output from this new attempt?
@jiltao The log you requested is below. I also tried some of the options from a document I found online. When I changed the method for estimating the covariance parameters from REML to MIVQUE0, the model converged. I have attached the tip sheet.
Also, could the problem be the missing viral load data? Missing values are read in as ".". The dataset is unbalanced and contains missing data, and I do not know whether that would keep PROC MIXED from converging.
SAS code
proc sort data = Firstvisit;
by rfa_id testdate year;
RUN;
data avegvl_youth;
set Firstvisit;
keep rfa_id test_year testdate year sex_hars res_at_hiv_dx race_combined hiv_risk agegroup ccd4c ctimeyr days year mn quarter time first_cd4cgp result_vl_log10;
by rfa_id testdate year;
run;
data youth_year;
set avegvl_youth;
if year=1999 then group = "1" ;
else if year = 2000 then group = "2";
else if year = 2001 then group = "3";
else if year = 2002 then group = "4";
else if year = 2003 then group = "5";
else if year = 2004 then group = "6";
else if year = 2005 then group = "7";
else if year = 2006 then group = "8";
else if year = 2007 then group = "9";
else if year = 2008 then group = "10";
else if year = 2009 then group = "11";
else if year = 2010 then group = "12";
else if year = 2011 then group = "13";
else if year = 2012 then group = "14";
else if year = 2013 then group = "15";
else if year = 2014 then group = "16";
else if year = 2015 then group = "17";
else if year = 2016 then group = "18";
else if year = 2017 then group = "19";
else if year = 2018 then group = "20";
else if year = 2019 then group = "21";
else if year = 2020 then group = "22";
rename group = yr;
run;
************************************* 1st Proc Mixed Code *****************************************************;
************************************* Unstructured *****************************************************;
proc sort data = youth_year out = avegvl_youth_sort nodupkey;
by rfa_id testdate year result_vl_log10;
RUN;
PROC MIXED data= avegvl_youth_sort covtest method=reml PLOTS(MAXPOINTS=NONE)noitprint;
class rfa_id sex_hars (ref="F") res_at_hiv_dx (ref="Urban") race_combined (ref="Black")
hiv_risk (ref="MSM") first_cd4cgp (ref="> 500") agegroup(ref="20-24");
model result_vl_log10= year ccd4c ctimeyr sex_hars res_at_hiv_dx race_combined hiv_risk first_cd4cgp
agegroup ccd4c*sex_hars ccd4c*res_at_hiv_dx ccd4c*race_combined ccd4c*hiv_risk ccd4c*first_cd4cgp
ccd4c*agegroup ctimeyr*sex_hars ctimeyr*res_at_hiv_dx ctimeyr*race_combined ctimeyr*hiv_risk
ctimeyr*first_cd4cgp ctimeyr*agegroup/ solution cl outp = pred_un ;
random intercept year / type= un subject=rfa_id;
lsmeans sex_hars res_at_hiv_dx race_combined hiv_risk first_cd4cgp agegroup /diff adjust= tukey;
title 'Mixed Model Results For Viral Load';
RUN;
The log starts here:
581 proc sort data = Firstvisit;
582 by rfa_id testdate year;
583 RUN;
NOTE: There were 57534 observations read from the data set WORK.FIRSTVISIT.
NOTE: The data set WORK.FIRSTVISIT has 57534 observations and 36 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds
584
585 data avegvl_youth;
586 set Firstvisit;
587 keep rfa_id test_year testdate year sex_hars res_at_hiv_dx race_combined hiv_risk agegroup
587! ccd4c ctimeyr days year mn quarter time first_cd4cgp result_vl_log10;
588 by rfa_id testdate year;
589 run;
NOTE: There were 57534 observations read from the data set WORK.FIRSTVISIT.
NOTE: The data set WORK.AVEGVL_YOUTH has 57534 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
590
591
592 data youth_year;
593 set avegvl_youth;
594 if year=1999 then group = "1" ;
595 else if year = 2000 then group = "2";
596 else if year = 2001 then group = "3";
597 else if year = 2002 then group = "4";
598 else if year = 2003 then group = "5";
599 else if year = 2004 then group = "6";
600 else if year = 2005 then group = "7";
601 else if year = 2006 then group = "8";
602 else if year = 2007 then group = "9";
603 else if year = 2008 then group = "10";
604 else if year = 2009 then group = "11";
605 else if year = 2010 then group = "12";
606 else if year = 2011 then group = "13";
607 else if year = 2012 then group = "14";
608 else if year = 2013 then group = "15";
609 else if year = 2014 then group = "16";
610 else if year = 2015 then group = "17";
611 else if year = 2016 then group = "18";
612 else if year = 2017 then group = "19";
613 else if year = 2018 then group = "20";
614 else if year = 2019 then group = "21";
615 else if year = 2020 then group = "22";
616 rename group = yr;
617 run;
NOTE: There were 57534 observations read from the data set WORK.AVEGVL_YOUTH.
NOTE: The data set WORK.YOUTH_YEAR has 57534 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
618
619 ************************************* 1st Proc Mixed Code
619! *****************************************************;
620 ************************************* Unstructured
620! *****************************************************;
621
622
623 proc sort data = youth_year out = avegvl_youth_sort nodupkey;
624 by rfa_id testdate year result_vl_log10;
625 RUN;
NOTE: There were 57534 observations read from the data set WORK.YOUTH_YEAR.
NOTE: 207 observations with duplicate key values were deleted.
NOTE: The data set WORK.AVEGVL_YOUTH_SORT has 57327 observations and 18 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
626
627
628 PROC MIXED data= avegvl_youth_sort covtest method=reml PLOTS(MAXPOINTS=NONE)noitprint;
629 class rfa_id sex_hars (ref="F") res_at_hiv_dx (ref="Urban") race_combined
629! (ref="Black")
630 hiv_risk (ref="MSM") first_cd4cgp (ref="> 500") agegroup(ref="20-24");
631 model result_vl_log10= year ccd4c ctimeyr sex_hars res_at_hiv_dx race_combined
631! hiv_risk first_cd4cgp
632 agegroup ccd4c*sex_hars ccd4c*res_at_hiv_dx ccd4c*race_combined ccd4c*hiv_risk
632! ccd4c*first_cd4cgp
633 ccd4c*agegroup ctimeyr*sex_hars ctimeyr*res_at_hiv_dx ctimeyr*race_combined
633! ctimeyr*hiv_risk
634 ctimeyr*first_cd4cgp ctimeyr*agegroup/ solution cl outp = pred_un ;
635 random intercept year / type= un subject=rfa_id;
636 lsmeans sex_hars res_at_hiv_dx race_combined hiv_risk first_cd4cgp agegroup /diff
636! adjust= tukey;
637 title 'Mixed Model Results For Viral Load';
638 RUN;
NOTE: 16629 observations are not included because of missing values.
NOTE: The data set WORK.PRED_UN has 0 observations and 0 variables.
WARNING: Data set WORK.PRED_UN was not replaced because new file is incomplete.
NOTE: PROCEDURE MIXED used (Total process time):
real time 10.39 seconds
cpu time 10.10 seconds
I think your DATA step to compute group from year is wrong. You didn't set the LENGTH of group, so SAS takes the length from the first assignment in the code. That assignment is group = "1", so group is one character long. As a result, "10" through "19" are truncated to "1" and "20" through "22" are truncated to "2".
Add a LENGTH statement before the first assignment:
LENGTH group $2;
or even better use a numeric variable:
group = year-1998;
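To make the two suggestions above concrete, here is a minimal sketch of the DATA step with both applied. The dataset and variable names come from the posted code; grp_num is a hypothetical name added for illustration, and the range check assumes the same 1999 to 2020 span as the original IF/ELSE chain. I have not run this against the actual data.

```sas
/* Sketch only: avoids the one-character truncation of the year group. */
/* grp_num is a hypothetical variable name, not part of the posted code. */
data youth_year;
  set avegvl_youth;
  length yr $2;                  /* declared before any assignment, so "10" to "22" fit */
  if 1999 <= year <= 2020 then do;
    grp_num = year - 1998;       /* numeric version: 1999 -> 1, ..., 2020 -> 22 */
    yr = cats(grp_num);          /* character version, left-aligned */
  end;
run;

/* Check the mapping: each year should pair with exactly one yr value. */
proc freq data=youth_year;
  tables year*yr / list missing;
run;
```

Note that the posted PROC MIXED step uses year rather than yr, so this fix matters only if yr is used in a later step.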
I saw a couple of things in the output that concerned me. Two of your continuous effects had no degrees of freedom in the Type 3 tests, and one of them is year.
Consider replacing
random intercept year / type= un subject=rfa_id;
with a random subject effect and a repeated year effect. With your current random statement, you are not capturing the repeated nature of your data (of course that assumes that you really do have multiple measurements over time on the subjects). I suggest this:
random intercept / subject=rfa_id;
repeated year /subject=rfa_id type=ar(1);
Note that this requires you to treat year as a categorical variable, and so it would have to be added to the CLASS statement.
Then in your model statement, add the option
ddfm=bw
This should enable the degrees of freedom to reflect the repeated nature of the data, and I believe will enable all of your variables to have numerator degrees of freedom for the F test.
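Put together, the changes suggested above might look like the following. This is a sketch under assumptions, not something run on the actual data: it reuses the posted model unchanged apart from adding year to the CLASS statement, swapping the RANDOM statement, adding the REPEATED statement, and adding ddfm=bw. It also assumes each subject has at most one record per year, since the levels of the repeated effect must differ for each observation within a subject; if a subject has several test dates in one year, the data would need to be summarized or a different repeated effect chosen first.

```sas
/* Sketch only: posted model with the suggested covariance structure. */
/* Assumes at most one observation per rfa_id per year.               */
proc mixed data=avegvl_youth_sort covtest method=reml plots(maxpoints=none) noitprint;
  class rfa_id year sex_hars (ref="F") res_at_hiv_dx (ref="Urban")
        race_combined (ref="Black") hiv_risk (ref="MSM")
        first_cd4cgp (ref="> 500") agegroup (ref="20-24");
  model result_vl_log10 = year ccd4c ctimeyr sex_hars res_at_hiv_dx race_combined
        hiv_risk first_cd4cgp agegroup
        ccd4c*sex_hars ccd4c*res_at_hiv_dx ccd4c*race_combined ccd4c*hiv_risk
        ccd4c*first_cd4cgp ccd4c*agegroup
        ctimeyr*sex_hars ctimeyr*res_at_hiv_dx ctimeyr*race_combined ctimeyr*hiv_risk
        ctimeyr*first_cd4cgp ctimeyr*agegroup
        / solution cl ddfm=bw outp=pred_un;          /* between-within degrees of freedom */
  random intercept / subject=rfa_id;                 /* subject-level random intercept */
  repeated year / subject=rfa_id type=ar(1);         /* within-subject correlation over years */
  lsmeans sex_hars res_at_hiv_dx race_combined hiv_risk first_cd4cgp agegroup
        / diff adjust=tukey;
  title 'Mixed Model Results For Viral Load';
run;
```

With year in the CLASS statement, the fixed effect of year is estimated as a set of per-year levels rather than a single slope.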
SteveDenham