Hello! We randomly assigned couples to one of three bank account conditions (IV = Joint, Separate, No-Guidance) and measured relationship quality (DV) across six data points. Our primary model is a dyadic growth curve model with distinguishable dyads (male vs. female partners). We observe significant differences in relationship trajectories as a function of bank account structure. I am trying to assess whether household income moderates these differences in trajectories. In other words, I'm interested in the three-way interaction terms between the two bank account dummy variables, time, and household income.
The data are in person-period format: each couple has 12 rows (6 time points x 2 partners = 12 observations). For household income, I averaged the two partners' estimates at Time 1 and used that value on all 12 rows for the couple. I grand-mean centered household income across the entire person-period data set before creating the interaction terms.
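The averaging and centering steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the dataset name dyads and the raw variable HHIncomeT1 (each partner's Time 1 income report, assumed to be stored on every row for that partner) are hypothetical, while CoupID and AvgHHC are the names used in the syntax in this post.

```sas
/* Hypothetical names: dataset dyads, raw report HHIncomeT1. */
/* Couple-level mean of the partners' Time 1 income reports. */
proc means data=dyads noprint nway;
   class CoupID;
   var HHIncomeT1;
   output out=couple_inc (drop=_type_ _freq_) mean=AvgHH;
run;

proc sort data=dyads;
   by CoupID;
run;

/* Attach the couple mean to every row for that couple. */
data dyads_inc;
   merge dyads couple_inc;
   by CoupID;
   AvgHHC = AvgHH;   /* copy that gets centered in the next step */
run;

/* MEAN=0 without STD= shifts the mean to zero and leaves the scale in raw dollars. */
proc standard data=dyads_inc mean=0 out=dyads_c;
   var AvgHHC;
run;
```

Because only the mean is shifted, AvgHHC keeps its original dollar units, so any product term built from it inherits that scale.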
Here are the relevant variables, followed by the syntax. ConstrastS and ConstrastN are the two dummy codes comparing Joint couples to Separate couples (ConstrastS) and to No-Guidance couples (ConstrastN). I then created male and female versions of all parameters, since this is a dual-intercept model.
if BankManip = 1 then ConstrastS = 1;
if BankManip = 2 then ConstrastS = 0;
if BankManip = 3 then ConstrastS = 0;
if BankManip = 1 then ConstrastN = 0;
if BankManip = 2 then ConstrastN = 0;
if BankManip = 3 then ConstrastN = 1;
SepM = Male*ConstrastS;
SepF = Female*ConstrastS;
NoGdM = Male*ConstrastN;
NoGdF = Female*ConstrastN;
MonthsM = Male*MonthsE;
MonthsF = Female*MonthsE;
SepMnthM = Male*ConstrastS*MonthsE;
SepMnthF = Female*ConstrastS*MonthsE;
NoGMnthM = Male*ConstrastN*MonthsE;
NoGMnthF = Female*ConstrastN*MonthsE;
HHM = Male*AvgHHC;
HHF = Female*AvgHHC;
HHSepM = Male*ConstrastS*AvgHHC;
HHSepF = Female*ConstrastS*AvgHHC;
HHNoGdM = Male*ConstrastN*AvgHHC;
HHNoGdF = Female*ConstrastN*AvgHHC;
HHSepMnthM = Male*ConstrastS*MonthsE*AvgHHC;
HHSepMnthF = Female*ConstrastS*MonthsE*AvgHHC;
HHNoGMnthM = Male*ConstrastN*MonthsE*AvgHHC;
HHNoGMnthF = Female*ConstrastN*MonthsE*AvgHHC;
run;

PROC MIXED COVTEST METHOD=REML;
   CLASS CoupID TimeC Gender;
   MODEL ZRelationshipWB = Male Female SepM SepF NoGdM NoGdF
         MonthsM MonthsF SepMnthM SepMnthF NoGMnthM NoGMnthF
         HHM HHF HHSepM HHSepF HHNoGdM HHNoGdF
         HHSepMnthM HHSepMnthF HHNoGMnthM HHNoGMnthF
         / SOLUTION DDFM=SATTERTH NOINT;
   RANDOM Male Female MonthsM MonthsF / G GCORR SUBJECT=CoupID TYPE=UN;
   REPEATED Gender / SUBJECT=CoupID*TimeC TYPE=csh;
   WHERE StrictCompliance = 1;
   ESTIMATE 'Intercept ' Male .5 Female .5;
   ESTIMATE 'Separate' SepM .5 SepF .5;
   ESTIMATE 'NoGuide' NoGdM .5 NoGdF .5;
   ESTIMATE 'Months' MonthsM .5 MonthsF .5;
   ESTIMATE 'SepXMonth' SepMnthM .5 SepMnthF .5;
   ESTIMATE 'NoGdXMonth' NoGMnthM .5 NoGMnthF .5;
   ESTIMATE 'HH' HHM .5 HHF .5;
   ESTIMATE 'HHXSeparate' HHSepM .5 HHSepF .5;
   ESTIMATE 'HHXNoGuide' HHNoGdM .5 HHNoGdF .5;
   ESTIMATE 'HHXSepXMonth' HHSepMnthM .5 HHSepMnthF .5;
   ESTIMATE 'HH*NoGdXMonth' HHNoGMnthM .5 HHNoGMnthF .5;
RUN;
I observed no gender differences, so I collapsed across the male/female parameters. Here are the key results (note there is no HHxMonths term since I forced income to be constant across time within each couple):
I don't know why the three-way terms are yielding infinity t-values and SEs of zero.
I tried removing the random intercepts and then removing the random slopes, but I get similar results.
I also tried using a different variable for household income. Rather than having it be the same value for both partners over time, I allowed it to vary for each partner over time. I grand-mean centered the variable before running the model. Here are those results. HHxMonths is "non-est" and the three-way terms are still "infinity."
Any suggestions for what's happening? Thanks!
P.S. the model runs when I use a Z-scored version of household income... but I'd like to understand what's going on with income when it's centered in the model.
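For reference, a z-scored income variable like the one mentioned here could be produced along these lines. This is a sketch with assumed names: the dataset dyads and the couple-level raw income variable AvgHH are hypothetical and do not come from the original syntax.

```sas
/* Assumed names: dataset dyads, couple-level raw income AvgHH. */
data dyads_z;
   set dyads;
   AvgHHZ = AvgHH;   /* copy so the raw variable is kept */
run;

/* MEAN=0 STD=1 centers and rescales, so AvgHHZ is in SD units. */
/* The mean and SD are computed over all person-period rows.    */
proc standard data=dyads_z mean=0 std=1 out=dyads_z;
   var AvgHHZ;
run;
```

Z-scoring is centering followed by division by a constant, so the centered and z-scored predictors differ only in their units. That makes the scale of the raw-dollar variable a natural first suspect when one version runs and the other does not.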
I am glad the Z score method worked for you, as it makes my original guess as to the issue less likely. That guess is that you have overspecified the model, such that there is enough collinearity that the SWEEP operator ends up with no variability "remaining" for those terms. Is that at all likely?
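One way to look into both possibilities (collinearity and scale) is to inspect the constructed product terms directly. The sketch below assumes the person-period data live in a dataset called dyads, which is a hypothetical name; the variable names are the ones from the posted syntax.

```sas
/* Assumed name: dataset dyads holding the constructed terms. */
/* Compare the scale of the income products with the other predictors. */
proc means data=dyads n mean std min max;
   where StrictCompliance = 1;
   var MonthsM SepMnthM HHM HHSepM HHSepMnthM HHNoGMnthM;
run;

/* Pairwise correlations among the higher-order terms for one partner. */
proc corr data=dyads noprob;
   where StrictCompliance = 1;
   var HHM HHSepM HHNoGdM SepMnthM NoGMnthM HHSepMnthM HHNoGMnthM;
run;
```

If the standard deviations of the income products are several orders of magnitude larger than those of the other columns, that points toward a scaling problem; correlations very close to 1 or -1 would instead point toward near-redundant terms.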
SteveDenham
It's possible, as we're running dyadic growth curves with lots of parameters.
Here's my best guess, after some additional digging and alternate specifications of income: the problem is the wide range of values. With centering alone, income stays in raw dollars and its range across couples is very large, so REML struggles; with z-scoring, the range is constrained and the model gives us estimates. I therefore tried two other transformations of income (each applied before centering): 1) I divided each couple's income by $1,000 to reduce the range of values for REML, and 2) I log-transformed income to reduce positive skew and better approximate a normal distribution (a few couples have very high incomes). Both methods yield estimates, because the range of values is smaller.
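The two transformations might look something like this in a DATA step, followed by centering. The dataset name dyads, the raw couple-level income variable AvgHH, and the new variable names IncK and LogInc are all assumptions made for illustration.

```sas
/* Assumed names: dataset dyads, couple-level raw income AvgHH. */
data dyads_t;
   set dyads;
   IncK = AvgHH / 1000;                     /* income in thousands of dollars */
   if AvgHH > 0 then LogInc = log(AvgHH);   /* natural log; left missing if income <= 0 */
run;

/* Grand-mean center both versions (MEAN=0 only, scale unchanged). */
proc standard data=dyads_t mean=0 out=dyads_tc;
   var IncK LogInc;
run;
```

Dividing by 1,000 is a pure change of units, so it should leave the test of each income term unchanged and only multiply the coefficients by 1,000. The log transform changes the functional form of the moderator, so it answers a slightly different question and is worth reporting as such.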
Would this be a convincing explanation/approach for reviewers? I want to make sure I'm fairly testing the possibility of moderation by income.
I believe you have identified the source of ill-conditioning, and the log transform is often used with income levels, as the data is likely long-tailed to the right. You could fit a lognormal distribution using PROC GLIMMIX, but the results should be nearly the same.
SteveDenham