Hi there.
Could I please ask your advice on how to find out whether households in group X are made up of older or younger adults and children than households in group Y? For example, are adults in X households 50 years old on average, whilst adults in Y households are 40?
I would do an ANOVA between X and Y but I am stuck on how I should aggregate my data at the household level. Which approach is appropriate? Or are they both incorrect?
Your insight is very much appreciated.
Household | MemberID | Membership | Age | Group | Approach A: Average Household Adult Age | Approach A: Average Household Children Age | Approach B: Total Adults' Age | Approach B: Total Children's Age |
1 | 1 | Adult | 56 | X | 42.8 | 10 | 214 | 10 |
1 | 2 | Adult | 21 | X | | | | |
1 | 3 | Adult | 70 | X | | | | |
1 | 4 | Adult | 23 | X | | | | |
1 | 5 | Child | 10 | X | | | | |
1 | 6 | Adult | 44 | X | | | | |
... | ... | ... | ... | ... | | | | |
256 | 1 | Adult | 88 | X | 88 | 11 | 88 | 22 |
256 | 2 | Child | 7 | X | | | | |
256 | 3 | Child | 15 | X | | | | |
100 | 1 | Adult | 53 | Y | 37 | 0 | 112 | 0 |
100 | 2 | Adult | 34 | Y | | | | |
100 | 3 | Adult | 25 | Y | | | | |
... | ... | ... | ... | ... | | | | |
300 | 1 | Adult | 34 | Y | 32 | 4 | 64 | 4 |
300 | 2 | Adult | 30 | Y | | | | |
300 | 3 | Child | 4 | Y | | | | |
Approach A:
The total number of households in group X is 256.
Hence, the average for households in group X is
Adult: (42.8+…+88)/256
Children: (10+…+11)/256
Approach B:
Hence, the average for households in group X is
Adult: (214+…+88)/256
Children: (10+…+22)/256
proc sql;
  /* Pooled mean age of all individuals in each group */
  create table want as
    select group, mean(age) as mean_age from have group by group;
quit;
Is your metric of interest the average household adult age? Or the average age of adults across all households?
That's the difference between your metrics.
One of them is correct, but which one depends on your purpose.
Also, maybe you should use a different type of analysis that can handle the variable number of adults per household.
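To make the difference between the two metrics concrete, here is a small sketch that uses only the adult rows of households 1 and 256 from the table in the question. It is a toy subset, not the real data, and the dataset name toy and the output column names are made up for illustration.

```sas
/* Toy data: adult rows of households 1 and 256 as shown in the question */
data toy;
  input household age;
  datalines;
1 56
1 21
1 70
1 23
1 44
256 88
;
run;

proc sql;
  /* Metric 1: average of the household-level adult means */
  select mean(hh_mean) as mean_of_household_means   /* (42.8 + 88) / 2 = 65.4 */
  from (select household, mean(age) as hh_mean
        from toy
        group by household);

  /* Metric 2: average age over all adults, ignoring households */
  select mean(age) as pooled_adult_mean             /* 302 / 6 = 50.33 */
  from toy;
quit;
```

On these six adults the two metrics differ a lot, because the single 88-year-old counts as a whole household in the first metric but as one person out of six in the second.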
Sorry @Reeza. I am not too sure.
However, my interest is to find out whether
1) households in group X are made up of younger or older adults than households in group Y.
2) households in group X are made up of younger or older children than households in group Y.
I need to produce an average and standard error for both group X and group Y, plus a p-value for the test of difference for (1) and (2), if that sounds correct?
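As a rough sketch of the household-level route (Approach A in the question), one could first reduce the data to one mean age per household and membership type, then compare X and Y. This assumes a dataset HAVE with one row per person and columns named household, membership, age and group as in the table; the names hh and hh_mean_age are hypothetical.

```sas
/* Assumed input: HAVE, one row per person, with columns
   household, membership ('Adult'/'Child'), age, group ('X'/'Y') */
proc sql;
  create table hh as
    select group, household, membership,
           mean(age) as hh_mean_age     /* one value per household and membership */
    from have
    group by group, household, membership;
quit;

/* BY-group processing needs the data sorted by membership */
proc sort data=hh;
  by membership;
run;

/* Two-sample t test of X vs Y, run separately for adults and children */
proc ttest data=hh;
  by membership;
  class group;
  var hh_mean_age;
run;
```

PROC TTEST reports the mean and standard error for each group along with the p-value for the difference; with only two groups, the pooled t test is equivalent to a one-way ANOVA. Note that in this sketch a household with no children simply drops out of the child comparison, rather than contributing a 0 as household 100 does in the table.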
Also, is a model-based analysis using the individual-level data more appropriate?
- proc mixed/genmod with random effect or repeated subject statement being household and covariate being the interaction term of membership and group in the model?
- proc surveyreg with cluster being household, domain being the membership and group as the covariate?
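For reference, the first option might look roughly like the following GEE sketch. It assumes the same HAVE dataset and column names as the table in the question, and the normal distribution and exchangeable working correlation are illustrative choices, not recommendations.

```sas
/* GEE sketch on individual-level data; HAVE and its column names are assumed */
proc genmod data=have;
  /* the subject effect in REPEATED must also appear in CLASS */
  class household membership group;
  model age = membership group membership*group / dist=normal link=identity;
  /* members of the same household are treated as correlated */
  repeated subject=household / type=exch;
  /* cell means for each membership-by-group combination, with pairwise differences */
  lsmeans membership*group / diff cl;
run;
```

The differences of interest would be the Adult X vs Adult Y and Child X vs Child Y rows of the LSMEANS output.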
Thank you.
Since age is lifetime data, which does not follow a normal distribution,
you cannot apply ANOVA to it. So I would use a gamma distribution with a log link function,
since age is not censored.
Check the example of GENMOD:
Example 44.3: Gamma Distribution Applied to Life Data
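data have;
  call streaminit(12345678);   /* fix the seed so the simulation is reproducible */
  do household=1 to 200;
    do id=1 to 200;            /* 200 simulated members per household */
      age=ceil(80*rand('uniform'));            /* integer ages from 1 to 80 */
      member=ifc(age lt 18,'Child','Adult');
      group=ifc(rand('bern',0.5)=0,'X','Y');   /* note: group is drawn per person, not per household */
      output;
    end;
  end;
run;

proc genmod data=have;
  class member group;
  /* household is not in CLASS, so it enters as a continuous covariate */
  model age=member group household member*group
        /dist=gamma link=log type3;
  /* ilink applies the inverse link to the LS-means; exp exponentiates the estimates and differences */
  lsmeans member*group/ ilink exp diff cl;
  effectplot interaction(x=group sliceby=member);
run;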
OUTPUT:
Differences of member*group Least Squares Means
member | group | _member | _group | Estimate | Standard Error | z Value | Pr > \|z\| | Alpha | Lower | Upper | Exponentiated | Exponentiated Lower | Exponentiated Upper |
Adult | X | Adult | Y | 0.003689 | 0.005204 | 0.71 | 0.4784 | 0.05 | -0.00651 | 0.01389 | 1.0037 | 0.9935 | 1.0140 |
Adult | X | Child | X | 1.6941 | 0.008011 | 211.47 | <.0001 | 0.05 | 1.6784 | 1.7098 | 5.4419 | 5.3571 | 5.5280 |
Adult | X | Child | Y | 1.6987 | 0.008052 | 210.97 | <.0001 | 0.05 | 1.6830 | 1.7145 | 5.4670 | 5.3814 | 5.5540 |
Adult | Y | Child | X | 1.6904 | 0.008010 | 211.05 | <.0001 | 0.05 | 1.6747 | 1.7061 | 5.4218 | 5.3374 | 5.5076 |
Adult | Y | Child | Y | 1.6950 | 0.008050 | 210.56 | <.0001 | 0.05 | 1.6793 | 1.7108 | 5.4469 | 5.3616 | 5.5335 |
Child | X | Child | Y | 0.004613 | 0.01009 | 0.46 | 0.6477 | 0.05 | -0.01517 | 0.02440 | 1.0046 | 0.9849 | 1.0247 |
Hi @Ksharp.
Thank you for your reply.
I am not familiar with modelling with a gamma distribution and a log link.
However, I would have thought household should go in the repeated subject statement instead, to account for the clustering.
And is the p-value for the test of difference also correct on the original scale of the response variable? As far as I understand, the test is carried out on the log scale.
Thank you again.
If it were repeated measures, then you should use NLMIXED. I will leave that to @Steave.
"are the p-value test for difference also correct on the original scale of the response variable?"
Yes. I used the ilink option. If you want to see the real mean values, add the mean option to it:
lsmeans member*group/ ilink exp diff cl mean ;