Miracle
Barite | Level 11

Hi there.

Can I please ask your advice on how to find out whether households in group X are made up of older or younger adults and children than households in group Y? For example, are adults in X households 50 years old on average, while adults in Y households are 40?

I would do an ANOVA between X and Y, but I am stuck on how to aggregate my data at the household level. Which approach is appropriate? Or are they both incorrect?

Your insight is very much appreciated.

 

                                             Approach A                                           Approach B
Household  MemberID  Membership  Age  Group  Avg household adult age  Avg household children age  Total adult's age  Total children's age
1          1         Adult       56   X      42.8                     10                          214                10
1          2         Adult       21   X
1          3         Adult       70   X
1          4         Adult       23   X
1          5         Child       10   X
1          6         Adult       44   X
...
...
...
256        1         Adult       88   X      88                       11                          88                 22
256        2         Child       7    X
256        3         Child       15   X
100        1         Adult       53   Y      37                       0                           112                0
100        2         Adult       34   Y
100        3         Adult       25   Y
...
...
...
300        1         Adult       34   Y      32                       4                           64                 4
300        2         Adult       30   Y
300        3         Child       4    Y
         
Approach A:
The total number of households in group X is 256.
Hence, the averages for households in group X are
Adult: (42.8+…+88)/256
Children: (10+…+11)/256

Approach B:
Hence, the averages for households in group X are
Adult: (214+…+88)/256
Children: (10+…+22)/256
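
For reference, the per-household quantities used by both approaches can be computed in one pass. This is only a minimal sketch, not a recommendation of either approach. The dataset name persons, the variable names, and the output names (hh_level, mean_age, total_age, n_members) are assumptions made for illustration, and only the rows shown in the table are entered. A household without children produces no Child row here, so its children average stays absent instead of being counted as 0.

```sas
/* Hypothetical person-level data: only the rows shown in the table */
data persons;
  input household member membership $ age group $;
  datalines;
1 1 Adult 56 X
1 2 Adult 21 X
1 3 Adult 70 X
1 4 Adult 23 X
1 5 Child 10 X
1 6 Adult 44 X
256 1 Adult 88 X
256 2 Child 7 X
256 3 Child 15 X
100 1 Adult 53 Y
100 2 Adult 34 Y
100 3 Adult 25 Y
300 1 Adult 34 Y
300 2 Adult 30 Y
300 3 Child 4 Y
;
run;

/* One row per household and membership type:              */
/* mean_age is the Approach A value, total_age is Approach B */
proc means data=persons noprint nway;
  class group household membership;
  var age;
  output out=hh_level mean=mean_age sum=total_age n=n_members;
run;

proc print data=hh_level noobs;
  var group household membership n_members mean_age total_age;
run;

/* Approach A comparison for adults, with households as the units. */
/* Four households are far too few for inference; this only shows  */
/* the shape of the call.                                           */
proc ttest data=hh_level;
  where membership = 'Adult';
  class group;
  var mean_age;
run;
```

Approach A averages mean_age over households. Approach B averages total_age, which grows with the number of members as well as with their ages, so it mixes household size into the comparison.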
Reeza
Super User

Is your metric of interest the average household adult age? Or the average age of adults across all households?

That's the difference between your two metrics.

One is correct for your purposes, but it depends on your purpose.

Also, maybe you should use a different type of analysis that can handle the variable number of adults per household.
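
To make the two metrics concrete, here is a small sketch that computes both for adults. The dataset persons and its variable names are assumptions for illustration, and only the adult rows visible in the question's table are used. With just those rows, group X gives 65.4 for the average of household averages but about 50.3 for the pooled average of all adults, because the six adults contribute once each in the second metric while each household contributes once in the first.

```sas
/* Hypothetical data: the adult rows shown in the question's table */
data persons;
  input household member membership $ age group $;
  datalines;
1 1 Adult 56 X
1 2 Adult 21 X
1 3 Adult 70 X
1 4 Adult 23 X
1 6 Adult 44 X
256 1 Adult 88 X
100 1 Adult 53 Y
100 2 Adult 34 Y
100 3 Adult 25 Y
300 1 Adult 34 Y
300 2 Adult 30 Y
;
run;

proc sql;
  /* Metric 1: average of household averages (each household counts once) */
  select group, mean(hh_mean) as mean_of_household_means
    from (select group, household, mean(age) as hh_mean
            from persons
            where membership = 'Adult'
            group by group, household) as hh
    group by group;

  /* Metric 2: average age of all adults pooled (each adult counts once) */
  select group, mean(age) as pooled_adult_mean
    from persons
    where membership = 'Adult'
    group by group;
quit;
```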

Miracle
Barite | Level 11

Sorry @Reeza. I am not too sure. 

 

However, my interest is to find out whether

1) households in group X are made up of younger or older adults than households in group Y.

2) households in group X are made up of younger or older children than households in group Y.

I need to produce an average and std error for both group X and Y and p-value for test of difference for (1) and (2) if that sounds correct? 

 

Also is model-based analysis using the individual-level data more appropriate?

- PROC MIXED or PROC GENMOD, with household as the subject of the RANDOM or REPEATED statement and the membership-by-group interaction as a covariate in the model?

- PROC SURVEYREG, with household as the cluster, membership as the domain, and group as the covariate?

 

Thank you.
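
The two individual-level options mentioned in this post could look roughly like the following. This is a hedged sketch on simulated data, not a statement of which model is right for the real data. The dataset persons, the simulation parameters, and the choice of a normal response are all assumptions, and household IDs are assumed to be unique across groups.

```sas
/* Simulated person-level data (assumption): group is set per household */
data persons;
  length group $1 membership $5;
  call streaminit(2024);
  do household = 1 to 100;
    group    = ifc(rand('bernoulli', 0.5) = 1, 'X', 'Y');
    hh_shift = rand('normal', 0, 5);        /* shared household effect */
    n        = 2 + floor(3*rand('uniform')); /* 2 to 4 members          */
    do member = 1 to n;
      if member <= 2 then do;
        membership = 'Adult';
        age = max(18, round(40 + hh_shift + 10*rand('normal')));
      end;
      else do;
        membership = 'Child';
        age = ceil(17*rand('uniform'));
      end;
      output;
    end;
  end;
  drop n hh_shift;
run;

/* Option 1: random intercept per household */
proc mixed data=persons;
  class household membership group;
  model age = membership group membership*group / solution ddfm=kr;
  random intercept / subject=household;
  lsmeans membership*group / diff cl;
run;

/* Option 2: design-based standard errors with household as the cluster */
proc surveyreg data=persons;
  class membership group;
  cluster household;
  model age = membership group membership*group / solution;
  lsmeans membership*group / diff cl;
run;
```

In both calls the rows of the differences table that compare X with Y within Adult, and X with Y within Child, correspond to questions (1) and (2).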

Ksharp
Super User

Since this is lifetime (age) data, which does not follow a normal distribution,

you cannot apply ANOVA to it. So I use a gamma distribution with a log link function,

since age is not censored.

Check the example of GENMOD:

Example 44.3: Gamma Distribution Applied to Life Data

 

 

 

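/* Simulate 200 households with 200 members each.            */
/* Note: group is drawn per person here, not per household.  */
data have;
call streaminit(12345678);
 do household=1 to 200;
  do id=1 to 200;
   age=ceil(80*rand('uniform'));          /* integer age from 1 to 80 */
   member=ifc(age lt 18,'Child','Adult');
   group=ifc(rand('bern',0.5)=0,'X','Y');
   output;
  end;
 end;
run;
  
/* Gamma regression with log link; household enters as a numeric covariate. */
proc genmod data=have;
class member group;
model age=member group household member*group
          /dist=gamma link=log type3 ;
/* Differences are on the log scale; exp reports them as ratios of means. */
lsmeans member*group/ ilink exp diff cl;
effectplot interaction(x=group sliceby=member);
run;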

 

 

OUTPUT:

Differences of member*group Least Squares Means
member	group	_member	_group	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper	Exponentiated	Exponentiated Lower	Exponentiated Upper
Adult	X	Adult	Y	0.003689	0.005204	0.71	0.4784	0.05	-0.00651	0.01389	1.0037	0.9935	1.0140
Adult	X	Child	X	1.6941	0.008011	211.47	<.0001	0.05	1.6784	1.7098	5.4419	5.3571	5.5280
Adult	X	Child	Y	1.6987	0.008052	210.97	<.0001	0.05	1.6830	1.7145	5.4670	5.3814	5.5540
Adult	Y	Child	X	1.6904	0.008010	211.05	<.0001	0.05	1.6747	1.7061	5.4218	5.3374	5.5076
Adult	Y	Child	Y	1.6950	0.008050	210.56	<.0001	0.05	1.6793	1.7108	5.4469	5.3616	5.5335
Child	X	Child	Y	0.004613	0.01009	0.46	0.6477	0.05	-0.01517	0.02440	1.0046	0.9849	1.0247
Miracle
Barite | Level 11

Hi @Ksharp.

Thank you for your reply.

I am not familiar with modelling with a gamma distribution and a log link.

However, I would have thought that household should go in the REPEATED statement as the subject instead, to account for the clustering.

And are the p-values for the tests of difference also correct on the original scale of the response variable? As far as I understand, the test is carried out on the response variable on the log scale.

Thank you again.
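
The variant raised in this post, with household as the subject of a REPEATED statement, could be sketched as below. It keeps the gamma distribution and log link from the earlier reply and fits the model by GEE. The simulated dataset have, its sizes, and the exchangeable working correlation are assumptions for illustration only. Here group is assigned once per household.

```sas
/* Simulated data (assumption): 200 households, 2 to 6 members each */
data have;
  length group $1 member $5;
  call streaminit(12345678);
  do household = 1 to 200;
    group = ifc(rand('bernoulli', 0.5) = 1, 'X', 'Y');
    n = 2 + floor(5*rand('uniform'));
    do id = 1 to n;
      age    = ceil(80*rand('uniform'));        /* integer age, 1 to 80 */
      member = ifc(age lt 18, 'Child', 'Adult');
      output;
    end;
  end;
  drop n;
run;

/* GEE: household defines the clusters, so it is a CLASS variable and */
/* the subject of REPEATED instead of a covariate in the MODEL.       */
proc genmod data=have;
  class household member group;
  model age = member group member*group / dist=gamma link=log type3;
  repeated subject=household / type=exch;
  lsmeans member*group / ilink diff cl;
run;
```

With a REPEATED statement the standard errors reported for the LS-means and their differences are the empirical (robust) ones, which is what accounts for the clustering.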

 

 

Ksharp
Super User

If it were repeated-measures data, then you should use NLMIXED. I will leave that to @Steave.

 

"are the p-value test for difference also correct on the original scale of the response variable?"

Yes. I used the ILINK option. If you want to see the actual mean values, add the MEAN option to the statement.

 

 

lsmeans member*group/ ilink exp diff cl mean ;
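
One way to read the answer about scales is the following sketch. The symbols \mu_1 and \mu_2 are introduced here only for illustration and stand for the two cell means being compared in a row of the differences table.

```latex
H_0:\ \log\mu_1 - \log\mu_2 = 0
\;\iff\; \frac{\mu_1}{\mu_2} = 1
\;\iff\; \mu_1 = \mu_2
```

So the null hypothesis of no difference on the log scale is the same hypothesis as equal means on the original scale. The estimate and its confidence limits, however, describe a ratio of means and not a difference in years: for Adult X versus Adult Y the Exponentiated column shows exp(0.003689) ≈ 1.0037.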
