Re: Average age of household adults and children between group X and Y...

Miracle · Posted 10-06-2016 02:12 AM

Hi there.

Can I please ask your advice on how to find out whether households in group X are made up of older or younger adults and children than households in group Y? For example, are adults in X households 50 years old on average, while adults in Y households are 40?

I would do an ANOVA between X and Y but I am stuck on how I should aggregate my data at the household level. Which approach is appropriate? Or are they both incorrect?

Your insight is very much appreciated.

					Approach A		Approach B
Household	MemberID	Membership	Age	Group	Average Household Adult Age	Average Household Children Age	Total adult's age	Total children's age
1	1	Adult	56	X	42.8	10	214	10
1	2	Adult	21	X
1	3	Adult	70	X
1	4	Adult	23	X
1	5	Child	10	X
1	6	Adult	44	X
.	.	.	.	.
.	.	.	.	.
.	.	.	.	.
256	1	Adult	88	X	88	11	88	22
256	2	Child	7	X
256	3	Child	15	X
100	1	Adult	53	Y	37	0	112	0
100	2	Adult	34	Y
100	3	Adult	25	Y
.	.	.	.	.
.	.	.	.	.
.	.	.	.	.
300	1	Adult	34	Y	32	4	64	4
300	2	Adult	30	Y
300	3	Child	4	Y

Approach A:
Group X contains 256 households, so the group X average is the mean of the per-household averages:
Adult: (42.8+…+88)/256
Children: (10+…+11)/256

Approach B:
The group X average is the sum of the per-household age totals divided by the same 256 households:
Adult: (214+…+88)/256
Children: (10+…+22)/256
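
For reference, the household-level aggregation behind both approaches can be sketched as follows. This is only a sketch: it assumes the person-level table is a dataset named have with variables household, membership, age, and group, and the output names hh, avg_age, total_age, and n_members are made up for illustration. A household with no children simply gets no Child row here, rather than an average of 0 as for household 100 in the table, since a 0 would pull the children's mean down.

```sas
/* Assumed input: HAVE, one row per person, with variables
   household, membership ('Adult'/'Child'), age, group ('X'/'Y'). */
proc means data=have nway noprint;
  class group household membership;
  var age;
  /* avg_age = Approach A column, total_age = Approach B column */
  output out=hh mean=avg_age sum=total_age n=n_members;
run;

proc sort data=hh;
  by membership;
run;

/* Approach A as a two-group comparison of household averages,
   run separately for adults and for children. */
proc ttest data=hh;
  by membership;
  class group;
  var avg_age;
run;
```

With only two groups, the t test and a one-way ANOVA on avg_age address the same comparison.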

Kurt_Bremser · Posted 10-06-2016 02:21 AM

proc sql;
  /* mean age of all individuals, per group */
  create table want as
    select group, mean(age) as mean_age from have group by group;
quit;
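
If adults and children need to be reported separately, with standard errors, a variant along these lines may help. It assumes the same dataset have with a membership variable as in the posted table. Note that these standard errors treat every person as independent and ignore the clustering within households.

```sas
/* Person-level N, mean, and standard error of age
   for each group-by-membership cell. */
proc means data=have n mean stderr maxdec=2;
  class group membership;
  var age;
run;
```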


Reeza · Posted 10-06-2016 02:26 AM

Is your metric of interest the average of each household's mean adult age, or the average age of all adults across all households?

That's the difference between your two metrics.

One is correct for your purposes - but it depends on your purpose.

Also, maybe you should use a different type of analysis that can handle the variable number of adults per household.
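
To make the distinction concrete, here is the arithmetic for just households 1 and 256 from the posted table (five adults with ages summing to 214, and one adult aged 88). This is only an illustration on two households, not a result for the full data.

```latex
\begin{align*}
\text{mean of household means} &= \frac{42.8 + 88}{2} = 65.4 \\
\text{pooled mean over adults} &= \frac{214 + 88}{5 + 1} = \frac{302}{6} \approx 50.3
\end{align*}
```

The pooled mean divides by the number of adults. Dividing the age totals by the number of households instead, as written in Approach B, gives an average total age per household rather than an average age.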

Miracle · Posted 10-06-2016 12:22 PM

Sorry @Reeza. I am not too sure.

However, my interest is to find out whether

1) households in group X are made up of younger or older adults than households in group Y.

2) households in group X are made up of younger or older children than households in group Y.

I need to produce an average and standard error for each of groups X and Y, plus a p-value for the test of difference, for both (1) and (2), if that sounds correct?

Also, is a model-based analysis of the individual-level data more appropriate? For example:

- proc mixed/genmod with household as the subject of the RANDOM or REPEATED statement, and the membership*group interaction as a covariate in the model?

- proc surveyreg with household as the cluster, membership as the domain, and group as the covariate?
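
The two options above might be sketched roughly as follows. This is an untested sketch, not a recommendation of either model: it assumes a person-level dataset have with variables household, membership, age, and group, and that household IDs are unique across groups.

```sas
/* Option 1: linear mixed model with a random household intercept. */
proc mixed data=have;
  class household membership group;
  model age = membership|group / solution ddfm=kr;
  random intercept / subject=household;
  lsmeans membership*group / diff cl;
run;

/* Option 2: design-based regression with households as clusters,
   fitted separately within each membership domain. */
proc surveyreg data=have;
  cluster household;
  class group;
  domain membership;
  model age = group / solution;
run;
```

In the first sketch the LSMEANS differences give the X versus Y comparison within adults and within children. In the second, the group coefficient in each domain plays that role.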

Thank you.

Ksharp · Posted 10-06-2016 05:47 AM

Since age is lifetime data, which does not follow a normal distribution, you cannot apply ANOVA to it. Instead I use a gamma distribution with a log link function, since age is not censored.

Check the example of GENMOD:

Example 44.3: Gamma Distribution Applied to Life Data

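/* Simulated example data: 200 households with 200 members each */
data have;
call streaminit(12345678);
 do household=1 to 200;
  do id=1 to 200;
   age=ceil(80*rand('uniform'));            /* integer age from 1 to 80 */
   member=ifc(age lt 18,'Child','Adult');
   group=ifc(rand('bern',0.5)=0,'X','Y');   /* drawn per member, not per household */
   output;
  end;
 end;
run;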
  
  
proc genmod data=have;
class member group;
model age=member group household member*group
          /dist=gamma link=log type3 ;
lsmeans member*group/ ilink exp diff cl;
effectplot interaction(x=group sliceby=member);
run;

OUTPUT:

Differences of member*group Least Squares Means
member	group	_member	_group	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper	Exponentiated	Exponentiated Lower	Exponentiated Upper
Adult	X	Adult	Y	0.003689	0.005204	0.71	0.4784	0.05	-0.00651	0.01389	1.0037	0.9935	1.0140
Adult	X	Child	X	1.6941	0.008011	211.47	<.0001	0.05	1.6784	1.7098	5.4419	5.3571	5.5280
Adult	X	Child	Y	1.6987	0.008052	210.97	<.0001	0.05	1.6830	1.7145	5.4670	5.3814	5.5540
Adult	Y	Child	X	1.6904	0.008010	211.05	<.0001	0.05	1.6747	1.7061	5.4218	5.3374	5.5076
Adult	Y	Child	Y	1.6950	0.008050	210.56	<.0001	0.05	1.6793	1.7108	5.4469	5.3616	5.5335
Child	X	Child	Y	0.004613	0.01009	0.46	0.6477	0.05	-0.01517	0.02440	1.0046	0.9849	1.0247
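
As a reading aid for the table above: with a log link the Estimate column is a difference of LS-means on the log scale, so the Exponentiated column can be read as a ratio of model-based means. For the first row (Adult X versus Adult Y), using the printed values:

```latex
\begin{align*}
\hat{\delta} &= \log\hat{\mu}_{\text{Adult},X} - \log\hat{\mu}_{\text{Adult},Y} = 0.003689 \\
\exp(\hat{\delta}) &= \frac{\hat{\mu}_{\text{Adult},X}}{\hat{\mu}_{\text{Adult},Y}} \approx 1.0037
\end{align*}
```

The printed interval for this ratio (0.9935 to 1.0140) contains 1, which agrees with the p-value of 0.4784 for that row.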

Miracle · Posted 10-06-2016 12:40 PM

Hi @Ksharp.

Thank you for your reply.

I am not familiar with modelling by gamma with log-link.

However, I would have thought household should be specified in the SUBJECT= option of the REPEATED statement instead, to account for the clustering.
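
A GEE version of the model along those lines might look like the sketch below. It is untested and reuses the variable names from the simulated dataset have above. The exchangeable working correlation (type=exch) is an assumption made for illustration, not something settled in this thread.

```sas
/* Gamma/log-link model fitted by GEE, with members
   clustered within household. */
proc genmod data=have;
  class household member group;
  model age = member group member*group / dist=gamma link=log type3;
  repeated subject=household / type=exch;
  lsmeans member*group / ilink diff cl;
run;
```

Here household appears only as the clustering variable, not as a covariate in the MODEL statement.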

And are the p-value tests for difference also correct on the original scale of the response variable? As far as I understand, the test is done on the response variable on the log scale.

Thank you again.

Ksharp · Posted 10-06-2016 10:45 PM

If it were a repeated measure, then you should use NLMIXED. I will leave that to @Steave.

" are the p-value test for difference also correct on the original scale of the response variable?"

Yes. I used the ILINK option. If you want to see the actual mean values, add the MEANS option to it.

lsmeans member*group/ ilink exp diff cl means ;