Hi,
This is a naive question, and it's definitely time for me to revisit stat101...
I collected health utility data (range 0-1) from a cohort of patients, and there is no control group in this study. I would like to compare the data collected against one published study, controlling for age and sex. However, the published study only reports summary statistics, i.e. mean and standard deviation, and gives no information on the shape of its data distribution (could be normal, or could be beta?).
Setting aside whether it's appropriate to perform such a comparison (i.e. ideally, a control group should have been recruited at the same time), may I know what approach and proc I could use to test whether my data is different from the published literature?
Thank you.
example:
data sample;
input ID age sex health_u;
cards;
1 18 1 0.75
2 22 1 0.6
3 40 2 0.88
4 50 1 0.65
5 35 2 0.9
6 51 2 0.6
7 33 1 0.8
;
example published literature to compare:
age group "16-25" sex=1 mean_health_u=0.76 stdev=0.1;
age group "16-25" sex=2 mean_health_u=0.71 stdev=0.06;
age group "26-35" sex=1 mean_health_u=0.8 stdev=0.10;
age group "26-35" sex=2 mean_health_u=0.6 stdev=0.08;...
You can test one sample at a time against a hypothesized mean value with proc ttest.
Example:
proc ttest data=sample h0=0.76;
   where (16 le age le 25) and sex=1;
   var health_u;
run;
You would need a separate run for each hypothesized mean, with the matching selection for age and sex in the where statement, because I am not aware of any way to supply different h0 values within a single run of the procedure.
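One possible way to avoid repeating the procedure by hand, offered only as a sketch: subtract the published mean from each observation and test the difference against zero with BY-group processing, which is equivalent to the one-sample test against that group's published mean. The names `ref`, `agegrp`, `ref_mean`, `centered`, and `diff` are hypothetical, and the lookup table contains only the four published values quoted in the question. Patients outside those age bands are dropped by the join, and the 7-row example data is far too small to give usable results per group.

```sas
/* Hypothetical lookup table of the published means quoted in the question */
data ref;
   input agegrp $ sex ref_mean;
   datalines;
16-25 1 0.76
16-25 2 0.71
26-35 1 0.80
26-35 2 0.60
;

/* Attach the matching published mean to each patient and center on it.
   Patients whose age/sex has no published value are excluded by the inner join. */
proc sql;
   create table centered as
   select s.*, r.agegrp, s.health_u - r.ref_mean as diff
   from sample as s inner join ref as r
     on s.sex = r.sex
    and ((r.agegrp = '16-25' and s.age between 16 and 25) or
         (r.agegrp = '26-35' and s.age between 26 and 35))
   order by r.agegrp, s.sex;
quit;

/* One-sample t test per age/sex group: H0 is mean(diff) = 0,
   i.e. the group mean equals the published mean */
proc ttest data=centered h0=0;
   by agegrp sex;
   var diff;
run;
```

Like the separate runs, this still treats each published mean as a fixed constant.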
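If both your data and the comparison sample have at least 30 subjects in each age/sex group, you shouldn't be in too much trouble. I note that the "literature" values you show do not include a sample size, so that might be a concern: the one-sample test treats the published mean as a fixed value with no sampling error of its own.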
Have you looked at your standard deviations by the same groups?
That would be easy, at least for an eyeball comparison, with:
proc format;
   value agegroup
      16 - 25 = '16 to 25'
      26 - 35 = '26 to 35'
   ;
run;
proc means data=sample mean std;
   class age sex;
   format age agegroup.;
   var health_u;
run;
Formats create groups that are honored by most analysis, reporting and graphing procedures, so this should show whether your means and standard deviations are at least in the same ballpark as the published ones.
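As a small illustration of the same format being honored by another procedure, the sketch below counts subjects per age/sex group, which is one way to check the "at least 30 per group" guideline on your own data. It assumes the `sample` data set from the question and repeats the `agegroup` format so that it runs on its own.

```sas
/* Same age bands as above, repeated so this step is self-contained */
proc format;
   value agegroup
      16 - 25 = '16 to 25'
      26 - 35 = '26 to 35'
   ;
run;

/* Cross-tabulate formatted age group by sex, showing counts only */
proc freq data=sample;
   tables age*sex / norow nocol nopercent;
   format age agegroup.;
run;
```

Ages not covered by the format (for example 40 or 50 in the example data) appear under their raw values, so the value statement would need extending to cover every published age band.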
If you want to determine whether your data distribution is the same, you need much more information from the literature, and that is unlikely to be forthcoming.