Hi,
Dependent variable = gestational age at first clinic visit (continuous) = p1v1_gest
Independent variable = village distance from clinic = distcat
I'd like to know if the gestational age at first clinic visit varies LINEARLY with village distance from clinic.
Code:
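proc glm data = ...;
   class distcat;                          /* distance category as a classification factor */
   model p1v1_gest = distcat / solution;
   /* coefficients for a linear trend across 4 equally spaced, ordered levels */
   estimate "Linear trend for distcat" distcat -3 -1 1 3;
   contrast 'linear' distcat -3 -1 1 3;
run;
quit;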
Results: (see attached)
I have the following questions (If you can only answer one, the first is most important!):
Thanks SO much!
AJ
This SAS Note discusses how to choose coefficients for a linear trend test:
http://support.sas.com/kb/22/912.html
Note that {-3, -1, 1, 3} are appropriate only for a categorical factor with four equally spaced levels that are ordered from the smallest to the largest value. Based on the labels for distcat in your output, it's not apparent that the levels of distcat are evenly spaced, or even what value should be associated with each level. And the levels are clearly not in order, which is the point made by @PGStats.
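For levels that are not equally spaced, one option is to derive the coefficients from a representative distance for each level. A minimal sketch, assuming SAS/IML is available and using hypothetical distances (2, 5, 10, 20 km) for the four distcat levels; substitute values that make sense for your categories, listed in the same order in which PROC GLM sorts the levels:

```sas
proc iml;
   /* Hypothetical representative distances (km) for the four distcat levels,
      in the same order as the levels appear in the PROC GLM output */
   x = {2, 5, 10, 20};
   /* ORPOL returns orthonormal polynomials evaluated at x:
      column 1 is the constant term, column 2 is the linear term */
   P = orpol(x, 1);
   linear = P[, 2]`;          /* transpose to a row vector of coefficients */
   print linear[format=8.4];
quit;
```

Only the proportions of the coefficients matter for the test, so the centered values (x minus their mean) multiplied by any constant give the same F. For the hypothetical distances above, that is `contrast 'linear' distcat -29 -17 3 43;`.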
The ESTIMATE and CONTRAST statements both test H0: no linear trend. The p-values are the same, and t^2 from ESTIMATE equals F from CONTRAST; in other words, they are the same test. The Estimate value reported by ESTIMATE is the linear combination of the distcat means using the specified coefficients; notably, it is not an estimate of the slope.
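If you do want a slope, one option, assuming equally spaced levels one unit apart that are already sorted from nearest to farthest, is the DIVISOR= option. With coefficients c = (-3, -1, 1, 3) at positions x = 1, 2, 3, 4, the sum of c*x is -3 - 2 + 3 + 12 = 10, so dividing the linear combination by 10 gives the slope of the unweighted least-squares line through the four distcat means, per one-category step. The data set name work.visits is a placeholder:

```sas
proc glm data=work.visits;                 /* placeholder data set name */
   class distcat;
   model p1v1_gest = distcat / solution;
   /* Same coefficients as before; DIVISOR=10 rescales the estimate to the
      slope per one-level step (sum of c*x = 10 for x = 1, 2, 3, 4) */
   estimate "Slope per distcat step" distcat -3 -1 1 3 / divisor=10;
   /* The F for this contrast equals t^2 for the ESTIMATE above */
   contrast 'linear' distcat -3 -1 1 3;
run;
quit;
```

DIVISOR= rescales both the estimate and its standard error, so the t statistic and p-value are unchanged.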
I won't speculate about the impact of adding covariates. It depends on what model you specify, for example, whether you include interaction between distcat and covariates.
It's not obvious to me that Single, Currently Married, and Previously Married are ordered values. A linear trend makes sense only for ordered values.
If you have actual distance values, you could, and most likely should, use them in a regression model rather than categorizing distance for an ANOVA model. Categorizing needlessly throws away information, not to mention the arbitrary choice of how many categories to use and where to put the cutpoints.
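A minimal sketch of the regression approach, assuming a hypothetical numeric variable dist_km that holds each village's distance from the clinic in kilometers, and the placeholder data set name work.visits:

```sas
/* dist_km is a hypothetical variable: village distance from the clinic (km) */
proc reg data=work.visits;                 /* placeholder data set name */
   model p1v1_gest = dist_km;              /* slope = change in mean gestational age per km */
run;
quit;
```

Under these assumptions, the parameter estimate for dist_km is the change in mean gestational age at first visit per additional kilometer.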
HTH
Your output seems to indicate that the distcat levels are not ordered in a way that lets your coefficients represent a linear effect. Specifying the ORDER=DATA option in the PROC GLM statement might help solve this problem, provided the levels first appear in the data in increasing order of distance. In any case, you should use the E option in your CONTRAST and ESTIMATE statements to check which coefficient is applied to each distcat level.
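One way to apply the ORDER=DATA and E suggestions, assuming a hypothetical numeric variable dist_km with each village's distance and the placeholder data set name work.visits: sort by distance so that the distcat levels first appear from nearest to farthest, then let ORDER=DATA keep that order in PROC GLM.

```sas
/* Sort by the hypothetical distance variable so distcat levels
   first appear in increasing order of distance */
proc sort data=work.visits out=work.visits_sorted;   /* placeholder names */
   by dist_km;
run;

proc glm data=work.visits_sorted order=data;
   class distcat;
   model p1v1_gest = distcat / solution;
   /* E prints the coefficient assigned to each distcat level, so you can
      confirm that -3 -1 1 3 run from the nearest to the farthest category */
   estimate "Linear trend for distcat" distcat -3 -1 1 3 / e;
   contrast 'linear' distcat -3 -1 1 3 / e;
run;
quit;
```

If distcat is instead stored as numeric codes (for example, 1 to 4) with a format that supplies the labels, ORDER=INTERNAL sorts the levels by the underlying codes without re-sorting the data.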