rogersaj
Obsidian | Level 7

Hi,

 

Dependent variable = gestational age at first clinic visit (continuous) = p1v1_gest

Independent variable = village distance from clinic = distcat

I'd like to know if the gestational age at first clinic visit varies LINEARLY with village distance from clinic.

 

Code:

proc glm data = ...;
   class distcat;
   model p1v1_gest = distcat / solution;
   estimate "Linear trend for distcat" distcat -3 -1 1 3;
   contrast 'linear' distcat -3 -1 1 3;
run;

Results: (see attached)

 

I have the following questions (If you can only answer one, the first is most important!):

  1. Can you please help me interpret the output as regards the "Linear trend for distcat"?
  2. How does the "estimate" result differ from the "contrast" result in terms of interpretation?
  3. Was the use of -3 -1 1 3 appropriate or am I supposed to put in the median values and make them sum to zero?
  4. Once I add in covariates into the model, how will that change my interpretation?
  5. If, instead of distance, I had three marital status categories (Single, Currently Married, Previously Married) could I still test for a linear trend? If so, how would that interpretation work?

Thanks SO much!

 

AJ

 

ACCEPTED SOLUTION
sld
Rhodochrosite | Level 12

This SAS note discusses how to select coefficients for a linear trend:

http://support.sas.com/kb/22/912.html

 

Note that {-3, -1, 1, 3} are appropriate for a categorical factor with 4 equally spaced levels that are in order from smallest to largest ordinal value. Based on the labels for distcat in your output, it's not apparent that the levels of distcat are evenly spaced, or in fact, what value should be associated with each level of distcat. And the levels clearly are not in order, which is the point made by @PGStats.
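If the levels turn out not to be evenly spaced, one way to get linear-trend coefficients is the ORPOL function in PROC IML. The sketch below assumes four hypothetical distance midpoints (1, 3, 7 and 15 km); these numbers are made up for illustration and are not taken from your data, so substitute whatever value best represents each distcat level, listed in ascending order.

```sas
proc iml;
   /* Hypothetical midpoints (km) for the four distcat levels, ascending. */
   /* These values are assumptions for illustration only.                 */
   x = {1, 3, 7, 15};

   /* ORPOL returns orthonormal polynomial columns:  */
   /* column 1 = constant term, column 2 = linear.   */
   op = orpol(x, 1);

   /* Linear coefficients as a row, ready to copy into ESTIMATE/CONTRAST. */
   linear = t(op[, 2]);
   print linear;
quit;
```

The linear coefficients are proportional to each level's value minus the mean of the values (here -5.5, -3.5, 0.5, 8.5), so they sum to zero automatically. Rescaling them changes the Estimate value but not the test.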

 

The ESTIMATE and CONTRAST statements both test H0: no linear trend. The p-values are the same, and t^2 from ESTIMATE equals F from CONTRAST; in other words, they are the same test. The Estimate value reported by ESTIMATE is the linear combination of the distcat means using the specified coefficients; notably, it is not an estimate of the slope.
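If a slope-like number is wanted, the linear combination can be rescaled with the DIVISOR= option of ESTIMATE. This is only a sketch: it assumes the four levels are in ascending order and equally spaced one unit apart (scores 1, 2, 3, 4), and the data set name visits is hypothetical. Under those assumptions the unweighted least-squares slope through the four level means is the contrast divided by 10, because the sum of coefficient times score is (-3)(1) + (-1)(2) + (1)(3) + (3)(4) = 10.

```sas
proc glm data = visits;   /* visits is a hypothetical data set name */
   class distcat;
   model p1v1_gest = distcat / solution;

   /* Same contrast as before, divided by sum(c*x) = 10 so the estimate */
   /* reads as change in p1v1_gest per one-level step in distcat.       */
   /* Assumes equally spaced, correctly ordered levels.                 */
   estimate "Slope per one-level step" distcat -3 -1 1 3 / divisor = 10;
run;
quit;
```

The t statistic and p-value are unchanged by the divisor; only the Estimate and its standard error are rescaled.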

 

I won't speculate about the impact of adding covariates. It depends on what model you specify, for example, whether you include interaction between distcat and covariates.

 

It's not obvious to me that Single, Currently Married, and Previously Married are ordered values. A linear trend makes sense only for ordered values.

 

If you have actual distance values, you could and most likely should use those values in a regression model rather than categorizing distance to use in an ANOVA model. You needlessly give up information by categorizing, and the choice of how many categories to use and where to put the cutpoints is arbitrary.
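As a rough sketch of that regression approach, assuming a continuous distance variable exists (the names distance_km and visits below are hypothetical, not from your post):

```sas
proc glm data = visits;   /* visits is a hypothetical data set name */
   /* distance_km is a hypothetical continuous distance variable.   */
   /* It is not listed in a CLASS statement, so it enters the model */
   /* as a single linear term.                                      */
   model p1v1_gest = distance_km / solution clparm;
run;
quit;
```

Here the parameter estimate for distance_km is the slope itself (change in gestational age at first visit per unit of distance), and CLPARM adds its confidence limits.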

 

HTH


2 REPLIES 2
PGStats
Opal | Level 21

Your output seems to indicate that your distcat levels are not ordered properly in your tests to represent a linear effect. Specifying option ORDER=DATA in the proc statement might help you solve this problem. In any case, use the E option in your contrast and estimate statements to check the ordering of distcat levels.
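A minimal sketch of both suggestions, assuming the rows of the input data set already appear in ascending distance order (the data set name visits is hypothetical):

```sas
proc glm data = visits order = data;   /* visits is a hypothetical name */
   class distcat;
   model p1v1_gest = distcat / solution;

   /* The E option prints the coefficient vector actually applied to */
   /* each distcat level, so you can confirm that -3 lands on the    */
   /* nearest category and 3 on the farthest.                        */
   estimate "Linear trend for distcat" distcat -3 -1 1 3 / e;
   contrast "Linear trend for distcat" distcat -3 -1 1 3 / e;
run;
quit;
```

ORDER=DATA uses the order in which levels first appear in the data, so it only helps if the data are sorted accordingly. If distcat is stored as numeric codes in distance order with a format attached, ORDER=INTERNAL may be the simpler choice.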

PG