Hello,
I have three variables: a, b, and c. Variable c is obtained by dividing a by b and multiplying by 100; in other words, c = a/b*100.
Can I calculate a correlation, or even a linear regression, where the dependent variable is c and the explanatory variable is b, given that b was used to calculate c? Variables a and b are measured on a continuous scale.
Appreciate your help.
Yes, you can calculate a correlation.
Yes, you can calculate a regression.
Why don't you give it a try?
Appreciate your reply. I did run a regression and a correlation, but my question now is: are the results/estimates valid, given that one of the variables being compared was used to calculate the other?
Since we really know nothing about the problem, and we also don't know what use the correlation/regression will be put to, the answer is: "It depends".
I hope a scatter plot of (a[i], b[i]) shows a quadratic relationship that passes near the origin. That is what your model says:
Find the least-squares values (beta0, beta1) such that
100*a[i] / b[i] ~ beta0 + beta1 * b[i]
or
a[i] ~ gamma0*b[i] + gamma1*b[i]^2
Thus your model assumes a quadratic relationship (no intercept) between a and b. IMHO, it would be clearer to drop c and write the model as
MODEL a = b b*b / NOINT;
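For reference, the algebra behind that rewrite can be spelled out as follows. This is only a sketch: it assumes an additive error term \varepsilon_i on the c scale, which the thread does not state explicitly, and the identification of gamma0 and gamma1 with beta0/100 and beta1/100 follows from multiplying through by b[i]/100.

```latex
\begin{align*}
c_i = \frac{100\,a_i}{b_i} &= \beta_0 + \beta_1 b_i + \varepsilon_i \\
\Longrightarrow\quad a_i &= \gamma_0 b_i + \gamma_1 b_i^2 + \frac{b_i}{100}\,\varepsilon_i,
\qquad \gamma_0 = \frac{\beta_0}{100},\quad \gamma_1 = \frac{\beta_1}{100}.
\end{align*}
```

Note that the error term on the a scale is multiplied by b_i. An ordinary least-squares fit of a on b and b^2 would therefore generally not reproduce the estimates from regressing c on b exactly; the two criteria coincide when the a-scale fit is weighted by 1/b_i^2.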
Thanks for your reply. The scatter plot of a against b actually shows a positive linear relationship, but the scatter plot of b against c shows a negative linear relationship. So my question remains: is the estimate obtained from a correlation or a linear regression between b and c valid, given that b was used to calculate c?
You can calculate a correlation between any numeric variables that have 2 or more pairs of values.
The interpretation may be pretty difficult if there isn't much of a natural relationship, such as between the number of hairs on your head and your first-grade teacher's dress size.
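The pattern described earlier in the thread (a positive a-b scatter but a negative b-c scatter) can arise purely from how c is constructed. Here is a small simulation sketch, written in Python with only the standard library rather than in SAS. Everything in it is hypothetical: the helper names `pearson` and `simulate`, the range of b, and the values of `alpha`, `beta`, and `noise_sd` are illustrative assumptions, not the poster's data. If a = alpha + beta*b + noise with alpha > 0, then c = 100*(alpha/b + beta) + noise term, which decreases as b increases.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=2000, alpha=20.0, beta=0.5, noise_sd=2.0, seed=1):
    """Hypothetical data: a = alpha + beta*b + noise, and c = a / b * 100."""
    rng = random.Random(seed)
    b = [rng.uniform(10.0, 100.0) for _ in range(n)]
    a = [alpha + beta * bi + rng.gauss(0.0, noise_sd) for bi in b]
    c = [100.0 * ai / bi for ai, bi in zip(a, b)]
    return a, b, c


# Case 1: a rises linearly with b (assumed positive intercept alpha).
a, b, c = simulate()
r_ab = pearson(a, b)
r_bc = pearson(b, c)
print(f"linear case:      corr(a, b) = {r_ab:.3f}, corr(b, c) = {r_bc:.3f}")

# Case 2: a does not depend on b at all (beta = 0).
a0, b0, c0 = simulate(beta=0.0)
r_ab0 = pearson(a0, b0)
r_bc0 = pearson(b0, c0)
print(f"independent case: corr(a, b) = {r_ab0:.3f}, corr(b, c) = {r_bc0:.3f}")
```

In both cases the correlation between b and c comes out clearly negative, even in the second case where a carries no information about b. So a negative b-c correlation can be computed, but by itself it says little beyond the fact that b is the denominator of c.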