06-09-2018 08:22 PM
My PI suggested I standardize values for a longitudinal dataset we are looking at. There are four time points (one during pregnancy and three after), and we are trying to control for the during-pregnancy scores on a variable.
Therefore, we are using a stacked variable that repeats the value from the first time point four times, and we include it in the model alongside the other variable, which has four different values representing the four different time points.
Now my PI has suggested I standardize the values. I am not sure at what step I should standardize them: if I do it once I have my long dataset, I will be standardizing each value using the mean of ALL visits. Would it be accurate to do this and then, for the "first" variable, standardize only across the "first" visits?
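To make the two options concrete, here is a minimal sketch in plain Python. The participants, scores, and names (`long_rows`, `zscores`, and so on) are all made up for illustration; it only shows how the two ways of standardizing differ and is not a recommendation for the analysis itself.

```python
from statistics import mean, stdev

# Hypothetical long-format rows: (participant, visit, score). Values are invented.
long_rows = [
    (1, 1, 10.0), (1, 2, 12.0), (1, 3, 14.0), (1, 4, 16.0),
    (2, 1, 20.0), (2, 2, 18.0), (2, 3, 16.0), (2, 4, 14.0),
    (3, 1, 30.0), (3, 2, 28.0), (3, 3, 29.0), (3, 4, 33.0),
]

def zscores(values):
    """Return (value - mean) / sample SD for each value."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Option A: standardize the time-varying score using ALL visits pooled together.
pooled_z = zscores([score for _, _, score in long_rows])

# Option B: standardize the "first" variable using one value per participant,
# i.e. only the visit-1 scores, then repeat the result on every row.
baseline = {pid: score for pid, visit, score in long_rows if visit == 1}
baseline_z = dict(zip(baseline, zscores(list(baseline.values()))))
stacked_first_z = [baseline_z[pid] for pid, _, _ in long_rows]

# Standardizing the stacked (repeated) column directly uses a different SD,
# because each baseline value is counted four times.
sd_one_per_person = stdev(baseline.values())
sd_stacked = stdev([baseline[pid] for pid, _, _ in long_rows])
print(sd_one_per_person, sd_stacked)
```

With these invented numbers the SD of the stacked column is smaller than the SD computed from one baseline value per participant, so the two approaches give different z-scores for the "first" variable even when every participant has all four visits.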
06-11-2018 07:39 PM
Perhaps "standardization" here means using an interval value instead of a date?
Without example data, it is very hard to see what you mean by "stacked" or what your current data structure may be. Could you show what the values might look like, without including any personally identifiable data (xxx, yyy, for instance, to indicate different person identities), along with some of the values?
If the research question is phrased in any way like "measure increases/decreases after the end of the pregnancy", I could really see the data being in a form similar to:
Person time measure
xxx 0 123
xxx 15 112
xxx 34 111
xxx 66 109
where the "time" is standardized to indicate: 0 = during pregnancy and the others are days (or other time unit) after pregnancy.
Assuming some variance in the actual "time" after pregnancy at which measurements were taken, this would have the effect, in a regression, of estimating measure change per time unit.
Or discuss more with your PI what was actually intended for standardization.
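As a rough illustration of the interval coding described above, here is a small Python sketch. The visit dates are invented, the names `visits`, `measures`, and `rows` are hypothetical, and it assumes time is counted in days from the pregnancy visit, which may not match how your study defines "after pregnancy".

```python
from datetime import date

# Hypothetical visit dates and measures for one de-identified person.
visits = {
    "xxx": [date(2018, 1, 1), date(2018, 1, 16), date(2018, 2, 4), date(2018, 3, 8)],
}
measures = {"xxx": [123, 112, 111, 109]}

rows = []
for person, dates in visits.items():
    origin = min(dates)  # assumption: the earliest visit is the pregnancy visit
    for visit_date, measure in zip(dates, measures[person]):
        days_since_origin = (visit_date - origin).days
        rows.append((person, days_since_origin, measure))

for row in rows:
    print(*row)
```

This reproduces the Person / time / measure layout shown above, with time equal to 0 at the pregnancy visit.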
06-11-2018 08:07 PM
The PI wants to standardize the variables just so that they are more meaningful.
What the data looks like in the long format is this:
Participant   Time            variable 1   variable 2   variable 1 (first)
1             1 (pregnancy)   X1           Y            X1
1             2               X            Y            X1
1             3               X            Y            X1
1             4               X            Y            X1
So we are including the variable 1(first) to control for things that happened during time point 1.
Does that help clear up my question?
So I'm unsure whether I should standardize at this point or earlier, when the data were in the wide format like this:
Participant   Variable1_Time1   Variable1_Time2   Variable1_Time3   Variable1_Time4   Variable2_Time1   ... etc.
1             X1                X                 X                 X                 Y
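One way to picture the "standardize in wide, then reshape" route is the sketch below, again in plain Python with invented numbers. The output names `variable1_z` and `variable1_first_z` are hypothetical, and pooling all four time columns for the time-varying score is only one possible choice, not something the PI has specified.

```python
from statistics import mean, stdev

# Hypothetical wide-format data: one row per participant. Values are invented.
wide = [
    {"Participant": 1, "Variable1_Time1": 10.0, "Variable1_Time2": 12.0,
     "Variable1_Time3": 14.0, "Variable1_Time4": 16.0},
    {"Participant": 2, "Variable1_Time1": 20.0, "Variable1_Time2": 18.0,
     "Variable1_Time3": 16.0, "Variable1_Time4": 14.0},
    {"Participant": 3, "Variable1_Time1": 30.0, "Variable1_Time2": 28.0,
     "Variable1_Time3": 29.0, "Variable1_Time4": 33.0},
]
time_cols = [f"Variable1_Time{t}" for t in range(1, 5)]

# Baseline covariate: standardize the Time1 column while each participant
# still appears exactly once.
first_values = [row["Variable1_Time1"] for row in wide]
first_mean, first_sd = mean(first_values), stdev(first_values)

# Time-varying score: here standardized with the mean and SD pooled over
# all four time columns (an assumption made for this sketch).
all_values = [row[col] for row in wide for col in time_cols]
pooled_mean, pooled_sd = mean(all_values), stdev(all_values)

# Reshape to long, carrying the standardized baseline onto every row.
long_rows = []
for row in wide:
    first_z = (row["Variable1_Time1"] - first_mean) / first_sd
    for t, col in enumerate(time_cols, start=1):
        long_rows.append({
            "Participant": row["Participant"],
            "Time": t,
            "variable1_z": (row[col] - pooled_mean) / pooled_sd,
            "variable1_first_z": first_z,
        })

for r in long_rows[:4]:
    print(r)
```

Note that standardizing each Variable1_TimeK column separately would instead give every visit a mean of zero, which removes any average change across visits from the standardized scores.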