ccellberg
Calcite | Level 5

Hello, 

 

My PI suggested I standardize values for a longitudinal dataset we are looking at. There are 4 time points (one during pregnancy and three after), and we are trying to control for the during-pregnancy scores on a variable.

Therefore, we are using a stacked variable that repeats the value from the first time point 4 times, and including it in the model alongside the other variable, which has four different values representing the four time points.

 

Now, my PI has suggested I standardize the values. I am not sure at which step I should standardize them: if I do it once I have my long dataset, I will be standardizing each value using the mean of ALL visits. Would it be accurate to do this and then, for the "first" variable, standardize only across the "first" visits?
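To make the two options concrete, here is a minimal sketch in plain Python (not SAS, only so it is self-contained). The participant values, the `long_rows` layout, and the `zscores` helper are all made up for illustration; it is not a recommendation of one option over the other, only a way to see that they give different numbers for the same first-visit scores.

```python
from statistics import mean, stdev

# Hypothetical long-format rows: (participant, time, variable1). Values are made up.
long_rows = [
    (1, 1, 10.0), (1, 2, 14.0), (1, 3, 15.0), (1, 4, 17.0),
    (2, 1, 12.0), (2, 2, 13.0), (2, 3, 18.0), (2, 4, 19.0),
    (3, 1, 8.0),  (3, 2, 12.0), (3, 3, 13.0), (3, 4, 15.0),
]

def zscores(values):
    """Return (value - mean) / sample SD for each value."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Option A: standardize using the mean and SD of ALL visits pooled together.
pooled_z = zscores([v for _, _, v in long_rows])

# Option B: standardize the first-visit scores using only the first visits.
first_only_z = zscores([v for _, t, v in long_rows if t == 1])

# The same first-visit scores, as they come out under Option A.
first_under_pooled = [z for (_, t, _), z in zip(long_rows, pooled_z) if t == 1]

print("first visits, pooled mean/SD:     ", first_under_pooled)
print("first visits, first-visit mean/SD:", first_only_z)
```

Under Option B the first-visit scores have mean 0 and SD 1 among themselves; under Option A they are expressed relative to the average over all four visits, so they generally do not.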

ballardw
Super User

Perhaps the "standardization" here means to have an interval value instead of a date?

But without example data it is very hard to see what you mean by "stacked" or what your current data structure may be. Could you show what the values might look like, without including any personally identifiable data (xxx, yyy for instance to indicate different person identities), along with some of the values?

 

If the research question to analyze is phrased in any way like "measure increases/decreases after the end of the pregnancy" I could really see the data being in a form similar to

Person    time    measure
xxx       0       123
xxx       15      112
xxx       34      111
xxx       66      109

where the "time" is standardized to indicate: 0 = during pregnancy and the others are days (or other time unit) after pregnancy.

 

Assuming some variance in the actual "time" after pregnancy at which each measurement was taken, a regression on this structure would estimate the change in the measure per time unit.
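As a rough illustration of "change per time unit", here is a small sketch in plain Python (not SAS) that fits an ordinary least-squares line to the four example rows above for the single person xxx. The helper name `ols_slope` is made up, and a real analysis of repeated measures across many people would need a model that accounts for the within-person correlation; this only shows what the slope means.

```python
from statistics import mean

# The example rows from above: time since the pregnancy visit, and the measure.
times = [0, 15, 34, 66]
measures = [123, 112, 111, 109]

def ols_slope(x, y):
    """Least-squares slope of y on x: change in y per one unit of x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

slope = ols_slope(times, measures)
intercept = mean(measures) - slope * mean(times)

print(f"estimated change in measure per time unit: {slope:.4f}")
print(f"estimated measure at time 0: {intercept:.2f}")
```

A negative slope here would correspond to the "measure decreases after the end of the pregnancy" phrasing of the research question.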

 

Or discuss more with your PI what was actually intended for standardization.

ccellberg
Calcite | Level 5

The PI wants to standardize the variables just so that they are more meaningful.

What the data looks like in the long format is this:

Participant    Time             variable 1    variable 2    variable 1(first)
1              1 (pregnancy)    X1            Y             X1
1              2                X             Y             X1
1              3                X             Y             X1
1              4                X             Y             X1

 

So we are including the variable 1(first) to control for things that happened during time point 1. 

Does that help clear up my question?

 

So I'm confused about whether I should standardize at this point, or earlier, when the data was still in the wide format like this:

 

Participant    Variable1_Time1    Variable1_Time2    Variable1_Time3    Variable1_Time4    Variable2_Time1    ..etc.
1              X1                 X                  X                  X                  Y
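For reference, this is a minimal sketch in plain Python (not SAS) of what standardizing at the wide stage and then stacking would look like. The values, the dictionary layout, and the names `to_long` and `variable1_first_z` are all hypothetical; it only illustrates that in the wide format the time 1 column has exactly one value per participant, so its mean and SD are taken over participants rather than over visits.

```python
from statistics import mean, stdev

# Hypothetical wide-format data: one row per participant, made-up values
# for variable 1 at time points 1 to 4.
wide = [
    {"participant": 1, "variable1": [10.0, 14.0, 15.0, 17.0]},
    {"participant": 2, "variable1": [12.0, 13.0, 18.0, 19.0]},
    {"participant": 3, "variable1": [8.0, 12.0, 13.0, 15.0]},
]

# Mean and SD of the time 1 column, computed while there is still
# exactly one first-visit value per participant.
first_values = [row["variable1"][0] for row in wide]
first_mean, first_sd = mean(first_values), stdev(first_values)

def to_long(wide_rows):
    """Stack to one row per participant per time point, repeating the
    standardized time 1 value on every row for that participant."""
    long_rows = []
    for row in wide_rows:
        first_z = (row["variable1"][0] - first_mean) / first_sd
        for time, value in enumerate(row["variable1"], start=1):
            long_rows.append({
                "participant": row["participant"],
                "time": time,
                "variable1": value,
                "variable1_first_z": first_z,
            })
    return long_rows

long_data = to_long(wide)
for r in long_data[:4]:
    print(r)
```

Whether the time-varying variable 1 should then be standardized over all visits or separately is still the open question; this sketch leaves it on its original scale.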
