10-22-2017 05:21 AM
I've got a question about combining and analyzing my data. I will try to explain it as well as possible:
I've got 4000 subjects, and for each subject 200 different food intake variables. For each food intake variable, I have three different values, and I have to multiply the variable by each of them separately to create new variables. So at the end, for each subject, I would have 600 more variables?
For example, potato intake for subject 1: 50 g/d
Then I have to multiply the 50 g/d potato by 0.55, 0.46 and 0.83 to get three new variables, and repeat this for the 3999 other subjects. However, the next variable, bread, needs to be multiplied by three values as well, and these values differ from the ones mentioned above. Is there a procedure I can use to do this faster?
At the moment I have a wide data set, but I guess I should make it a long data set. So I would have 3 observations for each subject, but the observations would be the same?
I really hope someone can help me.
10-22-2017 07:56 AM
I'm having a hard time envisioning your data. But my first thought would be to make it into a long narrow dataset with 200 records per subject. So you would have 4,000 subjects * 200 foods = 800,000 records.
It would look like:
Subject  Food    Intake  Factor1  Factor2  Factor3
1        Potato  50      .55      .46      .83
1        Milk    75      .33      .68      .99
1        Honey   25      .02      .34      .33
...
With a structure like that, it should be straightforward to make new variables. In general, working with long narrow datasets is easier than working with wide datasets.
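To make the idea concrete, here is a minimal sketch in Python with pandas (not the only way to do it, and not tied to any particular package used in this thread). It assumes the intakes sit in a wide table with one column per food and that the three factors per food are kept in a separate lookup table. The names `wide`, `factors` and `Result1` to `Result3` are made up for illustration, and the intakes for subject 2 are invented; the factor values are the ones from the example table above.

```python
import pandas as pd

# Hypothetical wide intake data: one row per subject, one column per food (g/d).
# Subject 2's values are invented for illustration.
wide = pd.DataFrame({
    "Subject": [1, 2],
    "Potato": [50, 80],
    "Milk": [75, 0],
    "Honey": [25, 10],
})

# Lookup table with the three factors for each food (one row per food).
factors = pd.DataFrame({
    "Food": ["Potato", "Milk", "Honey"],
    "Factor1": [0.55, 0.33, 0.02],
    "Factor2": [0.46, 0.68, 0.34],
    "Factor3": [0.83, 0.99, 0.33],
})

# Wide -> long: one record per subject and food.
long = wide.melt(id_vars="Subject", var_name="Food", value_name="Intake")

# Attach the food-specific factors to every record.
long = long.merge(factors, on="Food", how="left")

# Three new variables per record: intake multiplied by each factor.
for i in (1, 2, 3):
    long[f"Result{i}"] = long["Intake"] * long[f"Factor{i}"]

long = long.sort_values(["Subject", "Food"]).reset_index(drop=True)
print(long)
```

Because the factors live in one small table keyed by food, adding or correcting a factor means editing a single row rather than 4,000 subject records.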
You could even take it further, and transpose each of the above records into three records (one for each factor). So your variables would be Subject, Food, Intake, FactorID (1-3) and Factor. But depending on what you will be doing with these data, that might be overkill.
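As a rough sketch of that second transpose, again in Python with pandas and under the same assumptions: the starting table below just retypes the three example records for subject 1, and `longer` and `Result` are illustrative names, not anything from the original data.

```python
import pandas as pd

# The three example records for subject 1, one row per food.
long = pd.DataFrame({
    "Subject": [1, 1, 1],
    "Food": ["Potato", "Milk", "Honey"],
    "Intake": [50, 75, 25],
    "Factor1": [0.55, 0.33, 0.02],
    "Factor2": [0.46, 0.68, 0.34],
    "Factor3": [0.83, 0.99, 0.33],
})

# Stack the three factor columns: one record per subject, food and factor.
longer = long.melt(
    id_vars=["Subject", "Food", "Intake"],
    value_vars=["Factor1", "Factor2", "Factor3"],
    var_name="FactorID",
    value_name="Factor",
)

# Turn "Factor1".."Factor3" into the integers 1..3.
longer["FactorID"] = (
    longer["FactorID"].str.replace("Factor", "", regex=False).astype(int)
)

# With one factor per record, a single multiplication covers every case.
longer["Result"] = longer["Intake"] * longer["Factor"]

longer = longer.sort_values(["Subject", "Food", "FactorID"]).reset_index(drop=True)
print(longer)
```

In this layout the multiplication is a single statement, at the cost of three times as many records (2,400,000 for 4,000 subjects and 200 foods).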
10-22-2017 10:49 AM