10-16-2014 11:05 AM
Hi all, I'm hoping you can help me with my data issue. I have data with tens of thousands of rows per participant (repeated measurements
), and I need to convert it into a single row. I would appreciate any suggestions! I have read the information on transposing, but that's not quite what I want to do.
A data file for one participant is in a format like this (but obviously has many more rows):
Time VariableA VariableB VariableC
1 3 2 9
2 8 5 11
3 2 7 5
And I need to convert it to a format like this, retaining the timestamp for each set of measurements as its own variable:
Time1 VariableA1 VariableB1 VariableC1 Time2 VariableA2 VariableB2 VariableC2 Time3 VariableA3 VariableB3 VariableC3
1 3 2 9 2 8 5 11 3 2 7 5
I have many files like this, and want to eventually end up with a single file that has one row per participant.
10-16-2014 11:16 AM
This SAS macro has helped me achieved what you want to do. One caveat is to have a unique identifier for each participant and transpose by that variable.
10-16-2014 11:27 AM
My first question would be why? You will end up with a lot of variables and few rows which probably isn't the best for processing or further work. What is it your trying to achieve as there may be better ways of getting the outcome.
10-16-2014 11:49 AM
Re: kewong. Thank you, I will try this out. This may be a silly question, but where do I put that line of %multtransponse? Does it go within a data statement? And what about the huge amount of code that is at the bottom of the website you linked to... do I need all of that, or just the one line?
Re: RW9, I need to sum the observations across time variables. If it helps, what I'm working with is physical activity data collected over an entire week. Each 15-second interval is its own row. So I need to create a summary variable for each day that adds up all of the physical activity. But also will need to mark and pull out the hours of the day when the person was sleeping (they were still wearing the monitor 24 hrs/day, even when asleep, so there are data from when they are rolling around in bed, etc., which I need to remove).
10-16-2014 12:31 PM
I think you might be better off creating a group variable from time. Since your "time" is obviously not time but occurrence if you describe the relation to the desired totals needed I'll bet we can do this with your existing data and proc means/summary.
10-16-2014 12:44 PM
Ballard, I edited the original post to add one of my actual datasets in case that is helpful to anyone. I think I understand what you are saying about the time variable... if I understand correctly, you are suggesting that we could just enumerate the occurences. Since everyone has 1 week of data from their physical activity sensor ( ~40,320 15-second intervals of data), we could just number those 40,320 occurences, right, rather than labeling them by the actual date/time of the occurence? The rub is that I need to remove blocks of data... so if someone slept from 11:21 pm - 6:07 a.m. between the first and second day, I need to be able to identify and delete all the rows that pertain to those dates and times.
10-16-2014 12:50 PM
So, I ran a single-variable PROC TRANSPOSE, which seems to have worked fine. The log said I have 1 observation and 40,320 variables, so that sounds good to me. But I can't really take a look at the data, because when I run PROC CONTENTS, SAS gets all gummed up and stops responding. Is this normal with huge datasets? How do you work with it if you can't look at it?
10-16-2014 12:56 PM
This is the kind of problem you can run into, not with large data sets, but with such a large number of variables. It explains why commenters are so reluctant to create so many variables.
10-16-2014 01:02 PM
So your really going to try working with a dataset which has 40320 variables? Best of luck with that. Me personally I would get your data looking like (normalised):
Time VariableName VariableResult
1 A 3
1 B 2
1 C 9
2 A 8
2 B 5
2 C 11
Then use grouped functions, e.g for SQL:
create table SUMS as
select distinct TIME,
sum(VARIABLE_RESULT) as SUM
group by TIME;
If you could provide actual test data and required output I could have a look at it.
10-16-2014 11:51 AM
I think you probably can achieve the data structure you want by transposing - that's where I'd be focusing my efforts. You'll need to do multiple transpositions into new datasets - one for each variable or variable, timestamp combination, AND you'll need something as a unique key to merge datasets back together - perhaps a concatenated field. Once you've transposed all the variables you'll need to join or merge the wide datasets back together to get one dataset - that's where you'll be glad you when through the effort to create a key. See the link below for step-by-step details.
10-16-2014 12:03 PM
I think that may be my error in understanding the term. Sometimes in Excel I use the transpose feature, which switches columns and rows. So I assumed that proc transpose would reverse columns and rows, not turn multiple rows into a single row...
10-16-2014 12:04 PM
I actually agree with mr. boostrap.
you can basically use proc transpose, but you need a unique key say "Account" to eventually merge all columns. Change in new variable names can be achieved by a macro fairly easily.
10-16-2014 12:11 PM
Thank you everyone who has responded so far. This is my first time trying the forum so I didn't know whether anyone would chime in. I really appreciate this help.
So, in re: Bootstrap_armada, I read the example you sent. That example seems a bit more straightforward because each family has data for the same three years: 96, 97, and 98. So they specify "id=year" and the new columns are labeled accordingly But will that still work for my example, when the values for the "time" variable are different for everyone? (Everyone started wearing their monitor on a different date)? I need mine not to append the column name with a value, but basically just say "time1 time2 time3 time4" etc. because "time1" is different for every person.
** Edited to add that other than that bit, the page you sent is very helpful and I was able to understand what they're doing... so it seems promising... it's just the "id" piece I'm unsure about.
10-16-2014 12:27 PM
if i understand correctly, you can sort by time, then use a first. function to create a count, then you have 1,2,3,4. by string function here you go, time1 time2 time3 time4...etc