File flattening help? Transposing nested lists...

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

File flattening help? Transposing nested lists...

Hi there,

I am a novice programmer attempting to flatten a large data file with nested lists.

Currently, each data line corresponds to one individual at one time point.  Each individual may have anywhere between 1 and 1800 time points captured for all variables. 

What I need is to have one data line for each person with each time point transposed for all variables.

As a more concrete example, my data currently looks like this:

Person1 Time1 Variable1 Variable2 Variable3 etc.

Person1 Time2 Variable1 Variable2 Variable3 etc.

Person2  Time1 Variable1 Variable2 Variable3 etc.

I want my data to look like this:

Person1 Variable1Time1 Variable2Time1 Variable3Time1 Variable1Time2 Variable2Time2 Variable3Time2...etc

I am trying to use line by line coding but with the size of my dataset, I would have to make thousands of new datasets to merge and will never finish at this rate!!  Although I can't really write macros, I can usually adapt them to suit my needs...Any help would be most appreciated! 

EpiLinds


Accepted Solutions
Solution
‎08-13-2014 04:03 PM
Grand Advisor
Posts: 10,223

Re: File flattening help? Transposing nested lists...

Here's a template for what I think you are trying to do. I only used 3 variables and a couple of patients.

data work.junk;
input Id days var1 var2 var3;
datalines;
1 1 2 3 4
1 2 89 89 90
1 5 34 36 39
2 1 23 35 46
2 4 13 24 40
;
run;
/* this data set is used to create a list that will make sure
you have all possible values from day1 to dayN within the range
given*/
data work.dummy;
   id = 0;
   do days = 1 to 5; /* largest value of the days for any record*/
      output;
   end;
run;
/* now transform a wide to long, one record per person/time/variable*/
data work.transin;
   set work.dummy work.junk;
   array v var1-var3;
   do vnum = 1 to dim(v);
      var=v[vnum];
      output;
   end;
   drop var1-var3;
run;

/* the where dateset option drops the dummy record*/
proc transpose data=work.transin out=work.trans (where=(id ne 0))
prefix=var;
by id;
id vnum days;
run;


View solution in original post


All Replies
Valued Guide
Posts: 3,206

Re: File flattening help? Transposing nested lists...

Is the dataset ordered by person or not?

Can it be ordered or is it stream processing.

The number of variables is looking to be an array definition.

How many persons do you have? When that will fit into a memory based approach the logic can be different.

Reading data with the input statement does have a lot of possibilities a programming world task on its own.

---->-- ja karman --<-----
Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Thanks Jaap...

The dataset is sorted by person.  I'm unclear on the different between ordered vs. stream processing. 

The example I gave above was somewhat simplified... to clarify:

There are 1700 persons, who each have somewhere between 1 and 6 surgeries, and within these surgeries, there are between 1 and 1800 days of follow up.  Currently, there is a line for each day of follow up.  Thus, multiple observations (data lines) for each person and each surgery.

Thoughts?

Grand Advisor
Posts: 10,223

Re: File flattening help? Transposing nested lists...

Are you asking to create new variables with the value of time appended to the name? Is the time variable an actual SAS date, time or datetime variable or something else (character or numeric that looks similar to a time value) or perhaps just an interval value?

Have you looked at all at Proc Transpose for this task?

Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Hi ballardw,

Yes, that is exactly what I am trying to do (replace the old generic variables with more specific variables that append the surgery number and day of follow up to the name).  I haven't looked into Proc Transpose but I'm checking it out now!

Thanks!

Grand Advisor
Posts: 17,396

Re: File flattening help? Transposing nested lists...

I'll echo what Astounding has mentioned. You'll lose some nice functionality when you transpose your data to this format.

In the current format you can easily create summaries, ie how many people had X intervention/complication/surgery that will be more difficult once you transpose your data.

Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Hi Reeza,

I appreciate your comment but already have my summary data recorded and am certain that I need to flatten the file as I have described.

Respected Advisor
Posts: 4,993

Re: File flattening help? Transposing nested lists...

EpiLinds,

You mention that you are a novice programmer.  In that light, it is only fair to mention that you might be spending a lot of time on a task that doesn't need to be done.  SAS contains tons of tools that will process your data in its current form without transposing.  If you give us some picture of how you will analyze the data, you may find that you can move directly onto the analysis phase.

Good luck.

Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Thanks Astounding,

Unfortunately, I am cleaning and flattening the file as part of a larger multi-centre study and my collaborator needs a flat file to load and analyze in SPSS and Excel. They were specific about the format they needed so this time my hands are tied Smiley Sad

Respected Advisor
Posts: 4,993

Re: File flattening help? Transposing nested lists...

That's unfortunate.  Here's a simple approach that will require you to learn PROC TRANSPOSE.

(1) Transpose each variable one at a time using a separate PROC TRANSPOSE (each creating its own output data set).  Use the prefix= option to control the variable names.  For example, when transposing the variable Age, add prefix='Age'.  That will name the new variables automatically Age1-Age1800.  (Make sure this is an acceptable result!  You can use a longer prefix if needed, such as prefix='AgeTime'.)

(2) Merge all the transposed data sets together.

This approach requires that you have no gaps in your time series.  The first observation for a patient must be for Time1, the second must be for Time2, etc.  If that's not the case, it gets harder!

Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Thanks again Astounding - I actually do have gaps in my time series but ballardw's approach is working on my dummy data, so I think I'm set.  I'm happy you told me about Proc Transpose anyway as I'm sure it will come in handy a lot.

Cheers

Solution
‎08-13-2014 04:03 PM
Grand Advisor
Posts: 10,223

Re: File flattening help? Transposing nested lists...

Here's a template for what I think you are trying to do. I only used 3 variables and a couple of patients.

data work.junk;
input Id days var1 var2 var3;
datalines;
1 1 2 3 4
1 2 89 89 90
1 5 34 36 39
2 1 23 35 46
2 4 13 24 40
;
run;
/* this data set is used to create a list that will make sure
you have all possible values from day1 to dayN within the range
given*/
data work.dummy;
   id = 0;
   do days = 1 to 5; /* largest value of the days for any record*/
      output;
   end;
run;
/* now transform a wide to long, one record per person/time/variable*/
data work.transin;
   set work.dummy work.junk;
   array v var1-var3;
   do vnum = 1 to dim(v);
      var=v[vnum];
      output;
   end;
   drop var1-var3;
run;

/* the where dateset option drops the dummy record*/
proc transpose data=work.transin out=work.trans (where=(id ne 0))
prefix=var;
by id;
id vnum days;
run;


Occasional Contributor
Posts: 12

Re: File flattening help? Transposing nested lists...

Thanks so much ballardw!  You're a lifesaver Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 793 views
  • 6 likes
  • 5 in conversation