Long to wide

Nrjn7 · Posted 05-21-2021 01:14 PM

data have;

input ID timepoint age education$ gender$ survey1 survey2 survey3;

datalines;

1 1 23 BA M 3 8 9

1 2 23 BA M 8 9 6

1 3 23 BA M 6 8 0

2 1 24 MA F 4 5 9

2 2 24 MA F 0 7 7

2 3 24 MA F 2 0 7

;

run;

Tranposed to following:

ID	age	education	gender	survey1_1	survey1_2	survey1_3	survey2_1	survey2_2	survey2_3	survey3_1	survey3_2	survey3_3
1	23	BA	M	30	8	6	8	9	8	9	6	0
2	24	MA	F	4	5	9	0	7	7	2	0	7

PaigeMiller · Posted 05-21-2021 05:28 PM

Creating this type of wide data set is easy, but programming with it will be a nightmare. Long data sets are much easier to handle, and you would be wise not to go down this path.

Please explain what analyses you are planning to do with this data, this will indicate whether or not long is a better format (a common situation), or wide is a better format (very uncommon)

--
Paige Miller

Nrjn7 · Posted 05-21-2021 09:03 PM

@PaigeMiller For this I need it to be one ID per row. How do I tackle this?

Kurt_Bremser · Posted 05-22-2021 03:27 AM

Why do you need this to be one row per id?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 05-22-2021 05:31 AM

@Nrjn7 wrote:

@PaigeMiller For this I need it to be one ID per row. How do I tackle this?

I think my point is that you may think you need one ID per row, but more experience SAS programmers would not go down this route, and would keep the data set long, and in the end, do much less programming to get to the end goal. Again, we really need to know what analysis you are going to do from this data in order to be sure, you need to state the end goal, and not intermediate goals such as one data set per row.

Also, just to be clear, there are indeed situations where you need one ID per row, these are relatively rare and usually involve fitting some form of statistical model. Which is why we want to know what your end goal is. What analysis will you do with this data?

--
Paige Miller

Nrjn7 · Posted 05-22-2021 08:54 AM

@PaigeMiller I understand that the experienced user may not go this route. My case is indeed different.

The goal is to look at my repeated instruments of surveys and look at how the participants are making improvements. It would make much sense to look at this longitudinal data as a single row that have them as the way it was collected. I see the the posts are helpful but still struggling with how I'd want them to display.

Any way thanks for the help.

Kurt_Bremser · Posted 05-22-2021 09:48 AM

In my experience, following the development of values is much easier done in a longitudinal dataset.

Do all participants always have the same number of observations, and is that number known beforehand?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 05-25-2021 07:16 AM

@Nrjn7 wrote:

@PaigeMiller I understand that the experienced user may not go this route. My case is indeed different.

The goal is to look at my repeated instruments of surveys and look at how the participants are making improvements. It would make much sense to look at this longitudinal data as a single row that have them as the way it was collected. I see the the posts are helpful but still struggling with how I'd want them to display.

Any way thanks for the help.

There's absolutely no need to have one row per ID to calculate if participants are making improvements. Of course, you don't say exactly what "making improvements" means, but procedures such as PROC SUMMARY (and many other PROCs) can calculate means and standard deviations (and many other statistics) for different participants and time points, from a LONG data set. By doing the work to convert this to a wide data set, you are doing unnecessary work (because PROC SUMMARY and many other PROCs work on LONG data sets) and making your programming more difficult.

--
Paige Miller

Tom · Posted 05-22-2021 10:57 AM

Converting is simple and there are many examples on the Web and this site.

On complication with your data is you are starting in a middle state between truly LONG and truly WIDE.

For a simple case like this you could use a DATA step and ARRAY statements to collapse the observations. Looks like TIMEPOINT is the index into your arrays. The dimensions on the arrays should match the maximum value of TIMEPOINT.

data want ;
  do until (last.gender) ;
    set have;
    by ID age education gender;
    array survey1_ [3] ;
    array survey2_ [3] ;
    array survey3_ [3] ;
    survey1_[timepoint] = survey1;
    survey2_[timepoint] = survey2;
    survey3_[timepoint] = survey3;
  end;
  drop timepoint survey1-survey3;
run;

You could potentially add more arrays and loops to handle survey1 to 3, but with N = 3 it is not really that really worth the effort.

For a more general solution first convert your semi-long structure to LONG. And then convert that to WIDE. The SORT step is just to try to set the order the variables are created in the final dataset.

proc transpose data=have out=tall name=question;
  by ID age education gender timepoint ;
  var survey1-survey3 ;
run;

proc sort data=tall;
  by ID age education gender question timepoint ;
run;

proc transpose data=tall out=want2(drop=_name_) delim=_ ;
  by ID age education gender;
  id question timepoint;
  var col1;
run;

Both methods produce the same result.

The COMPARE Procedure
Comparison of WORK.WANT with WORK.WANT2
(Method=EXACT)

Data Set Summary

Dataset              Created          Modified  NVar    NObs

WORK.WANT   22MAY21:10:55:00  22MAY21:10:55:00    13       2
WORK.WANT2  22MAY21:10:55:00  22MAY21:10:55:00    13       2


Variables Summary

Number of Variables in Common: 13.


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs           2        2

Number of Observations in Common: 2.
Total Number of Observations Read from WORK.WANT: 2.
Total Number of Observations Read from WORK.WANT2: 2.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 2.

NOTE: No unequal values were found. All values compared are exactly equal.

ghosh · Posted 05-22-2021 01:50 PM

%macro transp;
%do i=1 %to 3;
	proc transpose prefix=survey&i._ data=have out=want&i; 
	    by id age education gender;
	    var survey&i;
	run;
	%end;
	data want (drop=_name_);
	merge 
		%do i=1 %to 3;
		want&i
		%end;
		;		
	    by id age education gender;
	run;
	proc print;
		id id;
	run;
%mend;

%transp;

Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

SAS Innovate 2025: Call for Content

Classroom Training Available!