BookmarkSubscribeRSS Feed
Nrjn7
Fluorite | Level 6

data have;

input ID timepoint age education$ gender$ survey1 survey2 survey3;

datalines;

1 1 23 BA M 3 8 9 

1 2 23 BA M 8 9 6 

1 3 23 BA M 6 8 0 

2 1 24 MA F 4 5 9 

2 2 24 MA F 0 7 7

2 3 24 MA F 2 0 7

;

run;

 

Tranposed to following:

IDageeducationgendersurvey1_1survey1_2survey1_3survey2_1survey2_2survey2_3survey3_1survey3_2survey3_3
123BAM308689860
224MAF459077207
9 REPLIES 9
PaigeMiller
Diamond | Level 26

Creating this type of wide data set is easy, but programming with it will be a nightmare. Long data sets are much easier to handle, and you would be wise not to go down this path.

 

Please explain what analyses you are planning to do with this data, this will indicate whether or not long is a better format (a common situation), or wide is a better format (very uncommon)

--
Paige Miller
Nrjn7
Fluorite | Level 6

@PaigeMiller For this I need it to be one ID per row. How do I tackle this?

PaigeMiller
Diamond | Level 26

@Nrjn7 wrote:

@PaigeMiller For this I need it to be one ID per row. How do I tackle this?


I think my point is that you may think you need one ID per row, but more experience SAS programmers would not go down this route, and would keep the data set long, and in the end, do much less programming to get to the end goal. Again, we really need to know what analysis you are going to do from this data in order to be sure, you need to state the end goal, and not intermediate goals such as one data set per row.

 

Also, just to be clear, there are indeed situations where you need one ID per row, these are relatively rare and usually involve fitting some form of statistical model. Which is why we want to know what your end goal is. What analysis will you do with this data?

--
Paige Miller
Nrjn7
Fluorite | Level 6

@PaigeMiller I understand that the experienced user may not go this route. My case is indeed different.

 

The goal is to look at my repeated instruments of surveys and look at how the participants are making improvements. It would make much sense to look at this longitudinal data as a single row that have them as the way it was collected. I see the the posts are helpful but still struggling with how I'd want them to display.

 

Any way thanks for the help. 

Kurt_Bremser
Super User

In my experience, following the development of values is much easier done in a longitudinal dataset.

 

Do all participants always have the same number of observations, and is that number known beforehand?

PaigeMiller
Diamond | Level 26

@Nrjn7 wrote:

@PaigeMiller I understand that the experienced user may not go this route. My case is indeed different.

 

The goal is to look at my repeated instruments of surveys and look at how the participants are making improvements. It would make much sense to look at this longitudinal data as a single row that have them as the way it was collected. I see the the posts are helpful but still struggling with how I'd want them to display.

 

Any way thanks for the help. 


There's absolutely no need to have one row per ID to calculate if participants are making improvements. Of course, you don't say exactly what "making improvements" means, but procedures such as PROC SUMMARY (and many other PROCs) can calculate means and standard deviations (and many other statistics) for different participants and time points, from a LONG data set. By doing the work to convert this to a wide data set, you are doing unnecessary work (because PROC SUMMARY and many other PROCs work on LONG data sets) and making your programming more difficult.

--
Paige Miller
Tom
Super User Tom
Super User

Converting is simple and there are many examples on the Web and this site.

On complication with your data is you are starting in a middle state between truly LONG and truly WIDE.

 

For a simple case like this you could use a DATA step and ARRAY statements to collapse the observations.  Looks like TIMEPOINT is the index into your arrays.  The dimensions on the arrays should match the maximum value of TIMEPOINT.

data want ;
  do until (last.gender) ;
    set have;
    by ID age education gender;
    array survey1_ [3] ;
    array survey2_ [3] ;
    array survey3_ [3] ;
    survey1_[timepoint] = survey1;
    survey2_[timepoint] = survey2;
    survey3_[timepoint] = survey3;
  end;
  drop timepoint survey1-survey3;
run;

You could potentially add more arrays and loops to handle survey1 to 3, but with N = 3 it is not really that really worth the effort.

 

For a more general solution first convert your semi-long structure to LONG.  And then convert that to WIDE.  The SORT step is just to try to set the order the variables are created in the final dataset.

proc transpose data=have out=tall name=question;
  by ID age education gender timepoint ;
  var survey1-survey3 ;
run;

proc sort data=tall;
  by ID age education gender question timepoint ;
run;

proc transpose data=tall out=want2(drop=_name_) delim=_ ;
  by ID age education gender;
  id question timepoint;
  var col1;
run;

Both methods produce the same result.

The COMPARE Procedure
Comparison of WORK.WANT with WORK.WANT2
(Method=EXACT)

Data Set Summary

Dataset              Created          Modified  NVar    NObs

WORK.WANT   22MAY21:10:55:00  22MAY21:10:55:00    13       2
WORK.WANT2  22MAY21:10:55:00  22MAY21:10:55:00    13       2


Variables Summary

Number of Variables in Common: 13.


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs           2        2

Number of Observations in Common: 2.
Total Number of Observations Read from WORK.WANT: 2.
Total Number of Observations Read from WORK.WANT2: 2.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 2.

NOTE: No unequal values were found. All values compared are exactly equal.

 

 

ghosh
Barite | Level 11

 

%macro transp;
%do i=1 %to 3;
	proc transpose prefix=survey&i._ data=have out=want&i; 
	    by id age education gender;
	    var survey&i;
	run;
	%end;
	data want (drop=_name_);
	merge 
		%do i=1 %to 3;
		want&i
		%end;
		;		
	    by id age education gender;
	run;
	proc print;
		id id;
	run;
%mend;

%transp;

transp.png

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1064 views
  • 0 likes
  • 5 in conversation