How do I transpose data to long-form if the raw data is both long and ...

MHines · Posted 09-23-2024 11:07 AM

Hello,

Any advice is much appreciated! I need to transpose data to long-form. I am getting tripped up because I initially assumed that the data I am cleaning was wide format and that the proc transpose command would be straightforward, however I determined that the data is BOTH wide and long. As such, the proc transpose command is only partially functioning as I intended. As a fake example, the raw data file looks like this:

Row	CaseID	Person1	Person2	Person3	Person 4	Person 5
1	2024-000-01	Sally	James
2	2024-000-01	Gerry	Natalie
3	2024-000-01	Sam
4	2024-000-02	Annie	Holly	Jaxson	Ned	Clark
5	2024-000-03	Camille	Henry
6	2024-000-04	Alice	Chris
7	2024-000-05	Maria	George
8	2024-000-05	Ronnie	Brenda

I was initially under the impression that everything would be structured as wide-form, like row #4, #5, and #6. However about 15% of the CaseIDs are entered the way rows #1, #2, #3, #7 and #8 are. To complicate matters, for cases that have split rows, the number of rows they occupy varies - there are between 2 and 12 rows per case with multiple persons in each row.

I am trying to clean the data so that it is long-form with one row per person name, like this:

Row	CaseID	Person
1	2024-000-01	Sally
2	2024-000-01	James
3	2024-000-01	Gerry
4	2024-000-01	Natalie
5	2024-000-01	Sam
6	2024-000-02	Annie
7	2024-000-02	Holly
8	2024-000-02	Jackson
9	2024-000-02	Ned
10	2024-000-02	Clark
11	2024-000-03	Camille
12	2024-000-03	Henry
13	2024-000-04	Alice
14	2024-000-04	Chris
15	2024-000-05	Maria
16	2024-000-05	George
17	2024-000-05	Ronnie
18	2024-000-05	Brenda

FreelanceReinh · Posted 09-23-2024 11:36 AM

Hello @MHines,

You can transpose the list of persons for each row and remove the observations with missing values with a WHERE= option:

data have;
infile cards truncover;
input Row CaseID :$11. (Person1-Person5) ($);
cards;
1 2024-000-01 Sally James
2 2024-000-01 Gerry Natalie
3 2024-000-01 Sam
4 2024-000-02 Annie Holly Jaxson Ned Clark
5 2024-000-03 Camille Henry
6 2024-000-04 Alice Chris
7 2024-000-05 Maria George
8 2024-000-05 Ronnie Brenda
;

proc transpose data=have out=want(drop=_: row rename=(col1=Person) where=(Person));
by Row CaseID;
var Person:;
run;

If variable Row is not contained in your dataset HAVE or should be contained in dataset WANT (with new values), it can be created easily.

Quentin · Posted 09-23-2024 11:39 AM

Could you please post the example dataset as code, i.e. a DATA step with CARDS statement?

Also, can you please post the PROC TRANPOSE code you have tried, running on the example data, and describe what went wrong?

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

MHines · Posted 09-23-2024 11:55 AM

Hello,

Thanks for following up. I tried this:

proc transpose data=have out=want;

by CaseID;

var person1-5;

run;

The code is working for the cases that are not split across multiple rows, but only partially working for cases that are split across multiple rows. It looks like the table below. My goal is that all of the names are in the same column.

CaseID	_NAME_	col1	col2	col3
2024-000-01	Person1	Sally	Gerry	Sam
2024-000-01	Person2	James	Natalie
2024-000-01	Person3
2024-000-01	Person4
2024-000-01	Person5
2024-000-02	Person1	Annie
2024-000-02	Person2	Holly
2024-000-02	Person3	Jaxson
2024-000-02	Person4	Ned
2024-000-02	Person5	Clark
2024-000-03	Person1	Camille
2024-000-03	Person2	Henry
2024-000-03	Person3
2024-000-03	Person4
2024-000-03	Person5
2024-000-04	Person1	Alice
2024-000-04	Person2	Chris
2024-000-04	Person3
2024-000-04	Person4
2024-000-04	Person5
2024-000-05	Person1	Maria	Ronnie
2024-000-05	Person2	George	Brenda
2024-000-05	Person3
2024-000-05	Person4
2024-000-05	Person5

MHines · Posted 09-23-2024 11:58 AM

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.

Quentin · Posted 09-23-2024 12:03 PM

@MHines wrote:

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.

See @FreelanceReinh 's transpose answer. You can use ROW on your BY statement, you don't need to change the values of your CaseID. If you don't have ROW in your data, you can add it.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

Tom · Posted 09-23-2024 12:08 PM

So assuming you do not actually have the ROW variable you displayed in your first listing you can easily add it. It does not need to depend on the CASEID to work. So something as simple as this should work.

data add_row;
  row+1;
  set have;
run;

proc transpose data=add_row out=want(rename=(col1=person)) name=place;
  by row CaseID;
  var person:;
run;

If the dataset is really large then make the ADD_ROW table a view instead.

data add_row / view=add_row;
  row+1;
  set have;
run;

ballardw · Posted 09-23-2024 11:46 AM

Best would be to provide the example data in the form of a working data step.

And show the code you attempted that failed.

Duplicates of BY variables means that transpose will combine the first values into in one row, the second values into second row and so on.

So a data step may be the appropriate approach:

data have;
  input Row 	CaseID :$12. 	Person1:$10. 	Person2:$10. 	Person3 :$10.	Person4:$10. 	Person5:$10.;
datalines;
1 	2024-000-01 	Sally 	James 	. . .  	  	 
2 	2024-000-01 	Gerry 	Natalie . . .  	  	 
3 	2024-000-01 	Sam 	  	. . . .
4 	2024-000-02 	Annie 	Holly 	Jaxson 	Ned 	Clark
5 	2024-000-03 	Camille Henry 	. . .  	  	 
6 	2024-000-04 	Alice 	Chris 	. . .  	  	 
7 	2024-000-05 	Maria 	George 	. . .   	  	 
8 	2024-000-05 	Ronnie 	Brenda 	. . .
;

data want;
   set have;
   array p(*) Person1 - Person5;
   do i=1 to dim(p);
      Person= p[i];
      if not missing(person) then output;
   end;
   keep caseid person;
run;

How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Re: How do I transpose data to long-form if the raw data is both long and wide?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away