BookmarkSubscribeRSS Feed
MHines
Obsidian | Level 7

Hello,

 

Any advice is much appreciated! I need to transpose data to long-form. I am getting tripped up because I initially assumed that the data I am cleaning was wide format and that the proc transpose command would be straightforward, however I determined that the data is BOTH wide and long. As such, the proc transpose command is only partially functioning as I intended. As a fake example, the raw data file looks like this:

Row CaseID Person1 Person2 Person3 Person 4 Person 5
1 2024-000-01 Sally James      
2 2024-000-01 Gerry Natalie      
3 2024-000-01 Sam        
4 2024-000-02 Annie Holly Jaxson Ned Clark
5 2024-000-03 Camille Henry      
6 2024-000-04 Alice Chris      
7 2024-000-05 Maria George      
8 2024-000-05 Ronnie Brenda      

 

I was initially under the impression that everything would be structured as wide-form, like row #4, #5, and #6. However about 15% of the CaseIDs are entered the way rows #1, #2, #3, #7 and #8 are. To complicate matters, for cases that have split rows, the number of rows they occupy varies - there are between 2 and 12 rows per case with multiple persons in each row.

 

I am trying to clean the data so that it is long-form with one row per person name, like this:

Row CaseID Person
1 2024-000-01 Sally
2 2024-000-01 James
3 2024-000-01 Gerry
4 2024-000-01 Natalie
5 2024-000-01 Sam
6 2024-000-02 Annie
7 2024-000-02 Holly
8 2024-000-02 Jackson
9 2024-000-02 Ned
10 2024-000-02 Clark
11 2024-000-03 Camille
12 2024-000-03 Henry
13 2024-000-04 Alice
14 2024-000-04 Chris
15 2024-000-05 Maria
16 2024-000-05 George
17 2024-000-05 Ronnie 
18 2024-000-05 Brenda
7 REPLIES 7
FreelanceReinh
Jade | Level 19

Hello @MHines,

 

You can transpose the list of persons for each row and remove the observations with missing values with a WHERE= option:

data have;
infile cards truncover;
input Row CaseID :$11. (Person1-Person5) ($);
cards;
1 2024-000-01 Sally James
2 2024-000-01 Gerry Natalie
3 2024-000-01 Sam
4 2024-000-02 Annie Holly Jaxson Ned Clark
5 2024-000-03 Camille Henry
6 2024-000-04 Alice Chris
7 2024-000-05 Maria George
8 2024-000-05 Ronnie Brenda
;

proc transpose data=have out=want(drop=_: row rename=(col1=Person) where=(Person));
by Row CaseID;
var Person:;
run;

If variable Row is not contained in your dataset HAVE or should be contained in dataset WANT (with new values), it can be created easily.

Quentin
Super User

Could you please post the example dataset as code, i.e. a DATA step with CARDS statement?  

 

Also, can you please post the PROC TRANPOSE code you have tried, running on the example data, and describe what went wrong?

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
MHines
Obsidian | Level 7

Hello,

Thanks for following up. I tried this:

 

proc transpose data=have out=want;

by CaseID;

var person1-5;

run;

 

The code is working for the cases that are not split across multiple rows, but only partially working for cases that are split across multiple rows. It looks like the table below. My goal is that all of the names are in the same column.

 

CaseID _NAME_ col1 col2 col3
2024-000-01 Person1 Sally Gerry Sam
2024-000-01 Person2 James Natalie  
2024-000-01 Person3      
2024-000-01 Person4      
2024-000-01 Person5      
2024-000-02 Person1 Annie    
2024-000-02 Person2 Holly    
2024-000-02 Person3 Jaxson    
2024-000-02 Person4 Ned    
2024-000-02 Person5 Clark    
2024-000-03 Person1 Camille    
2024-000-03 Person2 Henry    
2024-000-03 Person3      
2024-000-03 Person4      
2024-000-03 Person5      
2024-000-04 Person1 Alice    
2024-000-04 Person2 Chris    
2024-000-04 Person3      
2024-000-04 Person4      
2024-000-04 Person5      
2024-000-05 Person1 Maria Ronnie  
2024-000-05 Person2 George Brenda  
2024-000-05 Person3      
2024-000-05 Person4      
2024-000-05 Person5      
MHines
Obsidian | Level 7

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.

Quentin
Super User

@MHines wrote:

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.


See @FreelanceReinh 's transpose answer.  You can use ROW on your BY statement, you don't need to change the values of your CaseID.  If you don't have ROW in your data, you can add it.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Tom
Super User Tom
Super User

So assuming you do not actually have the ROW variable you displayed in your first listing you can easily add it.  It does not need to depend on the CASEID to work.  So something as simple as this should work.

data add_row;
  row+1;
  set have;
run;

proc transpose data=add_row out=want(rename=(col1=person)) name=place;
  by row CaseID;
  var person:;
run;

If the dataset is really large then make the ADD_ROW table a view instead.

data add_row / view=add_row;
  row+1;
  set have;
run;
ballardw
Super User

Best would be to provide the example data in the form of a working data step.

 

And show the code  you attempted that failed.

Duplicates of BY variables means that transpose will combine the first values into in one row, the second values into second row and so on.

 

So a data step may be the appropriate approach:

data have;
  input Row 	CaseID :$12. 	Person1:$10. 	Person2:$10. 	Person3 :$10.	Person4:$10. 	Person5:$10.;
datalines;
1 	2024-000-01 	Sally 	James 	. . .  	  	 
2 	2024-000-01 	Gerry 	Natalie . . .  	  	 
3 	2024-000-01 	Sam 	  	. . . .
4 	2024-000-02 	Annie 	Holly 	Jaxson 	Ned 	Clark
5 	2024-000-03 	Camille Henry 	. . .  	  	 
6 	2024-000-04 	Alice 	Chris 	. . .  	  	 
7 	2024-000-05 	Maria 	George 	. . .   	  	 
8 	2024-000-05 	Ronnie 	Brenda 	. . .
;

data want;
   set have;
   array p(*) Person1 - Person5;
   do i=1 to dim(p);
      Person= p[i];
      if not missing(person) then output;
   end;
   keep caseid person;
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 297 views
  • 6 likes
  • 5 in conversation