BookmarkSubscribeRSS Feed
MHines
Obsidian | Level 7

Hello,

 

Any advice is much appreciated! I need to transpose data to long-form. I am getting tripped up because I initially assumed that the data I am cleaning was wide format and that the proc transpose command would be straightforward, however I determined that the data is BOTH wide and long. As such, the proc transpose command is only partially functioning as I intended. As a fake example, the raw data file looks like this:

Row CaseID Person1 Person2 Person3 Person 4 Person 5
1 2024-000-01 Sally James      
2 2024-000-01 Gerry Natalie      
3 2024-000-01 Sam        
4 2024-000-02 Annie Holly Jaxson Ned Clark
5 2024-000-03 Camille Henry      
6 2024-000-04 Alice Chris      
7 2024-000-05 Maria George      
8 2024-000-05 Ronnie Brenda      

 

I was initially under the impression that everything would be structured as wide-form, like row #4, #5, and #6. However about 15% of the CaseIDs are entered the way rows #1, #2, #3, #7 and #8 are. To complicate matters, for cases that have split rows, the number of rows they occupy varies - there are between 2 and 12 rows per case with multiple persons in each row.

 

I am trying to clean the data so that it is long-form with one row per person name, like this:

Row CaseID Person
1 2024-000-01 Sally
2 2024-000-01 James
3 2024-000-01 Gerry
4 2024-000-01 Natalie
5 2024-000-01 Sam
6 2024-000-02 Annie
7 2024-000-02 Holly
8 2024-000-02 Jackson
9 2024-000-02 Ned
10 2024-000-02 Clark
11 2024-000-03 Camille
12 2024-000-03 Henry
13 2024-000-04 Alice
14 2024-000-04 Chris
15 2024-000-05 Maria
16 2024-000-05 George
17 2024-000-05 Ronnie 
18 2024-000-05 Brenda
7 REPLIES 7
FreelanceReinh
Jade | Level 19

Hello @MHines,

 

You can transpose the list of persons for each row and remove the observations with missing values with a WHERE= option:

data have;
infile cards truncover;
input Row CaseID :$11. (Person1-Person5) ($);
cards;
1 2024-000-01 Sally James
2 2024-000-01 Gerry Natalie
3 2024-000-01 Sam
4 2024-000-02 Annie Holly Jaxson Ned Clark
5 2024-000-03 Camille Henry
6 2024-000-04 Alice Chris
7 2024-000-05 Maria George
8 2024-000-05 Ronnie Brenda
;

proc transpose data=have out=want(drop=_: row rename=(col1=Person) where=(Person));
by Row CaseID;
var Person:;
run;

If variable Row is not contained in your dataset HAVE or should be contained in dataset WANT (with new values), it can be created easily.

Quentin
Super User

Could you please post the example dataset as code, i.e. a DATA step with CARDS statement?  

 

Also, can you please post the PROC TRANPOSE code you have tried, running on the example data, and describe what went wrong?

The Boston Area SAS Users Group is hosting free webinars!
Next up: SAS Trivia Quiz hosted by SAS on Wednesday May 21.
Register now at https://www.basug.org/events.
MHines
Obsidian | Level 7

Hello,

Thanks for following up. I tried this:

 

proc transpose data=have out=want;

by CaseID;

var person1-5;

run;

 

The code is working for the cases that are not split across multiple rows, but only partially working for cases that are split across multiple rows. It looks like the table below. My goal is that all of the names are in the same column.

 

CaseID _NAME_ col1 col2 col3
2024-000-01 Person1 Sally Gerry Sam
2024-000-01 Person2 James Natalie  
2024-000-01 Person3      
2024-000-01 Person4      
2024-000-01 Person5      
2024-000-02 Person1 Annie    
2024-000-02 Person2 Holly    
2024-000-02 Person3 Jaxson    
2024-000-02 Person4 Ned    
2024-000-02 Person5 Clark    
2024-000-03 Person1 Camille    
2024-000-03 Person2 Henry    
2024-000-03 Person3      
2024-000-03 Person4      
2024-000-03 Person5      
2024-000-04 Person1 Alice    
2024-000-04 Person2 Chris    
2024-000-04 Person3      
2024-000-04 Person4      
2024-000-04 Person5      
2024-000-05 Person1 Maria Ronnie  
2024-000-05 Person2 George Brenda  
2024-000-05 Person3      
2024-000-05 Person4      
2024-000-05 Person5      
MHines
Obsidian | Level 7

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.

Quentin
Super User

@MHines wrote:

I suppose I could always create a unique CaseID for each row, something like 2024-000-01-a, 2024-000-01-b, 2024-000-01-c etc. to transpose them that way, then convert the IDs back to the original and append the rows, but it seems like there are faster alternatives.


See @FreelanceReinh 's transpose answer.  You can use ROW on your BY statement, you don't need to change the values of your CaseID.  If you don't have ROW in your data, you can add it.

The Boston Area SAS Users Group is hosting free webinars!
Next up: SAS Trivia Quiz hosted by SAS on Wednesday May 21.
Register now at https://www.basug.org/events.
Tom
Super User Tom
Super User

So assuming you do not actually have the ROW variable you displayed in your first listing you can easily add it.  It does not need to depend on the CASEID to work.  So something as simple as this should work.

data add_row;
  row+1;
  set have;
run;

proc transpose data=add_row out=want(rename=(col1=person)) name=place;
  by row CaseID;
  var person:;
run;

If the dataset is really large then make the ADD_ROW table a view instead.

data add_row / view=add_row;
  row+1;
  set have;
run;
ballardw
Super User

Best would be to provide the example data in the form of a working data step.

 

And show the code  you attempted that failed.

Duplicates of BY variables means that transpose will combine the first values into in one row, the second values into second row and so on.

 

So a data step may be the appropriate approach:

data have;
  input Row 	CaseID :$12. 	Person1:$10. 	Person2:$10. 	Person3 :$10.	Person4:$10. 	Person5:$10.;
datalines;
1 	2024-000-01 	Sally 	James 	. . .  	  	 
2 	2024-000-01 	Gerry 	Natalie . . .  	  	 
3 	2024-000-01 	Sam 	  	. . . .
4 	2024-000-02 	Annie 	Holly 	Jaxson 	Ned 	Clark
5 	2024-000-03 	Camille Henry 	. . .  	  	 
6 	2024-000-04 	Alice 	Chris 	. . .  	  	 
7 	2024-000-05 	Maria 	George 	. . .   	  	 
8 	2024-000-05 	Ronnie 	Brenda 	. . .
;

data want;
   set have;
   array p(*) Person1 - Person5;
   do i=1 to dim(p);
      Person= p[i];
      if not missing(person) then output;
   end;
   keep caseid person;
run;

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1046 views
  • 6 likes
  • 5 in conversation