Hi there,
I'm wondering if anyone knows a way to transpose multiple variables while pairing them based on the original dataset. I have a file organized by ID number with information about coursework and the institutional affiliation of each course.
So my data look like this:
PublicId | FirstName | LastName | Institution | CourseTitle | Credit |
A2 | FN | LN | UMN | Parent and Child Relations | 3.00 |
A2 | FN | LN | UMN | Orientation to Human Services | 3.00 |
A2 | FN | LN | Tufts | Child Growth And Development | 3.00 |
A2 | FN | LN | Tufts | Curric for Early Childhood | 3.00 |
A2 | FN | LN | Macalester | Fund Of Writing | 3.00 |
A2 | FN | LN | Macalester | General Psych | 3.00 |
A2 | FN | LN | Macalester | Field Exp Seminar 1 | 3.00 |
And I'd like it to look like this:
PublicId | Course1 | Institution1 | Course2 | Institution2 | Course3 | Institution3 | Course4 | Institution4 |
A2 | Parent and Child Relations | UMN | Orientation to Human Services | Child Growth And Development | Tufts | Curric for Early Childhood | Name | Tufts |
I created a count variable to reshape the data using proc transpose, but I can't seem to get my desired output. I can only get one variable to transpose at a time. Here is what I have so far. Any help would be much appreciated!
data work.WFR_Coursework;
set access.WFR_Coursework;
run;
* creating a variable for the number of courses each teacher has */
data WFR_Coursework1;
set WFR_Coursework;
count + 1;
by PublicId;
if first.PublicId then count =1;
run;
/* check the new variable */
proc freq data = WFR_Coursework1;
table count/list missing;
run;
proc print data = WFR_Coursework1 (obs = 5);
run;
/* now reshape according to the count while copying the other variables into the new dataset */
proc transpose data = WFR_Coursework1
out = WFR_CourseworkMiddle
prefix = Course
name = CourseTitle;
var CourseTitle;
by PublicId;
run;
proc transpose data = WFR_CourseworkMiddle
out = WFR_CourseworkNEW
delim=_;
id Institution _name_ ;
var Course1-Course57;
by PublicID;
run;
proc print data = WFR_CourseworkNEW (obs = 5);
run;
Hi @rhilty see if this helps?
data have;
input (PublicId FirstName LastName Institution) ($) CourseTitle & $75. Credit ;
cards;
A2 FN LN UMN Parent and Child Relations 3.00
A2 FN LN UMN Orientation to Human Services 3.00
A2 FN LN Tufts Child Growth And Development 3.00
A2 FN LN Tufts Curric for Early Childhood 3.00
A2 FN LN Macalester Fund Of Writing 3.00
A2 FN LN Macalester General Psych 3.00
A2 FN LN Macalester Field Exp Seminar 1 3.00
;
proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;
data __w;
set _w;
by PublicId CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep publicid _name_ grp col1;
run;
proc transpose data=__w out=want(drop=_name_) ;
by PublicId notsorted;
var col1 ;
id _name_ grp;
run;
are you sure your sample output for your input sample is correct?
some institutions have course values and vice versa??????????
Hi @rhilty see if this helps?
data have;
input (PublicId FirstName LastName Institution) ($) CourseTitle & $75. Credit ;
cards;
A2 FN LN UMN Parent and Child Relations 3.00
A2 FN LN UMN Orientation to Human Services 3.00
A2 FN LN Tufts Child Growth And Development 3.00
A2 FN LN Tufts Curric for Early Childhood 3.00
A2 FN LN Macalester Fund Of Writing 3.00
A2 FN LN Macalester General Psych 3.00
A2 FN LN Macalester Field Exp Seminar 1 3.00
;
proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;
data __w;
set _w;
by PublicId CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep publicid _name_ grp col1;
run;
proc transpose data=__w out=want(drop=_name_) ;
by PublicId notsorted;
var col1 ;
id _name_ grp;
run;
This is exactly what I needed! THANK YOU!
@rhilty Thank you for the feedback. May i request you to please mark the solution as accepted and close the thread?
Quick follow-up @novinosrin... I tried running this code on a similar dataset where, instead of looking at college courses by teacher (PublicId), we're looking at different jobs over time, all of which are stored on separate rows. This dataset has more variables than the last one, and some of the values are duplicated. For example, one teacher might have had the ExperienceType "Administrative Support Staff" under multiple entries. However, their JobTitle, StartDate, and EndDate all vary.
Here is the piece of the code that is returning the error:
proc transpose data = WFR_Employment1
out = WFR_EmploymentMiddle;
var ExperienceType JobTitle StartDate EndDate;
by PublicId ExperienceType JobTitle StartDate EndDate notsorted;
run;
data WFR_EmploymentMiddle;
set WFR_EmploymentMiddle;
by PublicId ExperienceType notsorted;
if first.PublicId then grp=0;
if first.ExperienceType then grp+1;
keep PublicId _name_ grp col1;
run;
proc transpose data = WFR_EmploymentMiddle
out = WFR_EmploymentNEW (drop=_name_);
by PublicId notsorted;
var col1;
id _name_ grp;
run;
And here is part of the error (it's repeated for many of the IDs in the dataset):
71 proc transpose data = WFR_EmploymentMiddle
72 out = WFR_EmploymentNEW (drop=_name_);
73 by PublicId notsorted;
74 var col1;
75 id _name_ grp;
76
77 run;
ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PublicId=A07608
Could it be an issue that certain values are duplicated, or is the issue the number of variables I'm trying to pair? Thank you again for your help figuring this out! I'm not even sure what to Google on this one.
Rowan
Good morning @rhilty It's difficult to tell without knowing the data as I can only guess there might be huge variances between the two samples. You came into the forum with an interesting and tricky not so conventional transpose and it that took me a good 12-15 mins to think through the grouping variables in the first place before my fingers started typing the syntax. A really good one.
For your additional requirement, your best is to provide a clear and representative sample of your real with an expected output for that input sample. This helps for anybody to comfortably comprehend and respond with a one stop shop solution that you could copy paste to a production environment. My 2 cents!
Thanks for the feedback @novinosrin! I'm new to the forum (and to SAS) so it's helpful to hear what's most useful for figuring out solutions.
Here are a few rows of the data. I changed the ID numbers for data security reasons but can tell you that PublicId A3 yielded the error I included above (although this was just one of many IDs that came up with the same set of errors).
PublicId ExperienceType JobTitle StartDate EndDate
A2 Administrative Support Staff Adminstrator/Teacher 2001-07-09 00:00:00.000 2013-09-02 00:00:00.000
A2 Education Coordinator/Educational Lead Administrator 2013-09-09 00:00:00.000 2017-05-31 00:00:00.000
A2 Early Learning Teacher State Pre-Kindergarten Teacher 2017-08-06 00:00:00.000
A3 Elementary Teacher State Pre-K Teacher 2015-08-24 00:00:00.000
A3 Early Learning Teacher PreK Lead Teacher/ Jr. Kindergarten 2011-05-23 00:00:00.000 2015-08-21 00:00:00.000
A3 Early Learning Teacher PreK Lead Teacher 1999-09-06 00:00:00.000 2008-02-09 00:00:00.000
A3 Early Learning Teacher PreK Lead Teacher 2008-08-04 00:00:00.000 2011-02-11 00:00:00.000
A4 Early Learning Teacher Assistant TEACHER ASSISTANT 1981-09-07 00:00:00.000
A5 Early Learning Teacher Lead Teacher State Funded PreK 2010-09-07 00:00:00.000
A5 Early Learning Teacher Teacher 2015-08-17 00:00:00.000
A5 Early Learning Teacher Teacher 1984-09-17 00:00:00.000 1985-09-17 00:00:00.000
A5 Early Learning Teacher Teacher 1985-09-17 00:00:00.000 2015-08-17 00:00:00.000
@rhilty that data makes my eyes hurt. Can you attach and excel sheet and also with an output sample plz
@novinosrin here you go!
@rhilty Try
data have;
infile cards truncover;
input PublicId $ ExperienceType & $50. JobTitle & $50. StartDate & $50. EndDate & $50.;
cards;
A2 Administrative Support Staff Adminstrator/Teacher 2001-07-09 00:00:00.000 2013-09-02 00:00:00.000
A2 Education Coordinator/Ed Lead Administrator 2013-09-09 00:00:00.000 2017-05-31 00:00:00.000
A2 Early Learning Teacher State Pre-Kindergarten Teacher 2017-08-06 00:00:00.000 .
A3 Elementary Teacher State Pre-K Teacher 2015-08-24 00:00:00.000 .
A3 Early Learning Teacher PreK Lead Teacher/ Jr. K 2011-05-23 00:00:00.000 2015-08-21 00:00:00.000
A3 Early Learning Teacher PreK Lead Teacher 1999-09-06 00:00:00.000 2008-02-09 00:00:00.000
A3 Early Learning Teacher PreK Lead Teacher 2008-08-04 00:00:00.000 2011-02-11 00:00:00.000
;
run;
proc transpose data=have out=_w(keep=PublicId ExperienceType _name_ col1);
by _all_ notsorted;
var ExperienceType JobTitle StartDate EndDate;
run;
data __w;
set _w;
by PublicId ExperienceType notsorted;
if first.publicid then grp=0;
if _name_='ExperienceType' then grp+1;
keep publicid _name_ grp col1 ;
run;
proc transpose data=__w out=want(drop=_name_) ;
by PublicId notsorted;
var col1 ;
id _name_ grp;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.