BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rhilty
Obsidian | Level 7

Hi there, 

 

I'm wondering if anyone knows a way to transpose multiple variables while pairing them based on the original dataset. I have a file organized by ID number with information about coursework and the institutional affiliation of each course.

 

So my data look like this:

 

PublicIdFirstNameLastNameInstitutionCourseTitleCredit
A2FNLNUMNParent and Child Relations3.00
A2FNLNUMNOrientation to Human Services3.00
A2FNLNTuftsChild Growth And Development3.00
A2FNLNTuftsCurric for Early Childhood3.00
A2FNLNMacalesterFund Of Writing3.00
A2FNLNMacalesterGeneral Psych3.00
A2FNLNMacalesterField Exp Seminar 13.00

 

And I'd like it to look like this: 

PublicIdCourse1Institution1Course2Institution2Course3Institution3Course4Institution4
A2Parent and Child RelationsUMNOrientation to Human ServicesChild Growth And DevelopmentTuftsCurric for Early ChildhoodNameTufts

 

I created a count variable to reshape the data using proc transpose, but I can't seem to get my desired output. I can only get one variable to transpose at a time. Here is what I have so far. Any help would be much appreciated!

 

 

data work.WFR_Coursework;
set access.WFR_Coursework;
run;

* creating a variable for the number of courses each teacher has */ data WFR_Coursework1; set WFR_Coursework; count + 1; by PublicId; if first.PublicId then count =1; run; /* check the new variable */ proc freq data = WFR_Coursework1; table count/list missing; run; proc print data = WFR_Coursework1 (obs = 5); run; /* now reshape according to the count while copying the other variables into the new dataset */ proc transpose data = WFR_Coursework1 out = WFR_CourseworkMiddle prefix = Course name = CourseTitle; var CourseTitle; by PublicId; run; proc transpose data = WFR_CourseworkMiddle out = WFR_CourseworkNEW delim=_; id Institution _name_ ; var Course1-Course57; by PublicID; run; proc print data = WFR_CourseworkNEW (obs = 5); run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @rhilty see if this helps?

 


data have;
input (PublicId	FirstName	LastName	Institution) ($)	CourseTitle & $75.	Credit ;
cards;
A2	FN	LN	UMN	Parent and Child Relations	    3.00
A2	FN	LN	UMN	Orientation to Human Services	  3.00
A2	FN	LN	Tufts	Child Growth And Development	 3.00
A2	FN	LN	Tufts	Curric for Early Childhood	   3.00
A2	FN	LN	Macalester	Fund Of Writing	  3.00
A2	FN	LN	Macalester	General Psych	  3.00
A2	FN	LN	Macalester	Field Exp Seminar 1	 3.00
;

proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;

data __w;
set _w;
by 	PublicId  CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep  publicid _name_ grp col1;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

are you sure your sample output for your input sample is correct?

 

some institutions have course values and vice versa??????????

novinosrin
Tourmaline | Level 20

Hi @rhilty see if this helps?

 


data have;
input (PublicId	FirstName	LastName	Institution) ($)	CourseTitle & $75.	Credit ;
cards;
A2	FN	LN	UMN	Parent and Child Relations	    3.00
A2	FN	LN	UMN	Orientation to Human Services	  3.00
A2	FN	LN	Tufts	Child Growth And Development	 3.00
A2	FN	LN	Tufts	Curric for Early Childhood	   3.00
A2	FN	LN	Macalester	Fund Of Writing	  3.00
A2	FN	LN	Macalester	General Psych	  3.00
A2	FN	LN	Macalester	Field Exp Seminar 1	 3.00
;

proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;

data __w;
set _w;
by 	PublicId  CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep  publicid _name_ grp col1;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;
rhilty
Obsidian | Level 7

This is exactly what I needed! THANK YOU! Smiley Very Happy

novinosrin
Tourmaline | Level 20

@rhilty Thank you for the feedback. May i request you to please mark the solution as accepted and close the thread?

rhilty
Obsidian | Level 7

Quick follow-up @novinosrin... I tried running this code on a similar dataset where, instead of looking at college courses by teacher (PublicId), we're looking at different jobs over time, all of which are stored on separate rows. This dataset has more variables than the last one, and some of the values are duplicated. For example, one teacher might have had the ExperienceType "Administrative Support Staff" under multiple entries. However, their JobTitle, StartDate, and EndDate all vary.

 

Here is the piece of the code that is returning the error:

 

 

proc transpose data = WFR_Employment1
	out = WFR_EmploymentMiddle;
var ExperienceType JobTitle StartDate EndDate;
by PublicId ExperienceType JobTitle StartDate EndDate notsorted;
run; 

data WFR_EmploymentMiddle;
set WFR_EmploymentMiddle;
by PublicId ExperienceType notsorted;
if first.PublicId then grp=0;
if first.ExperienceType then grp+1;
keep PublicId _name_ grp col1;
run;

proc transpose data = WFR_EmploymentMiddle
	out = WFR_EmploymentNEW (drop=_name_);
	by PublicId notsorted;
	var col1;
	id _name_ grp;

run;

 

And here is part of the error (it's repeated for many of the IDs in the dataset):

 

 

71   proc transpose data = WFR_EmploymentMiddle
72       out = WFR_EmploymentNEW (drop=_name_);
73       by PublicId notsorted;
74       var col1;
75       id _name_ grp;
76
77   run;

ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      PublicId=A07608

 

Could it be an issue that certain values are duplicated, or is the issue the number of variables I'm trying to pair? Thank you again for your help figuring this out! I'm not even sure what to Google on this one.

 

Rowan

 

novinosrin
Tourmaline | Level 20

Good morning @rhilty  It's difficult to tell without knowing the data as I can only guess there might be huge variances between the two samples. You came into the forum with an interesting and tricky not so conventional transpose and it that took me a good 12-15 mins to think through the grouping variables in the first place before my fingers started typing the syntax. A really good one.

 

For your additional requirement, your best is to provide a clear and representative sample of your real with an expected output for that input sample. This helps for anybody to comfortably comprehend and respond with a one stop shop solution  that you could copy paste to a production environment. My 2 cents!

 

 

rhilty
Obsidian | Level 7

Thanks for the feedback @novinosrin! I'm new to the forum (and to SAS) so it's helpful to hear what's most useful for figuring out solutions.

 

Here are a few rows of the data. I changed the ID numbers for data security reasons but can tell you that PublicId A3 yielded the error I included above (although this was just one of many IDs that came up with the same set of errors).

PublicId	ExperienceType	JobTitle	StartDate	EndDate
A2	Administrative Support Staff	Adminstrator/Teacher	2001-07-09 00:00:00.000	2013-09-02 00:00:00.000
A2	Education Coordinator/Educational Lead	Administrator	2013-09-09 00:00:00.000	2017-05-31 00:00:00.000
A2	Early Learning Teacher	State Pre-Kindergarten Teacher	2017-08-06 00:00:00.000	
A3	Elementary Teacher	State Pre-K Teacher	2015-08-24 00:00:00.000	
A3	Early Learning Teacher	PreK Lead Teacher/ Jr. Kindergarten	2011-05-23 00:00:00.000	2015-08-21 00:00:00.000
A3	Early Learning Teacher	PreK Lead Teacher	1999-09-06 00:00:00.000	2008-02-09 00:00:00.000
A3	Early Learning Teacher	PreK Lead Teacher	2008-08-04 00:00:00.000	2011-02-11 00:00:00.000
A4	Early Learning Teacher Assistant	TEACHER ASSISTANT	1981-09-07 00:00:00.000	
A5	Early Learning Teacher	Lead Teacher State Funded PreK	2010-09-07 00:00:00.000	
A5	Early Learning Teacher	Teacher	2015-08-17 00:00:00.000	
A5	Early Learning Teacher	Teacher	1984-09-17 00:00:00.000	1985-09-17 00:00:00.000
A5	Early Learning Teacher	Teacher	1985-09-17 00:00:00.000	2015-08-17 00:00:00.000

 

novinosrin
Tourmaline | Level 20

@rhilty that data makes my eyes hurt. Can you attach and excel sheet and also with an output sample plz

rhilty
Obsidian | Level 7

@novinosrin here you go!

 

novinosrin
Tourmaline | Level 20

@rhilty  Try

 

data have;
infile cards truncover;
input PublicId $	ExperienceType  & $50.	JobTitle  & $50.	StartDate & $50.	EndDate & $50.;
cards;
A2	Administrative Support Staff		Adminstrator/Teacher	2001-07-09 00:00:00.000		2013-09-02 00:00:00.000
A2	Education Coordinator/Ed Lead		Administrator	2013-09-09 00:00:00.000		2017-05-31 00:00:00.000
A2	Early Learning Teacher		State Pre-Kindergarten Teacher	2017-08-06 00:00:00.000	 .
A3	Elementary Teacher		State Pre-K Teacher		2015-08-24 00:00:00.000	  .
A3	Early Learning Teacher		PreK Lead Teacher/ Jr. K	2011-05-23 00:00:00.000		2015-08-21 00:00:00.000
A3	Early Learning Teacher		PreK Lead Teacher	1999-09-06 00:00:00.000		2008-02-09 00:00:00.000
A3	Early Learning Teacher		PreK Lead Teacher	2008-08-04 00:00:00.000		2011-02-11 00:00:00.000
;
run;
proc transpose data=have out=_w(keep=PublicId ExperienceType _name_ col1);
by _all_ notsorted;
var ExperienceType	JobTitle	StartDate	EndDate;
run;
data __w;
set _w;
by 	PublicId  ExperienceType notsorted;
if first.publicid then grp=0;
if _name_='ExperienceType' then grp+1;
keep  publicid _name_ grp col1 ;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3900 views
  • 1 like
  • 2 in conversation