Solved: Re: Proc transpose with multiple variables

rhilty · Posted 07-17-2018 08:58 PM

Hi there,

I'm wondering if anyone knows a way to transpose multiple variables while pairing them based on the original dataset. I have a file organized by ID number with information about coursework and the institutional affiliation of each course.

So my data look like this:

PublicId	FirstName	LastName	Institution	CourseTitle	Credit
A2	FN	LN	UMN	Parent and Child Relations	3.00
A2	FN	LN	UMN	Orientation to Human Services	3.00
A2	FN	LN	Tufts	Child Growth And Development	3.00
A2	FN	LN	Tufts	Curric for Early Childhood	3.00
A2	FN	LN	Macalester	Fund Of Writing	3.00
A2	FN	LN	Macalester	General Psych	3.00
A2	FN	LN	Macalester	Field Exp Seminar 1	3.00

And I'd like it to look like this:

PublicId	Course1	Institution1	Course2	Institution2	Course3	Institution3	Course4	Institution4
A2	Parent and Child Relations	UMN	Orientation to Human Services	Child Growth And Development	Tufts	Curric for Early Childhood	Name	Tufts

I created a count variable to reshape the data using proc transpose, but I can't seem to get my desired output. I can only get one variable to transpose at a time. Here is what I have so far. Any help would be much appreciated!

data work.WFR_Coursework;
set access.WFR_Coursework;
run;

* creating a variable for the number of courses each teacher has */

data WFR_Coursework1;
	set WFR_Coursework;
	count + 1;
	by PublicId;
	if first.PublicId then count =1;
run;

/* check the new variable */

proc freq data = WFR_Coursework1;
	table count/list missing;
run;

proc print data = WFR_Coursework1 (obs = 5);
run;

/* now reshape according to the count while copying the other variables into the new dataset */

proc transpose data = WFR_Coursework1
	out = WFR_CourseworkMiddle
	prefix = Course
	name = CourseTitle;
var CourseTitle;
by PublicId;

run; 

proc transpose data = WFR_CourseworkMiddle
	out = WFR_CourseworkNEW
	delim=_;
	id Institution _name_ ;
	var Course1-Course57;
	by PublicID;

run;

proc print data = WFR_CourseworkNEW (obs = 5);
run;

novinosrin · Posted 07-18-2018 12:02 AM

Hi @rhilty see if this helps?


data have;
input (PublicId	FirstName	LastName	Institution) ($)	CourseTitle & $75.	Credit ;
cards;
A2	FN	LN	UMN	Parent and Child Relations	    3.00
A2	FN	LN	UMN	Orientation to Human Services	  3.00
A2	FN	LN	Tufts	Child Growth And Development	 3.00
A2	FN	LN	Tufts	Curric for Early Childhood	   3.00
A2	FN	LN	Macalester	Fund Of Writing	  3.00
A2	FN	LN	Macalester	General Psych	  3.00
A2	FN	LN	Macalester	Field Exp Seminar 1	 3.00
;

proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;

data __w;
set _w;
by 	PublicId  CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep  publicid _name_ grp col1;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;

View solution in original post

novinosrin · Posted 07-17-2018 11:56 PM

are you sure your sample output for your input sample is correct?

some institutions have course values and vice versa??????????

novinosrin · Posted 07-18-2018 12:02 AM

Hi @rhilty see if this helps?


data have;
input (PublicId	FirstName	LastName	Institution) ($)	CourseTitle & $75.	Credit ;
cards;
A2	FN	LN	UMN	Parent and Child Relations	    3.00
A2	FN	LN	UMN	Orientation to Human Services	  3.00
A2	FN	LN	Tufts	Child Growth And Development	 3.00
A2	FN	LN	Tufts	Curric for Early Childhood	   3.00
A2	FN	LN	Macalester	Fund Of Writing	  3.00
A2	FN	LN	Macalester	General Psych	  3.00
A2	FN	LN	Macalester	Field Exp Seminar 1	 3.00
;

proc transpose data=have out=_w;
by PublicId Institution CourseTitle notsorted;
var Institution CourseTitle;
run;

data __w;
set _w;
by 	PublicId  CourseTitle notsorted;
if first.publicid then grp=0;
if first.CourseTitle then grp+1;
keep  publicid _name_ grp col1;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;

rhilty · Posted 07-18-2018 10:23 PM

This is exactly what I needed! THANK YOU!

novinosrin · Posted 07-18-2018 11:37 PM

@rhilty Thank you for the feedback. May i request you to please mark the solution as accepted and close the thread?

rhilty · Posted 07-19-2018 07:53 AM

Quick follow-up @novinosrin... I tried running this code on a similar dataset where, instead of looking at college courses by teacher (PublicId), we're looking at different jobs over time, all of which are stored on separate rows. This dataset has more variables than the last one, and some of the values are duplicated. For example, one teacher might have had the ExperienceType "Administrative Support Staff" under multiple entries. However, their JobTitle, StartDate, and EndDate all vary.

Here is the piece of the code that is returning the error:

proc transpose data = WFR_Employment1
	out = WFR_EmploymentMiddle;
var ExperienceType JobTitle StartDate EndDate;
by PublicId ExperienceType JobTitle StartDate EndDate notsorted;
run; 

data WFR_EmploymentMiddle;
set WFR_EmploymentMiddle;
by PublicId ExperienceType notsorted;
if first.PublicId then grp=0;
if first.ExperienceType then grp+1;
keep PublicId _name_ grp col1;
run;

proc transpose data = WFR_EmploymentMiddle
	out = WFR_EmploymentNEW (drop=_name_);
	by PublicId notsorted;
	var col1;
	id _name_ grp;

run;

And here is part of the error (it's repeated for many of the IDs in the dataset):

71   proc transpose data = WFR_EmploymentMiddle
72       out = WFR_EmploymentNEW (drop=_name_);
73       by PublicId notsorted;
74       var col1;
75       id _name_ grp;
76
77   run;

ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
ERROR: The ID value "ExperienceType2" occurs twice in the same BY group.
ERROR: The ID value "JobTitle2" occurs twice in the same BY group.
ERROR: The ID value "StartDate2" occurs twice in the same BY group.
ERROR: The ID value "EndDate2" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      PublicId=A07608

Could it be an issue that certain values are duplicated, or is the issue the number of variables I'm trying to pair? Thank you again for your help figuring this out! I'm not even sure what to Google on this one.

Rowan

novinosrin · Posted 07-19-2018 10:12 AM

Good morning @rhilty It's difficult to tell without knowing the data as I can only guess there might be huge variances between the two samples. You came into the forum with an interesting and tricky not so conventional transpose and it that took me a good 12-15 mins to think through the grouping variables in the first place before my fingers started typing the syntax. A really good one.

For your additional requirement, your best is to provide a clear and representative sample of your real with an expected output for that input sample. This helps for anybody to comfortably comprehend and respond with a one stop shop solution that you could copy paste to a production environment. My 2 cents!

rhilty · Posted 07-19-2018 11:52 AM

Thanks for the feedback @novinosrin! I'm new to the forum (and to SAS) so it's helpful to hear what's most useful for figuring out solutions.

Here are a few rows of the data. I changed the ID numbers for data security reasons but can tell you that PublicId A3 yielded the error I included above (although this was just one of many IDs that came up with the same set of errors).

PublicId	ExperienceType	JobTitle	StartDate	EndDate
A2	Administrative Support Staff	Adminstrator/Teacher	2001-07-09 00:00:00.000	2013-09-02 00:00:00.000
A2	Education Coordinator/Educational Lead	Administrator	2013-09-09 00:00:00.000	2017-05-31 00:00:00.000
A2	Early Learning Teacher	State Pre-Kindergarten Teacher	2017-08-06 00:00:00.000	
A3	Elementary Teacher	State Pre-K Teacher	2015-08-24 00:00:00.000	
A3	Early Learning Teacher	PreK Lead Teacher/ Jr. Kindergarten	2011-05-23 00:00:00.000	2015-08-21 00:00:00.000
A3	Early Learning Teacher	PreK Lead Teacher	1999-09-06 00:00:00.000	2008-02-09 00:00:00.000
A3	Early Learning Teacher	PreK Lead Teacher	2008-08-04 00:00:00.000	2011-02-11 00:00:00.000
A4	Early Learning Teacher Assistant	TEACHER ASSISTANT	1981-09-07 00:00:00.000	
A5	Early Learning Teacher	Lead Teacher State Funded PreK	2010-09-07 00:00:00.000	
A5	Early Learning Teacher	Teacher	2015-08-17 00:00:00.000	
A5	Early Learning Teacher	Teacher	1984-09-17 00:00:00.000	1985-09-17 00:00:00.000
A5	Early Learning Teacher	Teacher	1985-09-17 00:00:00.000	2015-08-17 00:00:00.000

novinosrin · Posted 07-19-2018 01:02 PM

@rhilty that data makes my eyes hurt. Can you attach and excel sheet and also with an output sample plz

rhilty · Posted 07-19-2018 01:09 PM

@novinosrin here you go!

novinosrin · Posted 07-19-2018 01:55 PM

@rhilty Try

data have;
infile cards truncover;
input PublicId $	ExperienceType  & $50.	JobTitle  & $50.	StartDate & $50.	EndDate & $50.;
cards;
A2	Administrative Support Staff		Adminstrator/Teacher	2001-07-09 00:00:00.000		2013-09-02 00:00:00.000
A2	Education Coordinator/Ed Lead		Administrator	2013-09-09 00:00:00.000		2017-05-31 00:00:00.000
A2	Early Learning Teacher		State Pre-Kindergarten Teacher	2017-08-06 00:00:00.000	 .
A3	Elementary Teacher		State Pre-K Teacher		2015-08-24 00:00:00.000	  .
A3	Early Learning Teacher		PreK Lead Teacher/ Jr. K	2011-05-23 00:00:00.000		2015-08-21 00:00:00.000
A3	Early Learning Teacher		PreK Lead Teacher	1999-09-06 00:00:00.000		2008-02-09 00:00:00.000
A3	Early Learning Teacher		PreK Lead Teacher	2008-08-04 00:00:00.000		2011-02-11 00:00:00.000
;
run;
proc transpose data=have out=_w(keep=PublicId ExperienceType _name_ col1);
by _all_ notsorted;
var ExperienceType	JobTitle	StartDate	EndDate;
run;
data __w;
set _w;
by 	PublicId  ExperienceType notsorted;
if first.publicid then grp=0;
if _name_='ExperienceType' then grp+1;
keep  publicid _name_ grp col1 ;
run;

proc transpose data=__w  out=want(drop=_name_) ;
by PublicId   notsorted;
var   col1 ;
id _name_ grp;
run;

Registration is open

SAS Training: Just a Click Away