Solved: proc transpose on multiple columns

Emjay · Posted 02-24-2020 05:27 AM

I have a table in the following form: (Actually there are 10 Years and the columns don't end after Pieces, but I hope for the question it's fine.)

data have;
input Year Name_ID Type $ Price Pieces;
datalines;
2017 1 Aa 100 .
2017 1 Bb 200 3
2017 2 Cc 500 2
2018 1 Aa 30 11
2018 1 Bb 40 32
2018 2 Aa 50 5
2018 2 Cc 10 6
;
run;

What I want is to transpose the data to the following form:

data want;
input Name_ID Type $ Price_2017 Pieces_2017 Price_2018 Pieces_2018;
datalines;
1 Aa 100 . 30 11
1 Bb 200 3 40 32
2 Aa . . 50 5
2 Cc 500 2 10 6
;
run;

I've done this before on a different dataset with a transpose for every single column and merged them together. But now I have a lot of columns and I don't want to do this manually. And another downside is, that after the merge, the columns aren't in the proper order either and the retain would need a lot of typing.

Can you please help me in finding a smarter way?

ed_sas_member · Posted 02-24-2020 06:34 AM

Hi @Emjay

Here is an approach to do this:

proc sort data=have out=have_sorted;
	by Name_ID Type Year;
run;

/* Create macrovariables */
proc sql noprint;
	select min(Year), max(Year) into: min_year trimmed,:max_year trimmed from have;
	select name into:name1-  from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
	select name into:variables_sp separated by ' ' from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
	select count(name) into:nb from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
run;
		/* &min_year = 2017 */
		/* &max_year = 2018 */
		/* &name1 = price */
		/* &name2 = pieces */
		/* &variables_sp = Price Pieces*/
		/* &nb = 2*/

%macro _tr();
	
	/* Transpose data */
	
	data have_tr (drop=counter year &variables_sp);
	
		set have_sorted;
		by Name_ID Type;

		if first.Type then call missing (counter);
	
		%do i=1 %to &nb;
	
			array _&&name&i. (&min_year:&max_year) &&name&i.._&min_year - &&name&i.._&max_year;
			retain _&&name&i.;
 			if first.Type then call missing (counter, of _&&name&i.(*));
 			counter+1;	
			_&&name&i.(year) = &&name&i.;

		%end;
	
		if last.Type then output;
	run;
	
	/* Re-order variables */
	
	proc sql noprint;
		select name into:var_keep separated by ',' from dictionary.columns where libname="WORK" and memname= "HAVE_TR" and name not in('Year', 'Name_ID', 'Type') order by scan(name,2,'_'), name desc; 
		create table want as select Name_ID, Type, &var_keep. from have_tr;
	quit;
%mend;

%_tr;

View solution in original post

PeterClemmensen · Posted 02-24-2020 06:23 AM

"the columns aren't in the proper order". This makes me wonder if this is for reporting purposes or you actually need a data set for this?

The DATA to DATA Step Macro
Blog: SASnrd

Emjay · Posted 02-24-2020 06:28 AM

I have to provide the output to my colleagues in Excel and there it has to be like:

Price_2017 Pieces_2017 Price_2018 Pieces_2018

and not:

Price_2017 Price_2018 Pieces_2017 Pieces_2018

That's what I mean with the not propper order.

ed_sas_member · Posted 02-24-2020 06:34 AM

Hi @Emjay

Here is an approach to do this:

proc sort data=have out=have_sorted;
	by Name_ID Type Year;
run;

/* Create macrovariables */
proc sql noprint;
	select min(Year), max(Year) into: min_year trimmed,:max_year trimmed from have;
	select name into:name1-  from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
	select name into:variables_sp separated by ' ' from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
	select count(name) into:nb from dictionary.columns where libname="WORK" and memname= "HAVE"
		   and name not in('Year', 'Name_ID', 'Type');
run;
		/* &min_year = 2017 */
		/* &max_year = 2018 */
		/* &name1 = price */
		/* &name2 = pieces */
		/* &variables_sp = Price Pieces*/
		/* &nb = 2*/

%macro _tr();
	
	/* Transpose data */
	
	data have_tr (drop=counter year &variables_sp);
	
		set have_sorted;
		by Name_ID Type;

		if first.Type then call missing (counter);
	
		%do i=1 %to &nb;
	
			array _&&name&i. (&min_year:&max_year) &&name&i.._&min_year - &&name&i.._&max_year;
			retain _&&name&i.;
 			if first.Type then call missing (counter, of _&&name&i.(*));
 			counter+1;	
			_&&name&i.(year) = &&name&i.;

		%end;
	
		if last.Type then output;
	run;
	
	/* Re-order variables */
	
	proc sql noprint;
		select name into:var_keep separated by ',' from dictionary.columns where libname="WORK" and memname= "HAVE_TR" and name not in('Year', 'Name_ID', 'Type') order by scan(name,2,'_'), name desc; 
		create table want as select Name_ID, Type, &var_keep. from have_tr;
	quit;
%mend;

%_tr;

Emjay · Posted 02-24-2020 07:37 AM

Hi @ed_sas_member thanks for your help!

I'm not really familiar with the array-function, so I don't understand all of your code, but it looks fine. 🙂

Is there no easy way to do it with a transpose, or is array just nicer?

The only problem is, that I have columns that already contain a "_" in the beginning, or even more than one. So the scan is not really working. I may have to change the original names then...

ed_sas_member · Posted 02-24-2020 07:44 AM

Hi @Emjay

You're welcome.

You can also obtain a similar output using multiple PROC TRANSPOSE (one transpose pieces, another to transpose price, ...) and then merge the results. The only 'difficulty' is to label the new columns (price2017, ...).

Regarding, the last part of the code, using a proc sql to rearrange columns, I suggest that you use another function than the SCAN function to identify variable ending with the year (e.. SUBSTR, etc.)

Best,

Emjay · Posted 02-24-2020 08:08 AM

Well labeling was not the problem, but sorting was. 🙂 I just thought there may be a nicer way to it, than writing a proc transpose for every single column.

I didn't think about substr. I'll try that later, thanks.

novinosrin · Posted 02-24-2020 08:24 AM

Hi @Emjay Good morning and Sorry my late entry to the part as I have been away for a few days missing all the excitement here. If I understand your requirement correctly, all you need is a very simple double transpose. Of course, your data must be sorted.

So,in SAS words

data have;
input Year Name_ID Type $ Price Pieces;
datalines;
2017 1 Aa 100 .
2017 1 Bb 200 3
2017 2 Cc 500 2
2018 1 Aa 30 11
2018 1 Bb 40 32
2018 2 Aa 50 5
2018 2 Cc 10 6
;
run;

proc sort data=have out=_have;
by name_id type year;
run;
proc transpose data=_have out=temp;
by name_id type year;
var price pieces;
run;

proc transpose data=temp out=want(drop=_:);
by  name_id type;
id _name_ year;
run;

Emjay · Posted 02-24-2020 08:49 AM

Hi @novinosrin

this looks like the thing I had in mind. I never really manage to get my head around when to use ID and when it's not needed. 😕

Any idea on how I can get a "_" between the names in the transpose? So that there is Price_2017 and not Price2017.

novinosrin · Posted 02-24-2020 08:53 AM

HI @Emjay Oh sorry I forgot to add the delimiter option, Please notice the Revised with delimiter option in the 2nd transpose step

proc sort data=have out=_have;
by name_id type year;
run;
proc transpose data=_have out=temp;
by name_id type year;
var price pieces;
run;

proc transpose data=temp out=want(drop=_:) delimiter=_;
by  name_id type;
id _name_ year;
run;

Emjay · Posted 02-24-2020 08:57 AM

Cool, thanks!

Emjay · Posted 02-24-2020 09:38 AM

Sorry to bother you again @novinosrin but I came across an issue, when I ran your code on my whole data set.

I've made a small adaption and added line number 3 with the year 2016. Since the year is not present for Name_ID 1 the order in the final data set ist not ascending anymore.

data have;
input Year Name_ID Type $ Price Pieces;
datalines;
2017 1 Aa 100 .
2017 1 Bb 200 3
2016 2 Cc 80 4
2017 2 Cc 500 2
2018 1 Aa 30 11
2018 1 Bb 40 32
2018 2 Aa 50 5
2018 2 Cc 10 6
;
run;

If I change the first sort and start with year, it will fail in the end due to wrong sorting. I thought about maybe adding some fake data with all years and deleting it later. Can you maybe think of something more elegant?

novinosrin · Posted 02-24-2020 09:53 AM

Hi @Emjay No bother at all and don't have to be so formal. Can you also illustrate/show the expected output for the modified one. I have some assumptions, but before playing with assumptions I believe it's better to know what you want as output for quicker solutions

Emjay · Posted 02-24-2020 10:00 AM

Hi @novinosrin

I want the final set to like that:

data want;
input Name_ID Type $ Price_2016 Pieces_2016 Price_2017 Pieces_2017 Price_2018 Pieces_2018;
datalines;
1 Aa . . 100 . 30 11
1 Bb . . 200 3 40 32
2 Aa . . . . 50 5
2 Cc 80 4 500 2 10 6
;
run;

So, persist the order of the base-columns but start with the smallest year.

novinosrin · Posted 02-24-2020 12:28 PM

I am gonna seek Guru @data_null__'s help if he doesn't mind..

Guru @data_null__ , If you don;t mind, can you give us a hand in getting the correct combination in TYPES/WAYS in Proc Summary to get all year values for each ID TYPE _NAME_ independent group. The idea is to avoid a cartesian(SQL) or any datastep programming fix rather to use Proc summary options before the proc transpose so that we get the missing years and the years in order just how OP wants. Once you could help this part, then it's the same transpose after. The completetypes pulls the unintended types and ID's and my mind is not working well. May i request your time plz . Thank you*1e6 in advance.

proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Re: proc transpose on multiple columns

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!