Re: Long to wide

Ronein · Posted 12-13-2021 12:12 PM

Hello

What is the way to create data set "wanted" from dataset "have" ?


Data have;
input id serial  Revenue  profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;

data wanted;
input ID Revenue1 Revenue2 Revenue3 Revenue4 profit1 profit2 profit3 profit4;
cards;
1 10 20 30 40 2 5 7 9
2 5 10 . . 3 8 . .
;
Run;

GraphGuy · Posted 12-13-2021 12:26 PM

If you're looking for a solution for just this one specific case (and not trying to generalize it), then you could use brute force:

Data have;
input id serial  Revenue  profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;

data wanted (drop=serial revenue profit); set have;
retain revenue1 revenue2 revenue3 revenue4 profit1 profit2 profit3 profit4;
if serial=1 then do;
 revenue1=revenue;
 profit1=profit;
 end;
if serial=2 then do;
 revenue2=revenue;
 profit2=profit;
 end;
if serial=3 then do;
 revenue3=revenue;
 profit3=profit;
 end;
if serial=4 then do;
 revenue4=revenue;
 profit4=profit;
 end;
if serial=4 then output;
run;

proc print data=wanted; run;

Ronein · Posted 12-13-2021 12:33 PM

Thanks,

The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution

PaigeMiller · Posted 12-13-2021 01:09 PM

@Ronein wrote:

The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution

More efficient solution: work with the data in the long structure.

--
Paige Miller

Reeza · Posted 12-13-2021 01:22 PM

@Ronein wrote:

Thanks,

The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution

Array method is slightly more efficient, single pass of the data then. The tutorial below shows exactly what you need.

But why?

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

Reeza · Posted 12-13-2021 12:32 PM

Two proc transposes plus a merge/join.
Or an array method.

acordes · Posted 12-13-2021 12:34 PM

this trick seen first time from @Ksharp

proc sql;
select max(n) into :n from (select count(*) as n from have group by id);
quit;

proc summary data=have;
by id;
output out=wanted(drop=_:) idgroup(out[&n] (revenue profit)=);
run;

maguiremq · Posted 12-13-2021 12:37 PM

proc transpose
	data = have (keep = id revenue:)
	out = want_rev (drop = _name_)
	prefix = revenue;
	by id;
		var revenue;
run;

proc transpose
	data = have (keep = id profit:)
	out = want_prof (drop = _name_)
	prefix = profit;
	by id;
		var profit;
run;

data want;
	merge want_rev want_prof;
	by id;
run;

tc · Posted 12-22-2021 06:50 PM

If your data's in a massively-parallel database like Oracle, Teradata, Snowflake, DB2, check out the LONG2WIDE macro in this old 2008 SGF paper:

It's a Bird, It's a Plane, It's SQL Transpose!
https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/089-2008.pdf

An SQL CASE statement approach still work with SAS datasets, but I'm thinking a DATA step using arrays or hard-coding would be faster, although SAS SQL has gotten much better at parallel SQL execution over the years.

Good luck!

p.s. As an alternative to using a SAS macro to generate SQL, I've also seen people use Excel formulas to generate transpose CASE statement logic from a list of column names and copy-and-paste that into their queries!

SAS Training: Just a Click Away