Hello
What is the way to create data set "wanted" from dataset "have" ?
Data have;
input id serial Revenue profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;
data wanted;
input ID Revenue1 Revenue2 Revenue3 Revenue4 profit1 profit2 profit3 profit4;
cards;
1 10 20 30 40 2 5 7 9
2 5 10 . . 3 8 . .
;
Run;
If you're looking for a solution for just this one specific case (and not trying to generalize it), then you could use brute force:
Data have;
input id serial Revenue profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;
data wanted (drop=serial revenue profit); set have;
retain revenue1 revenue2 revenue3 revenue4 profit1 profit2 profit3 profit4;
if serial=1 then do;
revenue1=revenue;
profit1=profit;
end;
if serial=2 then do;
revenue2=revenue;
profit2=profit;
end;
if serial=3 then do;
revenue3=revenue;
profit3=profit;
end;
if serial=4 then do;
revenue4=revenue;
profit4=profit;
end;
if serial=4 then output;
run;
proc print data=wanted; run;
Thanks,
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
@Ronein wrote:
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
More efficient solution: work with the data in the long structure.
@Ronein wrote:
Thanks,
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
Array method is slightly more efficient, single pass of the data then. The tutorial below shows exactly what you need.
But why?
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
this trick seen first time from @Ksharp
proc sql;
select max(n) into :n from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=wanted(drop=_:) idgroup(out[&n] (revenue profit)=);
run;
proc transpose
data = have (keep = id revenue:)
out = want_rev (drop = _name_)
prefix = revenue;
by id;
var revenue;
run;
proc transpose
data = have (keep = id profit:)
out = want_prof (drop = _name_)
prefix = profit;
by id;
var profit;
run;
data want;
merge want_rev want_prof;
by id;
run;
If your data's in a massively-parallel database like Oracle, Teradata, Snowflake, DB2, check out the LONG2WIDE macro in this old 2008 SGF paper:
It's a Bird, It's a Plane, It's SQL Transpose!
https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/089-2008.pdf
An SQL CASE statement approach still work with SAS datasets, but I'm thinking a DATA step using arrays or hard-coding would be faster, although SAS SQL has gotten much better at parallel SQL execution over the years.
Good luck!
p.s. As an alternative to using a SAS macro to generate SQL, I've also seen people use Excel formulas to generate transpose CASE statement logic from a list of column names and copy-and-paste that into their queries!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.