Hello
What is the way to create data set "wanted" from dataset "have" ?
Data have;
input id serial Revenue profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;
data wanted;
input ID Revenue1 Revenue2 Revenue3 Revenue4 profit1 profit2 profit3 profit4;
cards;
1 10 20 30 40 2 5 7 9
2 5 10 . . 3 8 . .
;
Run;
If you're looking for a solution for just this one specific case (and not trying to generalize it), then you could use brute force:
Data have;
input id serial Revenue profit;
cards;
1 1 10 2
1 2 20 5
1 3 30 7
1 4 40 9
2 1 5 3
2 2 10 8
2 3 . .
2 4 . .
;
Run;
data wanted (drop=serial revenue profit); set have;
retain revenue1 revenue2 revenue3 revenue4 profit1 profit2 profit3 profit4;
if serial=1 then do;
revenue1=revenue;
profit1=profit;
end;
if serial=2 then do;
revenue2=revenue;
profit2=profit;
end;
if serial=3 then do;
revenue3=revenue;
profit3=profit;
end;
if serial=4 then do;
revenue4=revenue;
profit4=profit;
end;
if serial=4 then output;
run;
proc print data=wanted; run;
Thanks,
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
@Ronein wrote:
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
More efficient solution: work with the data in the long structure.
@Ronein wrote:
Thanks,
The real data including 20 million rows in long structure and 20 variables so I am looking for a more efficient solution
Array method is slightly more efficient, single pass of the data then. The tutorial below shows exactly what you need.
But why?
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
this trick seen first time from @Ksharp
proc sql;
select max(n) into :n from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=wanted(drop=_:) idgroup(out[&n] (revenue profit)=);
run;
proc transpose
data = have (keep = id revenue:)
out = want_rev (drop = _name_)
prefix = revenue;
by id;
var revenue;
run;
proc transpose
data = have (keep = id profit:)
out = want_prof (drop = _name_)
prefix = profit;
by id;
var profit;
run;
data want;
merge want_rev want_prof;
by id;
run;
If your data's in a massively-parallel database like Oracle, Teradata, Snowflake, DB2, check out the LONG2WIDE macro in this old 2008 SGF paper:
It's a Bird, It's a Plane, It's SQL Transpose!
https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/089-2008.pdf
An SQL CASE statement approach still work with SAS datasets, but I'm thinking a DATA step using arrays or hard-coding would be faster, although SAS SQL has gotten much better at parallel SQL execution over the years.
Good luck!
p.s. As an alternative to using a SAS macro to generate SQL, I've also seen people use Excel formulas to generate transpose CASE statement logic from a list of column names and copy-and-paste that into their queries!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.