BookmarkSubscribeRSS Feed
smilingmelbourne
Fluorite | Level 6
Hello everyone,

I have a dataset that I want to transform into a panel data set. I usually do it in 2 ways: (1) built-in Proc Transpose, and (2) array in a Data step. For example,

1st way:
proc sort data=mydata; by isin; run;

proc transpose data=mydata out=myPaneldata (rename=(col1=myvar));
by isin;
var var1-var 500;
run;



2nd way:
data myPaneldata;
set mydata;
array vars{*} _NUMERIC_;
do k=1 to dim(vars);
myvar=vars{k}; output;
end;
drop k var1-var500;
run;


My question:
- What is the strength and weakness of each method?
- From what I observed repeatedly in several large datasets I run, it seems that using ARRAY in a data set is faster? How come? I argue that PROC TRANSPOSE is a built-in function and for that reason, experts must have chosen the most efficient, fastest way of transposing a dataset, and so anything that is not built-in is 99% of the time slower or at equally fast but less safe.

Can anyone please advise?
4 REPLIES 4
art297
Opal | Level 21
Given the nature of the task, I would vote for the datastep approach to this problem. Both approaches have room for coding error, thus have to be carefully checked to ensure that you are doing what you intend to do, but the datastep approach, in this case, will run in almost half the time.

Similarly, there are many cases where proc summary can be used to transpose data and do it with much less code and quite a bit quicker than proc transpose.

On the other hand, as the complexity of the task increases, yes, proc transpose or proc summary can definitely pose less risk than hard coding the solution.

Art
----------
> Hello everyone,
>
> I have a dataset that I want to transform into a
> panel data set. I usually do it in 2 ways: (1)
> built-in Proc Transpose, and (2) array in a Data
> step. For example,
>
> 1st way:
> proc sort data=mydata; by isin; run;
>
> proc transpose data=mydata out=myPaneldata
> (rename=(col1=myvar));
> by isin;
> var var1-var 500;
> run;
>
>
> 2nd way:
> data myPaneldata;
> set mydata;
> array vars{*} _NUMERIC_;
> do k=1 to dim(vars);
> myvar=vars{k}; output;
> end;
> drop k var1-var500;
> run;
>
> My question:
> - What is the strength and weakness of each method?
> - From what I observed repeatedly in several large
> datasets I run, it seems that using ARRAY in a data
> set is faster? How come? I argue that PROC TRANSPOSE
> is a built-in function and for that reason, experts
> must have chosen the most efficient, fastest way of
> transposing a dataset, and so anything that is not
> built-in is 99% of the time slower or at equally fast
> but less safe.
>
> Can anyone please advise?
smilingmelbourne
Fluorite | Level 6
Thank you very much. Now I'm aware that there is Proc Summary for the similar purpose, and I will learn how to use it apart from Proc Transpose + Data step array.
sos
Calcite | Level 5 sos
Calcite | Level 5

For both ways, could say something about how to create panel IDs? thanks!

FriedEgg
SAS Employee

In my experience with very large data the performance of proc transpose breaks down and becomes highly inefficient in which case data step with array subscript processing becomes more increasingly more efficeint, this is also more sensitive to the number of variables vs. the number of rows.  With the number of variables you are dealing with, I have no experienced issues with transpose performance and usually for smaller sets the simplicity of the call outways the small performance difference I experience on my systems. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1759 views
  • 0 likes
  • 4 in conversation