Help using Base SAS procedures

Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

Reply
Contributor
Posts: 72

Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

Hello everyone,

I have a dataset that I want to transform into a panel data set. I usually do it in 2 ways: (1) built-in Proc Transpose, and (2) array in a Data step. For example,

1st way:
proc sort data=mydata; by isin; run;

proc transpose data=mydata out=myPaneldata (rename=(col1=myvar));
by isin;
var var1-var 500;
run;



2nd way:
data myPaneldata;
set mydata;
array vars{*} _NUMERIC_;
do k=1 to dim(vars);
myvar=vars{k}; output;
end;
drop k var1-var500;
run;


My question:
- What is the strength and weakness of each method?
- From what I observed repeatedly in several large datasets I run, it seems that using ARRAY in a data set is faster? How come? I argue that PROC TRANSPOSE is a built-in function and for that reason, experts must have chosen the most efficient, fastest way of transposing a dataset, and so anything that is not built-in is 99% of the time slower or at equally fast but less safe.

Can anyone please advise?
PROC Star
Posts: 7,363

Re: Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

Given the nature of the task, I would vote for the datastep approach to this problem. Both approaches have room for coding error, thus have to be carefully checked to ensure that you are doing what you intend to do, but the datastep approach, in this case, will run in almost half the time.

Similarly, there are many cases where proc summary can be used to transpose data and do it with much less code and quite a bit quicker than proc transpose.

On the other hand, as the complexity of the task increases, yes, proc transpose or proc summary can definitely pose less risk than hard coding the solution.

Art
----------
> Hello everyone,
>
> I have a dataset that I want to transform into a
> panel data set. I usually do it in 2 ways: (1)
> built-in Proc Transpose, and (2) array in a Data
> step. For example,
>
> 1st way:
> proc sort data=mydata; by isin; run;
>
> proc transpose data=mydata out=myPaneldata
> (rename=(col1=myvar));
> by isin;
> var var1-var 500;
> run;
>
>
> 2nd way:
> data myPaneldata;
> set mydata;
> array vars{*} _NUMERIC_;
> do k=1 to dim(vars);
> myvar=vars{k}; output;
> end;
> drop k var1-var500;
> run;
>
> My question:
> - What is the strength and weakness of each method?
> - From what I observed repeatedly in several large
> datasets I run, it seems that using ARRAY in a data
> set is faster? How come? I argue that PROC TRANSPOSE
> is a built-in function and for that reason, experts
> must have chosen the most efficient, fastest way of
> transposing a dataset, and so anything that is not
> built-in is 99% of the time slower or at equally fast
> but less safe.
>
> Can anyone please advise?
Contributor
Posts: 72

Re: Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

Thank you very much. Now I'm aware that there is Proc Summary for the similar purpose, and I will learn how to use it apart from Proc Transpose + Data step array.
Occasional Contributor sos
Occasional Contributor
Posts: 12

Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

For both ways, could say something about how to create panel IDs? thanks!

Trusted Advisor
Posts: 1,300

Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

In my experience with very large data the performance of proc transpose breaks down and becomes highly inefficient in which case data step with array subscript processing becomes more increasingly more efficeint, this is also more sensitive to the number of variables vs. the number of rows.  With the number of variables you are dealing with, I have no experienced issues with transpose performance and usually for smaller sets the simplicity of the call outways the small performance difference I experience on my systems. 

Ask a Question
Discussion stats
  • 4 replies
  • 671 views
  • 0 likes
  • 4 in conversation