turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Panel Data: PROC TRANSPOSE vs ARRAY in DATA STEP

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-10-2011 09:13 PM

Hello everyone,

I have a dataset that I want to transform into a panel data set. I usually do it in 2 ways: (1) built-in Proc Transpose, and (2) array in a Data step. For example,

1st way:

**proc sort data=mydata; by isin; run;**

proc transpose data=mydata out=myPaneldata (rename=(col1=myvar));

by isin;

var var1-var 500;

run;

2nd way:

**data myPaneldata;**

set mydata;

array vars{*} _NUMERIC_;

do k=1 to dim(vars);

myvar=vars{k}; output;

end;

drop k var1-var500;

run;

My question:

- What is the strength and weakness of each method?

- From what I observed repeatedly in several large datasets I run, it seems that using ARRAY in a data set is faster? How come? I argue that PROC TRANSPOSE is a built-in function and for that reason, experts must have chosen the most efficient, fastest way of transposing a dataset, and so anything that is not built-in is 99% of the time slower or at equally fast but less safe.

Can anyone please advise?

I have a dataset that I want to transform into a panel data set. I usually do it in 2 ways: (1) built-in Proc Transpose, and (2) array in a Data step. For example,

1st way:

proc transpose data=mydata out=myPaneldata (rename=(col1=myvar));

by isin;

var var1-var 500;

run;

2nd way:

set mydata;

array vars{*} _NUMERIC_;

do k=1 to dim(vars);

myvar=vars{k}; output;

end;

drop k var1-var500;

run;

My question:

- What is the strength and weakness of each method?

- From what I observed repeatedly in several large datasets I run, it seems that using ARRAY in a data set is faster? How come? I argue that PROC TRANSPOSE is a built-in function and for that reason, experts must have chosen the most efficient, fastest way of transposing a dataset, and so anything that is not built-in is 99% of the time slower or at equally fast but less safe.

Can anyone please advise?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to smilingmelbourne

05-11-2011 12:27 AM

Given the nature of the task, I would vote for the datastep approach to this problem. Both approaches have room for coding error, thus have to be carefully checked to ensure that you are doing what you intend to do, but the datastep approach, in this case, will run in almost half the time.

Similarly, there are many cases where proc summary can be used to transpose data and do it with much less code and quite a bit quicker than proc transpose.

On the other hand, as the complexity of the task increases, yes, proc transpose or proc summary can definitely pose less risk than hard coding the solution.

Art

----------

> Hello everyone,

>

> I have a dataset that I want to transform into a

> panel data set. I usually do it in 2 ways: (1)

> built-in Proc Transpose, and (2) array in a Data

> step. For example,

>

> 1st way:

> proc sort data=mydata; by isin; run;

>

> proc transpose data=mydata out=myPaneldata

> (rename=(col1=myvar));

> by isin;

> var var1-var 500;

> run;

>

>

> 2nd way:

> data myPaneldata;

> set mydata;

> array vars{*} _NUMERIC_;

> do k=1 to dim(vars);

> myvar=vars{k}; output;

> end;

> drop k var1-var500;

> run;

>

> My question:

> - What is the strength and weakness of each method?

> - From what I observed repeatedly in several large

> datasets I run, it seems that using ARRAY in a data

> set is faster? How come? I argue that PROC TRANSPOSE

> is a built-in function and for that reason, experts

> must have chosen the most efficient, fastest way of

> transposing a dataset, and so anything that is not

> built-in is 99% of the time slower or at equally fast

> but less safe.

>

> Can anyone please advise?

Similarly, there are many cases where proc summary can be used to transpose data and do it with much less code and quite a bit quicker than proc transpose.

On the other hand, as the complexity of the task increases, yes, proc transpose or proc summary can definitely pose less risk than hard coding the solution.

Art

----------

> Hello everyone,

>

> I have a dataset that I want to transform into a

> panel data set. I usually do it in 2 ways: (1)

> built-in Proc Transpose, and (2) array in a Data

> step. For example,

>

> 1st way:

> proc sort data=mydata; by isin; run;

>

> proc transpose data=mydata out=myPaneldata

> (rename=(col1=myvar));

> by isin;

> var var1-var 500;

> run;

>

>

> 2nd way:

> data myPaneldata;

> set mydata;

> array vars{*} _NUMERIC_;

> do k=1 to dim(vars);

> myvar=vars{k}; output;

> end;

> drop k var1-var500;

> run;

>

> My question:

> - What is the strength and weakness of each method?

> - From what I observed repeatedly in several large

> datasets I run, it seems that using ARRAY in a data

> set is faster? How come? I argue that PROC TRANSPOSE

> is a built-in function and for that reason, experts

> must have chosen the most efficient, fastest way of

> transposing a dataset, and so anything that is not

> built-in is 99% of the time slower or at equally fast

> but less safe.

>

> Can anyone please advise?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

05-11-2011 02:00 AM

Thank you very much. Now I'm aware that there is Proc Summary for the similar purpose, and I will learn how to use it apart from Proc Transpose + Data step array.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to smilingmelbourne

10-31-2011 10:00 PM

For both ways, could say something about how to create panel IDs? thanks!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to smilingmelbourne

11-02-2011 01:47 PM

In my experience with very large data the performance of proc transpose breaks down and becomes highly inefficient in which case data step with array subscript processing becomes more increasingly more efficeint, this is also more sensitive to the number of variables vs. the number of rows. With the number of variables you are dealing with, I have no experienced issues with transpose performance and usually for smaller sets the simplicity of the call outways the small performance difference I experience on my systems.