DATA Step, Macro, Functions and more

How to expand columns in a dataset?

Reply
Contributor
Posts: 72

How to expand columns in a dataset?

[ Edited ]

Hi SAS users,

 

I have a table as below:

 

Variable1      Variable2      Variable3

      1                   a                    b

      1                   c                    d

 

 

and I would like to expand my columns (creating new variables) for each unique Variable1 as below:

 

Variable1      Variable2      Variable3     Variable4      Variable5

     1                    a                    b                  c                  d

 

 

I have milions of rows (with unique variable1). Could you please help with it?

Super User
Posts: 17,912

Re: How to expand columns in a dataset?

Looks like your possibly doing a long to wide data transpose. Can you provide more sample data, your base case is not enough to generate code that will likely cover your requirements.
Contributor
Posts: 72

Re: How to expand columns in a dataset?

[ Edited ]

Let me put it this way (another example):

 

Variable1    Variable2    Variable3   

     1                  W                 3.2

     1                  X                  5.1

     2                  X                  7.0

     3                  E                  8.9

     3                  W                 1.3

 

Whenever, the Variable2 corresponding to Variable1 (ID) is changing its value, we would like to expand it to new columns. In fact per each new category (based on Variable2), two new columns are added to the right:

 

Variable1   Variable2    Variable3   Variable4   Variable5

     1                W                3.2              X               5.1

     2                X                 7.0             null            null

     3                E                 8.9             W               1.3

 

 

PURPOSE: At the end, we are interested to have a unique Variable1 in first column; and the corresponding category(ies) are expanded to the right instead of stacking on top of eachother.

 

I hope its more clear now. Thank you for help

Super User
Super User
Posts: 7,417

Re: How to expand columns in a dataset?

Well, two questions.  

The first question is why.  What is it your attempting to do which requires a transposed dataset.  If you think about it you have millions of rows, which could mean up to 3 * millions of columns.  Totally unworkable.  SAS procedures, and SQL, allow what is called by group processing or grouped functions.  These are easier to program, and faster.

 

The second question is less relevant, as from point 1 I don't believe transposing is a good idea.  But if you were to do that, what relationship is there in the data to say that a should be in variable2, and c in variable4 - I mean if the data is sorted differently, they could swap over.

 

If it was my data I would actually normalise that data further - variable 1 appears to be some sort of ID, and will assume for now there is no relation between VAR2-VAR3:

ID        VAR

1          A

1          B

1          C

1          D

...

 

Its very easy to group the data up in this structure.

Contributor
Posts: 72

Re: How to expand columns in a dataset?

I put another example, and I hope its more clear now. Thank you for help
Super User
Super User
Posts: 7,417

Re: How to expand columns in a dataset?

Yes, but you haven't answered the question, why?  What is the purpose of having a transposed dataset?  In you example, you have only 2 elements being transposed, however you say your data has millions of records, what are you intending to do with - assuming it even runs with this many - a dataset with possible hundreds of columns?

Unless there is a very specific reason to have hundreds of columns, the dataset you have now is far easier to work with.  Take your example dataset.  Say I want to get the mean of all W variables.  How are you going to do this in your transposed dataset?  You would need to setup arrays, scan over the arrays, check if element is W, then add element+1 value to a counter etc.  Far more code than simply:

proc means data=have;

  where variable2="W";

  output out=want mean=mean;

run;

You see how simpler the code is to work with, and it will be far quicker than array processing.  The only time I can see transposed data being beneficial is in an output file for people to review, however if you have hundreds of columns, is it really going to get reviewed?

Contributor
Posts: 72

Re: How to expand columns in a dataset?

[ Edited ]

Although there are millions of rows, it is not going to generate hundreds of columns. Because, the number of Variable2 values (categories) are limited; it is probably about 10 . So it means that I am going to have 9*2 = 18 new expanded columns.

 

Again, the purpose is to use this file to merge with another dataset. It makes it easier to review the databse once I have only unique Variable1 values in 1st column. 

 

Thank you

Trusted Advisor
Posts: 1,115

Re: How to expand columns in a dataset?

Hi @almmotamedi,

 

You can find no less than four different brilliant techniques for this task in this very recent thread:

https://communities.sas.com/t5/Base-SAS-Programming/Data-Transpose-with-proc-summary-and-column-reaa...

Ask a Question
Discussion stats
  • 7 replies
  • 356 views
  • 0 likes
  • 4 in conversation