BookmarkSubscribeRSS Feed
almmotamedi
Obsidian | Level 7

Hi SAS users,

 

I have a table as below:

 

Variable1      Variable2      Variable3

      1                   a                    b

      1                   c                    d

 

 

and I would like to expand my columns (creating new variables) for each unique Variable1 as below:

 

Variable1      Variable2      Variable3     Variable4      Variable5

     1                    a                    b                  c                  d

 

 

I have milions of rows (with unique variable1). Could you please help with it?

7 REPLIES 7
Reeza
Super User
Looks like your possibly doing a long to wide data transpose. Can you provide more sample data, your base case is not enough to generate code that will likely cover your requirements.
almmotamedi
Obsidian | Level 7

Let me put it this way (another example):

 

Variable1    Variable2    Variable3   

     1                  W                 3.2

     1                  X                  5.1

     2                  X                  7.0

     3                  E                  8.9

     3                  W                 1.3

 

Whenever, the Variable2 corresponding to Variable1 (ID) is changing its value, we would like to expand it to new columns. In fact per each new category (based on Variable2), two new columns are added to the right:

 

Variable1   Variable2    Variable3   Variable4   Variable5

     1                W                3.2              X               5.1

     2                X                 7.0             null            null

     3                E                 8.9             W               1.3

 

 

PURPOSE: At the end, we are interested to have a unique Variable1 in first column; and the corresponding category(ies) are expanded to the right instead of stacking on top of eachother.

 

I hope its more clear now. Thank you for help

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, two questions.  

The first question is why.  What is it your attempting to do which requires a transposed dataset.  If you think about it you have millions of rows, which could mean up to 3 * millions of columns.  Totally unworkable.  SAS procedures, and SQL, allow what is called by group processing or grouped functions.  These are easier to program, and faster.

 

The second question is less relevant, as from point 1 I don't believe transposing is a good idea.  But if you were to do that, what relationship is there in the data to say that a should be in variable2, and c in variable4 - I mean if the data is sorted differently, they could swap over.

 

If it was my data I would actually normalise that data further - variable 1 appears to be some sort of ID, and will assume for now there is no relation between VAR2-VAR3:

ID        VAR

1          A

1          B

1          C

1          D

...

 

Its very easy to group the data up in this structure.

almmotamedi
Obsidian | Level 7
I put another example, and I hope its more clear now. Thank you for help
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, but you haven't answered the question, why?  What is the purpose of having a transposed dataset?  In you example, you have only 2 elements being transposed, however you say your data has millions of records, what are you intending to do with - assuming it even runs with this many - a dataset with possible hundreds of columns?

Unless there is a very specific reason to have hundreds of columns, the dataset you have now is far easier to work with.  Take your example dataset.  Say I want to get the mean of all W variables.  How are you going to do this in your transposed dataset?  You would need to setup arrays, scan over the arrays, check if element is W, then add element+1 value to a counter etc.  Far more code than simply:

proc means data=have;

  where variable2="W";

  output out=want mean=mean;

run;

You see how simpler the code is to work with, and it will be far quicker than array processing.  The only time I can see transposed data being beneficial is in an output file for people to review, however if you have hundreds of columns, is it really going to get reviewed?

almmotamedi
Obsidian | Level 7

Although there are millions of rows, it is not going to generate hundreds of columns. Because, the number of Variable2 values (categories) are limited; it is probably about 10 . So it means that I am going to have 9*2 = 18 new expanded columns.

 

Again, the purpose is to use this file to merge with another dataset. It makes it easier to review the databse once I have only unique Variable1 values in 1st column. 

 

Thank you

FreelanceReinh
Jade | Level 19

Hi @almmotamedi,

 

You can find no less than four different brilliant techniques for this task in this very recent thread:

https://communities.sas.com/t5/Base-SAS-Programming/Data-Transpose-with-proc-summary-and-column-reaa...

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1959 views
  • 0 likes
  • 4 in conversation