Help with Tranpose

Accepted Solution Solved
Reply
Super Contributor
Posts: 418
Accepted Solution

Help with Tranpose

Hello all. I have a dataset that has numerous amounts of array data concatenated together by a common string within a cell (string is a ';').  I have developed a process that will loop through each variable , split it out into X new variables, and then tranpose them (maintaining the data order, that step is 100% CRUCIAL)! An example data is given below

Key      Var

1          a;A;B;b;

2          c;b;c;b

Desired results

Key     Var

1          a

1          A

1          B

1          b

2          c

2          b

2          c

2          b

My process basically looks like the code below, plus some statements to merge the datasets back together. Can anyone tell me if they know a more efficient way of coding this, as the tranpose on a million records (million records after the tranpose) takes about 2-3 seconds, and since I will be doing this hundreds of times for extremely large amounts of data, and then have to set the tables back together, it takes far too long for my desired purposes.

Thank you in advance!

proc contents data=key_vars out=mydata(keep=name varnum);

run;

proc sort data=mydata;

by varnum;

run;

proc sql;

select count(*) into :total_vars

from mydata;

quit;

run;

%do i=1 %to &total_vars;

data _null_;

set mydata;

if _n_=1 then call symputx('variable',name);

run;

%let new_variable_name=new.&variable;


DATA Work.MAXCOUNT (keep=&New_Variable_Name.);
SET MASTER (keep=&Variable);
&New_Variable_Name.=count(&Variable_Name.,';');
RUN;


PROC MEANS data=Work.MAXCOUNT;
var &New_Variable_Name.;
output out=Work.meansdata(drop=_type_ _freq_) max=;
RUN;

DATA Work.meansdata;
SET Work.meansdata;
  if &New_Variable_Name. > 0 then &New_Variable_Name.=&new_variable_name.+1;
RUN;

DATA _null_;
SET meansdata;
  call symput('vararray_length',&new_variable_name);
RUN;

%LET countvar= &vararray_length.;
DATA &drive..&ds (drop=I compress=yes);
SET &drive..MASTER(keep=&variable_name. &keyvars);
ARRAY a{&countvar.} $1000
  a1-a&countvar.;
DO I=1 to &countvar.;
a{I}= strip(SCAN(&Variable_Name,(i),';','m'));
END;
RUN;

proc transpose data=&ds. out=&ds. (rename=(col1=&variable_name. _name_=new) keep=&keyvars _name_ col1
compress=yes );
by &keyvars;
var a1-a&countvar.;
run;

%end;

Thanks in advance!


Accepted Solutions
Solution
‎02-05-2013 12:49 AM
Super User
Super User
Posts: 7,035

Re: Help with Tranpose

Posted in reply to Anotherdream

You can generalize to as many variables as you want by using arrays.

data have;

  input key var1 $ var2 $ ;

cards4;

1 a;b;c ;;B

2 b;b;b C;;D

;;;;

data want ;

  set have ;

  array old var1-var2;

  do i=1 to dim(old);

    max = max(max,countw(old(i),';','m'));

  end;

  array new $8 newvar1-newvar2;

  do i=1 to max;

    do j=1 to dim(old);

      new(j) = scan(old(j),i,';','m');

    end;

    output;

  end;

  drop j max var1-var2;

  rename newvar1-newvar2 = var1-var2 ;

run;

Obs    key    i    var1    var2

1      1     1     a

2      1     2     b

3      1     3     c       B

4      2     1     b       C

5      2     2     b

6      2     3     b       D

View solution in original post


All Replies
PROC Star
Posts: 7,467

Re: Help with Tranpose

Posted in reply to Anotherdream

I may not understand what you are trying to do.  Does the following approximate what you want to do?

data have;

  informat var $80.;

  input Key Var;

  cards4;

1          a;A;B;b;

2          c;b;c;b

;;;;

data want (keep=key var);

  format key best12.;

  length var $1;

  set have (rename=var=var_in);

  i=1;

  do while (scan(var_in,i,";") ne "");

    var=scan(var_in,i,";");

    output;

    i+1;

  end;

run;

Super User
Super User
Posts: 7,035

Re: Help with Tranpose

Posted in reply to Anotherdream

What is the question?  The program doesn't appear to be transposing the data.  Instead it is leaving it horizontal and just splitting the values from the delimited string into multiple variables.  So if you had a variable named FRED with value 'a;b;c' you would generate FRED1='a' , FRED2='b' and FRED3='c'.   Am I reading the intent properly?

Are you just having trouble determining the maximum number of concatenated values in a variable?

Do you what a coding pattern that will let you split multiple variables at once?


proc sql noprint ;

  select max(countw(VAR1,';','m'))

       , max(countw(VAR2,';','m'))

    into :countvar1

       , :countvar2

    from mydata

  ;

quit;


data want;

   set mydata ;

   array var1_ (&countvar1) $10 ;

   array var2_ (&countvar2) $10 ;

   do _n_ = 1 to &countvar1;

     var1_ (_n_) = scan(var1,_n_,';','m');

   end;

   do _n_ = 1 to &countvar2;

     var2_ (_n_) = scan(var2,_n_,';','m');

   end;

   drop var1 var2 ;

run;

Super Contributor
Posts: 418

Re: Help with Tranpose

I'm sorry but the code sample I gave is not a full sample but a generic outline of what I am going, It is a very generic example. Tom, you have step 1 correct in what i'm trying to do. . First I split the variable (call it var 1) into X different variables (a1,a2,a3, etc...) then the transpose step transposes these by the key variables that I define.

Aka

Key     Var

1          a;b;c

2           b;b;b;

Becomes

key           var          a1     a2     a3

1             a;b;c;         a     b       c

2             b;b;b;         b     b       b

Then when the tranpose happens it become the dataset that I showed in my example. I might be able to alter the full code to give the full example with some more time tomorrow.

         

I then strip the tranposed column of the A's (since the varialbes become a1,a2,a3 etc..) and then order by the the remaining numbers, that way data order is maintained.

The array process makes new varialbes and the tranpose step tranposes then.

ARTHUR, this is exactly what I need the process to do, however there are a few minor issues. In my data, it is possible to have valid null data in the strings. Example, the string a;;b;;d;;d;e;d would be a valid string, and we would need the second row of data to be a true null. I also  need it to run dynamically for any number of columns (can be different number of columns with different names).  Is this something that you can adjust your code to do? (My original code looked similar to yours however I could not figure out how to deal with the null isue). Thanks again everyone! All advice here is extremely helpful!


Super User
Super User
Posts: 7,035

Re: Help with Tranpose

Posted in reply to Anotherdream

I am still at a loss to understand what the issue is.  If you just want to rotate the concatenated values into observations then a simple DO loop with an embedded OUTPUT statement will do that.  You should remember to keep the loop counter so that you can preserve the order.

Was there a question in there about how to deal with more than one of these concatenated variables in a single pass through the data?  If so can you give an example of the output you would like?

data want ;

  set have ;

  do i=1 to countw(var,';','m');

    newvar=scan(var,i,';','m');

    output;

  end;

run;

data _null_;

  set want ;

  put key i newvar var ;

run;


1 1 a a;b;c

1 2 b a;b;c

1 3 c a;b;c

2 1 b b;b;b;

2 2 b b;b;b;

2 3 b b;b;b;

2 4   b;b;b;

Super Contributor
Posts: 418

Re: Help with Tranpose

HELLO Tom. You are correct the question is how to put the data going the long direction instead of the wide, however doing it with multiple variables. For example, the following dataset below would below the after dataset if working correctly.

Key     Var1               Var2

1          a;b;c               ;;B

2           b;b;b              C;;D

Becomes

key           var1      Var2

1               a          NULL

1               b          NULL

1               c          B

2               b          C

2               b          NULL

2               b          D

Note that I wrote the word NULL instead of leaving it blank for illustration purposes. Basically I dont know how to loop through each variable in a dataset when you can have 100 different columns you wish to loop through in one data step. It also has to be dynamic, as the column names and numbers can change (and do) daily.

I could theoritically split it up by unique variable and then merge (which is similar to what I'm doing now), but i'm looking for the ability to do it all in one dataset to maximze efficiency.  [To be honest the output method might still be more efficient than the method I have been using, I will perform some tests on this tomorrow.]

Thanks again

Brandon

Super User
Super User
Posts: 7,035

Re: Help with Tranpose

Posted in reply to Anotherdream

So just adjust the upper limit of your loop counter to account for maximum over all of the variables that you are converting.

data want ;

  set have ;

  do i=1 to max(countw(var1,';','m'),countw(var1,';','m'));

    newvar1=scan(var1,i,';','m');

    newvar2=scan(var2,i,';','m');

    output;

  end;

run;

Solution
‎02-05-2013 12:49 AM
Super User
Super User
Posts: 7,035

Re: Help with Tranpose

Posted in reply to Anotherdream

You can generalize to as many variables as you want by using arrays.

data have;

  input key var1 $ var2 $ ;

cards4;

1 a;b;c ;;B

2 b;b;b C;;D

;;;;

data want ;

  set have ;

  array old var1-var2;

  do i=1 to dim(old);

    max = max(max,countw(old(i),';','m'));

  end;

  array new $8 newvar1-newvar2;

  do i=1 to max;

    do j=1 to dim(old);

      new(j) = scan(old(j),i,';','m');

    end;

    output;

  end;

  drop j max var1-var2;

  rename newvar1-newvar2 = var1-var2 ;

run;

Obs    key    i    var1    var2

1      1     1     a

2      1     2     b

3      1     3     c       B

4      2     1     b       C

5      2     2     b

6      2     3     b       D

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 274 views
  • 0 likes
  • 3 in conversation