DATA Step, Macro, Functions and more

Split variable values into multiple lines

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

Split variable values into multiple lines

Dear all,

This is my first post here. I’ve been googling over the last hours for an answer but I cannot find it.

I have a table whose variables contain list of values separated by a comma. I would like to transform it into a total flat table. For example:

VAR1     VAR2     VAR3
r,f,e     3,5     S
j     2,4,6     D,J,N

should transform into:

VAR1 VAR2 VAR3
r 3 S
r 5 S
f 3 S
f 5 S
e 3 S
e 5 S
j 2 D
j 2 J
j 2 N
j 4 D
j 4 J
j 4 N
j 6 D
j 6 J
j 6 N

That may sound like a basic but I don't how to perform it!

Thanks a lot for your help.

ntro


Accepted Solutions
Solution
‎06-11-2013 03:21 PM
Super User
Posts: 19,815

Re: Split variable values into multiple lines

Needed to reset the counters at the beginning of the loops, not change the output statement.

data want;

set have;

*initialize counters to 0;

i=1;

do while (scan(var1, i, ",") ne "");

     new_var1=scan(var1, i, ",");

     j=1;

     do while (scan(var2, j, ",") ne "");

          new_var2=scan(var2, j, ",");

          k=1;

          do while (scan(var3, k, ",") ne "");

               new_var3=scan(var3, k, ",");

               output;

               k+1;

           end;

       j+1;

       end;

i+1;

end;

run;

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Split variable values into multiple lines

You're looking for the scan function with a do loop iteration and an explicit output statement.

This isn't tested/working code, you may have to play with the ends/counters to get it working, but hopefully gives you an idea.

data want;

set have;

*initialize counters to 0;

i=1; j=1; k=1;

do while (scan(var1, i, ",") ne "");

     new_var1=scan(var1, i, ",");

     do while (scan(var2, j, ",") ne "");

          new_var2=scan(var2, j, ",");

          do while (scan(var3, k, ",") ne "");

               new_var3=scan(var3, k, ",");

               output;

               k+1;

           end;

       j+1;

       end;

i+1;

end;

run;

Contributor
Posts: 20

Re: Split variable values into multiple lines

Reeza,

Thank you very much for the prompt and useful answer. I have tested it and I think it is not far from the solution.

Here what I obtain:

var1 var2 var3 i j k new_var1 new_var2 new_var3

r,f,e 3,5 S 1 1 1 r 3 S

j 2,4,6 D,J,N 1 1 1 j 2 D

j 2,4,6 D,J,N 1 1 2 j 2 J

j 2,4,6 D,J,N 1 1 3 j 2 N

So the first record is treated only once (r 3 S) but then it jumps to last record, from which only 3 lines are produced.

It seems like only var3 is treated. Any idea why?

Thank again

Super User
Posts: 19,815

Re: Split variable values into multiple lines

Try adding output statements at the end of each loop, unfortunately I don't have time to test anything today.

Super User
Super User
Posts: 7,050

Re: Split variable values into multiple lines

Looks like the loops are just ending too early.

You could use iterative DO loop using COUNTW() command to set upper bound of loops.

  ...

do i=1 to countw(var1, ',' ); 

  ....

What do you want to happen when any of the three variables are missing?

Solution
‎06-11-2013 03:21 PM
Super User
Posts: 19,815

Re: Split variable values into multiple lines

Needed to reset the counters at the beginning of the loops, not change the output statement.

data want;

set have;

*initialize counters to 0;

i=1;

do while (scan(var1, i, ",") ne "");

     new_var1=scan(var1, i, ",");

     j=1;

     do while (scan(var2, j, ",") ne "");

          new_var2=scan(var2, j, ",");

          k=1;

          do while (scan(var3, k, ",") ne "");

               new_var3=scan(var3, k, ",");

               output;

               k+1;

           end;

       j+1;

       end;

i+1;

end;

run;

Contributor
Posts: 20

Re: Split variable values into multiple lines

Dear all,

I really can't say how much I appreciate your help. I have used Reeza's code and it wroks perfectly.

Arthur, your code seems like a very nice and clever solution. However I have the following error (sorry it is in French):

98  +input (r,f,e 3,5 S ) ($);

             -

             22

             76

ERREUR 22-322: Erreur de syntaxe ; syntaxe requise : [, {.

ERREUR 76-322: Erreur de syntaxe ; l'instruction sera ignorée.

It seems like the comma is not understood.

Thanks again

PROC Star
Posts: 7,474

Re: Split variable values into multiple lines

: Worked fine on my computer.  Commas are among the default delimiters for both ASCII and EBCDIC but, since the main difference between Reeza's and my code was that she explicitly declared the comma as a modifier in her use of the scan function, it apparently is needed on your system.

Ah, but as I look at the error you received, I would guess that you really didn't have a first record that contained "var1 var2 var3", i.e., the variable names.  The reason I was suggesting having SAS write and run a program was so that the variable names could be extracted from the first record.  Since that appears not to be the case, Reesa's suggested code would definitely be the way to go.

Contributor
Posts: 20

Re: Split variable values into multiple lines

Thank you Arthur.

I am using now Reeza's code but since my final table is 12M lines, it seems quicker to make x data steps rather than x loop in a single data step.

Have a nice week-end

PROC Star
Posts: 7,474

Re: Split variable values into multiple lines

One way to accomplish what I think you are trying to do is to use SAS to write, then run, a program that restructures the data.  e.g.:

filename program temp;

data have;

  informat x y z $10.;

  infile cards end=eof;

  input x y z;

  cards;

VAR1     VAR2     VAR3

r,f,e     3,5     S

j     2,4,6     D,J,N

;

data _null_;

  file program;

  set have end=eof;

  if _n_ eq 1 then do;

    put "data want;";

    put "input ("@;

    put x y z @;

    put ") ($);";

    put "cards;";

  end;

  else do;

    i=1;

    do while (scan(x,i) ne "");

      data1=scan(x,i);

      i+1;

      j=1;

      do while (scan(y,j) ne "");

        data2=scan(y,j);

        j+1;

        k=1;

        do while (scan(z,k) ne "");

          data3=scan(z,k);

          put data1 data2 data3;

          k+1;

        end;

      end;

    end;

  end;

  if eof then put ";";

run;

%include program;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 351 views
  • 3 likes
  • 4 in conversation