Wide to Long, But With a Twist

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Wide to Long, But With a Twist

Hi Everyone,

I have a data file that needs restructuring.  Here's the problem:

The data are round-robin where each participant in a group rates each other (including self-ratings) following discussion.  The data are in wide format, which each person's ratings of the other members constituting a record.  I need to create a long file, but each record needs to include both the ID for the person doing the rating and the person being rated.  The data are structured as followed:

Group     ID     Var_1     Var_2     Var_3     Var_4

1            1        

1            2

1            3

1            4

2            5

2            6

2            7

Where the four variables are a given person's rating of the other members in the group for a given item (e.g., "how influential was person X during the discussion?")  The data are sorted so that the first person in the group has his/her ratings in variable "Var_1", the ratings for person 2 in "Var_2", and so on.  (For person 1, Var_1 is a self rating, for person 2 Var_2 is the self rating, and so on.) 

But the data need to be in long format as follows:

Group     ID     TargetID     Var

1            1       1  

1            1       2

1            1       3

1            1       4

1            2       1

1            2       2

1            2       3

1            2       4

And so on.  It would be an easy thing to do except for the fact that the target ID's need to be included in the format above.  I've come close with several different methods, but not exactly what I need.  Any help would be greatly appreciated.

Joe


Accepted Solutions
Solution
‎08-25-2012 02:57 PM
PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

Here is basically the same solution I suggested previously, but using a DOW loop to account for what you want:

data have;

  infile cards truncover;

  input Group ID Var_1-Var_4;

  cards;

1  1 1 1 1 1

1  2 2 2 2 2

1  3 3 3 3 4

1  4 4 4 4 4

2  5 1 1 1

2  6 2 2 2

2  7 3 3 3

3  9 1 1 1 1

3 10 2 2 2 2

3 11 3 3 3 4

3 12 4 4 4 4

;

data want (drop=i j k Var_: idsSmiley Happy;

  array ids(4);

  i=0;

  do until (last.group);

    set have;

    by group;

    i+1;

    ids(i)=ID;

  end;

  do until (last.group);

    set have;

    array targets(*) Var_1-Var_4;

    by group;

    do k=1 to i;

      targetid=ids(k);

      var=targets(k);

      output;

    end;

  end;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

I may or may not understand what you are trying to do.  Does the following approximate your desired resulting file?

data have;

  input Group     ID     Var_1-Var_4;

  cards;

1  1 1 1 1 1

1  2 2 2 2 2

1  3 3 3 3 4

1  4 4 4 4 4

2  5 1 1 1 1

2  6 2 2 2 2

2  7 3 3 3 3

2  8 4 4 4 4

;

data want (drop=Var_Smiley Happy;

  set have;

  array targets(*) Var_1-Var_4;

  do targetid=1 to dim(targets);

    var=targets(targetid);

    output;

  end;

run;

Occasional Contributor
Posts: 18

Re: Wide to Long, But With a Twist

Hi Arthur,

Thanks for your reply.  Your solution was the first I tried, but the problem is that the ID numbers in group 2 are 5 through 8, then in group 3 are 9-12, and so on.  (And that's for groups that all have the same number of members.  I'd like the solution to work for all groups, regardless of size.  In my example, group 2 has 3 people.)  That solution would always have the target IDs as 1 through 4, whereas the solution I'm looking for uses the original ID to identify the targets.  Here's what the output for group 2 (with 3 people--Var_4 would have a missing value) in my example would look like:

Group     ID     TargetID     Var

2            5       5 

2            5       6

2            5       7

2            6       5

2            6       6

2            6       7

2            7       5

2            7       6

2            7       7

Seems the ideal solution would "look ahead" (maybe using the pointer--I've tried that but without success) to see what the other IDs in the group are, then use those to print the TargetID variable.

Joe

Solution
‎08-25-2012 02:57 PM
PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

Here is basically the same solution I suggested previously, but using a DOW loop to account for what you want:

data have;

  infile cards truncover;

  input Group ID Var_1-Var_4;

  cards;

1  1 1 1 1 1

1  2 2 2 2 2

1  3 3 3 3 4

1  4 4 4 4 4

2  5 1 1 1

2  6 2 2 2

2  7 3 3 3

3  9 1 1 1 1

3 10 2 2 2 2

3 11 3 3 3 4

3 12 4 4 4 4

;

data want (drop=i j k Var_: idsSmiley Happy;

  array ids(4);

  i=0;

  do until (last.group);

    set have;

    by group;

    i+1;

    ids(i)=ID;

  end;

  do until (last.group);

    set have;

    array targets(*) Var_1-Var_4;

    by group;

    do k=1 to i;

      targetid=ids(k);

      var=targets(k);

      output;

    end;

  end;

run;

Respected Advisor
Posts: 4,934

Re: Wide to Long, But With a Twist

I guess you want something like this :

data have;
  input Group ID Var_1-Var_4;
datalines;
1  1 1 1 1 1
1  2 2 2 2 2
1  3 3 3 3 4
1  4 4 4 4 4
2  5 1 1 1 .
2  6 2 2 2 .
2  7 3 3 3 .
;

data temp(keep=group ID IDorder order rating);
do IDorder = 1 by 1 until(last.group);
     set have;
     by group;
     array v{*} var:;
     do order = 1 to dim(v);
          rating = v{order};
          output;
          end;
     end;
run;

proc sql;
create table want as
select a.group, a.ID, b.ID as targetID, a.rating
from temp as a inner join
     (select unique group, ID, IDorder from temp) as b
          on a.group=b.group and a.order=b.IDorder;
drop table temp;
select * from want;
quit;

PG

PG
PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

: FWIW, since I noticed that both of our proposed ways of dealing with the problem resulted with identical output files, I tested the two methods on a dataset that had 100,000 groups.

Both your and my datasteps to the same amount of time (1.8 seconds on my computer).  However, while the DOW approach was done at that time, the sql code took an additional 5.5 seconds to run.

FINALLY! A case where sql loses as a time saver!

p.s.  I added a noprint option to your proc sql code which, given the size of the file I was testing, was DEFINITELY needed.

Occasional Contributor
Posts: 18

Re: Wide to Long, But With a Twist

Hey Pierre and Arthur,

Thanks to both of you.  Each method does the trick nicely.  I see the logic of Pierre's solution, but please allow me to walk through Arthur's so that I can do these types of processes on my own.

1. The first do-until loop creates an array variable containing each group's IDs in order as they appear in the data file.  The counter "i" identifies the number of members, and hence IDs, in the group.

2. The second do-until loop assigns values to both targetid and var based on the array position.  The "k" counter is based on group size "i" and gets the appropriate ID and variable score. 

Very cool.  One question, though, Arthur.  What is the "j" counter for?

Best,

Joe

PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

Joe: the j counter serves absolutely no purpose at all and should be removed from the code.  I originally thought I would need it but, obviously, never applied it.  Yes, your understanding is correct, but the DOW loop processing requires knowing what goes on in the background of any SAS datastep.  A good overview can be found in the following paper: http://support.sas.com/resources/papers/proceedings09/038-2009.pdf

p.s. I am going to edit my post to remove the reference to j

Respected Advisor
Posts: 4,934

Re: Wide to Long, But With a Twist

Hi Art. Nice work. I'm not too surprised by your timing results. For me, the major difference in computation time occurs in my brain Smiley Happy as I find easier to state what I want in SQL than do some delicate procedural data step programming. But as a matter of curiosity, I would like to know if the following version would have been faster than my original in your test :

data temp(keep=group ID order rating)
tempID(keep=group ID IDorder);
do IDorder = 1 by 1 until(last.group);
set have;
by group notsorted;
array v{*} var:;
do order = 1 to dim(v);
  rating = v{order};
  output temp;
  end;
output tempID;
end;
run;

proc sql;
create table want as
select a.group, a.ID, b.ID as targetID, a.rating
from temp as a inner join tempID as b
  on a.group=b.group and a.order=b.IDorder;
drop table temp, tempID;
quit;

Thanks!

PG

PG
PROC Star
Posts: 7,492

Re: Wide to Long, But With a Twist

PG,

Yes, quicker indeed, but still slower than doing everything in one datastep.  However, like you said, we all have our own preferred ways of doing things and with which we are most comfortable.

Art

Respected Advisor
Posts: 4,934

Re: Wide to Long, But With a Twist

Great! Thank you Art. - PG

PG
Super User
Super User
Posts: 7,076

Re: Wide to Long, But With a Twist

What is wrong with just using PROC TRANSPOSE?  If you need to you could parse the id number from the _NAME_ variable.

proc transpose data=have out=want(rename=(col1=value)) ;

  by group id ;

  var var_:;

run;

proc print; run;

Obs    Group    ID    _NAME_    value

  1      1       1    Var_1       1

  2      1       1    Var_2       1

  3      1       1    Var_3       1

  4      1       1    Var_4       1

  5      1       2    Var_1       2

  6      1       2    Var_2       2

  7      1       2    Var_3       2

  8      1       2    Var_4       2

  9      1       3    Var_1       3

10      1       3    Var_2       3

11      1       3    Var_3       3

12      1       3    Var_4       4

13      1       4    Var_1       4

14      1       4    Var_2       4

15      1       4    Var_3       4

16      1       4    Var_4       4

17      2       5    Var_1       1

18      2       5    Var_2       1

19      2       5    Var_3       1

20      2       5    Var_4       1

21      2       6    Var_1       2

22      2       6    Var_2       2

23      2       6    Var_3       2

24      2       6    Var_4       2

25      2       7    Var_1       3

26      2       7    Var_2       3

27      2       7    Var_3       3

28      2       7    Var_4       3

29      2       8    Var_1       4

30      2       8    Var_2       4

31      2       8    Var_3       4

32      2       8    Var_4       4

Respected Advisor
Posts: 4,934

Re: Wide to Long, But With a Twist

Tom, that way you get var_1-var_4 where you would want var_5-var_8. That's the twist! - PG

PG
Super User
Super User
Posts: 7,076

Re: Wide to Long, But With a Twist

I do not see where that is in the original request.  But if there is an offset of four between the variable name and the target id then that could be handled in the remapping of _NAME_.

Occasional Contributor
Posts: 18

Re: Wide to Long, But With a Twist

Thanks, Tom.  I clarified the problem in a subsequent post.  I marked Art's response as the correct answer because it's efficient and modifiable for my purposes.  I actually had 12 variables (4 columns for each), and it was easy to modify Art's code to accommodate the other variables.  What I did was a a kind of brute force solution--I'm sure there's a more efficient and elegant way to do it, but it served my purpose.  Here's what I did:

data long (drop=i k DIS: ids: lname);

  array ids(4);

  i=0;

  do until (last.group);

    set srm;

    by group;

    i+1;

    ids(i)=ID;

  end;

  do until (last.group);

    set srm;

    array targets1(*) DIS1_1-DIS1_4;      

    array targets2(*) DIS2_1-DIS2_4;

    array targets3(*) DIS3_1-DIS3_4;

    array targets4(*) DIS4_1-DIS4_4;

    array targets5(*) DIS5_1-DIS5_4;

    array targets6(*) DIS6_1-DIS6_4;

    array targets7(*) DIS7_1-DIS7_4;

    array targets8(*) DIS8_1-DIS8_4;

    array targets9(*) DIS9_1-DIS9_4;

    array targets10(*) DIS10_1-DIS10_4;

    array targets11(*) DIS11_1-DIS11_4;

    array targets12(*) DIS12_1-DIS12_4;

    by group;

    do k=1 to i;

      targetid=ids(k);

      var1=targets1(k);

      var2=targets2(k);

      var3=targets3(k);

      var4=targets4(k);

      var5=targets5(k);

      var6=targets6(k);

      var7=targets7(k);

      var8=targets8(k);

      var9=targets9(k);

      var10=targets10(k);

      var11=targets11(k);

      var12=targets12(k);

      output;

    end;

  end;

run;

Thanks again, everyone, for your help.

Joe

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 561 views
  • 3 likes
  • 4 in conversation