Help using Base SAS procedures

Proc Transpose - Repeated Values

Accepted Solution Solved
Reply
Contributor JAR
Contributor
Posts: 45
Accepted Solution

Proc Transpose - Repeated Values

Dear All,

I have the following table

TeacherScoreStudent
Teacher1ATom
Teacher1AHarry
Teacher1BRon
Teacher2ANeval

I need to transpose it in this manner:

TeacherA'sB's
Teacher1TomRon
Teacher1Harry
Teacher2Neval


Howerever, Proc Transose sends and error as ID (score) has duplicates.

Could someone help me to write a proc code so that when the repeatation happends, the entry must be moved to the next row (as in the second one)?

Thanking you in advance,

Sincerely,

Jijil Ramakrishnan


Accepted Solutions
Solution
‎02-13-2012 09:28 AM
Super User
Posts: 5,085

Re: Proc Transpose - Repeated Values

JAR,

Here's an approach that is deceptively short.  You have to understand some of the inner workings of the DATA step.  In this case, you'll get a note about a many-to-many merge.  It's harmless this time, but may not be in other DATA steps:

data final;

   merge original (rename=(student=A) where=(score='A'))

             original (rename=(student=B) where=(score='B'));

   by teacher;

   output;

   A = ' ';

   B = ' ';

run;

All of the replies so far are relying on your data containing just A and B.  All could be extended to handle other values as well.  But it's a harder problem to tackle if you don't know the values of SCORE ahead of time.

Good luck.

View solution in original post


All Replies
PROC Star
Posts: 7,363

Proc Transpose - Repeated Values

If it doesn't matter which people get paired, as long as they are on the currect teacher/grade columns and variables, then you could use something like:

data have;

  input (Teacher Score Student) ($);

  cards;

Teacher1          A          Tom

Teacher1          A          Harry

Teacher1          B          Ron

Teacher2          A          Neval

;

data a (keep=Teacher A) b (keep=Teacher B);

  set have;

  if Score eq "A" then do;

    a=Student;

    b='ZZZZZZZZ';

    output a;

    output b;

  end;

  else if Score eq "B" then do;

    b=Student;

    a='ZZZZZZZZ';

    output a;

    output b;

  end;

run;

proc sort data=a;

  by teacher a;

run;

proc sort data=b;

  by teacher b;

run;

data want;

  set a;

  set b;

  if catt(a,b) ne 'ZZZZZZZZZZZZZZZZ';

  if a eq 'ZZZZZZZZ' then call missing(a);

  if b eq 'ZZZZZZZZ' then call missing(b);

run;

Respected Advisor
Posts: 3,777

Re: Proc Transpose - Repeated Values

Are you willing to create one more variable?

dm 'clear log; clear output;';

data test;

   input teacher:$10. grade :$1. name :$8.;

   cards;

Teacher1 A Tom

Teacher1 A Harry

Teacher1 B Ron

Teacher2 A Neval

;;;;

   run;

proc sort data=test;

   by teacher grade name;

   run;

data test;

   set test;

   by teacher grade;

   if first.grade then index=0;

   index +1;

   run;

proc sort data=test;

   by teacher index grade;

   run;

proc transpose data=test out=want(drop=_name_);;

   by teacher index;

   var name;

   id grade;

   run;

proc print;

   run;

Super Contributor
Posts: 1,636

Re: Proc Transpose - Repeated Values

Hi Data _null_,

Why did you use ';;;;' instead of ';'?

Thank you!

PROC Star
Posts: 7,363

Re: Proc Transpose - Repeated Values

Linlin: a lot of people use ;;;; instead of ; to end a cards, datalines, cards4 or datalines4 section as they are needed for cards4 and datalines4 and don't pose a problem for the other two.

Cards4 and Datalines4 are necessary when the input contains a semi-colon

Respected Advisor
Posts: 3,777

Re: Proc Transpose - Repeated Values

I like to be VERY careful. :smileygrin:

Super Contributor
Posts: 1,636

Re: Proc Transpose - Repeated Values

Thank you Art and Data _null_! 

Solution
‎02-13-2012 09:28 AM
Super User
Posts: 5,085

Re: Proc Transpose - Repeated Values

JAR,

Here's an approach that is deceptively short.  You have to understand some of the inner workings of the DATA step.  In this case, you'll get a note about a many-to-many merge.  It's harmless this time, but may not be in other DATA steps:

data final;

   merge original (rename=(student=A) where=(score='A'))

             original (rename=(student=B) where=(score='B'));

   by teacher;

   output;

   A = ' ';

   B = ' ';

run;

All of the replies so far are relying on your data containing just A and B.  All could be extended to handle other values as well.  But it's a harder problem to tackle if you don't know the values of SCORE ahead of time.

Good luck.

Contributor JAR
Contributor
Posts: 45

Proc Transpose - Repeated Values

Your code is astounding! It worked perfectly.

There is some sort of elegance in this.

Thanks Mr. Atounding!

I wonder how would you have dealt with it, if the scores are not predetermined.

Jijil

Super User
Posts: 5,085

Re: Proc Transpose - Repeated Values

Jijil,

Thank you.  If the scores are not predetermined, macro language gets involved.  (I didn't check data_null_'s approach, but it might work for "normal" scores.  It would run into trouble if the same data set had scores of both $ and # because the ID statement in PROC TRANSPOSE would try to create a variable named _ for both of those.  A similar issue would arise if the data contained both A+ and A-.)

The details might be burdensome, but this is the approach I would take.  Capture a list of distinct values for SCORE.  (Be careful not to overcount if any of the SCORE values actually contains an embedded blank, such as A +.)  Use two sets of loops in macro language to generate SCORE_01 - SCORE_nn as needed and to label them appropriately.  For example, in the MERGE statement, one iteration through the loop would generate:

original (rename=(student=score_01) where=(score="&first_value_found"))

Use a second loop through the same values later, to generate labels, for example:

label score_01 = "Score was &first_value_found");

Even that might be a hair too simple if any of the original SCORE values contain double quotes, but that's the idea.

Good luck.

Super User
Posts: 9,682

Re: Proc Transpose - Repeated Values

data test;
   input teacher:$10. grade :$1. name :$8.;
   cards;
Teacher1 A Tom 
Teacher1 A Harry 
Teacher1 B Ron 
Teacher2 A Neval 
;;;;
   run;
proc sql noprint;
 select distinct cats('test(where=(teacher="',teacher,'" and grade="',grade,'") rename=(name=',grade,'))') into : list separated by ' '
  from test;
quit;
%put &list;
data want(drop=grade);
 merge &list ;
 by teacher;
 output;
 call missing(of _all_);
run;

Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 262 views
  • 1 like
  • 6 in conversation