Help using Base SAS procedures

Transform data format to accommodate Proc expectations

Accepted Solution Solved
Reply
Regular Contributor
Posts: 212
Accepted Solution

Transform data format to accommodate Proc expectations

Suppose we are faced with results of 100 races by 100 runners.  Consider a subset of the results in the following spreadsheet:

Order FinishedRace 1Race 2Race 3
1DED
2ADB
3E*A
4CBC
5BC*

A SAS procedure I'm trying to use shows an example with the data in a different form, as follows:

RunnerOrder_1Order_2Order 3
A2*3
B542
C454
D121
E31*

As can be seen, some runners miss certain races.

The above transformation was easy enough to do manually.  But 100 x 100 is another story.  Could one of you please help with the SAS code that will transform from the first format to the second?

Your help will be greatly appreciated.

By the way, the SAS example I am attempting to replicate using my data set is the following:

Example 74.2 Principal Components of Basketball Rankings

SAS/STAT(R) 12.1 User's Guide

Thanks much!

Nicholas Kormanik


Accepted Solutions
Solution
‎04-30-2013 09:58 PM
Respected Advisor
Posts: 4,651

Re: Transform data format to accommodate Proc expectations

Use this :


data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;

data list;
set input;
array r race_:;
do race = 1 to dim(r);
     name = r{race};
     if anyalpha(name) then output;
     end;
drop race_:;
run;

proc sort data=list; by name race; run;

proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;

option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;

PG

PG

View solution in original post


All Replies
Solution
‎04-30-2013 09:58 PM
Respected Advisor
Posts: 4,651

Re: Transform data format to accommodate Proc expectations

Use this :


data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;

data list;
set input;
array r race_:;
do race = 1 to dim(r);
     name = r{race};
     if anyalpha(name) then output;
     end;
drop race_:;
run;

proc sort data=list; by name race; run;

proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;

option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;

PG

PG
Regular Contributor
Posts: 212

Re: Transform data format to accommodate Proc expectations

Outstanding, , masterfully done.

I'd never have gotten it on my own.

Respected Advisor
Posts: 3,124

Re: Transform data format to accommodate Proc expectations

Indeed, your solution not only precise, but also intuitive. FWIW, here is a hash() approach (I know, I know, it is awful. Hey, Ksharp is online, hopefully he will come up a better Hash solution):

data input;

input rank (race_1-race_3) ($) ;

cards;

1 D E D

2 A D B

3 E * A

4 C B C

5 B C *

;

data _null_;

  call symputx('obs',nobs);

  stop;

  set input nobs=nobs;

run;

proc transpose data=input out=have1;

var _all_;

run;

data _null_;

  call symputx('var',nobs-1);

  stop;

  set have1 nobs=nobs;

run;

%put &var &obs;

data want;

  dcl hash hoh();

  dcl hiter hih('hoh');

  hoh.definekey('_NAME_');

  hoh.definedata('_name_', 'hh');

  hoh.definedone();

  dcl hash hh();

  length player col: rank $8.;

    do _n_=1 by 1 until (last);

           set have1 end=last;

            array col(&obs);

  array _rank(&var) $8.;

                array t(&obs) $8. _temporary_;

  if _n_ >1 and hoh.find() ne 0 then do;

    hh=_new_ hash(ordered: 'a');

    hh.definekey ('player');

    hh.definedata('rank', 'player');

    hh.definedone();

    hoh.replace();

  end;

                if _n_=1 then do i=1 to &obs.;

                  t(i)=left(col(i));

  end;

  else do i=1 to &obs.;

   rank=t(i);

   player=col(i);

   hh.replace();

  end;

  end;

  do until (last1);

      set input (keep=race_1) end=last1;

     rc=hih.first(); i=0;

     do rc=0 by 0 while (rc=0);

  i+1;

  if hh.find(key: race_1)=0 then _rank(i)=rank;

     else _rank(i)='*';

  rc=hih.next();

  end;

  output;

  end;

  stop;

  keep player _r:;

  run;

Haikuo

Super User
Posts: 9,681

Re: Transform data format to accommodate Proc expectations

Dear Haikuo,

I think PG's method is good enough. Hash Table has its advantage when a large data .

Under this situation , PG is good, If You let me write, I also the code like PG.

See you later.

Ksharp

Regular Contributor
Posts: 212

Re: Transform data format to accommodate Proc expectations

  I'm glad for this addition by , as the actual data sets I'm dealing with are pretty large.

So if the second version of transform given here is better suited to larger data sets, I'll use it.

Now I need that mentioned example translated, using this present data set created....

Example 74.2 Principal Components of Basketball Rankings

SAS/STAT(R) 12.1 User's Guide

Thanks much!

Super User
Posts: 9,681

Re: Transform data format to accommodate Proc expectations

OK. Here is .

I am afraid that I will be blamed by Dr .

I should not spoon feed .

If you are familiar with PCA, it should be easy to do it.

data input;
input rank (race1-race3) ($) ;
cards;
1 D E D
2 A D B
3 E . A
4 C B C
5 B C .
;
run;

data _null_;
 if _n_ eq 1 then do;
   length order1 - order3 8 k $ 8;
   declare hash ha(ordered:'y');
    ha.definekey('k');
     ha.definedata('k','order1','order2','order3');
     ha.definedone();
 end;
set input end=last;
array o{3} order1-order3 ;
array r{3} race1-race3 ;
do i=1 to dim(r);
 if not missing(r{i}) then do;
                             k=r{i};
                                    rc=ha.find();
                                    o{i}=rank;
                                    ha.replace();
                                    call missing(of order:);
                                   end;
end;
if last then ha.output(dataset:'bballm');
run;

proc means data=bballm noprint;
   output out=maxrank
      max=morder1 morder2 morder3  ;
run;

data bball;
   set bballm;
   if _n_=1 then set maxrank;
   array services[3] order1-order3;
   array maxranks[3] morder1-morder3 ;
   keep k order1-order3 ;
   do i=1 to 3 ;
      if services=. then services=maxranks+1;
   end;
run;

ods graphics on;

proc prinqual data=bball out=tbball scores n=1 tstandard=z
   plots=transformations;
   transform untie(order1-order3);
   id k ;
run;

* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
   ods select factorpattern screeplot;
   var Torder1-Torder3 ;
run;

proc sort;
   by Prin1;
run;

* Display Scores on the First Principal Component;
proc print;
   var k Prin1;
run;

Ksharp

Regular Contributor
Posts: 212

Re: Transform data format to accommodate Proc expectations

  I'm sure glad you translated that mini-monster.  The results are quite interesting:

RunnerPrincipal Component 1
D-2.69500
E-0.19243
A0.75650
B1.06547
C1.06547

Assuming correct translation, and correct methodology, this PCA approach solves a tie issue between D and E (that resulted in a simpler form of solution script).

Also, if the PC loading values have meaning (good question for ), I suppose we could say the tie between D and E wasn't even close.  Whereas the performance of B and C is very close.  Too, some clustering is possible, eye-balling, or using one of SAS clustering procedures.

Thanks much for your help here!

I still need to convert your script to accommodate my 100x100 data sets, so the work is far from over.  I may have to come back and ask for more help in doing that.

Super User
Posts: 9,681

Re: Transform data format to accommodate Proc expectations

Ha. HaiKuo,

Here is :

data input;
input rank (race1-race3) ($) ;
cards;
1 D E D
2 A D B
3 E . A
4 C B C
5 B C .
;
run;
options missing='*';
data _null_;
 if _n_ eq 1 then do;
   length order1 - order3 8 k $ 8;
   declare hash ha(ordered:'y');
    ha.definekey('k');
     ha.definedata('k','order1','order2','order3');
     ha.definedone();
 end;
set input end=last;
array o{3} order1-order3 ;
array r{3} race1-race3 ;
do i=1 to dim(r);
 if not missing(r{i}) then do;
                             k=r{i};
                                    rc=ha.find();
                                    o{i}=rank;
                                    ha.replace();
                                    call missing(of order:);
                                   end;
end;
if last then ha.output(dataset:'want');
run;

Ksharp

Respected Advisor
Posts: 3,124

Re: Transform data format to accommodate Proc expectations

Bravo! This is ingenious! Very Happy to learn! Thank you!

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 284 views
  • 8 likes
  • 4 in conversation