Solved: Re: Transform data format to accommodate Proc expectations

NKormanik · Posted 04-30-2013 09:03 PM

Suppose we are faced with results of 100 races by 100 runners. Consider a subset of the results in the following spreadsheet:

Order Finished	Race 1	Race 2	Race 3
1	D	E	D
2	A	D	B
3	E	*	A
4	C	B	C
5	B	C	*

A SAS procedure I'm trying to use shows an example with the data in a different form, as follows:

Runner	Order_1	Order_2	Order 3
A	2	*	3
B	5	4	2
C	4	5	4
D	1	2	1
E	3	1	*

As can be seen, some runners miss certain races.

The above transformation was easy enough to do manually. But 100 x 100 is another story. Could one of you please help with the SAS code that will transform from the first format to the second?

Your help will be greatly appreciated.

By the way, the SAS example I am attempting to replicate using my data set is the following:

Example 74.2 Principal Components of Basketball Rankings

SAS/STAT(R) 12.1 User's Guide

Thanks much!

Nicholas Kormanik

PGStats · Posted 04-30-2013 09:58 PM

Use this :

data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;

data list;
set input;
array r race_:;
do race = 1 to dim(r);
     name = r{race};
     if anyalpha(name) then output;
     end;
drop race_:;
run;

proc sort data=list; by name race; run;

proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;

option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;

PG

View solution in original post

PGStats · Posted 04-30-2013 09:58 PM

Use this :

data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;

data list;
set input;
array r race_:;
do race = 1 to dim(r);
     name = r{race};
     if anyalpha(name) then output;
     end;
drop race_:;
run;

proc sort data=list; by name race; run;

proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;

option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;

PG

NKormanik · Posted 05-01-2013 01:27 AM

Outstanding, , masterfully done.

I'd never have gotten it on my own.

Haikuo · Posted 05-01-2013 11:13 PM

Indeed, your solution not only precise, but also intuitive. FWIW, here is a hash() approach (I know, I know, it is awful. Hey, Ksharp is online, hopefully he will come up a better Hash solution):

data input;

input rank (race_1-race_3) ($) ;

cards;

1 D E D

2 A D B

3 E * A

4 C B C

5 B C *

;

data _null_;

call symputx('obs',nobs);

stop;

set input nobs=nobs;

run;

proc transpose data=input out=have1;

var _all_;

run;

data _null_;

call symputx('var',nobs-1);

stop;

set have1 nobs=nobs;

run;

%put &var &obs;

data want;

dcl hash hoh();

dcl hiter hih('hoh');

hoh.definekey('_NAME_');

hoh.definedata('_name_', 'hh');

hoh.definedone();

dcl hash hh();

length player col: rank $8.;

do _n_=1 by 1 until (last);

set have1 end=last;

array col(&obs);

array _rank(&var) $8.;

array t(&obs) $8. _temporary_;

if _n_ >1 and hoh.find() ne 0 then do;

hh=_new_ hash(ordered: 'a');

hh.definekey ('player');

hh.definedata('rank', 'player');

hh.definedone();

hoh.replace();

end;

if _n_=1 then do i=1 to &obs.;

t(i)=left(col(i));

end;

else do i=1 to &obs.;

rank=t(i);

player=col(i);

hh.replace();

end;

do until (last1);

set input (keep=race_1) end=last1;

rc=hih.first(); i=0;

do rc=0 by 0 while (rc=0);

i+1;

if hh.find(key: race_1)=0 then _rank(i)=rank;

else _rank(i)='*';

rc=hih.next();

end;

output;

end;

stop;

keep player _r:;

run;

Haikuo

Ksharp · Posted 05-01-2013 11:29 PM

Dear Haikuo,

I think PG's method is good enough. Hash Table has its advantage when a large data .

Under this situation , PG is good, If You let me write, I also the code like PG.

See you later.

Ksharp

NKormanik · Posted 05-02-2013 02:09 AM

I'm glad for this addition by , as the actual data sets I'm dealing with are pretty large.

So if the second version of transform given here is better suited to larger data sets, I'll use it.

Now I need that mentioned example translated, using this present data set created....

Example 74.2 Principal Components of Basketball Rankings

SAS/STAT(R) 12.1 User's Guide

Thanks much!

Ksharp · Posted 05-02-2013 08:18 AM

OK. Here is .

I am afraid that I will be blamed by Dr SteveDenham .

I should not spoon feed .

If you are familiar with PCA, it should be easy to do it.

data input;
input rank (race1-race3) ($) ;
cards;
1 D E D
2 A D B
3 E . A
4 C B C
5 B C .
;
run;

data _null_;
 if _n_ eq 1 then do;
   length order1 - order3 8 k $ 8;
   declare hash ha(ordered:'y');
    ha.definekey('k');
     ha.definedata('k','order1','order2','order3');
     ha.definedone();
 end;
set input end=last;
array o{3} order1-order3 ;
array r{3} race1-race3 ;
do i=1 to dim(r);
 if not missing(r{i}) then do;
                             k=r{i};
                                    rc=ha.find();
                                    o{i}=rank;
                                    ha.replace();
                                    call missing(of order:);
                                   end;
end;
if last then ha.output(dataset:'bballm');
run;

proc means data=bballm noprint;
   output out=maxrank
      max=morder1 morder2 morder3  ;
run;

data bball;
   set bballm;
   if _n_=1 then set maxrank;
   array services[3] order1-order3;
   array maxranks[3] morder1-morder3 ;
   keep k order1-order3 ;
   do i=1 to 3 ;
      if services=. then services=maxranks+1;
   end;
run;

ods graphics on;

proc prinqual data=bball out=tbball scores n=1 tstandard=z
   plots=transformations;
   transform untie(order1-order3);
   id k ;
run;

* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
   ods select factorpattern screeplot;
   var Torder1-Torder3 ;
run;

proc sort;
   by Prin1;
run;

* Display Scores on the First Principal Component;
proc print;
   var k Prin1;
run;

Ksharp

NKormanik · Posted 05-03-2013 02:40 AM

I'm sure glad you translated that mini-monster. The results are quite interesting:

Runner	Principal Component 1
D	-2.69500
E	-0.19243
A	0.75650
B	1.06547
C	1.06547

Assuming correct translation, and correct methodology, this PCA approach solves a tie issue between D and E (that resulted in a simpler form of solution script).

Also, if the PC loading values have meaning (good question for ), I suppose we could say the tie between D and E wasn't even close. Whereas the performance of B and C is very close. Too, some clustering is possible, eye-balling, or using one of SAS clustering procedures.

Thanks much for your help here!

I still need to convert your script to accommodate my 100x100 data sets, so the work is far from over. I may have to come back and ask for more help in doing that.

Ksharp · Posted 05-02-2013 12:36 AM

Ha. HaiKuo,

Here is :

data input;
input rank (race1-race3) ($) ;
cards;
1 D E D
2 A D B
3 E . A
4 C B C
5 B C .
;
run;
options missing='*';
data _null_;
 if _n_ eq 1 then do;
   length order1 - order3 8 k $ 8;
   declare hash ha(ordered:'y');
    ha.definekey('k');
     ha.definedata('k','order1','order2','order3');
     ha.definedone();
 end;
set input end=last;
array o{3} order1-order3 ;
array r{3} race1-race3 ;
do i=1 to dim(r);
 if not missing(r{i}) then do;
                             k=r{i};
                                    rc=ha.find();
                                    o{i}=rank;
                                    ha.replace();
                                    call missing(of order:);
                                   end;
end;
if last then ha.output(dataset:'want');
run;

Ksharp

Haikuo · Posted 05-02-2013 09:15 AM

Bravo! This is ingenious! Very Happy to learn! Thank you!

Haikuo

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away