Suppose we are faced with results of 100 races by 100 runners. Consider a subset of the results in the following spreadsheet:
Order Finished | Race 1 | Race 2 | Race 3 |
1 | D | E | D |
2 | A | D | B |
3 | E | * | A |
4 | C | B | C |
5 | B | C | * |
A SAS procedure I'm trying to use shows an example with the data in a different form, as follows:
Runner | Order_1 | Order_2 | Order 3 |
A | 2 | * | 3 |
B | 5 | 4 | 2 |
C | 4 | 5 | 4 |
D | 1 | 2 | 1 |
E | 3 | 1 | * |
As can be seen, some runners miss certain races.
The above transformation was easy enough to do manually. But 100 x 100 is another story. Could one of you please help with the SAS code that will transform from the first format to the second?
Your help will be greatly appreciated.
By the way, the SAS example I am attempting to replicate using my data set is the following:
Example 74.2 Principal Components of Basketball Rankings
Thanks much!
Nicholas Kormanik
Use this :
data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;
data list;
set input;
array r race_:;
do race = 1 to dim(r);
name = r{race};
if anyalpha(name) then output;
end;
drop race_:;
run;
proc sort data=list; by name race; run;
proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;
option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;
PG
Use this :
data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;
data list;
set input;
array r race_:;
do race = 1 to dim(r);
name = r{race};
if anyalpha(name) then output;
end;
drop race_:;
run;
proc sort data=list; by name race; run;
proc transpose data=list out=want(drop=_name_) prefix=Order_;
by name;
var rank;
id race;
run;
option missing=*;
proc print data=want noobs; var name order_1-order_3; run;
option missing=.;
PG
Indeed, your solution not only precise, but also intuitive. FWIW, here is a hash() approach (I know, I know, it is awful. Hey, Ksharp is online, hopefully he will come up a better Hash solution):
data input;
input rank (race_1-race_3) ($) ;
cards;
1 D E D
2 A D B
3 E * A
4 C B C
5 B C *
;
data _null_;
call symputx('obs',nobs);
stop;
set input nobs=nobs;
run;
proc transpose data=input out=have1;
var _all_;
run;
data _null_;
call symputx('var',nobs-1);
stop;
set have1 nobs=nobs;
run;
%put &var &obs;
data want;
dcl hash hoh();
dcl hiter hih('hoh');
hoh.definekey('_NAME_');
hoh.definedata('_name_', 'hh');
hoh.definedone();
dcl hash hh();
length player col: rank $8.;
do _n_=1 by 1 until (last);
set have1 end=last;
array col(&obs);
array _rank(&var) $8.;
array t(&obs) $8. _temporary_;
if _n_ >1 and hoh.find() ne 0 then do;
hh=_new_ hash(ordered: 'a');
hh.definekey ('player');
hh.definedata('rank', 'player');
hh.definedone();
hoh.replace();
end;
if _n_=1 then do i=1 to &obs.;
t(i)=left(col(i));
end;
else do i=1 to &obs.;
rank=t(i);
player=col(i);
hh.replace();
end;
end;
do until (last1);
set input (keep=race_1) end=last1;
rc=hih.first(); i=0;
do rc=0 by 0 while (rc=0);
i+1;
if hh.find(key: race_1)=0 then _rank(i)=rank;
else _rank(i)='*';
rc=hih.next();
end;
output;
end;
stop;
keep player _r:;
run;
Haikuo
Dear Haikuo,
I think PG's method is good enough. Hash Table has its advantage when a large data .
Under this situation , PG is good, If You let me write, I also the code like PG.
See you later.
Ksharp
I'm glad for this addition by , as the actual data sets I'm dealing with are pretty large.
So if the second version of transform given here is better suited to larger data sets, I'll use it.
Now I need that mentioned example translated, using this present data set created....
Example 74.2 Principal Components of Basketball Rankings
Thanks much!
OK. Here is .
I am afraid that I will be blamed by Dr SteveDenham .
I should not spoon feed .
If you are familiar with PCA, it should be easy to do it.
data input; input rank (race1-race3) ($) ; cards; 1 D E D 2 A D B 3 E . A 4 C B C 5 B C . ; run; data _null_; if _n_ eq 1 then do; length order1 - order3 8 k $ 8; declare hash ha(ordered:'y'); ha.definekey('k'); ha.definedata('k','order1','order2','order3'); ha.definedone(); end; set input end=last; array o{3} order1-order3 ; array r{3} race1-race3 ; do i=1 to dim(r); if not missing(r{i}) then do; k=r{i}; rc=ha.find(); o{i}=rank; ha.replace(); call missing(of order:); end; end; if last then ha.output(dataset:'bballm'); run; proc means data=bballm noprint; output out=maxrank max=morder1 morder2 morder3 ; run; data bball; set bballm; if _n_=1 then set maxrank; array services[3] order1-order3; array maxranks[3] morder1-morder3 ; keep k order1-order3 ; do i=1 to 3 ; if services=. then services=maxranks+1; end; run; ods graphics on; proc prinqual data=bball out=tbball scores n=1 tstandard=z plots=transformations; transform untie(order1-order3); id k ; run; * Perform the Final Principal Component Analysis; proc factor nfactors=1 plots=scree; ods select factorpattern screeplot; var Torder1-Torder3 ; run; proc sort; by Prin1; run; * Display Scores on the First Principal Component; proc print; var k Prin1; run;
Ksharp
I'm sure glad you translated that mini-monster. The results are quite interesting:
Runner | Principal Component 1 |
D | -2.69500 |
E | -0.19243 |
A | 0.75650 |
B | 1.06547 |
C | 1.06547 |
Assuming correct translation, and correct methodology, this PCA approach solves a tie issue between D and E (that resulted in a simpler form of solution script).
Also, if the PC loading values have meaning (good question for ), I suppose we could say the tie between D and E wasn't even close. Whereas the performance of B and C is very close. Too, some clustering is possible, eye-balling, or using one of SAS clustering procedures.
Thanks much for your help here!
I still need to convert your script to accommodate my 100x100 data sets, so the work is far from over. I may have to come back and ask for more help in doing that.
Ha. HaiKuo,
Here is :
data input; input rank (race1-race3) ($) ; cards; 1 D E D 2 A D B 3 E . A 4 C B C 5 B C . ; run; options missing='*'; data _null_; if _n_ eq 1 then do; length order1 - order3 8 k $ 8; declare hash ha(ordered:'y'); ha.definekey('k'); ha.definedata('k','order1','order2','order3'); ha.definedone(); end; set input end=last; array o{3} order1-order3 ; array r{3} race1-race3 ; do i=1 to dim(r); if not missing(r{i}) then do; k=r{i}; rc=ha.find(); o{i}=rank; ha.replace(); call missing(of order:); end; end; if last then ha.output(dataset:'want'); run;
Ksharp
Bravo! This is ingenious! Very Happy to learn! Thank you!
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.