I have a dataset that contains race times for track and field runners. More specifically, the dataset has the variables name and race time. The way I received the data, each time the runner competes, it's a new observation. So the data looks something like this:
Name Race Time
Joe 56.0
Joe 60.0
Joe 57.2
Bob 52
Bob 51.5
The number of observations per person differs (some people have 5 while other have only 1)
I want to split the dataset into two by race time (so dataset1 will contain roughly the top 50% fastest times, while dataset2 contains the bottom 50%). However, I do not want to split up observations for an individual. In other words, I want all of "Joe's" race times to be contained in one dataset, rather than have some of his race times in dataset1 and the rest in dataset2).
Can anyone help me with this? I would GREATLY appreciate this!
Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:
data have;
input Name $ Race_Time;
cards;
Joe 56.0
Joe 60.0
Joe 57.2
Bob 52
Bob 51.5
Tom 56.0
Tom 60.0
Bill 57.2
Bill 52
Bill 51.5
;
proc sql;
create table need as
select *,mean(Race_Time) as mean_RT
from have
group by name
order by mean_RT
;
quit;
data want1 want2;
set need;
retain cutoff (0);
by mean_rt;
if _n_ le (&sqlobs./2) then output want1;
else do;
if cutoff then output want2;
else if first.mean_RT then do;
cutoff=1;
output want2;
end;
else if last.mean_RT then do;
output want1;
cutoff=1;
end;
end;
run;
Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:
data have;
input Name $ Race_Time;
cards;
Joe 56.0
Joe 60.0
Joe 57.2
Bob 52
Bob 51.5
Tom 56.0
Tom 60.0
Bill 57.2
Bill 52
Bill 51.5
;
proc sql;
create table need as
select *,mean(Race_Time) as mean_RT
from have
group by name
order by mean_RT
;
quit;
data want1 want2;
set need;
retain cutoff (0);
by mean_rt;
if _n_ le (&sqlobs./2) then output want1;
else do;
if cutoff then output want2;
else if first.mean_RT then do;
cutoff=1;
output want2;
end;
else if last.mean_RT then do;
output want1;
cutoff=1;
end;
end;
run;
But someone will have both top50% and bottom50% value, you want him belong to which dataset ?
Assuming include him in bottom50% dataset.
data have; input Name $ Race_Time; cards; Joe 56.0 Joe 60.0 Joe 57.2 Bob 52 Bob 51.5 Tom 56.0 Tom 60.0 Bill 57.2 Bill 52 Bill 51.5 ; run; proc rank data=have out=x groups=2; var race_time; ranks group; run; proc sql noprint; create table temp as select distinct name from x where group=0; create table table1 as select * from have where name in (select name from temp); create table table2 as select * from have where name not in (select name from temp); quit;
Ksharp
yes, both of these do what I want. thank you very much!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.