topic Re: Splitting dataset in two without splitting up multiple observations on the same individual in Statistical Procedures

Splitting dataset in two without splitting up multiple observations on the same individual

elfkitty12 — Tue, 06 Nov 2012 04:09:19 GMT

I have a dataset that contains race times for track and field runners. More specifically, the dataset has the variables name and race time. The way I received the data, each time the runner competes, it's a new observation. So the data looks something like this:

Name Race Time

Joe 56.0

Joe 60.0

Joe 57.2

Bob 52

Bob 51.5

The number of observations per person differs (some people have 5 while other have only 1)

I want to split the dataset into two by race time (so dataset1 will contain roughly the top 50% fastest times, while dataset2 contains the bottom 50%). However, I do not want to split up observations for an individual. In other words, I want all of "Joe's" race times to be contained in one dataset, rather than have some of his race times in dataset1 and the rest in dataset2).

Can anyone help me with this? I would GREATLY appreciate this!

Re: Splitting dataset in two without splitting up multiple observations on the same individual

art297 — Tue, 06 Nov 2012 05:08:05 GMT

Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:

data have;

input Name $ Race_Time;

cards;

Joe 56.0

Joe 60.0

Joe 57.2

Bob 52

Bob 51.5

Tom 56.0

Tom 60.0

Bill 57.2

Bill 52

Bill 51.5

;

proc sql;

create table need as

select *,mean(Race_Time) as mean_RT

from have

group by name

order by mean_RT

;

quit;

data want1 want2;

set need;

retain cutoff (0);

by mean_rt;

if _n_ le (&sqlobs./2) then output want1;

else do;

if cutoff then output want2;

else if first.mean_RT then do;

cutoff=1;

output want2;

end;

else if last.mean_RT then do;

output want1;

cutoff=1;

end;

run;

Re: Splitting dataset in two without splitting up multiple observations on the same individual

Ksharp — Tue, 06 Nov 2012 05:26:04 GMT

But someone will have both top50% and bottom50% value, you want him belong to which dataset ?

Assuming include him in bottom50% dataset.

data have;
  input Name $ Race_Time;
  cards;
Joe               56.0
Joe               60.0
Joe               57.2
Bob               52
Bob               51.5
Tom               56.0
Tom               60.0
Bill              57.2
Bill              52
Bill              51.5
;
run;
proc rank data=have out=x groups=2;
var race_time;
ranks group;
run;
proc sql noprint;
create table temp as
select distinct name from x where group=0;

create table table1 as
select * from have where name in (select name from temp);

create table table2 as
select * from have where name not in (select name from temp);
quit;

Ksharp

Re: Splitting dataset in two without splitting up multiple observations on the same individual

elfkitty12 — Tue, 06 Nov 2012 14:34:51 GMT

yes, both of these do what I want. thank you very much!!