## Splitting dataset in two without splitting up multiple observations on the same individual

Solved
Occasional Contributor
Posts: 9

# Splitting dataset in two without splitting up multiple observations on the same individual

I have a dataset that contains race times for track and field runners.  More specifically, the dataset has the variables name and race time.  The way I received the data, each time the runner competes, it's a new observation.  So the data looks something like this:

Name          Race Time

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

The number of observations per person differs (some people have 5 while other have only 1)

I want to split the dataset into two by race time (so dataset1 will contain roughly the top 50% fastest times, while dataset2 contains the bottom 50%).  However, I do not want to split up observations for an individual.  In other words, I want all of "Joe's" race times to be contained in one dataset, rather than have some of his race times in dataset1 and the rest in dataset2).

Can anyone help me with this?  I would GREATLY appreciate this!

Accepted Solutions
Solution
‎11-06-2012 12:08 AM
PROC Star
Posts: 7,641

## Re: Splitting dataset in two without splitting up multiple observations on the same individual

Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:

data have;

input Name \$ Race_Time;

cards;

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

Tom               56.0

Tom               60.0

Bill              57.2

Bill              52

Bill              51.5

;

proc sql;

create table need as

select *,mean(Race_Time) as mean_RT

from have

group by name

order by mean_RT

;

quit;

data want1 want2;

set need;

retain cutoff (0);

by mean_rt;

if _n_ le (&sqlobs./2) then output want1;

else do;

if cutoff then output want2;

else if first.mean_RT then do;

cutoff=1;

output want2;

end;

else if last.mean_RT then do;

output want1;

cutoff=1;

end;

end;

run;

All Replies
Solution
‎11-06-2012 12:08 AM
PROC Star
Posts: 7,641

## Re: Splitting dataset in two without splitting up multiple observations on the same individual

Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:

data have;

input Name \$ Race_Time;

cards;

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

Tom               56.0

Tom               60.0

Bill              57.2

Bill              52

Bill              51.5

;

proc sql;

create table need as

select *,mean(Race_Time) as mean_RT

from have

group by name

order by mean_RT

;

quit;

data want1 want2;

set need;

retain cutoff (0);

by mean_rt;

if _n_ le (&sqlobs./2) then output want1;

else do;

if cutoff then output want2;

else if first.mean_RT then do;

cutoff=1;

output want2;

end;

else if last.mean_RT then do;

output want1;

cutoff=1;

end;

end;

run;

Super User
Posts: 10,205

## Re: Splitting dataset in two without splitting up multiple observations on the same individual

But someone will have both top50% and bottom50% value, you want him belong to which dataset ?

Assuming include him in bottom50% dataset.

```data have;
input Name \$ Race_Time;
cards;
Joe               56.0
Joe               60.0
Joe               57.2
Bob               52
Bob               51.5
Tom               56.0
Tom               60.0
Bill              57.2
Bill              52
Bill              51.5
;
run;
proc rank data=have out=x groups=2;
var race_time;
ranks group;
run;
proc sql noprint;
create table temp as
select distinct name from x where group=0;

create table table1 as
select * from have where name in (select name from temp);

create table table2 as
select * from have where name not in (select name from temp);
quit;

```

Ksharp

Occasional Contributor
Posts: 9