BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elfkitty12
Calcite | Level 5

I have a dataset that contains race times for track and field runners.  More specifically, the dataset has the variables name and race time.  The way I received the data, each time the runner competes, it's a new observation.  So the data looks something like this:

Name          Race Time

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

The number of observations per person differs (some people have 5 while other have only 1)

I want to split the dataset into two by race time (so dataset1 will contain roughly the top 50% fastest times, while dataset2 contains the bottom 50%).  However, I do not want to split up observations for an individual.  In other words, I want all of "Joe's" race times to be contained in one dataset, rather than have some of his race times in dataset1 and the rest in dataset2).

Can anyone help me with this?  I would GREATLY appreciate this!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:

data have;

  input Name $ Race_Time;

  cards;

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

Tom               56.0

Tom               60.0

Bill              57.2

Bill              52

Bill              51.5

;

proc sql;

  create table need as

    select *,mean(Race_Time) as mean_RT

      from have

        group by name

          order by mean_RT

  ;

quit;

data want1 want2;

  set need;

  retain cutoff (0);

  by mean_rt;

  if _n_ le (&sqlobs./2) then output want1;

  else do;

    if cutoff then output want2;

    else if first.mean_RT then do;

      cutoff=1;

      output want2;

    end;

    else if last.mean_RT then do;

      output want1;

      cutoff=1;

    end;

  end;

run;

View solution in original post

3 REPLIES 3
art297
Opal | Level 21

Not sure how you want to deal with ties, but the following might give you an idea on how to accomplish the desired goal:

data have;

  input Name $ Race_Time;

  cards;

Joe               56.0

Joe               60.0

Joe               57.2

Bob               52

Bob               51.5

Tom               56.0

Tom               60.0

Bill              57.2

Bill              52

Bill              51.5

;

proc sql;

  create table need as

    select *,mean(Race_Time) as mean_RT

      from have

        group by name

          order by mean_RT

  ;

quit;

data want1 want2;

  set need;

  retain cutoff (0);

  by mean_rt;

  if _n_ le (&sqlobs./2) then output want1;

  else do;

    if cutoff then output want2;

    else if first.mean_RT then do;

      cutoff=1;

      output want2;

    end;

    else if last.mean_RT then do;

      output want1;

      cutoff=1;

    end;

  end;

run;

Ksharp
Super User

But someone will have both top50% and bottom50% value, you want him belong to which dataset ?

Assuming include him in bottom50% dataset.

data have;
  input Name $ Race_Time;
  cards;
Joe               56.0
Joe               60.0
Joe               57.2
Bob               52
Bob               51.5
Tom               56.0
Tom               60.0
Bill              57.2
Bill              52
Bill              51.5
;
run;
proc rank data=have out=x groups=2;
var race_time;
ranks group;
run;
proc sql noprint;
create table temp as
select distinct name from x where group=0;

create table table1 as
select * from have where name in (select name from temp);

create table table2 as
select * from have where name not in (select name from temp);
quit;


Ksharp

elfkitty12
Calcite | Level 5

yes, both of these do what I want.  thank you very much!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1407 views
  • 2 likes
  • 3 in conversation