Fluorite | Level 6

## Spliting a dataset based on row count

Suppose I have a list of new visitors sorted by last name descending. If I have an even number of visitors I want to send the top half to one salesperson and another to the other salesperson.

Have:

 Zuniga James Utiga Tom Balfor Jane Anderson Jessica

Want:
Salesperson A gets:

 Zuniga James Utiga Tom

Sales person B gets:

 Balfor Jane Anderson Jessica

However, suppose I have and odd number of visitors. Since Salesperson A has a higher priority, I want them to get the extra visitor.

Have:

 Zuniga James Utiga Tom Lemieux Lynn Balfor Jane Anderson Jessica

Want:
Salesperson A gets:

 Zuniga James Utiga Tom Lemieux Lynn

Salesperson B gets:

 Balfor Jane Anderson Jessica

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Employee

## Re: Spliting a dataset based on row count

data one;
input last \$ first \$;
cards;
Zuniga James
Utiga Tom
Lemieux Lynn
Balfor Jane
Anderson Jessica
;

data A B;
set one nobs=obs;
half=obs/2;
if mod(obs,2)=0 then do;
if _n_ le half then output A;
else output B;
end;
else do;
if _n_ le ceil(half) then output A;
else output B;
end;

run;

proc print;
run;

4 REPLIES 4
SAS Employee

## Re: Spliting a dataset based on row count

data one;
input last \$ first \$;
cards;
Zuniga James
Utiga Tom
Lemieux Lynn
Balfor Jane
Anderson Jessica
;

data A B;
set one nobs=obs;
half=obs/2;
if mod(obs,2)=0 then do;
if _n_ le half then output A;
else output B;
end;
else do;
if _n_ le ceil(half) then output A;
else output B;
end;

run;

proc print;
run;

Super User

## Re: Spliting a dataset based on row count

Similar but less code:

```data have;
input name \$;
datalines;
abc
pdq
vvv
jhi
ccc
;
data want;
set have nobs=num;
group = (_n_ le round( (num/2),1) );
run;```

Assumes that the ORDER of the names in the data is important.

Nobs= creates a temporary variable of the count of observations in the data set.

Round( (num/2,1) ) rounds the division of the number of records divided by 2 to the nearest integer, so the 2.5 you would get from 5 records gets rounded to 3.

_N_ is a counter for the number of iterations of the data step. In a simple action like this it will act as a row counter. so if the _N_ is less than or equal to that half way we will get 1 and 0 other wise as that is how SAS returns the values of logical comparisons: 1 for true, 0 for false.

Caution: this sort of algorithm doesn't extend well to a generic N number of groups. Proc Rank or Surveyselect might be better in a more complex selection.

Fluorite | Level 6

## Re: Spliting a dataset based on row count

This worked. Thank you.

Diamond | Level 26

## Re: Spliting a dataset based on row count

Don't split the dataset. Assign value of 0 to the top half and 1 to the bottom half (or something like that). Keep everything in one data set.

--
Paige Miller
Discussion stats
• 4 replies
• 567 views
• 4 likes
• 4 in conversation