topic Re: Spliting a dataset based on row count in SAS Programming

Spliting a dataset based on row count

mmaleta851 — Fri, 28 Apr 2023 18:33:18 GMT

Suppose I have a list of new visitors sorted by last name descending. If I have an even number of visitors I want to send the top half to one salesperson and another to the other salesperson.

Have:

Zuniga	James
Utiga	Tom
Balfor	Jane
Anderson	Jessica

Want:
Salesperson A gets:

Zuniga	James
Utiga	Tom

Sales person B gets:

Balfor	Jane
Anderson	Jessica

However, suppose I have and odd number of visitors. Since Salesperson A has a higher priority, I want them to get the extra visitor.

Have:

Zuniga	James
Utiga	Tom
Lemieux	Lynn
Balfor	Jane
Anderson	Jessica

Want:
Salesperson A gets:

Zuniga	James
Utiga	Tom
Lemieux	Lynn

Salesperson B gets:

Balfor	Jane
Anderson	Jessica

Re: Spliting a dataset based on row count

russt_sas — Fri, 28 Apr 2023 19:14:10 GMT

Here is one way to accomplish your task:

data one;
input last $ first $;
cards;
Zuniga James
Utiga Tom
Lemieux Lynn
Balfor Jane
Anderson Jessica
;

data A B;
set one nobs=obs;
half=obs/2;
if mod(obs,2)=0 then do;
if _n_ le half then output A;
else output B;
end;
else do;
if _n_ le ceil(half) then output A;
else output B;
end;

run;

proc print;
run;

Re: Spliting a dataset based on row count

PaigeMiller — Fri, 28 Apr 2023 19:34:58 GMT

Don't split the dataset. Assign value of 0 to the top half and 1 to the bottom half (or something like that). Keep everything in one data set.

Re: Spliting a dataset based on row count

ballardw — Fri, 28 Apr 2023 20:44:08 GMT

Similar but less code:

data have;
  input name $;
datalines;
abc
pdq
vvv
jhi
ccc
;
data want;
   set have nobs=num;
   group = (_n_ le round( (num/2),1) );
run;

Assumes that the ORDER of the names in the data is important.

Nobs= creates a temporary variable of the count of observations in the data set.

Round( (num/2,1) ) rounds the division of the number of records divided by 2 to the nearest integer, so the 2.5 you would get from 5 records gets rounded to 3.

_N_ is a counter for the number of iterations of the data step. In a simple action like this it will act as a row counter. so if the _N_ is less than or equal to that half way we will get 1 and 0 other wise as that is how SAS returns the values of logical comparisons: 1 for true, 0 for false.

Caution: this sort of algorithm doesn't extend well to a generic N number of groups. Proc Rank or Surveyselect might be better in a more complex selection.

Re: Spliting a dataset based on row count

mmaleta851 — Fri, 28 Apr 2023 22:14:08 GMT

This worked. Thank you.