Contributor
Posts: 50

# create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

Hi,

How can I create subsets of a dataset, with a macro or some other method, such that the first subset contains the first 100 observations and the second, third, and fourth subsets contain the second, third, and fourth 100 observations respectively? Thanks!

Posts: 4,736

## Re: create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

data sample1 sample2 sample3;
  set have;
  if _n_<=100 then output sample1;
  else if 100<_n_<=200 then output sample2;
  else if 200<_n_<=300 then output sample3;
run;

Contributor
Posts: 50

## Re: create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

Thank you for your help. But I have to create many subsets, more than 100. Can you suggest a simpler solution?

Posts: 4,736

## Re: create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

You need either a macro, like the one Amir demonstrates, or a hash table, because a hash object lets you create new output data sets while the data step is executing. Below is a hash approach:

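/* sample data: 501 observations */
data have;
  do i=1 to 501;
    var=ceil(ranuni(1)*1000);
    output;
  end;
  stop;
run;

/* zero-observation template: hash_key plus all variables of HAVE */
data mapping;
  stop;
  length hash_key $1;
  call missing(hash_key);
  set have;
run;

data _null_;
  set have end=last;
  if _n_=1 then
    do;
      /* compile-time only: makes hash_key known to this data step */
      if 0 then set mapping;
      declare hash h (dataset:'mapping',multidata:'y');
      _rc=h.defineKey('hash_key');
      _rc=h.defineData(all:'y');
      _rc=h.defineDone();
    end;

  /* store the current observation in the hash */
  _rc=h.add();
  _iter+1;

  /* after every 100 observations, and at the end of the input,
     write the hash content to a new data set and empty the hash */
  if mod(_n_,100)=0 or last then
    do;
      _rc=h.output(dataset:cats('sample_ds_',_n_-_iter+1,'_to_',_n_,'(drop=hash_key)'));
      _rc=h.clear();
      _iter=0;
    end;
run;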

Super Contributor
Posts: 340

## Re: create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

Hi,

Using sashelp.class and a split by 5:

%let split=5;

data _null_;
  call symputx('nobs',nobs);
  stop;
  set sashelp.class nobs=nobs;
run;

%macro subset;
  %let part=0;
  %do i=1 %to &nobs %by &split;
    %let part=%eval(&part+1);
    data subset&part;
      set sashelp.class(firstobs=&i obs=%eval(&i-1+&split));
    run;
  %end;
%mend subset;

%subset;

Regards,

Amir.
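
The macro above hard-codes sashelp.class and a split of 5. To apply the same idea to the original question (chunks of 100 from an arbitrary data set), a parameterized variant might look like the sketch below. The macro name split_ds and its parameters ds=, size= and prefix= are hypothetical names chosen for illustration, and the sketch assumes the input is a regular SAS data set (not a view), so that NOBS= is available.

```sas
/* Hypothetical generalization of the macro above; split_ds, ds=, size=
   and prefix= are illustrative names, not part of the original post. */
%macro split_ds(ds=, size=100, prefix=subset);
    %local nobs i part;

    /* NOBS= is assigned at compile time, so it can be read before STOP */
    data _null_;
        call symputx('nobs', nobs, 'L');
        stop;
        set &ds nobs=nobs;
    run;

    %let part = 0;
    %do i = 1 %to &nobs %by &size;
        %let part = %eval(&part + 1);
        /* OBS= is the number of the last observation to read, not a count */
        data &prefix&part;
            set &ds(firstobs=&i obs=%eval(&i - 1 + &size));
        run;
    %end;
%mend split_ds;

/* usage: subset1, subset2, ... each holding up to 100 observations of HAVE */
%split_ds(ds=have, size=100)
```

The last subset simply holds whatever observations remain, since an OBS= value beyond the end of the data set is allowed.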

Super User
Posts: 23,763

## Re: create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

Here's a link that explains the variety of ways.

Note the following though:

## Best Practice: Just Don't Do It

I'd bet there's a better way to do what you're doing, especially if the subsets are sized at 100.
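
One common alternative, assuming the subsets are only needed so that the same analysis can be run on each block of 100 observations, is to keep a single data set and add a grouping variable for BY-group processing. The sketch below reuses the have data set and its variable var from the hash example earlier in the thread; the names have_chunked and chunk are made up for illustration, and PROC MEANS stands in for whatever step would actually be run.

```sas
/* Assumption: HAVE is the input data set; have_chunked and chunk are
   hypothetical names used only for this sketch. */
data have_chunked;
    set have;
    chunk = ceil(_n_ / 100);   /* 1 for obs 1-100, 2 for obs 101-200, ... */
run;

/* chunk increases with _n_, so the data are already in BY order
   and no PROC SORT is needed before the BY statement */
proc means data=have_chunked n mean min max;
    by chunk;
    var var;
run;
```

This produces one set of results per block without creating more than 100 separate data sets.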