create subsets of data: first contain the first 100 obs, second and third contain the second and third 100 obs of the original dataset

Occasional Contributor
Posts: 17

Hi,

How can I split a dataset into subsets, with a macro or some other method, so that the first subset contains the first 100 observations and the second, third and fourth subsets contain the second, third and fourth 100 observations respectively? Thanks!

Respected Advisor
Posts: 4,173

data sample1 sample2 sample3;
  set have;
  if _n_<=100 then output sample1;
  else if 100<_n_<=200 then output sample2;
  else if 200<_n_<=300 then output sample3;
run;

Occasional Contributor
Posts: 17

Thank you for your help. But I have to create many subsets, more than 100. Can you suggest a simpler solution?

Respected Advisor
Posts: 4,173

You need either a macro, as Amir demonstrates, or a hash table, which lets you create new output data sets at DATA step execution time. Below is a hash approach:

data have;
  do i=1 to 501;
    var=ceil(ranuni(1)*1000);
    output;
  end;
  stop;
run;

data mapping;
  stop;
  length hash_key $1;
  call missing(hash_key);
  set have;
run;

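data _null_;
  set have end=last;
  if _n_=1 then
    do;
      /* never executes; only adds mapping's variables to the PDV at compile time */
      if 0 then set mapping;
      /* multidata:'y' lets many rows share the same (missing) key */
      declare hash h (dataset:'mapping',multidata:'y');
      _rc=h.defineKey('hash_key');
      _rc=h.defineData(all:'y');
      _rc=h.defineDone();
    end;

  /* count rows collected since the last output and store the current row */
  _iter+1;
  _rc=h.add();

  /* after every 100th row, or at end of input, write the collected rows out */
  if mod(_n_,100)=0 or last then
    do;
      _rc=h.output(dataset:cats('sample_ds_',_n_-_iter+1,'_to_',_n_,'(drop=hash_key)'));
      _rc=h.clear();
      _iter=0;
    end;
run;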

Super Contributor
Posts: 282

Hi,

Using sashelp.class and a split size of 5:

%let split=5;

/* NOBS= is set at compile time, so the count is available before SET runs */
data _null_;
  call symputx('nobs',nobs);
  stop;
  set sashelp.class nobs=nobs;
run;

%macro subset;
  %let part=0;
  /* &i is the first observation of each chunk */
  %do i=1 %to &nobs %by &split;
    %let part=%eval(&part+1);
    data subset&part;
      set sashelp.class(firstobs=&i obs=%eval(&i-1+&split));
    run;
  %end;
%mend subset;

%subset;

Regards,

Amir.
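
To apply the same idea to the original question (chunks of 100 from an arbitrary dataset), the macro above could be parameterized roughly as follows. This is only a sketch: the macro name split_ds and its parameters ds, size and prefix are made up for illustration, and it assumes the input is a real dataset rather than a view, since the NLOBS attribute is not reliable for views.

```sas
/* Hypothetical helper: split_ds, ds=, size= and prefix= are illustrative names */
%macro split_ds(ds=, size=100, prefix=subset);
  %local dsid nobs rc i part;

  /* read the number of (non-deleted) observations from the dataset header */
  %let dsid=%sysfunc(open(&ds));
  %let nobs=%sysfunc(attrn(&dsid,nlobs));
  %let rc=%sysfunc(close(&dsid));

  %let part=0;
  %do i=1 %to &nobs %by &size;
    %let part=%eval(&part+1);
    /* chunk &part holds observations &i through &i+&size-1 */
    data &prefix&part;
      set &ds(firstobs=&i obs=%eval(&i+&size-1));
    run;
  %end;
%mend split_ds;

/* usage: creates subset1, subset2, ... from HAVE, 100 observations each */
%split_ds(ds=have, size=100)
```

The last chunk simply comes out shorter when the number of observations is not a multiple of the chunk size.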

Super User
Posts: 19,851

Here's a link that explains a variety of ways to do this.

Note the following though:

Best Practice: Just Don't Do It


I'd bet there's a better way to do what you're doing especially if the subsets are sized at 100.

EDIT: Here's the link:

Split Data into Subsets - sasCommunity
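
One way the "Just Don't Do It" advice might play out in practice is to keep a single dataset, add a chunk identifier, and use BY-group processing instead of hundreds of physical subsets. A minimal sketch, assuming the HAVE dataset and its variable VAR from the hash example earlier in the thread; the names have_grouped and group are illustrative only, and PROC MEANS stands in for whatever per-chunk processing is actually needed:

```sas
/* Assumption: HAVE and VAR come from the earlier example in this thread;
   have_grouped and group are illustrative names */
data have_grouped;
  set have;
  /* group = 1 for obs 1-100, 2 for obs 101-200, and so on */
  group = ceil(_n_ / 100);
run;

/* process every chunk in one pass; the data are already ordered by group */
proc means data=have_grouped mean;
  by group;
  var var;
run;
```

If one particular chunk is needed on its own, a WHERE clause such as where group=3 selects it without creating a separate dataset.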
