Programming the statistical procedures from SAS

Splitting dataset based on total observations

Reply
Contributor
Posts: 33

Splitting dataset based on total observations

are Hi,

I am struggling with spilitting my dataset into two based on number of observations.

I have a dataset whose total number of observations can change on pulling out data every month from an access database. The SAS ultimately outputs data to excel file for creating map.

The problem is, I want to split my data into two if my total number of observations is greater than 100.

For eg: something like If No. of observations >=100 then split dataset and make two sets of 50 observation each. OR if no. of observaitons <100 then do nothing.

Can anyone help me with this? I would really appreciate your time.

Thanks,

Super Contributor
Posts: 349

Re: Splitting dataset based on total observations

Hi,

Is this what you  want.

data one;

do i=1 to 200;

output;

end;

run;

data want want1 nothing;

set one;

if _n_>100 and _n_<150 then output want;

else if _n_>150 then output want1;

else if _n_<100 then output nothing;

run;

Thanks,

Shiva

Respected Advisor
Posts: 3,124

Re: Splitting dataset based on total observations

Not sure about the efficiency benchmark, but here is another data step approach:

data want want1 nothing;

set one (firstobs=100 obs=150 in=want)

  one (firstobs=151 in=want1)

one (obs=99 in=nothing);

if want then output want;

else if want1 then output want1;

else if nothing then output nothing;

run;

Regards,

Haikuo

Super Contributor
Posts: 1,636

Re: Splitting dataset based on total observations

or

data have;

   do i=1 to 110;

     output;

   end;

  data want1 want2;

     set have nobs=nobs;

  if nobs>100 then do;

    if mod(_n_,2)=0 then output want1;

        else output want2;

                   end;

   else stop;

  run;

Valued Guide
Posts: 734

Re: Splitting dataset based on total observations

hi ... here's another idea that varies the number of data sets produced (d1, d2, etc.) by size of the original data set ...

* make some data (try some different sizes);

data one;

do i=1 to 420;

output;

end;

run;

* if more than 100 obs, divide into data sets;

filename x temp;

data _null_;

file x;

set one nobs=obs;

if obs le 100 then stop;

put 'data';

x = ceil(obs/50);

do j=1 to ceil(obs/50);

  put 'd' j;

end;

put '; set one; select;';

do j=1 to ceil(obs/50) - 1;

  k = j * 50;

  put 'when (_n_ le ' k ') output d' j  ';';

end;

put 'otherwise output d' j '; end; run;';

stop;

run;

%include x / source2;

filename x;

Esteemed Advisor
Posts: 7,053

Re: Splitting dataset based on total observations

I kind of liked Linlin's approach, but I would wrap it in a macro.  E.g.:

data have;

   do i=1 to 102;

     output;

   end;

run;

%macro split(dsn);

  proc sql noprint;

    select count(*) into :nobs

      from &dsn.

    ;

  quit;

  %if &nobs. gt 100 %then %do;

    data &dsn.1 &dsn.2;

      set &dsn.;

      if mod(_n_,2)=0 then output &dsn.1;

      else output &dsn.2;

    run;

  %end;

%mend split;

%split(have)

Grand Advisor
Posts: 9,447

Re: Splitting dataset based on total observations

Another variation version of Art's code.

data have;
   do i=1 to 102;
     output;
   end;
run;

 

%macro split(dsn);
%let dsid=%sysfunc(open(&dsn));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%let dsid=%sysfunc(close(&dsid));
%let half=%sysevalf(&nobs/2,integer);
  %if &nobs. gt 100 %then %do;
    data &dsn.1 &dsn.2;
      set &dsn.;
      if _n_ le &half then output &dsn.1;
      else output &dsn.2;
    run;
  %end;
%mend split;

 

%split(have)

Ksharp

Ask a Question
Discussion stats
  • 6 replies
  • 8015 views
  • 2 likes
  • 7 in conversation