topic Splitting dataset based on total observations in Statistical Procedures

Splitting dataset based on total observations

Nupur20 — Fri, 04 May 2012 16:52:16 GMT

are Hi,

I am struggling with spilitting my dataset into two based on number of observations.

I have a dataset whose total number of observations can change on pulling out data every month from an access database. The SAS ultimately outputs data to excel file for creating map.

The problem is, I want to split my data into two if my total number of observations is greater than 100.

For eg: something like If No. of observations >=100 then split dataset and make two sets of 50 observation each. OR if no. of observaitons <100 then do nothing.

Can anyone help me with this? I would really appreciate your time.

Thanks,

Re: Splitting dataset based on total observations

shivas — Fri, 04 May 2012 16:59:33 GMT

Hi,

Is this what you want.

data one;

do i=1 to 200;

output;

end;

run;

data want want1 nothing;

set one;

if _n_>100 and _n_<150 then output want;

else if _n_>150 then output want1;

else if _n_<100 then output nothing;

run;

Thanks,

Shiva

Re: Splitting dataset based on total observations

Haikuo — Fri, 04 May 2012 17:36:26 GMT

Not sure about the efficiency benchmark, but here is another data step approach:

data want want1 nothing;

set one (firstobs=100 obs=150 in=want)

one (firstobs=151 in=want1)

one (obs=99 in=nothing);

if want then output want;

else if want1 then output want1;

else if nothing then output nothing;

run;

Regards,

Haikuo

Re: Splitting dataset based on total observations

Linlin — Sat, 05 May 2012 00:34:07 GMT

data have;

do i=1 to 110;

output;

end;

data want1 want2;

set have nobs=nobs;

if nobs>100 then do;

if mod(_n_,2)=0 then output want1;

else output want2;

end;

else stop;

run;

Re: Splitting dataset based on total observations

MikeZdeb — Sat, 05 May 2012 14:02:47 GMT

hi ... here's another idea that varies the number of data sets produced (d1, d2, etc.) by size of the original data set ...

* make some data (try some different sizes);

data one;

do i=1 to 420;

output;

end;

run;

* if more than 100 obs, divide into data sets;

filename x temp;

data _null_;

file x;

set one nobs=obs;

if obs le 100 then stop;

put 'data';

x = ceil(obs/50);

do j=1 to ceil(obs/50);

put 'd' j;

end;

put '; set one; select;';

do j=1 to ceil(obs/50) - 1;

k = j * 50;

put 'when (_n_ le ' k ') output d' j ';';

end;

put 'otherwise output d' j '; end; run;';

stop;

run;

%include x / source2;

filename x;

Re: Splitting dataset based on total observations

art297 — Sat, 05 May 2012 20:35:21 GMT

I kind of liked Linlin's approach, but I would wrap it in a macro. E.g.:

data have;

do i=1 to 102;

output;

end;

run;

%macro split(dsn);

proc sql noprint;

select count(*) into :nobs

from &dsn.

;

quit;

%if &nobs. gt 100 %then %do;

data &dsn.1 &dsn.2;

set &dsn.;

if mod(_n_,2)=0 then output &dsn.1;

else output &dsn.2;

run;

%end;

%mend split;

%split(have)

Re: Splitting dataset based on total observations

Ksharp — Mon, 07 May 2012 08:06:16 GMT

Another variation version of Art's code.

data have;
   do i=1 to 102;
     output;
   end;
run;

 

%macro split(dsn);
%let dsid=%sysfunc(open(&dsn));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%let dsid=%sysfunc(close(&dsid));
%let half=%sysevalf(&nobs/2,integer);
  %if &nobs. gt 100 %then %do;
    data &dsn.1 &dsn.2;
      set &dsn.;
      if _n_ le &half then output &dsn.1;
      else output &dsn.2;
    run;
  %end;
%mend split;

 

%split(have)

Ksharp