BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nupur20
Calcite | Level 5

are Hi,

I am struggling with spilitting my dataset into two based on number of observations.

I have a dataset whose total number of observations can change on pulling out data every month from an access database. The SAS ultimately outputs data to excel file for creating map.

The problem is, I want to split my data into two if my total number of observations is greater than 100.

For eg: something like If No. of observations >=100 then split dataset and make two sets of 50 observation each. OR if no. of observaitons <100 then do nothing.

Can anyone help me with this? I would really appreciate your time.

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Another variation version of Art's code.

data have;
   do i=1 to 102;
     output;
   end;
run;

 

%macro split(dsn);
%let dsid=%sysfunc(open(&dsn));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%let dsid=%sysfunc(close(&dsid));
%let half=%sysevalf(&nobs/2,integer);
  %if &nobs. gt 100 %then %do;
    data &dsn.1 &dsn.2;
      set &dsn.;
      if _n_ le &half then output &dsn.1;
      else output &dsn.2;
    run;
  %end;
%mend split;

 

%split(have)

Ksharp

View solution in original post

6 REPLIES 6
shivas
Pyrite | Level 9

Hi,

Is this what you  want.

data one;

do i=1 to 200;

output;

end;

run;

data want want1 nothing;

set one;

if _n_>100 and _n_<150 then output want;

else if _n_>150 then output want1;

else if _n_<100 then output nothing;

run;

Thanks,

Shiva

Haikuo
Onyx | Level 15

Not sure about the efficiency benchmark, but here is another data step approach:

data want want1 nothing;

set one (firstobs=100 obs=150 in=want)

  one (firstobs=151 in=want1)

one (obs=99 in=nothing);

if want then output want;

else if want1 then output want1;

else if nothing then output nothing;

run;

Regards,

Haikuo

Linlin
Lapis Lazuli | Level 10

or

data have;

   do i=1 to 110;

     output;

   end;

  data want1 want2;

     set have nobs=nobs;

  if nobs>100 then do;

    if mod(_n_,2)=0 then output want1;

        else output want2;

                   end;

   else stop;

  run;

MikeZdeb
Rhodochrosite | Level 12

hi ... here's another idea that varies the number of data sets produced (d1, d2, etc.) by size of the original data set ...

* make some data (try some different sizes);

data one;

do i=1 to 420;

output;

end;

run;

* if more than 100 obs, divide into data sets;

filename x temp;

data _null_;

file x;

set one nobs=obs;

if obs le 100 then stop;

put 'data';

x = ceil(obs/50);

do j=1 to ceil(obs/50);

  put 'd' j;

end;

put '; set one; select;';

do j=1 to ceil(obs/50) - 1;

  k = j * 50;

  put 'when (_n_ le ' k ') output d' j  ';';

end;

put 'otherwise output d' j '; end; run;';

stop;

run;

%include x / source2;

filename x;

art297
Opal | Level 21

I kind of liked Linlin's approach, but I would wrap it in a macro.  E.g.:

data have;

   do i=1 to 102;

     output;

   end;

run;

%macro split(dsn);

  proc sql noprint;

    select count(*) into :nobs

      from &dsn.

    ;

  quit;

  %if &nobs. gt 100 %then %do;

    data &dsn.1 &dsn.2;

      set &dsn.;

      if mod(_n_,2)=0 then output &dsn.1;

      else output &dsn.2;

    run;

  %end;

%mend split;

%split(have)

Ksharp
Super User

Another variation version of Art's code.

data have;
   do i=1 to 102;
     output;
   end;
run;

 

%macro split(dsn);
%let dsid=%sysfunc(open(&dsn));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%let dsid=%sysfunc(close(&dsid));
%let half=%sysevalf(&nobs/2,integer);
  %if &nobs. gt 100 %then %do;
    data &dsn.1 &dsn.2;
      set &dsn.;
      if _n_ le &half then output &dsn.1;
      else output &dsn.2;
    run;
  %end;
%mend split;

 

%split(have)

Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 21966 views
  • 2 likes
  • 7 in conversation