MSK4
Obsidian | Level 7

Hi team,

I have a dataset with 3 lakh 30 thousand records, and I want to divide them equally into n datasets.

I am looking for code to do this. Many thanks.

Accepted Solution
mkeintz
PROC Star

You haven't said anything about the data except its size. If the data are already in random order and there is no "group" identifier, then this is the simplest way, assuming you want 6 datasets:

 

data data1 data2 data3 data4 data5 data6;
  set have;
  select (mod(_n_-1,6)+1);
    when (1) output data1;
    when (2) output data2;
    when (3) output data3;
    when (4) output data4;
    when (5) output data5;
    when (6) output data6;
  end;
run;
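Before creating any datasets, it can help to confirm how the rows would be distributed. The following is a minimal sketch, not part of the original answer: it builds a hypothetical stand-in for HAVE with 330,000 rows (the variable id and the dataset name groups are made up for illustration), computes the target dataset number with the same MOD expression, and tabulates the result.

```sas
/* Hypothetical stand-in for HAVE: 330,000 rows, as in the question. */
data have;
  do id = 1 to 330000;
    output;
  end;
run;

/* Dry run: compute the target dataset number for each row. */
/* GROUPS and TARGET are illustrative names, not from the answer. */
data groups;
  set have;
  target = mod(_n_ - 1, 6) + 1;   /* same expression as in the SELECT */
run;

/* Rows are dealt out in turn (1,2,...,6,1,2,...), so each value of */
/* TARGET should occur 330,000 / 6 = 55,000 times.                  */
proc freq data=groups;
  tables target / nocum nopercent;
run;
```

Because rows are assigned in rotation, the group sizes never differ by more than one row, whatever the total count.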

If you want to make the number of datasets a parameter, then you could:

%let ndatasets=6;
filename tmp temp ;
data _null_;
  file tmp;
  length statement $200;
  statement='DATA ';
  do i=1 to &ndatasets;
    statement=catx(' ',statement,cats('data',i));
  end;
  put statement ';';
  put 'set have;';
  put "select (mod(_n_-1,&ndatasets)+1);";
  do i=1 to &ndatasets;
    put "when(" i ") output data" i ";" ;
  end;
  put "end;" / "run;" ;
run;
%include tmp / source2;

The program writes SAS code to a temporary file (SAS will delete it at the end of the session). Then the %include statement calls it for execution. The "/source2" option tells SAS to print the included statements in the SAS log.
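For comparison, the same parameterized split could also be sketched with a macro %DO loop instead of a generated file. This is an alternative technique, not the method used above; the macro name split and its parameters in, n, and prefix are hypothetical, and it assumes a dataset named HAVE already exists.

```sas
/* Hypothetical macro: split &in into &n datasets named &prefix.1 ... &prefix.&n */
%macro split(in=have, n=6, prefix=data);
  %local i;

  /* The %DO loop generates the list of output dataset names. */
  data
    %do i = 1 %to &n;
      &prefix.&i
    %end;
  ;
    set &in;
    /* Rows are dealt out in rotation: 1, 2, ..., &n, 1, 2, ... */
    select (mod(_n_ - 1, &n) + 1);
      %do i = 1 %to &n;
        when (&i) output &prefix.&i;
      %end;
    end;
  run;
%mend split;

/* Example call: six datasets, data1 through data6. */
%split(in=have, n=6, prefix=data)
```

The macro version avoids the temporary file, while the generated-file version leaves the exact code that ran visible in the log through "/source2"; turning on the MPRINT system option gives a similar view of the macro-generated statements.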


Tom
Super User

Not sure what a "lakh" is, but assuming it is just some unit of counting, this is a common question.

Here is a link to a previous question, with an answer, about splitting a dataset by the number of observations:

https://communities.sas.com/t5/Statistical-Procedures/Splitting-dataset-based-on-total-observations/...

jimbarbour
Meteorite | Level 14

1 lakh = 100,000

30 k = 30,000

 

So @MSK4 is talking about 330,000 records.

 

Jim
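With 330,000 records, an "equal" split is exact only when n divides the total evenly (6 datasets gives 55,000 rows each). As a small illustrative sketch, the arithmetic for an n that does not divide evenly can be checked as follows; n = 7 is a hypothetical choice, and the variable names base and extra are made up for the example.

```sas
data _null_;
  nobs  = 330000;            /* record count from the question       */
  n     = 7;                 /* hypothetical number of datasets      */
  base  = floor(nobs / n);   /* minimum number of rows per dataset   */
  extra = mod(nobs, n);      /* this many datasets get one more row  */
  put base= extra=;
run;
```

For 7 datasets this gives base=47142 and extra=6, so with a rotating assignment on the row number the first six datasets would hold 47,143 rows and the seventh 47,142.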

andreas_lds
Jade | Level 19

Why do you want to divide a rather small dataset at all?
