BookmarkSubscribeRSS Feed
kimdukes77
Obsidian | Level 7

Hi

 

I have several datasets for processing, one variable, varying number of observations in each.

 

I need to split the datasets into multiple datasets.  I need to output the first 80 observations, then the next 80 and so on until I reach the end of the dataset which will output the remaining observations.

 

What would be the simplest way to tackle this?

 

Thanks

5 REPLIES 5
Astounding
PROC Star

By far, the easiest way is not to split the observations.  Instead, create a new variable that is 1 for the first 80 observations, 2 for the next 80, etc.  Then use a BY statement when processing the data set later.  Here is a simple example:

 

data want;

set a b c d e f g;

batch = ceil(_n_/80);

run;

arodriguez
Lapis Lazuli | Level 10

Hi kimdukes77,

You could specify multiple table names in the DATA statement, for example

DATA table1 table2;
  set table;
  if _N_<=80 then output table1;
  else if _N_<=160 then output table2;
run;

If you know the number of observations you could create a step like the previous one.

 

If the number of observations is going to be dynamic, counting the number of rows and apply ceil(nobs/80) can give you the number of data sets that you need

 

Reeza
Super User

It's best not to split the data. Why do you want to split it? 

 

http://www.sascommunity.org/wiki/Split_Data_into_Subsets

GoExplore
Obsidian | Level 7

Let us know if this works.

 

This is based on macro and extracts 2 records from original dataset and creates table.  The sample dataset has 10 records.

 

/* sample dataset */
data test;
   input empid 
         firstname $4.
         lastname $4.;
         datalines;
1 Sara A
2 Sara B
3 Sara C
4 Sara D
5 Sara E
6 Sara F
7 Sara G
8 Sara H
9 Sara I
10 Sara J
;

/* get # of records from original dataset */
proc sql;
select count(*) into: total_rec
from test;
quit;
%put Total # of records: &total_rec;

/* define a counter */
%LET i = 1;

/* macro */
%MACRO test1 (val);

	/* create table with 2 records*/

	%if &total_rec >=2 %then
		%do;
			%put ######################;
			%put Iteration #&i;

			/* extract 2 records */
			proc sql outobs=2;
			create table table_&i as
			select * from test;
			quit;
			%put Table &i created;

			/* sort extracted records */
			proc sort data=table_&i; by empid; run;

			/* remove the extracted records from the original dataset */
			data test;
			merge test (in=a) table_&i (in=b);
			by empid;
			if a and not b;
			run;

			/* count the remaining records */
			proc sql;
			select count(*) into: total_rec
			from test;
			quit;

			/* if we have more or equal to 2 records loop macro */
			%if &total_rec >= 2 %then
				%do;
				%LET i = %sysevalf(&i + 1);
				%test1(&i);
				%put &i;
				%end;
			%else
				%if &total_rec > 0 %then 
					/* there must be atleast 1 record */
					%do;
						%LET i = %sysevalf(&i + 1);
						/* extract remaining records */
						proc sql;
						create table table_&i as
						select * from test;
						quit;
					%end;
		%end;

%MEND;

%test1(&i);

 

JohnHoughton
Quartz | Level 8

This does what you want quite simply, using firstobs & obs dataset options.

 

 

data have; do x=1 to 250; output; end;run;

 

%macro split (dsn=,splitobs=);
proc sql noprint;
select count(*) into : totobs
from &dsn;
quit;

%do i=1 %to &totobs %by &splitobs;
data want%eval(1+&i/&splitobs);
set &dsn (firstobs=&i obs=%eval(&i+&splitobs-1));
run;
%end;
%mend;
%split(dsn=have,splitobs=80);

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 17266 views
  • 15 likes
  • 6 in conversation