DATA Step, Macro, Functions and more

Output groups of observations to multiple datasets

Reply
Occasional Contributor
Posts: 16

Output groups of observations to multiple datasets

Hi

 

I have several datasets for processing, one variable, varying number of observations in each.

 

I need to split the datasets into multiple datasets.  I need to output the first 80 observations, then the next 80 and so on until I reach the end of the dataset which will output the remaining observations.

 

What would be the simplest way to tackle this?

 

Thanks

Super User
Posts: 5,083

Re: Output groups of observations to multiple datasets

By far, the easiest way is not to split the observations.  Instead, create a new variable that is 1 for the first 80 observations, 2 for the next 80, etc.  Then use a BY statement when processing the data set later.  Here is a simple example:

 

data want;

set a b c d e f g;

batch = ceil(_n_/80);

run;

Frequent Contributor
Posts: 144

Re: Output groups of observations to multiple datasets

Hi kimdukes77,

You could specify multiple table names in the DATA statement, for example

DATA table1 table2;
  set table;
  if _N_<=80 then output table1;
  else if _N_<=160 then output table2;
run;

If you know the number of observations you could create a step like the previous one.

 

If the number of observations is going to be dynamic, counting the number of rows and apply ceil(nobs/80) can give you the number of data sets that you need

 

Super User
Posts: 17,831

Re: Output groups of observations to multiple datasets

It's best not to split the data. Why do you want to split it? 

 

http://www.sascommunity.org/wiki/Split_Data_into_Subsets

Occasional Contributor
Posts: 16

Re: Output groups of observations to multiple datasets

[ Edited ]

Let us know if this works.

 

This is based on macro and extracts 2 records from original dataset and creates table.  The sample dataset has 10 records.

 

/* sample dataset */
data test;
   input empid 
         firstname $4.
         lastname $4.;
         datalines;
1 Sara A
2 Sara B
3 Sara C
4 Sara D
5 Sara E
6 Sara F
7 Sara G
8 Sara H
9 Sara I
10 Sara J
;

/* get # of records from original dataset */
proc sql;
select count(*) into: total_rec
from test;
quit;
%put Total # of records: &total_rec;

/* define a counter */
%LET i = 1;

/* macro */
%MACRO test1 (val);

	/* create table with 2 records*/

	%if &total_rec >=2 %then
		%do;
			%put ######################;
			%put Iteration #&i;

			/* extract 2 records */
			proc sql outobs=2;
			create table table_&i as
			select * from test;
			quit;
			%put Table &i created;

			/* sort extracted records */
			proc sort data=table_&i; by empid; run;

			/* remove the extracted records from the original dataset */
			data test;
			merge test (in=a) table_&i (in=b);
			by empid;
			if a and not b;
			run;

			/* count the remaining records */
			proc sql;
			select count(*) into: total_rec
			from test;
			quit;

			/* if we have more or equal to 2 records loop macro */
			%if &total_rec >= 2 %then
				%do;
				%LET i = %sysevalf(&i + 1);
				%test1(&i);
				%put &i;
				%end;
			%else
				%if &total_rec > 0 %then 
					/* there must be atleast 1 record */
					%do;
						%LET i = %sysevalf(&i + 1);
						/* extract remaining records */
						proc sql;
						create table table_&i as
						select * from test;
						quit;
					%end;
		%end;

%MEND;

%test1(&i);

 

Contributor
Posts: 39

Re: Output groups of observations to multiple datasets

This does what you want quite simply, using firstobs & obs dataset options.

 

 

data have; do x=1 to 250; output; end;run;

 

%macro split (dsn=,splitobs=);
proc sql noprint;
select count(*) into : totobs
from &dsn;
quit;

%do i=1 %to &totobs %by &splitobs;
data want%eval(1+&i/&splitobs);
set &dsn (firstobs=&i obs=%eval(&i+&splitobs-1));
run;
%end;
%mend;
%split(dsn=have,splitobs=80);

Ask a Question
Discussion stats
  • 5 replies
  • 613 views
  • 12 likes
  • 6 in conversation