Solved: Re: Splitting large dataset into smaller datasets while keeping observations for same ID together

akj · Posted 01-02-2019 01:31 PM

I have a large file of inpatient stays that I need to split into smaller files (approx. 20,000 observations each) for operational purposes. A patient (patient_id) may have one or more observations in the data set, and these observations need to be kept together in the new smaller datasets. In other words, I can't simply split the large dataset into the first 20,000 obs, then the second 20,000, and so on.

How can I make sure to keep the observations for a particular patient (patient_id) together?

Kurt_Bremser · Posted 01-02-2019 01:39 PM

Sort by patient_id.
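
For instance, a minimal sketch of the sort, assuming the input dataset is called have (the name used in the data step in this post) and that sorting it in place is acceptable:

```sas
/* Sort in place so that all observations of a patient are adjacent. */
/* "have" is the dataset name used in the data step of this post.    */
proc sort data=have;
  by patient_id;
run;
```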

Run this data step:

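data intermediate;
  set have end=eof;
  by patient_id;
  /* counter: observations in the current chunk; ds_counter: chunk number */
  retain counter 0 ds_counter 1;
  counter + 1;
  /* start a new chunk only at a patient boundary, so a patient is never split */
  if first.patient_id and counter ge 20000 then do;
    ds_counter + 1;
    counter = 1;
  end;
  /* on the last observation, store the number of chunks in macro variable num_ds */
  if eof then call symputx('num_ds', ds_counter);
  drop counter;
run;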

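You now have the number of datasets in the macro variable num_ds (for use in a %do loop), and the variable ds_counter in every observation indicates which dataset it should go to. Because a new dataset only starts at the first observation of a patient, a dataset can run somewhat past 20,000 observations, but no patient is ever split.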
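
The %do loop mentioned here could look roughly like the following. This is only a sketch: the macro name split_by_ds_counter and the output prefix part_ are made up for illustration, and it assumes the data step above has already run in the same session so that intermediate and &num_ds exist.

```sas
/* Hypothetical macro: writes one dataset per value of ds_counter. */
/* The names split_by_ds_counter and part_ are illustrative only.  */
%macro split_by_ds_counter(prefix=part_);
  %local i;
  %do i = 1 %to &num_ds;
    data &prefix.&i;
      /* keep only the observations assigned to chunk &i */
      set intermediate(where=(ds_counter = &i));
      drop ds_counter;
    run;
  %end;
%mend split_by_ds_counter;

%split_by_ds_counter()
```

This reads intermediate once per output dataset, which keeps the code simple. If the file is very large, a single data step with one output statement per target dataset may be worth the extra macro code.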
