I have a large file of inpatient stays that I need to split into smaller files (approx. 20,000 observations) for operational purposes. A patient (patient_id) may have one or more observations in the data set, these observations need to be kept together in the new smaller datasets, i.e., I can't split the large dataset into the first 20000 obs and then the second 20000 and so on.
How can I make sure to keep the observations for a particular patient (patient_ID) together?
Sort by patient_id.
Run this data step:
data intermediate;
set have end=eof;
by patient_id;
retain counter 0 ds_counter 1;
counter + 1;
if first.patient_id and counter ge 20000
then do;
ds_counter + 1;
counter = 1;
end;
if end then call symputx('num_ds',ds_counter);
drop counter;
run;
You now have the maximum number of datasets in the macro variable (for use in a %do loop), and an indicator in every observation where it should go.
Sort by patient_id.
Run this data step:
data intermediate;
set have end=eof;
by patient_id;
retain counter 0 ds_counter 1;
counter + 1;
if first.patient_id and counter ge 20000
then do;
ds_counter + 1;
counter = 1;
end;
if end then call symputx('num_ds',ds_counter);
drop counter;
run;
You now have the maximum number of datasets in the macro variable (for use in a %do loop), and an indicator in every observation where it should go.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.