I have a large file of inpatient stays that I need to split into smaller files (approx. 20,000 observations) for operational purposes. A patient (patient_id) may have one or more observations in the data set, these observations need to be kept together in the new smaller datasets, i.e., I can't split the large dataset into the first 20000 obs and then the second 20000 and so on.
How can I make sure to keep the observations for a particular patient (patient_ID) together?
Sort by patient_id.
Run this data step:
data intermediate;
set have end=eof;
by patient_id;
retain counter 0 ds_counter 1;
counter + 1;
if first.patient_id and counter ge 20000
then do;
ds_counter + 1;
counter = 1;
end;
if end then call symputx('num_ds',ds_counter);
drop counter;
run;
You now have the maximum number of datasets in the macro variable (for use in a %do loop), and an indicator in every observation where it should go.
Sort by patient_id.
Run this data step:
data intermediate;
set have end=eof;
by patient_id;
retain counter 0 ds_counter 1;
counter + 1;
if first.patient_id and counter ge 20000
then do;
ds_counter + 1;
counter = 1;
end;
if end then call symputx('num_ds',ds_counter);
drop counter;
run;
You now have the maximum number of datasets in the macro variable (for use in a %do loop), and an indicator in every observation where it should go.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.