Hello all,
I'm looking to split a data set based on one of its variables. The drug column currently holds two medications, drug a and drug b. I want one data set that includes only drug a and another that includes only drug b.
The code I tried is below, but it did not work. What is the best way to correct it?
Thank you
Data mydata;
Set sample;
Where location= "country" and code= "anti-infective";
Keep drug vol time_period;
Run;
data mydata.a;
set mydata;
keep drug= a;
run;
data mydata.b;
set mydata;
keep drug= b;
run;
Generally, splitting up a data set is not necessary and not productive. So let's abandon that idea.
If you want to perform an analysis using a certain PROC on part of the data set, you can do this:
proc something data=mydata(where=(drug='a'));
or better yet, you can use BY statements to have analyses performed on both drug A and drug B with one piece of code.
proc something data=mydata;
by drug;
/* Other statements for the PROC go here */
run;
which assumes that MYDATA is sorted by DRUG.
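As a rough sketch of the BY approach with the sort step included: PROC MEANS here is only a stand-in for whatever analysis is actually needed, and treating VOL as a numeric analysis variable is an assumption based on the KEEP list in the question.

```sas
/* BY-group processing needs MYDATA sorted (or indexed) by DRUG */
proc sort data=mydata;
    by drug;
run;

/* PROC MEANS is a placeholder analysis (assumption); VOL is assumed numeric */
proc means data=mydata n mean std;
    by drug;
    var vol;
run;
```

The output contains one set of statistics per value of DRUG, so no separate data sets are required.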
Amazing, thank you! I'll give this a try.
To answer your question directly as asked: KEEP selects variables in a data set, not observations.
WHERE filters observations in a data set. Changing KEEP to WHERE in your code will get you the desired results.
data mydata.a;
set mydata;
where drug= "a"; *note that this is case sensitive as well;
run;
data mydata.b;
set mydata;
where drug= "b";
run;
However, @PaigeMiller is correct: you usually do not want to do this, and his solution is preferable.
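If separate data sets really are needed, a single DATA step can write both in one pass over the input. This is a minimal sketch, not the only way to do it. DRUG_A and DRUG_B are hypothetical one-level names chosen for illustration, so they land in the WORK library.

```sas
/* One pass over MYDATA, two output data sets.
   DRUG_A and DRUG_B are hypothetical names, not from the original post. */
data drug_a drug_b;
    set mydata;
    if drug = "a" then output drug_a;        /* comparison is case sensitive */
    else if drug = "b" then output drug_b;   /* any other value is written nowhere */
run;
```

Note that a two-level name such as mydata.a is read as data set A in a library called MYDATA, so it only works if a libref with that name has been assigned.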
Thank you
The KEEP statement and the KEEP= dataset option control which variables go into the output, not which observations. Use a WHERE statement, like you did in your first step.
Thank you!