04-23-2015 11:00 AM
Say I have data residing in datalines and want to use it to create two SAS datasets: one with all the data that's sitting in the datalines, and another that has duplicate observations from the original dataset with some new variables added. How do I go about doing so?
I know that if I wanted to create two datasets based on the value of a variable in the original dataset (e.g., say the original dataset has birth month as a variable), I could use subsetting IFs in tandem with OUTPUT statements to create multiple datasets. But how does one go about it when one dataset contains ALL the original data, and the other dataset ALSO contains ALL the original data plus some new variables?
04-23-2015 11:10 AM
data temp temp2;
input a $;
datalines;
<your data lines>
;
proc sort data=temp2 nodupkey; by a; run;
04-23-2015 11:11 AM
The general approach to building two datasets in a single data step is shown below. Use the data set options KEEP= or DROP= to select the desired variables in each dataset.
Data set1 (keep=<selected variables>) set2 (keep=<variables on inputlist>);
if <condition> then output set1;
output set2; /* an explicit OUTPUT turns off the automatic one, so write set2 here */
run;
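For the case where both datasets should receive every observation, a minimal sketch might look like the following. The dataset names, variable names, derived variable, and data lines are all invented for illustration, not taken from your data. With no OUTPUT statement in the step, each observation is written to every dataset named on the DATA statement, and KEEP= limits the first dataset to the original variables.

```sas
/* Sketch only: names and data lines below are hypothetical */
data all_data (keep=name birth_month)   /* original variables only */
     with_extras;                       /* original variables plus new ones */
  input name $ birth_month;
  /* hypothetical new variable: 1 if born in the first half of the year */
  first_half = (birth_month <= 6);
  datalines;
Ann 3
Bob 11
Ann 3
;
run;
```

Both datasets end up with the same observations; only the variable lists differ.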
However, your comment about "duplicate observations" makes me think that you aren't going to accomplish this in a single data step.
Please provide some example data and what you want as the final output. If your "duplicates" do not always occur in succession, include such a case in your example. Since you say the data is in datalines, please include the data step that reads them. You don't need to provide hundreds of lines of example data, though, just enough to demonstrate all of the rules you want to apply.