Hello,
I am attempting to merge two data sets and want to use the IN= statement to create a new column that tags the data as coming from data set 1, data set 2, or both. However, I am finding that despite putting IN= followed by logic statements later in the data step to assign a value for these three conditions, only one value is being written and the "both" value gets overwritten. An example with two data sets (taken from this paper by Joshua Horstman) is attached along with code for my attempt to merge the two data sets by planet and display whether the data came from the planet_size data set, the planet_dist data set, or both. This image shows what I am getting:
Note how the data_origin column only has values for 'dist', when in reality the observation for Earth should have a value of 'size' as only the planet_size data set contributed. The output that I would like to obtain is attached along with the code shown below. Any thoughts would be greatly appreciated, thanks in advance!
data work.planet_size; input planet $ diam_mi; datalines; Earth 7918 Jupiter 86881 Mars 4212 Mercury 3032 Neptune 30599 Saturn 72367 Uranus 31518 Venus 7521 ; data work.planet_dist; input planet $ dist_au; datalines; Jupiter 4.2 Mars 0.52 Mercury 0.61 Neptune 29.06 Saturn 8.54 Uranus 18.14 Venus 0.28 ; data work.planet_size_dist; merge work.planet_size(in=s) work.planet_dist(in=d); by planet; if s and not d then data_origin = 'dist'; else if d and not s then data_origin = 'size'; else if s and d then data_origin = 'dist_size'; run; /* Desired Output */ PLANET DIAM_MI DIST_AU data_origin Earth 7918 . size Jupiter 86881 4.2 dist_size Mars 4212 0.52 dist_size Mercury 3032 0.61 dist_size Neptune 30599 29.06 dist_size Saturn 72367 8.54 dist_size Uranus 31518 18.14 dist_size Venus 7521 0.28 dist_size
you need to put a length on data_origin.
it will truncate the value without a proper length statement. try running this:
data work.planet_size_dist;
length data_origin $9;
merge work.planet_size(in=s)
work.planet_dist(in=d);
by planet;
if s and not d then data_origin = 'dist';
else if d and not s then data_origin = 'size';
else if s and d then data_origin = 'dist_size';
proc print;
run;
you need to put a length on data_origin.
it will truncate the value without a proper length statement. try running this:
data work.planet_size_dist;
length data_origin $9;
merge work.planet_size(in=s)
work.planet_dist(in=d);
by planet;
if s and not d then data_origin = 'dist';
else if d and not s then data_origin = 'size';
else if s and d then data_origin = 'dist_size';
proc print;
run;
Thank you @tarheel13 , this worked brilliantly!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.