BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ajd555
Fluorite | Level 6

Hello,

 

I am attempting to merge two data sets and want to use the IN= statement to create a new column that tags the data as coming from data set 1, data set 2, or both. However, I am finding that despite putting IN= followed by logic statements later in the data step to assign a value for these three conditions, only one value is being written and the "both" value gets overwritten. An example with two data sets (taken from this paper by Joshua Horstman) is attached along with code for my attempt to merge the two data sets by planet and display whether the data came from the planet_size data set, the planet_dist data set, or both. This image shows what I am getting:

Screenshot 2022-11-26 at 17.19.59.png

Note how the data_origin column only has values for 'dist', when in reality the observation for Earth should have a value of 'size' as only the planet_size data set contributed. The output that I would like to obtain is attached along with the code shown below. Any thoughts would be greatly appreciated, thanks in advance!

 

data work.planet_size;
	input planet $ diam_mi;
datalines;
Earth 7918
Jupiter 86881
Mars 4212
Mercury 3032
Neptune 30599
Saturn 72367
Uranus 31518
Venus 7521
;

data work.planet_dist;
	input planet $ dist_au;
datalines;
Jupiter 4.2
Mars 0.52
Mercury 0.61
Neptune 29.06
Saturn 8.54
Uranus 18.14
Venus 0.28
;

data work.planet_size_dist;
	merge work.planet_size(in=s)
		work.planet_dist(in=d);
	by planet;
	if s and not d then data_origin = 'dist';
	else if d and not s then data_origin = 'size';
	else if s and d then data_origin = 'dist_size';
run;

/* Desired Output */
PLANET 	DIAM_MI 	DIST_AU		data_origin
Earth 	7918 		.		size
Jupiter 86881 		4.2		dist_size
Mars 	4212 		0.52		dist_size
Mercury 3032 		0.61		dist_size
Neptune 30599 		29.06		dist_size
Saturn 	72367 		8.54		dist_size
Uranus 	31518 		18.14		dist_size
Venus 	7521 		0.28		dist_size
1 ACCEPTED SOLUTION

Accepted Solutions
tarheel13
Rhodochrosite | Level 12

you need to put a length on data_origin. 

tarheel13_0-1669510820996.png

it will truncate the value without a proper length statement. try running this: 

data work.planet_size_dist;
   length data_origin $9;
   merge work.planet_size(in=s)
      work.planet_dist(in=d);
   by planet;
   if s and not d then data_origin = 'dist';
   else if d and not s then data_origin = 'size';
   else if s and d then data_origin = 'dist_size';
proc print;
run;

 

View solution in original post

2 REPLIES 2
tarheel13
Rhodochrosite | Level 12

you need to put a length on data_origin. 

tarheel13_0-1669510820996.png

it will truncate the value without a proper length statement. try running this: 

data work.planet_size_dist;
   length data_origin $9;
   merge work.planet_size(in=s)
      work.planet_dist(in=d);
   by planet;
   if s and not d then data_origin = 'dist';
   else if d and not s then data_origin = 'size';
   else if s and d then data_origin = 'dist_size';
proc print;
run;

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 643 views
  • 2 likes
  • 2 in conversation