BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ajd555
Fluorite | Level 6

Hello,

 

I am attempting to merge two data sets and want to use the IN= statement to create a new column that tags the data as coming from data set 1, data set 2, or both. However, I am finding that despite putting IN= followed by logic statements later in the data step to assign a value for these three conditions, only one value is being written and the "both" value gets overwritten. An example with two data sets (taken from this paper by Joshua Horstman) is attached along with code for my attempt to merge the two data sets by planet and display whether the data came from the planet_size data set, the planet_dist data set, or both. This image shows what I am getting:

Screenshot 2022-11-26 at 17.19.59.png

Note how the data_origin column only has values for 'dist', when in reality the observation for Earth should have a value of 'size' as only the planet_size data set contributed. The output that I would like to obtain is attached along with the code shown below. Any thoughts would be greatly appreciated, thanks in advance!

 

data work.planet_size;
	input planet $ diam_mi;
datalines;
Earth 7918
Jupiter 86881
Mars 4212
Mercury 3032
Neptune 30599
Saturn 72367
Uranus 31518
Venus 7521
;

data work.planet_dist;
	input planet $ dist_au;
datalines;
Jupiter 4.2
Mars 0.52
Mercury 0.61
Neptune 29.06
Saturn 8.54
Uranus 18.14
Venus 0.28
;

data work.planet_size_dist;
	merge work.planet_size(in=s)
		work.planet_dist(in=d);
	by planet;
	if s and not d then data_origin = 'dist';
	else if d and not s then data_origin = 'size';
	else if s and d then data_origin = 'dist_size';
run;

/* Desired Output */
PLANET 	DIAM_MI 	DIST_AU		data_origin
Earth 	7918 		.		size
Jupiter 86881 		4.2		dist_size
Mars 	4212 		0.52		dist_size
Mercury 3032 		0.61		dist_size
Neptune 30599 		29.06		dist_size
Saturn 	72367 		8.54		dist_size
Uranus 	31518 		18.14		dist_size
Venus 	7521 		0.28		dist_size
1 ACCEPTED SOLUTION

Accepted Solutions
tarheel13
Rhodochrosite | Level 12

you need to put a length on data_origin. 

tarheel13_0-1669510820996.png

it will truncate the value without a proper length statement. try running this: 

data work.planet_size_dist;
   length data_origin $9;
   merge work.planet_size(in=s)
      work.planet_dist(in=d);
   by planet;
   if s and not d then data_origin = 'dist';
   else if d and not s then data_origin = 'size';
   else if s and d then data_origin = 'dist_size';
proc print;
run;

 

View solution in original post

2 REPLIES 2
tarheel13
Rhodochrosite | Level 12

you need to put a length on data_origin. 

tarheel13_0-1669510820996.png

it will truncate the value without a proper length statement. try running this: 

data work.planet_size_dist;
   length data_origin $9;
   merge work.planet_size(in=s)
      work.planet_dist(in=d);
   by planet;
   if s and not d then data_origin = 'dist';
   else if d and not s then data_origin = 'size';
   else if s and d then data_origin = 'dist_size';
proc print;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 430 views
  • 2 likes
  • 2 in conversation