Hello All,
I have two datasets with a common variable -number, I am trying to merge them.
Datset1-
Id Number place
1 20 A
2 25 B
3 30 C
Dataset2-
Number country
20 US
20 US
20 US
21 CA
22 US
25 UK
What I would like to have a data set where the number is common
Dataset - Want
Id Number place Country
1 20 A US
1 20 US
1 20 US
2 25 B UK
I tried using the following statement and got the following output
Statement:
data want;
set dataset1(in=a) dataset2(in=b);
by number;
if a and b then output;
run;
Output -Want
Id Number place Country
1 20 A US
1 20 A US
1 20 A US
1 20 A US
2 25 B UK
Can anyone help?
Please watch the spelling of things like data set names.
If at all possible provide data in the form of a data step and paste code into a code box opened on the forum with the </> icon. Text pasted into the main message boxes will get reformatted by the forum and may result in "code" that no longer works when someone else copies and tries to use it, such as testing code against your data.
Consider:
data Dataset1; input Id Number place $; datalines; 1 20 A 2 25 B 3 30 C ; data Dataset2; input number country $; datalines; 20 US 20 US 20 US 21 CA 22 US 25 UK ; data want; merge dataset1 (in=in1) dataset2 (in=in2) ; by number; if not first.number then call missing(place); if in1 and in2; run;
You didn't want a SET statement, which appends or stacks data sets, MERGE combines them row by row.
The BY number now tries to align values based on the value of number.
The "if in1 and in2" only keeps records that appear with values from both sets (i.e. matching Number).
Since you have a peculiar requirement to have Place missing when multiple number matches occur the that is done with the call missing to set the Place to missing except for the first value of each number match.
When you use BY in a data step SAS provides automatic variables First. and Last. (please notice the dots) that are used to tell if a specific row has the First or last of each by group value. They are true/false boolean values that can be used as shown.
If you have multiple values of number in both sets then you have many-to-many merge and it is very likely that a data step merge won't get what you want.
Please watch the spelling of things like data set names.
If at all possible provide data in the form of a data step and paste code into a code box opened on the forum with the </> icon. Text pasted into the main message boxes will get reformatted by the forum and may result in "code" that no longer works when someone else copies and tries to use it, such as testing code against your data.
Consider:
data Dataset1; input Id Number place $; datalines; 1 20 A 2 25 B 3 30 C ; data Dataset2; input number country $; datalines; 20 US 20 US 20 US 21 CA 22 US 25 UK ; data want; merge dataset1 (in=in1) dataset2 (in=in2) ; by number; if not first.number then call missing(place); if in1 and in2; run;
You didn't want a SET statement, which appends or stacks data sets, MERGE combines them row by row.
The BY number now tries to align values based on the value of number.
The "if in1 and in2" only keeps records that appear with values from both sets (i.e. matching Number).
Since you have a peculiar requirement to have Place missing when multiple number matches occur the that is done with the call missing to set the Place to missing except for the first value of each number match.
When you use BY in a data step SAS provides automatic variables First. and Last. (please notice the dots) that are used to tell if a specific row has the First or last of each by group value. They are true/false boolean values that can be used as shown.
If you have multiple values of number in both sets then you have many-to-many merge and it is very likely that a data step merge won't get what you want.
Thank you for your reply.
I will keep in mind and use the </> icon.
In my real dataset, I have many variables like "place". Can you please advice how should I go about that?
@ihtishamsultan wrote:
Thank you for your reply.
I will keep in mind and use the </> icon.
In my real dataset, I have many variables like "place". Can you please advice how should I go about that?
Are they all supposed to be set missing the same way? Then place all the variable names separated by commas like Call Missing(place, othervar, thatvar).
Call missing is one of the few functions that will handle multiple variables of different types, so you can mix numeric and character functions.
If you need something else done with other variables then you will need to provide example data and rules for what is to be done.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.