Hi everybody!
Suppose I have three tables named TABLE1, CODES1 and CODES:
----------------------------------------------------
TABLE1
name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)
----------------------------
CODES1
category_broad, code
1 (foods), 34
2 (things), 35
----------------------------
CODES2
category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565
----------------------------------------------------
What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.
So the desired resulting tables would be:
TABLE1xCODES1
name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34
TABLE1xCODES2
name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464
I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:
-------------------------------------
proc sort data=table1;
by category_exact;
run;
proc sort data=codes2;
by category_exact;
run;
data merged;
merge table1 codes2;
by category_exact;
run;
-------------------------------------------
The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂
@noobnbrnoobest wrote:
Hi everybody!
Suppose I have three tables named TABLE1, CODES1 and CODES:
----------------------------------------------------
TABLE1
name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)----------------------------
CODES1
category_broad, code
1 (foods), 34
2 (things), 35----------------------------
CODES2
category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565----------------------------------------------------
What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.
So the desired resulting tables would be:
TABLE1xCODES1name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34
TABLE1xCODES2name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464
I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:
-------------------------------------
proc sort data=table1;by category_exact;
run;
proc sort data=codes2;
by category_exact;
run;
data merged;
merge table1 codes2;
by category_exact;
run;
-------------------------------------------
The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂
data merged;
merge table1(in=in1) codes2(in=in2);
by category_exact;
if in1 and in2;
run;
@noobnbrnoobest wrote:
Hi everybody!
Suppose I have three tables named TABLE1, CODES1 and CODES:
----------------------------------------------------
TABLE1
name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)----------------------------
CODES1
category_broad, code
1 (foods), 34
2 (things), 35----------------------------
CODES2
category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565----------------------------------------------------
What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.
So the desired resulting tables would be:
TABLE1xCODES1name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34
TABLE1xCODES2name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464
I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:
-------------------------------------
proc sort data=table1;by category_exact;
run;
proc sort data=codes2;
by category_exact;
run;
data merged;
merge table1 codes2;
by category_exact;
run;
-------------------------------------------
The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂
data merged;
merge table1(in=in1) codes2(in=in2);
by category_exact;
if in1 and in2;
run;
There are many data set options that can be used in the SET and MERGE statements which may come in handy at some time in the future. You might want to bookmark this list of data set options
https://documentation.sas.com/doc/en/pgmmvacdc/9.4/ledsoptsref/p1pczmnhbq4axpn1l15s9mk6mobp.htm
Well this was quickly resolved with this great documentation:
https://support.sas.com/resources/papers/proceedings18/1746-2018.pdf
(at "EXAMPLE 3: UNMATCHED BY VALUE")
Turned out there's a way to specify that the other table will have all the distinct possible values for the by-variable:
data table1xcodes2;
merge codes2 (in=isfound) table1;
by category_exact;
if isfound;
run;
EDIT: What finally helped me understand this code was the explanation, that the "(in=variable)" -option creates a temporary variable, that gets values 0 and 1 for all rows. Then when it is called in the if-statement it does the magic.
"Any records with a value of [the by-variable "category_exact"] that did not appear in the [table1] dataset will have a value of 0 for [isfound]"
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.