SAS Programming

noobnbrnoobest · Posted 06-12-2023 07:12 AM

Hi everybody!

Suppose I have three tables named TABLE1, CODES1 and CODES:

----------------------------------------------------

TABLE1

name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)

----------------------------

CODES1

category_broad, code
1 (foods), 34
2 (things), 35

----------------------------

CODES2

category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565

----------------------------------------------------

What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.

So the desired resulting tables would be:

TABLE1xCODES1

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34

TABLE1xCODES2

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464

I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:

-------------------------------------
proc sort data=table1;

by category_exact;

run;

proc sort data=codes2;

by category_exact;

run;

data merged;

merge table1 codes2;

by category_exact;

run;

-------------------------------------------

The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂

PaigeMiller · Posted 06-12-2023 07:27 AM

@noobnbrnoobest wrote:

Hi everybody!

Suppose I have three tables named TABLE1, CODES1 and CODES:

----------------------------------------------------

TABLE1

name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)

----------------------------

CODES1

category_broad, code
1 (foods), 34
2 (things), 35

----------------------------

CODES2

category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565

----------------------------------------------------

What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.

So the desired resulting tables would be:

TABLE1xCODES1

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34

TABLE1xCODES2

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464

I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:

-------------------------------------
proc sort data=table1;

by category_exact;

run;

proc sort data=codes2;

by category_exact;

run;

data merged;

merge table1 codes2;

by category_exact;

run;

-------------------------------------------

The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂

data merged;
    merge table1(in=in1) codes2(in=in2);
    by category_exact;
    if in1 and in2;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 06-12-2023 07:27 AM

@noobnbrnoobest wrote:

Hi everybody!

Suppose I have three tables named TABLE1, CODES1 and CODES:

----------------------------------------------------

TABLE1

name, category_broad, category_exact
BANANA, 1 (foods), 101 (fruits)
APPLE, 1 (foods), 101 (fruits)
CAR, 2 (things), 201 (cars)
HOUSE, 2 (things), 202 (houses)
BALL, 2 (things), 203 (balls)
BREAD, 1 (foods), 102 (breads)

----------------------------

CODES1

category_broad, code
1 (foods), 34
2 (things), 35

----------------------------

CODES2

category_exact, code
101 (fruits), 463
102 (breads), 464
202 (houses), 564
203 (balls), 565

----------------------------------------------------

What I would like to achieve is two merges:
-first merge where if the category_broad from codes1 matches with category_broad from table1, then the code is added to the row that matched, and only matched rows are included.
-second merge where if the category_exact from codes2 matches with category_exact from table1, then the code is added to the row that matched, and only matched rows are included. So for example the row with "car" would be left out from the merge with codes2 table.

So the desired resulting tables would be:

TABLE1xCODES1

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 34
APPLE, 1 (foods), 101 (fruits), 34
CAR, 2 (things), 201 (cars), 35
HOUSE, 2 (things), 202 (houses), 35
BALL, 2 (things), 203 (balls), 35
BREAD, 1 (foods), 102 (breads), 34

TABLE1xCODES2

name, category_broad, category_exact, code
BANANA, 1 (foods), 101 (fruits), 463
APPLE, 1 (foods), 101 (fruits), 463
HOUSE, 2 (things), 202 (houses), 564
BALL, 2 (things), 203 (balls), 565
BREAD, 1 (foods), 102 (breads), 464

I know this should not be too difficult but I am having trouble in achieving this.
Regarding the second merge, I have so far tried using this:

-------------------------------------
proc sort data=table1;

by category_exact;

run;

proc sort data=codes2;

by category_exact;

run;

data merged;

merge table1 codes2;

by category_exact;

run;

-------------------------------------------

The problem with this approach is that it brings to the resulting table also the row that did not match, which is the CAR row. And in that row, the value for column "code" is missing. This is something I don't understand regarding the 'merge by' option in SAS, since the by-variable did not match there. What is the correct way to solve this? Thank you so much in advance! 🙂

data merged;
    merge table1(in=in1) codes2(in=in2);
    by category_exact;
    if in1 and in2;
run;

--
Paige Miller

noobnbrnoobest · Posted 06-12-2023 07:31 AM

Thank you, this looks like the way to solve this once and for all!

PaigeMiller · Posted 06-12-2023 08:03 AM

There are many data set options that can be used in the SET and MERGE statements which may come in handy at some time in the future. You might want to bookmark this list of data set options

https://documentation.sas.com/doc/en/pgmmvacdc/9.4/ledsoptsref/p1pczmnhbq4axpn1l15s9mk6mobp.htm

--
Paige Miller

noobnbrnoobest · Posted 06-12-2023 07:28 AM

Well this was quickly resolved with this great documentation:
https://support.sas.com/resources/papers/proceedings18/1746-2018.pdf
(at "EXAMPLE 3: UNMATCHED BY VALUE")

Turned out there's a way to specify that the other table will have all the distinct possible values for the by-variable:

data table1xcodes2;
merge codes2 (in=isfound) table1;
by category_exact;
if isfound;
run;

EDIT: What finally helped me understand this code was the explanation, that the "(in=variable)" -option creates a temporary variable, that gets values 0 and 1 for all rows. Then when it is called in the if-statement it does the magic.
"Any records with a value of [the by-variable "category_exact"] that did not appear in the [table1] dataset will have a value of 0 for [isfound]"

SAS Programming

How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Follow Us

What is...

SAS Programming

How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Re: How to merge tables, returning only the rows that matched in the by variable?

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...