Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Re: Merge statement giving wrong result

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 01-16-2019 02:46 AM
(2004 views)

I have two tables-

I want to remove all the year+enrolids combinations which are common in both tables. Table1 has more observations than table2.

code-

```
data merge;
merge table1( in=a) table2(in=b);
by year enrolid;
if a and not b then output;
run;
```

Now the merge dataset should have observation count equal to count of table1 - count of table2. but that isnt the case.

Is there anything wrong with the code?

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @riyaaora275,

Let's look at *one* BY group, i.e., one YEAR-ENROLID combination which may occur *n* times in TABLE1 and *k* times in TABLE2 (*n, k* >= 0). Using your selection criterion (which looks reasonable), the number of observations with this particular YEAR-ENROLID combination in the output dataset will be either

*n*(if*k =*0) or- 0 (if
*k >*0).

Now you see that for this generic BY group alone your expected number of observations, *n − k*, will equal the actual number only if *k = *0 or *n = k*. Given that the total number of observations in the output dataset is simply the sum of the numbers in all BY groups, it's clear that this observation count can differ from "count of table1 *−* count of table2" in both directions. Kurt Bremser has shown an example of the case *n =* 0, *k =* 1 and Astounding has mentioned the cases *1 = n < k* and *n > k = 1* ("many-to-one match"). Cynthia's examples cover all possible cases with *n, k <=* 1.

13 REPLIES 13

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You most probably have a combination in table2 that is not present in table1, so that will up-end your calculation:

```
data table1;
input year enrolid;
cards;
2017 1
2018 1
2019 1
2017 2
2017 3
2018 3
;
run;
data table2;
input year enrolid;
cards;
2019 1
2018 4
;
run;
proc sort data=table1;
by year enrolid;
run;
proc sort data=table2;
by year enrolid;
run;
data merged;
merge table1( in=a) table2(in=b);
by year enrolid;
if a and not b then output;
run;
```

since "2018 4" is not present in table one, only one of the observations in table2 results in a delete.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This statement is incorrect if you want to remove the matches:

`if a and not b then output;`

Instead, use:

`if a and b then delete;`

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Astounding wrote:

This statement is incorrect if you want to remove the matches:

`if a and not b then output;`

Instead, use:

`if a and b then delete;`

But that would also keep all observations where only b is true.

If you want to use the delete statement, you'd need

```
if a;
if b then delete;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I stand by my original post. I know you know this, so perhaps it's time to take a coffee break?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I see. Your suggestion satisfies

"remove all the year+enrolids combinations which are common in both tables"

But it also does not satisfy

"observation count equal to count of table1 - count of table2"

when you apply it to my example data.

(both quotes from the original post)

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi:

For the benefit of the original poster. With 2 datasets going into the MERGE and understanding that SAS does not really do "MANY-to-MANY merges, you have 5 possible output datasets, as shown below:

Hope this helps put the MERGE in perspective and gives you some ideas about how to fix your MERGE. Without data from you, and a clear explanation of the rules for what you want, it is very hard for others to help you.

Cynthia

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You mean I should give some sample data ? Because I cannot give the datafiles.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Creating example data for testing with a data step is an essential SAS skill.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Astounding wrote:

This statement is incorrect if you want to remove the matches:

`if a and not b then output;`

Instead, use:

`if a and b then delete;`

Or just use a subsetting IF

` if a^=b;`

which can only be true when the observation originates from a single data set.

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Both of these give the same number of observations.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @riyaaora275,

Let's look at *one* BY group, i.e., one YEAR-ENROLID combination which may occur *n* times in TABLE1 and *k* times in TABLE2 (*n, k* >= 0). Using your selection criterion (which looks reasonable), the number of observations with this particular YEAR-ENROLID combination in the output dataset will be either

*n*(if*k =*0) or- 0 (if
*k >*0).

Now you see that for this generic BY group alone your expected number of observations, *n − k*, will equal the actual number only if *k = *0 or *n = k*. Given that the total number of observations in the output dataset is simply the sum of the numbers in all BY groups, it's clear that this observation count can differ from "count of table1 *−* count of table2" in both directions. Kurt Bremser has shown an example of the case *n =* 0, *k =* 1 and Astounding has mentioned the cases *1 = n < k* and *n > k = 1* ("many-to-one match"). Cynthia's examples cover all possible cases with *n, k <=* 1.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.