BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
riyaaora275
Obsidian | Level 7

I have two tables-

I want to remove all the year+enrolids combinations which are common in both tables. Table1 has more observations than table2.

 

code-

data merge;
merge table1( in=a) table2(in=b);
by year enrolid;
if a and not b then output;
run;

Now the merge dataset should have observation count equal to count of table1 - count of table2. but that isnt the case.

Is there anything wrong with the code?  

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @riyaaora275,

 

Let's look at one BY group, i.e., one YEAR-ENROLID combination which may occur n times in TABLE1 and k times in TABLE2 (n, k >= 0). Using your selection criterion (which looks reasonable), the number of observations with this particular YEAR-ENROLID combination in the output dataset will be either

  • n (if k = 0) or
  • 0 (if k > 0).

Now you see that for this generic BY group alone your expected number of observations, n − k, will equal the actual number only if k = 0 or n = k. Given that the total number of observations in the output dataset is simply the sum of the numbers in all BY groups, it's clear that this observation count can differ from "count of table1 count of table2" in both directions. Kurt Bremser has shown an example of the case n = 0, k = 1 and Astounding has mentioned the cases 1 = n < k and n > k = 1 ("many-to-one match"). Cynthia's examples cover all possible cases with n, k <= 1.

View solution in original post

13 REPLIES 13
Kurt_Bremser
Super User

You most probably have a combination in table2 that is not present in table1, so that will up-end your calculation:

data table1;
input year enrolid;
cards;
2017 1
2018 1
2019 1
2017 2
2017 3
2018 3
;
run;

data table2;
input year enrolid;
cards;
2019 1
2018 4
;
run;

proc sort data=table1;
by year enrolid;
run;

proc sort data=table2;
by year enrolid;
run;

data merged;
merge table1( in=a) table2(in=b);
by year enrolid;
if a and not b then output;
run;

since "2018 4" is not present in table one, only one of the observations in table2 results in a delete.

Astounding
PROC Star

This statement is incorrect if you want to remove the matches:

 

if a and not b then output;

 

 

Instead, use:

 

if a and b then delete;

 

Kurt_Bremser
Super User

@Astounding wrote:

This statement is incorrect if you want to remove the matches:

 

if a and not b then output;

 

 

Instead, use:

 

if a and b then delete;

 


But that would also keep all observations where only b is true.

If you want to use the delete statement, you'd need

if a;
if b then delete;
Astounding
PROC Star

@Kurt_Bremser,

 

I stand by my original post.  I know you know this, so perhaps it's time to take a coffee break?

Kurt_Bremser
Super User

I see. Your suggestion satisfies

"remove all the year+enrolids combinations which are common in both tables"

But it also does not satisfy

"observation count equal to count of table1 - count of table2"

when you apply it to my example data.

(both quotes from the original post)

 

Astounding
PROC Star

If only I could write a program that would revise someone's expectations!  Alas, I'm limited to writing programs that get the correct result, and then educating to influence the expectations.  So far, the issue of mismatches has come up, but there is also the possibility of a many-to-one match.

Cynthia_sas
SAS Super FREQ

Hi:

  For the benefit of the original poster. With 2 datasets going into the MERGE and understanding that SAS does not really do "MANY-to-MANY merges, you have 5 possible output datasets, as shown below:

_5_possible_outputs.png

Hope this helps put the MERGE in perspective and gives you some ideas about how to fix your MERGE. Without data from you, and a clear explanation of the rules for what you want, it is very hard for others to help you.

 

Cynthia

riyaaora275
Obsidian | Level 7
You mean I should give some sample data ? Because I cannot give the datafiles.
mkeintz
PROC Star

@Astounding wrote:

This statement is incorrect if you want to remove the matches:

 

if a and not b then output;

 

 

Instead, use:

 

if a and b then delete;

 


Or just use a subsetting IF

  if a^=b;

which can only be true when the observation originates from a single data set.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
riyaaora275
Obsidian | Level 7

Both of these give the same number of observations.

FreelanceReinh
Jade | Level 19

Hi @riyaaora275,

 

Let's look at one BY group, i.e., one YEAR-ENROLID combination which may occur n times in TABLE1 and k times in TABLE2 (n, k >= 0). Using your selection criterion (which looks reasonable), the number of observations with this particular YEAR-ENROLID combination in the output dataset will be either

  • n (if k = 0) or
  • 0 (if k > 0).

Now you see that for this generic BY group alone your expected number of observations, n − k, will equal the actual number only if k = 0 or n = k. Given that the total number of observations in the output dataset is simply the sum of the numbers in all BY groups, it's clear that this observation count can differ from "count of table1 count of table2" in both directions. Kurt Bremser has shown an example of the case n = 0, k = 1 and Astounding has mentioned the cases 1 = n < k and n > k = 1 ("many-to-one match"). Cynthia's examples cover all possible cases with n, k <= 1.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1745 views
  • 0 likes
  • 6 in conversation