<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: The number of obs in datasets do not add up after MERGE in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676960#M204153</link>
    <description>&lt;P&gt;I can't see the numbers that you are looking at. If you would like feedback ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Post the number of observations you start with from each source (after you had removed duplicates), and the PROC FREQ results.&lt;/P&gt;</description>
    <pubDate>Sat, 15 Aug 2020 12:43:40 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2020-08-15T12:43:40Z</dc:date>
    <item>
      <title>The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676651#M204044</link>
      <description>&lt;P&gt;This might be a simple task but I could not figure it out. I am trying to merge 3 datasets together. if they dont merge by the variables, then split out to a nomatch dataset. Theoretically, the matches and no-matches number should add up to the number of obs in the datasets. Let's look at my code for better understanding&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data MATCH
	nomatch_A
	nomatch_B
	nomatch_C;

	merge data_A(in=A) 
		data_B(in=B)
		data_C(in=C);
	by var1 var2 var3 var4;
	if A and B and C then output MATCH;
	else if A and (not B or not C) then output nomatch_A;
	else if B and (not C or not A)  then output nomatch_B;
	else if C and (not A or not B) then output nomatch_C;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;so if A has 1,000 obs and there are 800 obs in the dataset MATCH, then nomatch_A should have 200 obs. However, I got a number less than 200. Why? where did the other go?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 07:25:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676651#M204044</guid>
      <dc:creator>somebody</dc:creator>
      <dc:date>2020-08-14T07:25:11Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676657#M204046</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think there is a problem with the logic behind datasets nomatch_B and nomatch_C in that they are only populated&lt;/P&gt;
&lt;P&gt;if the preceeding conditions are not matched.&lt;/P&gt;
&lt;P&gt;For example, if an observation is in A and B but not in C, then it will not appear in nomatch_B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try (no data =&amp;gt; not tested) :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if A and B and C then output MATCH;
else do;
    if A then output nomatch_A;
    if B then output nomatch_B;
    if C then output nomatch_C;
end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 08:29:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676657#M204046</guid>
      <dc:creator>gamotte</dc:creator>
      <dc:date>2020-08-14T08:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676707#M204066</link>
      <description>&lt;P&gt;I would recommend adjusting how you think about this problem.&amp;nbsp; Let's look at an approach first, and discuss a couple of the pitfalls later.&amp;nbsp; Consider:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
	merge data_A(in=A) 
		data_B(in=B)
		data_C(in=C);
	by var1 var2 var3 var4;
	match_pattern = '---';
	if A then substr(match_pattern, 1, 1) = 'A';
	if B then substr(match_pattern, 2, 1) = 'B';
	if C then substr(match_pattern, 3, 1) = 'C';
run;

proc freq data=want;
   tables match_pattern;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This creates a single variable that shows, for each observation, where it came from.&amp;nbsp; That should be sufficient for you to determine what should happen next.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Some of the issues may involve your expectations of what the number of observations should be.&amp;nbsp; To explore that, consider a simplified case where DATA_A and DATA_B both contain 10 observations, and we're merging by just one variable named ID.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test1;
   merge data_A data_B;
   by ID;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The number of observations generated could be anywhere from 10 to 20, depending on the number of matches and mismatches.&amp;nbsp; If you allow that either data set might contain duplicate values for the same ID, the complexity increases:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
   merge data_A (in=A) data_B;
   by ID;
   if A;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now the number of observations generated must be at least 10, but could be as many as 19.&amp;nbsp; The 19 would occur of DATA_B contains just a single ID, repeated 10 times, that matches one of the IDs in DATA_A.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If both DATA_A and DATA_B could contain duplicate observations for the same ID, the problem gets so complex that SAS issues a warning in the log ... something along the lines of "more than one data set contains multiple observations for ID".&amp;nbsp; So be sure to check the log for that type of message.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At any rate, the first program should at least give you a chance to figure this one out.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 12:31:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676707#M204066</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-08-14T12:31:25Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676752#M204077</link>
      <description>&lt;P&gt;I would suggest creating non-temporary variables with the values of the IN variables and you can examine explicitly which combinations of values are appearing "incorrectly" in which set and adjust logic accordingly.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 14:53:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676752#M204077</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-08-14T14:53:32Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676787#M204090</link>
      <description>&lt;P&gt;The answer lies in how you set up your clauses&lt;/P&gt;
&lt;P&gt;if A and (not B or not C)&amp;nbsp; translates to if A =1 and (B=0 or C=0)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; possible outcomes&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A=1 B=1 C=1&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A=1 B=0 C=1&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A=1 B=1 C=0&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;A=1 B=0 C=0&lt;/P&gt;
&lt;P&gt;if your objective is to truly capture the fourth case then the condition becomes A=1 and (B=0 and C=0) and the counts will add up&lt;/P&gt;
&lt;P&gt;The second and third conditions are captured by&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if B and (Not A or Not C), if C and (Not A or Not B) so the datasets are not mutually exclusive and hence the row count does not add up.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 16:34:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676787#M204090</guid>
      <dc:creator>smantha</dc:creator>
      <dc:date>2020-08-14T16:34:27Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676896#M204117</link>
      <description>&lt;P&gt;BTW, this is one of the nice features of the SUBSTR function - it can be put on the left of the equals sign in an assignment statement.&amp;nbsp; Just the ticket here.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2020 21:27:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676896#M204117</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-08-14T21:27:19Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676912#M204126</link>
      <description>&lt;P&gt;thank you for the detailed response. I have removed all duplicates so there would not be any obs with more than 1 match.&lt;/P&gt;
&lt;P&gt;Your first set of code helps to see where the obs come from, but I don't think it show the ones that are missing (because the numbers don't add up)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Aug 2020 01:37:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676912#M204126</guid>
      <dc:creator>somebody</dc:creator>
      <dc:date>2020-08-15T01:37:38Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676913#M204127</link>
      <description>&lt;P&gt;can you please explain a bit more on the "&lt;SPAN&gt;non-temporary variables"? what should I create?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Aug 2020 01:39:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676913#M204127</guid>
      <dc:creator>somebody</dc:creator>
      <dc:date>2020-08-15T01:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676919#M204131</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Refer to this region" style="width: 650px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/48267i1D8F6718E0B51CDC/image-size/large?v=v2&amp;amp;px=999" role="button" title="NIMG.png" alt="Refer to this region" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Refer to this region&lt;/span&gt;&lt;/span&gt;look at the image in your case if A and B and C is A intersection B Intersection C. A and (not B or Not C) will be the pink region and the other two cases are represented by Blue non overlapping parts of B and Green non overlapping parts of C. That means you will be missing intersection parts of A,B; A,C and BC respectively&lt;/P&gt;</description>
      <pubDate>Sat, 15 Aug 2020 02:12:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676919#M204131</guid>
      <dc:creator>smantha</dc:creator>
      <dc:date>2020-08-15T02:12:03Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676960#M204153</link>
      <description>&lt;P&gt;I can't see the numbers that you are looking at. If you would like feedback ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Post the number of observations you start with from each source (after you had removed duplicates), and the PROC FREQ results.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Aug 2020 12:43:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676960#M204153</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-08-15T12:43:40Z</dc:date>
    </item>
    <item>
      <title>Re: The number of obs in datasets do not add up after MERGE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676966#M204156</link>
      <description>&lt;P&gt;The are 8 possible ways to populate 3 binary variables.&amp;nbsp; One combination is impossible (all false).&amp;nbsp; Your program is only writing out 4.&amp;nbsp; &amp;nbsp;So either you skipped some of them or included them into multiple datasets.&lt;/P&gt;
&lt;P&gt;Here is how to get all 7 combinations.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data 
  MATCH
  A_ONLY
  B_ONLY
  C_ONLY
  AB_ONLY
  AC_ONLY
  BC_ONLY
;
  merge data_A(in=A)  data_B(in=B) data_C(in=C);
  by var1 var2 var3 var4;
  if A and B and C then output MATCH;
  else if A and not (B or C) then output A_ONLY;
  else if B and not (C or A)  then output B_ONLY;
  else if C and not (A or B) then output C_ONLY;
  else if A and B then output AB_ONLY;
  else if A and C then output AC_ONLY;
  else if B and C then output BC_ONLY;
  else put 'This combination is impossible. ' (_n_ a b c) (=);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Checking your logic it seems you do account for all 7 combinations.&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
 do a=1,0; do b=1,0; do c=1,0;
  length m1 m2 $20;
  if A and B and C then m1='MATCH';
  else if A and (not B or not C) then m1='nomatch_A';
  else if B and (not C or not A)  then m1='nomatch_B';
  else if C and (not A or not B) then m1='nomatch_C';
  else m1='none';
  if a and b and c then m2='MATCH';
  else if a and b and not c then m2='AB_ONLY';
  else if a and not b and c then m2='AC_ONLY';
  else if not a and b and c then m2='BC_ONLY';
  else if a then m2='A_ONLY';
  else if b then m2='B_ONLY';
  else if c then m2='C_ONLY';
  else m2='none';
  output;

 end; end; end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Not clear if this is exactly what you wanted.&lt;/P&gt;
&lt;PRE&gt;Obs    a    b    c    m1           m2

 1     1    1    1    MATCH        MATCH
 2     1    1    0    nomatch_A    AB_ONLY
 3     1    0    1    nomatch_A    AC_ONLY
 4     1    0    0    nomatch_A    A_ONLY
 5     0    1    1    nomatch_B    BC_ONLY
 6     0    1    0    nomatch_B    B_ONLY
 7     0    0    1    nomatch_C    C_ONLY
 8     0    0    0    none         none&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Aug 2020 14:18:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-number-of-obs-in-datasets-do-not-add-up-after-MERGE/m-p/676966#M204156</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-08-15T14:18:13Z</dc:date>
    </item>
  </channel>
</rss>

