<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why the merged data has more obs than before in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872242#M344599</link>
    <description>&lt;P&gt;In addition to the comments above: if you only want the PIDs that exist in HEPA, use a conditional merge or an SQL join. As long as no PID has more rows in LC.FAM than in HEPA, that gives you the&amp;nbsp;&lt;SPAN&gt;50032 obs.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;e.g.:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;data combined1;
    merge LC.FAM (in=a)  hepa (in=b);
    by PID;
    if b;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Apr 2023 13:35:19 GMT</pubDate>
    <dc:creator>A_Kh</dc:creator>
    <dc:date>2023-04-26T13:35:19Z</dc:date>
    <item>
      <title>Why the merged data has more obs than before</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872078#M344543</link>
      <description>&lt;P&gt;I am trying to merge two datasets. Before merging, I used PROC COMPARE to check them; the result is below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2023-04-25 202057.png" style="width: 670px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/83253i1C19DC4FA5AF21A2/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-04-25 202057.png" alt="Screenshot 2023-04-25 202057.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The code I used to merge the datasets is:&lt;/P&gt;&lt;PRE&gt;proc sort data=LC.FAM;
by pid;
run;

proc sort data=hepa;
by PID;
run;

data combined1;
    merge LC.FAM  hepa ;
    by PID;
run;&lt;/PRE&gt;&lt;P&gt;The merged dataset has 50273 rows, which is more than hepa has (50032 obs). Does anyone know why this happens?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Apr 2023 00:28:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872078#M344543</guid>
      <dc:creator>LarissaW</dc:creator>
      <dc:date>2023-04-26T00:28:23Z</dc:date>
    </item>
    <item>
      <title>Re: Why the merged data has more obs than before</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872080#M344544</link>
      <description>&lt;P&gt;This can happen if there are duplicate values of pid in the data, or if there are mismatches (e.g. values for pid in lc.fam that are not in hepa).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
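&lt;P&gt;A quick way to check for both cases, as a rough sketch. It assumes only the dataset and variable names from your post (hepa, LC.FAM, pid) and has not been run against your data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* pid values that occur more than once in hepa */
  select pid, count(*) as n_rows
    from hepa
    group by pid
    having count(*) gt 1;

  /* pid values that occur more than once in LC.FAM */
  select pid, count(*) as n_rows
    from LC.FAM
    group by pid
    having count(*) gt 1;

  /* number of distinct pid values found in LC.FAM but not in hepa */
  select count(distinct pid) as fam_only_pids
    from LC.FAM
    where pid not in (select pid from hepa);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the first two queries return no rows, the extra observations in the merge should all come from the pid values counted by the third query.&lt;/P&gt;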
&lt;P&gt;Running PROC COMPARE is an interesting idea.&amp;nbsp; PROC COMPARE is usually used to compare the values of variables, but it will also tell you if you have duplicate ID values or mismatches.&amp;nbsp; What do you get if you run:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc compare base=hepa compare=lc.fam ;
  id pid ; *use an ID statement here, not a BY statement;
run ;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Apr 2023 00:43:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872080#M344544</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2023-04-26T00:43:16Z</dc:date>
    </item>
    <item>
      <title>Re: Why the merged data has more obs than before</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872097#M344545</link>
      <description>&lt;P&gt;There is no way to know in advance how many observations the merge will generate unless you know the values of PID in both datasets.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If HEPA has 100 observations that are each a distinct value of PID&lt;/P&gt;
&lt;P&gt;And FAM has 10 observations that are each a distinct value of PID&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then merging them can result in anywhere from 100 observations (all of the values of PID in FAM already exist in HEPA) to 110 observations (none of the values of PID in FAM exist in HEPA).&lt;/P&gt;
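&lt;P&gt;A minimal sketch of that arithmetic, using two made-up datasets (HEPA_DEMO, FAM_DEMO and MERGED_DEMO are hypothetical names, not your data). Here 5 of the 10 PIDs in FAM_DEMO already exist in HEPA_DEMO, so the result lands in the middle of that range:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical data: 100 distinct PIDs, 1 to 100 */
data hepa_demo;
  do pid = 1 to 100;
    output;
  end;
run;

/* hypothetical data: 10 distinct PIDs, 96 to 105, so 5 overlap with hepa_demo */
data fam_demo;
  do pid = 96 to 105;
    output;
  end;
run;

/* both datasets are created in PID order, so no PROC SORT is needed here */
data merged_demo;
  merge hepa_demo fam_demo;
  by pid;
run;

/* merged_demo has 105 observations:                        */
/* 100 from hepa_demo plus the 5 PIDs found only in fam_demo */&lt;/CODE&gt;&lt;/PRE&gt;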
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And if either dataset has multiple observations for the same PID, the count changes again: for each PID the merge outputs as many observations as the dataset with the most rows for that PID.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Apr 2023 03:14:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872097#M344545</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-04-26T03:14:17Z</dc:date>
    </item>
    <item>
      <title>Re: Why the merged data has more obs than before</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872112#M344546</link>
      <description>&lt;P&gt;This has nothing to do with graphics, so I moved this to the general Programming community.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Apr 2023 05:18:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872112#M344546</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-04-26T05:18:14Z</dc:date>
    </item>
    <item>
      <title>Re: Why the merged data has more obs than before</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872242#M344599</link>
      <description>&lt;P&gt;In addition to the comments above: if you only want the PIDs that exist in HEPA, use a conditional merge or an SQL join. As long as no PID has more rows in LC.FAM than in HEPA, that gives you the&amp;nbsp;&lt;SPAN&gt;50032 obs.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;e.g.:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;data combined1;
    merge LC.FAM (in=a)  hepa (in=b);
    by PID;
    if b;
run;&lt;/PRE&gt;
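&lt;P&gt;For the SQL route, a rough sketch (COMBINED_SQL is a made-up table name). A left join from hepa keeps every hepa row, but it returns exactly 50032 rows only if PID is unique in LC.FAM, and when a PID repeats in both tables SQL pairs every row with every row, which a DATA step merge does not. Treat it as a starting point rather than a drop-in replacement:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* h.* is listed first so that hepa's copy of PID, and of any other   */
  /* variable the two tables share, is the one kept; SAS writes a       */
  /* warning to the log for each same-named variable it drops from f.*  */
  create table combined_sql as
    select h.*, f.*
      from hepa as h
      left join LC.FAM as f
        on h.PID = f.PID;
quit;&lt;/CODE&gt;&lt;/PRE&gt;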
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Apr 2023 13:35:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-the-merged-data-has-more-obs-than-before/m-p/872242#M344599</guid>
      <dc:creator>A_Kh</dc:creator>
      <dc:date>2023-04-26T13:35:19Z</dc:date>
    </item>
  </channel>
</rss>

