<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 1065  data y2002; 1066  merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7; 1067  by in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700012#M19653</link>
    <description>&lt;P&gt;Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try code similar to this for each of your data sets:&lt;/P&gt;
&lt;PRE&gt;Proc freq data= y2002v1 order=freq;
   tables company*investment_date / missing list;
run;&lt;/PRE&gt;
&lt;P&gt;The top of each output table will have the most frequent responses, i.e. the duplicates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then clean the data needed so that there are only one record per company investment_date per set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Nov 2020 22:08:51 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2020-11-18T22:08:51Z</dc:date>
    <item>
      <title>MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700005#M19652</link>
      <description>&lt;P&gt;i all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I use the following code to combine seven datasets. The variables of &lt;EM&gt;company&lt;/EM&gt; and &lt;EM&gt;investment_date&lt;/EM&gt; are same in each dataset. Besides, each dataset has its unique variables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data y2002;
merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7;
by company Investment_Date;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my understanding, the above code should be enough to merge dataset. However, SAS still give me a note of repeats of BY values&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 10974 observations read from the data set WORK.Y2002V1.
NOTE: There were 10974 observations read from the data set WORK.Y2002V2.
NOTE: There were 10974 observations read from the data set WORK.Y2002V3.
NOTE: There were 10974 observations read from the data set WORK.Y2002V4.
NOTE: There were 10974 observations read from the data set WORK.Y2002V5.
NOTE: There were 10974 observations read from the data set WORK.Y2002V6.
NOTE: There were 10974 observations read from the data set WORK.Y2002V7.
NOTE: The data set WORK.Y2002 has 10974 observations and 98 variables.&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To help you understand the dataset, the followings are the examples of datasets.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset1;

  input company $  investment_date variable1 variable2;

  cards;

aaa           20000101			1			2

aaa           20000102			21			22

aaa           20000103			12			33

bbb           20000101			2			32

bbb           20000104			23			44

bbb           20000108			33			22

ccc           20000404			44			12

;

data dataset2;

  input company $  investment_date variable3 variable4;

  cards;

aaa           20000101			11			22

aaa           20000102			221			232

aaa           20000103			122			453

bbb           20000101			23			342

bbb           20000104			423			424

bbb           20000108			333			42

ccc           20000404			544			152

;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;Please advice an appropriate merge statement and/or SQL join. Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class="gtx-trans-icon"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 18 Nov 2020 22:09:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700005#M19652</guid>
      <dc:creator>Neal3321</dc:creator>
      <dc:date>2020-11-18T22:09:45Z</dc:date>
    </item>
    <item>
      <title>Re: 1065  data y2002; 1066  merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7; 1067  by</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700012#M19653</link>
      <description>&lt;P&gt;Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try code similar to this for each of your data sets:&lt;/P&gt;
&lt;PRE&gt;Proc freq data= y2002v1 order=freq;
   tables company*investment_date / missing list;
run;&lt;/PRE&gt;
&lt;P&gt;The top of each output table will have the most frequent responses, i.e. the duplicates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then clean the data needed so that there are only one record per company investment_date per set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Nov 2020 22:08:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700012#M19653</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-11-18T22:08:51Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700072#M19654</link>
      <description>&lt;P&gt;Neither of the example datasets you posted have multiple observations for any of the BY groups.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 02:35:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/700072#M19654</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-11-19T02:35:48Z</dc:date>
    </item>
  </channel>
</rss>

