<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MERGE statement has more than one data set with repeats of BY values in SAS Studio</title>
    <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450014#M5057</link>
    <description>&lt;P&gt;&lt;SPAN&gt;1. How not to replace a file during a sort. Simply use the out=option when running the sort. i.e., instead of just sorting the file itself, use something like:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;proc sort data=paper.CSRP_annual_returns
               out=paper.CSRP_annual_returns2
               nodupkey;
  by ticker;
run;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;2. My bad! The code should have included a by statement. e.g.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;data test;
  set paper.CSRP_annual_returns;
  by ticker;
  if not (first.ticker and last.ticker);
run;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;That way, first.ticker and last.ticker would only each be equal to 1 if there weren't any duplicates for a particular ticker. Thus you only want to look at/review those that don't meet that condition.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Art, CEO, AnalystFinder.com&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 30 Mar 2018 18:59:29 GMT</pubDate>
    <dc:creator>art297</dc:creator>
    <dc:date>2018-03-30T18:59:29Z</dc:date>
    <item>
      <title>MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449980#M5046</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello all!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You will notice from the log that my data set PAPER.COMPUSTAT_EXECUCOMP4 has 13355 observations. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: There were 13678 observations read from the data set PAPER.COMPUSTAT_EXECUCOMP3. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: There were 4304 observations read from the data set WORK.MULTIPLE_RETURNS. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: The data set PAPER.COMPUSTAT_EXECUCOMP4 has 13355 observations and 107 variables. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: DATA statement used (Total process time): real time 0.14 seconds user cpu time 0.03 seconds system cpu time 0.04 seconds &amp;nbsp; &amp;nbsp; memory 4073.15k OS Memory 42680.00k Timestamp 03/30/2018 02:06:59 PM Step Count 206 Switch Count 2 Page Faults 0 Page Reclaims 575 Page Swaps 0 Voluntary Context Switches 1143 Involuntary Context Switches 2 Block Input Operations 55840 Block Output Operations 54536 And this is the dataset that we merged for the final CEO_FIRM dataset. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So how could it be possible, as the log shows, that CEO_FIRM, the merged database has 55084 observations?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;NOTE: MERGE statement has more than one data set with repeats of BY values.&lt;/FONT&gt; NOTE: There were 53961 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: There were 13355 observations read from the data set PAPER.COMPUSTAT_EXECUCOMP4. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;NOTE: The data set PAPER.CEO_FIRM has 55084 observations and 108 variables. NOTE: DATA statement used (Total process time): real time 0.39 seconds user cpu time 0.06 seconds system cpu time 0.10 seconds memory 5286.21k OS Memory 44216.00k Timestamp 03/30/2018 02:07:03 PM Step Count 211 Switch Count 6 Page Faults 0 Page Reclaims 869 Page Swaps 0 Voluntary Context Switches 3404 Involuntary Context Switches 20 Block Input Operations 56576 Block Output Operations 224008 &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Shouldn't it only have as much data as the smaller dataset has? If its a 1-1 match for each CEO/annual_return? I highlighted the relevant section of the log.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me know if you have any ideas!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-SAStuck&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 17:29:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449980#M5046</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T17:29:50Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449981#M5047</link>
      <description>Can you show your code (merge statement)?  Depending on how you code it, it could be keeping only the records that match or keeping all records.  Also the note about MERGE statement with more than one data set with repeats of BY values is definitely a concern/problem.  So you will want to dig into that.</description>
      <pubDate>Fri, 30 Mar 2018 17:35:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449981#M5047</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2018-03-30T17:35:40Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449982#M5048</link>
      <description>&lt;P&gt;Thanks for the reply. Here is the code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;*Merge CEO data and firm data; 
DATA paper.ceo_firm ; 
length ticker $5;
MERGE paper.CSRP_annual_returns 
paper.compustat_execucomp4; 
BY ticker; 
RUN;&lt;/PRE&gt;</description>
      <pubDate>Fri, 30 Mar 2018 17:37:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/449982#M5048</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T17:37:00Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450001#M5049</link>
      <description>&lt;P&gt;I found this code, which may have worked:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;*MERGE statement has more than one data set with repeats of BY values;&lt;BR /&gt;proc sort data=paper.compustat_execucomp4;&lt;BR /&gt; by ticker;&lt;BR /&gt;run;&lt;BR /&gt;proc sort data=paper.CSRP_annual_returns nodupkey;&lt;BR /&gt; by ticker;&lt;BR /&gt;run;&lt;BR /&gt;data want;&lt;BR /&gt; merge paper.compustat_execucomp4 (in=in1) paper.CSRP_annual_returns;&lt;BR /&gt; by ticker;&lt;BR /&gt; if in1;&lt;BR /&gt;run;&lt;/PRE&gt;&lt;P&gt;The new ceo_firm dataset has 10,741 rows, which seems is more realistic.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What do you think?&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:24:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450001#M5049</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T18:24:59Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450002#M5050</link>
      <description>&lt;P&gt;Yes, that would eliminate the "more than one record" warning. The only question is whether you dropped the correct records or, if not, whether you can retrieve the original file to find out. You used:&lt;/P&gt;
&lt;PRE&gt;proc sort data=paper.CSRP_annual_returns nodupkey;
 by ticker;
run;&lt;/PRE&gt;
&lt;P&gt;which will definitely make it so that there is only one record per ticker, but you did it replacing the original (?) file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:31:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450002#M5050</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-30T18:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450006#M5052</link>
      <description>&lt;P&gt;right . . . recommendations?&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:34:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450006#M5052</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T18:34:45Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450007#M5053</link>
      <description>&lt;P&gt;Hopefully, you still have the original file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, I would create a file that only contains the duplicates and check each duplicate to see if they are different and, if they are, which one to keep.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Only getting the duplicate records should be easy. e.g.:&lt;/P&gt;
&lt;PRE&gt;proc sort data=paper.CSRP_annual_returns;
 by ticker;
run;

data test;
  set paper.CSRP_annual_returns;&lt;BR /&gt;  by ticker;
  if not (first.ticker and last.ticker);
run;
&lt;/PRE&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:57:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450007#M5053</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-30T18:57:05Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450010#M5055</link>
      <description>&lt;P&gt;I am happy to try this. Would you just mind explaining this portion so that I understand?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;data test;
  set paper.CSRP_annual_returns;
  if not (first.ticker and last.ticker);
run;&lt;/PRE&gt;&lt;P&gt;Additionally, how do I make sure not to overwrite the original dataset again?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:51:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450010#M5055</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T18:51:50Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450014#M5057</link>
      <description>&lt;P&gt;&lt;SPAN&gt;1. How not to replace a file during a sort. Simply use the out=option when running the sort. i.e., instead of just sorting the file itself, use something like:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;proc sort data=paper.CSRP_annual_returns
               out=paper.CSRP_annual_returns2
               nodupkey;
  by ticker;
run;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;2. My bad! The code should have included a by statement. e.g.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;data test;
  set paper.CSRP_annual_returns;
  by ticker;
  if not (first.ticker and last.ticker);
run;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;That way, first.ticker and last.ticker would only each be equal to 1 if there weren't any duplicates for a particular ticker. Thus you only want to look at/review those that don't meet that condition.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Art, CEO, AnalystFinder.com&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 18:59:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450014#M5057</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-30T18:59:29Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450032#M5060</link>
      <description>&lt;P&gt;By original file do you mean the original csv that I imported? I think I messed up my data . . . should I start with a clean program?&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 19:41:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450032#M5060</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-30T19:41:21Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450056#M5063</link>
      <description>&lt;P&gt;Yes. I think that would be a good idea!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Mar 2018 20:17:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450056#M5063</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-30T20:17:39Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450086#M5067</link>
      <description>&lt;P&gt;I ran the following code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;*MERGE statement has more than one data set with repeats of BY values;
proc sort data=paper.CSRP_annual_returns
 out=paper.CSRP_annual_returns2;
 by ticker;
run;

proc sort data=paper.CSRP_annual_returns
               out=paper.CSRP_annual_returns2
               nodupkey;
  by ticker;
run;

*Merge CEO data and firm data; 
DATA paper.ceo_firm ; 
length ticker $5;
MERGE paper.CSRP_annual_returns 
paper.compustat_execucomp4; 
BY ticker; 
RUN;&lt;/PRE&gt;&lt;P&gt;And now, as you will see in the log, the by variable issue is persisting. Any ideas? Maybe I didn't implement the code properly?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter dijitContentPaneSingleChild"&gt;&lt;DIV class="dijitBorderContainer dijitContainer row-fluid dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitAlignCenter dijitContentPaneSingleChild dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane"&gt;&lt;DIV class="tabs dijitBorderContainer dijitContainer dojoDndTarget sasStudioTabsParentContainer dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasStudioTabsTabContainer sasStudioTabsTabContainerVertical sasStudioTabsTop dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer sasStudioTabsTabContainerChild dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dojoDndTarget dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasSuiteTabs dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV&gt;&lt;DIV class="sasSource"&gt;1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;70&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;71&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;72 *MERGE statement has more than one data set with repeats of BY values;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;73 proc sort data=paper.CSRP_annual_returns&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;74 out=paper.CSRP_annual_returns2;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;75 by ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;76 run;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 53176 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CSRP_ANNUAL_RETURNS2 has 53176 observations and 3 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.02 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.02 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.00 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 6129.37k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 36708.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 12:03:36 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 70 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 1003&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 56&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 2576&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;77&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;78 proc sort data=paper.CSRP_annual_returns&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;79 out=paper.CSRP_annual_returns2&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;80 nodupkey;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;81 by ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;82 run;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 53176 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: 42661 observations with duplicate key values were deleted.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CSRP_ANNUAL_RETURNS2 has 10515 observations and 3 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.02 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.01 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.00 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 6129.37k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 36708.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 12:03:36 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 71 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 842&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 48&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 32&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 528&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;83&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;84&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;85 *Merge CEO data and firm data;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;86 DATA paper.ceo_firm ;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;87 length ticker $5;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;88 MERGE paper.CSRP_annual_returns&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;89 paper.compustat_execucomp4;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;90 BY ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;91 RUN;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: MERGE statement has more than one data set with repeats of BY values.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 53176 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 13346 observations read from the data set PAPER.COMPUSTAT_EXECUCOMP4.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CEO_FIRM has 54361 observations and 108 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: DATA statement used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.36 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.03 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.12 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 5276.21k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 37040.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 12:03:36 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 72 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 1005&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 3349&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 3&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 220936&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;92&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;93 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;105&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;Thanks!&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;SAStuck&lt;/DIV&gt;</description>
      <pubDate>Sat, 31 Mar 2018 00:10:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450086#M5067</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-31T00:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450088#M5068</link>
      <description>&lt;P&gt;I don't know why you sorted the file twice but, regardless, you used the wrong file in your datastep merge.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Didn't you mean to use?:&lt;/P&gt;
&lt;PRE&gt;MERGE paper.CSRP_annual_returns2 
              paper.compustat_execucomp4; 
&lt;/PRE&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Mar 2018 00:31:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450088#M5068</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-31T00:31:57Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450090#M5069</link>
      <description>&lt;P&gt;Thanks, Art.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's the update:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;*MERGE statement has more than one data set with repeats of BY values;
proc sort data=paper.CSRP_annual_returns
               out=paper.CSRP_annual_returns2
               nodupkey;
  by ticker;
run;

*Merge CEO data and firm data; 
DATA paper.ceo_firm ; 
length ticker $5;
MERGE paper.CSRP_annual_returns2 
paper.compustat_execucomp4; 
BY ticker; 
RUN;&lt;/PRE&gt;&lt;P&gt;And the log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter dijitContentPaneSingleChild"&gt;&lt;DIV class="dijitBorderContainer dijitContainer row-fluid dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitAlignCenter dijitContentPaneSingleChild dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane"&gt;&lt;DIV class="tabs dijitBorderContainer dijitContainer dojoDndTarget sasStudioTabsParentContainer dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasStudioTabsTabContainer sasStudioTabsTabContainerVertical sasStudioTabsTop dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer sasStudioTabsTabContainerChild dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dojoDndTarget dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasSuiteTabs dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV&gt;&lt;DIV class="sasSource"&gt;1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;70&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;71 *MERGE statement has more than one data set with repeats of BY values;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;72 proc sort data=paper.CSRP_annual_returns&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;73 out=paper.CSRP_annual_returns2&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;74 nodupkey;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;75 by ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;76 run;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 53176 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: 42661 observations with duplicate key values were deleted.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CSRP_ANNUAL_RETURNS2 has 10515 observations and 3 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.02 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.02 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.00 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 6135.43k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 36196.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 12:37:11 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 77 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 871&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 43&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 528&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;77&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;78 *Merge CEO data and firm data;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;79 DATA paper.ceo_firm ;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;80 length ticker $5;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;81 MERGE paper.CSRP_annual_returns2&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;82 paper.compustat_execucomp4;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;83 BY ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;84 RUN;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 10515 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS2.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 13346 observations read from the data set PAPER.COMPUSTAT_EXECUCOMP4.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CEO_FIRM has 21807 observations and 108 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: DATA statement used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.29 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.01 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.06 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 4219.71k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 35248.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 12:37:11 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 78 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 740&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 1273&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 576&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 88840&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;85&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;86 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;98&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;Now, the by value issue appears to be gone, but the number of rows in the merged data set is still large than that of the input datasets. Suggestions?&lt;/DIV&gt;</description>
      <pubDate>Sat, 31 Mar 2018 00:44:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450090#M5069</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-31T00:44:06Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450095#M5070</link>
      <description>&lt;P&gt;I don't know what you have in execomp4 (I probably have the file name wrong .. the file that you DIDN'T run the nodupkey on), but was your annual file the one that contained yearly averages? Unless you're just analyzing one year, I don't think you want to unduplicate that file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think your duplicate problem is in your execomp4 file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Mar 2018 01:24:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450095#M5070</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-03-31T01:24:41Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450103#M5071</link>
      <description>&lt;P&gt;If you merge two files by some key variable and the resulting dataset has more observations that the largest input file then it means that there are key values that are present in only one of the files.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So if you merge A and B by ID where A has 10 obs and B has 8 obs and the result has more than 10 obs then it there must be some values of ID that appear on only one of the two inputs.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want to figure out which input dataset is contributing data to the merge you can use the IN= dataset options to set flag variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA paper.ceo_firm ;
   length ticker $5 source $10;
  MERGE
    paper.CSRP_annual_returns2 (in=in1)
    paper.compustat_execucomp4 (in=in2)
  ;
  BY ticker;
  if in1 and in2 then source='BOTH';
  else if in1 then source='ANNUAL';
  else source='EXECCOMP';
RUN;

proc freq ;
 tables source;
run;
 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 31 Mar 2018 03:31:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450103#M5071</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-03-31T03:31:55Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450105#M5072</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;for the very clear and useful response. You made it very easy to understand what is going on here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the output from the proc freq:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;source Frequency Percent CumulativeFrequency CumulativePercentANNUALBOTHEXECCOMP&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;8461&lt;/TD&gt;&lt;TD&gt;38.80&lt;/TD&gt;&lt;TD&gt;8461&lt;/TD&gt;&lt;TD&gt;38.80&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;12650&lt;/TD&gt;&lt;TD&gt;58.01&lt;/TD&gt;&lt;TD&gt;21111&lt;/TD&gt;&lt;TD&gt;96.81&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;696&lt;/TD&gt;&lt;TD&gt;3.19&lt;/TD&gt;&lt;TD&gt;21807&lt;/TD&gt;&lt;TD&gt;100.00&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;8461+696=9157, so 12650-9157=&lt;SPAN&gt;3493 are coming from one of these datasets--how do we know which one? Also, there are 21,807&amp;nbsp;&lt;/SPAN&gt;observations in the ceo_firm dataset. How does this tie in to the numbers above?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's the log from the merge statement:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="dijitBorderContainer dijitContainer row-fluid dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter dijitContentPaneSingleChild"&gt;&lt;DIV class="dijitBorderContainer dijitContainer row-fluid dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitAlignCenter dijitContentPaneSingleChild dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane"&gt;&lt;DIV class="tabs dijitBorderContainer dijitContainer dojoDndTarget sasStudioTabsParentContainer dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasStudioTabsTabContainer sasStudioTabsTabContainerVertical sasStudioTabsTop dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer sasStudioTabsTabContainerChild dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dojoDndTarget dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer dojoDndContainerOver"&gt;&lt;DIV class="dijitTabContainer dijitTabContainerTop dijitContainer dijitLayoutContainer tabStrip-disabled sasSuiteTabs dijitBorderContainer-child dijitBorderContainer-dijitTabContainerTop dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV class="dijitTabPaneWrapper dijitTabContainerTop-container dijitAlignCenter"&gt;&lt;DIV class="dijitTabContainerTopChildWrapper dijitVisible"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitTabPane dijitTabContainerTop-child dijitTabContainerTop-dijitBorderContainer dijitLayoutContainer"&gt;&lt;DIV class="dijitBorderContainer dijitContainer dijitBorderContainer-child dijitBorderContainer-dijitBorderContainer dijitBorderContainerPane dijitAlignCenter dijitLayoutContainer"&gt;&lt;DIV class="dijitContentPane dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignCenter"&gt;&lt;DIV&gt;&lt;DIV class="sasSource"&gt;1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;70&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;71 *Merge CEO data and firm data;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;72 DATA paper.ceo_firm ;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;73 length ticker $5 source $10;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;74 MERGE&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;75 paper.CSRP_annual_returns2 (in=in1)&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;76 paper.compustat_execucomp4 (in=in2)&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;77 ;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;78 BY ticker;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;79 if in1 and in2 then source='BOTH';&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;80 else if in1 then source='ANNUAL';&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;81 else source='EXECCOMP';&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;82 RUN;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 10515 observations read from the data set PAPER.CSRP_ANNUAL_RETURNS2.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: There were 13346 observations read from the data set PAPER.COMPUSTAT_EXECUCOMP4.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: The data set PAPER.CEO_FIRM has 21807 observations and 109 variables.&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;NOTE: DATA statement used (Total process time):&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;real time 0.16 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;user cpu time 0.01 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;system cpu time 0.04 seconds&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;memory 4217.15k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;OS Memory 33456.00k&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Timestamp 03/31/2018 03:53:50 AM&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Step Count 35 Switch Count 2&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Faults 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Reclaims 768&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Page Swaps 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Voluntary Context Switches 638&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Involuntary Context Switches 0&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Input Operations 54880&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;Block Output Operations 89104&lt;/DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class="sasNote"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;83&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;84 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;&lt;/DIV&gt;&lt;DIV class="sasSource"&gt;96&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class="dijitContentPane statusBar dijitBorderContainer-child dijitBorderContainer-dijitContentPane dijitBorderContainerPane dijitAlignBottom"&gt;&lt;DIV class="statusBarMessage"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="statusBarUser"&gt;User: apmorabito0&lt;/DIV&gt;&lt;SPAN class="dijit dijitReset dijitInline dijitButton"&gt;&lt;SPAN class="dijitReset dijitInline dijitButtonNode"&gt;&lt;SPAN class="dijitReset dijitStretch dijitButtonContents"&gt;&lt;SPAN class="dijitReset dijitInline dijitButtonText"&gt;Messages: 4&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="contentassist"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="textviewTooltip"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sce-helper"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="sce-helper"&gt;&amp;nbsp;Thanks again!&lt;/DIV&gt;</description>
      <pubDate>Sat, 31 Mar 2018 03:54:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450105#M5072</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-31T03:54:57Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450106#M5073</link>
      <description>&lt;P&gt;So the Freq table shows what happened with your merge.&lt;/P&gt;
&lt;PRE&gt;ANNUAL    8,461
BOTH     12,650
EXECCOMP    696

Total 21,807
&lt;/PRE&gt;
&lt;P&gt;For example it is showing that there were 696 observations from the executive comp table that did not match the annual table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For each individual observation you can reference the SOURCE variable you created to see whether it got information from ANNUAL , EXECCOMP or BOTH.&lt;/P&gt;</description>
      <pubDate>Sat, 31 Mar 2018 04:10:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450106#M5073</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-03-31T04:10:41Z</dc:date>
    </item>
    <item>
      <title>Re: MERGE statement has more than one data set with repeats of BY values</title>
      <link>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450164#M5075</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;The source variable is very helpful. How can I tell SAS that I only want to keep rows with "BOTH" in the source variable?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 31 Mar 2018 17:21:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of-BY/m-p/450164#M5075</guid>
      <dc:creator>sastuck</dc:creator>
      <dc:date>2018-03-31T17:21:45Z</dc:date>
    </item>
  </channel>
</rss>

