<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merging multiple data sets in a data step in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640104#M190568</link>
    <description>This was the case in the example here 'Note that the NODUP option on proc sort will only eliminate duplicate observations, not observations that have duplicate keys'. NODUP and NODUPKEYS were different options? I thought the earlier was the abbreviation of NODUPKEYS. My ignorance then.</description>
    <pubDate>Wed, 15 Apr 2020 13:11:56 GMT</pubDate>
    <dc:creator>Cruise</dc:creator>
    <dc:date>2020-04-15T13:11:56Z</dc:date>
    <item>
      <title>Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639958#M190457</link>
      <description>&lt;P&gt;Hi Folks:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm merging 4 datasets and would like to select the observations matched to &lt;CODE class=" language-sas"&gt;main_attr1(in=a) data&lt;/CODE&gt;. I would expect the resulting merged data w_four to have N=250 observations matched to the size of data (in=a). However, merged data w_four resulted in 262 rows.&lt;/P&gt;
&lt;P&gt;proc freq below shows that w_four has all levels of data in the other datasets which obviously is not true.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How to do this merge in data step correct? Any suggestions appreciated.&lt;/P&gt;
&lt;P&gt;Thank you very much for your time.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="freq.png" style="width: 730px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/38320i266869349FEFD42A/image-size/large?v=v2&amp;amp;px=999" role="button" title="freq.png" alt="freq.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data w_four;  
merge 
outcome          (in=o) /*228*/
main_attr1(in=a) /*250*/
Korean_foreign1  (in=f) /*261*/
Korean_migration1(in=m); /*252*/
by id1name idname;
outcome=o;
map_attr=a;
foreign=f;
migration=m;
if a then output w_four; /*262*/ 
run; 
proc freq data=w_four;
tables map_attr*outcome*foreign*migration/list;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 15 Apr 2020 04:39:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639958#M190457</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T04:39:43Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639969#M190464</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) did you checked the log for "NOTE: MERGE statement has more than one data set with repeats of BY values"&lt;/P&gt;
&lt;P&gt;2) try to run:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select id1name, idname, count(1) as i
  from w_four
  group by id1name, idname
  having count(1) &amp;gt; 1
  ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;to find out which observations are possibly duplicated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;All the best&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 05:37:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639969#M190464</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2020-04-15T05:37:24Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639982#M190477</link>
      <description>&lt;P&gt;Always add this kind of check in your code when you expect unique keys (unless the data had previously been vetted):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;if not(first.IDNAME and last.IDNAME) then putlog 'WARNING: duplicate keys in merge';&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, always check the log. Always.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 06:26:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639982#M190477</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-04-15T06:26:10Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639987#M190482</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;wrote:
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, always check the log. Always.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;aka Maxim 2.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 06:45:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/639987#M190482</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-04-15T06:45:27Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640086#M190556</link>
      <description>&lt;P&gt;Thanks all. My log was:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: There were 228 observations read from the data set WORK.OUTCOME.
NOTE: There were 250 observations read from the data set WORK.MAIN_ATTR1.
NOTE: There were 261 observations read from the data set WORK.KOREAN_FOREIGN1.
NOTE: There were 252 observations read from the data set WORK.KOREAN_MIGRATION1.
NOTE: The data set WORK.W_FOUR has 262 observations and 32 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 secondS&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But after I added Chris's warning note the log turned into.&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
WARNING: duplicate keys in merge
NOTE: There were 228 observations read from the data set WORK.OUTCOME.
NOTE: There were 250 observations read from the data set WORK.MAIN_ATTR1.
NOTE: There were 261 observations read from the data set WORK.KOREAN_FOREIGN1.
NOTE: There were 252 observations read from the data set WORK.KOREAN_MIGRATION1.
NOTE: The data set WORK.W_FOUR has 262 observations and 32 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Yabwon's sql returned following.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;CODE class=" language-sas"&gt;&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 12:30:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640086#M190556</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T12:30:45Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640091#M190559</link>
      <description>&lt;P&gt;With your pointers, the problem is solved. Hurray.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First I investigated each data set using proc sql checker suggested by Yabwon and found out 'foreign' and 'migration' datasets had duplicate keys. Mostly migration data set. Deduplicated them in data step. Proc sort nodup did not deduplicate. But I had no time to understand why even though proc sort nodup always did a job in the past for me.&lt;/P&gt;
&lt;P&gt;But this article helped to be assured.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.lexjansen.com/wuss/1998/WUSS98097.pdf" target="_blank"&gt;https://www.lexjansen.com/wuss/1998/WUSS98097.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;And deduplicated datasets in data step and tried merge which output data with N=250 the dataset that I wanted. Awesome.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, without addition of&amp;nbsp;&lt;CODE class=" language-sas"&gt;putlog 'WARNING: duplicate keys in merge'&lt;/CODE&gt;&amp;nbsp;that Chris suggested the log wouldn't print out the warning on the duplicate keys.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks all. I greatly greatly appreciate your time and insights. Helped a lot.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data w_four; /*250*/ 
merge 
outcome          (in=o) /*228*/
main_attr1(in=a) /*250*/
Korean_foreign2  (in=f) /*260*/
Korean_migration2(in=m); /*238*/
by id1name idname;
outcome=o;
map_attr=a;
foreign=f;
migration=m;
if a; /*250*/ 
if not(first.IDNAME and last.IDNAME) then putlog 'WARNING: duplicate keys in merge';
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 13:00:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640091#M190559</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T13:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640094#M190562</link>
      <description>&lt;P&gt;As long as only one of the incoming datasets has duplicate keys, SAS will not issue a NOTE, as a 1-to-many join is valid (and often done) in a data step merge. Only when two or more datasets have duplicates, SAS will issue the NOTE, as the outcome may not be what the programmer desired.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 13:02:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640094#M190562</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-04-15T13:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640095#M190563</link>
      <description>But in this case, two of the datasets had duplicate keys: foreign had one and migration had several. Mostly migration.&lt;BR /&gt;</description>
      <pubDate>Wed, 15 Apr 2020 13:05:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640095#M190563</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T13:05:43Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640101#M190565</link>
      <description>&lt;P&gt;Note that the NODUP option on proc sort will only eliminate duplicate observations, not observations that have duplicate keys.&amp;nbsp; Note that it also will only eliminate duplicate observations if they happen to be right next to each other (there isn't a different observation between them).&lt;/P&gt;
&lt;P&gt;You can use NODUPKEYS option to make sure there is only one observation per set of key (by vars) values.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 13:08:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640101#M190565</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-04-15T13:08:28Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640103#M190567</link>
      <description>&lt;P&gt;As long as you do not have duplicates on the same key, SAS will not detect that, as for the individual keys it is still one-to-many. Run these two codes in comparison:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
input key;
datalines;
1
2
2
3
;

data two;
input key;
datalines;
1
1
2
3
;

data mgd;
merge
  one
  two
;
by key;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;vs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
input key;
datalines;
1
2
2
3
;

data two;
input key;
datalines;
1
2
2
3
;

data mgd;
merge
  one
  two
;
by key;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The second code will issue the NOTE, as there's a duplicate for 2 in both datasets.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 13:11:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640103#M190567</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-04-15T13:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640104#M190568</link>
      <description>This was the case in the example here 'Note that the NODUP option on proc sort will only eliminate duplicate observations, not observations that have duplicate keys'. NODUP and NODUPKEYS were different options? I thought the earlier was the abbreviation of NODUPKEYS. My ignorance then.</description>
      <pubDate>Wed, 15 Apr 2020 13:11:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640104#M190568</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T13:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: Merging multiple data sets in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640105#M190569</link>
      <description>Ah, got it. I can clearly see now. Thank you!</description>
      <pubDate>Wed, 15 Apr 2020 13:14:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-multiple-data-sets-in-a-data-step/m-p/640105#M190569</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-04-15T13:14:01Z</dc:date>
    </item>
  </channel>
</rss>

