<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data file size increased by about 3 times after merging in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248772#M6605</link>
    <description>&lt;P&gt;Do any of your original data sets have compression set to yes?&lt;/P&gt;</description>
    <pubDate>Mon, 08 Feb 2016 23:21:09 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2016-02-08T23:21:09Z</dc:date>
    <item>
      <title>Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248763#M6602</link>
      <description>&lt;P&gt;File1 is about 3.5 GB with about 7.9 million records. File2 is about 1.1 GB with the same number of records. Variable ID is a unique identifier, so this is a 1:1 merge. Fileout (generated by the code below) has the same number of records as File1 and File2, but it is 14.6 GB. I checked with PROC CONTENTS that Type, Len, Format and Informat are the same for each variable in Fileout as in the input files, so I am puzzled as to why Fileout is about 3 times the combined size of File1 and File2 (3.5 GB + 1.1 GB = 4.6 GB). How can I find out why?&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA fileout;
   MERGE file1 file2;
   BY ID;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 08 Feb 2016 23:05:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248763#M6602</guid>
      <dc:creator>Usagi</dc:creator>
      <dc:date>2016-02-08T23:05:48Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248769#M6603</link>
      <description>&lt;P&gt;How many variables are involved? Every variable from either set ends up in the result, including variables that appear in only one of them.&lt;/P&gt;
&lt;P&gt;If your ID variable is duplicated in one set, you get a many-to-one merge: the data from the other set is repeated for each duplicate of the ID value. That means values from one set may be stored many times over, using more space than you expect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Take a look at the results from this program, specifically the values of variable k.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
   Do id= 1 to 3;
      do j= 1 to 5;
      output;
      end;
   end;
run;

data two;
   do id = 2 to 5;
      do k = 2 to 3;
      output;
      end;
   end;
run;

data merged;
   merge one two;
   by id;
run;

proc print data=merged;
   var id j k;
run;&lt;/CODE&gt;&lt;/PRE&gt;
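&lt;P&gt;To check your own data for repeated ID values, a query along these lines should work. This is a sketch that assumes the data sets are named FILE1 and FILE2 and the key is ID, as in your DATA step. An ID value only appears in the output if it occurs on more than one row, so an empty result means ID is unique in that set.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* List ID values that occur on more than one row.   */
/* No rows selected means ID is unique in that set.  */
/* n_rows is just an illustrative column alias.      */
proc sql;
   title "ID values repeated in FILE1";
   select id, count(*) as n_rows
      from file1
      group by id
      having count(*) gt 1;

   title "ID values repeated in FILE2";
   select id, count(*) as n_rows
      from file2
      group by id
      having count(*) gt 1;
quit;
title;&lt;/CODE&gt;&lt;/PRE&gt;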
</description>
      <pubDate>Mon, 08 Feb 2016 23:14:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248769#M6603</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-02-08T23:14:25Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248771#M6604</link>
      <description>&lt;P&gt;Was file1 or file2 compressed?&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 23:20:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248771#M6604</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-02-08T23:20:59Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248772#M6605</link>
      <description>&lt;P&gt;Do any of your original data sets have compression set to yes?&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 23:21:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248772#M6605</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-08T23:21:09Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248774#M6606</link>
      <description>&lt;P&gt;That is my suspicion. Looking in more detail at the first table from PROC CONTENTS, Fileout had "NO" for "COMPRESSED", whereas File1 and File2 had "CHAR".&lt;/P&gt;&lt;P&gt;How do I compress a file? Would compression slow down later processing?&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 23:32:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248774#M6606</guid>
      <dc:creator>Usagi</dc:creator>
      <dc:date>2016-02-08T23:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248776#M6608</link>
      <description>&lt;P&gt;No, there is no doubling up because ID is a unique identifier. The number of records (rows) is the same in all 3 files. File1 has 184 variables, file2 has 66, and fileout has 249 (= 184 + 66 - 1, since ID is shared).&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 23:48:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248776#M6608</guid>
      <dc:creator>Usagi</dc:creator>
      <dc:date>2016-02-08T23:48:35Z</dc:date>
    </item>
    <item>
      <title>Re: Data file size increased by about 3 times after merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248793#M6609</link>
      <description>&lt;P&gt;Yes, it does take a bit more processing. To create the merged data set compressed, add the COMPRESS= data set option:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA fileout(COMPRESS=CHAR);
   MERGE file1 file2;
   BY ID;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;
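&lt;P&gt;If you would rather have every data set created in the session compressed, a sketch using the COMPRESS= system option (instead of the data set option) is below. CHAR matches what PROC CONTENTS reported for File1 and File2. BINARY is the other compressing value and may do better when most variables are numeric. When compression is on, SAS normally writes a note to the log saying by what percentage it changed the data set size, which is a quick way to compare the two values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Session-wide default: data sets created after this */
/* statement are compressed unless a COMPRESS= data   */
/* set option overrides it.                           */
options compress=char;

data fileout;
   merge file1 file2;
   by id;
run;

/* The Compressed attribute in the PROC CONTENTS header */
/* should now read CHAR, as it does for FILE1 and FILE2. */
proc contents data=fileout;
run;

/* Switch compression back off for later steps. */
options compress=no;&lt;/CODE&gt;&lt;/PRE&gt;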
&lt;P&gt;&lt;EM&gt;"Advantages of compressing a file include reduced storage requirements for the file and fewer &lt;SPAN class="xis-nobr"&gt;I/O&lt;/SPAN&gt; operations to read or write to the data during processing. However, more CPU resources are required to read a compressed file (because of the overhead of uncompressing each observation)."&lt;/EM&gt;&amp;nbsp; - SAS doc.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Feb 2016 02:11:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Data-file-size-increased-by-about-3-times-after-merging/m-p/248793#M6609</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-02-09T02:11:57Z</dc:date>
    </item>
  </channel>
</rss>

