<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Compress and Reuse - Risks? in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409636#M12531</link>
    <description>&lt;P&gt;I did, but I didn't see any performance improvement. I may need to experiment with indexing a different variable or multiple variables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My main concern is finding out later that there is a downside to compressing that I'm not aware of now.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 01 Nov 2017 20:47:10 GMT</pubDate>
    <dc:creator>tedway</dc:creator>
    <dc:date>2017-11-01T20:47:10Z</dc:date>
    <item>
      <title>SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409625#M12529</link>
      <description>&lt;P&gt;I have a dataset on our network drive that is&amp;nbsp;37 GB in size. Doing a simple count of one variable using PROC SQL can take over an hour,&amp;nbsp;which makes the dataset unusable.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I read about compress and gave it a try.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;data new (compress=yes reuse=yes);
set old;
run;&lt;/PRE&gt;&lt;P&gt;The dataset is now 1.5 GB and the same count query takes two minutes (still slow, but far better than before).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The only downsides I've been able to find regarding compress are that it can make the file slower to access in some cases and that, apparently, you can't address observations by observation number.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any other reason why I shouldn't use these settings by default for larger datasets going forward?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 20:18:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409625#M12529</guid>
      <dc:creator>tedway</dc:creator>
      <dc:date>2017-11-01T20:18:59Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409631#M12530</link>
      <description>&lt;P&gt;Have you added indexes to your data set?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 20:42:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409631#M12530</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-11-01T20:42:26Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409636#M12531</link>
      <description>&lt;P&gt;I did, but I didn't see any performance improvement. I may need to experiment with indexing a different variable or multiple variables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My main concern is finding out later that there is a downside to compressing that I'm not aware of now.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 20:47:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409636#M12531</guid>
      <dc:creator>tedway</dc:creator>
      <dc:date>2017-11-01T20:47:10Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409647#M12533</link>
      <description>&lt;P&gt;First of all, stop working on network drives, especially when your network is that slow. Any modern storage reads a GB in less than 10 seconds.&lt;/P&gt;
&lt;P&gt;Other than reading by obs number, compressed datasets are unproblematic. Just be aware that compress=yes can sometimes increase the size, so keep an eye on your logs.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 21:13:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409647#M12533</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-11-01T21:13:38Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409723#M12535</link>
      <description>Indexes must be designed with the most common queries in mind. For Base SAS data sets, a general rule is that they only improve performance when the query returns less than 10% of the observations.&lt;BR /&gt;&lt;BR /&gt;REUSE=YES only makes sense when you delete many observations in place.&lt;BR /&gt;&lt;BR /&gt;The main downside of compression is that it requires more CPU cycles for both read and write operations. In your case, though, the slow I/O seems to make compression pay off.&lt;BR /&gt;&lt;BR /&gt;options msglevel=i;&lt;BR /&gt;writes notes to the log about both compression and index usage.</description>
      <pubDate>Thu, 02 Nov 2017 07:47:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409723#M12535</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2017-11-02T07:47:37Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Compress and Reuse - Risks?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409806#M12536</link>
      <description>&lt;P&gt;An addendum: almost all our production datasets are stored with compress=yes. The CPU cost of uncompressing an RLE-compressed dataset is negligible compared to the savings in space and, therefore, in I/O. The only datasets we exclude are those where the compression rate is too small or nonexistent (i.e., the physical size actually increases).&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 12:50:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Compress-and-Reuse-Risks/m-p/409806#M12536</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-11-02T12:50:10Z</dc:date>
    </item>
  </channel>
</rss>

