<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Thoughts on using SAS compression in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614235#M18610</link>
    <description>&lt;P&gt;To try out a few ideas I jiffied up the following program, which anyone is welcome to try:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let RecordCount = 10000; /* How many records do we want? */

data Strings(drop=_:);
	length TestString $32767;
	_Alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; /* Pool of letters to pick one from */
	call streaminit(1912281316);

	do RecordKey = 1 to &amp;amp;RecordCount.; /* Generate the records */
		call missing(TestString);
		_Len = min(int((rand('chisq', 2) * 10) ** 2), 31993); /* Create a number between 0 and 31993, greatly skewed to the right */

		do _Words = 1 to _Len by 6; /* Generate five character words separated by a space to fill up the length */
			do _LetterPos = _Words to _Words + 4;
				_RandLetter = ceil(rand('uniform') * 26); /* Get a random number from 1 to 26... */
				substr(TestString, _LetterPos, 1) = substr(_Alpha, _RandLetter, 1); /* ...and put that letter into the string */
			end;
		end;

		output;
	end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It generates some number (10,000 by default) of records. Each record holds between 0 and roughly 32K bytes of five-character random words, each followed by a space. This is a pretty good representation of the data I'm working with. The lengths are massively skewed to the right, with many, many short records and only a few long ones, but because of the long ones my variable has to be $32767. I've used streaminit, so you should get the same results as me.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm copying this dataset to two libraries, one compressed, one not compressed. Here are my results (read from Windows Explorer):&lt;/P&gt;
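&lt;P&gt;The copy step itself is trivial; a minimal sketch (the library names and paths here are placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname comp "/data/comp";     /* placeholder paths - adjust for your site */
libname nocomp "/data/nocomp";

data comp.Strings(compress=char); /* RLE-compressed copy */
	set Strings;
run;

data nocomp.Strings(compress=no); /* uncompressed copy */
	set Strings;
run;&lt;/CODE&gt;&lt;/PRE&gt;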
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE width="440"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="110"&gt;Records&lt;/TD&gt;
&lt;TD width="110"&gt;Compressed MB&lt;/TD&gt;
&lt;TD width="110"&gt;Uncompressed MB&lt;/TD&gt;
&lt;TD width="110"&gt;Compression ratio&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1,000&lt;/TD&gt;
&lt;TD&gt;1.536&lt;/TD&gt;
&lt;TD&gt;32.832&lt;/TD&gt;
&lt;TD&gt;95.32%&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;10,000&lt;/TD&gt;
&lt;TD&gt;9.472&lt;/TD&gt;
&lt;TD&gt;326.040&lt;/TD&gt;
&lt;TD&gt;97.09%&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;100,000&lt;/TD&gt;
&lt;TD&gt;87.552&lt;/TD&gt;
&lt;TD&gt;3,257.436&lt;/TD&gt;
&lt;TD&gt;97.31%&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume that because the variables are strings of random data, all of the benefit is coming from compressing out the spaces at the end of the records. Clearly, I'm gaining great benefit from the compression!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 28 Dec 2019 23:49:46 GMT</pubDate>
    <dc:creator>TomKari</dc:creator>
    <dc:date>2019-12-28T23:49:46Z</dc:date>
    <item>
      <title>Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614088#M18588</link>
      <description>&lt;P&gt;Hello SAS community. In my SAS shop we have many users and can work with large datasets. We use SAS 9.4 on a Linux OS. Every so often we are reminded to delete datasets no longer needed or to consider compressing datasets using SAS compression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As I am sure many of you already know, SAS currently offers two compression algorithms: character (using the RLE algorithm) or binary (using the RDC algorithm). My understanding is that character compression generally works better when the data set is mostly character data, while binary compression generally works better when the data set is mostly numeric data.&lt;/P&gt;
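&lt;P&gt;For reference, both options can be set per dataset (the dataset names here are placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data out.chars(compress=char);   /* RLE: run-length encoding */
	set in.mostly_character;
run;

data out.nums(compress=binary);  /* RDC: Ross Data Compression */
	set in.mostly_numeric;
run;&lt;/CODE&gt;&lt;/PRE&gt;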
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would love to hear from the community: how do you determine whether SAS compression should be used? Is there an official policy? Or does each user decide for themselves if they want to compress their data sets? If you do use SAS compression, how are you deciding which option (char or binary) to use? Are there any good "rules of thumb" that can be applied?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I understand the disadvantages of SAS compression can be increased CPU usage and, in some cases, actually increasing the size of the data set. Are there any other disadvantages you have come across using SAS compression?&amp;nbsp; Any other considerations that should be taken into account before compressing data sets?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 15:07:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614088#M18588</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-27T15:07:24Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614101#M18589</link>
      <description>&lt;P&gt;I've found that normalizing and compressing the datasets saves more spaces than compression alone.&amp;nbsp; Getting rid of redundant data really pays off.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 15:55:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614101#M18589</guid>
      <dc:creator>tomrvincent</dc:creator>
      <dc:date>2019-12-27T15:55:07Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614104#M18590</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/144199"&gt;@tomrvincent&lt;/a&gt;&amp;nbsp;, that makes a lot of sense. Are you using one of the SAS compression options when you compress your datasets? If so, how do you decide which one to use?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 16:01:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614104#M18590</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-27T16:01:42Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614107#M18591</link>
      <description>&lt;P&gt;Hi, supp&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After never having used compression in 30 years, I recently stumbled across a use case where it worked perfectly. This is a text analysis project, with a huge number of text strings whose lengths vary wildly (most short, but a few very long).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can't remember the exact number, but using the RLE algorithm reduced the size of the datasets by about 90%. I didn't notice any difference in processing time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 16:14:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614107#M18591</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2019-12-27T16:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614109#M18592</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15142"&gt;@TomKari&lt;/a&gt;&amp;nbsp;, that is a really good result!&amp;nbsp; Being your data is mostly (or all) character data I am guessing RLE give you the best result.&amp;nbsp;Out of curiosity did you also try binary compression?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 16:19:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614109#M18592</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-27T16:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614112#M18593</link>
      <description>What I do is compress each resulting normalized dataset (dimension/fact table, if you will) if doing so actually saves space (sometimes it doesn't).  I haven't bothered to try the different options just because I've already saved thru normalization.  The rest is gravy. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Fri, 27 Dec 2019 16:35:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614112#M18593</guid>
      <dc:creator>tomrvincent</dc:creator>
      <dc:date>2019-12-27T16:35:40Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614116#M18594</link>
      <description>I agree that you are describing a good use case.   The downside of sas compressed datasets is that you are restricted to sequential data processing.   I.e. no indexes, no SET ... POINT=. However normalization without  compression might save substantial space and still support direct access.</description>
      <pubDate>Fri, 27 Dec 2019 16:52:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614116#M18594</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-12-27T16:52:22Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614118#M18595</link>
      <description>&lt;P&gt;No, given that the expected compression benefits would come from the long character fields, I only used the RLE algorithm.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 16:54:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614118#M18595</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2019-12-27T16:54:07Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614125#M18596</link>
      <description>&lt;P&gt;Almost all our production datasets are compressed with character (RLE). When writing a new batch job, I use compress=yes and look what the log tells me. If I get less than 10% compression rate, I omit compression.&lt;/P&gt;
&lt;P&gt;Not only does compression save disk space, it also speeds up the ETL processes which are (almost) always I/O bound and benefit from having to move less data from and to storage. Care must be taken when processing datasets with a high compression rate. Sorting can overload the SASUTIL location, so using tagsort may be necessary. And don't forget to also use compression in WORK when dealing with such datasets.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 17:47:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614125#M18596</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-12-27T17:47:06Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614128#M18597</link>
      <description>&lt;P&gt;Great points&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;! Is there ever a scenario you would use binary compression instead of char? Or why do you prefer to use char compression?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 18:01:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614128#M18597</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-27T18:01:09Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614148#M18598</link>
      <description>&lt;P&gt;Where I work we decided to set compression to BINARY as a default option when starting all SAS sessions via SAS AUTOEXEC programs. We have been doing this for both SAS 9.3 and 9.4 over many years. This has worked well for us by conserving disk space and also ensuring we almost never run out of WORK space unless someone does a really silly query.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We consume a lot of data from SQL Server and the BINARY setting means we don't need to worry about shortening long character variables, as they are all compressed. Uncompressed tables can often be around 5 times larger than compressed ones! For the type of datasets we have, BINARY provides better compression than CHARACTER.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also know of other SAS sites that compress their data by default.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 21:55:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614148#M18598</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-12-27T21:55:38Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614150#M18599</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;. You stated that for your type of datasets binary is the better option. Is this because you have a lot of numeric data in your datasets? Or how did you determine binary was the better option?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;mentioned, a compressed dataset can't take advantage of an index; do you ever find this to be problematic?&amp;nbsp;&lt;/P&gt;
      <pubDate>Fri, 27 Dec 2019 22:18:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614150#M18599</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-27T22:18:55Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614163#M18600</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18331"&gt;@supp&lt;/a&gt;&amp;nbsp; - I did some testing on some of our typical datasets and found BINARY gave better compression of around 10 percent. I also found that SAS jobs processing large datasets ran faster because of the reduced IO. While elapsed time was less CPU time increased, but only by a few percent. IMO universal compression is definitely worth considering if you process a lot of medium to large datasets.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2019 22:58:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614163#M18600</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-12-27T22:58:53Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614182#M18601</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;I agree that you are describing a good use case. The downside of sas compressed datasets is that you are restricted to sequential data processing. I.e. no indexes, no SET ... POINT=. However normalization without compression might save substantial space and still support direct access.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;&amp;nbsp;Are you sure? That's not what the SAS log tells me.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;28         options ps=max msglevel=i;
29         data have(compress=yes index=(indvar));
30           length charvar1000 $1000;
31           call missing(charvar1000);
32           do i=1 to 100000;
33             if mod(i,10)=1 then indvar+1;
34             output;
35           end;
36           stop;
37         run;

NOTE: The data set WORK.HAVE has 100000 observations and 3 variables.
INFO: Multiple concurrent threads will be used to create the index.
NOTE: Simple index indvar has been defined.
NOTE: Compressing data set WORK.HAVE decreased size by 96.23 percent. 
      Compressed is 59 pages; un-compressed would require 1565 pages.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.12 seconds
      

38         
39         data want1;
40           do point=1 to nobs by 1000;
41             set have point=point nobs=nobs;
42             output;
43           end;
44           stop;
45         run;

NOTE: The data set WORK.WANT1 has 100 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

46         
47         proc sql;
48           create table want2 as
49           select *
50           from have
51           where indvar=10
52           ;
INFO: Index indvar selected for WHERE clause optimization.
NOTE: Table WORK.WANT2 created, with 10 rows and 3 columns.

53         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 02:20:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614182#M18601</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-12-28T02:20:54Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614184#M18603</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/265744"&gt;@Patrik&lt;/a&gt;, thanks for sharing your findings! That is very interesting. Do you mind sharing what version of SAS and Operating System were used for your tests?&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 02:33:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614184#M18603</guid>
      <dc:creator>supp</dc:creator>
      <dc:date>2019-12-28T02:33:20Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614186#M18604</link>
      <description>&lt;PRE&gt;AUTOMATIC SYSVLONG 9.04.01M5P091317
AUTOMATIC SYSHOSTINFOLONG Linux LIN X64 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 CentOS Linux release 7.5.1804 (Core) &lt;/PRE&gt;</description>
      <pubDate>Sat, 28 Dec 2019 02:37:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614186#M18604</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-12-28T02:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614187#M18605</link>
      <description>&lt;P&gt;Before considering compression, think about whether you have good rules in place for the lengths used within the SAS data set.&amp;nbsp; Many times, I have seen SAS data sets use $200 characters for a field that only needs a few characters ... usually because the database definition for the field was varchar200.&amp;nbsp; Those who set up the field in the data base took advantage of the fact that varchar automatically adjusts to the number of characters needed.&amp;nbsp; But when extracted, SAS uses the full length of 200 every time.&amp;nbsp; If you have processes to examine fields and the length that they actually require, you will be a step ahead of the game whether or not you add compression afterwards.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 03:07:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614187#M18605</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-12-28T03:07:21Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614189#M18606</link>
      <description>&lt;P&gt;Well, I'm definitely not sure any more.&amp;nbsp; My understanding was that some compressions generated irregular observation lengths, making the implementation of direct access exceedingly tricky, and not implemented by SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The SAS index records the index value and each RID (record id) associated with the index value.&amp;nbsp; When every record is the same size, as would be the case with uncompressed SAS data sets, knowing the RID lets you know exactly which physical page (pages are also of constant size) of data contains the record(s) of interest, which in turn means you can directly access only the pages needed.&amp;nbsp; I'm not clear on how knowing the RID for compressed data sets can let you know which pages to read, … unless the compression keeps the observations uniform in length.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for the example.&amp;nbsp; I'll have to re-map my understanding of SAS compression.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 04:20:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614189#M18606</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-12-28T04:20:15Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614198#M18607</link>
      <description>&lt;P&gt;You "only" have to switch from observation &lt;EM&gt;number&lt;/EM&gt; to observation &lt;EM&gt;start position&lt;/EM&gt; to address the observation from an index. Given 64-bit processing, this is not so hard to do.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PS it might be that binary compression puts observation boundaries within bytes, and then this would not work anymore. Maybe &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;could rerun his experiment with compression=binary to clear this up?&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 08:20:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614198#M18607</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-12-28T08:20:46Z</dc:date>
    </item>
    <item>
      <title>Re: Thoughts on using SAS compression</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614226#M18608</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18331"&gt;@supp&lt;/a&gt; - Regarding limitations with compression and indexes. As stated elsewhere, we universally compress with the BINARY option and never had any problems with indexes. I'm pretty sure we don't use direct access (POINT=), but if we did you can just add COMPRESS = NO to the DATA steps using it to avoid any problems.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please note that universal compression covers WORK libraries too, so the benefits of compressing also apply to controlling WORK space.&lt;/P&gt;
      <pubDate>Sat, 28 Dec 2019 20:21:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Thoughts-on-using-SAS-compression/m-p/614226#M18608</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-12-28T20:21:41Z</dc:date>
    </item>
  </channel>
</rss>

