<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fastest way to calculate the max of a variable in a large dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247745#M268646</link>
    <description>&lt;P&gt;This is very clear thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In fact I didn't compress it yet.&lt;/P&gt;
&lt;P&gt;I use this table to do joins. That's why I sorted it and created an index on it.&lt;/P&gt;
&lt;P&gt;I don't know if the index still work if I compress it. In addition, I'm not sure if the dataset should be uncompressed each time I work on it, so it takes more time...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will compress it and tell you the result !&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Feb 2016 14:15:10 GMT</pubDate>
    <dc:creator>majdi_ka</dc:creator>
    <dc:date>2016-02-03T14:15:10Z</dc:date>
    <item>
      <title>Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247674#M268632</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a dataset with tens of millions of lines and I wand to calculate the max of a column.&lt;/P&gt;
&lt;P&gt;calculating this max took more than 10 minutes with proc sql and proc means.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a fastest way to do this (Arrays, hash objects, iml)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks in advance&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 08:58:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247674#M268632</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T08:58:55Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247676#M268633</link>
      <description>&lt;P&gt;If you need to find different max values for groups, there are several methods that may differ in performance.&lt;/P&gt;
&lt;P&gt;But if you just need the overall max of a column, all thats SAS does is one sequential scan through the data set, which is only limited by the I/O throughput of your computer.&lt;/P&gt;
&lt;P&gt;No matter if you do&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select max(columna) from have;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set have (end=done);
retain maxa 0;
maxa = max(columna,maxa);
if done then output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;or max with proc means.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just do a quick calculation of (physical size of the dataset) / (runtime in seconds).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your dataset contains lots of mostly empty strings or other repeating patterns, using the dataset option &lt;FONT face="courier new,courier"&gt;compress=yes&lt;/FONT&gt; might save you lots of disk space and therefore lots of time spent doing I/O.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 09:07:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247676#M268633</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T09:07:49Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247683#M268634</link>
      <description>&lt;P&gt;Data step is good enough .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
 set sashelp.class end=last;
 retain max;
 max=max(max,weight);
 if last then put max=;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Feb 2016 09:51:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247683#M268634</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-02-03T09:51:45Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247685#M268635</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your answer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'am aware that the I/O is the main reason of the important execution time. That's why i'am looking for ways that load&amp;nbsp;partially or totally data into memory.&lt;/P&gt;
&lt;P&gt;I used this program but still have the same execution time (cpu time 1:38 // real time 11:42).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (keep=max);
     set have (keep=variable) end=eof;
     array v{1}  variable;
     max=v{1};
     do i=2 to dim(b);
          if max&amp;lt;max(v{i}) then max=max(v{i});
     end;
     if eof;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For your suggestion of using compress option, I'm using the data step keep= option, which I think more efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;best regards,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 09:52:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247685#M268635</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T09:52:52Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247687#M268636</link>
      <description>&lt;P&gt;10 minutes computing time is somewhat strange.&lt;/P&gt;
&lt;P&gt;I don't think hashing would be faster to get the max value&amp;nbsp;than proc means or so.&amp;nbsp;&amp;nbsp;Is there something special about your data set?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Options Fullstimer;

Data A;
  Do i=1 To 1e7; * 10 million rows;
    x=Rannor(1);
	Output;
  End;
Run;

Proc SQL;
  Select Max(x) From A;
Quit;

Proc Means Data=A Max;
  Var x;
Run;

Data _NULL_;
  Length x 8.;
  If _N_ eq 1 Then Do;
    Declare Hash H (Dataset:'A (Keep=x)',Hashexp:20);
	Declare HIter HI ('H');
	H.Definekey('x');
	H.Definedone();
    Call Missing (x);
  End;
  Retain Max;
  rc=HI.First();
  Do While (not rc);
    Max=IfN(x gt Max,x,Max);
    rc=HI.Next();
  End;
  If rc Then Put Max;
Run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Feb 2016 10:21:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247687#M268636</guid>
      <dc:creator>user24feb</dc:creator>
      <dc:date>2016-02-03T10:21:10Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247708#M268637</link>
      <description>&lt;P&gt;As long as you want to get the max of a column over the WHOLE dataset, there's no way around reading the WHOLE dataset. Period.&lt;/P&gt;
&lt;P&gt;It's part of the way a SAS dataset is stored. You find all columns in all dataset pages, so thw whole shebang needs to be read.&lt;/P&gt;
&lt;P&gt;The most important things are:&lt;/P&gt;
&lt;P&gt;- reduce the physical storage size (if compress= has a noticable effect)&lt;/P&gt;
&lt;P&gt;- make sure the dataset is only read once (no implicit sorting etc, happens often with proc sql)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you find that you only need a subset of columns in most of your analyses, it makes sense to build a (horizontally) reduced dataset.&lt;/P&gt;
&lt;P&gt;The same is true if you only need a special group of observations most of the time.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 11:49:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247708#M268637</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T11:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247709#M268638</link>
      <description>&lt;P&gt;If he/she has "tens of millions" of lines:&lt;/P&gt;
&lt;P&gt;ex. 50,000,000 lines of 500 bytes each will lead to ~25 GB of data, or a throughput of 2.5 GB per minute&lt;/P&gt;
&lt;P&gt;2500/60 gives a sustained throughput of ~40 MB/sec, which is not that bad from a typical PC hard disk.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With SAS it makes sense to invest in fast storage, SSD's preferred nowadays.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 11:54:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247709#M268638</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T11:54:09Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247726#M268639</link>
      <description>&lt;P&gt;What version of SAS are you using? How much memory have you allocated to your SAS process? Please post the SAS log that results from running the following:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="sas" style="font-family: monospace;"&gt;&lt;SPAN style="color: rgb(0, 0, 255);"&gt;%put&lt;/SPAN&gt; SYSVLONG = &lt;SPAN style="color: rgb(0, 0, 255); font-weight: bold;"&gt;&amp;amp;SYSVLONG&lt;/SPAN&gt;;&lt;BR /&gt;proc &lt;SPAN style="color: rgb(0, 0, 255);"&gt;options&lt;/SPAN&gt; option=memsize value;
&lt;SPAN style="color: rgb(0, 0, 128); font-weight: bold;"&gt;run&lt;/SPAN&gt;;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 12:48:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247726#M268639</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-02-03T12:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247730#M268640</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;here&amp;nbsp;is the output of the log (it's a french version):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SYSVLONG = 9.04.01M3P062415&lt;BR /&gt;26 proc options option=memsize value;&lt;BR /&gt;27 run;&lt;/P&gt;
&lt;P&gt;SAS (r) Proprietary Software Version 9.4 TS1M3&lt;/P&gt;
&lt;P&gt;Informations sur la valeur de l'option SAS MEMSIZE&lt;BR /&gt; Valeur :14014924800&lt;BR /&gt; Etendue : SAS Session&lt;BR /&gt; Comment la valeur d'option est définie : Config File&lt;BR /&gt; Nom du fichier de config. :&lt;BR /&gt; C:\Program Files\SASHome\SASFoundation\9.4\nls\fr\sasv9.cfg&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:12:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247730#M268640</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T13:12:32Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247734#M268641</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your answer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think you're right about calculating the max of a variable over the WHOLE dataset. And here the difference is in the number of columns and their size&amp;nbsp;(many&amp;nbsp;character&amp;nbsp;variables with length &amp;gt;20 caracters for example).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I created a table with only the variable on which I want to calculate the max.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data variable_only;
set have (keep=variable);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;creating the table with the dataset option keep=variable took approximately the same time of the max calculation. Although I have a SSD disk, the operation was very slow (let's say ~5mb per second)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After this, I computed the max using sql :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
    select max(variable)
    from variable_only;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;and it took only few seconds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So It's working with data containing all variables that makes it slow, even when working with keep= or equivalent...&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:26:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247734#M268641</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T13:26:27Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247735#M268642</link>
      <description>&lt;P&gt;hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13094"&gt;@user24feb﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks for your program and your effort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;here the fastest way is proc sql. hash table is slower due to loading data into memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Doing the calculation with a dataset with only one variable is really fast. I think as I mentioned&amp;nbsp;in an other comment, the problem appears when working with a large dataset with many variables (especially character variable).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:30:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247735#M268642</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T13:30:45Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247737#M268643</link>
      <description>OK, so you have the latest version of SAS (SAS 9.4m3) and 13GB of RAM. If you had a data set that has 10 million rows and ONE variable, computing the max would take a fraction of a second. As others have said, the issue is probably that you have a large number of variables and SAS has to read through the nonessential data to reach the numerical variables that it needs.  &lt;BR /&gt;&lt;BR /&gt;For our understanding, could you share:&lt;BR /&gt;1) How many total variables in the data set?&lt;BR /&gt;2) How many are character?&lt;BR /&gt;3) How many variables are you computing the max for?&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:37:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247737#M268643</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-02-03T13:37:47Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247739#M268644</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS﻿&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) 86 variables&lt;/P&gt;
&lt;P&gt;2) 24 character variables&lt;/P&gt;
&lt;P&gt;3) only one variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, I totally understand this, and I think that's why it's kind of frustrating, Because if there was less variables, it would be very fast !&lt;/P&gt;
&lt;P&gt;If there's a solution I think it can be more than a solution for this particular situation, but also probably a best practice when working with large data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:50:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247739#M268644</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T13:50:17Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247740#M268645</link>
      <description>&lt;HR /&gt;
&lt;P&gt;majdi_ka wrote:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;......&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I created a table with only the variable on which I want to calculate the max.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data variable_only;
set have (keep=variable);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;creating the table with the dataset option keep=variable took approximately the same time of the max calculation. Although I have a SSD disk, the operation was very slow (let's say ~5mb per second)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After this, I computed the max using sql :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
    select max(variable)
    from variable_only;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the first step, you once again need to read the whole dataset. Writing the new dataset is a drop in the ocean compared to the cost of reading the big data. Although the keep= option looks as if you only read a small fraction of the data, the system still has to access the whole dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In SAS, data is stored side-by-side in records, and the records consecutively, just as if you write it in a text file. RDBMS's will load and access data in a way that, when you access a single column, only that data is moved in memory; SAS can't do that with SAS tables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, once you have the small dataset, you see how the max() operation improves by orders of magnitude.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you mentioned the presence of several character fileds with &amp;gt;20 characters, storing the big dataset with the compress= option is a must, IMO. Do that (if you haven't already), and then rerun all your operations to see how it changes performance.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:50:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247740#M268645</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T13:50:45Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247745#M268646</link>
      <description>&lt;P&gt;This is very clear thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In fact I didn't compress it yet.&lt;/P&gt;
&lt;P&gt;I use this table to do joins. That's why I sorted it and created an index on it.&lt;/P&gt;
&lt;P&gt;I don't know if the index still work if I compress it. In addition, I'm not sure if the dataset should be uncompressed each time I work on it, so it takes more time...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will compress it and tell you the result !&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 14:15:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247745#M268646</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T14:15:10Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247755#M268647</link>
      <description>&lt;P&gt;You need never uncompress a SAS dataset that was created with compress=yes. SAS will always decompress it on the fly. The CPU cycles needed for umcompress are always offset by the&lt;/P&gt;
&lt;P&gt;Caveat: when a compressed dataset is sorted, the utility file is not compressed and will need extra space; this can be mitigated by using the tagsort option in proc sort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once you recreate a SAS dataset with compress=yes, you also need to recreate the index, as the physical location of observations within the dataset file has changed. IIRC, as I do not use index files (we rarely need only small subsets of files, we usually work with whole datasets or large parts thereof, where indexes do not improve performance, but rather make it worse).&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 14:38:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247755#M268647</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T14:38:23Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247758#M268648</link>
      <description>&lt;P&gt;&lt;EM&gt;Editor's note: this is a popular topic so we're&amp;nbsp;summarizing here - use PROC MEANS, asking for MAX and name the one variable in the VAR statement.&amp;nbsp; Even if the table is in a non-SAS database, the summary work&amp;nbsp;&lt;STRONG&gt;should&lt;/STRONG&gt; get pushed the database for fast processing.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Proc Means Data=MixTypeData Max;
  Var x1;
Run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;FONT size="4"&gt;--- Details ---&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;So that everyone has the same data set to experiment on, here is a data set that has 10 million records, 86 variables, and 24 character variables of length 30&amp;nbsp; characters.&amp;nbsp; On my CPU, it takes&amp;nbsp;30 seconds to create the data and 4 SECONDS to compute the mean.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13235"&gt;@majdi_ka&lt;/a&gt;&amp;nbsp;Please run this program.&amp;nbsp;&amp;nbsp;If it is taking "more than 10 MINUTES" to run, then I favor &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;'s theory that it is hardware related.&amp;nbsp; I am not a hardware person, but others on the list might help you to investigate hardware/network issues like&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Is your drive is fragmented/congested/slow/etc.&lt;/LI&gt;
&lt;LI&gt;Is the&amp;nbsp;data set on a mounted or networked drive? (Network access is slow.)&lt;/LI&gt;
&lt;LI&gt;Is the data stored in a non-native format so that &lt;A href="http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002200731.htm" target="_self"&gt;SAS is reading the data by using CEDA&lt;/A&gt;? Also slow.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let nCont = 62;             /* number of contin vars               */
%let nChar = 24;             /* number of character vars         */
%let N = 1e7;
data MixTypeData;   
drop i j;
array x{&amp;amp;nCont} x1-x&amp;amp;nCont;  
array c{&amp;amp;nChar} $30. c1-c&amp;amp;nChar;  
call streaminit(1);
do i = 1 to &amp;amp;N; 
   do j = 1 to &amp;amp;nCont;              
      x{j} = rand("Uniform");  
   end; 
   do j = 1 to &amp;amp;nChar;
      c{j} = "ABCDEFGHIJKLMNOP";
   end; 
   output;  
end;  
run;

Proc Means Data=MixTypeData Max;
  Var x1;
Run;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Sep 2018 18:51:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247758#M268648</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2018-09-12T18:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247761#M268649</link>
      <description>&lt;P&gt;Rick, when you do that, you profit from the system's disk cache. If you do PROC MEANS immediately after creating the data set, most (if not all) of the file will still be in main memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm currently running a test on my (seriously old) server to get some values with different methods &amp;amp; settings.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;(SAS 9.2, AIX 5.3 on p520, 2GB fibre channel SAN, Internal Ultra SCSI LVD disks)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Test1:&lt;/P&gt;
&lt;P&gt;File created on SAN storage, program unchanged&lt;/P&gt;
&lt;P&gt;File size ~12 GB&lt;/P&gt;
&lt;P&gt;Creation: 4min 24 sec&lt;/P&gt;
&lt;P&gt;Means: 1min 35sec&lt;/P&gt;
&lt;P&gt;Test2:&lt;/P&gt;
&lt;P&gt;compress option set, SAN storage&lt;/P&gt;
&lt;P&gt;File size ~9.5 GB&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Had to stop tests at this point, because some of my powerusers decided to run some real heavy-duty jobs and brought the server under severe load. WIll update once I have the server for myself.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 15:12:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247761#M268649</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T15:12:01Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247777#M268650</link>
      <description>&lt;P&gt;I ran the program and It took 50 seconds to create the table and 6 seconds to calculate the max.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As I mentioned in a previous comment, I have 200 million lines and not 10 millions.&lt;/P&gt;
&lt;P&gt;I tried on a subset of 50M and it took&amp;nbsp;2 minutes 30 seconds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I then tried the compress option when creating a compressed version of the 50 M&amp;nbsp;table. It took 1:20 which is a good improvement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So the solution is mainly related with the size of the data. I store&amp;nbsp;the SAS&amp;nbsp;data from an oracle database. so every numeric data has a length of 8.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will think of taking less variables from the original table, and define&amp;nbsp;a smaller length&amp;nbsp;for numeric variables (3 for flags for example).&lt;/P&gt;
&lt;P&gt;then I will add a compress=yes (or binary or char, I must reread some papers about this). and then sort and create necessary indexes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think this will save disk space and execution time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your help and effort! Really useful advices &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MK&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 15:41:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247777#M268650</guid>
      <dc:creator>majdi_ka</dc:creator>
      <dc:date>2016-02-03T15:41:46Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest way to calculate the max of a variable in a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247908#M268651</link>
      <description>Let us step back and think of why do we optimize. If this is not a very frequent query and/or it does not hinder you to the analysis you wish, it's "just" annoying  and something you could live with.&lt;BR /&gt;&lt;BR /&gt;On the other hand, if this is a frequent request, perhaps required by many users, I think that you need to step up the game. &lt;BR /&gt;-are you on the right platform,  sizing may be required &lt;BR /&gt;-are you using the right data engine? SPDE let you read in parallel. If you wish to optimise aggregate functions investing in SPD Server could be an option, since it support query aggregation on indexes (hence - no table scan). Or an extrnal RDBMS (perhaps a columnar one base on the query of your post). Other options that comes in mind is in memory databases (LASR with Visual Analytics ).</description>
      <pubDate>Thu, 04 Feb 2016 07:00:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-way-to-calculate-the-max-of-a-variable-in-a-large/m-p/247908#M268651</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-02-04T07:00:50Z</dc:date>
    </item>
  </channel>
</rss>

